I’ve always wondered why high schools bury students in calculus instead of teaching them the beauty of statistics and probability. A tiny fraction of these students will actually use calculus in their lives, but statistics is for everyone. And without clear statistical principles in our heads, we get intimidated by numbers. I thought a small write-up on probability and statistics that touches upon some of their arcane concepts without sounding too technical was in order.
Randomized response
To reel you guys in, let me begin with a real-world application of probability. Nothing obscure, just an interesting use for the concept.
Suppose you’re conducting a survey where you ask people whether they cheated on their spouse. In spite of repeated assurances of confidentiality, the participants could never be sure that their data wouldn’t be traced back to them. After all, it’s on a piece of paper or a file on some computer. Who’s to say some disgruntled employee wouldn’t release them to the world?
The randomized response method, that’s who. It allows us to obtain our data without causing a rip tide of punitive alimonies. Here’s how it goes. When the participant comes to a yes/no question about a sensitive issue, he flips a coin. If the coin comes up heads, he answers Yes. If it comes up tails, he answers truthfully. It’s that simple. No one watches him flip the coin, so no one can tell whether any particular Yes came from the coin or from a guilty conscience.
We know that if we flip a coin enough times, we’ll get heads roughly half the time. So let’s say 1000 people participated in the survey, and 700 of them answered Yes to the damning question while 300 answered No. There is only one way to end up answering No: you got tails and you didn’t cheat on your spouse. Since roughly half the participants (about 500) got heads and answered Yes no matter what, the other 500 or so got tails and answered truthfully. All 300 No answers came from that truthful group, which leaves about 200 truthful Yeses. So about 200 out of 500, or 40%, of the truth-tellers cheated. Because the coin picked the truth-tellers at random, the same 40% should hold for everyone: roughly 400 of the 1000 participants cheated. Yet no individual Yes can be pinned on anyone, so their spouses are none the wiser.
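The arithmetic above boils down to one formula. Let p be the true cheating rate. Heads always gives a Yes, and tails gives a Yes only for a cheater, so P(Yes) = 1/2 + p/2. Solving for p gives p = 2 × P(Yes) - 1. Here is a minimal Python sketch under that assumption. The function names and the simulated 40% rate are hypothetical and chosen only for illustration. In a small survey the estimate can even come out slightly negative, because it is only an estimate.

```python
import random


def estimate_rate(yes_count: int, total: int) -> float:
    """Estimate the true 'yes' rate p under the coin-flip design.

    P(Yes) = 1/2 (heads, forced Yes) + 1/2 * p (tails, truthful answer),
    so p = 2 * P(Yes) - 1 = (2 * yes_count - total) / total.
    """
    return (2 * yes_count - total) / total


def simulate_survey(n: int, true_rate: float, seed: int = 0) -> int:
    """Return the number of Yes answers from n simulated participants."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        cheated = rng.random() < true_rate  # hypothetical true behaviour
        heads = rng.random() < 0.5          # the private coin flip
        if heads or cheated:                # heads -> Yes; tails -> the truth
            yes += 1
    return yes


print(estimate_rate(700, 1000))  # 0.4

# A larger simulated survey should land close to the rate we planted.
n = 100_000
print(estimate_rate(simulate_survey(n, true_rate=0.4), n))
```

The second estimate will wobble a little with the seed, but with 100000 participants it should sit within about a percentage point of 0.4.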
Bayes’ theorem
Thomas Bayes blew our minds on conditional probability, you know, those icky questions like, “If it rains tomorrow, what’s the probability that the bus will be late?” Bayes’ theorem, if you’re unfamiliar with it, gives some counterintuitive answers to questions we would otherwise think were obvious.
Say 1% of women over forty have breast cancer. Assume that 95% of women with breast cancer will test positive for it. Also assume that 5% of those without breast cancer will also test positive: false alarms. If a woman tests positive, what’s the probability that she actually has breast cancer? 95%? 90%? It’s at least 50%, right? It’s actually about 16%, which, incidentally, is the percentage of doctors who got this question right.
Whenever the condition we test for is present in only a small fraction of the population, even a fairly accurate test will have its true positives drowned out by the absolute number of false alarms. Welcome to the world of Bayesian probability. Simply put, suppose 10000 women were tested for breast cancer. About 100 of them (1%) would actually have it, and 95 of those 100 would test positive. Out of the 9900 who don’t have breast cancer, 5%, or 495, would also test positive. This means that for every 10000 tested, 590 will test positive, and only 95 of those will actually have breast cancer: 95/590, or about 16%.
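The natural-frequency counting above can be checked with a few lines of Python. This is a sketch. `positive_counts` is a hypothetical helper, and it rounds to whole women, which happens to be exact for these numbers.

```python
def positive_counts(population, prevalence, sensitivity, false_alarm_rate):
    """Split the positive tests in a population into true and false alarms."""
    sick = round(population * prevalence)
    healthy = population - sick
    true_pos = round(sick * sensitivity)           # sick women who test positive
    false_pos = round(healthy * false_alarm_rate)  # healthy women who test positive
    return true_pos, false_pos


tp, fp = positive_counts(10000, 0.01, 0.95, 0.05)
print(tp, fp, tp + fp)          # 95 495 590
print(f"{tp / (tp + fp):.1%}")  # 16.1%
```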
This is why doctors re-test those who test positive. In this case, suppose a woman tests positive twice and the second test’s errors are independent of the first. Then the probability of cancer rises to about 78%. Fun, right?
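The same result follows from applying Bayes’ theorem twice, feeding the first answer back in as the new starting probability. Here is a minimal sketch. It assumes both tests have the same 95%/5% rates and that their errors are independent. The helper name `update` is hypothetical.

```python
def update(prior: float, sensitivity: float, false_alarm_rate: float) -> float:
    """Posterior P(cancer | positive test) via Bayes' theorem."""
    p_positive = sensitivity * prior + false_alarm_rate * (1 - prior)
    return sensitivity * prior / p_positive


first = update(0.01, 0.95, 0.05)    # after one positive test
second = update(first, 0.95, 0.05)  # after a second, independent positive test
print(f"{first:.1%} {second:.1%}")  # 16.1% 78.5%
```

The second figure rounds to the 78% quoted above. If the two tests tend to make the same mistakes on the same patient, the real gain from re-testing would be smaller.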
Confidence limits and statistical significance
Whenever those of us in the sciences hear the word significant, we go, “Oh yeah? Prove it.” When we say ‘significant’ we mean statistically significant at a chosen probability threshold. Even those outside the sciences hear of confidence limits and statements like “We know this with 95% confidence…” So what does it mean to have statistically significant information or to have confidence in it?
If we conclude something from a study with 95% confidence, we mean something specific. If nothing were really going on, results at least this striking would show up by sheer dumb luck no more than 5% of the time. In other words, even though scientific research follows an innocent-until-proven-guilty principle, if we wrongly convict 5 out of every 100 innocent suspects, we call it a good day.
To elucidate this, let’s say I gave you a coin and told you that it favors heads, i.e. if flipped enough times, it will give more heads than tails. It’s up to you, the skeptic, to test it instead of just believing me.
So you flip the coin once and get heads. Eureka! This coin favors heads! Not so fast…a fair coin would have come up heads 50% of the time anyway. At best, you can state with 50% confidence that this coin favors heads. So you ante up and flip the coin again. Another heads. Don’t call Stockholm just yet. A fair coin would still give two heads in a row 50% of 50%, i.e., 25%, of the time. But your confidence has increased: you can now state with 75% surety that there’s some funny business with the coin.
You flip it again. Another heads. Now your confidence has gone up to 88%.
Flip again. Another heads? You’re now 94% confident that the coin is biased. With the next flip, your confidence rises to 97%, which clears the 95% bar that many scientific fields use.
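Each confidence figure above is one minus the chance that a fair coin produces n heads in a row, i.e. 1 - (1/2)^n. A quick sketch reproduces the rounded percentages. The function name is hypothetical.

```python
def confidence_after_heads(n: int) -> float:
    """One minus the probability that a fair coin gives n heads in a row."""
    return 1 - 0.5 ** n


for n in range(1, 6):
    print(n, f"{confidence_after_heads(n):.0%}")
# 1 50%
# 2 75%
# 3 88%
# 4 94%
# 5 97%
```

The fifth flip is the first to clear 95%.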
Of course, I give this example to explain the intuition behind the % confidence concept. This experiment takes for granted a lot of things that change with every flip: how high you flip, air resistance, which side faces up when you flip, etc. In reality, you don’t accuse a coin of bias after five flips.
Expectation, the law of large numbers, and the gambler’s fallacy
Consider an unbiased six-sided die with the faces numbered 1 through 6. If you roll a 1, you get $1; if you roll a 2, you get $2…you get the idea. We all know that the probability of landing any particular number is 1/6. If you rolled it enough times, what’s the average amount of money you’d make per roll?
Expectation is the sum, over every possible outcome, of that outcome’s probability multiplied by the reward or punishment associated with it. Here there’s a one-in-six chance of rolling any given number.
The expected value per roll works out to $3.50. The law of large numbers says that if you roll this die enough times, your average winnings per roll will settle near that figure. Every number is equally likely, so the expected reward from any single roll is just the average of the rewards for the six faces:
(1/6 × 1) + (1/6 × 2) + (1/6 × 3) + (1/6 × 4) + (1/6 × 5) + (1/6 × 6)
= (1 + 2 + 3 + 4 + 5 + 6)/6
= 21/6, or $3.50
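A short Python sketch can check both halves of the argument. It computes the exact expectation with `fractions.Fraction` to avoid rounding, and it illustrates the law of large numbers by simulating rolls. The function names and the seed are arbitrary choices for illustration. The simulated averages will differ with other seeds, but they should drift toward 3.5 as the number of rolls grows.

```python
import random
from fractions import Fraction


def expected_payout(faces=range(1, 7)) -> Fraction:
    """Sum of probability * reward over every face of a fair die."""
    faces = list(faces)
    p = Fraction(1, len(faces))
    return sum(p * reward for reward in faces)


def average_payout(rolls: int, seed: int = 0) -> float:
    """Average dollars won per roll over a simulated run of rolls."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(rolls)) / rolls


print(expected_payout())  # 7/2
for n in (10, 1_000, 100_000):
    print(n, average_payout(n))
```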
We must remember that this averaging out happens only over many, many rolls, in the limit as the number of rolls approaches infinity. If we ignore this, we commit what’s known as the gambler’s fallacy. Every number on the die is equally likely, and each roll is independent of every other. If you rolled 1, 2, and 3 in succession, it doesn’t mean that 4, 5, and 6 are due. Every roll has a 1/6 chance of yielding any particular number. Yes, if you rolled the die 60000 times, you’d most likely end up with roughly 10000 rolls of each number. The counts wouldn’t be exactly equal, though. And they even out not because the die corrects past imbalances, but because early streaks get swamped by the sheer number of later rolls.
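Here is a rough simulation sketch of both points from the paragraph above. The helper names are hypothetical and the seeds are arbitrary. In 60000 rolls, each face should come up near 10000 times, but not exactly 10000. And a 4 should show up about 1/6 of the time right after a 1, 2, 3 run, no more often.

```python
import random
from collections import Counter


def roll_counts(rolls: int, seed: int = 1) -> Counter:
    """How many times each face comes up in a simulated run of rolls."""
    rng = random.Random(seed)
    return Counter(rng.randint(1, 6) for _ in range(rolls))


def chance_of_four_after_run(rolls: int, seed: int = 2) -> float:
    """Fraction of rolls showing 4 immediately after a 1, 2, 3 run."""
    rng = random.Random(seed)
    seq = [rng.randint(1, 6) for _ in range(rolls)]
    hits = followers = 0
    for i in range(3, len(seq)):
        if seq[i - 3:i] == [1, 2, 3]:
            followers += 1
            hits += seq[i] == 4  # True counts as 1
    return hits / followers


print(sorted(roll_counts(60_000).items()))  # each count near, not at, 10000
print(chance_of_four_after_run(600_000))    # close to 1/6, about 0.167
```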
People who buy lottery tickets based on numbers that are due are fooling themselves. Then again, people who expect to make a lot of money on lottery tickets wouldn’t be swayed by statistics and probability anyway.
So there it is. A small primer on statistics and probability with some real-world examples. Some of this is oversimplified here and more nuanced in actuality. Some of the intuitive explanations are based on how I understand them and subject to further exposition.
OMG! Never thought that He had helped all the players of Dice in the past till I read this.
Haha! The title was a play on Einstein’s comment on quantum mechanics: “God doesn’t play dice.” Thanks for reading!
Nice write-up…loved the breast cancer calculations!
I first toyed with the idea of giving some other example because it’s hard to get a guy to focus on probability when the subject of boobs comes up, but that’s life.
I love shooting craps, where the odds depend on a pair of dice. We know there are more ways to come up with a 7 than any other number, but that’s the thrill of the game. And, the way craps players make their money is by betting that other numbers will come up first. That’s why it’s called gambling.
Fair enough. When you’re gambling, you’re betting that something is likely whereas the person who takes your bet is betting that it isn’t as likely. Thanks for commenting.
So where were you when I was slogging through statistics in college? This would have been a big help!
Haha! For some reason, statistics is taught as this dry subject, when, given proper attention, it comes alive!
I remember some of this from psychology stats but I still got lost on a couple! I’m [almost] tempted to get those old textbooks out. Very interesting post, Bharat!
Thanks Meeka! I’ve been meaning to write a simple primer on the meanings of a few statistical concepts—in a humorous form. Finally I’ve gotten around to it. Maybe I’ll write another one some other time.
I hope you do. Stats are like light bulbs – we all need and use them without having any idea how they work!
Bharat,
I actually learned a lot from your post, including:
1. It’s time for my annual mammogram.
2. I’m super-lucky to have gotten college credit for my pre-calc class in high school, thereby exempting me from having to take calculus in college. While all of my freshman friends were crying over their failing calc grades, I was testing my super-sweet fake ID at the bars.
3. Because I’m super-lucky, it’s time for another trip to Vegas, where, like Kitchen Slattern, I’ll park my hide at the craps table for the entire trip. But first I’ve gotta get that mammogram.
Thanks for enlightening me on a cold winter’s night,
Stacie
Well, cancer awareness wasn’t a primary goal of this post, but it’s always a worthwhile goal.
The 21-year drinking age in the USA just begs for fake IDs.
Enjoy Vegas. Remember what they say about life in Deconstructing Harry,
True. Especially when you’re playing roulette. =p
okay, i had an inkling you were brilliant. thanks for proving it so thoroughly. lool. and i think if you were my math teacher way back, maybe i would’ve hated it less. in other words, why can’t math teachers use more real world examples. it makes it all the more interesting. lastly, have you read freakonomics? i sorta loved that book/ movie and your post reminded me of it. oxoxo, sm
Thanks! You’re too kind, SM. And I agree: the world is full of mediocre math teachers who overcomplicate the subject and turn more and more students away from it. While that’s okay for abstract math, stuff like statistics has to be accessible to all. We all need it.
Sure, I’ve read Freakonomics and Superfreakonomics. I even listen to their podcast every now and then. It’s awesome. And I can’t describe how honored I feel that my post reminded you of those great works.