Back to school I go. It's been a while since I took notes, studied for exams, and was made to take attendance (Really? Are we still forcing adults who willingly pay boatloads of money for an education, only to infantilize them by ensuring they are physically present at the start of class? I digress...).
Turns out there are some interesting things happening in modern Statistics. I'd like to pull back the curtain and share some lessons with you all. Today, I'm going to (re)introduce the concept of probability in a more modern way than you probably remember learning in school, beginning with the canonical example: the coin flip.
Say we flip a coin 3 times and get 3 heads. If I asked you whether this coin is weighted or fair, what would you say? More specifically, what's the most likely estimate for the probability of getting heads on a given toss? Most people would still say 50%. Sure, I got 3 heads in a row, but that's perfectly reasonable (a 1/8 chance with a fair coin, in fact). Big deal, dataman.
And you'd be exactly right.
Classical statistics (maximum likelihood estimation, to be specific), however, would say the most likely value for the probability of heads is not 50%, but rather 100%. This method naively says, "Since all I've seen are heads, I'm putting all my money on the next toss to be heads." This method would clearly lose all its money after a few rounds at the casino. Here is what's happening graphically:
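For anyone who wants to poke at the numbers, here is a minimal sketch of that classical estimate. It assumes the usual binomial model for coin tosses; the function name `likelihood` and the grid of candidate probabilities are just illustrative choices of mine.

```python
from math import comb

def likelihood(p, heads, tosses):
    """Binomial probability of seeing `heads` heads in `tosses` tosses if P(heads) = p."""
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

# Candidate values for P(heads): 0.00, 0.01, ..., 1.00 (an illustrative grid)
grid = [i / 100 for i in range(101)]

# The maximum likelihood estimate is the candidate that makes the observed data most probable.
mle = max(grid, key=lambda p: likelihood(p, heads=3, tosses=3))

print(likelihood(0.5, 3, 3))  # 0.125, the 1/8 chance of three heads from a fair coin
print(mle)                    # 1.0, i.e. "the coin always lands heads"
```

With 3 heads in 3 tosses the likelihood is simply p cubed, which keeps climbing all the way to p = 1. That is why this method lands on 100%.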
This is about the point where rational, non-math nerds get off the stats train and we lose you guys forever. Because that doesn't make sense.
Even if the coin were biased, I'd need more than 3 measly tosses to convince me that it's 100% in favor of heads. Enter modern data analysis methods.
Let's pretend I was a reasonable person who saw a coin on the table and figured it was - like every other coin I've ever seen - not weighted. That is, I thought that the probability of landing heads was 50%. Then I flipped the coin 3 times and got 3 heads. This probably doesn't do much, if anything, to change my views. The coin is probably still fair. Maybe if I saw 6 heads in a row, or 10 heads in a row, I'd be more likely to believe something is fishy. So in light of my 3 tosses landing heads, my belief remains that the coin is most likely fair. In other words, I would NOT put all my money on heads.
If you understand the logic I applied above, then you basically understand Bayesian inference. It all boils down to*:
1) having a belief about something before it happens (in this case, believing strongly in a 50/50 fair coin)
2) seeing something happen (3 heads in a row)
3) updating your belief about that thing in light of the evidence
Check it out:
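The three steps can also be sketched in a few lines of code. This is only one way to do it, under assumptions of my own: the prior belief is modeled as a Beta(50, 50) distribution (a fairly strong belief in a fair coin), and the helper names `update` and `mode` are made up for this example. A Beta prior is convenient because updating it on coin tosses just means adding the observed heads and tails to its two parameters.

```python
def update(prior_a, prior_b, heads, tails):
    """Step 3: a Beta(a, b) prior plus coin-toss data gives a Beta(a + heads, b + tails) posterior."""
    return prior_a + heads, prior_b + tails

def mode(a, b):
    """Most likely value of P(heads) under Beta(a, b); this formula assumes a + b > 2."""
    return (a - 1) / (a + b - 2)

# Step 1: a strong prior belief in a fair coin (Beta(50, 50) is an assumed choice).
prior = (50, 50)
print(mode(*prior))  # 0.5

# Step 2: see 3 heads in a row, then update the belief.
posterior = update(*prior, heads=3, tails=0)
print(mode(*posterior))  # about 0.515, still very close to fair

# Ten heads in a row nudges the belief further, but not all the way to 100%.
print(mode(*update(*prior, heads=10, tails=0)))  # about 0.546

# With a flat Beta(1, 1) prior (no opinion at all), we recover the classical answer.
print(mode(*update(1, 1, heads=3, tails=0)))  # 1.0
```

How far the estimate moves depends on how strong the prior is. A weaker prior, say Beta(5, 5), would be swayed more by the same 3 tosses.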
It's pretty intuitive, isn't it? It's also the hottest new thing in statistics, if such a thing exists. This type of statistical reasoning, which has been around for centuries but, for technical reasons, did not become mainstream until the last 20-30 years, underlies innovations such as image processing, recommendation systems, signal processing and spam filtering, to name but a few.
What a great time to be in data!
* The other key ingredient is defining a loss function, but that's not necessary to drive the point home. Technically speaking, your belief (or action taken) should be the one that minimizes the posterior expected loss.
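For the curious, here is a rough sketch of what "minimizing the posterior expected loss" can look like. Everything specific in it is an assumption for illustration: the updated belief is taken to be a Beta(53, 50) distribution (what a Beta(50, 50) prior becomes after 3 heads), the loss is squared error, and the integral is approximated on a grid.

```python
# Assumed posterior: Beta(53, 50), i.e. a Beta(50, 50) prior updated with 3 heads.
a, b = 53, 50

# Approximate the posterior on a grid of candidate values for P(heads).
grid = [i / 1000 for i in range(1001)]
density = [p**(a - 1) * (1 - p)**(b - 1) for p in grid]  # unnormalized Beta density
total = sum(density)
weights = [d / total for d in density]  # normalized so the weights sum to 1

def expected_loss(estimate):
    """Posterior expected squared-error loss of reporting `estimate` as P(heads)."""
    return sum(w * (estimate - p) ** 2 for p, w in zip(grid, weights))

# The best estimate is the one with the smallest expected loss.
best = min(grid, key=expected_loss)
print(best)
```

Under squared-error loss the minimizer is the posterior mean, which is 53/103 (roughly 0.515) for this assumed posterior. A different loss function would generally pick a different summary of the same belief.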