Snakes and Ladders: The nerdification of a children's favorite
Welcome to the second part of my multi-part series on statistics with a modern twist. Today I'm going to introduce a very powerful and widely used method to model complex, random phenomena, known as Stochastic Processes. Naturally we begin with Snakes and Ladders (S&L) (or Chutes and Ladders for my American friends).
In S&L, the object of the game is to roll a die and move that many squares up the board until you reach the 100th square, and the game is won. If you land on the foot of a ladder, you go straight up to the top. If you land on the head of a snake, you presumably get swallowed and excreted at the end of the snake (perhaps this is why they changed it to Chutes and Ladders?).
Simple enough. So, assuming you're playing alone, how many rolls do you think, on average, a typical one player game takes? 20 rolls? 50 rolls? 100 rolls? Is it even possible to get an average?
Before rushing to answer, let's take a minute to think about what kind of model we can build to describe this game.
The fate of your next turn is only dependent on your current roll and where you are rolling from. In other words, your next "state" (i.e. your square) in the game is only related to you current "state". Over here in the stats community, we say S&L is memory-less. You know what else is memory-less?
A Markov Chain.
Markov Chains basically say that the next state (your square) of a random process is only dependent (via rolling a dice) on its current state. Pretty simple stuff. But powerful.
A key component in any Markov Chain is whats called the transition matrix, which describes with what probability you transition from ones state to another.
Read left to right, this says that the probability of rain tomorrow, given that it rained today is 1/2. Given that it was sunny today, it will be cloudy tomorrow with probability 3/8. Ok, but reading off numbers in an admittedly fabricated table is not very interesting.
What's more interesting is that we can figure out the probability of rain in 10 days, given that it rained today. We just multiply this matrix by itself 9 times. So, given that it rained today, it will rain in 10 days with probability 42%. You can actually do a lot more than that, but let's keep this light.
In the case of S&L, we can set up a 100 x 100 transition matrix with probabilities of going from any given square to any other given square in one roll. For example, if you're on square 20, (ignoring snakes or ladders) you can land on squares 21, 22, ..., 26 with probability 1/6 each. Guess what your probability of going from 20 to 80 is in one roll? It's 0. And from 20 to 10, is also 0. So you get a big matrix with a lot of 0's (we call this sparse), but nonetheless it completely describes the game for a single roll.
This heatmap indicates where you are most likely to be after a certain number of rolls. After 1 roll, you're equally likely to be on squares 1 through 6 - hence the same color hue at the beginning. The more you roll, the more you progress up the board until you reach the end and stay there.
Turns out the expected number of rolls it takes to finish this (boring) game is 100/3.5 = 29 because there are 100 squares and each roll takes us an average of 3.5 squares forward.
Figuring out the averages for the true snakes and ladders scenario is a bit trickier, but not much. All we have to do is figure out how many times we hit square 100 after rolling once, twice, 3 times, etc...
Eventually, we'll find that 50% of the time, we hit 100 after a certain number of rolls. In the case above, it's just 29. Here's a more technical way to see it:
Keen among you will probably have realized that there are no snakes or ladders on this board. Let's see what happens when we change that. Here's what happens when we only have ladders, only snakes, and finally, both snakes and ladders.
- On the original snakes and ladders game, we hit 50% around 30 rolls. To be exact, it takes 32 rolls on average
- The snakes only version ends around 82 rolls, on average, but it has what we call a "long tail" - it'll take much longer to guarantee that you finish the game. You can see the impact of the snakes but the way the heatmap stalls mid way through.
So what did we learn today?
- Markov Chains are used to model phenomena that is memory-less
- You can multiply matrices a bunch of times to figure out probabilities far into the future
- Animated heatmaps make cool visuals
- Snakes and Ladders can be made more nerdy than you ever imagined
So other than simple children's games, where are Markov Chains used in the real world? The most well known application of Markov Chains is in something called Monte Carlo Markov Chain (MCMC) Simulation. This is a very powerful type of analysis method in which we can figure out the distribution of many kinds of random phenomena in a way that wasn't possible just 30-40 years ago (thanks computers!). This allows us to do text analysis, signal processing, prediction, A/B testing, etc, ... So know you know one part of the machinery that powers many kinds of large scale, computer based algorithms.
Until next time!