The GIF below is made up of 72 still images taken from different angles.
What would you say is the biggest difference between all of the images? Maybe the location of the hearts? Or the horizontal length of the pig in each image? Nah, it's the rotation: each image is taken with the pig at a different angle.
Note that each image is made up of a grid of 400 x 400 pixels. So, in principle, it takes 160,000 different numbers (aka dimensions) to describe each image. Now clearly we can save dimensions by removing all the black pixels - they don't add much value in distinguishing between pigs. Even so, there are thousands of meaningful pixels describing each image. And yet we humans can identify the difference between the images with a simple observation: the rotation. Training a computer to make that same observation, however, is not trivial.
Being able to reduce the complexity of an image from 160,000 dimensions to just a few is a powerful tool in any data scientist's toolbox, and today we'll briefly describe a technique called Isomap - a modern, non-linear dimension reduction algorithm.
In general, we can reduce dimensionality in a dataset by removing dimensions that don't add much value. In the images above, we can easily remove almost all of the surrounding black pixels and not lose any meaningful information about the image. That's one approach. Another is to use SVD to compress the image. However, that doesn't help us in differentiating between the images.
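To make the SVD-compression idea concrete, here is a minimal sketch. The "image" is just a random 400 x 400 array standing in for one of the pig photos; the point is the rank-k truncation, not the data:

```python
import numpy as np

# A stand-in for one 400 x 400 grayscale image (random pixels here).
rng = np.random.default_rng(0)
image = rng.random((400, 400))

# Full SVD of the image matrix.
U, s, Vt = np.linalg.svd(image, full_matrices=False)

# Keep only the top k singular values: a rank-k approximation of the image.
k = 20
compressed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Storage drops from 400*400 values to k*(400 + 400 + 1),
# at the cost of some reconstruction error.
print(compressed.shape)   # (400, 400)
```

Keeping more singular values lowers the reconstruction error, but as noted above, compressing each image this way says nothing about how the images differ from one another.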
In general, one way to think about differentiating is to identify data points that are similar (or dissimilar) to each other.
Usually we measure similarity by the straight-line distance between two points: the shorter the line, the closer (and more similar) the points. Methods that rely on this notion of distance are known as linear dimension reduction methods.
Points A and B are more similar than points A and C because they are closer (in Euclidean distance).
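In code, that straight-line comparison is just the Euclidean norm of the difference between points. The coordinates for A, B, and C below are made up for illustration:

```python
import numpy as np

A = np.array([0.0, 0.0])
B = np.array([1.0, 1.0])
C = np.array([4.0, 3.0])

dist_ab = np.linalg.norm(A - B)   # sqrt(2), about 1.41
dist_ac = np.linalg.norm(A - C)   # 5.0

# A and B are more similar than A and C under this metric.
print(dist_ab < dist_ac)          # True
```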
But what happens when data is clustered in such a way that this type of distance metric fails?
Clearly these two points are less similar than their straight-line distance would suggest. So linear dimension reduction methods are out.
Instead we need non-linear methods. One example is Isomap. This technique allows us to learn shapes (aka manifolds) like the Swiss roll above. Without getting into the details, Isomap looks at the Swiss roll and says:
- This object is plotted in three dimensions, but intrinsically, it's really just 1 dimensional.
- Instead of connecting any two points with a straight line, we only look at the neighborhood of each point to determine which points are close and which are far.
- A point's neighborhood is the set of points that lie really close to it (for example, its k nearest neighbors).
- In this manner, we can connect points that are close to each other without "jumping" across. In a sense, we can unroll it.
After unrolling it, you can use linear dimension reduction techniques (which are in general less computationally intensive) to find the lower dimensions. These lower dimensions are the intrinsic - most meaningful - dimensions of the shape.
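The steps above can be sketched end-to-end in a few lines. This is a toy re-implementation for illustration only: a 1-D arc stands in for the Swiss roll, Floyd-Warshall computes the shortest paths (fine for a small point set), and classical MDS plays the role of the final linear step:

```python
import numpy as np

# 30 points along a half-circle: a 1-D curve sitting in 2-D space.
t = np.linspace(0, np.pi, 30)
X = np.column_stack([np.cos(t), np.sin(t)])
n = len(X)

# Step 1: k-nearest-neighbor graph (k = 2 here; k is a tuning knob).
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
k = 2
graph = np.full((n, n), np.inf)
np.fill_diagonal(graph, 0.0)
for i in range(n):
    nn = np.argsort(D[i])[1:k + 1]    # skip the point itself
    graph[i, nn] = D[i, nn]
    graph[nn, i] = D[i, nn]

# Step 2: geodesic distances = shortest paths through the graph,
# so we never "jump" across the curve (Floyd-Warshall).
G = graph.copy()
for m in range(n):
    G = np.minimum(G, G[:, [m]] + G[[m], :])

# Step 3: classical MDS on the geodesic distances -> 1-D coordinates.
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
B = -0.5 * J @ (G ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)  # eigenvalues in ascending order
coords = eigvecs[:, -1] * np.sqrt(eigvals[-1])

# coords now orders the points along the unrolled curve.
```

The recovered 1-D coordinate increases (or decreases) monotonically along the arc - the curve has been unrolled onto a line.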
But the funny thing is, we don't know the intrinsic dimension in advance. For the Swiss roll, it's 1. But when we have thousands of dimensions, we can't easily visualize the data. We just assume that there is a lower dimension that better explains it. When looking at images (which have thousands of pixels/dimensions), sometimes after we reduce the dimensionality, we can uncover that intrinsic dimension.
Let's apply this concept to the piggy bank data set (72 images total):
Above we plotted each image onto just 2 dimensions. Each dimension "explains" something about the images. Can you guess what?
- The x-axis differentiates between the pig facing towards or away from us.
- The y-axis differentiates between left and right facing.
In fact, I really only need 1 dimension instead of 2 to capture this difference. Down from over 160K!
Since the major difference between the images is a rotation, it only takes 1 dimension to describe this difference. But the crux is that we never told Isomap that the key difference between these images is a rotation.
We just fed it a list of pixels, and it figured out that the single biggest difference among these groups of pixels is somehow a rotation of the image.
That's pretty cool!
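If you want to try this yourself, scikit-learn ships an Isomap implementation. A minimal sketch - the random pixels below are a stand-in for the 72 flattened pig photos, which aren't reproduced here, and `n_neighbors=5` is just a starting guess:

```python
import numpy as np
from sklearn.manifold import Isomap

# Stand-in data: 72 fake "images" of 400 x 400 pixels, one per row.
# In the real experiment each row would be one flattened pig photo.
rng = np.random.default_rng(0)
images = rng.random((72, 400 * 400))

# Reduce 160,000 pixel dimensions down to 2.
embedding = Isomap(n_neighbors=5, n_components=2).fit_transform(images)
print(embedding.shape)   # (72, 2)
```

With the real images, plotting the two embedding coordinates produces the scatter described above, where position encodes the pig's rotation.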