Messing with d3
So naturally, I figured it would make sense to include some more interesting charts, maps, and force-directed graphs (huh?) in my blog.
As a warm-up, here's a basic force-directed graph based on the classic student dataset, which includes information such as whether a student smokes, skips class, cheats on their partner, etc. I chose to start with a force-directed graph because:
- It's a neat way to visualize groups of interacting data, and
- I think it's badass
Note that if you're using Firefox the graphic does not work perfectly. Get Safari or Chrome.
Let's briefly chat about what's going on above.
Without getting into the details of force-directed graphs, you should note that each circle is called a node. Each line that connects two nodes is a link. The size of a ball represents the number of students who have partaken in that activity. So of the 900 students in the dataset, most have a medium GPA, but only a few have cheated on an exam.
Since this blog is about data analysis, let's nerd it up a notch. By default, the links that connect each node are set to some thickness, tension and length. In order to better visually identify relationships between nodes, let's imbue the links with some data-attributes.
The thickness of each link represents the correlation between students belonging to different nodes. The thicker the link, the stronger the correlation. So people who cheated on their partners are also likely to have fake IDs. The reverse is true of cigarette smokers and exam cheaters. Note that this is just the pairwise correlation between two variables, with each pair viewed on its own (I'm not addressing multicollinearity, which you would correctly have expected me to do). Let's give our brains a break. This guy did.
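For the curious, here's a rough sketch of how a pairwise correlation like that could be computed. It assumes each activity is stored as a 0/1 column with one entry per student; the function name `pearson` and the two tiny arrays are made up for illustration and are not the real dataset or the code behind the graphic.

```javascript
// Pearson correlation between two equal-length numeric arrays.
// For 0/1 indicator columns this is the same thing as the phi coefficient.
function pearson(x, y) {
  if (x.length !== y.length || x.length === 0) {
    throw new Error("inputs must be non-empty and the same length");
  }
  const n = x.length;
  const meanX = x.reduce((sum, v) => sum + v, 0) / n;
  const meanY = y.reduce((sum, v) => sum + v, 0) / n;

  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }

  // The correlation is undefined when either column never varies.
  if (varX === 0 || varY === 0) return NaN;
  return cov / Math.sqrt(varX * varY);
}

// Made-up indicator columns (1 = yes, 0 = no), one entry per student.
const cheatedOnPartner = [1, 1, 0, 0, 1, 0, 0, 0];
const hasFakeId = [1, 1, 0, 0, 0, 0, 1, 0];

console.log(pearson(cheatedOnPartner, hasFakeId).toFixed(2));
```

Run that over every pair of activities and you get one number per link, which is what the thickness is meant to show.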
Another interesting feature of force-directed graphs is the strength of the link. The tighter a link is, the more related the nodes are. The looser a link, the less related the nodes. So you can pull apart (and drag together) fake IDs and smoking a cigarette, but it's a bit harder to bring together smoking MJ and smoking cigarettes. Go ahead and try - I've spent many hours playing with these balls of fun. If you recall my conclusion about the relationship between smoking cigarettes and smoking MJ, this might not shock you.
As for the length of a link, this is also based on the correlation between two variables. You might wonder why lowGPA, medGPA and highGPA are so close together. Well, no student can simultaneously have a highGPA and a lowGPA (or medGPA), so theoretically their correlation should be negative. That is, the more highGPA students there are, the fewer medGPA students (it's a zero-sum thing). So what happens to the length of a link? If you expected the length to be 0 or negative (whatever that means), then you'd be 100% correct! However, for the purposes of this example, I forced the correlations to be between 0 and 1 by normalizing such that the minimum and maximum correlations are 0 and 1 respectively. This is why the GPA nodes are bunched close together in the middle - they were the most negative and therefore are closest to 0.
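Here's a minimal sketch of that normalization, in case it helps to see it spelled out. The correlations below are invented, and `minMaxNormalize`, `MAX_LENGTH` and the `links` array are hypothetical names of mine; the real graphic may well scale things differently.

```javascript
// Rescale values so the smallest becomes 0 and the largest becomes 1.
function minMaxNormalize(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  // If every value is the same there is nothing to spread out.
  if (max === min) return values.map(() => 0);
  return values.map(v => (v - min) / (max - min));
}

// Hypothetical links with made-up correlations (not the real numbers).
const links = [
  { source: "highGPA", target: "lowGPA", corr: -0.4 },
  { source: "highGPA", target: "medGPA", corr: -0.6 },
  { source: "fakeID", target: "cheatedOnPartner", corr: 0.4 },
  { source: "smokesCigarettes", target: "cheatedOnExam", corr: 0.1 },
];

const scaled = minMaxNormalize(links.map(l => l.corr));

// Assumption: link length grows linearly with the normalized correlation,
// so the most negative pair ends up with the shortest link.
const MAX_LENGTH = 200; // hypothetical length, in pixels, of the longest link
const withLength = links.map((l, i) => ({
  ...l,
  norm: scaled[i],
  length: scaled[i] * MAX_LENGTH,
}));

console.log(withLength);
```

With these invented numbers the highGPA/medGPA pair has the most negative correlation, so it lands at 0 after rescaling and gets the shortest link, which is the bunching effect described above.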
Hope you enjoyed!