Messing with d3

I recently began messing around with d3.js, a JavaScript library for creating lovely visualizations from data. In fact, D3 stands for Data-Driven Documents. Chances are that many of the cool graphics and interactive visualizations you've encountered online, especially those driven by complex data, are powered at least in part by d3.

So naturally, I figured it would make sense to work some more interesting charts, maps, and force-directed graphs (huh?) into my blog.

Knowing HTML, CSS and, most importantly, JavaScript makes implementing d3 graphics easier, although it isn't strictly required. If none of that means anything to you, that's okay. Just know that it's part of the d3 experience. For me this is tough since I don't know any of these languages. I'm much more comfortable performing a singular value decomposition than making a fancy design. Given that, most of my visualizations might look pedestrian next to the pros', but hopefully they'll evolve over time.

As a warm-up, here's a basic force-directed graph based on the classic student dataset, which records things like whether a student smokes, skips class, cheats on their partner, and so on. I chose to start with a force-directed graph because:

  1. It's a neat way to visualize groups of interacting data, and
  2. I think it's badass.

Note that the graphic does not work perfectly in Firefox. Use Safari or Chrome instead.

Let's briefly chat about what's going on above.

Without getting into the details of force-directed graphs, you should note that each circle is called a node, and each line connecting two nodes is a link. The size of a node represents the number of students who have partaken in that activity. So of the 900 students in the data set, most have a medium GPA, but only a few have cheated on an exam.
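For the curious, here's a rough sketch of how a node's size could be tied to a head count. This is not the code behind the graphic: the counts are made up, `nodeRadius` is a hypothetical helper, and the square-root scaling is my assumption (it keeps the circle's area, rather than its radius, proportional to the count). d3 ships a ready-made version of this idea as `d3.scaleSqrt()` (`d3.scale.sqrt()` in v3).

```javascript
// Made-up head counts for illustration; the real numbers live in the student dataset.
const activityCounts = [
  { name: "medGPA", students: 500 },
  { name: "cheatedOnExam", students: 45 },
];

// Hypothetical helper: scale the radius with the square root of the count,
// so the circle's AREA is proportional to the number of students.
function nodeRadius(count, maxCount, maxRadius = 40) {
  if (maxCount <= 0) return 0;
  return maxRadius * Math.sqrt(count / maxCount);
}

// Attach a radius `r` to every node, relative to the largest group.
const largestGroup = Math.max(...activityCounts.map((d) => d.students));
const sizedNodes = activityCounts.map((d) => ({
  ...d,
  r: nodeRadius(d.students, largestGroup),
}));
```

Each entry in `sizedNodes` carries an `r` field that could then be fed to the circle's `r` attribute when the nodes are drawn.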

Since this blog is about data analysis, let's nerd it up a notch. By default, the links that connect each node are set to some thickness, tension and length. To make relationships between nodes easier to spot, let's imbue the links with some data attributes.

The thickness of each link represents the correlation between students belonging to the two nodes it connects. The thicker the link, the stronger the correlation. So people who cheated on their partners are likely to also have fake IDs. The reverse is true of cigarette smokers and exam cheaters. Note that this is just the correlation between two variables viewed in isolation (it does not address multicollinearity, which you would rightly have expected me to do). Let's give our brains a break. This guy did.
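To make the "correlation between two nodes" idea concrete, here's a minimal sketch that assumes each activity is stored as a 0/1 column with one entry per student. The six answers below are invented, and `pearson` and `linkWidth` are hypothetical helpers rather than anything from d3 or from the code behind the graphic.

```javascript
// Pearson correlation between two equal-length numeric arrays.
function pearson(x, y) {
  if (x.length !== y.length || x.length === 0) {
    throw new Error("inputs must be non-empty and the same length");
  }
  const n = x.length;
  const meanX = x.reduce((sum, v) => sum + v, 0) / n;
  const meanY = y.reduce((sum, v) => sum + v, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  const denom = Math.sqrt(varX * varY);
  // A constant column has no variance; treat it as uncorrelated.
  return denom === 0 ? 0 : cov / denom;
}

// Hypothetical mapping from a correlation already scaled to [0, 1]
// to a stroke width in pixels (thicker means more correlated).
function linkWidth(scaledCorr, minWidth = 1, maxWidth = 8) {
  const clamped = Math.min(1, Math.max(0, scaledCorr));
  return minWidth + clamped * (maxWidth - minWidth);
}

// Invented 0/1 answers for six students (1 = yes, 0 = no).
const cheatedOnPartner = [1, 1, 0, 0, 1, 0];
const hasFakeId = [1, 1, 0, 0, 0, 0];
const sampleCorr = pearson(cheatedOnPartner, hasFakeId);
```

The width would typically end up in the SVG line's `stroke-width`. As noted above, this only looks at two columns at a time.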

Another interesting feature of force-directed graphs is the strength of the link. The tighter a link is, the more related the nodes are. The looser a link, the less related the nodes. So you can pull apart (and drag together) fake IDs and smoking cigarettes, but it's a bit harder to bring together smoking MJ and smoking cigarettes. Go ahead and try - I've spent many hours playing with these balls of fun. If you recall my conclusion about the relationship between smoking cigarettes and smoking MJ, this might not shock you.
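If you're wondering how per-link data like this gets into the graph, d3's force layout conventionally takes an array of link objects with `source` and `target` fields, and any extra fields ride along for you to use. Here's a sketch under some loud assumptions: the matrix values are invented, they're assumed to be already scaled to between 0 and 1, and `buildLinks` and `linkStrength` are hypothetical helpers of mine.

```javascript
// Node names, in the same order as the rows/columns of the matrix below.
const nodeNames = ["fakeId", "smokesCigarettes", "smokesMJ"];

// Invented, symmetric matrix of correlations already scaled to [0, 1].
// These numbers are NOT from the student dataset.
const corrMatrix = [
  [1.0, 0.2, 0.4],
  [0.2, 1.0, 0.9],
  [0.4, 0.9, 1.0],
];

// Hypothetical helper: one link per unordered pair of nodes (upper triangle),
// carrying the pair's correlation in a `value` field.
function buildLinks(matrix) {
  const links = [];
  for (let i = 0; i < matrix.length; i++) {
    for (let j = i + 1; j < matrix.length; j++) {
      links.push({ source: i, target: j, value: matrix[i][j] });
    }
  }
  return links;
}

// Hypothetical accessor: use the correlation directly as the link's
// strength, clamped to the [0, 1] range.
function linkStrength(link) {
  return Math.min(1, Math.max(0, link.value));
}

const graphLinks = buildLinks(corrMatrix);
```

An accessor like `linkStrength` is the kind of function you'd hand to `force.linkStrength(...)` in d3 v3 or to `d3.forceLink(...).strength(...)` in v4 and later.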

As for the length of a link, it is also based on the correlation between two variables. You might wonder why lowGPA, medGPA and highGPA are so close together. Well, no student can simultaneously have a highGPA and a lowGPA (or medGPA), so theoretically their correlation should be negative. That is, the more highGPA students there are, the fewer medGPA students (it's a zero-sum thing). So what happens to the length of a link? If you expected the length to be 0 or negative (whatever that means), then you'd be 100% correct! However, for the purposes of this example, I forced the correlations to be between 0 and 1 by normalizing them so that the minimum and maximum correlations are 0 and 1 respectively. This is why the GPA nodes are bunched close together in the middle - their correlations were the most negative and therefore end up closest to 0.
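The normalization is plain min-max rescaling. Here's a sketch with invented correlations. `minMaxNormalize` and `linkDistance` are hypothetical helpers, and having the length grow directly with the rescaled value is my reading of why the GPA nodes bunch together, not a quote from the actual code.

```javascript
// Rescale values so the smallest becomes 0 and the largest becomes 1.
function minMaxNormalize(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  if (max === min) return values.map(() => 0); // avoid dividing by zero
  return values.map((v) => (v - min) / (max - min));
}

// Hypothetical mapping: link length grows with the rescaled correlation,
// so the most negative raw correlations get the shortest links.
function linkDistance(scaledCorr, maxLength = 200) {
  return scaledCorr * maxLength;
}

// Invented raw correlations, including a negative one like the GPA pairs.
const rawCorrelations = [-0.5, 0.0, 0.25, 0.5];
const scaledCorrelations = minMaxNormalize(rawCorrelations); // [0, 0.5, 0.75, 1]
const lengths = scaledCorrelations.map((c) => linkDistance(c)); // [0, 100, 150, 200]
```

In d3 a function like `linkDistance` would be passed to `force.linkDistance(...)` in v3 or `d3.forceLink(...).distance(...)` in v4 and later.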

There's more that can be done with this, but this is a first stab. I'll be back with more magic in the coming posts! As usual, here's a link to the code (which you could also get by opening the JavaScript console).

Hope you enjoyed!
