Dice Data: How Tech Skills Connect
I wanted to follow up on this post by Simon Hughes, our chief data scientist, and share an experimental visualization we created from Simon’s work. It offers a graphical way to explore the relationships among skills. Let me first describe how to use it, and then what went into building it.

Every circle (or node) in the visualization represents a skill. Hovering over a node reveals the skill and its associations. Colors designate the different “communities” that coalesce around skills; for example, the sky-blue cluster (bottom left) is mostly composed of skills related to customer/tech support, whereas the light green group (top right) includes “Big Data” skills.

Try clicking “Java”, for example, and notice how many other skills accompany it (a high-degree node, as graph theory would call it). As a popular skill, it appears in many communities: Big Data, Oracle Database, System Administration, Automation/Testing, and (of course) Web and Software Development. You may or may not agree with some of the relationships, but keep in mind that they were all generated automatically by code, untouched by a human.

For those interested in how we built this visualization, it involved multiple steps. We started with Gephi, an open-source network analysis and visualization package, and imported a pair-wise, comma-separated list of skills and their similarity scores (as Simon described in his article). We then ran a number of analyses: the Force Atlas layout to draw a force-directed graph; Avg. Path Length, which also calculates the Betweenness Centrality that determines the size of each node; and finally Modularity to detect communities of skills (again, color-coded in the visualization).

Once the graph looked the way we wanted, we exported it as an XML graph file (GEXF) and converted it to JSON with two sets of elements: nodes and links. It was then a matter of using D3 to visualize the data as a network.

We would love to hear your feedback and questions.
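
For readers who want to experiment with the GEXF-to-JSON step, here is a minimal sketch in Python using only the standard library. It is not the code we used: the function names (`gexf_to_node_link`, `degree_by_id`) and the tiny sample graph are made up for illustration, and it only reads node ids, labels, and edge weights. A real Gephi export may also carry size, color, position, and community attributes, which this sketch ignores.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical four-skill graph, standing in for a real Gephi export.
SAMPLE_GEXF = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="undirected">
    <nodes>
      <node id="0" label="Java"/>
      <node id="1" label="Hadoop"/>
      <node id="2" label="Oracle"/>
      <node id="3" label="Selenium"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1" weight="0.8"/>
      <edge id="1" source="0" target="2" weight="0.6"/>
      <edge id="2" source="0" target="3"/>
    </edges>
  </graph>
</gexf>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def gexf_to_node_link(gexf_text):
    """Convert GEXF XML text into a dict with 'nodes' and 'links' lists."""
    root = ET.fromstring(gexf_text)
    nodes, links = [], []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "node":
            nodes.append({"id": elem.get("id"), "label": elem.get("label")})
        elif name == "edge":
            links.append({
                "source": elem.get("source"),
                "target": elem.get("target"),
                # Edges without an explicit weight are treated as 1.0 here.
                "weight": float(elem.get("weight", 1.0)),
            })
    return {"nodes": nodes, "links": links}


def degree_by_id(graph):
    """Count how many links touch each node (its degree)."""
    degrees = {node["id"]: 0 for node in graph["nodes"]}
    for link in graph["links"]:
        for end in (link["source"], link["target"]):
            degrees[end] = degrees.get(end, 0) + 1
    return degrees


if __name__ == "__main__":
    graph = gexf_to_node_link(SAMPLE_GEXF)
    print(json.dumps(graph, indent=2))
    print(degree_by_id(graph))
```

The resulting dictionary has the same nodes-and-links shape described above, which is the layout D3's force-directed examples commonly consume. In the sample, “Java” touches every edge, so it comes out as the highest-degree node.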