We are eager to share some pieces that are directly relevant to today’s NBA players and teams. In this post, we explain some of our methods so that you can better follow along.
At the root of any compelling analytics story is an interesting, and often large, data set. So before we started mapping stories or counting stats, we first had to gather data on NBA games. As much data as we could electronically grab.
We started in a naive way: scraping the HTML spit out by various sports websites (e.g. NBA.com, Sports Illustrated) and breaking it apart to find all the meaningful numbers and text. The Python modules re, BeautifulSoup, and urllib2 were adequate for the job. But in the end the approach was time-consuming and easily broken by missing data and other inconsistencies.
A little bit of Internet sleuthing brought some exciting news when it showed just how deeply the NBA has embraced the era of big data: not only does the Association publish loads of data online, they also use the modern and elegant JSON format. And to top it off, they’ve developed an API for easy querying!
So we built a lightweight script, savejson.py, which collects the following types of data for each NBA game:
- Box Scores
- Shot Charts
- SportVu motion-capture frames (optional)
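To give a feel for why JSON is so much easier to work with than scraped HTML, here is a minimal sketch of storing one game's payload on disk and reading it back as plain records. It is not the code in savejson.py. The function names, the file-naming scheme, the game ID, and the sample data are all made up for illustration, and the "resultSets" / "headers" / "rowSet" layout is an assumption about how the payload is shaped. It is written for Python 3 and touches no network.

```python
import json
import tempfile
from pathlib import Path


def save_game_json(payload, out_dir, game_id, kind):
    """Write one payload to <out_dir>/<game_id>_<kind>.json and return the path.

    The naming scheme is hypothetical, chosen only for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{game_id}_{kind}.json"
    with path.open("w", encoding="utf-8") as fh:
        json.dump(payload, fh)
    return path


def rows_as_dicts(result_set):
    """Pair every row with the column headers so fields can be read by name."""
    headers = result_set["headers"]
    return [dict(zip(headers, row)) for row in result_set["rowSet"]]


# Made-up sample in the assumed layout: a list of named tables, each with
# column headers and a list of rows. Names and numbers are placeholders.
sample = {
    "resultSets": [
        {
            "name": "PlayerStats",
            "headers": ["PLAYER_NAME", "PTS", "REB"],
            "rowSet": [["Player A", 12, 7], ["Player B", 20, 3]],
        }
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_game_json(sample, tmp, "hypothetical_game_id", "boxscore")
    reloaded = json.loads(saved.read_text(encoding="utf-8"))

players = rows_as_dicts(reloaded["resultSets"][0])
for player in players:
    print(player["PLAYER_NAME"], player["PTS"], player["REB"])
```

Compared with the HTML approach, there are no regular expressions and no guessing about where a number lives in the page: each value is looked up by its column name.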
Since the NBA has formatted and published nearly 20 years' worth of JSON, games are available as far back as the 1996-97 season. This means that there are plenty of historical questions one can ask with this plethora of data. However, for upcoming posts, we’ll be focusing on analyzing the 2014-15 regular season and, as much as possible, its associated motion-capture data.
Using SportVu motion-capture data scraped from NBA.com, we can map a whole lot of movements and ask how each one affects a basketball game. For instance, with this data and the script svmovie.py, we can make movies like this one:
The first rebound of the 2014-15 season happens in real time as Evan Fournier cleans up a missed jumper by Anthony Davis.
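As a rough illustration of the kind of question motion-capture frames let us ask, the sketch below finds which player is closest to the ball in each frame. This is not how svmovie.py works. The frame layout used here (a dict with a "ball" position and a "players" mapping) is a hypothetical simplification, and the coordinates and player names are invented placeholders.

```python
import math


def nearest_player(frame):
    """Return the name of the player closest to the ball on the floor plane.

    Assumes a hypothetical frame layout:
      frame["ball"]    -> (x, y, z) position of the ball
      frame["players"] -> {name: (x, y)} positions of the players
    The ball's height (z) is ignored; only the x/y distance is compared.
    """
    ball_x, ball_y, _ball_z = frame["ball"]
    players = frame["players"]
    return min(
        players,
        key=lambda name: math.hypot(players[name][0] - ball_x,
                                    players[name][1] - ball_y),
    )


# Two invented frames: the ball travels from near Player A to near Player B.
frames = [
    {"ball": (10.0, 20.0, 9.0),
     "players": {"Player A": (12.0, 20.0), "Player B": (30.0, 25.0)}},
    {"ball": (28.0, 24.0, 3.0),
     "players": {"Player A": (14.0, 21.0), "Player B": (29.0, 25.0)}},
]

closest = [nearest_player(frame) for frame in frames]
print(closest)
```

Running a check like this over every frame around a missed shot is one simple way to start asking who was in position for the rebound.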
We are truly fascinated by the NBA’s second (and sometimes third and fourth) chances: rebounds. These are vitally important to the game but understudied with motion-capture data, and we plan to change that soon as we continue to roll out posts.