I have procrastinated long enough on writing this blog post, so today is the day I finally get it off my queue. So, thank you everyone for your patience! This article will explore the data and technical steps needed to replicate the shot chart data visualization seen above in the featured image. A second blog post will focus more on the overall design and user experience.
Prior to reading this, I encourage you to explore the dashboard again, Why does Stephen Curry Hate Mid-Range Jump Shots?. I made several improvements and other small enhancements after the original submission. In addition, this renewed familiarity should help you with understanding the different steps in the instructions below.
If you are tracking my #MakeoverMonday work, then you know I am going through all of the 2016 data sets, while still trying to keep up with 2017. One of my Tableau community goals is to add a new wrinkle/improvement to each work I share with the Tableau community, and the Steph Curry dashboard achieved this by using several new techniques. My direction was clear early in the process, where three areas of focus guided my design decisions:
- Fun interactivity
- Intuitive navigation
- Balanced design
Before thinking too much about these big picture goals, I needed to endure through a significant amount of data prep work.
Hack #1, The Grid Technique
Many Tableau enthusiasts made tremendous progress with spatial hex heatmaps, @sirvizalot and @aeldridge spring to mind, but I never found these complex calculations easy to work with. So, I tried a data solution to the same problem.
This data set from #MakeoverMonday seemed to be an excellent opportunity for me to test this more advanced geospatial method against a custom background setup. For our purposes, the two main benefits of defining a grid in the data set are aggregating dense data and supporting cross-data/chart interactivity. A number of additional benefits exist from using this technique, but these will not be explored in this blog post.
Please be aware that steps 1-6 are completed in Excel.
Step 1: Determine your x/y bounding box
Before you can build a grid, you need to know the range of x/y values, which helps determine the number and size of each box. So, whichever data set you are using, then you need to find the max/min x and max/min y.
Your results with this data set should look something like this.
Step 2: Determining the size of each grid square
This step requires the creation of a grid box, which can then be used to build a grid row. In the above table, we start by taking a combination of a max/min x with a max/min y. To me, it was my preference to increment the x-values before trying to increment the y-values.
Before creating a rows, you need to decide the optimal size for each square. Larger squares will result in bigger aggregations, while smaller squares are more detailed representations of the data. Guessing and testing are typically needed here, but it should be consistently applied for whatever size is selected. The size used in this example is a 10 x 10 square.
To create your first square, you need three pieces of information:
- path id
We will use -250 for the starting-x and -50 for the starting-y. Four plots need to be created with a corresponding path id, and each combination needs to be unique. In total, four different numbers are used to create four unique plots for each grid square. You should have something similar to the picture below.
Great work! We have our first grid square!
Step 3: Creating the first grid row
Now let’s duplicate this square by adding 10 to the x-coordinate and keeping the y-coordinate the same. Essentially, row 6 should use =A2+10 for column A, =B2 for column B, and =C2 for column C. Once row 6 is completed, then you just need to drag the calculations down until the x-value is past the max x-coordinate of 240. This should take you to somewhere near row 201.
Our first row of grid squares is now defined.
Step 4: Creating the full grid
After the first row is defined, we need to make a slight adjustment to the logic before continuing. We need to simply reverse the rules for the x and y coordinates.
For row 202, column A will now use =A2, column B =B2+10, and column C should mirror column A with =C2. We are now ready to increment until the max y is reached. You should be around row 15,401 with an x-coordinate of 240 and y-coordinate of 720.
The full grid is now defined.
Step 5: Develop an identifier for each coordinate
Create two columns with labels of x-id and y-id, as seen below:
Where path id equals one, multiply x by 1,000 for x-id and y by 1,000 for y-id. If path id is greater than one, then the x-id and y-id values should match the values for the first point, as seen below:
Next, copy these columns as values to remove the formulas, and change the data type to text.
Step 6: Develop a unique identifier for each grid square
Create a new column called box id with the following logic:
Your grid data definitions are now complete.
Step 7: Create aggregations, based on the grid increments
Using the #MakeoverMonday data set this time, we need to create the same x-id and y-id columns with the same rules.
So, divide x and y by 10, wrap with an integer data type, and then multiply by 10,000.
Copy and paste these as values to remove formulas and convert to a text data type.
Create another box id column and concatenate the x-id and y-id.
Step 8: Create an inner-join between these data files on box id.
Open Tableau and connect to the grid map and shot chart data set.
(Optional) creating a box id isn’t strictly necessary, you could join on x-id and y-id, but I typically like to work with a single unique identifier.
(Warning) make sure the data type is text/string for both unique identifiers.
Step 9: Create the grid in a Tableau worksheet
Hooray! We get to use Tableau again!
In a blank worksheet, place [X] from the grid data file onto the columns shelf and place [Y] from the grid data file onto the rows shelf. You should see something similar to the picture below:
Next, select the polygons marks type. Put [Path Id] on the path marks card and drop [Box Id] on detail.
Thanks to our data prep, the heat map is already good to go. Place SUM([Number of Records]) on the color marks card.
Step 10: Embed the grid into supporting histograms.
First, place [X-Id] on the columns shelf. Next, create a calculation called “Shot Count Calculation” that divides [Number of Records] by 4, and place this on the rows shelf.
Warning! Be aware the inner-join to the grid coordinates duplicates records, and it’s easy to forget. I made this same mistake when originally publishing this dashboard. This is why the number of records needs to be divided by 4.
Next, place [Box Id] on the details marks card and sort by [Number of Records] ascending. Lastly, reverse the scale, so the histogram is upside down. This embeds the grid’s definitions into the histogram, which now supports dashboard interactivity between the shot chart and the histogram. My example uses highlighting actions, but filtering actions are also possible.
You should have something similar to the example below:
Hack #2, Dual-Color Scheme for Continuous Measures
This is a pretty simple technique.
First, go back to the worksheet with the heat map. While holding CTRL, left click and drag [Y] to the right on the rows shelf. You should now see two separate charts.
For the top chart, place [Shot Type] on color. Hide three-point shots, then place this [Shot Type] on detail.
Next, duplicate [Shot Type]. On the lower chart, place [Shot Type (copy)] on the color marks card. Hide two-point shots, and drag [Shot Type (copy)] onto the details card.
Place [Number of Records] on the color marks card in the top chart (two-point shots) and select a blue gradient. Next, duplicate [Number of Records]. Place [Number of Records (copy)] on the color marks card for the bottom chart (three-point shots) and select a red-gold gradient.
Dual axis these grid maps and synchronize. You should have something that looks similar to below:
Hack #3, Blended Discrete and Continuous Color Scheme
This is a similar technique to the last example.
First, go back to the histogram worksheet. While holding CTRL, left-click and drag your [Shot Count Calculation] to the right on the rows self. Once again, you should see two charts. As a reminder, your shot count calculation should be [Number of Records] / 4 for this specific chart.
Place the shot count calculation onto the color marks card for the top chart. Choose a standard grey gradient with 5 stepped colors. Use white borders.
Next, place [Shot Type] on the color marks card for the bottom chart. Use a blue color for two-point shots and a yellow color for three-point shots. Reverse the order of 2PT and 3PT shots in the color legend. Decrease opacity to 55%.
Dual axis the histograms and synchronize. Make sure the chart type stays as a bar chart.
You now have an interesting discrete and continuous color mixture, which should look similar to the picture below:
Once these three hacks were created, the rest of the dashboard used more familiar techniques. A follow-up blog post will show how I incorporated these different elements to support my original design goals. If you have questions or other feedback, then feel free to reach out to me on Twitter @DavidAKrupp. Thanks!