Thursday, January 1, 2015

How to Think Like a Basketball Analyst

When it comes to deriving strategic insights from basketball statistics, the approach is very similar to that of a business analyst. A good analyst will make great efforts to understand (a) the industry (in this case, the game of basketball and the ins and outs of how it is played), (b) the goals of the organization (in this case, most likely a team), and (c) the exact purpose of the task at hand - a clear understanding of the questions that need to be answered in the context of the industry and how they tie back to the goals of the organization (for example: answering the question of who the best free agent rebounders in the NBA are, so that a team can use that information to help bolster its rebounding capabilities).

Sports analyses are never easy, and basketball is especially hard because of the incredible number of factors in play at any given moment. Thus the goal of the basketball analyst has basically become to try and account for all of these factors in the process of creating an analysis that will allow him/her to accurately answer a business question.

A good example of this is ESPN's Real Plus-Minus metric, a variant on the traditional Plus-Minus (+/-) you see in a typical box score. Plus-Minus measures the net change in the score margin while a player is on the court - so a player who enters the game with his team losing 10-3 and leaves with the score tied at 17 was +7 because his team closed a seven-point gap. Conversely, a player who enters with his team up 10-3 and leaves with the score tied at 17 will be -7. But this metric gave analysts pause because it was not clear what business question it answered. Does it determine who the best players are? Not necessarily, because certain players could be beneficiaries of their teammates' performance and that could inflate an individual's plus-minus (nobody is ever going to make the claim that Andrew Bogut is a better player than LeBron James, but that's what +/- shows this year). Also, since the number of minutes and/or possessions played will differ from player to player, it is tough to directly compare two players in this category.
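As a quick illustration of the arithmetic, here is a minimal sketch of raw plus-minus for a single stint on the court. The function name and argument order are my own for illustration, not anything from a box-score API.

```python
def plus_minus(team_in, opp_in, team_out, opp_out):
    """Net change in score margin while the player was on the court.

    team_in / opp_in: team and opponent score when the player checks in.
    team_out / opp_out: team and opponent score when the player checks out.
    """
    margin_in = team_in - opp_in
    margin_out = team_out - opp_out
    return margin_out - margin_in


# Player enters with his team losing 10-3 and leaves tied at 17.
print(plus_minus(3, 10, 17, 17))   # 7

# Player enters with his team up 10-3 and leaves tied at 17.
print(plus_minus(10, 3, 17, 17))   # -7
```

Note that nothing in this calculation knows who else was on the floor or how long the stint lasted, which is exactly the weakness described above.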

Analysts did not feel this metric could accurately answer a question such as "who are the best players in the league?" So a group led by Jeremias Engelmann (formerly of the Phoenix Suns) developed the Real Plus-Minus metric (which had a sister metric in Adjusted Plus-Minus to work off of) to try and provide some more clarity. The program that calculates this metric looks at hundreds of thousands of possessions to try and determine the "real" performance (meaning adjusted, think real inflation vs. nominal inflation) of a player. For example, it could look at how an average player (per the eye test) who often plays alongside superstars (such as Matt Bonner on the Spurs, playing alongside Tim Duncan, Manu Ginobili and Tony Parker) would fare in the plus-minus category if surrounded by average players.

Now, this is not a perfect metric either, for various reasons I do not need to go into here. But I think it does a good job of attempting to answer the business question in a way that gives the organization information it can use to achieve its goals.

In my mind, a proper analysis must follow these steps:
1) Start with a question, making sure to understand the context of the question
2) Use industry knowledge to develop a method for answering the question
3) Create data gathering process
4) Determine appropriate visualization
5) Develop an effective presentation
6) Determine next steps such as follow-on research

Here's an exercise in this:

1) Start with a question, making sure to understand the context of the question
Question from coach Erik Spoelstra of the Miami Heat to the analyst: As a coach I think we need help on defense, specifically rebounding. I feel like we've been out-rebounded in at least 2/3 of our games this year. Can you give me a list of names the advanced stats suggest are good rebounders - guys who we may want to consider targeting in a trade?

This is a pretty straightforward question. One contextual thing to consider here, which the analyst will need to make use of when answering the question, is this - the Heat are known as a team (along with any Doc Rivers-coached team) that does not emphasize offensive rebounding. The system is basically that if a player is in position to get the rebound then he should go for it. Otherwise, he should not go out of his way to crash the boards if it will take him out of position and leave the team at a disadvantage on the other end of the floor. The statistics back this up - if you rank teams by their offensive rebound rate (defined as (offensive rebounds) / (offensive rebounds + opponent defensive rebounds)) the Heat are in the bottom five in the NBA (note - the strategy may not be working, the Heat are third to last in the NBA in opponent's effective field goal percentage, or eFG%).

| Team                 | OREB Rate | Rank |
|----------------------|-----------|------|
| Atlanta Hawks        | 20.8%     | 30   |
| Orlando Magic        | 21.1%     | 29   |
| Los Angeles Clippers | 21.5%     | 28   |
| Charlotte Hornets    | 21.6%     | 27   |
| Miami Heat           | 21.6%     | 26   |

courtesy of NBA.com/stats
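The offensive rebound rate defined above is simple enough to compute directly. Here is a minimal sketch; the function name is mine and the totals in the example are made-up round numbers chosen only to show the arithmetic, not real team data.

```python
def oreb_rate(team_oreb, opp_dreb):
    """Offensive rebound rate: OREB / (OREB + opponent DREB)."""
    total = team_oreb + opp_dreb
    if total == 0:
        raise ValueError("no offensive rebounding opportunities")
    return team_oreb / total


# Hypothetical season totals (not real data): 216 offensive rebounds
# against 784 opponent defensive rebounds.
rate = oreb_rate(216, 784)
print(f"{rate:.1%}")   # 21.6%
```

The denominator is the number of the team's own missed shots that were rebounded by someone, so the rate reads as "the share of available offensive boards we actually got".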

So when it comes to being out-rebounded, it could mean that Spoelstra's request was specifically focusing on defensive rebounds, since the Heat are a team that by design will likely be out-rebounded on offense. It will be important to work with the coach to understand his goals as they relate to rebounding, in case there are other factors he would like to be included in the analysis.

2) Use industry knowledge to develop a method for answering the question
This is the fun part.

The key here is to understand that simple box score metrics may not be the most effective method of determining who the best rebounders are - because they don't get at the heart of what makes someone a good rebounder from a skills perspective. What do I mean by this? If you look at rebounds per game, that statistic does not account for the fact that players will play different numbers of minutes per game. Even if you look at rebounds per 48 minutes rather than per game, there are still issues regarding rebounding opportunities - players will have different numbers of opportunities to rebound based upon where they are asked to play, and to some extent based on how good their team's shot defense is (see chart below that I came up with - teams that hold opponents to a low FG% generate more rebounding opportunities).


When it comes down to it, the best rebounders are the ones who win the contested battles - a combination of boxing out appropriately and getting to the right spots on the floor that allow them to get rebounds. To some extent (the full extent I'll explain in the third section below) you have this type of data available now with the SportVU optical player tracking data. For example - take a look at the screenshot I snagged from SportVU below on a DeAndre Jordan rebound:

courtesy of nba.com/stats

Jordan leads the NBA in rebounds per game. The blue dots are the Clippers and the red dots are the Jazz. DeAndre Jordan got this rebound off a missed shot from the Jazz's Derrick Favors. This rebound could have just as easily gone to JJ Redick, and was also uncontested. In that sense, it's not what I would consider to be a high-skill rebound. I will give Jordan some amount of credit for being in the right spot on the floor to get the rebound, but only a minimal amount of credit.

These are the types of things that need to be considered when developing a method to answer this question.

In order to come up with an accurate metric - here are the tactical steps I would take based on the logic above:

1) Determine the degree of difficulty associated with each type of contested rebound, broken down by the number of opposing players contesting. For example, when one opponent is contesting, what is the average win rate for the defensive player vs. the offensive player? When two are contesting, what are the win rates? And so on.
2) Use those degrees of difficulty to assess how valuable certain types of rebounds are. For example, if across the league on average a defender only wins 25% of the time when he has two offensive players contesting the rebound, and a defender wins 75% of the time when he has one offensive player contesting, I would weight the former three times as heavily as the latter.
3) Create a weighted average for each player using the weights from #2 above.
4) An additional weight - uncontested rebounds would get a small amount of weight up to a certain point, but very small relative to contested rebounds. This is to give credit to these players for being skilled enough to find the spots to get wide open rebounds.
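To make the four steps above concrete, here is a rough sketch of how the weighting could work. The 75% and 25% league win rates are the illustrative numbers from step 2, not measured values; the inverse-win-rate weighting is one way to get the "three times as heavily" ratio; and the uncontested weight and cap are placeholders I made up for the example.

```python
# League-average defender win rate, keyed by the number of opposing
# players contesting the rebound (illustrative figures from step 2).
LEAGUE_WIN_RATE = {1: 0.75, 2: 0.25}

# Harder situations get more weight: weight = 1 / league win rate,
# so the two-contester case counts 3x the one-contester case.
WEIGHTS = {k: 1.0 / rate for k, rate in LEAGUE_WIN_RATE.items()}

UNCONTESTED_WEIGHT = 0.05  # assumed: small relative to contested credit
UNCONTESTED_CAP = 100      # assumed: uncontested boards beyond this earn nothing


def contested_score(wins, chances):
    """Weighted average of a player's win rate across contest categories.

    wins / chances: dicts keyed by number of opposing contesters.
    Categories in which the player had no chances are skipped.
    """
    numerator = 0.0
    denominator = 0.0
    for k, weight in WEIGHTS.items():
        if chances.get(k, 0) > 0:
            numerator += weight * wins.get(k, 0) / chances[k]
            denominator += weight
    return numerator / denominator if denominator else 0.0


def rebound_skill(wins, chances, uncontested):
    """Contested score plus a small, capped bonus for uncontested rebounds."""
    capped = min(uncontested, UNCONTESTED_CAP)
    bonus = UNCONTESTED_WEIGHT * capped / UNCONTESTED_CAP
    return contested_score(wins, chances) + bonus


# Hypothetical player: wins 80 of 100 one-on-one battles, 30 of 100
# when double-contested, plus 50 uncontested rebounds.
print(rebound_skill({1: 80, 2: 30}, {1: 100, 2: 100}, uncontested=50))
```

Whether the uncontested bonus should be additive like this, or folded in as one more category in the weighted average, is a design choice I would want to test against real data before settling on it.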

There are a few other important things that need to be looked at here:

1) By Position - Post players will be the primary source for rebounds as they are the players most likely to be in position to gain rebounds to begin with. But that is not to say swing forwards and guards cannot be valuable rebounders (think Kawhi Leonard and Rajon Rondo). The rebounding skills need to be understood in terms of the position. For example, further analysis could find that the post players on the Heat are slightly below average, but the swing forwards and shooting guards are way below average. It could be an issue of bolstering the rebounding abilities at those positions. I'm not saying this is necessarily the case; analysis would need to be undertaken to come to that conclusion, but it is something to consider.
2) Offensive Rebounds vs. Defensive Rebounds - Because the Heat value defensive rebounding relative to offensive rebounding, I'm going to more heavily weight players for this analysis for their defensive rebounding stats relative to their offensive rebounding stats.

3) Create data gathering process
The NBA has done an admirable job of making data available through their SAP HANA portal. But the big issue is that they have made it consumer-only, not giving the public authoring abilities (for those of you who have ever consulted or worked on a BI project, not allowing users to have authoring abilities is not necessarily a best practice - I bet the reason why is that they are not confident in the data yet and are worried that this will expose errors). I can get a league-wide sortable list of players that includes contested rebounds gathered and rebounding chances, but not the other pieces I mentioned above. Even using a program in R to scrape rebounding logs for individual players would not work, because the logs do not include the rebounding chances that were not converted.

So, to some extent, I do not have answers right now and this remains a work in progress.

One step to consider after the data and calculations are done is to see whether it passes the "eye test". There are a few ways to do this. One is to talk to basketball minds and see if they agree (or if you agree). Another is to run a correlation. I would think, just hypothetically, that the results of this would correlate pretty well (I'm thinking a correlation coefficient of at least 0.60) with the rebounds per game metric. This is because the players who coaches know to be the best rebounders (which should match with the results of our analysis) will correlate with who gets the most playing time, and thus correlate with volume of rebounds - which translates to rebounds per game. So if that correlation exists I would think the metric we developed passes the "eye test".
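The correlation check described above could be sketched like this. The Pearson coefficient is computed by hand so the example has no dependencies; the skill scores and rebounds-per-game figures are invented placeholders, and the 0.60 threshold is just the hypothetical bar mentioned in the text.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Hypothetical values for five players (not real data).
skill_scores = [0.45, 0.38, 0.52, 0.30, 0.41]
rebounds_per_game = [9.8, 7.1, 13.2, 4.5, 8.0]

r = pearson(skill_scores, rebounds_per_game)
print("passes the eye test" if r >= 0.60 else "worth a closer look")
```

A high correlation here is a sanity check rather than a validation: if the new metric matched rebounds per game perfectly, it would not be telling us anything the box score doesn't.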

4) Determine appropriate visualization
Kirk Goldsberry has done some great things with his "spread %" and the visualization of the metric. Spread % is roughly the ability of a player to score well from multiple areas on the floor. His visualizations of this have become the gold standard of sports visualization:


For the analysis we are working on, I would think that a simple ranking would do to some extent. But a good thing to show, especially if doing this by position (guards, swing players, post players), is which players at one position out-rebound the typical player at another - for example, the swing players that are above the median post player in terms of rebounding skill. Similar to this chart from Pew Research, but with rebounds instead of political ideologies.
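The data behind a chart like that is a simple filter. Here is a sketch, assuming each player already has a rebounding skill score and a position group; the player labels and scores are placeholders, not real results.

```python
from statistics import median


def above_median_of(players, group, reference_group):
    """Names in `group` whose score beats the median of `reference_group`.

    players: list of (name, position_group, skill_score) tuples.
    """
    reference_scores = [s for _, g, s in players if g == reference_group]
    if not reference_scores:
        raise ValueError(f"no players in reference group {reference_group!r}")
    cutoff = median(reference_scores)
    return [name for name, g, s in players if g == group and s > cutoff]


# Placeholder players and scores, for illustration only.
players = [
    ("Post A", "post", 0.50),
    ("Post B", "post", 0.42),
    ("Post C", "post", 0.36),
    ("Swing A", "swing", 0.44),
    ("Swing B", "swing", 0.31),
]

print(above_median_of(players, "swing", "post"))   # ['Swing A']
```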



5) Develop an effective presentation
There's a great quote often attributed to Einstein: "If you can't explain it simply, you don't understand it well enough". The key with analysis of this type is to keep it simple but make sure you cover the key points - in this case (a) the impetus for the analysis, (b) the approach to determining who the best rebounders are (those that do the best in contested situations), (c) the methodology (weighting for more difficult contested situations), (d) how the data was gathered and where it was gathered from (a big point of concern for coaches - especially Stan Van Gundy), and (e) the results (using the data visualizations above).

6) Determine next steps such as follow-on research
The analytics are just one piece of the puzzle. It could very well be the case that this was used to just validate existing thoughts the scouts had, or potentially identify someone they had missed. Based on this, some potential next steps might be to look at the offensive stats of certain players to see who might not only be able to be an asset in rebounding, but could fit in well with how the Heat run their offense.
