Saturday, April 23, 2011

Stat of the Day: NBA Scoring Trends


Understanding the effect that experience has on a pro-basketball player's scoring potential is an important question for team owners, coaches, and players alike. Although years of observation can provide fairly accurate estimates, a statistical analysis may still reveal useful information about this relationship.

Using a data set from a popular econometrics textbook (Wooldridge) that contains salary and career statistics for 269 players in the National Basketball Association (NBA), this post's goal is to explain how experience affects a player's points per game. A non-linear regression is used to quantify the relationship between points per game and experience after controlling for years played in college and age. To see the results, hit the jump!

We found that experience does increase points per game for the average NBA player, but that after the 13th year in the league experience becomes detrimental, as aging begins to hamper a player's ability to score. As players enter the league, experience raises their scoring productivity by about 2 points per game per year well into their sixth and seventh years. This of course isn't sustainable: once players have climbed the learning curve and begin to age, they gain less from their time in the league until their scoring begins to suffer. Once a player passes his 13th year in the league, he actually begins to lose about 2 points per game. Even with the small sample of players with more than 13 years of experience in the data set, the trend speaks for itself (see Figure 2), and the regression statistics validate these results.

Figure 1: Frequency distribution of points per game (number of players per bin)


Figure 2: Correlation between points per game and years as a professional player. Note: this is a two-dimensional representation, so the decline appears to begin after year 8; after controlling for more variables a different picture emerges, and the decline occurs after the 13th year.
The regression pictured above explains about 12% of the variability in points per game. This should not be surprising, because experience accounts for only a small fraction of the many factors that contribute to scoring success. One can imagine how coaching, team winning percentage, number of all-stars, and a host of other factors could affect any given player's ability to score. These topics are surely of interest, and this data set can support tests of further hypotheses about the driving factors behind scoring in the NBA. Although the overall model explains only a fraction of the variability, we can confidently say that experience is a driving factor in explaining the rise and fall in a typical player's points per game. There is no question that years in the league is statistically significant in explaining points per game at the 99% confidence level (linear t-stat = 4.67, non-linear t-stat = -2.44).
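As a rough sketch of the specification described above (this is not the author's actual code, and it uses synthetic stand-in data since the Wooldridge NBA data set isn't reproduced here), the quadratic regression and its turning point can be illustrated with plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 269  # same sample size as the NBA data set

# Hypothetical stand-in variables: years in the NBA, age, years in college
exper = rng.integers(1, 18, n).astype(float)
age = rng.integers(22, 38, n).astype(float)
coll = rng.integers(0, 5, n).astype(float)

# Simulated points per game with a true turning point at 13 years of experience
points = (5 + 2.0 * exper - (2.0 / 26) * exper**2
          - 0.05 * age + 0.1 * coll + rng.normal(0, 2, n))

# OLS with a quadratic experience term: points ~ 1 + exper + exper^2 + age + coll
X = np.column_stack([np.ones(n), exper, exper**2, age, coll])
beta, *_ = np.linalg.lstsq(X, points, rcond=None)

# The marginal effect of experience is b1 + 2*b2*exper; it hits zero at -b1/(2*b2)
turning_point = -beta[1] / (2 * beta[2])
print(round(turning_point, 1))
```

The turning point formula is where the post's "after the 13th year" figure would come from: with a positive linear coefficient and a negative quadratic coefficient, scoring rises early in a career and declines after the peak.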

4 comments:

  1. "A non-linear regression be used to quantify the relationship between points per game and experience after controlling for other factors such as years in the league, years played in college, and age."

    Good topic, but how are you measuring experience and controlling for experience at the same time?

    Depending on if this is a time series or cross-sectional analysis, there also may be a great deal of survivor bias. The improvement of the guys who were good enough to stay in the league may be overstated because the scrubs are gone and not included in the data.

  2. An interesting point...maybe next we'll look at a similar study for Hall-of-Famers, or at least multiple-time All-Stars, with 10+ year careers. JJ's the intense math guy, however, so we'll see what he says on the matter.

@lqswrds: Good catch, actually that was a typo. I controlled for age and years as a college player; experience was defined as years in the NBA. In the same vein, though, age and experience might be too similar to include in the same regression, but I conducted an F-test of this hypothesis (p = .0009) and found that age did belong in the regression; excluding it would not have been ideal.

The data is cross-sectional, but to your point, a set of panel data could better address the concerns about survivor bias. With panel data, something like a Heckman-style selection correction model could be used to remove the bias. Unfortunately, I don't have panel data, but would love to.

Given that we have cross-sectional data, I believe a simpler way would be to run two separate regressions: one for players with experience greater than, say, 5 years, compared against the regression with players of all experience levels (similar to what Merlin suggests above). Then compute an F-statistic to see if there is a difference in the points/experience relationship when we only look at veteran players (the scrubs are gone). I believe this would work, right?
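A minimal sketch of that split-sample comparison (a Chow-style F-test, again on synthetic stand-in data rather than the actual NBA data set, and using a hypothetical 5-year veteran cutoff):

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

rng = np.random.default_rng(1)
n = 269
exper = rng.integers(1, 18, n).astype(float)
points = 5 + 2.0 * exper - (2.0 / 26) * exper**2 + rng.normal(0, 3, n)

# Quadratic specification: points ~ 1 + exper + exper^2
X = np.column_stack([np.ones(n), exper, exper**2])
veteran = exper > 5  # hypothetical cutoff from the comment

# Chow-style test: fit pooled vs. separately by group, then compare fits
ssr_pooled = ssr(X, points)
ssr_split = ssr(X[veteran], points[veteran]) + ssr(X[~veteran], points[~veteran])
k = X.shape[1]
F = ((ssr_pooled - ssr_split) / k) / (ssr_split / (n - 2 * k))
print(round(F, 2))
```

A large F relative to the F(k, n - 2k) critical value would suggest the points/experience relationship really does differ for veterans.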

Using time-series/panel data can remove some of the fixed heterogeneity (things that don't change across time) while at the same time reducing survivor bias.

Thank you for your thought-provoking comment. I had planned on torturing this data set for several more weeks, so stay tuned. If you would like a copy of this data, please email me at espin086@ucla.edu

You have this well thought out, I'm looking forward to your "tortured data set". Good stuff.
