SportStatistics

Monday, July 4, 2011

Effects of Race on Baseball Players' Salaries

BACKGROUND AND SUMMARY OF RESULTS

Using data from over 300 major league baseball players, we found that a player's salary is statistically affected by the interaction between his race and the racial composition of the city in which he plays. Whites earn less than blacks and Hispanics in cities heavily populated by minorities. When the population in a city grows to 20% black, a black player will earn about 5.2% more than a white player even if they perform equally well on the field. This may or may not be a sign of racial discrimination in players' salaries. According to economist and statistician Jeffrey M. Wooldridge:

"We cannot simply claim that discrimination exists against blacks and Hispanics, because the estimates imply that whites earn less than blacks and Hispanics in cities heavily populated by minorities. The importance of city composition on salaries might be due to player preferences: perhaps the best black players live disproportionately in cities with more blacks and maybe the best Hispanic players tend to be in cities with more Hispanics."

The econometric estimations that follow show a strong correlation between race, the racial composition of players' cities, and the earnings of Major League Baseball players, but we cannot distinguish between the hypotheses of racial discrimination and player preference as the driving factors.

DATA AND VARIABLES

The effect of a city's racial composition on players' salaries will be analyzed via a multivariate regression. The idea behind a regression analysis is that it allows us to compare apples to apples. There are many variables that can affect a player's salary, but the most obvious is performance, so we want to control for a player's performance in our regression. The following variables will be our control variables, with the exception of blckpb and hispph, which are the variables we are testing. The variables blckpb and hispph are interaction variables between a player's race and the racial composition of his city. The estimates and statistical significance of blckpb and hispph are the subject of this post. A full list of the variables included in the regression analysis is given below:

Data Source: Companion website for Introductory Econometrics by Wooldridge.

This next table provides some descriptive statistics for the variables above...

The table above shows that the average player in our data set has 6.3 years of experience, with a minimum of 1 and a maximum of 20 years in the league. The average batting average is 258.98 hits per thousand at bats, with about 7.1 home runs per year. Black players make up 30.5% of the sample, while Hispanic players account for 18.1%.

The figure below is a simple scatter plot that shows the salary of players against the games per year they played. There is an obvious positive correlation between games played per year and salary:

Not surprisingly there is also a strong positive correlation between the percentage of years a player has been an all-star and wages as shown in this graph below:

Although these scatter plots are descriptive of the performance to salary relationships they are far too naive for any real comprehensive analysis. In order to capture the true essence of the effect of performance and race on a baseball player's salary a multivariate regression analysis needs to reconcile all the driving factors. This is what is done below.

REGRESSION ANALYSIS

The following table is the output from a multivariate regression describing the correlations between our control variables, the variables we are testing (the interaction between race and city racial composition), and wages. The top section of the table describes some statistics of the model, but focus your attention on the columns labeled "Coef." and "t" in the bottom table.

Reading Regression Results Above

The variables in the regression are under the "lsalary" column. The column "Coef." can be interpreted as the approximate percentage change in salary given a one unit change in one of the explanatory variables from the first table. For example, look at the "years" row above and its value under "Coef." of 6.7%. This means that for every year a player is in the league, you can expect his salary to increase by 6.7% on average after controlling for all the other variables in the regression. The phrase "after controlling for all other variables in the regression" can be included after the interpretation of any variable in this model! This is what makes regression analysis so powerful.
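Reading a log-salary coefficient directly as a percentage is an approximation. The short Python sketch below, assuming "lsalary" is the natural log of salary and using the 0.067 coefficient on years quoted above, compares that approximation with the exact implied change. The function name is only illustrative.

```python
import math

def pct_change_from_log_coef(coef, delta=1.0):
    """Exact percentage change in salary implied by a log-salary coefficient
    when the explanatory variable changes by `delta` units."""
    return 100.0 * (math.exp(coef * delta) - 1.0)

years_coef = 0.067  # coefficient on years quoted in the text

approx_pct = 100.0 * years_coef                     # the usual "times 100" reading
exact_pct = pct_change_from_log_coef(years_coef)    # exp(coef) - 1, in percent

print(f"approximate: {approx_pct:.1f}%  exact: {exact_pct:.2f}%")
```

For small coefficients like this one the two readings are close, which is why the simpler interpretation is used throughout this post.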

Next, focus on the column labeled "t". This is a t-statistic, and when it is greater than two in absolute value we say the corresponding "Coef." estimate is statistically significant. In other words, if "t" is, say, 10 or -10, then we say the effect of the corresponding variable on a player's salary is statistically significant. If, however, the "t" column contains -.5 or 1, for example, then we say that the corresponding variable in the table is statistically insignificant in explaining MLB salaries. In the regression table above, only years, gamesyr, allstar, and the interaction between race and city composition are statistically significant.
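The rule of thumb above can be written down in a few lines of Python. This is only a sketch: the helper names are made up for illustration, and the cutoff of 2 is the informal threshold used in this post (roughly the 5% significance level in a sample of this size), not an exact critical value.

```python
def t_statistic(coef, std_err):
    """A t-statistic is the coefficient divided by its standard error."""
    return coef / std_err

def is_significant(t, threshold=2.0):
    """Rule of thumb from the text: significant when |t| is greater than 2."""
    return abs(t) > threshold

# The example t values mentioned in the paragraph above.
for t in (10, -10, -0.5, 1):
    label = "significant" if is_significant(t) else "insignificant"
    print(f"t = {t:>5}: {label}")
```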

Interpretation of Statistically Significant Race and a City's Racial Composition

The regression table above suggests that, after controlling for player characteristics and performance, Hispanic and black players get paid more as the percentage of their race in the city they play in increases.

Being black in a city with zero percent black people means you earn about 19.8% less than white players after controlling for player ability (this comes from the black coefficient above). However, as the percentage of blacks increases, this discrepancy rapidly changes. If a city has a population that is 10% black, one can calculate the effect by multiplying the coefficient of blckpb (0.0125 in the regression table above, or about 1.25 percentage points of salary per percentage point of black population) by 10 and adding it to the black coefficient of -0.198 to see what effect this change in racial composition has on salaries for a black player:

-0.198 + 0.0125(10) = -0.073

The calculation above shows that when a city's population is 10% black, black players get paid only 7.3% less than white players with identical performance statistics (the regression controls for the performance variables). When the black share of a city's population grows to 20%, a black player will earn about 5.2% more than a white player even if they perform equally well on the field. The city with the largest percentage of blacks is Detroit, with about 74% black residents.
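The same arithmetic can be sketched in Python for any city composition. Note the assumption: the interaction coefficient is taken to be 0.0125 per percentage point, which is the value implied by the 7.3% and 5.2% figures in the text, and the function name is only illustrative.

```python
BLACK_COEF = -0.198    # coefficient on black, from the text (log-salary units)
BLCKPB_COEF = 0.0125   # assumed interaction coefficient per percentage point

def black_white_gap(pct_black):
    """Approximate % salary gap (black minus white) for equal performance
    in a city whose population is `pct_black` percent black."""
    return 100.0 * (BLACK_COEF + BLCKPB_COEF * pct_black)

for pct in (0, 10, 20):
    print(f"{pct:>2}% black city: gap of {black_white_gap(pct):+.1f}%")

# City composition at which the estimated gap is zero.
break_even = -BLACK_COEF / BLCKPB_COEF
print(f"break-even at about {break_even:.1f}% black population")
```

Under these numbers the estimated gap closes at a black population share of a little under 16%, which is why the sign flips between the 10% and 20% examples.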

Tuesday, June 21, 2011

The Deepest Draft Classes Since 1981.

The Number One Pick?

The 2011 draft class seems to be one of the weakest in a long time, but which classes were the strongest? The following series of charts and articles will attempt to determine the answer. The depth of each draft class from 1981 to 2010 was measured by the number of players it placed in the top 50 during each season of its career. The top 50 players are measured by minutes played; while this measure may seem inadequate, it is probably the best measure of talent that can be used to compare one large group of players to another large group of players. I understand the limitations of using minutes played; comparing one player to another player using only minutes played would be idiotic, but large groups smooth out the differences in minutes played that are due to coaching preferences, team injury situations, and other distortions of talent evaluation. Another thing to mention about the method is that the draft classes from 2006-2010 cannot really be measured accurately, as their careers are still unfolding. The results from those years should be disregarded.

Here’s the rating system: Each draft class that leads the league, during a given season, in number of top 50 players gets 5 points, the second and third place classes get 3 points, and the fourth and fifth place classes are given 1 point. Remember, this study is attempting to find out the deepest draft classes, not the most top heavy.
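As a rough illustration of the rating system, here is a minimal Python sketch. The top-50 counts are made-up numbers, not the real data, and breaking ties in favor of the earlier draft year is my assumption, since the post does not say how ties are handled.

```python
from collections import Counter

# Points for 1st, 2nd, 3rd, 4th, and 5th place in a given season.
POINTS_BY_RANK = [5, 3, 3, 1, 1]

def season_points(top50_counts):
    """top50_counts maps draft year -> number of top-50 players that season.
    Ties are broken by the earlier draft year (an assumption)."""
    ranked = sorted(top50_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {year: pts for (year, _), pts in zip(ranked, POINTS_BY_RANK)}

def total_points(seasons):
    """Sum the season-by-season points for every draft class."""
    totals = Counter()
    for counts in seasons:
        totals.update(season_points(counts))
    return dict(totals)

# Hypothetical counts for two seasons, for illustration only.
season_a = {1984: 7, 1985: 6, 1987: 5, 1986: 4, 1981: 3, 1983: 2}
season_b = {1984: 6, 1985: 7, 1987: 4, 1986: 5, 1989: 3, 1988: 1}

totals = total_points([season_a, season_b])
print(totals)
```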

I think the 1984, 1996, and 2003 drafts had the best top talent, but were they the deepest? The table shows the top ten draft classes since 1981. It lists total points (explained earlier), the number of times the class led the league, the year the class had its best season and how many top 50 players it had that year, and the season of its career in which the class peaked.

Notable Observations:

Most of the best draft classes seem to have the ability to simultaneously take over the league from their predecessors and prevent the younger classes from pushing them from the top.

Two NBA Finals' MVPs

The class of ’98, who had to wait until January ’99 to start their lockout-shortened rookie year, did pretty well for themselves and did not peak until their 11th year. The average class peaks around their 6th or 7th season, then the younger classes start to take over. Once you get past one of the worst number one picks in NBA history, Michael Olowokandi, it was a really good draft. Dirk Nowitzki, Paul Pierce, Antawn Jamison, Vince Carter, and Rashard Lewis were all drafted that year; and there were at least 5 or 6 more good players drafted in 1998.

And With The Third Pick...

Class of ’84: Jordan, Olajuwon, Barkley, and Stockton: The greatest player; a top 2 or 3 center; an undersized, versatile, rebounding machine, all-time great power forward; and the NBA’s all-time leader in steals and assists. Don’t be fooled by the top heaviness; there were some other good players in this draft: Big Smooth Sam Perkins, Steal Machine Alvin Robertson and even great coach, but not-so-great player Rick Carlisle. I don’t want to turn this into a list of the 1984 draftees, but you get the idea.

The 1992 class was very interesting: Shaq, Zo, Spree, and others were in this draft, but that’s not the interesting part – this class (as a whole) peaked after only its 3rd season.

We'll Miss You.

The class of ’96 is one of my favorites; Kobe, Iverson, Ray Allen, Steve Nash, and the undrafted Ben Wallace highlight this class.

What A Class, But Where's AI?

Tuesday, May 31, 2011

Dirk With A Ring: Better Than The Ringless Wonders, Ewing, Malone, and Barkley?

Does Dirk jump ahead of Patrick Ewing, Karl Malone, and Charles Barkley on the all-time greats list if he wins a ring this year? In other words, does winning a single championship necessarily make one superstar better than another superstar who hasn’t won? The Bulls, Lakers, and Spurs have dominated the last 20 years; as a result, some players have been denied rings that they may have won otherwise. The table below is an attempt to distinguish between the guys who should have won a championship and just had some bad luck and the guys who just didn’t get it done in the playoffs. I measure them using something called “Consolation Points”: a player who loses to the eventual champion in the first round gets 1 point, in the second round gets 2 points, in the conference finals gets 3 points, and in the NBA Finals gets 4 points. I had to make the “superstar cutoff” somewhere, so I only included guys who had met one of the following criteria: 20,000 career points, 8,000 career assists, a career average of 20 points per game, or a career average of 8 assists per game. The cutoff could be made at many different milestones, but I don’t think these are unreasonable. Defensive stats are conspicuously absent from the criteria; that is because in the context of talking about great players and winning championships, almost all the talk is about offensive players, Bill Russell notwithstanding.
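For anyone who wants to reproduce the scoring, here is a small Python sketch of the “Consolation Points” idea. The player stats and playoff exits below are hypothetical, and treating each cutoff as “at least” the milestone is my reading of the criteria.

```python
# Points for losing to the eventual champion in each round.
CONSOLATION_POINTS = {
    "first round": 1,
    "second round": 2,
    "conference finals": 3,
    "finals": 4,
}

def is_superstar(career_points, career_assists, ppg, apg):
    """Superstar cutoff: meeting any one of the four criteria is enough."""
    return (career_points >= 20000 or career_assists >= 8000
            or ppg >= 20 or apg >= 8)

def consolation_total(losses_to_champion):
    """Sum points over every season a player lost to the eventual champion."""
    return sum(CONSOLATION_POINTS[rnd] for rnd in losses_to_champion)

# Hypothetical player: two Finals losses, one conference finals loss, and one
# first-round loss, each to that season's eventual champion.
example_losses = ["finals", "finals", "conference finals", "first round"]
example_total = consolation_total(example_losses)
print(example_total)
print(is_superstar(career_points=15000, career_assists=9000, ppg=14.0, apg=9.1))
```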

I don’t think Dirk moves ahead of Ewing and Malone if he wins a ring. I do think he is already ahead of Barkley just by virtue of making this year’s finals; Barkley has lost to the eventual champ only once in the Conf. Finals or later. However, my eyes, heart, and those Right Guard commercials still say choose Barkley over Dirk.

Some Notables About The Table:

Jason Kidd is third on the list. I’m rooting the most for him to win this year; he is a winning player. Nobody was beating the Lakers and Spurs during the early 2000s – except for the Lakers and Spurs.

The top seven guys on this list kept running into dynasties; they all lost to the Lakers, Spurs, or Bulls at least once. Malone, Ewing, Kidd, and Stockton really got hit hard against those dynasties; each of these guys would have a ring if not for Jordan, Shaq/Kobe, or Duncan.

Jordan’s Bulls almost single-handedly prevented 3 all-time greats from winning a ring; that’s greatness.

The guys in this year’s finals (Kidd, James, Nowitzki, and Bosh) each get 4 points for this year. Obviously 2 of these 4 guys won’t be on this list anymore.

Of the top 10, three players really squandered golden opportunities: Malone, Ewing, and Nowitzki lost to teams they probably should have beaten, the 2004 Pistons, the 1994 Rockets, and the 2006 Miami Heat, respectively. In this just-ended 20-year era of dynasties, a player has to take advantage of every opportunity.

Only 38 players met the criteria to make this list; the table lists the top 20 players. Four of the remaining 18 players have no “consolation points”: Walt Bellamy, I don’t know enough about him to form an opinion; Gilbert Arenas, not a winner; Tracy McGrady, not a winner; and Chris Paul, it’s too early in his career.

Dominique Wilkins, Vince Carter, and Tim Hardaway have fewer than 4 “consolation points”. I was surprised by Tim Hardaway; I saw him as a winning player, but his Miami days were only a small portion of his career. Vince Carter has never really been thought of as a winning player, so no surprise there. Dominique was never seen as a winning player either, and he had many chances to play the NBA champ; the Sixers, Pistons, and Celtics won 5 championships in seasons beginning in the 80s. I have to cut him some slack though; the East was tough in the 80s. In addition to those 5 championship teams, he had to deal with the early to mid 80s’ Bucks and the mid to late 80s’ Bulls.
There's A Stat For That

Monday, May 23, 2011

Leadoff Rankings: Week 8

Previous Rankings: Week 1 | 2 | 3 | 4 | 5 | 6 | 7

The eighth weekend of the 2011 season has come and gone, and we're changing up the way we do our rankings. For the first time, our Leadoff Rankings have been determined not by my all-knowing power, but by the use of a formula. Similar to what we did on Thursday with our Rotation Rankings, we set all the statistics relative to league average, and then weighted each statistic to mean more or less than others (OBP means more than slugging percentage, for example). This gives us an overall average of 1.000, with lower scores correlating to above-average performance. More a fan of pitching? Check out our Rotation Rankings, published every Thursday, where we rank each team by the performance of their starting pitchers. The rankings are based on season performance (90%), with a small bias towards recent performance. The stats come from each team's first and second batters every game, regardless of the name on the back of the jersey. To see how things turned out this week, hit the jump!

Stat Line of the Day: May 22nd

Oklahoma City Thunder: 1-17 3PT, 32-36 FT vs. Mavericks

Harden (L) and Durant shot a combined 0-12 from downtown in Game Three

In a close Game Three loss to Dallas, the Thunder's shooters were something of a mixed bag. OKC as a team shot 36.5% from the field--not great, but far from Butler-in-the-title-game-esque. The really striking stat from Scott Brooks' squad was their three-point shooting. The Thunder shot only 1-17 from downtown, including an 0-8 from Kevin Durant and 0-4 from James Harden. In fact, until Russell Westbrook hit the team's lone three-pointer with 35 seconds remaining, the Thunder were about to set the record for postseason three-point futility. If the Thunder had ended the game 0-16 from downtown, they would have set an all-time playoff record for the most three-point attempts without a make. Instead, they shot 5.9% from downtown--just a horrifying mark. Granted, the Thunder were a mediocre three-point shooting team during the regular season: 19th in the NBA at 34.7%. The Mavericks were also above-average at defending the three-ball this year: 7th in the NBA at 34.3% against. However, 1-17 is just an impressive level of futility--one that made the difference in a close game.

Not to be overly negative with this post, though, I want to highlight how impressive the Thunder's free throw shooting was last night. OKC was the NBA's best free-throw shooting team during the regular season (82.3%), and they were second in the league in free throws attempted (29.3/game). Last night, the Thunder shot 32-36 (88.9%) from the line, including 10-12 during a furious fourth quarter comeback. Though the Thunder fell short, the free throw shooting of their big stars got them back into a game that had looked lost when they were down 27-12 after the first quarter. Durant shot 10-11 and Westbrook shot 13-14: 92% combined. Though they were only 1-10 from three-point range, 23-25 from the free throw line for Westbrook and Durant should at least give Thunder fans a little hope going into Game Four.
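For readers who like to check the math, the shooting percentages quoted in this post can be reproduced with a couple of lines of Python. All the makes and attempts come from the text above; the helper name is just for illustration.

```python
def pct(made, attempted):
    """Shooting percentage, rounded to one decimal place."""
    return round(100.0 * made / attempted, 1)

team_three_pt = pct(1, 17)             # Thunder threes in Game Three
team_free_throws = pct(32, 36)         # Thunder free throws in Game Three
stars_free_throws = pct(10 + 13, 11 + 14)  # Durant (10-11) plus Westbrook (13-14)

print(team_three_pt, team_free_throws, stars_free_throws)
```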

Friday, May 20, 2011

Can the Thunder Win without Durant?

Durant didn't score 40 in Game Two, but OKC got the win

In dramatic, bounce-back, season-saving fashion, the Oklahoma City Thunder pulled out a win last night over the red-hot Dallas Mavericks at the American Airlines Center in Dallas. It was the first time all postseason that the Mavericks lost a game at home, and it tied the Western Conference Finals up at one game apiece. A huge reason that the Thunder were able to win Game Two was that they limited Dirk Nowitzki to 19 fewer points and 14 fewer free throw attempts than in Game One. By keeping the Mavs' best player from dominating the game, Scott Brooks' crew forced the rest of the Dallas team to hit shots--and they didn't. However, the Thunder did not just perform differently on the defensive end in Game Two--their offense was completely changed as well. Despite scoring only six fewer points in Game Two than in Game One, the Thunder got 28 more points (50 instead of 22) from their bench. They also got 16 fewer points from NBA scoring champ Kevin Durant (see above) and 11 more points from the team's third-highest scorer, James Harden. This got me thinking...do the Thunder really play better when Durant scores fewer points? Any team obviously wants its star players to score as many points as possible, but do the Thunder really succeed when Durant is such a huge part of the offense? Or do they need contributions from Harden and others to succeed? Well, we ran the stats, so hit the jump to check out the results.

Phil Jackson: Really Great or Really Lucky?

Legendary coach Vince Lombardi once stated that “Winning isn’t everything, it’s the only thing.” Although Charlie Sheen would agree and most sports are predicated on this adage, it appears that in the art of coaching even those with the most impressive of track records may be met with forceful skepticism that their merits do not warrant an anointment of greatness. Phil Jackson, the “Zen Master” himself, has developed an impeccable resume of winning throughout his twenty years as a head coach for the Chicago Bulls and the Los Angeles Lakers. With eleven championships in those two decades, Jackson has won an NBA title in an incredible fifty-five percent of the seasons in which he has been a head coach. To put this in perspective, when coaching an NBA team, Phil Jackson is roughly 10 percent more likely to win a championship that year than Ben Wallace is to make a free throw. Despite this unparalleled success, Jackson (right) has not gone without his fair share of criticism. Blessed with the commonly accepted (although Bill Russell should be) greatest player of all time for his first six-pack of championships, Jackson followed up this first act by fortuitously stumbling upon two arguably top ten players of all time, enabling him to win five more rings and reach his current, dauntingly impressive total of eleven championship rings. There is no doubt that this string of luck is quite possibly the greatest in all of the history of coaching. But how great of a coach was Jackson? Hit the jump to take a look.

The NBA Playoffs: Does Defense Really Improve? (Part 3 of 3)

(This is the final part of a three-part series on defense in the NBA Playoffs)

Part One: The 1980s
Part Two: The 1990s

We often hear from NBA analysts that defense improves, or at least intensifies, between Game 82 of the regular season and Game 1 of the playoffs. It seems intuitive enough: the playoffs start and there’s a lot to play for, including pride, fame, and winning the ‘ship. All that tends to lead to more aggressive play. The LeBron Jameses drive harder to the basket while the Andrew Bynums go up for the block harder. It's human nature-- the more that's on the line, the harder they play. But does that aggression lead to better defense during the playoffs? Play-by-play guys and experts like to say it does. Coaches and players like to say it does. But do the stats agree? They didn't for the 1980s...but now we're looking at the 90s. Let's take a look.

Stat Line of the Day: May 20th

J. Giambi (COL): 3-5, 3 R, 7 RBI, 3 HR vs. Phillies

Giambi went yard three times in Philadelphia

When Jason Giambi won his only MVP, at the age of 29, he hit 43 home runs and drove in 137 RBIs. That was eleven seasons ago. Now 40, the veteran is still playing, though he hasn't had more than 400 at-bats in a season since he was 37, back in 2007 with the New York Yankees. In fact, coming into last night, Giambi (left) hadn't even gotten a hit since April 10th, a stretch of 11 personal games (but 34 Rockies games). Then, for some reason, he remembered his MVP ways for one game in Philadelphia. Giambi hit a home run in each of his first three at-bats. The first one, against Kyle Kendrick, came in the first inning with two men on base. The next long ball, also off Kendrick, was a two-run shot in the third inning, scoring Carlos Gonzalez. The final long ball came off Danys Baez in the fifth inning, with Troy Tulowitzki on base. Coming into the game, Giambi had just one home run and four RBIs on the season. The three home runs alone added 40.7% to the Rockies' win probability, the 79th time this season that a batter added more than 40% to his team's win probability.

Thursday, May 19, 2011

The Great Debate: 5/19

Welcome to the May 19th edition of the Great Debate. Today, Andrew Leff and Jake Adams discuss some of the most interesting sports topics of the day. Today, we talk about the never-ending NFL lockout, the recent Heat-Bulls contest, a frustrating start from the NL East favorites, and a surprising comeback in the NL Central. Hit the jump for the discussion!