January 12, 2006
More Hilbert Questions
At the beginning of a new year, it seems commonly accepted to start thinking about things anew. Personally, I've yet to start going back to the gym, but that doesn't mean that I don't waste a lot of time thinking about unanswered baseball questions that we could potentially quantify. Many times--usually when I'm struggling for column topics--I wonder if we're beginning to run up against the limits of baseball analysis. After all, how many more ways can we quantify player performance?
Whenever I hit that wall, I pick up my copy of Baseball Prospectus 2000. In that edition of BP (and reproduced here in 2004), Keith Woolner outlined some of the key questions left unanswered with regard to baseball research. Some are questions of amazing complexity, such as #13: Assessing the "coachability" of players and #19: Optimizing the competitive ecology of the game. Others have been researched since the article's original publication: catcher defense, the relationship between wins and dollars, the value of draft picks, and game-theoretic frameworks. But there are still many great questions left unanswered.
I have a few more logs to throw on the fire. This is by no means an exhaustive list, and I'm happy to take suggestions for additional seminal questions that need answering. But this is a start. Keith listed numbers 1-23; I'll pick up with number 24.
24) Determine the correct methodology for removing the effects of the ballpark on player performance.
Park factors have been around for a long time and they're finally making a dent in mainstream thinking, thanks to a big push from Coors Field. However, while comparing home and road statistics to get a general idea of the way a park plays in relation to the other parks in the league is a good start, there are vast areas for improvement in the way that players are analyzed given their environment. Tom Tippett presented some excellent work on this problem at the SABR convention a few years ago, explaining why applying a single percentage adjustment overlooks the fact that parks affect different players in different ways.
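For readers who haven't worked through the home/road comparison themselves, here is a minimal sketch of the basic version. The class and function names are hypothetical and the run totals are invented for illustration; real park factors typically add regression, multiple seasons of data and corrections for unbalanced schedules, none of which appear here.

```python
from dataclasses import dataclass


@dataclass
class TeamSplits:
    """One team's season totals, split by home and road (hypothetical container)."""
    home_games: int
    home_runs_scored: int
    home_runs_allowed: int
    road_games: int
    road_runs_scored: int
    road_runs_allowed: int


def basic_park_factor(splits: TeamSplits) -> float:
    """Ratio of total runs per game at home to total runs per game on the road.

    Runs by both teams are counted so that the quality of the home club's
    offense and pitching roughly cancels out. A value above 1.0 suggests a
    park that inflates scoring; below 1.0 suggests one that suppresses it.
    """
    home_rate = (splits.home_runs_scored + splits.home_runs_allowed) / splits.home_games
    road_rate = (splits.road_runs_scored + splits.road_runs_allowed) / splits.road_games
    return home_rate / road_rate


# Invented totals for an extreme hitter's park, not real data.
example = TeamSplits(
    home_games=81, home_runs_scored=500, home_runs_allowed=472,
    road_games=81, road_runs_scored=380, road_runs_allowed=430,
)
park_factor = basic_park_factor(example)
print(f"{park_factor:.2f}")  # 1.20
```

Note that this produces exactly the kind of single percentage criticized above: it treats every hitter and pitcher as if the park affected him the same way.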
Dayn Perry reminded us that failing to break down park factors by handedness removes key information from the analysis. The Rangers are commissioning a study on wind patterns in their ballpark, understanding that while the fences are the aspect of the park that we focus on the most, there are a wide variety of other factors that influence the game. We don't know for certain if players begin to tailor their games to their particular home park, or how that affects players stuck in Coors Field, Colorado Springs, Asheville, New Orleans or any number of other extreme environments.
Most importantly, there is little vetting of results to determine whether a given methodology is correct. How do we determine how a player would have done in a neutral park in a given season if we cannot remove him from all those factors that influence his particular game? Does comparing the performance of players who changed parks from one year to the next reveal how accurate a park factor is? How much should we rely on previous seasons of data when parks change both in physical appearance and weather conditions every year? This problem may be one that cannot be answered with our current data; we may have to wait until more accurate ball-trajectory and weather information becomes available before a suitable solution can be identified.
25) Assessment of ideal development decisions for prospects.
As players move up through the minor leagues, their advancement or demotion can be based on any number of factors: actual performance, mastery of a required skill, or roster need, to name a few. However, the impact of promoting a player "too soon" or demoting him has not been thoroughly explored. Recently, Nate Silver noticed that PECOTA projections change dramatically given the player's level of play, even when accounting for translations between leagues. There was a long discussion about the possible reasons behind this fact, but it highlights how little we know about the effect that level of play, independent of any difference in performance, has on a player's future performance.
Can we determine if a player will suffer long-term damage to his development if he struggles at a high level when he could have dominated at a lower one? How much do players focus on their performance if their development path is based on other factors such as mastering a new pitch or learning to hit a specific way or to control the strike zone? Is the act of keeping players in the minors longer than necessary to keep their service clocks stopped actually depriving them of needed development that will cost the team value down the road?
26) Quantifying the impact of injuries and the medical staff.
Tom Gorman, Will Carroll, Mike Groopman and others are doing some groundbreaking work in this area, but injuries and their impact are still remarkably understudied. How do we determine what is a preventable injury and what is not? Can teams predict how many games a given player will likely lose to injury? How much difference does the medical staff make in player recovery and injury rates? What impact on player performance do injuries that don't require a trip to the disabled list or even missed games have?
Like the discussion of park factors, this question is more dependent on gathering the right information than anything else. Teams have real incentives to keep medical information close to the vest, and acting on incorrect data is a fault that has haunted every decision-maker since the beginning of time. If accurate information can be found and codified, we can begin to answer some of these questions. Until then, we can only speculate on what's going on in Curt Schilling's ankle and Gary Sheffield's shoulder and how those things affect their performance.
27) Valuation of player contracts, particularly options and other restrictive clauses.
Quite a few players have signed interesting contracts in the past few years, contracts that include features like vesting options, multiple option years, and all sorts of no-trade clauses. Can we put a dollar figure on those options? How does a vesting option based on playing time change a player's approach to the game and the trainer's room? How do we keep teams from benching players to prevent them from reaching those milestones in their contracts?
28) Quantification of the value of front office personnel.
One of the original Hilbert questions discussed identifying and quantifying good coaching; another discussed the impact of the manager on winning games. These questions can also be posed about front-office personnel. While the analysis of front-office performance is largely a subjective effort for the time being, there may be a way to begin to place a dollar value on decision-makers. How much is Billy Beane worth to the A's? What about John Schuerholz and the Braves? How much of a difference do they make? How much of it is actually the coaches they hired and the players they acquired? Are there people who are distinctly better at the arbitration process than others? How much do they save a team?
29) Team- and situation-specific player valuation.
As noted by many people in the performance analysis community, the process of assigning a single-number value to players has been virtually exhausted. People still sometimes fight over that last little bit of accuracy, but for the most part, whether it's MLV, EqR, VORP, WARP or any of the multitude of metrics publicly available, virtually all of them are going to value most players very similarly. That said, each player brings something different to the game, and matching that skill set to usage is something that teams often fail at and performance analysts sometimes overlook.
Because all players are paid in dollars and are up for the same awards, it's very handy to have a single number that contains their total value. But when constructing a roster properly--question #21--being able to deploy players of equal value in different situations increases the overall value of the team.
Additionally, most metrics involving player valuation use average run values of different events or the change in run scoring by putting a player in an otherwise average lineup. However, inserting different players into different lineups may generate vastly different run outputs. How much does a speedster add to the plodding Boston Red Sox of 2003-04 (other than Dave Roberts and his famous steal)? Would the same player be worth more or less to a team like the 2005 White Sox or 1985 Cardinals? We have the data to determine if the Red Sox would be getting diminishing returns by signing another high-OBP, low-AVG player in 2003, and how that same player would add more to a different team. While it's easy and often highly informative to slap a single number on a player and call it his value, it's possible to quantify the value a player has specifically to his team.
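To make the contrast concrete, here is a rough sketch of the "average run values" approach next to a team-specific one. Every number below is made up for illustration: the run values are only in the general neighborhood of published linear weights, the team-specific table is purely hypothetical, and the function name is my own. Deriving real team-specific values would take a lineup simulation or a run expectancy model, which this does not attempt.

```python
# Illustrative run values per event for an average lineup (assumed, not official).
AVERAGE_RUN_VALUES = {
    "1B": 0.47, "2B": 0.77, "3B": 1.04, "HR": 1.40, "BB": 0.31, "OUT": -0.27,
}

# Hypothetical values for a lineup where getting on base is worth a bit more,
# for example because more power hits behind this player. Only the walk
# value differs from the average table, to keep the comparison simple.
TEAM_SPECIFIC_RUN_VALUES = {**AVERAGE_RUN_VALUES, "BB": 0.36}


def runs_above_average(event_counts: dict, run_values: dict) -> float:
    """Sum each event count times its run value (the linear-weights idea)."""
    return sum(run_values[event] * count for event, count in event_counts.items())


# An invented season line for a patient hitter.
season_line = {"1B": 100, "2B": 30, "3B": 5, "HR": 20, "BB": 60, "OUT": 400}

context_neutral = runs_above_average(season_line, AVERAGE_RUN_VALUES)
team_specific = runs_above_average(season_line, TEAM_SPECIFIC_RUN_VALUES)

print(f"context-neutral: {context_neutral:.1f}")
print(f"team-specific:   {team_specific:.1f}")
print(f"difference:      {team_specific - context_neutral:.1f}")
```

The same stat line produces two different values depending on which table is used, and the open question is how to build the second table properly for a given roster.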
This list could go on for quite some time, but considering the rate at which we've been answering the original Hilbert questions posed by Keith, it's probably best to stop here, lest we head into the 22nd century. It's a new year; let's see what other questions we can answer.