Throughout the past week, Corey Dawkins and Marc Normandin have been using the Comprehensive Health Index [of] Pitchers [and] Players [with] Evaluative Result, otherwise known as CHIPPER, to break down the expected health of teams. If you've missed any installments and want to page back through them, you can find them all on the Team Injury Projection homepage. We've heard your questions about what CHIPPER means, where the projections came from, and how they differ from what others provide, and it's time for us to answer them.
Let's take the last one first. What makes CHIPPER different from the other injury projections out there? First off, our injury database contains not only major league injuries from the past eight years, but also minor league, spring training, and winter league data. We're even starting to collect injury data from colleges. All told, we have over 400,000 player days missed to injury in the database. Secondly, CHIPPER does more than just project whether a player is going to miss time: it also tries to provide a ballpark figure for how much time a player is going to miss.
So how does it work? Let's take a look at the Boston Red Sox Team Injury Projection, which is free for all readers, and we'll cover the details.
BOSTON RED SOX
Dashboard
2010 Recap: Third in AL East, 71 entries, 21 DL trips

Season | TDL (rank) | DMPI (rank)
2010 | 1349 (27th) | 19 (12th)
2009 | 1073 (19th) | 17 (10th)
2008 | 939 (13th) | 14 (7th)
2007 | 884 (10th) | 18 (4th)
The Dashboard gives you some context about the team's injury situation in aggregate. In the 2010 Recap section, you'll find that Boston had 71 entries in the CHIPPER database for 2010, covering all of the disclosed injury incidents for which we have a record. Of those entries, 21 were DL trips, or stays on the Disabled List. Boston had 1349 TDL, or Total Days Lost to Injury, and 19 DMPI, or Days Missed Per Injury, during the 2010 season. You can see Boston's historical TDL and DMPI across the rest of the dashboard; the numbers below the graph labels display the team's ranking league-wide, and the graphs are color-coded to reflect the team's injury performance relative to its competitors.
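The article doesn't give an explicit formula for DMPI, but Boston's 2010 numbers line up with total days lost divided by the number of database entries: 1349 / 71 = 19 exactly, whereas dividing by the 21 DL trips would give roughly 64. A minimal sketch, assuming that definition (the function name is ours, not part of CHIPPER):

```python
def days_missed_per_injury(total_days_lost: int, entries: int) -> float:
    """DMPI, assumed here to be TDL divided by the number of injury entries."""
    if entries <= 0:
        raise ValueError("need at least one injury entry")
    return total_days_lost / entries

# Boston, 2010: 1349 TDL across 71 entries
print(days_missed_per_injury(1349, 71))  # 19.0
```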
Hitters in approximate Depth Charts order at time of publication

Player | Age | Days Lost 2008 | Days Lost 2009 | Days Lost 2010 | 2011 Risk 1-day | 2011 Risk 15-days | 2011 Risk 30-days
Dustin Pedroia | 27 | 0 | 4 | 99 | | |
Carl Crawford | 29 | 50 | 4 | 9 | | |
Darnell McDonald | 32 | 0 | 0 | 1 | | |
CHIPPER's goal in life is to predict the chance that a player is going to miss time in 2011. It considers the likelihood of a player missing one or more games to injury, more than 15 games to injury, and more than 30 games to injury, and rates that risk as either green, yellow or red. We've represented these on the player lines with a color scheme you're used to but symbols you aren't.
Green: ~15 percent or lower chance; Yellow: ~15-85 percent chance; Red: ~85 percent or higher chance of the player missing this many games to injury.
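The color bands amount to a cutoff rule on the model's probability. A minimal sketch, assuming the approximate 15 and 85 percent cutoffs are applied as hard boundaries (the function name and the example probabilities are hypothetical):

```python
def risk_color(p: float) -> str:
    """Map a probability of reaching a days-lost threshold to a color band."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p <= 0.15:
        return "green"
    if p < 0.85:
        return "yellow"
    return "red"

# Hypothetical probabilities for one player at the three thresholds
probs = {"1-day": 0.90, "15-days": 0.40, "30-days": 0.10}
for threshold, p in probs.items():
    print(threshold, risk_color(p))  # red, yellow, green in turn
```

Because each threshold is stricter than the last, a player's probabilities should never rise from left to right, so his colors can only stay the same or cool off across the three columns.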
It's important to note that we're considering only games lost to a disclosed injury, not simply days off. While almost no one plays 162 games anymore, many players don't actually have any injuries reported in a given season. The database behind CHIPPER tracks injury reports beyond just DL visits, but there has to be a reported injury; we're not tracking or reporting routine days off.
Among the sample hitters above, Dustin Pedroia is the best bet to miss a significant amount of time, coming off his injury troubles last year; Carl Crawford is very likely to miss a few games; and Darnell McDonald's fairly unscathed past and positive markers give him a lower-risk profile.
Pitchers in approximate Depth Charts order at time of publication

Player | Age | Days Lost 2008 | Days Lost 2009 | Days Lost 2010 | 2011 Risk 1-day | 2011 Risk 15-days | 2011 Risk 30-days
Jon Lester | 27 | 0 | 0 | 0 | | |
John Lackey | 32 | 53 | 50 | 0 | | |
Tim Wakefield | 44 | 19 | 62 | 0 | | |
In the sample of pitchers above, Jon Lester profiles as one of the least risky aces in the majors; John Lackey was durable last year but missed considerable playing time in 2008 and 2009 and remains a risk; and Tim Wakefield's age and significant time lost in 2009 make him very likely to spend an extended period in the trainer's room in 2011.
CHIPPER uses logistic regression to estimate whether a player is going to miss time at each of the thresholds we've set. For position players, we consider age, position, time lost to injury during the previous three seasons, and proxy variables to represent player type. For pitchers, the variables include age and time lost to injury during the previous three seasons. Surprisingly, including workload didn't make much of a difference in the results.
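To make the approach concrete, here is a small logistic regression fit by batch gradient descent on the log-loss. It is a sketch, not CHIPPER's code: the feature scaling, the `features` helper, the training rows, and their outcomes are all invented for illustration, and CHIPPER's real inputs and coefficients aren't published here. Per the description above, you would fit one such model per threshold (1, 15, and 30 days); the sketch fits a single 30-day model for pitchers from age and the previous three seasons of days lost.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def features(age: int, days_lost: Sequence[int]) -> List[float]:
    """Hypothetical scaling: center age at 30, express days lost in units of 100."""
    return [(age - 30) / 5.0] + [d / 100.0 for d in days_lost]

def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """P(player reaches the threshold) = sigmoid(w . x + b)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 5000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(linear score)
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Invented training rows: (age, days lost in each of the three prior seasons)
rows = [(25, [0, 0, 0]), (26, [0, 10, 0]), (28, [5, 0, 0]), (30, [0, 0, 20]),
        (31, [60, 40, 0]), (33, [0, 90, 30]), (35, [45, 0, 70]), (38, [80, 60, 50])]
missed_30_plus = [0, 0, 0, 0, 1, 1, 1, 1]  # invented outcomes for the following season

X = [features(age, days) for age, days in rows]
w, b = fit_logistic(X, missed_30_plus)

low = predict_proba(w, b, features(26, [0, 0, 0]))
high = predict_proba(w, b, features(36, [70, 50, 40]))
print(f"young, healthy: {low:.2f}   older, injury history: {high:.2f}")
```

Because the model returns a probability rather than a yes/no answer, its output maps directly onto the green/yellow/red bands.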
You're probably wondering what this means to you as a fan or a fantasy owner. If we say a player is a high risk, should you expect him to head to the DL? Well, yes and no. I expect at least 70 percent of the players we indicate as high-risk to hit their injury threshold, but I can't tell you which ones. If I could, I'd be making a killing in Vegas and not sharing my predictions with you. Since you're not going to get a firmer guarantee out of me, let's move on to what's still to come with CHIPPER this season.
The Team Injury Projections we've run to this point are based only on major-league injury history and contain only the players who saw time in the bigs last season. That should change this week, as we add the minor-league data from our injury database. We'll be sure to let you know when things are updated, and the Player Forecast Manager will always contain the most up-to-date projections. We're still just getting to know this data; looking further ahead, we're working on improving the specificity of our injury projections enough to use them as an input to our PECOTA projection system. We'll also be rolling out this information in team reports and player cards.
We're also planning to add more data about previous injuries to the mix. Someone who's had hamstring issues is probably more likely to suffer a recurrence than someone who broke a finger. The difficulties here are small sample sizes and proper categorization of injuries. That's why we hired an athletic trainer with extensive medical training and experience. Beyond that, we'll continue to refine the model where we see opportunities for improvement, and we hope to introduce additional tools to help you understand, measure, and react to injuries as they occur.
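One way to picture the categorization problem is as a lookup from detailed diagnoses to broader groups: pooling pushes small counts into buckets large enough to model, and it yields a "prior injury in this group" flag that could feed a regression like CHIPPER's. A minimal sketch; the category names, the mapping, and the sample records are all hypothetical, and a real scheme would need genuine medical judgment (for example, knowing which arm is the throwing arm).

```python
from collections import Counter
from typing import Iterable

# Hypothetical grouping of detailed diagnoses into broader categories
CATEGORY = {
    "hamstring strain": "leg muscle/tendon",
    "groin strain": "leg muscle/tendon",
    "quadriceps strain": "leg muscle/tendon",
    "elbow tendinitis": "arm muscle/tendon",
    "rotator cuff tendinitis": "arm muscle/tendon",
    "forearm strain": "arm muscle/tendon",
    "finger fracture": "fracture",
    "wrist fracture": "fracture",
}

def categorize(diagnosis: str) -> str:
    """Return the broad category for a diagnosis, or 'other' if it is unmapped."""
    return CATEGORY.get(diagnosis.strip().lower(), "other")

def prior_injury_flag(history: Iterable[str], category: str) -> int:
    """1 if any earlier diagnosis falls in the category, else 0 (a candidate model input)."""
    return int(any(categorize(d) == category for d in history))

# Invented sample of injury records
records = ["Hamstring strain", "groin strain", "hamstring strain", "forearm strain",
           "elbow tendinitis", "finger fracture", "concussion"]

by_diagnosis = Counter(r.strip().lower() for r in records)  # many tiny buckets
by_category = Counter(categorize(r) for r in records)       # fewer, larger buckets
print(by_diagnosis.most_common())
print(by_category.most_common())
print(prior_injury_flag(["hamstring strain"], "leg muscle/tendon"))  # 1
print(prior_injury_flag(["finger fracture"], "leg muscle/tendon"))   # 0
```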
Our goal is to give you the best team injury reports in the business, backed by real-world injury experience, expertise in data analysis, and the only verifiable data set of its kind in the field. Data-driven injury analysis is a relatively untapped area, and there's plenty left to explore. If you have any ideas or suggestions, please let us know.
This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.
Did you do any testing on the system? Input data through 2009 and see if it correctly predicted 2010?
It's that staff writers will *publicly* comment on and critique other staff writers' work.
It shows this is not a rubber-stamp shop.
Thanks guys!
And that sort of leads to my second question, which is how the system performs on test data. I assume you did assessments based on cross-validation or data from previous years, and I think it would be informative for you to share how well it did. For example, if you ran the system for the 2010 season (obviously using data from previous years), how accurate would it be?
The total number of expected injuries and actual injuries match up quite well, but we do see some discrepancies at the extremes, as I mentioned above.
Basically, for pitchers at 30+ games (the worst estimate), we're over-estimating the red risk by about 40%.
For position players at 1+ games (the best estimate), we're over-estimating the red risk by about 15%.
I'll try to run the model against some earlier seasons later this week if I get a chance, but that's behind adding in minor league injuries on my to-do list.
I assume the latter but it seemed ambiguous.
It's closer to the second.
Let's say we have a hypothetical situation where we have 100 green, 100 yellow, and 100 red players. We'd estimate probably 7 green, 50 yellow and 95 red to get hurt. The actual results are more likely something like 30 green, 50 yellow and 66 red.
Is that more clear?
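The backtest discussed in this thread boils down to comparing, within each color band, the number of injuries the model expects (the sum of its predicted probabilities) with the number that actually happened. A minimal sketch, assuming "over-estimating by X%" means (expected - actual) / actual, which the thread doesn't define. The totals plugged in are the hypothetical 100/100/100 example from the reply above, not real CHIPPER results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def calibration_by_band(players: Iterable[Tuple[str, float, bool]]) -> Dict[str, Tuple[float, int]]:
    """Per color band: (expected injuries = sum of predicted probabilities, actual injuries)."""
    expected: Dict[str, float] = defaultdict(float)
    actual: Dict[str, int] = defaultdict(int)
    for band, prob, got_hurt in players:
        expected[band] += prob
        actual[band] += int(got_hurt)
    return {band: (expected[band], actual[band]) for band in expected}

def relative_error_pct(expected: float, actual: int) -> float:
    """Positive = over-estimate, negative = under-estimate, relative to the actual count."""
    if actual == 0:
        raise ValueError("relative error is undefined when nothing actually happened")
    return 100.0 * (expected - actual) / actual

# Hypothetical 100/100/100 example: (expected, actual) injuries per band
example = {"green": (7, 30), "yellow": (50, 50), "red": (95, 66)}
for band, (exp_n, act_n) in example.items():
    print(f"{band}: expected {exp_n}, actual {act_n}, {relative_error_pct(exp_n, act_n):+.0f}%")
# green -77%, yellow +0%, red +44%
```

In a real backtest, the `(band, probability, got_hurt)` tuples would come from a model fit on seasons through 2009 and scored against 2010 outcomes, as suggested earlier in the thread.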
If we look very specifically at injury details - bucketing every strained groin together, for example, then our sample size for each category becomes really small.
The right answer I think is to more broadly categorize injuries - maybe muscle and tendon problems in the throwing arm, for example. That's going to take time, and a lot more medical knowledge than I have. But I do have help (http://www.baseballprospectus.com/article.php?articleid=13009).
The data that I've gathered is very detailed and has a separate field that describes the injury in more detail than simply "shoulder strain." As much information as is known goes straight into the database.
So, for instance, if an older player had the open procedure (it would have to be a significantly older player), the entry would state this, whereas a younger player's procedure would be recorded as arthroscopy. This is the case for every injury I encounter.
Thanks for the feedback.
"We've heard your questions about what chipper means, where the projections came from, and HOW THEY DIFFER FROM WHAT OTHERS PROVIDE, and it's time for us to answer them."
How does CHIPPER differ from last year's system ... other systems? (NOMAR?)
I will leave the other parts of the answer to the smart guys who can explain the mathematical side of things much better than I can.
Keep up the good work -- Very useful information.
http://baseballanalysts.com/archives/2010/02/verducci_effect.php