Glossary: PECOTA
The 10-Year Forecast is a player's weighted mean PECOTA forecast, taken over his next 10 seasons.
The process for generating a player's weighted mean line for a season some number of years into the future (e.g. 2014) is fundamentally identical to generating his forecast for the season immediately upcoming (e.g. 2012). The exception is that some players may have dropped out of the comparables database, in which case their performance cannot be considered. (See also Jeremy Giambi Effect.)
Note that the 10-Year Forecast assumes that a player's team context remains the same for all years of the forecast.
The percent chance that a player's playing opportunities will decrease by at least 50 percent relative to his baseline playing time forecast. For hitters, playing opportunities are measured by plate appearances and for pitchers, they are measured by opposing batters faced.
Although it is generally a good indicator of the risk of injury, Attrition Rate will also capture seasons in which playing time decreases due to poor performance or managerial decisions.
The percent chance that a player's production (measured by RA for pitchers, and True Average for hitters) will improve by at least 20 percent relative to the weighted average of his performance over his most recent seasons.
Every player of the same age in our database (historically) is assigned a similarity score to the player in question. For each such similar player, a "baseline" value is projected for runs allowed based on past performance and standard aging curves. The weighted average of "1" or "0" for each of these similar players is taken, where "1" is used each time a comp player accomplished the goal in question: improving his run prevention or True Average by 20 percent.
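The weighted average of "1" or "0" flags described above can be sketched as follows. This is only an illustration under stated assumptions: the tuple layout (similarity weight, baseline, actual), the function name, and the sample numbers are hypothetical, and reading a 20 percent improvement in run prevention as a 20 percent lower runs-allowed figure is likewise an assumption rather than PECOTA's documented rule.

```python
def weighted_event_rate(comps, threshold=0.20, higher_is_better=True):
    """Similarity-weighted share of comps that beat their baseline by `threshold`.

    comps: iterable of (similarity_weight, baseline, actual) tuples.
    The field layout is an illustrative assumption, not PECOTA's schema.
    """
    total_weight = 0.0
    event_weight = 0.0
    for weight, baseline, actual in comps:
        if higher_is_better:
            # e.g. True Average: improvement means a higher number
            achieved = actual >= baseline * (1 + threshold)
        else:
            # e.g. runs allowed: improvement means a lower number (assumed)
            achieved = actual <= baseline * (1 - threshold)
        total_weight += weight
        event_weight += weight * (1 if achieved else 0)  # the "1" or "0" flag
    if total_weight == 0:
        raise ValueError("comps must carry a positive total weight")
    return event_weight / total_weight


# Hypothetical comps for a hitter: (similarity weight, baseline TAv, actual TAv)
hitter_comps = [(3.0, 0.260, 0.320), (2.0, 0.260, 0.270), (1.0, 0.250, 0.310)]
breakout_rate = weighted_event_rate(hitter_comps)
print(f"{breakout_rate:.1%}")  # 66.7%
```

Two of the three comps cleared the 20 percent bar, but because they carry four of the six units of similarity weight, the rate comes out to two-thirds rather than a simple head count.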
Collapse Rate can sometimes be counterintuitive for players who have already experienced a radical change in their performance levels.
It's also important to note that established major leaguers are compared to other major leaguers only, while minor-league players may be compared to major-league or minor-league players, with PECOTA strongly preferring the latter. All comparables represent a snapshot of how the listed player was performing at the same age as the current player, so if a 23-year-old hitter is compared to Miguel Tejada, he's actually being compared to a 23-year-old Tejada, not the decrepit Giants version of Tejada, nor to Tejada's career as a whole.
Diagnostics are a series of metrics designed to estimate the probability of certain types of changes in production and playing time; see the individual entries for additional detail.
Fair Run Average differs from FIP in a few ways. While FIP is concerned only with what a pitcher is believed to control (typically strikeouts, walks, and home runs, though Prospectus includes hit batsmen in our FIP calculation), Fair Run Average takes things a step further. Pitchers receive credit for good sequencing, thus rewarding pitchers who seem to work out of jams more often than usual. Fair Run Average also considers batted ball distribution, base-out state, and team defensive quality (as measured by Fielding Runs Above Average).
Here is an example of the Fair Run Average spectrum based on the 2011 season:
Excellent - Clayton Kershaw 2.90
Great - Brandon McCarthy 3.42
Average - Ivan Nova 4.36
Poor - Brett Cecil 5.14
Horrendous - Jake Arrieta 5.88
A player who is expected to perform just the same as he has in the recent past will have an Improve Rate of 50 percent.
The Jeremy Giambi Effect is a name given to the correlation between playing time and quality of performance. The Jeremy Giambi Effect has important implications for understanding a player's PECOTA forecast.
Following are Giambi's plate appearances and OPS for each year of his major league career:
Year PA OPS
1998 70 .739
1999 336 .741
2000 302 .761
2001 443 .841
2002 397 .919
2003 156 .696
Note that the correlation between Giambi's PA and OPS is very strong (r=.72). He played more often when he played more effectively, and less so when he played less effectively. Eventually, his performance became so poor that he could no longer secure any major league playing time at all.
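The r=.72 figure can be checked directly from the table above. The sketch below computes a plain Pearson correlation over the six seasons listed; the helper name `pearson_r` is illustrative, and nothing here is taken from PECOTA's own code.

```python
from math import sqrt

# Jeremy Giambi's seasons, 1998-2003, as listed in the table
plate_appearances = [70, 336, 302, 443, 397, 156]
ops = [0.739, 0.741, 0.761, 0.841, 0.919, 0.696]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


giambi_r = pearson_r(plate_appearances, ops)
print(round(giambi_r, 2))  # 0.72
```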
MLB% is the percentage of the comps who played in MLB the following season.
PEAK refers to a series of metrics designed to evaluate a player's value in some statistic, most often WARP or nonnegative WARP (used by UPSIDE calculations), over a series of consecutive seasons. It has had two variations. The one currently in use for UPSIDE on the player cards is the five-year variant referenced by Nate Silver:
The version of Upside that we’re using here is the peak-adjusted variant, which measures a player’s most valuable five-year window up through and including his age 28 season (or simply his next five years of performance if he’s already age 25 or older).
- Nate Silver, 2007
The other variant, also used in some writings, simply uses the next six seasons of a player's career.
In both cases, seasons which have yet to be played are projected using PECOTA instead of ignored, so young players will have the full complement of five (or six) seasons of data. See also: UPSIDE.
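The five-year variant quoted above can be sketched as a sliding-window search. This is a minimal illustration, assuming the caller supplies one value per season (actual or PECOTA-projected) starting at the player's current age; the function name and the sample WARP figures are hypothetical.

```python
def peak_value(current_age, values_by_season, window=5, peak_age=28):
    """Best total over any `window` consecutive seasons ending by `peak_age`.

    values_by_season[i] is the value (e.g. WARP) for the season at age
    current_age + i. For a player already too old to fit a full window
    before `peak_age`, only the next `window` seasons are used.
    """
    # Latest age at which a full window can start and still end by peak_age
    last_start_age = max(current_age, peak_age - window + 1)
    best = None
    for start_age in range(current_age, last_start_age + 1):
        offset = start_age - current_age
        chunk = values_by_season[offset:offset + window]
        if len(chunk) < window:
            raise ValueError("not enough seasons supplied for a full window")
        total = sum(chunk)
        best = total if best is None else max(best, total)
    return best


# Hypothetical 22-year-old, values for ages 22 through 28
young_peak = peak_value(22, [1.0, 2.0, 3.0, 4.0, 4.5, 4.0, 3.5])
print(young_peak)  # 19.0 (ages 24-28)

# Hypothetical 27-year-old: simply the next five seasons
veteran_peak = peak_value(27, [3.0, 3.0, 2.0, 2.0, 1.0, 0.5])
print(veteran_peak)  # 11.0
```

The six-season variant mentioned above would correspond to summing the first six entries with no window search.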
Stands for Player Empirical Comparison and Optimization Test Algorithm. PECOTA is BP's proprietary system that projects player performance based on comparison with historical player-seasons. There are three elements to PECOTA:
1) Major-league equivalencies, to allow us to use minor-league stats to project how a player will perform in the majors;
2) Baseline forecasts, which use weighted averages and regression to the mean to produce an estimate of a player's true talent level;
3) A career-path adjustment, which incorporates information about how comparable players' stats changed over time.
Check out the PECOTA section of the glossary for more on the system's intricacies.
The Percentile Forecast is a representation of the player's expected performance in the upcoming season at various levels of probability.
For example, if a pitcher's 75th percentile ERA forecast is 3.50, this indicates that he has a 75 percent chance to post an ERA of 3.50 or higher, and a 25 percent chance to post an ERA lower than 3.50. Higher percentiles indicate more favorable outcomes.
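Because lower ERA is better, the direction of the percentile can be easy to misread. The sketch below illustrates the reading in the example above with a nearest-rank percentile over a set of possible ERA outcomes; the function name, the nearest-rank rule, and the four sample outcomes are all assumptions made for illustration, not a description of how PECOTA derives its percentiles.

```python
import math


def era_percentile_forecast(era_outcomes, percentile):
    """Nearest-rank ERA at which `percentile` percent of outcomes are as bad or worse.

    Outcomes are ordered worst (highest ERA) first, so higher percentiles
    map to lower, more favorable ERAs.
    """
    if not era_outcomes or not 0 < percentile <= 100:
        raise ValueError("need outcomes and a percentile in (0, 100]")
    worst_first = sorted(era_outcomes, reverse=True)
    rank = math.ceil(percentile / 100 * len(worst_first))
    return worst_first[rank - 1]


# Four equally likely hypothetical outcomes
sample_eras = [3.00, 3.50, 4.00, 4.50]
era_75th = era_percentile_forecast(sample_eras, 75)
print(era_75th)  # 3.5

# 75 percent of outcomes sit at 3.50 or higher, 25 percent below it
share_at_or_above = sum(era >= era_75th for era in sample_eras) / len(sample_eras)
print(share_at_or_above)  # 0.75
```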
The Percentile Forecast is calibrated off two key statistics: TAv for hitters, and ERA for pitchers (although the ERA is a component ERA, and thus will not reflect the variance of sequencing in a player's performance).
PECOTA runs a series of regressions within the set of comparable data in order to estimate how changes in peripheral statistics are related to changes in equivalent runs. For example, if it first estimates that Carl Crawford will produce a .290 TAv next year, it then tries to determine what home run total, walk total, and so on are most likely to be associated with a .290 TAv season.
PECOTA then iterates this result to ensure that the peripheral statistics 'add up' to the right calibrating statistic (TAv or ERA). It is important to note that the Percentile Forecast is designed to work around the calibrating statistic only.
A player's forecast is adjusted to the park and league context associated with the team listed at the top of the forecast page. Team-dependent stats like Wins, RBIs, and BABIP account for the projected performance level of a player's teammates.
PECOTA forecasts playing time (plate appearances) in addition to a player's rate statistics. These forecasts are based on a player's previous record of performance, and the comparable player data, and do not incorporate any additional information about managerial decisions.
True Average incorporates aspects that other linear-weights-based metrics ignore. Reaching base on an error and situational hitting are included; meanwhile, strikeouts and bunts are treated as slightly more and slightly less damaging outs than normal, respectively. The baseline for an average player is not meant to portray what a typical player has done, but rather what a typical player would do if given similar opportunities. That means adjustments are made for parks and league quality. True Average's adjustments go beyond applying a blanket modifier: players who play more home games than road games will see that reflected in their adjustments. Unlike its predecessor, Equivalent Average, True Average does not consider baserunning or basestealing.
Here is an example of the True Average spectrum based upon the 2009-2011 seasons:
Excellent - Miguel Cabrera .342
Great - Alex Rodriguez .300
Average - Austin Jackson .260
Poor - Ronny Cedeno .228
Horrendous - Brandon Wood .192

See: http://www.baseballprospectus.com/article.php?articleid=11717
The 0.9 multiplier (from the article) is no longer a fixed number, but a scale based on current-season run scoring. It's all the way up to almost 1.07 now, because run scoring is so much lower than when Colin wrote this (from the link above):
From 1993 to 2009, you can figure TAv simply as:
0.260 + (RAA/PA)*.9
Now, we will be tuning those values slightly to match the batting average for that season, but other than that, that's the formula for TAv we will be using once the new stat reports are rolled out.
[...]
All that matters essentially is the computation of the initial R/PA values. When people ask about wOBA, most of the time what they really care about is the values presented on Fangraphs, derived from this set of linear weights developed by Tom Tango.
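As a worked illustration of the quoted formula, TAv = 0.260 + (RAA/PA) * 0.9, the sketch below exposes the multiplier as a parameter so that the later, season-dependent scale (almost 1.07, per the note above) can be plugged in. The function name and the sample RAA and PA figures are hypothetical, and the season-by-season tuning of the baseline mentioned in the quote is not modeled.

```python
def true_average(raa, pa, scale=0.9, baseline=0.260):
    """TAv from runs above average (RAA) and plate appearances (PA).

    scale=0.9 is the 1993-2009 multiplier from the quoted article; the
    glossary note says the current value floats with run scoring.
    """
    if pa <= 0:
        raise ValueError("plate appearances must be positive")
    return baseline + (raa / pa) * scale


# Hypothetical hitter: 30 runs above average in 600 PA
tav_old_scale = true_average(30, 600)
tav_new_scale = true_average(30, 600, scale=1.07)
print(f"{tav_old_scale:.3f}")  # 0.305
print(f"{tav_new_scale:.4f}")  # 0.3135
```

An exactly average hitter (RAA of zero) lands on the .260 baseline regardless of the scale in use.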
UPSIDE is determined by evaluating the performance of a player's top-20 PECOTA comparables. If a comparable player turned in a performance better than league average, including both his batting and fielding performance, then his wins above average (WARP minus replacement value) are counted toward his UPSIDE. A base of two times wins above average is used for position players, and an adjustment is made to pitcher values such that they are comparable. If the player was worse than average in a given season, or he dropped out of the database, the performance is counted as zero.
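The counting rule above can be sketched for position players as follows. Several things here are assumptions made for illustration: the text does not say how contributions from the comparables are combined, so a plain sum is used; `None` stands in for a comp who dropped out of the database; the pitcher adjustment is not modeled; and all names and sample values are hypothetical.

```python
def upside_contribution(wins_above_average, multiplier=2.0):
    """One comp-season's contribution: 2x wins above average, floored at zero.

    None represents a comp who dropped out of the database (assumed encoding).
    """
    if wins_above_average is None or wins_above_average <= 0:
        return 0.0
    return multiplier * wins_above_average


def upside_total(comp_wins_above_average):
    """Sum of contributions across comps (summing is an assumption)."""
    return sum(upside_contribution(w) for w in comp_wins_above_average)


# Hypothetical comps: above average, below average, dropped out, well above average
comp_waa = [1.5, -0.5, None, 3.0]
upside = upside_total(comp_waa)
print(upside)  # 9.0
```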
It is far easier for a player like Ugueto to improve upon his production by 20 percent than it is for Alex Rodriguez; as a result, Ugueto's Breakout score is likely to be higher. This does not mean that Ugueto is a player you'd want anywhere near your roster.
Value Over Replacement Player. The number of runs contributed beyond what a replacement-level player at the same position would contribute if given the same percentage of team plate appearances. VORP scores do not consider the quality of a player's defense.
Here is an example of the Value Over Replacement Player spectrum based on the 2011 season:
Excellent - Matt Kemp 95.2
Great - Robinson Cano 51.4
Average - Eric Hosmer 19.9
Poor - Derrek Lee 3.2
Horrendous - Adam Dunn -22.6
VORP for position players consists of batting runs above average (BRAA), position adjustment (POS_ADJ), baserunning runs above average (BRR, which includes, but is not limited to, stolen bases and times caught stealing), and an adjustment for replacement level (REP_ADJ).
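Reading "consists of" as a straightforward sum, the four components combine as sketched below. The additive reading is an assumption, the function name is illustrative, and the sample run values are made up purely to show the arithmetic.

```python
def position_player_vorp(braa, pos_adj, brr, rep_adj):
    """Sum of the four components named above, all expressed in runs."""
    return braa + pos_adj + brr + rep_adj


# Hypothetical player: strong bat, penalized position, modest baserunning value
example_vorp = position_player_vorp(braa=25.0, pos_adj=-5.0, brr=2.5, rep_adj=20.0)
print(example_vorp)  # 42.5
```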
Perhaps no sabermetric theory is more abstract than that of the replacement-level player. Essentially, replacement-level players are of a caliber so low that they are always available in the minor leagues because the players are well below major-league average. Prospectus' definition of replacement level contends that a team full of such players would win a little over 50 games. This is a notable increase in replacement level from previous editions of Wins Above Replacement Player.
Here is an example of the Wins Above Replacement Player spectrum based on the 2011 season:
Excellent - Jose Bautista 10.3
Great - Hunter Pence 5.2
Average - Gaby Sanchez 2.0
Poor - Adam Lind 0.5
Horrendous - Adam Dunn -1.7
The Weighted Mean forecast incorporates all of the player's potential outcomes into a single average, weighted based on projected playing time. In almost all cases, poor performances are associated with a reduced number of plate appearances. For that reason, they don't hurt a player's team quite as much as good performances help it; the weighting is designed to compensate for this effect (see also Jeremy Giambi Effect).
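The effect of weighting by playing time can be seen in a small sketch. It assumes each potential outcome is a pair of a rate statistic and the plate appearances that go with it; the function name and the three sample outcomes are hypothetical, and PECOTA's actual weighting may involve more than this.

```python
def playing_time_weighted_mean(outcomes):
    """Mean of a rate stat across outcomes, weighted by plate appearances.

    outcomes: list of (rate_stat, plate_appearances) pairs.
    """
    total_pa = sum(pa for _, pa in outcomes)
    if total_pa == 0:
        raise ValueError("outcomes must include some playing time")
    return sum(rate * pa for rate, pa in outcomes) / total_pa


# Hypothetical outcomes: the poor season comes with far fewer plate appearances
outcomes = [(0.300, 600), (0.260, 500), (0.220, 200)]
weighted = playing_time_weighted_mean(outcomes)
unweighted = sum(rate for rate, _ in outcomes) / len(outcomes)
print(f"{weighted:.3f}")    # 0.272
print(f"{unweighted:.3f}")  # 0.260
```

Because the .220 outcome carries only 200 plate appearances, it drags the weighted figure down less than it drags down the simple average.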
EXCEPTION: a player's projected PLAYING TIME (and therefore, his counting statistics that are dependent on his playing time) is taken based on the median of his comparables' performance, rather than the weighted mean. This is designed to mitigate the influence of catastrophic injuries, which are better represented by Attrition Rate.
This exception does NOT affect a player's WARP and VORP forecast, which are calculated per the weighted mean method, treating players who dropped out of the database as having zero WARP/VORP.
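The reason for using the median for playing time can be illustrated with a toy set of comparables in which one player lost a whole season. The plate appearance figures below are invented for the example, and the unweighted mean is used only as a simple stand-in for the weighted mean.

```python
from statistics import mean, median

# Hypothetical plate appearances for five comps; one missed the entire season
comp_plate_appearances = [600, 580, 550, 520, 0]

median_pa = median(comp_plate_appearances)
mean_pa = mean(comp_plate_appearances)
print(median_pa)  # 550
print(mean_pa)    # 450
```

A single lost season pulls the mean down by 100 plate appearances, while the median still reflects what a typical healthy comp did.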
