Baseball ProGUESTus: A Brief, Incomplete History of Replacement Level

Most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.

Brandon blogs at Walk Like a Sabermetrician.

This article does not discuss the various definitions of replacement level, the arguments made for and against its use for various purposes, or any such topic with a practical application. While the title might suggest that it will discuss replacement level through baseball history, it sadly does not do that either. Rather, it attempts to briefly describe the history of replacement level as a sabermetric concept up to the mid-1990s or so, when it came to the forefront of most analytical systems.

This history is incomplete because it will surely leave out some notable uses due to the limitations of my library and internet records. It is primarily focused on Bill James' use of replacement level, since this is fairly easy to track through the annual Abstracts. Whether the concept of replacement level originated with James is something that I cannot answer, but it can be traced back to James at the very latest.

I felt it necessary to write this overview because one will occasionally see references to Keith Woolner as the originator of replacement level. This is simply not the case, and saying so is in no way intended as a slight to Woolner. His VORP certainly helped popularize replacement level and did much to increase the emphasis on the replacement baseline in the sabermetric community. One can acknowledge that while still recognizing the contributions of those who used replacement level before VORP.

In his first nationally published Baseball Abstract (1982), James used a system based on Offensive and Defensive Winning Percentage to rank players. If a player was considered to have played full-time (90 games in the field and 10 games at bat, based on outs made, for the strike-shortened 1981 season), then his ranking was simply the Winning Percentage based on the sum of his offensive and defensive wins and losses. For part-time players, however, the missing games were filled in at a .333 W%, a de facto replacement level.
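The padding step can be illustrated with a short Python sketch. It is a minimal, hypothetical single-pool version. The helper name `padded_win_pct` and the 100-game threshold in the example are illustrative assumptions. The sketch also merges offensive and defensive decisions into one count instead of applying the separate field and at-bat thresholds James used.

```python
def padded_win_pct(wins: float, losses: float, full_time_games: float,
                   fill_pct: float = 1 / 3) -> float:
    """Winning percentage after crediting any games short of a full-time
    workload at fill_pct (hypothetical one-pool sketch, not James's code)."""
    played = wins + losses
    # Games the player would need to reach the full-time threshold.
    missing = max(0.0, full_time_games - played)
    # Missing games are credited at the de facto replacement rate.
    total_wins = wins + fill_pct * missing
    return total_wins / (played + missing)


# Hypothetical records against a hypothetical 100-game threshold:
print(round(padded_win_pct(60, 40, 100), 3))  # full-time .600 player -> 0.6
print(round(padded_win_pct(30, 20, 100), 3))  # part-time .600 player -> 0.467
```

Both players win 60 percent of their games, but the part-timer's empty games are filled at .333. That pulls his rating down, which is how the padding penalizes limited playing time.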

In 1983, James made the concept more explicit. He first found the player's total win-loss record, then calculated the chance that a .400 player would compile a record at least that good. James explained:

I reason:

1) That a replacement-level player is about a .400 player.

2) That the value of a player is measured by the extent to which he has proven that he is not a replacement-level player

3) The smaller the chance that a replacement-level player would post this record, the better the player.

For example, James figured Sixto Lezcano's two-year record to be 17-8, then used the binomial distribution to calculate the probability that a .400 player would win at least 17 of 25 games.
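The calculation is a binomial upper tail, which can be sketched in a few lines of Python. The function name `chance_at_least` is a hypothetical label. The sketch computes the tail exactly from the binomial formula and makes no claim about how James carried out the arithmetic.

```python
from math import comb


def chance_at_least(wins: int, games: int, p: float) -> float:
    """P(X >= wins) for X ~ Binomial(games, p): the chance that a player
    whose true winning percentage is p wins at least `wins` of `games`."""
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))


# Lezcano's two-year 17-8 record against a .400 replacement player.
# Smaller values mean stronger evidence he is not replacement level.
print(f"{chance_at_least(17, 25, 0.400):.4f}")
```

Changing `p` to another baseline, such as the later .350, only requires passing a different third argument.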

In 1984, James used the Toronto team comment to discuss the distribution of talent in baseball. While the article does not mention replacement level explicitly, the message is clear: average players have value, and professional baseball talent is distributed like the far right tail of the normal distribution.

Later, in the player ratings section, James devotes several pages to a discussion of the pros and cons of comprehensive ranking systems (one that has much in common with the debates about competing win-value metrics today). In the course of this discussion, James offers his lengthiest explanation yet of replacement level, which he has now shifted to .350:

Why .350, rather than .500? Because a .500 ballplayer in baseball has value. In measuring value, it's not appropriate to choose a method that considers a 2-1 record better than an 11-11 record, or even a 10-12 record. A pitcher who goes 11 and 11 has presented far better evidence that he is an excellent pitcher than has a 7-2 pitcher.

I use .350 percentage (a year ago I used .400), because that is about where replacement level is. If you go 7-13, then you're replaceable; you have presented no real evidence that you're better than somebody we can get off the scrap heap.

…The .350 chance works against injury-prone players, and in favor of the durable player, a little more forcefully than the .400 chance that I used a year ago.
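The first part of this argument can be checked directly: the 2-1 record versus the 11-11 and 10-12 records. The Python sketch below scores each record by the binomial chance that a baseline player would do at least as well, in the spirit of the 1983 method, under a .500 baseline and under a .350 baseline. The records come from the passage. The function name is a hypothetical label.

```python
from math import comb


def chance_at_least(wins: int, games: int, p: float) -> float:
    # Binomial upper tail: P(at least `wins` wins in `games` at rate p).
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))


records = {"2-1": (2, 3), "11-11": (11, 22), "10-12": (10, 22)}
for baseline in (0.500, 0.350):
    for label, (w, g) in records.items():
        # Lower chance = stronger evidence of being above the baseline.
        print(f"{baseline:.3f} baseline, {label}: "
              f"{chance_at_least(w, g, baseline):.3f}")
```

With a .500 baseline, 2-1 scores better than 11-11 (exactly 0.500 against about 0.584). With a .350 baseline, both 11-11 and 10-12 come out ahead of 2-1, which is the ordering James wanted.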

In the 1985 Abstract, James did not use any sort of replacement level in figuring his player rankings, instead making manual adjustments for playing time as he saw fit. However, the contemporary Historical Baseball Abstract included this discussion of replacement level vis-a-vis Pete Palmer's use of an average baseline in Total Player Rating:

The line against which a player's value should be measured is the replacement level. The replacement level is a very real, very tangible place for a baseball team or a baseball player; drop under it and they release you. But nothing happens to you if you're a little better than average or a little worse than average; the difference between .490 and .510 is no different than the difference between .520 and .540, or any other similar distance. This is particularly true since baseball talent is not normally distributed, but skewed to the left (in other words, there are more below-average than above-average players).

…No player has negative value for more than about six weeks; once they decide that you're really worse than the guy they could replace you with, they replace you. As long as they figure that you're better than the guy they could replace you with, then you have value.

In 1987, James used replacement level in the form we most often see today: runs and wins above replacement. In an essay on the 1986 MVP race between Roger Clemens and Don Mattingly (and a race he views as parallel, Jim Rice vs. Ron Guidry in 1978), James writes:

The replacement level can be estimated as being about one run per 27 outs worse than the league average at the position…another way to figure replacement level is to look at the three weakest regulars at the position.

His third step in the pitcher evaluation spells it out clearly: "How many runs was Clemens better than the replacement-level pitcher?" The fourth step translates this into wins (from RAR to WAR), although James did not use the term "wins above replacement" or the acronym "WAR".
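A minimal Python sketch of the 1987 rule of thumb follows. If replacement is one run per 27 outs worse than average, then a player's runs above replacement are his runs above average plus one run for every 27 outs he used. The function names are hypothetical. The runs-per-win figure in the example is an assumed round number, because the passage does not say what conversion James used.

```python
def runs_above_replacement(runs_above_avg: float, outs: float,
                           gap_per_27_outs: float = 1.0) -> float:
    """Runs above a replacement level set gap_per_27_outs runs per 27 outs
    worse than the position average (James's 1987 rule of thumb)."""
    # Every out consumed widens the distance from replacement by gap/27 runs.
    return runs_above_avg + gap_per_27_outs * outs / 27


def wins_from_runs(runs: float, runs_per_win: float) -> float:
    # runs_per_win must be supplied by the caller; the text gives no value.
    return runs / runs_per_win


# Two exactly average pitchers (0 runs above average), different workloads.
starter = runs_above_replacement(0.0, 220 * 3)  # 220 IP = 660 outs
reliever = runs_above_replacement(0.0, 10 * 3)  # 10 IP = 30 outs
print(round(starter, 1), round(reliever, 1))  # 24.4 1.1
print(round(wins_from_runs(starter, 10.0), 1))  # 2.4, assuming 10 runs/win
```

Both pitchers are worth zero runs above average, yet the replacement baseline credits the workhorse with far more value. That is the main practical difference between the two baselines.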

After James, the replacement level banner was picked up by the Big Bad Baseball Annual. While I have only scattered editions of the book (1992 and 1998-2001), its authors apparently made WAR the centerpiece of their player rankings starting with the 1989 edition, and they were certainly using it in 1992. This also appears to be the origin of the "WAR" acronym. Their system took James' Offensive and Defensive Winning Percentages as its starting point and applied a .350 replacement W% to both components. In effect, then, their replacement level was lower than what is usually used today, and closer to the baseline of the old Baseball Prospectus WARP developed by Clay Davenport.
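Applying a .350 replacement W% to both components amounts to subtracting, for offense and for defense separately, the wins a .350 player would earn in the same number of decisions, and then adding the two results. The Python sketch below works under that reading. The function name and the 9-6 / 4-4 example record are hypothetical, and this is not BBBA's actual procedure.

```python
def bbba_style_war(off_w: float, off_l: float, def_w: float, def_l: float,
                   repl_pct: float = 0.350) -> float:
    """Wins above a replacement W% applied separately to offensive and
    defensive decisions, then summed (a sketch, not BBBA's exact code)."""
    # Wins a replacement-level player would earn in the same decisions.
    off_repl = repl_pct * (off_w + off_l)
    def_repl = repl_pct * (def_w + def_l)
    return (off_w - off_repl) + (def_w - def_repl)


# Hypothetical player: 9-6 on offense, 4-4 on defense.
print(round(bbba_style_war(9, 6, 4, 4), 2))  # 4.95
```

A player who goes exactly .350 in both components comes out at zero, as a replacement-level player should.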

In 1999, BBBA changed their methodology from James' metrics to the Extrapolated Wins system, but the replacement level remained at .350 for both offense and defense.

In the early days of the internet, the primary hub of sabermetric activity was the rec.sport.baseball newsgroup. While the archives of the group are difficult to search, one can find several discussions about replacement level dating back to at least 1992 (again, this is not to say that there were no earlier discussions, just that I did not find them). The few discussions that I found tended more toward the theoretical than the practical; most of the debate focused on the justification for a replacement-level baseline rather than on the particular numeric value at which it should be set.

That is where Woolner's contribution becomes notable: he offered a specific numerical value to rival James' estimates of a .350 W% or one run per 27 outs below average. Woolner's VORP apparently first appeared on rec.sport.baseball.analysis on October 10, 1995. Woolner explained that he had previously posted about his methodology on a Red Sox mailing list, and that he had changed the name to VORP because it was "catchier" (ironic, considering that VORP would go on to be one of the acronyms most often lampooned by those not disposed toward sabermetrics). VORP was based on Marginal Lineup Value, with the replacement level set at .035 below the position average in each of BA, OBA, and SLG.

Woolner published VORP reports online, and when he later joined Baseball Prospectus, VORP was incorporated into BP's toolkit, further increasing its visibility. As a result, VORP became the most commonly cited and readily available replacement-baselined metric. This has caused some confusion about the origin of replacement level as a sabermetric concept, which can be traced back at least to Bill James' work in 1982, 13 years before VORP appeared.

Thank you for reading

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.



SteveOSama

10/30

Outstanding.


juniusworth

10/31

Thanks, Brandon.


KDynan

I remember reading this post by JC Bradbury a while back and it really made me question the usefulness of replacement level as it is currently constructed: http://www.sabernomics.com/sabernomics/index.php/2010/12/whats-wrong-with-replacement-level-valuing-of-players/

Did James (or Woolner, or anyone else) ever discuss or demonstrate that the distribution of talent is skewed the way replacement level statistics imply?

I was actually just thinking about this the other day. I was wondering what the history of the use of replacement level has been--so thanks for putting this together!


marctacoma

11/01

Check out the comments under that post - BP's Colin Wyers has a great rebuttal, and this gets discussed at the book blog periodically.

Replacement level is crucial in determining value. Comparing a player to league average obliterates the importance of playing time, since a league-average starter with 220 IP would have the same runs above average as a league-average reliever who threw 10 IP.

As to the distribution of talent argument, this seems self-evident. There are more players at or below replacement level than above it. Baseball teams sagely employ those above replacement level to do most of their hitting/pitching. At times, they need a fill-in and they'll give some ABs or innings to replacement level players....but they rationally prefer to give playing time to good players. This makes the distribution of talent look 'normal' but that's because teams don't generally give lots of opportunities to replacement level players. They don't need to.


Brandon Heipp
