Willie Mays famously started his career 0-for-12 before hitting a home run off of Warren Spahn. This season, Orioles über-stud catcher Matt Wieters has struggled to live up to expectations, posting a feeble .264/.310/.368 line since being called up in May. Talented rookies such as these present a twofold challenge to their teams: first, how to identify when they’re ready for promotion, and second, how to react when they fail to produce. These decisions can be driven by subjective considerations, such as a scout or manager’s evaluation of the player’s poise and confidence. Such things are certainly important, but it’s worth investigating what a purely objective mechanism for making these decisions might look like.
So, today we’ll try to answer the first question: How do you decide when a prospect’s ready? Let’s consider the common scenario in which a rookie player is competing with a veteran for the vet’s job. The veteran’s productivity is typically well established, while the rookie’s productivity is not known as precisely. Thus, we’re faced with a choice between a so-called sure thing, and an unknown but possibly superior alternative.
In the field of statistical decision theory, such choices are known as “multi-armed bandit” problems. They are so named because of an analogy to a slot machine with multiple levers, each of which has a different payoff rate. The case we’re examining can be modeled as the relatively simple case of a machine with two levers, one of which has a known payoff rate. In order to construct the model, we’ll need two values: the veteran’s productivity (the payoff rate of the known “arm”) and a probability distribution on the rookie’s productivity (the payoff rate of the unknown “arm”). To measure productivity, we’ll use on-base percentage. Obviously, OBP is not a perfect measure of productivity; it does not consider defense, power, or baserunning skill. As a measure of offensive performance, though, it is pretty good, correlating with run-scoring to the tune of .91 (per some 2006 research by Dan Fox). Furthermore, the fact that it measures a binary outcome (either a player reaches base, or he doesn’t) is extremely convenient for modeling purposes, as we’ll see shortly.
Since the veteran’s productivity is well established, we’ll quantify it as his aggregate OBP over the previous three seasons. We’ll assign the rookie an OBP distribution (a beta, for my fellow stats geeks out there) based on PECOTA’s projections. Now we must compare two expected values: that of starting the veteran all season and that of provisionally starting the rookie. I say “provisionally” because the team always has the option of substituting the veteran if the rookie doesn’t perform well.
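For readers who want to see what such a beta looks like in numbers, here is a minimal Python sketch. The parameters are the ones the author gives in the comment thread at the end of this article, beta(107, 166) for Wieters and beta(88, 166) for Maybin. The function name `beta_summary` is made up for illustration, and the percentiles use a normal approximation rather than exact beta quantiles, so treat them as rough.

```python
from math import sqrt
from statistics import NormalDist


def beta_summary(a, b):
    """Mean, standard deviation, and approximate 10th/90th percentiles
    of a Beta(a, b) distribution (hypothetical helper, not from the article)."""
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    # Assumption: a normal curve with the same mean and sd stands in for
    # the beta. This is an approximation, not the exact beta quantile.
    approx = NormalDist(mean, sd)
    return mean, sd, approx.inv_cdf(0.10), approx.inv_cdf(0.90)


for name, a, b in [("Wieters", 107, 166), ("Maybin", 88, 166)]:
    mean, sd, p10, p90 = beta_summary(a, b)
    print(f"{name}: mean={mean:.3f} sd={sd:.3f} 10th~{p10:.3f} 90th~{p90:.3f}")
```

Under these assumptions the means and approximate percentiles should land within a few thousandths of the PECOTA figures quoted later in the article, which is consistent with the betas having been fitted to the projections.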
Assuming 600 PA in a season, we can compute the expected performance of the veteran by multiplying his aggregate OBP by 600; this will give us the number of successes (i.e., non-outs) that he should contribute. Computing the expected value of provisionally starting the rookie is much more complicated, since his performance is variable and a substitution can occur at any time. I wrote an algorithm that accomplishes this by starting with the final plate appearance and iterating backward, considering all possible outcomes (in terms of successes and failures; this is why OBP is a convenient number). At each juncture, the algorithm chooses whether or not to replace the rookie based on which player offers a higher total expected value.
Given the OBP of the veteran and the mean projected OBP of the rookie, the algorithm will determine whether or not the rookie should be given a shot. As it turns out, the algorithm recommends starting all but the very worst rookies who have very good replacements. Six-hundred PA is a lot of playing time, so the cost of possible early failures by the rookie is small relative to the long-term gain to be had if the rookie turns out to be highly productive. To get a better idea of how profitable starting the rookie can be, we can ask how good his veteran replacement would need to be in order for playing the veteran to yield the same expected value as giving the rookie a shot. If the veteran’s OBP is greater than this “break-even” value, then the rookie should not start; if it is less, then the kid definitely should be in the lineup.
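The backward iteration and the break-even search described above can be sketched in Python as follows. This is a reconstruction under stated assumptions, not the author's actual code: the rookie's OBP starts as a Beta(a, b) and is updated after every plate appearance, the veteran's OBP is a fixed number, and once the veteran takes over he plays the rest of the season. The names `rookie_first_value` and `break_even_obp` are hypothetical.

```python
def rookie_first_value(a, b, p_vet, n_pa=600):
    """Expected times on base over n_pa plate appearances if the rookie
    takes the first PA and the team then chooses optimally at every later PA.

    Assumption: the rookie's OBP has a Beta(a, b) prior; after s successes
    in n PAs his expected OBP is (a + s) / (a + b + n).
    """
    # nxt[s] = best expected future successes after the next PA count,
    # given s rookie successes so far. After the final PA nothing remains.
    nxt = [0.0] * (n_pa + 1)
    for n in range(n_pa - 1, -1, -1):      # iterate backward from the last PA
        bench = (n_pa - n) * p_vet         # veteran plays all remaining PAs
        cur = []
        for s in range(n + 1):
            p = (a + s) / (a + b + n)      # updated expected OBP of the rookie
            play = p * (1.0 + nxt[s + 1]) + (1.0 - p) * nxt[s]
            # At n == 0 the rookie is forced to start; later, take the better option.
            cur.append(play if n == 0 else max(play, bench))
        nxt = cur
    return nxt[0]


def break_even_obp(a, b, n_pa=600, tol=5e-4):
    """Veteran OBP at which starting him all season matches the expected
    value of provisionally starting the rookie (found by bisection)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rookie_first_value(a, b, mid, n_pa) > n_pa * mid:
            lo = mid                       # rookie still worth a shot
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Assumed prior for illustration; takes a few seconds in pure Python.
    print(round(break_even_obp(107, 166), 3))
```

The bisection works because raising the veteran's OBP adds to the value of playing him all season faster than it adds to the value of keeping him in reserve. Whether this sketch reproduces the article's exact break-even figures depends on details of the original algorithm that are not given here.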
This has all been rather abstract, so let’s examine some specific cases from this year to see the algorithm in action. The rookies we’ll look at are Wieters and someone who is perhaps a more typical prospect, Marlins center fielder Cameron Maybin. Below are their pre-season PECOTA projected OBP percentiles, along with the OBP of their initial likely veteran replacements and the “break-even” OBP produced by the algorithm:
Rookie    Mean   90th   10th   Veteran   Vet's OBP (2006/2007/2008)   Break-even OBP
Wieters   .392   .432   .356   Zaun      .348 (.363/.341/.340)        .415
Maybin    .345   .388   .311   Amezaga   .323 (.332/.324/.312)        .370
This information is perhaps better presented visually. In the following diagrams, the rookie’s mean projected OBP is in blue, the veteran’s aggregate OBP is in red, and the break-even OBP is in green:


We can see that, based on expected OBP alone, both rookies should have started over their veteran counterparts, and it’s not even close. The Orioles would have needed a catcher who could be expected to post a .415 OBP in order to justify keeping Wieters in the minors; Zaun is adequate at the plate, but he’s nowhere near that level. Likewise, Amezaga’s aggregate OBP is well below the .370 needed to deny Maybin a shot. In point of fact, Maybin was the Marlins’ Opening Day center fielder, while Wieters was kept in the minors until the end of May; the latter decision was almost certainly motivated by service-time concerns, however.
This analysis is rather involved; fortunately, there appears to be a trend that leads to a simple rule of thumb. Notice that these break-even values are roughly .020 higher than the rookie’s mean projected OBP. I performed this analysis with several other rookies, and found this to be the case generally. Thus, it appears that a rookie should start unless a veteran can be expected to post an OBP at least .020 higher than the rookie’s mean projection.
Of course, this rule is not meant to be definitive. The analysis on which it is based considers only OBP, ignoring power, defense, and other relevant factors; in addition, the assumption that the veteran’s OBP is fixed, though reasonable, is clearly false. Rather, the rule should be seen as a starting point to be supplemented with other information. It also serves to highlight just how much a team stands to gain by giving a promising rookie a chance. Even if there’s only a small chance that a rookie will be more productive than an established player, it is usually worth investing a few games of baseball’s long season to find out if this is the case. After all, the rookie can always be benched or sent down to the minors if the experiment doesn’t work out. In the next part of this diptych, I’ll attempt to determine just how badly the rookie needs to perform in order to justify such a decision.
Dan Malkiel is an intern for Baseball Prospectus.
I'd guess you'd have to compare Elvis' OBP to Blalock's.... which has worked out with Elvis' .340 to Blalock's .278
but the real gain here was on defense, which was improved at 2 positions. Elvis was much better than Mike Young at shortstop, and Young was better than Blalock at third. You can go even further because Blalock wasn't the only one to man third in 2008. You had Ramon Vazquez (ugh), German Duran (help), Chris Davis (shoot me), and Travis Metcalf (gack) all log 10+ games at third. Also because of that, 5 players manned first base (Davis, Blalock, Chris Shelton, Frank Catalanotto, and Ben Broussard).
In contrast, the 2009 infield has been relatively set, with Elvis at short, Young at third, Kinsler at second and Davis (or Blalock) at first, with Omar backing up at second, third, and short.
The Wieters example should show the failing in your methodology...the break-even OBP is only that high because you continually expect Wieters to have his PECOTA distribution despite the mounting evidence that shows that the forecast is wrong. As of today, Zaun's .319 would be marginally better than Wieters' .310, yet your analysis would still expect Wieters to mash going forward and would not take into account the .310.
I think you'd have a better analysis if you used Wieters' expected curve and adjusted it as time went forward. Once it was pretty clear that Wieters wasn't going to have a .400 OBP, the difference between him and Zaun should have looked much smaller.
Even the .370 OBP needed by Amezaga to avoid being replaced by Maybin is wonky. A .370 OBP would have made Amezaga the 6th best center fielder in baseball when Maybin's 90th percentile is only .388. That seems nonsensical as well.
I think because the veteran is modelled as a single probability, the upper tail of the distribution for the rookie is weighting everything in favor of the rookie. In the given model the veteran established player has no chance of having an amazing career year but the rookie does. So of course the rookie should get a chance at having that year, especially when, if it goes south, he can just get replaced by the veteran's solid OBP. Which is the point you were trying to investigate, I know, but I think the way the model is set up, it's falling out of the simplifications of the model and not the data per se. That being said, I think you will still see a shift in favor of the rookie starting, but I think the type of model you chose is magnifying the difference between what-could-be with the rookie and the veteran.
Finally, the shape of the beta distribution seems not right to me. The beta distribution looks almost normal out there and it seems intuitively that you would expect it to be left shifted for a prospect having to adjust to the major leagues.
Fun idea! Looking forward to seeing part 2 and seeing if it gets tweaked.
You're right that it's incorrect to model the veteran as a single probability. However, the problem is MUCH more difficult to solve without this assumption, which is not entirely unreasonable for a stable veteran. Besides, if one were to give the vet a probability distribution, it would necessarily be quite narrow, and so the updating (which I'll discuss in part 2) would have little effect.
The beta is tailored to match PECOTA's projections. In case you're wondering, Wieters' distribution is a beta(107, 166) and Maybin's is a beta(88, 166).
Also in the model, how long is the expected time to establish a level of performance? I would imagine it has to be at least a few weeks before you bench/send down a prospect?
In your comments though, you mention that this would be much more difficult to do if the veteran was not modeled as a known parameter. This isn't really true though. It's been shown in the statistical literature that the multi-armed bandit problem can be reduced to the single-armed bandit, which is actually the case you're considering. You just end up getting a 'cut-off' point for each of the players, and the team would (in theory) want to choose whichever one was higher. As we would surmise, higher means and greater variance will both increase this cut-off.