
My grandfather used to say that in heaven, everyone was 25. He figured that was the perfect age in life. You’re old enough that you’re not a kid any more, but young enough to enjoy everything. Grandpa lived to age 93, and more than six years later, I still miss the guy. This one’s for you, Grandpa.

So what’s the perfect age to be if you’re a baseball player? For a while now, there’s been a small brouhaha going over those who say that the peak age is 27 and those who hold out for age 29. Now that I’m past both of those landmarks myself, it doesn’t seem like that big a difference, but in a profession where a player might play six years if he’s good, knowing which of those six years will be his best is vital to a team.

The problem with doing this sort of work is that baseball is a logistical nightmare in terms of doing well-controlled research. Players are not selected at random (like I used to teach my research methodology students) and there is a severe bias in who gets to play and who doesn’t. Indeed, we have an entire genre of radio which exists for people to call in and complain when a manager plays the not-so-good guys. Still, the joy of doing research … and yes, there is joy in doing research … is being able to crack some of these issues, despite the fact that they drive you nuts.

As someone who has dealt with more children than I care to mention (and that was before I became a dad), this question of “peak age” struck me as a development question, the same way that I’m often asked questions about whether Junior (no, not Ken Griffey Jr.) is on target with his developmental milestones. But, I wasn’t comfortable with one of the hidden assumptions, one built very deeply into how our culture perceives development, which people tend to make in this line of research. We assume that players develop in a gradual and relatively uniform manner consistent with their age. It’s a one-size-fits-all approach that’s reflected in the other major developmental measurement that’s a common feature in our society, schools.

Kids who are 12 belong in sixth grade and are all roughly at the same point in life, right? Maybe not. Kids develop in different ways and at different rates. Go to any sixth grade classroom, and you’ll see that the idea of uniform development is preposterous. Kids hit puberty at different ages; girls hit puberty before boys, and it’s all on display for you right there in your average sixth grade homeroom. Sure, in the aggregate, kids at 12 years old are “middle school” material. But what about this individual kid?

In education and in child development more generally, if kids aren’t learning or developing as quickly as we’d like them to, it’s not legal to just remove them from the population. But in baseball, that’s exactly what happens. Players who develop quickly are politely invited to be part of the team. Players who don’t develop so well are simply sent packing.

So I propose that we first look at this question of peak age from the other direction. When do players generally become just good enough to become regular players in MLB and when do they stop being good enough? I took all players who started their careers after 1980 and ended their careers before 2009. Only seasons in which the player had at least 200 plate appearances counted. It left a sample just shy of 1,000 batters (997 to be exact.) It’s not a surprise that most players debut some time between their age-23 and age-27 seasons (using April 1 age). What surprised me was the distribution for when players left the game. Take a look:


Chart 1

There’s a spike at age 27 for players leaving the game, but then after that, the rate of attrition falls for a few years and then spikes again at age 31. Odd.
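A minimal sketch of how an exit-age tally like the one behind Chart 1 could be built. It assumes season records arrive as (player_id, age, plate_appearances) tuples; that layout, the function name, and the toy rows are illustrative assumptions, not the dataset used in the article. Only the 200-plate-appearance cutoff comes from the text.

```python
from collections import Counter

MIN_PA = 200  # qualifying threshold stated in the article


def exit_age_counts(seasons):
    """Tally the age of each player's last qualifying season.

    `seasons` is an iterable of (player_id, age, plate_appearances)
    tuples; this layout is an assumption made for illustration.
    """
    last_age = {}
    for player_id, age, pa in seasons:
        if pa >= MIN_PA:
            # Keep the oldest age at which the player still qualified.
            last_age[player_id] = max(age, last_age.get(player_id, age))
    return Counter(last_age.values())


# Made-up rows, not real players.
toy_seasons = [
    ("a", 24, 510), ("a", 25, 480), ("a", 27, 350), ("a", 28, 120),
    ("b", 26, 230),
    ("c", 23, 600), ("c", 27, 410),
    ("d", 25, 150),  # never reaches 200 PA, so never enters the sample
]

counts = exit_age_counts(toy_seasons)
for age in sorted(counts):
    print(age, counts[age])
```

Plotting these counts by age would give the shape of the attrition curve; a spike shows up as an age whose count stands well above its neighbors.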

I took a look at when players left the game as a function of what age they debuted at. I wanted to make sure that these twin spikes were not some sort of selection artifact based on debut age. Looking individually at every debut age group, there was a similar pattern. Generally, there were attrition spikes around age 27 or 28, and then again around 31 or 32. And then there was one other spike I noted. The most dangerous year for attrition for a batter is his first year. For example, more than 30 percent of players who debut (i.e., have their first 200-plate appearance season) at the age of 26 don’t have another season in which they get regular reps. So, there seem to be three major winnowing periods in baseball for batters: the first year, age 27, and age 31.

Not shockingly, players who made their debut younger tended to be the guys who stuck around longest and were most likely to clear those three hurdles. They also tended to be better players. Indeed, a quick stroll through the “survival rates” for each of the groups in the study is enlightening. You can read the chart below as “of all of the players who debuted at age 24, X percent of them survived (had another season of 200-plus plate appearances) past their first year, X percent survived past age 27, and X percent survived past age 31.”


                    Survived Past
           -------------------------------
Debut Age  First Year   Age 27      Age 31
  0-22      93.8%        72.5%       47.8%
   23       88.1%        64.9%       39.3%
   24       81.7%        64.0%       35.5%
   25       82.9%        61.7%       33.6%
   26       69.1%        55.5%       29.2%
   27       72.0%        72.0%       33.3%
   28       67.3%         --         33.7%
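The survival rates in the table above could be computed along these lines. This is a sketch under stated assumptions: the (player_id, age, plate_appearances) record layout and the function name are hypothetical, and "survived past age N" is read here as "had a qualifying season at an age greater than N," which is one plausible reading of the definition given in the text.

```python
from collections import defaultdict

MIN_PA = 200  # qualifying threshold stated in the article


def survival_by_debut_age(seasons, hurdles=(27, 31)):
    """Return {debut_age: {"n", "first_year", 27, 31}} with rates as fractions."""
    qualifying_ages = defaultdict(list)
    for player_id, age, pa in seasons:
        if pa >= MIN_PA:
            qualifying_ages[player_id].append(age)

    # Group each player's last qualifying age under his debut age.
    last_ages_by_debut = defaultdict(list)
    for ages in qualifying_ages.values():
        last_ages_by_debut[min(ages)].append(max(ages))

    table = {}
    for debut, last_ages in sorted(last_ages_by_debut.items()):
        n = len(last_ages)
        # Clearing the first year means qualifying again after the debut season.
        row = {"n": n, "first_year": sum(last > debut for last in last_ages) / n}
        for hurdle in hurdles:
            row[hurdle] = sum(last > hurdle for last in last_ages) / n
        table[debut] = row
    return table


# Made-up rows: four players who all debut at age 24.
toy_seasons = [
    ("p1", 24, 300),
    ("p2", 24, 500), ("p2", 28, 450),
    ("p3", 24, 600), ("p3", 32, 400),
    ("p4", 24, 250), ("p4", 26, 220),
]

survival = survival_by_debut_age(toy_seasons)
print(survival[24])
```

Under this reading, a player who debuts at 27 clears the first-year and age-27 hurdles together, which matches the identical 72.0% figures in that row.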

The fact that the players survived past these hurdles says something about their relative quality. Teams do not hang on to 30-year-olds with no skills. But, do members of these groups peak at different times? To find that out, I went into the statistical toolbox and pulled out one of my favorites. Remember, if you don’t like statistical gore, just say “and then a miracle happened” and skip to “the results.”

Warning! Gory Methodological Details!

I used a mixed linear model, with one fixed factor: age. I also used an AR(1) covariance matrix (auto-regressive, first order). This type of covariance matrix comes in very handy in this type of research, because it specifically corrects for the fact that we have several repeated observations for the same player. This is important because there are some players who are in the sample at age 27, but not age 28 (because they “retired”). The covariance matrix sniffs out the fact that the group still present at 28 was better at age 27 than the retirees and corrects for it when spitting out the relevant output.
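For readers who have not met an AR(1) structure before, here is a small sketch of what it assumes: the correlation between two seasons by the same player shrinks geometrically with the number of years between them, so only one parameter (rho) has to be estimated. The function name and the value of rho are illustrative assumptions; this builds the correlation pattern only and does not fit the mixed model itself.

```python
def ar1_correlation(n_seasons, rho):
    """Correlation matrix whose (i, j) entry equals rho ** |i - j|."""
    return [
        [rho ** abs(i - j) for j in range(n_seasons)]
        for i in range(n_seasons)
    ]


rho = 0.5  # illustrative value only, not an estimate from the article's data
matrix = ar1_correlation(4, rho)
for row in matrix:
    print("  ".join(f"{value:.3f}" for value in row))
# First row: 1.000  0.500  0.250  0.125
```

Adjacent seasons are the most alike, and seasons three years apart are assumed to share much less, which is the sense in which the structure accounts for repeated observations of the same player.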

My dependent variable was OPS. (Yes, I know I didn’t use your favorite No.1 measure for a player; fire when ready.) The output that comes out the other end can be read “if you took an average player from the sample, and only told me his age, I would expect that his OPS for the year would be X.” Of course, we’d know more than just a player’s age, but the point is to come to some sort of aggregate conclusion.

The Results

I split the players in my dataset up, again by debut age and by what the last talent-age “hurdle” (first year, 27, 31) they cleared was. So, we may have a player who debuted at 24, and made it past 27, but not to 31. If he didn’t clear the “first year” hurdle, then he only played one year, which is by definition, his best (and worst) year. I found the year in which the model had the predicted OPS as highest. The numbers here are peak ages.


                 Last Hurdle Cleared
           -------------------------------
Debut Age  First Year   Age 27      Age 31
  0-22        24          26          31
   23         25          26          30
   24         26          27          31
   25         25          27          28
   26         26          29          29
   27         --          27          29
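Each cell in the table above is the age at which the model's predicted OPS is highest for that group. A minimal sketch of that lookup, assuming the model's predictions for one group are already available as an age-to-OPS mapping; the function name and the numbers are hypothetical and are not output from the article's model.

```python
def peak_age(predicted_ops_by_age):
    """Return the age with the highest model-predicted OPS."""
    return max(predicted_ops_by_age, key=predicted_ops_by_age.get)


# Hypothetical predictions for a single debut-age / last-hurdle group.
toy_predictions = {
    24: 0.702, 25: 0.718, 26: 0.731,
    27: 0.739, 28: 0.734, 29: 0.725,
}

print(peak_age(toy_predictions))  # 27
```

Running the same lookup once per debut-age and last-hurdle combination would fill in a grid like the one above.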

Players who stay in the league longer have later peaks, roughly around the age of 29 or 30, which is what J.C. Bradbury found using a sample that included players with longer careers (minimum 5,000 plate appearances, which is roughly eight years at 600-ish PA per year). Those players who play only into their early 30s and who comprise the plurality of players in MLB have peaks around age 27. Those who espouse the age-27 model (Mitchel Lichtman being only the most recent) generally use models that are variations on “what’s the most common age to hit the high point?” No wonder they get 27.

Another surprising finding is that good-but-not-great players (those who made it to age 27, but not 31) who debut later tend to peak later. There’s no such thing as one magical age where all forward motion stops. Some guys are later bloomers, and have the same sort of arc as others … they just do it later in life. Those who stick around for a long time, however, show the opposite pattern. In that group, those who debut earlier have later peaks. What to make of this two-trajectory model?

It’s tempting to say that players who come up early on the phenom track are a riskier lot. If they have a long career, they’re likely to have a longer arc of improvement. But, if they have a short to mid-range career, they’ll peak quicker. However, we’ve seen that earlier debut generally heralds a greater chance of a longer career. When they do flame out, it’s generally a bigger fireball, but the chances of a fireball are actually lower. It’s a tradeoff.

The phenom track can be compared to the Brook Jacoby track. (For those who didn’t spend the ’80s in Cleveland, Jacoby was the good, serviceable corner infielder for the Indians who actually made a couple of All-Star teams.) In general, it seems that some players come up in their mid-20s, have a two-three year period where they improve, and then fall back to earth (and out of baseball) by the time they hit age 30. The two-three year period appears to be constant. It’s just a matter of when they bloom. Of course, the problem is that when a player is coming up, we have no way to know which track he will fall on. His debut age does give us some idea, but it’s not a guarantee.

The Conclusion

When Bill James originally took up this question, he suggested that players generally peak earlier than is generally thought, and decline more rapidly than is generally thought. He might have inadvertently been picking up on a wrinkle in how people think about the game. The good players do peak around 29, and those are the players about whom we first think. The great unwashed mass of players peak earlier.

The obvious take-home from this study is that method and sample will affect the answer to the question “at what age does a player peak?” I’d argue that this very fact means that the discussion of the one age for player peaks is actually kinda silly. Even beyond the usual cries that “You have to treat everyone as an individual!”, assigning one number to “peak age” vastly oversimplifies the situation. Sure, if we’re playing a probability game of “given no other information than his age, when can we expect this guy’s peak?”, then 27 is the best guess.

But to a team making a multi-million dollar bet on a free agent, it’s also the type of number that has the illusion of being a lot more informative than it really is. There are some concepts that can be reduced to a simple rule of thumb, and while the rule obscures the details, it’s easier to employ than having to sort through the mess of data. I don’t think this is one of those cases. Player development works in a much more complicated way than is generally thought.

theduke11
2/15
What are the statistics like for top 10% of the major leaguers? That is, what are the equivalent statistics for the A-rods, Pujols, Jeters, Mauers? I assume they get in early, maybe peak early, and then hold their peak for quite a while. This is the most interesting group for me because these are the guys that get 5-8 year contracts.
pizzacutter
2/16
I don't have that handy. There's also the issue of top-10% when? At the beginning of the contract? At age 27? In a specific year? It's something of a moving target. My guess is that they are best represented by the early debut/long career group.
Richie
2/16
Duke Dude's question is the real one. Who cares when marginal pros peak? What's the rate for the guys who do get offered multiple-year contracts? And according to Bill James, you then better adjust for the now-gone aging-delaying aspects of the 'roidster era. Tho' your BP colleagues would burn you at the stake for that.
pizzacutter
2/16
Mmmmmm, steak...

Actually, I think we have a lot to learn from studying the margins. People are fascinated by the exceptional cases, but there's so much more to be learned by figuring out why it is that some players just don't make it.
jdtk99
2/16
Do you think the peak would be flatter if you considered a player's total value, ie hitting + fielding + baserunning?
pizzacutter
2/16
It's possible. There are a small number of guys who played to age 35 with no discernible offensive talent, but who could field at short like a dream. Clearly, the team figured there was some value in keeping them around.
georgeforeman03
2/16
Great article.

Technical question: did you experiment with any other covariance matrices before settling on AR1? Particularly, if we believe that players have performances that vary around some "true" talent level (a nontrivial assumption), a compound symmetric model might be more appropriate.

Just curious. And once again, I really liked this.
pizzacutter
2/16
Matt, no I didn't experiment with covariance matrices. I'm not as familiar with my covariance structures as I could be. I've heard of compound symmetric before, but never had the time to fully study it. Can you give me (and the rest of BP reading this) a quick summary? I like AR(1) specifically because I have repeated measures and I know that the year-to-year measures are going to be correlated in some ways.
georgeforeman03
2/16
Both the auto-regressive-1 (AR(1)) and compound symmetric (CS) correlation structures assume that a batter's performance for a given year is related to his performance from other years. The difference occurs when we look at how this correlation changes when we "get further away" in time. AR(1) assumes that these correlations will decay over time. Observations from year (n) will be more closely related to those from year (n-1) and (n+1) than years (n-2) and (n+2), etc. Compound symmetric assumes that this correlation is constant over time.

So if you think players have underlying skill levels that change substantially over time, an AR(1) structure is pretty reasonable. If you think players have a basically stable underlying skill level and random variability of outcome is the main cause for changes in year-to-year performance, a CS structure might be better. There are lots of other options for covariance structures, some more restrictive than others. For example, an "unspecified" structure lets you fit unique correlations for any year-to-year differences (corr(n,n-1)=.2, corr(n,n-2)=.5, corr(n,n-3)=.1, etc.). This can be useful if you have a large sample size and can afford to estimate a lot of different parameters. (In contrast, the main advantage of AR(1) is that it requires you to estimate only one parameter.)

One other thing to consider is estimating unique covariance matrices for different categories of player. It's possible that, say, catchers have weaker (or stronger) year-to-year correlations than, say OFs. Allowing for heterogeneity in your covariance matrices could allow you to pick up on this.
nosybrian
2/16
@Russell: this is an outstandingly compelling analysis. The only thing that I'm still not quite sure you've explained is whether the twin peaks is necessarily the product of a heterogeneous sample, or whether there is some external factor that induces the peaks.

One possibility is the contract, free agency cycle -- or the tendency to write contracts in certain standard units. A high performance player may get a new multiyear (4-5 year) contract. A "survivor" may get a 1 or 2 year contract renewal. And so the outstanding player who has a strong quick start is, in effect, "guaranteed" a multiyear renewal.

Another way to look at this also involves the idea of heterogeneity. In mortality models, we imagine an underlying (unmeasured) trait that distinguishes people by their innate "frailty." Those who are "frail" are going to get beaten down by events -- influenza or other diseases or injuries -- while those who are "strong" will survive these threats. To link my previous paragraph with this one, I would propose that the frail (injury prone, game missers) also are the ones who get one-year renewal contracts while the strong get multi-year contracts. There are some frailty models in actuarial and demographic (mortality) research that might be applicable here.
dianagramr
2/16
I have a feeling, based upon my reading of other similar studies, that players in the tougher defensive spectrum positions would tend to have shorter careers (wear and tear of turning the DP, squatting behind the plate, the CF diving for the flare over the 2Bs head).

I also wonder what impact the 6th year of a career has on long-term career possibilities, as it is after this year that a player gets his 1st crack at free agency, and long-term $ commitment (excepting those arb-eligible cases who have their arb and maybe some FA years "bought out" early in their career, like Longoria). Would Gary Matthews Jr. still be in the majors if he hadn't had that career year in his FA-to-be season?
Tom9418
2/16
How did guys who flamed their first year get a higher peak age than their debut age?
pizzacutter
2/16
Those are guys who "survived" their first year, but not to age 27. So they played beyond their debut season.
Tom9418
2/16
Isn't the problem still that if we are trying to predict peak for a player, we don't know if he's a guy who will flame out at 35 or 27? So, assuming we're not using hindsight, 27 is still correct?
pizzacutter
2/16
Predicting peak is a whole 'nother series of articles. Clearly, we don't have a crystal ball to see when his career will end. I think this work gives us direction (better players seem to peak later, and some guys may be late bloomers.)

If I hope to get any point across in this article, it's that this mantra of "peak at 27" is over-played. If you absolutely forced me at gunpoint to say one number, 27 is the best number, but it vastly over-simplifies things. Human development isn't that linear or precise. It's messy, and I think that the bulk of the work is in getting in and cleaning up the mess.
morenobasspro
2/17
I totally agree with that. People here and elsewhere are guilty of saying something like, "Well, he sucked at 26, but next year he should peak." It gets too easy to apply the rule of thumb. We all do it. It's easy to just take a young career and draw it as the start of a nice parabola. But that's what gets you things like Jackie Rexrode and wishcasting that Josh Willingham could be "Bondsian."
MHaywood1025
2/16
If you are a solid prospect on a weak MLB team and you get the call-up early, could this possibly throw off the numbers for players who start at age 22? 23? 24?

Or is it not enough of an effect to challenge the assumption that only the best of the best start at an early age?
FlagrantFan
2/16
Excellent article. One thing that would be interesting to study is if any variable exists for players who go through the entire arbitration/one-year contract cycle versus those who get tied up for a few years during their arbitration eligible years. Some teams approach those years differently and it would be interesting to see if it creates a difference.
metty5
2/16
Russell, this was a great read. I would love to see someone with a baseball economics background try to explain your findings. To me (and I have no economics background), it appears that these three milestones signify today's baseball economic system. I think someone could make a very interesting flow chart from your findings.

When a player gets called up he is either a prospect or filling in for an injured veteran. From the big league club's perspective they are asking themselves, "Can this player hack it in the major leagues?" If yes, we'll keep giving you PA. If no, pack your bags. Once a player has established that his PA are worth the league minimum, the next question for the big club is, "Should we offer this player arbitration or non-tender him?" For most of these players (it appears from your findings), even during arbitration years players will continue to be good value for their team. Once a player reaches the free agent market (I think 27 is a good age for that but have no evidence) the same question arises. Is paying this player $x per win during his supposed prime an upgrade over my 21-25 year old cost controlled player? If he is worth it, he'll have a job; if he isn't, he won't. The last question obviously combines the first two. Is this player who is past his peak more valuable than a 21-23 year old cost controlled player or a free agent who should be entering his "prime"?

I'm no economist, but I'd love to see how economics play into player peak studies.

pizzacutter
2/16
I'll call Matt Swartz.
swartzm
2/17
Hi! Okay, firstly I will say I loved this article.

Next, I thought about exactly what JDSussman mentions, but I don't think it's an issue. Ultimately, when a player is non-tendered, they are not banned from baseball. They can receive their market rate. If they can receive their market rate, that means that if they are approximately a quad-A player, they can get a minor league deal and still will get a chance to play if they can sneak above replacement level. For players who reach six years of service time, they will face a similar situation, where once they are a free agent they can get paid accordingly to their quality, and if they are above replacement level, they can play.

What is possible is that the investment of playing a player below replacement level in hopes that he will learn something starts to become less and less valuable as he gets closer to six years service time. That could be playing a role, but probably not a very large one like the one we see in the graph above. There could also be team bias factoring in somehow where the team that drafted the player is the only one who thinks he can perform above replacement level, but I don't know if that's much of an issue. It certainly is less of an issue with the latest CBA allowing teams to sign their own free agents.

Basically, I really doubt it's a big issue, even if I think it's possible I'm missing something. I guess the most obvious thing to check is if there are a pair of modes for number of years of service time.
brucegilsen
2/21
"What is possible is that the investment of playing a player below replacement level in hopes that he will learn something starts to become less and less valuable as he gets closer to six years service time."

When I was reading the article, I thought of the rookie performance of players like Greg Maddux, Tom Glavine. Clearly they stayed in the major leagues because teams saw the development potential.
nosybrian
2/16
This is sort of what I had in mind in a previous comment. In my own line of work, I see people expiring after 3 years, then after 7 years, because of the way probationary contracts and long-term contracts are written. And this system plays into when there are peaks or valleys in attrition rates. (To be honest, it's the way academic appointments typically go, with a probationary 3 year appointment, then consideration for tenure at 6-7 years. Those tend to be attrition points. Fortunately for baseball, there is no such thing as a tenure system! Still it's amazing sometimes to watch some players hang on for many years past peak productivity -- both in academia and baseball.)
macolyte
2/16
I think this is a fabulously insightful comment, JD. It wouldn't surprise me at all if industry economics were driving this issue. Easy to get caught up in data points - and Russell's analysis is both convincing and compelling, but just as easy to forget the 'shadow forces' which bring a real, but unintuitive influence to bear.
jwferg
2/16
I enjoyed the article and analysis very much, but I couldn't help looking at the first graph and seeing just a regular curve with a bit of anomaly. Are the peaks statistically significant? The 'valley' between them is not very deep. Perhaps applying the same criteria to major leaguers from 1950-1979 would confirm (or not) the shape of the initial curve.
irablum
2/17
I still don't get why this issue is limited to major leaguers. It seems that by only studying them, you are a) artificially limiting your sample size, b) dealing with selection bias, and c) not utilizing your Davenport Translations to the fullest.

It seems to make much more sense to start with draft age and then plot each players translated stats from that point. With that, you don't have to worry about selection bias and you don't have to worry about guys who simply fall off the map. By using translated minor league stats, a guy who gets demoted to AAA isn't gone at all.

And you defeat most of your selection bias since there are guys who hang around forever in the minors because they aren't good enough to make the majors and are too stubborn to retire.
pizzacutter
2/17
Gasp! You've uncovered my kryptonite. I don't have a good minor league database on hand. But you're right. It is limiting the sample. I'm willing to live with that limitation for now.
ScottBehson
2/17
Another great article, Russell. Excuse my snark, but could we have one article without you referencing your academic credentials/teaching? We profs get a bad enough reputation for self-puffery as it is (I have a colleague who lists 11 letters after his name!!!!)
pizzacutter
2/17
I have recent Ph.D. syndrome. I just spent 7 years jumping through hoops to get those letters and I'm going to enjoy them!
nosybrian
2/17
Don't apologize too quickly! One hallmark of this paper is the high quality of the writing -- for a technical paper it is superbly well written. You are wearing your higher education well, Russell.
swartzm
2/17
You're entitled, Dr. Carleton ;-)