
I got BP rolling many years ago in large part because of a forecasting system I had created called Vladimir. Vlad was basically a two-step system. The first step was categorization: What type of player is this? What is the shape of his performance? Is he a slow masher? A waterbug? A power-and-speed guy? How old is he? The second step was a neural-net system, which basically "walked" the player in question down their expected career path. I used Clay Davenport's DTs as the inputs for the system, because it helped me out in terms of removing park and league effects.

There are a lot of problems with a system like this.

First, you have to categorize different types of players. I messed around with SPSS (a statistical package I had) a lot, and assessed a lot of things like rates of singles, doubles, triples, stolen bases, missed games, walks, etc. You don't just look at those things individually, or during a single season. You track effects across different performance metrics, and across adjacent seasons. After a lot of equivocating, I basically decided that my best option was to create 26 different career paths, and use those as sort of the baseline for figuring out what a guy's development would be like.
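The categorization step can be sketched as a nearest-template lookup. This is only an illustration, assuming each player is boiled down to a handful of per-600-PA rates and each career path is represented by a typical profile in the same terms. The three templates and all of their numbers are invented for the example; they are not Vlad's 26 paths, and a real version would scale the stats so that no single column dominates the distance.

```python
import math

# Hypothetical templates: (singles, doubles + triples, HR, BB, SB) per 600 PA.
# These numbers are made up for illustration, not taken from Vlad.
TEMPLATES = {
    "slow masher": (85, 30, 32, 70, 2),
    "waterbug": (120, 25, 4, 55, 40),
    "power-and-speed": (95, 35, 25, 65, 30),
}

def nearest_template(profile, templates=TEMPLATES):
    """Return the name of the template closest to profile (Euclidean distance)."""
    return min(templates, key=lambda name: math.dist(profile, templates[name]))

# A speedy, low-power profile lands in the "waterbug" bucket.
category = nearest_template((118, 24, 5, 60, 42))
```

Tracking effects across adjacent seasons, as described above, would mean feeding in several seasons' worth of rates per player rather than a single profile.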

Which, of course, leads me to the second problem. The really interesting guys, the guys whose likely development you'd like to know more about, are the guys who are atypical; the guy whose stat line is different from any you've seen. How do you forecast Rickey Henderson at age 20? Alex Rodriguez at age 19? Tony Phillips at age 35?

I've started to work–in fits and starts–on a new forecasting system. It's very difficult to do well.

One of the best features of the old Vlad system came about because of an error. When I asked it to try to forecast playing time, it kept trying to give crappy players zero or negative plate appearances, and kept trying to find a way to get players like Barry Bonds 1,200-1,500 at-bats. While this might be great if you're a Giants fan, it pretty much sucks if you're trying to use your forecasting tool to predict team outcomes.
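One blunt way to keep a playing-time forecast usable for team-level work is to clamp each raw number into a plausible range and then rescale so the roster adds up to the plate appearances actually available. The sketch below is not how Vlad handled it; the function names and the ceiling of 750 are assumptions chosen for illustration.

```python
def clamp_plate_appearances(raw_pa, floor=0, ceiling=750):
    """Force a raw playing-time forecast into a plausible range.

    The ceiling of 750 is an illustrative assumption, not a researched limit.
    """
    return max(floor, min(ceiling, raw_pa))

def fit_to_team_total(raw_forecasts, team_total, ceiling=750):
    """Clamp every forecast, then scale so the roster sums to team_total."""
    clamped = {
        name: clamp_plate_appearances(pa, ceiling=ceiling)
        for name, pa in raw_forecasts.items()
    }
    total = sum(clamped.values())
    if total == 0:
        return clamped  # nothing to scale
    scale = team_total / total
    return {name: pa * scale for name, pa in clamped.items()}

# Raw outputs like the ones described above: a star at 1,350, a scrub below zero.
fitted = fit_to_team_total({"star": 1350, "scrub": -40, "regular": 500}, 1000)
```

Scaling up can push a player back over the ceiling, so a more careful version would iterate or cap again after rescaling.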

But before I get down to the hard, cranky numbers that I no longer write about very often, I need to figure out exactly what a forecasting system should try to do. I don't want to just minimize the root mean square error (RMSE) of the BA, OBP, SLG and SB/CS for the players I'm forecasting. There are a lot of good systems that can do a reasonable job of that. Eyeball the previous three years' stats yourself, check the player's age and make a guess, and you'd be surprised how well you can do.
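For reference, here is roughly what the error measure and the "eyeball" baseline look like in code. The 1-2-3 weighting of the last three seasons is an assumption standing in for a reasonable person's guess, and the sketch leaves out the age adjustment entirely.

```python
import math

def rmse(forecast, actual):
    """Root mean square error between paired forecast and actual values."""
    if len(forecast) != len(actual) or not forecast:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(f - a) ** 2 for f, a in zip(forecast, actual)]
    return math.sqrt(sum(squared) / len(squared))

def eyeball_forecast(last_three, weights=(1, 2, 3)):
    """Weighted average of three seasons, oldest first.

    The 1-2-3 weights are an illustrative assumption; no age adjustment.
    """
    return sum(w * x for w, x in zip(weights, last_three)) / sum(weights)

# Batting averages for three consecutive seasons, oldest first.
guess = eyeball_forecast((0.289, 0.238, 0.292))
error = rmse([guess], [0.295])
```

Any new system has to beat this sort of baseline on the players where the baseline is wrong, which is the harder and more interesting test.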

I want to be able to identify those guys who are going to improve, decline or fail to develop more than could be expected by a reasonable person eyeballing the stats, or a simple trend algorithm based on three to five years of performance and age data.

So I've started to work on it backwards, with a qualitative bent. I took a few minutes and thought about a few guys who haven't developed as well as I expected them to, or suffered a more rapid or noteworthy decline. Let's take a look at some of these players:

Delino DeShields

Season  Age  Tm     G   AB    R    H  2B  3B  HR  RBI  BB    K  SB  CS    BA   OBP   SLG
1990     21  Mon  129  499   69  144  28   6   4   45  66   96  42  22  .289  .375  .393
1991     22  Mon  151  563   83  134  15   4  10   51  95  151  56  23  .238  .347  .332
1992     23  Mon  135  530   82  155  19   8   7   56  54  108  46  15  .292  .359  .398
1993     24  Mon  123  481   75  142  17   7   2   29  72   64  43  10  .295  .389  .372
1994     25  LA    89  320   51   80  11   3   2   33  54   53  27   7  .250  .357  .322
1995     26  LA   127  425   66  109  18   3   8   37  63   83  39  14  .256  .353  .369
1996     27  LA   154  581   75  130  12   8   5   41  53  124  48  11  .224  .288  .298
1997     28  StL  150  572   92  169  26  14  11   58  55   72  55  14  .295  .357  .448
1998     29  StL  117  420   74  122  21   8   7   44  56   61  26  10  .290  .371  .429
1999     30  Bal   96  330   46   87  11   2   6   34  37   52  11   8  .264  .339  .364
2000     31  Bal  151  561   84  166  43   5  10   86  69   82  37  10  .296  .369  .444
2001     32       126  351   55   82  17   5   5   37  59   77  23   2  .234  .344  .353
2002     33  ChC   62  143   19   28   6   1   3   10  20   37  10   1  .196  .293  .315

Let's focus on the first few years of DeShields' career. He demonstrated he could hit for average, had a year with a fair number of doubles, showed lots of speed–albeit not used optimally–and a batting eye in the neighborhood of 80-100 walks per season. In the 1990 and 1991 seasons, that's even a pretty reasonable amount of pop, especially for someone in the majors at age 21. It looked like a truly superlative start. Then there's the failure to develop, and the broad decline across all aspects of offense.
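The "not used optimally" part is easy to check from the stolen base and caught stealing columns. A quick sketch, using the 1990-93 rows from the table above:

```python
# (season, SB, CS) copied from the DeShields table above.
DESHIELDS = [(1990, 42, 22), (1991, 56, 23), (1992, 46, 15), (1993, 43, 10)]

def sb_success_rate(sb, cs):
    """Share of stolen base attempts that succeeded."""
    attempts = sb + cs
    return sb / attempts if attempts else 0.0

# Success rate by season, rounded to three places.
rates = {season: round(sb_success_rate(sb, cs), 3) for season, sb, cs in DESHIELDS}
```

That works out to roughly 66%, 71%, 75% and 81% across the four seasons: a lot of outs on the bases early on, with steady improvement afterward.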

Todd Walker

Season  Age  Tm     G   AB    R    H  2B  3B  HR  RBI  BB    K  SB  CS    BA   OBP   SLG   OPS
1996     23  Min   25   82    8   21   6   0   0    6   4   13   2   0  .256  .281  .329  .610
1997     24  Min   52  156   15   37   7   1   3   16  11   30   7   0  .237  .288  .353  .641
1998     25  Min  143  528   85  167  41   3  12   62  47   65  19   7  .316  .372  .473  .845
1999     26  Min  143  531   62  148  37   4   6   46  52   83  18  10  .279  .343  .397  .740
2000     27        80  248   42   72  11   4   9   44  27   29   7   1  .290  .355  .476  .831
2001     28       151  551   93  163  35   2  17   75  51   82   1   8  .296  .355  .459  .814
2002     29  Cin  101  386   49  111  26   1   6   40  30   51   3   4  .288  .340  .407  .747

Here's another second baseman, and that makes sense. One hunch I'll have better data to test this time, when building the model, is that guys who play positions where they occasionally get ground into hamburger, and pick up a lot of nicks and cuts, will have a rough development or decline slope.

Again, we see someone who showed a lot of promise as a young player, with demonstrated abilities to hit for average and power. Walker's strikeout rate was pretty low, and if memory serves, his platoon splits were pretty much Tony Gwynn against righties and Stevie Wonder against lefties.

Could Walker's relationship with Tom Kelly, such as it was, be a major factor in his rather restrained development? Possibly. Is there any way to model that? Not without a lot more data, but those results would be far more interesting and valuable to potential clients. Find a hitting coach that makes a 20-point difference in OPS? Pay him a fortune. Those age-27 and age-28 years have a nice Coors Field bounce.

Jose Cruz Jr.

Season  Age  Tm     G   AB    R    H  2B  3B  HR  RBI  BB    K  SB  CS    BA   OBP   SLG   OPS
1997     23       104  395   59   98  19   1  26   68  41  117   7   2  .248  .315  .499  .814
1998     24  Tor  105  352   55   89  14   3  11   42  57   99  11   4  .253  .354  .403  .757
1999     25  Tor  106  349   63   84  19   3  14   45  64   91  14   4  .241  .358  .433  .791
2000     26  Tor  162  603   91  146  32   5  31   76  71  129  15   5  .242  .323  .466  .789
2001     27  Tor  146  577   92  158  38   4  34   88  45  138  32   5  .274  .326  .530  .856
2002     28  Tor  101  380   49   88  21   3  15   57  43   86   6   1  .232  .308  .421  .729

Like DeShields and Walker, Cruz was promising as a youngster. His walk rate was very high, up among the league leaders during his first three years in the league, and then you find him on The Jim Rome Show during the 2001 season making jokes about how he likes being a hacker. While his overall level of performance is pretty flat, Cruz has demonstrated almost all the offensive skills except hitting for average. He's even a very efficient basestealer. The performance curve, overall, shows little development.

Shannon Stewart

Season  Age  Tm     G   AB    R    H  2B  3B  HR  RBI  BB    K  SB  CS    BA   OBP   SLG   OPS
1995     21  Tor   12   38    2    8   0   0   0    1   5    5   2   0  .211  .318  .211  .529
1996     22  Tor    7   17    2    3   1   0   0    2   1    4   1   0  .176  .222  .235  .457
1997     23  Tor   44  168   25   48  13   7   0   22  19   24  10   3  .286  .368  .446  .814
1998     24  Tor  144  516   90  144  29   3  12   55  67   77  51  18  .279  .377  .417  .794
1999     25  Tor  145  608  102  185  28   2  11   67  59   83  37  14  .304  .371  .411  .782
2000     26  Tor  136  583  107  186  43   5  21   69  37   79  20   5  .319  .363  .518  .881
2001     27  Tor  155  640  103  202  44   7  12   60  46   72  27  10  .316  .371  .463  .834
2002     28  Tor   87  355   56  105  27   2   5   31  30   36   7   0  .296  .361  .425  .786

I include two Blue Jay outfielders for two reasons: one, Raul Mondesi was traded, and two, several weeks back, Keith Law inflicted a pun on my e-mail inbox which has, to date, gone unavenged.

Stewart's stagnation may be the most frustrating of all. He has significant defensive limitations because of his shoulder, which means he can really only play left field or DH–so he has to hit. His minor-league numbers indicated great plate discipline, both in terms of a high walk rate and a low strikeout rate, including 89 walks in just under 500 at-bats at Double-A Knoxville in 1995. If you have Stewart after his 1998 season, you have to be pretty happy. He looked like Rickey Henderson lite: does everything Rickey did, but not quite as well. That's an awfully good ballplayer. So what happened?

This isn't meant to be definitive. One of the worst things about statistics-based performance evaluation and forecasting is that you're at the mercy of the information. Think about all of the things that are outside the scope of the data collected on player performance in the majors and minors: injury information, coaching information, travel information, the pitchers a guy faced. Just as the run support for pitchers doesn't even out over the course of one season, skewing win totals, neither do these factors, and they're not trivial.

Yes, all this data exists, but it's fairly problematic to get the data, put it all together, then go through the necessary rigor to develop the forecasting models. Finally, if you do put all the data together, you start running into problems in terms of degrees of freedom. In short, if you're trying to slice the data into 900 different categories, you need to have a fair amount of data in each category in order to have any confidence in your model. (Then again, there is a fair amount of artificial turf involved with these guys, isn't there? Hmmmm…something to think about, anyway.)
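The sparse-category problem is easy to see once you count. A minimal sketch, assuming each player-season has already been tagged with a category label; the minimum of 30 observations per category is an arbitrary threshold picked for illustration, not a statistical rule.

```python
from collections import Counter

def sparse_categories(labels, min_count=30):
    """Return the categories with fewer than min_count observations.

    min_count=30 is an illustrative threshold, not a tested cutoff.
    """
    counts = Counter(labels)
    return sorted(cat for cat, n in counts.items() if n < min_count)

# Made-up labels: one well-populated category and two thin ones.
labels = ["2B/turf"] * 45 + ["2B/grass"] * 12 + ["LF/turf"] * 3
thin = sparse_categories(labels)
```

Every extra factor you slice on (position, turf, coach, age) multiplies the number of categories, so the thin list grows fast even with decades of player-seasons to draw from.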

With these limitations in mind, what's the best approach? Good question. One reason I'm writing this column is that I'd like to receive some wish lists from people. Some people want a range of forecasts, kind of a probability density function, like "Frank Thomas is 75% likely to hit between .284/.362/.448 and .320/.402/.513." That's probably a good approach, perhaps with information on whether or not a player is an outlier in terms of his likely breakout or collapse.
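If the system can produce a set of plausible outcomes for a player (say, simulated seasons, or the follow-up seasons of comparable players), a "75% likely" range for a single stat is just the middle of that distribution. A rough sketch, assuming the samples already exist; it handles one stat at a time and says nothing about how BA, OBP and SLG move together.

```python
import statistics

def central_75_interval(samples):
    """Return the 12.5th and 87.5th percentiles of samples.

    Cutting the data into eight equal-probability slices and dropping the
    outer two leaves the central 75 percent.
    """
    cuts = statistics.quantiles(samples, n=8, method="inclusive")
    return cuts[0], cuts[-1]

# Toy data: the integers 0 through 100 stand in for simulated outcomes.
low, high = central_75_interval(list(range(101)))
```

Flagging a player as a breakout or collapse candidate could then amount to checking how lopsided the samples are around the point forecast.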

Baseball is exciting and gets into your blood in large part because it's a combination of the highly predictable and highly unpredictable. Over 162 games, you get a much lower chance of a fluke team clawing its way to the top on the back of a few odd bounces or bad calls; there is some semblance of meritocracy here. But you still get to see things like Freddie Patek McGwiring the snot out of the ball, Doug Dascenzo coming in to shut down the opposition for an inning, Jose Canseco turning F9 into HR, or even Brady Anderson hitting 50 bombs.

In many ways, forecasting is my personal windmill. I know it's impossible to do it exceptionally well, but I also believe that putting some parameters around expected performance is crucial to the successful management and operation of a major-league club. We do forecasting implicitly when we comment on whether we like a trade or a free-agent signing; we're making assumptions about the likely future performance of those involved in the deal. The nature of the game–and the universe–makes forecasting a task that is both somewhat unattainable, and even more irresistible because of that. When you learn the answer to a question, you end up with two questions that follow the answer, so you're compelled to keep following the string.
