Flu-Like Symptoms: Dusting Off a Bill James Postseason Prediction System

October 14, 2016

In his 1984 Bill James Baseball Abstract, the third mass-market Abstract, Bill James introduced what he called “The World Series Prediction System.” Actually, he re-introduced it—the section in the Abstract was entitled “The World Series Prediction System, Revisited.” He’d developed it in 1972 and updated it in a 1982 Inside Sports magazine article that ran shortly before Inside Sports folded.

James’ system, he reported, picked 70 percent of World Series winners. It was a Franken-stat that combined hitting, pitching, and fielding measures: it assigned points for various metrics and named the team with the most points as the likely winner.

His system was:

Give the team with the better record one point for each half-game difference in won-lost percentage.
Give three points to the team that scored the most runs.
Give 14 points to the team that hit fewer doubles.
Give 12 points to the team that hit more triples.
Give 10 points to the team that hit more home runs.
Give 8 points to the team with the lower batting average.
Give 8 points to the team with fewer errors.
Give 7 points to the team that had more double plays.
Give 7 points to the team that allowed more walks.
Give 19 points to the team that threw more shutouts.
Give 15 points to the team whose ERA was more below the league average.
Give 12 points to the team with the most recent postseason experience. (In case of a tie, give the points to the team that had greater success.)
For intraleague series, give 12 points to the team with the better head-to-head record.
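James’ rules can be expressed as a small scoring function. What follows is a sketch, not James’ own procedure, and it rests on assumptions of mine: TeamSeason and its field names are hypothetical; a category in which the two teams are equal awards no points; the half-game gap is taken as the games-behind difference doubled; postseason experience is reduced to the year of a team's last appearance, so the "greater success" tiebreaker isn't modeled; relative ERA is league ERA minus team ERA; and the intraleague head-to-head rule is left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TeamSeason:
    """Hypothetical container for one team's regular-season totals."""
    wins: int
    losses: int
    runs: int
    doubles: int
    triples: int
    homers: int
    batting_avg: float
    errors: int
    double_plays: int
    walks_allowed: int
    shutouts: int
    era: float
    league_era: float
    last_postseason: int  # year of most recent postseason appearance, 0 if none


# (value extractor, points, True if the higher value earns the points)
JAMES_1984_RULES = [
    (lambda t: t.runs, 3, True),
    (lambda t: t.doubles, 14, False),
    (lambda t: t.triples, 12, True),
    (lambda t: t.homers, 10, True),
    (lambda t: t.batting_avg, 8, False),
    (lambda t: t.errors, 8, False),
    (lambda t: t.double_plays, 7, True),
    (lambda t: t.walks_allowed, 7, True),
    (lambda t: t.shutouts, 19, True),
    (lambda t: t.league_era - t.era, 15, True),  # ERA further below league average
    (lambda t: t.last_postseason, 12, True),     # more recent postseason experience
]


def james_points(a: TeamSeason, b: TeamSeason) -> Tuple[int, int]:
    """Return (points for a, points for b); the higher total is the pick."""
    pa = pb = 0
    # One point per half-game of separation (games behind, doubled).
    half_games = (a.wins - b.wins) + (b.losses - a.losses)
    if half_games > 0:
        pa += half_games
    else:
        pb += -half_games
    for value, points, higher_wins in JAMES_1984_RULES:
        va, vb = value(a), value(b)
        if va == vb:
            continue  # assumption: a tie in a category awards nothing
        if (va > vb) == higher_wins:
            pa += points
        else:
            pb += points
    return pa, pb
```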

I know some of those weights seem screwy, but that’s how the numbers worked out. He looked at every postseason series and checked how often the winning team exhibited certain characteristics. Shutouts got a weight of 19 because, among the series he considered, the team with more shutouts won 19 more times than it lost. The team with fewer doubles won 14 more times, the team with the lower relative ERA won 15 more times, and so on. And some of it makes intuitive sense: high-average offenses may depend on stringing together a lot of singles and doubles, while scoring in the postseason often depends on the long ball.
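The weighting method amounts to counting net series wins. A minimal sketch, assuming a hypothetical input format of (winner's value, loser's value) pairs for one stat, and skipping series in which both teams had the same value (how James handled ties isn't stated):

```python
from typing import Iterable, Tuple


def net_wins(series: Iterable[Tuple[float, float]], higher_is_trait: bool = True) -> int:
    """Net series wins for the team showing a trait, e.g. 'more shutouts'.

    Each item is (winner's value, loser's value). Pass higher_is_trait=False
    for traits like 'fewer doubles'. Equal values are skipped (an assumption).
    """
    net = 0
    for winner_val, loser_val in series:
        if winner_val == loser_val:
            continue
        winner_has_trait = (winner_val > loser_val) == higher_is_trait
        net += 1 if winner_has_trait else -1
    return net


# Made-up shutout totals for four series: +1, -1, skipped, +1.
print(net_wins([(20, 15), (12, 18), (9, 9), (14, 11)]))
```

A negative result means the team with the trait lost more often than it won; the tables below print such weights in parentheses.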

As an example of James' system, consider the famous 1969 World Series between the Mets and the Orioles. The Mets had fewer doubles (14 points), more triples (12 points), a lower batting average (8 points), more double plays (7 points), more shutouts (19 points), and allowed more walks (7 points). The Orioles had a better record by nine games (18 points), scored more runs (3 points), had more home runs (10 points), fewer errors (8 points), a lower relative ERA (15 points), and more recent (i.e., ever) postseason experience (12 points). That’s 67 points for New York and 66 for Baltimore. The Mets won the World Series in five games.
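The 1969 tally, written out as a quick check of the arithmetic (the nine-game edge in the standings is 18 half-games, hence 18 points):

```python
# Category points from the 1969 example, following James' 1984 list.
half_games_better = 2 * 9  # Baltimore's record was nine games better

mets = {
    "fewer doubles": 14,
    "more triples": 12,
    "lower batting average": 8,
    "more double plays": 7,
    "more shutouts": 19,
    "more walks allowed": 7,
}
orioles = {
    "better record": half_games_better,  # one point per half-game
    "more runs": 3,
    "more home runs": 10,
    "fewer errors": 8,
    "lower relative ERA": 15,
    "more recent postseason experience": 12,
}

mets_total = sum(mets.values())
orioles_total = sum(orioles.values())
print(f"Mets {mets_total}, Orioles {orioles_total}")  # Mets 67, Orioles 66
```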

Now, there’s a significant limitation to James’ system. His formula was printed in the 1984 Abstract, which means he had data through the 1983 season. That’s only 30 Championship Series to analyze (two each from 1969 through 1983), all in a best-of-five format. (The CS expanded to seven games in 1985.) The Division Series didn’t start until 1995, unless you count the oddball split-season 1981 postseason.

So James’ system, which is based on actual postseason results, is missing:

31 World Series from 1984 to 2015, excluding 1994
62 Championship Series from 1984 to 2015, excluding 1994
84 Division Series from 1995 to 2015
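The counts above follow from simple arithmetic: one World Series and two Championship Series per season from 1984 through 2015 except 1994, and four Division Series per season from 1995 on. A quick sketch:

```python
# Postseason series from 1984-2015 that a formula published in 1984 couldn't have seen.
seasons = [y for y in range(1984, 2016) if y != 1994]  # no postseason in 1994

world_series = len(seasons)              # one per season
championship_series = 2 * len(seasons)   # ALCS and NLCS
division_series = 4 * len([y for y in seasons if y >= 1995])  # two per league

print(world_series, championship_series, division_series)  # 31 62 84
```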

That’s a lot of data!

So I decided to freshen up James’ formula, using data through the 2015 season. I included only seasons in the divisional-play era, from 1969 to present, for two reasons. First, I think one can make a persuasive case that the game has changed a lot since, say, the 1916 season, when the Brooklyn Robins got 10 points under James’ system for out-homering the Boston Red Sox, 28-14. Second, there’s an argument that the multiple-tier playoff system—Championship Series plus World Series beginning in 1969, with the Division Series added in 1995 and the Wild Card play-in starting in 2012—creates different determinants of postseason success, as fatigue and depth become factors.

I also added a few categories that weren’t in James’ initial formula (batter walks, batter and pitcher strikeouts, on base percentage, and slugging percentage), just to see whether they worked out. (By and large, they didn’t.) And I excluded the strike-shortened 1981 split-season. I’ll present the results as a series of questions.

Have the weights changed?

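Yes, they have, by quite a bit. Here are the categories James identified, plus the ones I added, with their original weights and those calculated by looking exclusively at 1969-2015. Weights in parentheses are negative: the team with that characteristic lost more series than it won.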

Hitting category	Original Weight	Revised Weight
More runs	3	14
Fewer doubles	14	13
More triples	12	(17)
More homers	10	13
Fewer walks		14
Fewer strikeouts		26
Lower BA	8	(23)
Higher OBP		13
Higher SLG		11
Defensive category
Fewer errors	8	11
More DPs	7	(2)
Pitching category
More strikeouts		3
More walks	7	8
More shutouts	19	18
Lower relative ERA	15	13
Overall category
Better overall record	1/.5 gm	8
Recent experience	12	22
Better head-to-head	12	(3)

Some weights have changed significantly. For example, when James did his analysis, teams with a lower batting average had done slightly better in the postseason than teams with a higher batting average. Since 1969, teams with a higher batting average have done better, by a lot. Where applicable, head-to-head record used to be meaningful; it isn’t so much anymore. Hitting more triples was a good thing; now it isn’t. Scoring more runs was mildly positive; now it’s a big positive.

But before we go too far with this, let’s move on to our second question.

Does the type of series make a difference?

Yes, it turns out, it does. James lumped together World Series and Championship Series, because there weren’t many of the latter. He didn’t have sufficient data to break them apart. Since 1969, there have been (excluding the 1981 and 1994 strike seasons) 45 World Series, 90 Championship Series, and 84 Division Series. How do they differ?

Hitting category	Original Weight	Revised Weight	Div. Series	Champ. Series	World Series
More runs	3	14	(5)	10	9
Fewer doubles	14	13	12	6	(5)
More triples	12	(17)	(13)	(14)	10
More homers	10	13	4	13	(4)
Fewer walks		14	12	(1)	3
Fewer strikeouts		26	21	2	3
Lower BA	8	(23)	(6)	(12)	(5)
Higher OBP		13	4	10	(1)
Higher SLG		11	(2)	8	5
Defensive category
Fewer errors	8	11	10	(3)	4
More DPs	7	(2)	4	(9)	3
Pitching category
More strikeouts		3	4	(10)	9
More walks	7	8	0	(7)	15
More shutouts	19	18	23	(3)	(2)
Lower relative ERA	15	13	6	8	(1)
Overall category
Better overall record	1/.5 gm	8	0	12	(4)
Recent experience	12	22	10	0	12
Better head-to-head	12	(3)	(5)	2	NA
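Because each weight is a count of net series wins, the revised overall weight in the first table should equal the sum of the Division, Championship, and World Series columns in this one. A quick check, with parenthesized weights entered as negatives and the World Series head-to-head "NA" entered as 0 (my assumption), finds that every row adds up:

```python
# (Div. Series, Champ. Series, World Series, revised overall weight)
TABLE = {
    "More runs": (-5, 10, 9, 14),
    "Fewer doubles": (12, 6, -5, 13),
    "More triples": (-13, -14, 10, -17),
    "More homers": (4, 13, -4, 13),
    "Fewer walks (batters)": (12, -1, 3, 14),
    "Fewer strikeouts (batters)": (21, 2, 3, 26),
    "Lower BA": (-6, -12, -5, -23),
    "Higher OBP": (4, 10, -1, 13),
    "Higher SLG": (-2, 8, 5, 11),
    "Fewer errors": (10, -3, 4, 11),
    "More DPs": (4, -9, 3, -2),
    "More strikeouts (pitchers)": (4, -10, 9, 3),
    "More walks (pitchers)": (0, -7, 15, 8),
    "More shutouts": (23, -3, -2, 18),
    "Lower relative ERA": (6, 8, -1, 13),
    "Better overall record": (0, 12, -4, 8),
    "Recent experience": (10, 0, 12, 22),
    "Better head-to-head": (-5, 2, 0, -3),  # WS "NA" entered as 0
}


def inconsistent_rows(table):
    """Names of rows whose series-type weights don't sum to the overall weight."""
    return [name for name, (ds, cs, ws, overall) in table.items()
            if ds + cs + ws != overall]


print(inconsistent_rows(TABLE))  # []
```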

That’s a lot of variance. The team with more triples has won the World Series more often, but has been at a disadvantage in the Division Series and Championship Series. Having fewer batter strikeouts and more shutouts are a big advantage in the Division Series, but aren’t much of a factor beyond that. Recent postseason experience has translated into more success in Division Series and World Series, but not Championship Series. Those and other differences, it would seem, argue for different formulae for different postseason series.

James' original formula had 12 variables, plus one for intraleague head-to-head records. So let’s develop new formulae with roughly the same number of inputs, based on the table above.
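Turning a table row into a formula rule is mostly a matter of sign: a parenthesized weight means the points go to the team with the opposite trait, which is why, for instance, the Division Series formula rewards fewer triples. A small sketch of that conversion; to_rule and the direction-word mapping are my own illustration, and a row without a leading direction word (such as Recent experience) is only handled when its weight is positive:

```python
# Direction words used in the tables, mapped to their opposites.
OPPOSITE = {
    "More": "Fewer", "Fewer": "More",
    "Higher": "Lower", "Lower": "Higher",
    "Better": "Worse", "Worse": "Better",
}


def to_rule(category: str, weight: int) -> str:
    """Turn a table row such as ("More triples", -13) into a points rule.

    A negative (parenthesized) weight means teams with the listed trait lost
    more series than they won, so the points go to the opposite trait instead.
    """
    direction, _, rest = category.partition(" ")
    if weight < 0:
        if direction not in OPPOSITE:
            raise ValueError(f"no opposite known for {category!r}")
        direction = OPPOSITE[direction]
    return f"Give {abs(weight)} points to the team with {direction.lower()} {rest}"


print(to_rule("More triples", -13))
print(to_rule("Lower BA", -6))
```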

DIVISION SERIES:

Give 5 points to the team that scored fewer runs
Give 12 points to the team that hit fewer doubles
Give 13 points to the team that hit fewer triples
Give 12 points to the team whose batters had fewer walks
Give 21 points to the team whose batters had fewer strikeouts
Give 6 points to the team that had a higher batting average
Give 10 points to the team that had fewer errors
Give 23 points to the team that pitched more shutouts
Give 6 points to the team with the lower ERA
Give 10 points to the team with more recent postseason experience
Give 5 points to the team with the worse head-to-head record
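Each of these lists works the same way: compare the two teams category by category and add up the points. Here is a sketch for the Division Series rules, assuming hypothetical stat keys, treating a tie in a category as worth nothing to either side, and representing head-to-head record as each team's wins against the other:

```python
# Division Series rules from the list above:
# (stat key, points, True if the higher value earns the points).
DS_RULES = [
    ("runs", 5, False),
    ("doubles", 12, False),
    ("triples", 13, False),
    ("batter_walks", 12, False),
    ("batter_strikeouts", 21, False),
    ("batting_avg", 6, True),
    ("errors", 10, False),
    ("shutouts", 23, True),
    ("era", 6, False),
    ("last_postseason", 10, True),  # year of most recent postseason appearance
    ("h2h_wins", 5, False),         # wins vs. this opponent; the worse record gets the points
]


def score(team_a, team_b, rules):
    """Return (points for team_a, points for team_b) under the given rules."""
    pa = pb = 0
    for key, points, higher_wins in rules:
        va, vb = team_a[key], team_b[key]
        if va == vb:
            continue  # assumption: a tied category goes to neither team
        if (va > vb) == higher_wins:
            pa += points
        else:
            pb += points
    return pa, pb


def pick(name_a, name_b, pa, pb):
    """The team with more points, or None when the totals are tied."""
    if pa == pb:
        return None
    return name_a if pa > pb else name_b
```

The same score function works for the Championship Series and World Series lists with their own rule tables; for the World Series, the ERA comparison would have to be made against each team's own league average.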

CHAMPIONSHIP SERIES:

Give 10 points to the team that scored more runs
Give 6 points to the team that hit fewer doubles
Give 14 points to the team that hit fewer triples
Give 13 points to the team that hit more home runs
Give 12 points to the team with the higher batting average
Give 10 points to the team with the higher on base percentage
Give 8 points to the team with the higher slugging percentage
Give 9 points to the team that turned fewer double plays
Give 10 points to the team whose pitchers had fewer strikeouts
Give 7 points to the team whose pitchers allowed fewer walks
Give 8 points to the team with the lower ERA
Give 12 points to the team with the better record

WORLD SERIES:

Give 9 points to the team that scored more runs
Give 5 points to the team that hit more doubles
Give 10 points to the team that hit more triples
Give 4 points to the team that hit fewer home runs
Give 5 points to the team with the higher batting average
Give 5 points to the team with the higher slugging percentage
Give 4 points to the team that committed fewer errors
Give 9 points to the team whose pitchers had more strikeouts
Give 15 points to the team whose pitchers allowed more walks
Give 1 point to the team with the higher ERA relative to league average
Give 4 points to the team with the worse overall record
Give 12 points to the team with the more recent postseason experience
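Because World Series opponents come from different leagues, the ERA rule only makes sense relative to each team's own league. A minimal sketch, assuming relative ERA is the simple difference between a team's ERA and its league's average (a ratio would be another reasonable reading; the article doesn't specify), with made-up numbers showing how a team with the lower raw ERA can still have the higher relative ERA:

```python
def relative_era(era: float, league_era: float) -> float:
    """Team ERA minus its own league's ERA (assumed definition); higher is worse."""
    return era - league_era


def ws_era_rule_winner(a, b):
    """a and b are (name, team ERA, league ERA).

    Returns the name of the team with the higher relative ERA, which is the
    team the World Series ERA rule rewards, or None if they are equal.
    """
    ra = relative_era(a[1], a[2])
    rb = relative_era(b[1], b[2])
    if ra == rb:
        return None
    return a[0] if ra > rb else b[0]


# Hypothetical: "N" has the lower raw ERA but sits closer to its league average.
print(ws_era_rule_winner(("A", 3.40, 4.20), ("N", 3.30, 3.80)))  # N
```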

Two small notes: I found almost no evidence that the postseason advantage of a better overall record grows with the size of the difference, so, unlike James, I didn’t assign more points based on how much better one team’s won-lost record was. And I ignored interleague won-lost records for World Series contestants, since the sample sizes, if nonzero, are tiny.

Before we see what the revised system says about 2016, let’s backtest it:

Division Series: Correct 59, incorrect 24 (no selection in one series due to a tie)–71.1 percent
Championship Series: Correct 57, incorrect 33–63.3 percent
World Series: Correct 32, incorrect 13–71.1 percent
Overall: Correct 148, incorrect 70–67.9 percent
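Each percentage is correct picks divided by series in which the system made a pick, so the one tied Division Series is left out of the denominator. A minimal sketch of that arithmetic:

```python
def accuracy(correct: int, incorrect: int) -> float:
    """Percentage of picks that were right; series with no pick are excluded."""
    return 100 * correct / (correct + incorrect)


results = {
    "Division Series": (59, 24),
    "Championship Series": (57, 33),
    "World Series": (32, 13),
}
for name, (c, i) in results.items():
    print(f"{name}: {accuracy(c, i):.1f} percent")

total_c = sum(c for c, _ in results.values())
total_i = sum(i for _, i in results.values())
print(f"Overall: {total_c}-{total_i}, {accuracy(total_c, total_i):.1f} percent")
```

Counting the tied series as a miss instead gives 59 of 84, or 70.2 percent, which is the Division Series figure the author uses in the comments.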

That’s pretty good! The 68 percent overall success rate compares to James’ 70 percent reported in the spring of 1984. Let’s apply it to this season.

What does the system say about 2016?

Well, the system got both ALDS matchups right, favoring Toronto over Texas and Cleveland over Boston. It got the Giants-Cubs series wrong, assigning 71 points to San Francisco (fewer runs, fewer doubles, fewer walks, fewer strikeouts, higher BA, fewer errors, worse head-to-head) and 52 to Chicago (fewer triples, more shutouts, lower ERA, more recent postseason), but it saw the Dodgers beating the Nationals.
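The Giants-Cubs totals can be checked by adding up the Division Series weights for the categories each team won; each of the eleven categories goes to one team or the other. The category strings are my shorthand:

```python
# Division Series points per category, from the list above.
DS_POINTS = {
    "fewer runs": 5, "fewer doubles": 12, "fewer triples": 13, "fewer walks": 12,
    "fewer strikeouts": 21, "higher BA": 6, "fewer errors": 10, "more shutouts": 23,
    "lower ERA": 6, "more recent postseason": 10, "worse head-to-head": 5,
}

giants = ["fewer runs", "fewer doubles", "fewer walks", "fewer strikeouts",
          "higher BA", "fewer errors", "worse head-to-head"]
cubs = ["fewer triples", "more shutouts", "lower ERA", "more recent postseason"]

giants_total = sum(DS_POINTS[c] for c in giants)
cubs_total = sum(DS_POINTS[c] for c in cubs)
print(f"Giants {giants_total}, Cubs {cubs_total}")  # Giants 71, Cubs 52
```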

For the ALCS, the system gives a narrow edge to the Blue Jays over the Indians. Toronto gets points for fewer doubles, fewer triples, more homers, higher OBP, fewer pitcher strikeouts, and a lower ERA. The Indians get credit for a better record, more runs, higher BA and SLG, and fewer double plays. The two teams’ pitchers issued the same number of walks, so neither team gets those points.

It favors the Cubs (better record, more runs, more homers, higher BA, OBP, and SLG, fewer pitcher strikeouts, lower ERA) over the Dodgers (fewer doubles, fewer triples, fewer pitcher walks, fewer double plays) in the NLCS.
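Tallying the categories named for each team with the Championship Series weights gives the totals below, as a quick sketch of the arithmetic (the category strings are my shorthand):

```python
# Championship Series points per category, from the list above.
CS_POINTS = {
    "more runs": 10, "fewer doubles": 6, "fewer triples": 14, "more homers": 13,
    "higher BA": 12, "higher OBP": 10, "higher SLG": 8, "fewer double plays": 9,
    "fewer pitcher strikeouts": 10, "fewer pitcher walks": 7, "lower ERA": 8,
    "better record": 12,
}


def tally(categories):
    """Sum the points for the categories a team wins."""
    return sum(CS_POINTS[c] for c in categories)


blue_jays = tally(["fewer doubles", "fewer triples", "more homers", "higher OBP",
                   "fewer pitcher strikeouts", "lower ERA"])
indians = tally(["better record", "more runs", "higher BA", "higher SLG",
                 "fewer double plays"])  # pitcher walks tied: nobody gets those 7
cubs = tally(["better record", "more runs", "more homers", "higher BA", "higher OBP",
              "higher SLG", "fewer pitcher strikeouts", "lower ERA"])
dodgers = tally(["fewer doubles", "fewer triples", "fewer pitcher walks",
                 "fewer double plays"])

print(f"ALCS: Toronto {blue_jays}, Cleveland {indians}")  # ALCS: Toronto 61, Cleveland 51
print(f"NLCS: Chicago {cubs}, Los Angeles {dodgers}")     # NLCS: Chicago 83, Los Angeles 36
```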

In the World Series, there are four possible matchups. The system likes the Cubs against both the Indians and the Blue Jays, Cleveland over Los Angeles, and the Dodgers over the Jays.

We’ll see how the system works as the postseason moves forward. And it goes without saying that any complaints should be addressed to Bill James, c/o Boston Red Sox.

Thank you for reading


Rob Mains

@Cran_Boy


lipitorkid

10/14

Was it refreshing to assign stats to fit a percentage goal? While time-intensive, it seems easier (and a little more fun) than coming up with a brand-new statistical model.

Thanks for updating this.


mainsr

10/14

Yeah, lipitorkid, this may be the most unwieldy spreadsheet I've ever developed. Not the largest or most complex by any means, but man, a lot of false starts and error correction and whatnot. Once I had all the data, though, it was easy to figure out which variables to use and what coefficients to give them. The nearly 70% success rate was the last thing I calculated and a nice dividend.


jdg995

10/14

Fun article. Thanks.


mainsr

10/14

I appreciate it!


Grasul

10/14

It would be amusing to know how James' original system would have worked for the 31 seasons after it was published.


mainsr

10/14

That's a good idea. Now that I have all the data, I'll work on this. Check back here in a few days.


mainsr

10/18

I ran the numbers. Here's James's original system vs. the three revised formulas in this article.
DS: James 46-38 (54.8%), new 59-24 (70.2%)
CS: James 50-40 (55.6%), new 57-33 (63.3%)
WS: James 24-21 (53.3%), new 32-13 (71.1%)

None of this, of course, should be viewed as an indictment of James. He had the creativity to come up with a cool Frankenstat, and I'm sure he would have updated the weights much the way I did if he had felt so inclined.


collins

10/14

Rather than supposing that scoring fewer runs is an advantage in the division series while scoring more runs is an advantage in the championship series and world series, isn't it likelier that this is a small sample size issue? I think it would be better to lump all the postseason series together to get a better sample size.


mainsr

10/15

Yeah, John, I thought about that, particularly with regard to the WS, as I had 45 WS in my study. But I had a pretty robust dataset for the DS (84 series) and CS (90 series). By contrast, if I did my math right, James did his study in '84 based on 105 series, total, and seven of those are from 1981, which, as you could gather from my text, I don't view as a legitimate postseason. Combining the series would absolutely yield a larger sample size, but with lower accuracy, of course. I'm OK with the DS and CS sample sizes compared to what James had, and since the results are in some cases so different (suggesting to me, e.g., that contact hitting is important in the DS, while overall offense is more important in the CS), I decided to separate them. As with Grasul's comment, I'll try running the numbers with the combined weights and let you know what I get; check back early next week.


collins

10/15

Thanks.


mainsr

10/18

OK, using the equation implied by the first table above:
14 points for more runs
13 points for fewer doubles
17 points for fewer triples
13 points for more homers
14 points for fewer batter walks
26 points for fewer batter strikeouts
23 points for higher BA
13 points for higher OBP
11 points for higher SLG
11 points for fewer errors
18 points for more shutouts
13 points for lower relative ERA
22 points for most recent experience

Here's what I get:
DS 55-29 (65.5%) using this method, 59-24 (70.2%) using mine
CS 51-39 (56.7%) using this method, 57-33 (63.3%) using mine
WS 24-21 (53.3%) using this method, 32-13 (71.1%) using mine
Overall 130-89 (59.4%) using this method, 148-70 (67.9%) using mine

You're right, not that far apart. I'm not surprised the all-in-one formula breaks down for the WS, which has the smallest sample size.

