August 25, 2005
Not unlike the old Sports Illustrated Jinx, it seems that as soon as we talk about something here at BP, things turn around. Jonah Keri covered Sunday's A's game yesterday in his Game of the Week column, but it's safe to say that the subjects of my last two columns--the Royals' losing streak and the A's winning ways--have made large U-turns in the last week. The Royals finally managed to break out of their near-record slump, and it's this subject that deserves a little more of our attention.
Last week's column was a protracted discussion of the Royals' losing streak, its historical place, and its likelihood. Unfortunately--as many readers and one fellow BP author pointed out--there was an error in that discussion, specifically in this paragraph:
"Now, let's assume for a minute that the Royals are actually a .319 team (their current winning percentage). What are the odds that they'll lose any given 18 games in a row? By binomial distribution, we know that that probability is .000984 or approximately 1015.5:1. That seems very impressive, but that's only the probability that they'll lose any given stretch of 18 games. A 162-game season can be viewed as 144 separate 18-game opportunities to lose 18 games in a row. While the Royals chances of losing any given 18-games in 1015.5:1, their chances of losing any stretch of 18 games over the course of a 162 game season is actually closer to 6.6:1. What's more, the Royals, given their .319 winning percentage, had a 50:50 chance of losing 13 games in a row at some point during the season."
The probability of the Royals losing any particular 18 games in a row is right: it is .000984. Furthermore, the odds of them losing 18 games in a row given 144 independent chances really are 6.6:1. So if those are both right, where's the error? The problem comes from looking for streaks of exactly 18 games versus streaks of 18 or more. If a team loses 18 games in a row and then loses its next contest, it now has two 18-game losing streaks, overlapping by 17 games. Thus, if you view any losing streak of 18+n games as n+1 streaks of 18 games--as my calculations did--then you're vastly overestimating the likelihood that a given team will lose at least 18 games in a row.
As Rany Jazayerli pointed out, "In other words, it's not accurate to say that the odds of not losing 18 in a row on Day X is (1-.000984) = .999016, and (.999016)^144 = the odds of not losing 18 in a row over an entire season. On Opening Day, the odds of starting an 18-game losing streak is .000984; from that day on, the odds are (.000984) * (.319)." (He also pointed out that a 162-game season provides 145 opportunities to lose 18 games, not 144.)
A more accurate formula to answer the question we were asking--how likely is a team of a given winning percentage to lose a certain number of games in a row at some point during the season?--would be this:

Season Odds = 1 - (1 - ((1-W%)^G) * W%)^(163-G)
Where W% is the team's "true" winning percentage and G is the number of games in the streak. Let's break this down into pieces to get a better idea of what's going on. 1-W% is the probability a team will lose a game, so (1-W%)^G is the probability they will lose the required number of games in a row. Multiplying by W% gives ((1-W%)^G)*W%, the likelihood that a team will start a losing streak of the required length immediately after winning the game before, so 1-((1-W%)^G)*W% is the likelihood that it won't. Requiring that win is the key component that Rany identified: it prevents us from counting a streak of G+1 games as two streaks of G games and thus inappropriately doubling the odds of a team losing that many games in a row. Raising 1-((1-W%)^G)*W% to the power of (163-G) gives us the odds that a team will not encounter the required streak at any point, and we finish by subtracting that from one. (We use 163 instead of 162 because a stretch of X games offers X+1-G chances to start a streak of G games; for example, if we wanted to know how many chances a team has to lose one game in two games, we wouldn't raise it to 2-1, we'd raise it to 3-1, since there are two chances to lose.) In effect, it's the same formula from last week, but we've multiplied the odds of the streak in any given stretch of games by the team's winning percentage before running it for the full season.
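For anyone who would rather read the arithmetic than count parentheses, here is a minimal Python sketch of the formula as described above. The function and parameter names are illustrative, and the 162-game default is an assumption added for convenience. Results may differ slightly from published figures depending on how the winning percentage is rounded and how the chances are counted.

```python
def streak_odds(win_pct: float, streak: int) -> float:
    """Chance of losing `streak` specific games in a row: (1 - W%)^G."""
    return (1.0 - win_pct) ** streak


def season_odds(win_pct: float, streak: int, season_games: int = 162) -> float:
    """Approximate chance of at least one losing streak of `streak` or more games.

    Follows the text's description, 1 - (1 - ((1-W%)^G) * W%)^(163-G),
    with 163 generalized to season_games + 1 (an assumption of this sketch).
    """
    # Chance that a streak of the required length starts right after a win.
    start_after_win = streak_odds(win_pct, streak) * win_pct
    # Number of places in the season where such a streak could begin.
    chances = season_games + 1 - streak
    # Chance of avoiding the streak at every one of those places, subtracted from one.
    return 1.0 - (1.0 - start_after_win) ** chances


if __name__ == "__main__":
    for g in (13, 18, 19):
        print(f"{g:2d} games: {season_odds(0.325, g):.3%}")
```

Keeping the streak odds in a separate function keeps the two steps of the formula visible: the chance of one particular streak, then the chance of running into it somewhere over a full season.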
Whew... Got all that? Now, what can we say about the Royals and their eventual 19-game losing streak? Using this formula, let's see what chance the Royals have of losing a given number of games or more in a row over the course of a 162-game season, given their shiny new .325 winning percentage:
W%     Streak   Streak Odds   Season Odds
.325      1       67.480%      100.000%
.325      2       45.535%      100.000%
.325      3       30.727%      100.000%
.325      4       20.734%       99.998%
.325      5       13.992%       99.933%
.325      6        9.441%       99.229%
.325      7        6.371%       96.104%
.325      8        4.299%       88.562%
.325      9        2.901%       76.550%
.325     10        1.958%       62.121%
.325     11        1.321%       47.800%
.325     12        0.891%       35.304%
.325     13        0.602%       25.305%
.325     14        0.406%       17.757%
.325     15        0.274%       12.278%
.325     16        0.185%        8.404%
.325     17        0.125%        5.713%
.325     18        0.084%        3.865%
.325     19        0.057%        2.607%
.325     20        0.038%        1.754%
.325     21        0.026%        1.179%
.325     22        0.017%        0.791%
.325     23        0.012%        0.531%
.325     24        0.008%        0.356%
.325     25        0.005%        0.239%
As opposed to last week, when I stated that the odds of the Royals losing at least 18 in a row were 6.6:1, we can now see that the chance is 3.865%, or closer to 24.9:1. Given the relative infrequency of losing streaks of this magnitude, those odds fit much better with the actual results over the history of baseball.
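The jump from a percentage to "X:1" odds is simple arithmetic: divide the chance of the event not happening by the chance of it happening. A short Python sketch, with a function name chosen purely for illustration:

```python
def odds_against(probability: float) -> float:
    """Convert a probability (0 < p <= 1) into 'X:1 against' odds."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return (1.0 - probability) / probability


if __name__ == "__main__":
    # 3.865% is the season odds quoted for a streak of 18 or more losses.
    print(f"{odds_against(0.03865):.1f}:1")  # prints 24.9:1
```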
Since last week's article came out, there has been another streak of interest: the White Sox, owners of the best record in baseball for most of the season, suddenly lost seven in a row and saw their division lead over Cleveland shrink from 12.5 games to 8.5 games. During the same period, their odds of making the playoffs shrank from 98.5% to 95.5%; since the streak began, their odds of winning the division have fallen from 94.6% to 80.2% as of yesterday. While the explanations for the slump range from Scott Podsednik's injury to general regression toward their third-order winning percentage, let's see how likely a team like the White Sox--a .615 team--is to lose seven in a row at some point during the season.
W%     Streak   Streak Odds   Season Odds
.615      1       38.525%      100.000%
.615      2       14.841%      100.000%
.615      3        5.718%       99.662%
.615      4        2.203%       88.399%
.615      5        0.849%       56.008%
.615      6        0.327%       26.935%
.615      7        0.126%       11.313%
.615      8        0.049%        4.490%
.615      9        0.019%        1.743%
.615     10        0.007%        0.671%
.615     11        0.003%        0.257%
.615     12        0.001%        0.099%
.615     13        0.000%        0.038%
.615     14        0.000%        0.014%
.615     15        0.000%        0.006%
.615     16        0.000%        0.002%
.615     17        0.000%        0.001%
(This chart ends at 17 losses since all streaks longer than that have a probability of less than 0.0005% on the season and thus show up as 0.000%.) Note how quickly the odds fall off between a four-game and a seven-game losing streak, from 88.4% to 11.3%; that is, from a near certainty to a fairly rare occurrence. But if we look at the White Sox through their third-order winning percentage (.521), their odds of losing seven in a row at some point during the season jump to 37.4%. It appears that we should not be all that surprised by the White Sox' recent turn for the worse; after all, by that measure the odds that they would drop seven in a row were better than one in three. It's just their bad fortune that it comes in the middle of a pennant race.
There's still more that can be done with understanding streakiness in baseball. Keith Woolner pointed out that "Another problem with the binomial model of game outcomes used in the article that implicitly assumes that game outcomes are independent within a streak." Given the nature of baseball scheduling, starting pitcher rotations, injuries, and any number of other factors, a team's chance of winning is likely to vary wildly from one game or series to the next. Determining the improbability of a streak depends heavily on knowing a team's winning percentage; absent that, it's difficult to say just how unlikely a sudden nose dive is. Regardless of the actual probability, the Royals' 19-game losing streak was one of the longest in baseball history, and it's a great deal more improbable than we found last week.
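One way to get a feel for how much a shifting winning percentage matters is to compute the streak probability game by game rather than with a single W%. The Python sketch below is only an illustration under stated assumptions: it still treats games as independent (so it does not address Keith's point directly), the function name is made up, and the "uneven" schedule of 81 games at .225 followed by 81 games at .425 is an entirely hypothetical team that merely averages out to .325.

```python
from typing import Sequence


def prob_losing_streak(win_probs: Sequence[float], streak: int) -> float:
    """Chance of at least one run of `streak` straight losses.

    win_probs[i] is the chance of winning game i. Games are treated as
    independent, but each game may carry its own probability.
    """
    # state[k] = probability of currently sitting on exactly k straight
    # losses without having completed the streak at any earlier point.
    state = [0.0] * streak
    state[0] = 1.0
    hit = 0.0  # probability the streak has already been completed
    for w in win_probs:
        nxt = [0.0] * streak
        nxt[0] = sum(state) * w                  # a win resets the run to zero
        for k in range(streak - 1):
            nxt[k + 1] = state[k] * (1.0 - w)    # a loss extends the run by one
        hit += state[streak - 1] * (1.0 - w)     # the loss that completes the streak
        state = nxt
    return hit


if __name__ == "__main__":
    steady = [0.325] * 162                  # same W% every game
    uneven = [0.225] * 81 + [0.425] * 81    # hypothetical split, same average
    print(f"steady: {prob_losing_streak(steady, 18):.3%}")
    print(f"uneven: {prob_losing_streak(uneven, 18):.3%}")
```

Because this version tracks the current run of losses directly, a long streak is counted only once no matter how far it extends, and the two schedules can be compared on equal footing.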