April 27, 2012
The BP Wayback Machine
While looking toward the future with our comprehensive slate of current content, we'd also like to recognize our rich past by drawing upon our extensive (and mostly free) online archive of work dating back to 1997. In an effort to highlight the best of what's gone before, we'll be bringing you a weekly blast from BP's past, introducing or re-introducing you to some of the most informative and entertaining authors who have passed through our virtual halls. If you have fond recollections of a BP piece that you'd like to nominate for re-exposure to a wider audience, send us your suggestion.
The Royals ended a 12-game losing streak on Wednesday, but that wasn't nearly their longest in recent memory. To refresh your memory on the Royals' futility and the odds of long losing streaks, take another look at the article reproduced below, which originally ran as a "Crooked Numbers" column on August 18, 2005.
There's bad, there's the Colorado Rockies, and then there's the Kansas City Royals. If you're into the Jayson Stark "Useless Info" columns, you could easily notch thousands of words about how bad the Royals have been for the past decade or more, a situation only highlighted by their recent losing streak. It's a tough time to be a Royals fan and if you're one of the few, the proud, read on and perhaps you'll feel a little bit better about your team.
To start, let's get some perspective. The Royals' streak of 18 straight losses is not the worst run of baseball of all time. The worst losing streak in the major leagues since 1901 belongs to the 1961 Phillies, who managed to lose 23 games in a row from 7/29/61 to 8/20/61. Interestingly, it could have been a lot worse; the Phillies lost five in a row just before the streak, so they actually lost 28 of 29 games in what may very well be the worst month any team has ever had. Here are the rest of the worst:
Year  Team                   Games
1961  Philadelphia Phillies  23
1988  Baltimore Orioles      21
1969  Montreal Expos         20
1943  Philadelphia A's       20
1916  Philadelphia A's       20
1906  Boston Red Sox         20
1975  Detroit Tigers         19
1914  Cincinnati Reds        19
1906  Boston Braves          19
The Royals are on the cusp of greatness, but they're not quite there yet. But how bad is this streak? People have a tendency to grasp onto streaks because they're easily quantified. A team that's lost 15 games in a row is clearly worse than a team that's lost 12 or 10 games in a row. But streaks are as easily broken as they are quantified. Take baseball's greatest streak of all time:
To determine this, we need to get into some binomial distributions. If we assume that DiMaggio had a "true" probability of getting a hit in a game, then the question becomes quite simple. However, that's not quite true because we should instead assume he had a probability of getting a hit in an at bat; as such, things become much more complicated. Instead, let's get back to teams and winning games. The odds that a team with a winning percentage w will win any x number of games in a row is simply w^x. Conversely, the odds that they will lose any y number of games in a row is (1-w)^y. This is a binomial distribution, but a very simple one.
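As a minimal sketch of the w^x and (1-w)^y arithmetic, assuming every game is independent and the team's winning percentage never changes (the function names are illustrative, not from the original column):

```python
def win_streak_prob(w: float, x: int) -> float:
    """Chance that a team with per-game win probability w wins x given games in a row."""
    return w ** x


def loss_streak_prob(w: float, y: int) -> float:
    """Chance that the same team loses y given games in a row."""
    return (1.0 - w) ** y


# A .500 team loses three given games in a row one time in eight.
print(loss_streak_prob(0.5, 3))  # 0.125
```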
If, however, the goal is to determine if a team will win at least 2 of 3 games, the formula becomes more complex because more situations meet the standards for success. If the team wins all three games, wins the first two, wins the last two, or wins the first and last games, it has succeeded, so all four situations must be counted. The odds of the team winning two particular games and losing the third are w^2 * (1-w), but since there are three ways in which the team can win exactly two out of three games, that result has to be multiplied by three. The product is then added to the odds that they will win all three, w^3, giving 3 * w^2 * (1-w) + w^3.
However, the key to the puzzle is Pascal's Triangle, a tool that reveals the binomial coefficient by which each result must be multiplied--three, in the case above. Essentially, the triangle shows how many different ways the final counted result can be achieved by different distributions of the binomial choice. There are three different ways the team can win two games and lose one, but only one way in which they can win all three. This is also referred to as "x choose y"--essentially, if one is faced with the decision to choose y games out of x total games, how many possible combinations add up to y.
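The "at least 2 of 3" calculation and the Pascal's Triangle coefficients can be sketched with Python's standard `math.comb`. The function names are illustrative, and the sketch assumes independent games with a fixed win probability w:

```python
from math import comb


def prob_exactly(w: float, n: int, k: int) -> float:
    """Chance of exactly k wins in n games: (n choose k) * w^k * (1-w)^(n-k)."""
    return comb(n, k) * w ** k * (1.0 - w) ** (n - k)


def prob_at_least(w: float, n: int, k: int) -> float:
    """Chance of k or more wins in n games, summing every qualifying outcome."""
    return sum(prob_exactly(w, n, j) for j in range(k, n + 1))


# Row 3 of Pascal's Triangle: ways to win 0, 1, 2, or 3 of 3 games.
print([comb(3, k) for k in range(4)])  # [1, 3, 3, 1]

# A .500 team: 3 * 0.5^2 * 0.5 + 0.5^3
print(prob_at_least(0.5, 3, 2))  # 0.5
```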
Getting back to the Royals, we run into another problem when estimating the probability of their losing streak or comparing it to other losing stretches: what is the Royals' probability of winning an individual game? In baseball, we typically assume that this probability is a team's winning percentage. Isn't this what the whole regular season is about, determining who's the best team by who has the highest probability of winning a baseball game? But as Keith Woolner reminded us before the season, 162 games isn't enough time to properly discern a team's "true" winning percentage, the probability that they will win any given game.
This is to say nothing of the fact that a team's probability of winning any given game is not a constant. Winning probability is affected by any number of factors: whether the team is home or away, who the starting pitchers are, who's in the lineup, who the opposing team is, and any number of other factors. We generally like to assume that those kinds of breaks even out over the course of 162 games, but if they did, then BP's Quality of Batters Faced and Quality of Pitchers Faced reports and all that hand-wringing about the unbalanced schedule and the wild card would be for naught.
While a team's winning percentage over 162 is the best guess we have about their true probability, we must admit that there is an overwhelming probability that that number is wrong. It's going to be close, but the odds that the next 162 games would fall exactly the same as the previous 162 is minuscule. It's possible that by using the full season's winning percentage as a guide for a team's true probability of winning any single game, we're making such stretches of losing appear easier, but using only those games not involved in the streak would be an arbitrary removal of data. Thus, the full-season winning percentage is as close as we can get, so we'll stick with that.
Caveats aside, let's see what we can do about estimating just how bad the Royals are. First, let's take a look at the probability that a team with a given probability of winning each individual game will lose a certain number of games in a row.
In this graph, there are five hypothetical teams with winning percentages between .500 and .300. Note that their odds of losing the first game are exactly the complement of their winning percentages (one minus the winning percentage), as we'd expect. As the losses pile up, the probabilities decrease dramatically, to the point that by the time we get to 13 or 14 losses, it's nearly impossible to tell the difference between a .500 team and a .300 team. This is encouraging because it means that with streaks of the Royals' magnitude, the winning probability of the team doesn't make that much difference and we can continue knowing that our errors will be small in this regard.
Now, let's assume for a minute that the Royals are actually a .319 team (their current winning percentage). What are the odds that they'll lose any given 18 games in a row? By binomial distribution, we know that that probability is .000984 or approximately 1015.5:1. That seems very impressive, but that's only the probability that they'll lose any given stretch of 18 games. A 162-game season can be viewed as 144 separate 18-game opportunities to lose 18 games in a row. While the Royals' chances of losing any given 18 games are 1015.5:1, their chances of losing any stretch of 18 games over the course of a 162-game season are actually closer to 6.6:1. What's more, the Royals, given their .319 winning percentage, had a 50:50 chance of losing 13 games in a row at some point during the season.
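The arithmetic in this paragraph can be sketched as follows. The .000984 figure and the 144 windows come from the text; the function names are illustrative. The sketch keeps the paragraph's own simplification of treating every 18-game window as a separate, independent trial.

```python
def odds_against(p: float) -> float:
    """Convert a probability p into 'X:1 against' odds."""
    return (1.0 - p) / p


def any_window_prob(p_window: float, windows: int) -> float:
    """Chance that at least one of `windows` independent trials succeeds."""
    return 1.0 - (1.0 - p_window) ** windows


p18 = 0.000984                      # single-window probability quoted in the text
season = any_window_prob(p18, 144)  # 144 windows, as counted in the text

print(f"single window: about {odds_against(p18):.0f}:1")
print(f"whole season:  about {odds_against(season):.1f}:1")
```

The season-long figure should come out near the 6.6:1 quoted above; the single-window odds may differ from the text in the last digit because .000984 is itself rounded.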
How does that compare to historical streaks? Obviously it's not as bad as the '61 Phillies, but it's possible that the Royals are breaking up several losing streaks with lone wins to make things look better. So instead of looking simply at streaks, let's see how bad the Royals are over a given stretch of games. For example, the Royals are 14-40 over their last 54 games, but let's round it off to 50--in which they were 13-37. Compare that to the worst 50 game stretches since 1901:
YEAR  TEAM  W  L   Win_Pct  Prob   Ratio   InSeason  Ratio
1916  PHA   4  46  .235     0.41%  240.6   37.15%    1.7
1937  PHA   7  43  .358     0.06%  1796.7  6.04%     15.6
1932  BOS   8  42  .279     3.74%  25.7    98.60%    0.0
1915  PHA   8  42  .283     3.31%  29.2    97.70%    0.0
1961  PHI   8  42  .305     1.51%  65.2    81.80%    0.2
2004  ARI   8  42  .315     1.05%  94.1    69.38%    0.4
1943  PHA   8  42  .318     0.92%  107.3   64.60%    0.5
1949  WS1   8  42  .325     0.71%  138.9   52.22%    0.8
1996  DET   8  42  .327     0.65%  153.5   51.67%    0.9
1979  OAK   8  42  .333     0.50%  197.7   43.17%    1.3
1907  SLN   8  42  .333     0.50%  197.7   43.17%    1.3
1923  BSN   8  42  .351     0.24%  413.6   23.70%    3.2
1982  MIN   8  42  .370     0.10%  1011.8  10.47%    8.5
The Royals are not even close. Getting back to our original question, what's more impressive: the Philadelphia A's going 4-46 over 50 games or the Philadelphia Phillies losing 23 in a row in 1961? Getting back to our binomial distributions, the probability of a .235 team--the '16 A's--winning 4 games or fewer in a given 50-game stretch is about 240.6:1 and over a season is a mere 1.7:1. This year's Royals--by virtue of their .319 winning percentage--have a 148.9:1 chance of matching that feat any time in a season (16,733.9:1 in any 50-game stretch). As mentioned, the odds of the Royals losing 18 games in a row at any point in the season is 6.6:1. Expand that to 50 games and the Royals would have to lose 44 of 50 to match those odds. Furthermore, while there have been several streaks longer than the Royals', only the '37 Athletics and '82 Twins can claim more improbable stretches of bad baseball since 1901.
The Royals' streak is already more improbable than all but 2 stretches of 50 games since 1901 as well as those few teams that notched longer pure streaks than they did. But each game that the Royals lose makes their stretch more and more improbable, likely vaulting them past those few teams remaining ahead of them. Is there some solace to be taken in the fact that the Royals' improbably bad stretch was over one 18-game stretch and not a 50-game valley? Maybe, but if you're looking for the most improbable losing streak in baseball, the Royals are certainly making a case.
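The "4 games or fewer in a given 50-game stretch" figure is a cumulative binomial probability. A minimal sketch, assuming independent games and using the rounded .235 winning percentage from the table (the function name is illustrative):

```python
from math import comb


def prob_at_most(w: float, n: int, k: int) -> float:
    """Chance of k or fewer wins in n games for a team with win probability w."""
    return sum(comb(n, j) * w ** j * (1.0 - w) ** (n - j) for j in range(k + 1))


# The 1916 A's: 4 or fewer wins in a given 50-game stretch at a .235 clip.
p = prob_at_most(0.235, 50, 4)
print(f"{p:.2%}")
```

The result should land close to the 0.41% listed in the Prob column; a small gap is expected because .235 is a rounded winning percentage.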
Note: the following is an excerpt from James Click's follow-up article, "Going Streaking," which appeared as a "Crooked Numbers" column on August 25, 2005.
Not unlike the old Sports Illustrated Jinx, it seems that as soon as we talk about something here at BP, things turn around. Jonah Keri covered Sunday's A's game yesterday in his Game of the Week column, but it's safe to say that my last two columns--about the Royals' losing streak and the A's winning ways--have made large U-turns in the last week. The Royals managed to finally break out of their near-record slump and it's this subject that deserves a little more of our attention.
Last week's column was a protracted discussion of the Royals' losing streak, its historical place, and a discussion of its likelihood. Unfortunately--as many readers and one fellow BP author pointed out--there was an error in the discussion, specifically this paragraph:
"Now, let's assume for a minute that the Royals are actually a .319 team (their current winning percentage). What are the odds that they'll lose any given 18 games in a row? By binomial distribution, we know that that probability is .000984 or approximately 1015.5:1. That seems very impressive, but that's only the probability that they'll lose any given stretch of 18 games. A 162-game season can be viewed as 144 separate 18-game opportunities to lose 18 games in a row. While the Royals chances of losing any given 18-games in 1015.5:1, their chances of losing any stretch of 18 games over the course of a 162 game season is actually closer to 6.6:1. What's more, the Royals, given their .319 winning percentage, had a 50:50 chance of losing 13 games in a row at some point during the season."
The probability of the Royals losing any particular 18 games in a row is right: it is .000984. Furthermore, the odds of them losing 18 games in a row given 144 chances is actually 6.6:1. So if those are both right, where's the error? The problem comes from looking for streaks of exactly 18 games versus streaks of 18 or more. If a team loses 18 games in a row and then loses their next contest, they now have two 18-game losing streaks, overlapping by 17 games. Thus, if you view any losing streak of 18+n games as n+1 streaks of 18 games--as my calculations did--then you're vastly overestimating the likelihood that a given team will lose at least 18 games in a row.
As Rany Jazayerli pointed out, "In other words, it's not accurate to say that the odds of not losing 18 in a row on Day X is (1-.000984) = .999016, and (.999016)^144 = the odds of not losing 18 in a row over an entire season. On Opening Day, the odds of starting an 18-game losing streak is .000984; from that day on, the odds are (.000984) * (.319)." (He also pointed out that a 162-game season provides 145 opportunities to lose 18 games, not 144.)
A more accurate formula to answer the question we were asking--how likely is a team of a given winning percentage to lose a certain number of games in a row at some point during the season?--would be this:
1 - (1 - ((1-W%)^G) * W%)^(163-G)

Where W% is the team's "true" winning percentage and G is the number of games in the streak. Let's break this down into pieces to get a better idea of what's going on. 1-W% is the odds a team will lose a game, so (1-W%)^G is the odds they will lose the required number of games in a row. Then, ((1-W%)^G)*W% is the likelihood that a team will start a losing streak of the required number of games after winning the game before. This is the key component that Rany identified that prevents us from counting streaks of G+1 games as two streaks of G games and thus inappropriately doubling the odds of a team losing that many games in a row. Subtracting that from one gives 1 - ((1-W%)^G)*W%, the odds that no such streak starts at a given point, and raising it to the power of (163-G) gives us the odds that a team will not encounter the required streak at any point; thus we finish by subtracting from one. (We use 163 instead of 162 because any number of games X is X+1 chances to have the required streak; for example, if we wanted to know how many chances a team has to lose one game in two games, we wouldn't raise it to 2-1, we'd raise it to 3-1 since there are two chances to lose.) In effect, it's the same formula from last week, but we've multiplied the odds of the streak in any given number of games by the team's winning percentage before running it for the full season.
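The formula as described in this paragraph can be sketched in a few lines. The function and parameter names are illustrative, and the sketch follows the text's (163-G) exponent; it is an approximation under the stated assumptions, not an exact streak probability.

```python
def season_streak_prob(w: float, g: int, games: int = 162) -> float:
    """Approximate chance of a losing streak of g or more games in a season.

    w is the team's assumed "true" winning percentage; games are treated as
    independent, following the formula described in the text.
    """
    start = ((1.0 - w) ** g) * w   # a win followed by g straight losses
    chances = games + 1 - g        # the (163 - G) exponent from the text
    return 1.0 - (1.0 - start) ** chances


for g in (10, 18, 19):
    print(g, f"{season_streak_prob(0.325, g):.3%}")
```

Small differences from the published figures are possible, since the .325 winning percentage is rounded.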
Whew... Got all that? Now, what can we say about the Royals and their eventual 19-game losing streak? Using this formula, let's see what chances the Royals have of losing a given number of games or more in a row over the course of a 162 game season given their shiny new .325 winning percentage:
W%    Streak  Odds     Season Odds
.325  1       67.480%  100.000%
.325  2       45.535%  100.000%
.325  3       30.727%  100.000%
.325  4       20.734%  99.998%
.325  5       13.992%  99.933%
.325  6       9.441%   99.229%
.325  7       6.371%   96.104%
.325  8       4.299%   88.562%
.325  9       2.901%   76.550%
.325  10      1.958%   62.121%
.325  11      1.321%   47.800%
.325  12      0.891%   35.304%
.325  13      0.602%   25.305%
.325  14      0.406%   17.757%
.325  15      0.274%   12.278%
.325  16      0.185%   8.404%
.325  17      0.125%   5.713%
.325  18      0.084%   3.865%
.325  19      0.057%   2.607%
.325  20      0.038%   1.754%
.325  21      0.026%   1.179%
.325  22      0.017%   0.791%
.325  23      0.012%   0.531%
.325  24      0.008%   0.356%
.325  25      0.005%   0.239%
As opposed to last week when I stated the odds of the Royals losing at least 18 in a row was 6.6:1, now we can see that it's 3.865%, or closer to 24.9:1. Given the relative infrequency of losing streaks of this magnitude, those odds fit much better with the actual results over the history of baseball.
There's still more that can be done with understanding streakiness in baseball. Keith Woolner pointed out that "Another problem with the binomial model of game outcomes used in the article [is] that [it] implicitly assumes that game outcomes are independent within a streak." Given the nature of baseball scheduling, starting pitcher rotations, injuries, and any number of other factors, a team's winning percentage is likely to vary wildly when looking at each individual game or series. Determining the improbability of a streak depends heavily on a team's winning percentage and absent that, it's difficult to say just how unlikely a sudden nose dive is. Regardless of the actual probability, the Royals' 19-game losing streak was one of the longest in baseball history and it's a great deal more improbable than we found last week.