Crooked Numbers: Do Not Pass Go

As a kid, when the paper arrived I would immediately dive for the sports
section, a move closely followed by flipping past all the articles and heading
right for the “Scoreboard” page where I could scan the box scores over a bowl of
Chocolate Frosted Sugar Bombs. This frantic ritual had its pros (a quick summary
of yesterday’s action) and cons (I missed lots of headlines–for example, I didn’t
realize for two days that Buster Douglas had actually beaten Mike Tyson because
that somehow didn’t make the scoreboard page), but it did make me intimately
familiar with the old newspaper box score format.

Box scores are disappearing. While we still find them in their traditional
format in newspapers and across the Web, the ability to read several articles about
each game, daily updated stat reports, and play-by-play logs largely nullifies the
need to manually keep track of how many home runs your favorite player has or to
discern the events of each game from a very limited set of numbers. The days of trying
to figure out how a player scored a run without an AB or why another player has
one fewer plate appearance than everyone else despite being in the middle of the
order are, for the most part, gone.

As the box score loses its place, certain stats become a little harder to find. The
section just below the player-by-player lines always included a quick summary of
game events like errors, doubles, triples, home runs, and double plays. While most
of those stats are still meticulously tracked and widely discussed, there hasn’t
been as much talk lately of our old friend LOB (left on base), unless you’re an A’s
fan. That group seems to be on the verge of having to add a counter to the outfield
bleachers to keep track of all the green and gold left stranded on the bases.

Through Tuesday’s games, the A’s have scored 95 runs this season, good for second
to last in the majors, tied with Cleveland and ahead of Pittsburgh. In that same
span, they’ve left 209 men on base. That sounds like quite a lot, but let’s see how
it compares to the rest of the league. Pittsburgh, also down in the basement with
the A’s in run scoring, is at 178 LOB and 79 runs scored. Cleveland has 173 LOB and
95 runs. So the A’s aren’t terribly outside the norm when looking at their basement
compatriots.

Since offense can be thought of in two component parts, getting men on base and
driving them in, LOB can be thought of as potential runs: runners who fulfilled the
first half of that equation but not the second. Thus, looking at runs scored as a
percentage of LOB + R (called “runner scoring percentage” for now) can give an idea
of whether a team is lacking in one department or the other. While this ignores
events like double plays and runners caught stealing, it should still give us a
rough idea of which offenses are good at one component but not the other.
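
As a minimal sketch of that definition, the snippet below computes runner scoring percentage as R / (LOB + R) from the run and LOB totals quoted above. The function name `runner_scoring_pct` is an illustrative choice, not something from the original analysis.

```python
def runner_scoring_pct(runs: int, lob: int) -> float:
    """Runs scored as a percentage of potential runs (LOB + R)."""
    potential = runs + lob
    if potential == 0:
        raise ValueError("team has no runs and no runners left on base")
    return 100.0 * runs / potential


# (R, LOB) through Tuesday's games, as quoted in the article.
teams = {"OAK": (95, 209), "PIT": (79, 178), "CLE": (95, 173)}

for name, (runs, lob) in teams.items():
    print(f"{name}: {runner_scoring_pct(runs, lob):.2f}%")
```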

So far in 2005, the Pirates have plated 30.74% of their potential runs. The A’s have
scored 31.25%, the Indians 35.45%. Just as before, the A’s look to be in the
middle of the pack when compared to the other two meager offenses this year.
But compared to every team since 1990, the Pirates and A’s come in dead last.
In the past 16 years, no team has been as bad at driving in the runners it puts
on base as these two. For comparison’s sake, here are the top and bottom
ten teams since 1990 in runner scoring percentage (R%):


YEAR  TEAM  LOB   R    AVG   OBP   SLG   R%
----  ---- ----  ---   ---   ---   ---  -----
1994  CLE   762  679  .290  .348  .484  47.12
2000  CHA  1127  978  .286  .352  .470  46.46
1996  COL  1108  961  .287  .350  .472  46.45
2004  CHA  1031  865  .268  .330  .457  45.62
1995  CLE  1018  840  .291  .358  .479  45.21
1996  BAL  1154  949  .274  .348  .472  45.13
1997  COL  1124  923  .288  .353  .478  45.09
2005  DET   161  132  .273  .331  .433  45.05
1999  CLE  1234 1009  .289  .370  .467  44.98
2000  COL  1198  968  .294  .358  .455  44.69
----  ---- ----  ---   ---   ---   ---  -----
1990  PHI  1242  646  .255  .324  .363  34.22
2003  LAN  1108  574  .243  .299  .368  34.13
1992  CHN  1148  593  .254  .303  .364  34.06
1990  SLN  1164  599  .256  .316  .358  33.98
1990  HOU  1132  573  .242  .309  .345  33.61
1992  BOS  1215  599  .246  .318  .347  33.02
1993  FLO  1189  582  .248  .311  .346  32.86
1992  LAN  1138  548  .248  .308  .339  32.50
2005  OAK   209   95  .237  .311  .338  31.25
2005  PIT   178   79  .230  .299  .359  30.74

When Will Carroll asked me about this on BP radio two weeks ago, I
quickly responded that the reason for the A’s struggles is their reliance on players
with high on-base percentages (OBP) but not necessarily high slugging percentages
(SLG). The A’s in particular have sought out players with deceptively high OBPs
built mostly on walks rather than batting average (AVG), and a team that reaches
base on walks would seem likely to strand more runners than one that reaches base
on hits. The reason is quite simple: It’s hard to take the extra base on a walk.
But the Pirates don’t necessarily fall into that category. Their team line of
.230/.299/.359 is objectively terrible, but their SLG and isolated power (ISO)
are both higher than the A’s (.237/.311/.338) by margins of .021 and .028, and the
Pirates have a higher ISO than any team in the bottom 10 in runner scoring
percentage. If anything, Pittsburgh’s better power numbers should mean that they
would score a higher percentage of their runners on base than the A’s, but that’s
not the case.
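
The ISO comparison can be checked directly from the table. The sketch below, which assumes the usual definition ISO = SLG - AVG, ranks the bottom ten teams by isolated power; the helper name `iso` is illustrative.

```python
# (year, team, AVG, SLG) for the bottom ten teams in the table above.
bottom_ten = [
    (1990, "PHI", .255, .363),
    (2003, "LAN", .243, .368),
    (1992, "CHN", .254, .364),
    (1990, "SLN", .256, .358),
    (1990, "HOU", .242, .345),
    (1992, "BOS", .246, .347),
    (1993, "FLO", .248, .346),
    (1992, "LAN", .248, .339),
    (2005, "OAK", .237, .338),
    (2005, "PIT", .230, .359),
]


def iso(avg: float, slg: float) -> float:
    """Isolated power, rounded to three places like the other rate stats."""
    return round(slg - avg, 3)


# Highest ISO first.
ranked = sorted(bottom_ten, key=lambda row: iso(row[2], row[3]), reverse=True)
for year, team, avg, slg in ranked:
    print(year, team, f"{iso(avg, slg):.3f}")
```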

Before jumping to any conclusions based on limited amounts of data, let’s expand
things to the full 15+ years’ worth of data we’ve got on hand. Of the three major
rate stats, SLG has the highest correlation to runner scoring percentage, meaning
that a team’s slugging percentage tracks changes in runner scoring percentage more
closely than either AVG or OBP does. In this case the correlation is positive, meaning
the higher the slugging percentage, the higher the runner scoring percentage.
Doing the same analysis for AVG and OBP reveals that all three stats have solid
positive correlations; so as offense increases overall, the percentage of runners
on base who score increases as well. This is just a logical extension of the
fact that there are only three bases where runners can be stranded, so as teams put
more runners on those bases, more of them have to score.
Essentially, teams can only strand up to three runners per inning, but they can
hypothetically score an infinite number of runs.
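
To make the correlation step concrete, here is a minimal sketch that computes a Pearson correlation between each rate stat and R% using only the 20 team-seasons printed in the table above. That sample is an assumption made for illustration: the article’s analysis covers every team since 1990, and a sample made up of only the best and worst teams will overstate the strength of the relationship. The `pearson` helper is hypothetical, not the author’s code.

```python
from math import sqrt

# (AVG, OBP, SLG, R%) for the 20 team-seasons in the table above.
rows = [
    (.290, .348, .484, 47.12), (.286, .352, .470, 46.46),
    (.287, .350, .472, 46.45), (.268, .330, .457, 45.62),
    (.291, .358, .479, 45.21), (.274, .348, .472, 45.13),
    (.288, .353, .478, 45.09), (.273, .331, .433, 45.05),
    (.289, .370, .467, 44.98), (.294, .358, .455, 44.69),
    (.255, .324, .363, 34.22), (.243, .299, .368, 34.13),
    (.254, .303, .364, 34.06), (.256, .316, .358, 33.98),
    (.242, .309, .345, 33.61), (.246, .318, .347, 33.02),
    (.248, .311, .346, 32.86), (.248, .308, .339, 32.50),
    (.237, .311, .338, 31.25), (.230, .299, .359, 30.74),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


scoring_pct = [row[3] for row in rows]
correlations = {
    label: pearson([row[i] for row in rows], scoring_pct)
    for i, label in enumerate(("AVG", "OBP", "SLG"))
}
for label, r in correlations.items():
    print(f"{label}: r = {r:.3f}")
```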

So if all three major rate stats have positive correlations to runner scoring
percentage, we cannot say that teams with high OBP and low SLG will have a lower
runner scoring percentage–not exactly, anyway. Because both OBP and SLG encompass AVG to
some degree, the positive correlation of AVG may be overshadowing what we’re really
looking for. Instead, we can run a multivariable regression using all three major
rate stats against runner scoring percentage. Doing so yields the following
equation:

Runner scoring percentage = 0.40*AVG - 0.21*OBP + 0.69*SLG + 0.07 (+/- 0.01)

Note that when AVG and SLG are included in the regression, OBP actually has a
negative effect on runner scoring percentage. This is exactly what we suspected:
if AVG and SLG are held steady, increasing OBP (in this case only in the form of
walks because AVG is constant) results in more baserunners, but not nearly as many
runs as we’d expect if those baserunners reached on hits instead of walks. This
doesn’t mean that walks are a bad thing; it just means that teams with a
disproportionate percentage of their baserunners coming on walks will have a higher
percentage of their baserunners left on base than teams whose baserunners come from
hits.
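
A small sketch of that thought experiment: plug a team line into the fitted equation, then raise OBP by ten points while holding AVG and SLG fixed. The coefficients are the rounded ones printed above, so values from this function may differ somewhat from the forecasts the article quotes; the function name is illustrative.

```python
def predicted_runner_scoring(avg: float, obp: float, slg: float) -> float:
    """Fitted equation using the coefficients as printed (rounded)."""
    return 0.40 * avg - 0.21 * obp + 0.69 * slg + 0.07


# A's 2005 team line, then the same line with ten more points of OBP
# (extra walks only, since AVG and SLG are unchanged).
base = predicted_runner_scoring(0.237, 0.311, 0.338)
more_walks = predicted_runner_scoring(0.237, 0.321, 0.338)

print(f"base:       {base:.4f}")
print(f"more walks: {more_walks:.4f}")
print(f"change:     {more_walks - base:+.4f}")
```

With these coefficients, each ten points of walk-driven OBP lowers the predicted rate by 0.0021, or roughly a fifth of a percentage point: a small effect, but in the direction the paragraph describes.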

Believe it or not, there’s actually some hope here for A’s and Pirates fans (and
even Cleveland fans). Instead of using AVG, OBP, and SLG in the regression, we can
try to remove the AVG component of OBP and SLG. For SLG, this is simply ISO. For
OBP, it can be a little trickier because the denominators for the two stats are
different, but to keep things simple, we’ll just use OBP-AVG and call it ISO_BB for
now. It’s not technically correct, but it still gives us a good idea of teams
whose OBP is built more on walks than hits. Running things again, we now get this
formula with a virtually identical correlation:

Runner scoring percentage = 0.88*AVG - 0.21*ISO_BB + 0.69*ISO + 0.07 (+/- 0.01)

Once again, the walks component of offense results in more baserunners but not
the corresponding number of runs based on runner scoring percentage. Note that the
coefficient for AVG has gone way up while the other two have remained very similar.
As mentioned above, the Pirates (.230), A’s (.237), and Indians (.226) have
struggled mightily in the batting average department. All three teams are likely
to see significant improvements in those numbers as the season moves along. As
their batting average increases, all three teams will start to score a higher
percentage of the runners they put on base.
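
The jump in the AVG coefficient is what substitution would predict. As a sketch, using the definitions ISO_BB = OBP - AVG and ISO = SLG - AVG and the coefficients as printed:

```latex
\begin{aligned}
\text{OBP} &= \text{AVG} + \text{ISO\_BB}, \qquad \text{SLG} = \text{AVG} + \text{ISO} \\
0.40\,\text{AVG} - 0.21\,\text{OBP} + 0.69\,\text{SLG}
  &= 0.40\,\text{AVG} - 0.21\,(\text{AVG} + \text{ISO\_BB}) + 0.69\,(\text{AVG} + \text{ISO}) \\
  &= (0.40 - 0.21 + 0.69)\,\text{AVG} - 0.21\,\text{ISO\_BB} + 0.69\,\text{ISO} \\
  &= 0.88\,\text{AVG} - 0.21\,\text{ISO\_BB} + 0.69\,\text{ISO}
\end{aligned}
```

Read this way, the second equation looks like a re-expression of the first, which would explain the virtually identical fit. The larger AVG coefficient then reflects that a point of batting average also carries the OBP and SLG it contributes.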

(The other major point made frequently by the mainstream media is a team’s performance with
runners in scoring position. On the whole, teams tend to bat about as well with
runners in scoring position as without, and there don’t appear to be any team
characteristics that indicate a group will bat better or worse than expected with
runners in scoring position. Part of the A’s and Pirates’
struggles is their ineptitude with runners on second or third, but those numbers
aren’t far out of line with their overall offensive performance and lend little to
no additional information about runner scoring percentages.)

As with any regression formula, forecasts for outliers are going to involve some
extreme regression to the mean. In this case, the Pirates, instead of scoring
30.74% of their runners, would be expected to score 35.07%. The A’s increase from
31.25% to 33.64%. (Note that because of their higher-OBP, lower-SLG numbers
compared to the Pirates, the A’s don’t increase nearly as much. Scoring 33.64% of
their runners would still rank them fifth worst since 1990.) Applying those
percentages to their actual LOB + R totals, the Pirates would be forecast to score
90 runs instead of their actual 79; the A’s jump to 102 instead of 95. The Indians,
however, are already scoring 35.45% of their runners, very close to their predicted
36.37%, a net of only two more runs. In Cleveland’s case, it isn’t that
the offense can’t get men home, it’s that it can’t get them on base in the first
place.
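
The run forecasts follow from simple arithmetic: multiply each team’s predicted runner scoring percentage by its actual LOB + R. A minimal sketch, using the predicted percentages quoted above rather than recomputing them from the rounded regression coefficients (the function name `forecast_runs` is illustrative):

```python
# (LOB, R, predicted runner scoring % as quoted in the article)
teams = {
    "PIT": (178, 79, 35.07),
    "OAK": (209, 95, 33.64),
    "CLE": (173, 95, 36.37),
}


def forecast_runs(lob: int, runs: int, predicted_pct: float) -> float:
    """Apply a predicted scoring percentage to the team's actual LOB + R."""
    return predicted_pct / 100 * (lob + runs)


for name, (lob, runs, pct) in teams.items():
    expected = forecast_runs(lob, runs, pct)
    print(f"{name}: {expected:.0f} forecast vs {runs} actual ({expected - runs:+.0f})")
```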

The A’s and Pirates are better offenses than they’ve shown so far this year, and
expecting them to maintain both their poor overall offensive pace and their poor
ability to score runners on base is like expecting Brian Roberts
to hit 48 home runs. Both teams should see a rebound, both because their team
batting averages should increase and because they’ve been underperforming their
expected runner scoring percentage so far this year. For now, they’ve both dug
themselves quite a hole, and it may be a while before they climb out of it.


James Click
