Crooked Numbers: Sizing Up Small Sample Size

Every year is a fresh start. For teams and players alike, the results of
a winter’s worth of work are finally on display. Despite all the offseason
changes, most of baseball stays the same from year to year, but
there is an adjustment period early on as teams and players settle
into the season.

Small sample size doesn’t mean no sample. While there’s meaning
in how a team starts off, it’s also important to determine whether
the early parts of the season can be deceptive for reasons other than
the lack of sufficient data, especially when considering individual
player performances. There’s already evidence that hitters
tend to perform better in the first half of the season than in the
second half. There’s the conventional wisdom that pitchers dominate in
the colder months early in the season while August is when the bats wake
up. Then there are the A’s fans who keep looking at Barry
Zito’s 4.51 ERA the last three Aprils followed by five months
of 2.74, 3.80, 3.46, 3.13, and 3.34.

April results that don’t fit the public perception are usually
attributed to some change discovered by the media looking for the cause.
A hot start by a hitter is attributed to a change in batting stance,
weight, or physique. This year’s example is Eric Hinske,
whose new stance is the easy answer to his hot start. With pitchers,
learning or mastering a new pitch or changing the delivery are the easy
answers for early success. Teams off to hot starts have new veteran
leadership or youthful exuberance.

Inherent in a lot of this discussion is the idea that other players
have yet to adjust to the changes in their opponents. In the matchup of
batter and pitcher, we usually assume that the pitcher benefits more
from deception and lack of information than the hitter. Pitchers never
before seen by hitters can hide the ball in different ways, mix in
unexpected pitches, and throw off a hitter’s timing with a new windup.
Warren Spahn put it best: “Hitting is timing; pitching
is upsetting timing.”

One quick way to determine whether pitchers enjoy an early-season
advantage is to look at league-wide stats broken down by month:

The trend varies by year. In 2000, ERA declined steadily throughout
the season until September; ERA peaked in June in 2001 and 2003, in
July in 2002, and in August in 2004. More importantly,
there doesn’t appear to be any distinct trend towards lower ERAs in
April; if anything, there’s a slight dip in May and a rise in June in
four of the five seasons, perhaps as teams begin to sort out who’s
playing well and who’s not before the hitters catch up in the hotter months.
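For anyone who wants to reproduce a monthly split like this, here is a minimal Python sketch. The record layout and the sample numbers are made up for illustration and are not the data behind this column. The one point worth showing is that a league ERA should be built from summed innings and earned runs, not from an average of individual ERAs.

```python
from collections import defaultdict


def era(earned_runs, innings):
    """Earned run average: earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings


def league_era_by_month(lines):
    """Sum innings and earned runs per month, then compute one ERA per month.

    `lines` is an iterable of (month, innings, earned_runs) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # month -> [innings, earned runs]
    for month, innings, earned_runs in lines:
        totals[month][0] += innings
        totals[month][1] += earned_runs
    return {month: era(er, ip) for month, (ip, er) in totals.items()}


# Hypothetical pitching lines: (month, innings, earned runs)
sample = [
    ("April", 6.0, 3),
    ("April", 3.0, 1),
    ("May", 7.0, 2),
    ("May", 2.0, 1),
]
monthly = league_era_by_month(sample)
```

With these invented lines, April works out to four earned runs over nine innings and May to three over nine.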

Month-by-month ERA may not be the best indicator of any inherent
advantage by newer pitchers or pitchers who have changed their
repertoire since last season. Instead, let’s break things down by a
pitcher’s starts against a particular team. To do so, I’ll look at each
pitcher’s performance broken down by the number of times he’s seen that
team, including the current appearance. Let’s see what we get when
looking at 2004 numbers:

App Year  IP      ERA  K/PA  BB/PA HR/PA H/PA
1   2004 21305.0  4.50  .166  .087  .029  .237
2   2004 10795.0  4.56  .166  .085  .030  .240
3   2004  5209.0  4.28  .173  .085  .029  .228
4   2004  2859.0  4.42  .177  .083  .030  .237
5   2004  1482.7  4.38  .179  .084  .029  .236
6   2004   769.7  4.20  .183  .086  .028  .233
7   2004   410.7  3.75  .187  .091  .024  .224
8   2004   251.3  4.08  .202  .105  .033  .225
9   2004   164.3  3.40  .228  .077  .029  .228
10  2004    90.3  3.49  .211  .110  .018  .175
11  2004    43.7  4.12  .199  .044  .017  .249
12  2004    11.3  2.38  .163  .102  .041  .184
13  2004     2.0  0.00  .429  .000  .000  .143

Or if you’re a more visual person:

(For the curious, those 2.0 IP in the thirteenth appearance against a
team were contributed by Tom Gordon against the
Orioles, Scott Eyre against the Diamondbacks, and
Joe Nathan against the Tigers. The highest since 1990
was Mike Myers against the Diamondbacks in 2001 with 15
appearances. Maybe that’s why the Snakes acquired him that winter.)
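The "App" column in the table above is a running count of how many times a pitcher has faced a given team, including the current game. Here is a rough Python sketch of that bookkeeping. It assumes a chronologically sorted log for a single season, so the count starts over each year; the pitcher names and team codes are placeholders.

```python
from collections import defaultdict


def number_appearances(games):
    """Tag each game with the pitcher's appearance number against that opponent.

    `games` must be in chronological order; each item is (pitcher, opponent).
    The count includes the current game, so a first meeting is numbered 1.
    """
    seen = defaultdict(int)
    numbered = []
    for pitcher, opponent in games:
        seen[(pitcher, opponent)] += 1
        numbered.append((pitcher, opponent, seen[(pitcher, opponent)]))
    return numbered


# Hypothetical single-season log, oldest game first
log = [
    ("Pitcher A", "BAL"),
    ("Pitcher B", "BAL"),
    ("Pitcher A", "DET"),
    ("Pitcher A", "BAL"),
    ("Pitcher A", "BAL"),
]
numbered = number_appearances(log)
```

Once every game carries its appearance number, the innings, earned runs, and plate-appearance outcomes can be summed within each number to build a row of the table.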

In the more than 30,000 innings in which pitchers were making their
first or second appearance against a team in 2004, they had ERAs of
4.50 and 4.56. After that, ERA generally declines as appearances
increase. Of the four major metrics to accompany ERA, K/PA increases
while ERA decreases, as we would expect, but BB/PA increases as well. (Nate
Silver has already discussed the advantages of using K/PA rather
than K/9, so I’ll endeavor to use K/PA in the future. For reference,
Randy Johnson and Johan Santana led
all qualifiers last year with a K/PA of .301 while Kirk
Rueter finished last with .067.)
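As a quick illustration of how the two strikeout rates differ, here is a small Python sketch with two invented pitchers. It is my own toy example, not a restatement of Silver’s argument: both pitchers have identical strikeouts and innings, but one faces more batters to get the same outs.

```python
def k_per_pa(strikeouts, batters_faced):
    """Strikeouts per plate appearance (batter faced)."""
    return strikeouts / batters_faced


def k_per_9(strikeouts, innings):
    """Strikeouts per nine innings pitched."""
    return 9.0 * strikeouts / innings


# Hypothetical season lines: same strikeouts and innings, different traffic
efficient = {"k": 180, "ip": 200.0, "bf": 800}
hittable = {"k": 180, "ip": 200.0, "bf": 900}

same_k9 = (k_per_9(efficient["k"], efficient["ip"]),
           k_per_9(hittable["k"], hittable["ip"]))
diff_kpa = (k_per_pa(efficient["k"], efficient["bf"]),
            k_per_pa(hittable["k"], hittable["bf"]))
```

K/9 treats the two as identical, while K/PA shows that the second pitcher strikes out a smaller share of the hitters he actually faces.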

Lest we think that 2004 was bucking a trend, or that the small samples
at higher appearance counts are misleading us, here are the numbers for
2003 and 2002:

The most obvious explanation here is that players who are called upon
to face teams many times are going to be the best pitchers in the
league. To check for that bias, here’s what we would have expected each
group to do based on their weighted season performances in 2004:

App Year  IP      ERA  K/PA  BB/PA HR/PA H/PA
1  2004 21305.0  4.73  .164  .087  .030  .239
2  2004 10795.0  4.50  .169  .085  .029  .236
3  2004  5209.0  4.31  .173  .084  .028  .234
4  2004  2859.0  4.20  .175  .084  .028  .232
5  2004  1482.7  4.09  .179  .085  .027  .229
6  2004   769.7  4.03  .183  .088  .026  .227
7  2004   410.7  3.73  .192  .089  .024  .221
8  2004   251.3  3.63  .200  .089  .023  .219
9  2004   164.3  3.42  .199  .085  .021  .218
10 2004    90.3  3.31  .205  .084  .021  .217
11 2004    43.7  3.42  .207  .085  .022  .216
12 2004    11.3  2.90  .198  .080  .018  .202
13 2004     2.0  2.37  .276  .076  .017  .168

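Subtract the actual numbers in the first table from these expected
numbers and we get the following (a positive ERA difference means the
group pitched better than expected):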

App Year    IP   ERA   K/PA BB/PA HR/PA  H/PA
1  2004 21305.0  0.23 -.002  .000  .001  .002
2  2004 10795.0 -0.06  .003  .000 -.001 -.004
3  2004  5209.0  0.03  .000 -.001 -.001  .006
4  2004  2859.0 -0.22 -.002  .001 -.002 -.005
5  2004  1482.7 -0.29  .000  .001 -.002 -.007
6  2004   769.7 -0.17  .000  .002 -.002 -.006
7  2004   410.7 -0.02  .005 -.002  .000 -.003
8  2004   251.3 -0.45 -.002 -.016 -.010 -.006
9  2004   164.3  0.02 -.029  .008 -.008 -.010
10 2004    90.3 -0.18 -.006 -.026  .003  .042
11 2004    43.7 -0.70  .008  .041  .005 -.033
12 2004    11.3  0.52  .035 -.022 -.023  .018
13 2004     2.0  2.37 -.153  .076  .017  .025
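To make the expected-versus-actual comparison concrete, here is a hedged Python sketch for a single appearance group. It assumes that "weighted season performance" means each pitcher’s full-season ERA weighted by the innings he threw in that group, which is my reading, not a documented formula. The sign convention, expected minus actual, matches the table above, where 4.73 minus 4.50 gives 0.23 for first appearances. All inputs are invented.

```python
def expected_era(weights):
    """Innings-weighted average of season ERAs for one appearance group.

    `weights` is a list of (innings_in_group, season_era) pairs, one per
    pitcher. Assumption: the weight is innings pitched within the group.
    """
    total_ip = sum(ip for ip, _ in weights)
    return sum(ip * season_era for ip, season_era in weights) / total_ip


def actual_era(lines):
    """ERA actually posted in the group from (innings, earned_runs) pairs."""
    total_ip = sum(ip for ip, _ in lines)
    total_er = sum(er for _, er in lines)
    return 9.0 * total_er / total_ip


# Hypothetical group with two pitchers
weights = [(6.0, 3.00), (3.0, 6.00)]  # (innings in group, season ERA)
lines = [(6.0, 2), (3.0, 1)]          # (innings in group, earned runs allowed)

# Positive means the group allowed fewer runs than its season numbers predict
difference = expected_era(weights) - actual_era(lines)
```

In this toy group the pitchers’ season numbers predict a 4.00 ERA, they actually post a 3.00, and the difference of 1.00 would show up as a positive entry in the ERA column.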

Contrary to the apparent improvement in performance as appearances
increase, pitchers actually perform worse relative to expectations as
their appearances mount. Pitchers performed about a quarter of a run
better in their initial appearance against a team than we would expect
from their complete season performance, but generally performed worse
than expected as appearances mounted. The discrepancy between the
expected and actual ERA in the initial appearance against a team is
especially convincing given the massive sample of innings involved.
Teams may be pretty good about selecting the correct pitchers for the
majority of the playing time, but returns diminish as those pitchers
face the same teams more and more during a season.

Though there isn’t any apparent improvement in pitching performance
in April compared to other months of the season as evidenced by league
ERA, pitchers do appear to see a slight advantage in their initial
appearance against an opposing team. Things tend to even out in the
second or third appearance, and after that the batters appear to have
figured things out and the advantage is gone. Adding a new pitch or
a new wrinkle to a pitcher’s motion may work for a while, but don’t
expect that advantage to last all season. This trend doesn’t bode well
for struggling players like Zito, so if you’re an A’s fan, perhaps you
should just forget you read any of this and read up on regression to the
mean.


James Click
