Earlier this week we introduced the run estimator SIERA, providing a general summary of its purpose as well as the evolution of its development. Today, in Part 3, our focus will shift to the quantitative side of the metric, offering a detailed look at the data used to derive the formula as well as specifics pertaining to the regression analysis techniques used. The transparency should provide a better understanding of the integrity of such a process as well as a few insights into the SIERA-laden approach towards pitcher valuations.

The Data

All data used throughout this process, be it the calculation of SIERA or the various other comparative estimators, came from Retrosheet, a monumental achievement in the world of data without which several advancements in the field would not exist. The first step involved extracting seasonal tallies from the main events table, with statistics being grouped by pitcher, team, and year. This way, a pitcher with stints on various clubs throughout the same season would carry a different entry for each; Cliff Lee, for example, has separate entries as a Phillies pitcher and as an Indians pitcher last season. Next, using the Lahman database, the pitching park factor was added to each row in the table.

Park-adjusted ERA was then calculated, though only half of the park factor was applied to the individual pitchers given that only half of a team’s games are played at the home stadium. If a pitcher ended up with a PPF of 105, instead of taking 95 percent of his ERA, 97.5 percent was taken, equating to one-half of the difference between the actual park he called home and one considered to be neutral. With the adjustment applied to raw ERA, the next issue to bypass involved batted ball reliability.
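To make the half-park-factor step concrete, here is a minimal sketch of the adjustment as the PPF 105 example describes it. The function name and the sample ERA of 4.00 are illustrative assumptions, and the linear form (subtracting half of the excess over 100) is inferred from the "97.5 percent" example rather than taken from the authors' actual code.

```python
def park_adjust_era(era: float, ppf: float) -> float:
    """Apply half of a pitching park factor (100 = neutral park) to a raw ERA.

    Assumption: the adjustment is linear, as in the text's example where a
    PPF of 105 means taking 97.5 percent of the pitcher's ERA.
    """
    half_excess = (ppf - 100.0) / 2.0 / 100.0  # PPF 105 -> 0.025
    return era * (1.0 - half_excess)


# Hypothetical pitcher with a 4.00 raw ERA in a park with a PPF of 105.
print(round(park_adjust_era(4.00, 105), 3))
```

A PPF below 100 works the same way in the other direction: the ERA is scaled up by half of the shortfall.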

While Retrosheet provides a fantastic wealth of information, batted ball data is realistically only usable from 2003-present. The major reason for this involves how balls put in play were scored, as the processes implemented have not been consistent. Before 2003, batted balls were only recorded on outs, meaning that a ground ball single through the third base hole counted as a single while a ground out to the second baseman went down as a grounder. Both are ground balls, but this rather vast issue precludes the usage of batted balls prior to that season.

Only data from 2003-09 moved on to the next round given this restriction. With that table in place, the QERA formula was unfoiled (multiplied out term by term) and the nine emerging terms were calculated for each row in the database table. The data was then ready for further processing and rigorous study.
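As a rough illustration of that last step, the sketch below builds the nine regressors for a single pitcher-team-season row: three rates, their squares, and the three pairwise products. The function name, argument names, and sample tallies are all hypothetical; only the nine term definitions come from the article.

```python
def qera_terms(so, bb, gb, fb, pu, pa):
    """Return the nine regression terms for one pitcher-team-season row."""
    so_pa = so / pa
    bb_pa = bb / pa
    net_gb_pa = (gb - fb - pu) / pa  # (GB-FB-PU)/PA
    return {
        "SO/PA": so_pa,
        "BB/PA": bb_pa,
        "(GB-FB-PU)/PA": net_gb_pa,
        "(SO/PA)^2": so_pa ** 2,
        "(BB/PA)^2": bb_pa ** 2,
        "((GB-FB-PU)/PA)^2": net_gb_pa ** 2,
        "((GB-FB-PU)/PA)*(SO/PA)": net_gb_pa * so_pa,
        "((GB-FB-PU)/PA)*(BB/PA)": net_gb_pa * bb_pa,
        "(BB/PA)*(SO/PA)": bb_pa * so_pa,
    }


# Made-up tallies, purely for illustration.
row = qera_terms(so=100, bb=50, gb=200, fb=120, pu=30, pa=500)
for name, value in row.items():
    print(f"{name:<26}{value: .4f}")
```

Park-adjusted ERA would then be the dependent variable, with these nine columns plus an intercept on the right-hand side.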

The Results

SIERA was first estimated with 10 parameters: an intercept and the nine aforementioned terms that surface once QERA is unfoiled, which involved regressing park-adjusted ERA on all nine terms. The results can be seen below:


VARIABLE                 COEF. T-STAT P-STAT
Constant                 6.368 16.97  0.000
SO/PA                  -18.341 -7.10  0.000
BB/PA                    9.471  2.00  0.046
(GB-FB-PU)/PA           -1.807 -1.60  0.110
(SO/PA)^2               10.254  1.98  0.048
(BB/PA)^2                6.833  0.33  0.742
((GB-FB-PU)/PA)^2       -7.063 -3.93  0.000
((GB-FB-PU)/PA)*(SO/PA)  9.661  2.38  0.017
((GB-FB-PU)/PA)*(BB/PA) -3.208 -0.44  0.661
(BB/PA)*(SO/PA)          2.828  0.18  0.857

Before getting into what the data originally said, a description of the columns is in order. The first column lists the variable in question and the second gives the coefficient estimated by the regression. The t-statistic describes how many standard errors from zero the coefficient strayed, and the p-statistic gives the probability that a coefficient that far from zero would surface if the effect of the variable on park-adjusted ERA were actually zero.
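The link between the last two columns can be sketched in a few lines. This version uses a normal approximation to turn a t-statistic into a two-sided p-value, which is an assumption made here to keep the example dependency-free; the regression software behind the tables presumably used the Student's t distribution, though with a large sample the two should be close.

```python
import math


def two_sided_p(t_stat: float) -> float:
    """Two-sided p-value for a t-statistic under a normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# t-statistics taken from three rows of the first table.
for label, t in [("BB/PA", 2.00), ("(GB-FB-PU)/PA", -1.60), ("(BB/PA)^2", 0.33)]:
    print(f"{label:<16} t={t:>6.2f}  p~{two_sided_p(t):.3f}")
```

The approximated values land within about a thousandth of the table's P-STAT column for these rows.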

It is commonly accepted that coefficients with p-stats less than .05 or .10 are probably different from zero. Unfortunately, reliable data for balls in play only exists from 2003-09, which means that several coefficients that sensibly ought to be significant do not reach that threshold. Our intuition helped to build this model, with an understanding that as pitchers get back on the mound and throw some more games even more accurate results can be had. Note that the above table does not show the final formula for SIERA, but rather the original estimation, with park-adjusted ERA regressed on the entire QERA formula. Also note that our original tests used data from 2003-08 only; 2009 was held out so that the regression could eventually be tested on an outside sample. However, so that it can be compared with the table below, the table above includes the 2009 data as well.

What immediately stands out is that the quadratic term for walks is not significant: the .74 p-stat indicates that there is a 74-percent chance that you would get a value further from zero than 6.833 if the true quadratic effect of walks on ERA was zero. The conclusion: the effect of walks on ERA is linear but perhaps with interactions with strikeouts or ground balls. It is also evident that the interaction effect between strikeouts and walks is non-existent. This seems plausible, seeing as there is no reason to assume walks increase ERA more for high strikeout pitchers as opposed to those with low whiff totals.

Two quadratic terms are significant as is an interaction term. The interaction between walks and ground balls could have been dropped, but intuition chimed in and kept it afloat because the significance of the interaction of strikeouts and ground balls forces honesty and requires the presence of the former interaction. The reason this interaction is believed to be clinically significant is that pitchers who strike more batters out allow fewer singles and need fewer double plays. This is true for walks as well.

Removing the other two insignificant terms sends the walk and ground ball interaction term closer to significant, but still far from it. It is our belief that including this interaction gives a more accurate prediction of a pitcher’s skill level and that the reason that the coefficient is insignificant is that the sample size is too small. Some of the other effects are even crisper when the regression is analyzed with the two insignificant terms removed:


VARIABLE                 COEF. T-STAT P-STAT
Constant                 6.262 28.07  0.000
SO/PA                  -18.055 -8.39  0.000
BB/PA                   11.292 12.81  0.000
(GB-FB-PU)/PA           -1.721 -1.57  0.116
(SO/PA)^2               10.169  1.97  0.049
((GB-FB-PU)/PA)^2       -7.069 -3.94  0.000
((GB-FB-PU)/PA)*(SO/PA)  9.561  2.38  0.017
((GB-FB-PU)/PA)*(BB/PA) -4.027 -0.58  0.563

Four terms are worthy of further explanation as they are significant, or close enough to significant, like in the case of the linear term in (GB-FB-PU)/PA since its square proved to be significant. Each will be explained separately:

  1. (SO/PA)^2 has a significant and positive coefficient, even though the linear SO/PA has a significant and negative coefficient. Essentially, this means that although increasing strikeout rate lowers ERA, whiffing more hitters has a diminishing effect on run prevention. If you take the derivative of SIERA with respect to SO/PA (the amount that SIERA changes relative to the change in SO/PA at a given level), you get the following:

    -18.054 + 20.337*SO/PA + 9.561*(GB-FB-PU)/PA

    Setting the third term aside by taking (GB-FB-PU)/PA to be zero (to simplify calculations), an increase of strikeouts from 0-1 percent decreases ERA by about 0.179; an increase in strikeouts from 10 percent to 11 percent decreases ERA by about 0.138; and an increase in strikeouts from 20 percent to 21 percent decreases ERA by .097. Basically, strikeouts are more useful with runners on base, and the more whiffs one tallies, the fewer runners reach base. Strikeouts have a gradually diminishing effect on run prevention as someone who strikes out 90 percent of hitters he faces is not doing much harm by allowing a few more balls in play.

  2. ((GB-FB-PU)/PA)^2 has a significant negative coefficient, adding to the negative coefficient on (GB-FB-PU)/PA. The more ground balls a pitcher allows, the more he will benefit from even more worm beaters, because ground balls are often singles and so leave more runners to double up. The derivative of SIERA with respect to (GB-FB-PU)/PA can be seen below:

    -1.721 - 14.138*(GB-FB-PU)/PA + 9.561*SO/PA - 4.027*BB/PA

    Using league average strikeout and walk rates, moving (GB-FB-PU)/PA from 0 to 0.05 would drop ERA by 0.104; from 0.05 to 0.10, by 0.139; and from 0.10 to 0.15, by 0.174. The more ground balls a pitcher gets, the more he will benefit by getting more of them.

    The interaction terms that follow are probably even more important, and represent the major contributions of SIERA to ERA estimation.

  3. (BB/PA)*((GB-FB-PU)/PA) has an insignificant but negative coefficient. This suggests that pitchers with higher walk rates will prevent more runs by generating ground balls than pitchers with low walk rates, but it does not offer statistical proof. However, for reasons suggested above, the significance of the interaction between strikeouts and ground balls implies that this is probably true as well, and that only sample size is holding us back. There are two reasons to explain the negative effect.

    First, pitchers who put more runners on first will get more double plays from generating ground balls than pitchers who do not put many hitters on first base.

    Second, pitchers who allow more fly balls allow more home runs. These are more likely to be solo home runs if the pitcher does not give out many free bases on balls so they will not be as damaging. Getting ground balls is even more important for pitchers with high walk rates since they can avoid multi-run dingers. It seems particularly inaccurate that FIP puts a coefficient of 13 on HR/IP for all pitchers, regardless of their walking exploits. Solo shots do something different to ERA than grand slams.

  4. (SO/PA)*((GB-FB-PU)/PA) has a significant positive coefficient. Pitchers who strike out a lot of hitters benefit from ground balls less than pitchers who allow many hitters to put the ball in play.

    This follows the same logic as the walk/ground-ball interaction, since pitchers who strike out fewer hitters allow more balls in play. This leads to more runners reaching base who will conceivably be doubled up on ground balls and allowed to score on multi-run fly ball blasts.

Thus, these four points have shown us that strikeouts have a diminishing return as you accrue more of them, ground balls have an increasing return the higher your tally, and ground balls are more beneficial to pitchers who allow more walks or balls in play, especially because fly balls are more detrimental to pitchers who allow more runners on base.
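The four points can be checked numerically with a short sketch built on the rounded coefficients from the second table. The function and variable names are hypothetical, and holding the other rates at zero when stepping one variable is a simplifying assumption of this sketch, not necessarily what the authors did.

```python
# Rounded coefficients copied from the second regression table.
COEF = {
    "const": 6.262, "so": -18.055, "bb": 11.292, "gb": -1.721,
    "so2": 10.169, "gb2": -7.069, "gb_so": 9.561, "gb_bb": -4.027,
}


def estimate(so_pa, bb_pa, net_gb_pa):
    """Evaluate the fitted equation; net_gb_pa stands for (GB-FB-PU)/PA."""
    c = COEF
    return (c["const"] + c["so"] * so_pa + c["bb"] * bb_pa + c["gb"] * net_gb_pa
            + c["so2"] * so_pa ** 2 + c["gb2"] * net_gb_pa ** 2
            + c["gb_so"] * net_gb_pa * so_pa + c["gb_bb"] * net_gb_pa * bb_pa)


def d_so(so_pa, net_gb_pa):
    """Derivative of the estimate with respect to SO/PA."""
    return COEF["so"] + 2 * COEF["so2"] * so_pa + COEF["gb_so"] * net_gb_pa


def d_gb(so_pa, bb_pa, net_gb_pa):
    """Derivative of the estimate with respect to (GB-FB-PU)/PA."""
    return (COEF["gb"] + 2 * COEF["gb2"] * net_gb_pa
            + COEF["gb_so"] * so_pa + COEF["gb_bb"] * bb_pa)


def so_step(lo, hi):
    """Change in the estimate when SO/PA moves from lo to hi, other rates at 0."""
    return estimate(hi, 0.0, 0.0) - estimate(lo, 0.0, 0.0)


def gb_step(lo, hi):
    """Change in the estimate when (GB-FB-PU)/PA moves from lo to hi, other rates at 0."""
    return estimate(0.0, 0.0, hi) - estimate(0.0, 0.0, lo)


for lo, hi in [(0.00, 0.01), (0.10, 0.11), (0.20, 0.21)]:
    print(f"SO/PA {lo:.2f}->{hi:.2f}: {so_step(lo, hi):+.3f}")
for lo, hi in [(0.00, 0.05), (0.05, 0.10), (0.10, 0.15)]:
    print(f"(GB-FB-PU)/PA {lo:.2f}->{hi:.2f}: {gb_step(lo, hi):+.3f}")
```

With the strikeout and walk rates held at zero, the ground-ball steps come out at about -0.104, -0.139 and -0.174, in line with the figures quoted in point 2. The strikeout steps from these rounded coefficients come out near -0.180, -0.159 and -0.139, so the values quoted in point 1 for the 10 and 20 percent cases were presumably computed somewhat differently. The qualitative pattern is the same either way: each extra strikeout helps a little less, and each extra ground ball helps a little more.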

How beneficial are these results? In Part 4 of our introductory series on SIERA, the estimator will be put to the test, both at predicting same-year ERA better than other estimators that use similar statistics and at predicting future-year ERA better than any other estimator out there.

sunpar
2/10
Cool stuff. Just a thought I had-- Have you tried using (BB-IBB)/PA as your walk rate variable (and maybe using another variable that is (IBB/PA)? Presumably all intentional walks are done to reduce the number of runs allowed, while in your regression all walks will increase ERA equally. Then again, I assume the scarcity of intentional walks will make this addition insignificant.
swartzm
2/10
Thanks. We did play around with IBB a little bit, but some of the problem is that it is difficult to differentiate between IBB where the pitcher gives up after getting into a 2-0 or 3-1 count and direct IBB from the first pitch, and then to separate even further the difference between those IBB and just pitching around people.

There certainly was some indication that IBB led to fewer runs, particularly with respect to the ground ball term, but at this sample size we figured it was probably best not to do something that could be construed as data mining. We also felt that the gains from distinguishing between BB & IBB seemed negligible anyway. That is a good point, though. Thanks for highlighting it.
EJSeidman
2/10
The BB and IBB discussion is one Matt and I had for a long, long time, but we ultimately felt that the difficulty in differentiating the types of IBBs muddied the waters and for now just felt more comfortable using the term in its current state. But it is definitely something we were conscious of throughout this process.
Clonod
2/10
I don't think the roll-over glossary text for SIERA is long enough, guys.

Honestly, though, great work. This is neat stuff.
EJSeidman
2/10
You're right... we should add in more paragraphs.
dwachtell
2/10
To nit-pick on the above point, perhaps it's my smallish laptop, but the rollover for SIERA (and QERA, and I think EQA, too) are all too long for one page, and can't be scrolled through. Maybe your rollover pop-up should be wider? Or am I the only one who hasn't figured out a work-around for this?
EJSeidman
2/11
Yes, I noticed this too when the SIERA glossary rollover was mentioned above and couldn't figure out why I couldn't read the whole thing. Definitely something to work on.
NathanJM
2/10
Perhaps a stupid question on Park Factors... But if you have the retrosheet game data and you have the individual Park Factors, why wouldn't you Park-Adjust on the per-game level before aggregating ERA rather than applying it after the aggregation and just assuming that half the games pitched were at home? Or is this just because it's much easier to do that on the inputs than on the projections?
swartzm
2/10
Yeah, it was just easier to approximate that half the games were at home. I can't imagine that there would be enough noise there to affect the results. I guess also since we were running a regression with park-adjusted ERA as the dependent variable, it was less important to be precise with park-adjusted ERA because the coefficients would be unbiased even if the park-adjustment was noisy. Noisy independent variables would bias the coefficients towards zero, though, so I think that would have been a bigger issue.
DrDave
2/10
Question: are you folding HBP into BB, or ignoring them? With some starters in the 15+ HBP per season category, it could make a clinical difference.
swartzm
2/10
We didn't include HBP. I just did a little re-checking and I remembered correctly that it doesn't seem like it would have changed all that much. It might be a small improvement, though, and we might look into it in the future as we get more data, but it was probably too small of a factor to consider. This is a good point, though, and worth checking as we get more years of data, especially if HBP are very persistent which I suspect they are at least somewhat.
rowenbell
2/11
Interesting question as to whether HBP is persistent among pitchers. The Strat-O-Matic world view is that HBP is entirely a hitter's skill. Of course, a lot has been learned about baseball since the SOM engine was developed many years ago. Dave Stieb was my boyhood idol, and he certainly had consistently high HBP rates.
DrDave
2/11
Somewhere, Ron Hunt and Craig Biggio are systematically shredding a copy of Strat-o-Matic between them...
nosybrian
2/10
Why does the little box "definition" of QERA say that the "formula was described most verbosely by Nate Silver...."? Was this little popup intended as a criticism?

Verbose means "wordy." Silver's description wasn't wordy. A simple edit will do here: "... formula was first described by Nate Silver...".
EJSeidman
2/10
I think that the glossary entry was saying more that he described it in the most detail in that particular article, given he explained it elsewhere, but I agree, sounds odd. Thanks for pointing it out.
joenemanick
2/10
I hope you keep an eye on the BB*GB factor going forward. That p-stat is pretty bad. I know it makes intuitive sense to keep it, but if we're trying to verify our intuition with data, we have to let the data speak.

Have you done multiple train-test experiments on your data set with and without this parameter? Not only just train on 2003-2008 test on 2009, but also 2003-2005 + 2007-2009 and test on 2006, etc. etc. and see if BB*GB ever has a significant predictive value?
swartzm
2/10
We definitely will be keeping an eye on the BB*GB term. The problem is really that we suspect this term has an effect but that even a perfect term that accurately captures the effect probably would not be statistically significant because we only have 7 years of data.

We tested it on individual years and sets of years and the coefficient jumped all over the place from much more negative even to more positive. The -4.027 number is probably a pretty good approximation, though.

If we left it out and re-ran the regression, it would move the SIERAs by no more than .10 runs, which is pretty much the magnitude that you would expect the term to be. It's important, but it's not going to show as significant in this sample size.

Thanks for the suggestion though. Definitely was an important thing to check.
DrDave
2/11
I understand the intuition, but it's not like your p-value was .15 or something; it's huge. The regression is yelling "even with this little data, I can tell this term is totally irrelevant". It's not impossible that this is just bad luck in the sample, but it's really unlikely.

The other possibility in cases like this is some kind of multicollinearity -- that there's another term that is sufficiently correlated with GB*BB that you can't interpret their coefficients independently. Did you check for that?
swartzm
2/11
I see your point, but it really wouldn't change a single SIERA by .10 and it's a matter of preference. The reason I don't agree is that I think that the effect is real but close to -4.0. So the type II error of rejecting anything less than -15 is very, very high. It's a matter of intuition in this case. Especially given that the variance in GB*SO is high enough that the regression said it was positive.

I don't know what else it would be correlated with that would get in the way. I doubt it though. If you think about what the implications are of high walk rates and high ground ball rates, you'd think it adds a few double plays a year to have both skills, which is exactly what this type of coefficient around -4.0 would suggest.
nosybrian
2/11
This may be an "irrelevant variable" from a statistical standpoint. It's theoretically (or logically) aproprite. But the coefficient isn't statistically significant. Howevever, it's inclusion doesn't distort the estimated effects of the other regression coefficients. In that case, it's often reasonable to do exactly what you're doing. I recall a nice old discussion of this in an econometrics book by Rao and Miller.

The main "cost" of including this irrelevant variable is parsimony. While you are arguing for logical completeness (and keeping open the possibility that the term will matter when you extend the data set), it just makes your equation a bit busier than it needs to be.
nosybrian
2/11
--sorry for the typos....
DrDave
2/12
It's not that hard to check for direct correlation among your predictor variables in the model. What does the variance/covariance matrix of the independent variables look like? Many stats packages will provide that as an optional output. It doesn't spot variables that are linear combinations of more than one other variable, but it spots direct correlation of 2 independent variables.

A variable that really ought to be significant, but isn't, is a possible warning sign of multicollinearity. There's some pretty good discussion and advice at
http://www.nd.edu/~rwilliam/stats2/l11.pdf
nosybrian
2/10
It may be worth a brief mention that QERA was applied by Nate without any park adjustments. That's perhaps one reason why he regarded it as a short-hand toy.
nblascak
2/10
As someone who has some knowledge about statistics, but not a lot, the added logic and explanations in the article are very welcome. Keep up the great and interesting work!
blcartwright
2/11
I agree with NathanJM - As you have the play by play, it is not very difficult to do park factors as a weighted mean, counting how many batters each pitcher faced in each park.
ericmvan
2/11
First of all, and this is a big we-need-to-start-over-and-run-the-numbers-again mistake (although I think ultimately we're only talking about tweaking the coefficients): I am fairly certain that the (Pete-Palmer, Total Baseball designed) Park Factors in the Lahman database do not have to be cut in half, because they are already designed to be straight multipliers. Compare them to the straight Run Indexes published each year in the Bill James Annual, or plow your way through the technical explanation at

http://www.baseball-reference.com/about/parkadjust.shtml.

Biggest unanswered question: what's the minimum BFP for inclusion? I've found no loss in year-to-year K/PA correlation down to 260. If you didn't go that low, you can increase your sample size.

My biggest disappointment is that you started with ERA. Granted, that's the stat we look at. But there's no good reason to ignore the (very accurately) quantifiable errors in RA caused by good or bad inherited runner support. You have that data here ("Fair RA," IIRC).

And you probably should have wrestled with R vs ER. Personally, I believe in keeping track of UER but doing it exactly the way you adjust for inherited runners -- the pitcher is credited with the average change in Run Expectancy caused by the error rather than the number of UER that actually end up scoring. (This actually only works for ROE; for errors leading only to base advances you need a "subsequently rendered moot" adjustment, so it does get tricky.) I bet there's a correlation of GB% to errors and hence UER ... you may have been better off regressing to Fair RA (adjusted only for inherited runners) with a separate term estimating ER/R. Or regressed to Fair RA and used a fixed ER/R, which is just using RA but scaling it to look like ERA.

Finally, I've never kept a term with p = .56 no matter how strongly I felt it deserved to stay. That is not trending towards significance and I think it's wishful thinking to expect it to get there with a bigger sample. Although I am at a loss to explain why it's not showing up. I would experiment with taking out the straight, non-squared GB term and see if that helps this one.
swartzm
2/11
Eric and I can check into the Lahman database park factor issue. I'm not sure about this yet.

We used 40 IP as a minimum.

We checked RA by the same method (though not FRA) and got basically the same coefficients with the intercept being about 0.4-0.5 higher, so since people are familiar with ERA this is easiest to do. Fair RA is an intriguing idea, though.

The reason we kept the GB*BB term with p=.56 is that (a) we don't think the effect is bigger than something around -4, and it would take 20 years of batted ball data for it to be significant, and (b) the exclusion of it, while re-running the regression and generating new SIERAs would not change anybody's ERA by 0.10. It's just too small of a difference to make a fuss about.
ericmvan
2/11
40 IP is actually lower than I think might be safe; that's about 170 BFP and the Y2Y correlations seem to start falling off more steeply below 200. But probably no big deal*.

You really should try removing the straight GB term (the rationale being that you've already got it squared and there's no logic that says it needs to be fully quadratic) and see what happens with the GB*BB one. I'm just personally curious because I've done so many of these multiple regressions and I've seen a lot of funky things happen when you take out one term.

*It's worth noting, though, that increasing the sample size with noisy data can give you worse (less significant) regression terms.
swartzm
2/11
That actually does bump the GB*BB term to -10 and up to weak significance (p=.07), but why would you take out the linear GB term? It's effectively equivalent to limiting the minimum effect of GB% to exactly where GB=FB+PU.

Think of it as a regression showing an equation of:

SIERA = a + (b + c*SO_PA)^2 + d*BB_PA + (e + f*GB_FB)^2 + g*GB_FB*SO_PA + h*GB_FB*BB_PA. This way the effect of GB_FB is minimized at a value determined by where f = -e which can move rather than where f = 0. It's a more general assumption to leave it in there even if it cuts the GB*BB term in half and makes it appear insignificant.
ericmvan
2/12
I don't follow this logic at all. As far as I can tell, the minimum effect of GB happens when GB = FB + PU in either form of the metric. When that happens, the linear term, the squared term, and the interaction terms all become zero. When GB > FB + PU or GB < FB + PU the term becomes non-zero and you start to see GB loading on the metric. Your final equation does reflect the reality of the situation (GB rate is minimized at an unknown value) but your constant e is just an unknown portion of the overall constant a + b + e.

In general, I don't think there's any rationale for keeping a term as both linear and squared if the squared term is significant and leaving out the linear term improves the overall regression. In this case, the interactions of GB with itself and with K and BB rates appear to be so important that if you include them you don't need to include the term directly. That may make the seeming illogic of not having the term directly more palatable to consider.

The one problem I can see in general is that a pitcher with GB = FB + PU is not an average pitcher and yet he's the baseline that's determining the constant (i.e., he's contributing your unknown variable "e" to it). I would have begun by normalizing all the data, so that GB_FB = 0 meant a pitcher with an average rate. This would be the best solution to the problem you're worrying about, since by definition the effect of GB can be regarded as minimized for an average pitcher. Then you run the regressions and you convert the coefficients to useful ones by reversing the normalization.
Tom9418
2/11
You say missing a bat is a good indicator, but a pop-up isn't. Would not throwing a pitch that the batter swings under be a "super pop-up"? By your logic in the first article, a pitch that the batter swings under would be a very bad indicator.

Second random comment, but has anyone ever looked at foul balls or swinging strikes as a % of strikes to hone predictions, or does that stuff all flow neatly into K rate?

Thanks.

swartzm
2/11
There probably is, but it's probably canceled out by the grounders and super-grounders, another benefit of the (GB-FB-PU) term where GB includes balls chopped into the ground in front of the plate as well as balls that one-hop between the SS and 3B. But if the pop-ups happen more often than the choppers, it's an indicator that the pitcher is throwing the ball on a trajectory that generates upwards spin, and therefore is home run prone. The key is that pop-ups/batted ball is correlated with fly ball rate, which is correlated with home run rate.

I don't know about foul balls and swinging strikes as a percent of strikes, but I suspect it would be interesting to look at. If anybody has, Russell Carleton has-- he's the foul ball expert.
TomLongwell
2/11
Not sure if I have this all correct, but it looks like you are treating the data as a cross-section, correct? It's pretty obvious you have a panel data structure here, so wouldn't it make sense to at least include year and league fixed effects?

I would also be concerned about correlation in the error term across years for a given player, for obvious reasons, although that probably isn't something easily controlled for.

This is still a pretty impressive piece of work, so bravo to the authors.
swartzm
2/11
Hmm...I would really like that approach only it doesn't do a good job of projecting next-year ERA.

Why would the error term being correlated across years for pitchers matter as much for what we're doing? I agree that's definitely true, especially because it includes team defense, but still I'm not sure that is a big deal.

Thanks for your comments and approach. It's interesting to see how you frame it.
TomLongwell
2/11
Glad to help. I guess the panel approach strips out too much of the variation in the data, then.

As for the autocorrelation, it can bias your results and lower efficiency. Basically I'd be worried that, for example, the coefficient on GB% would be biased high if it's picking up on good infield defense, since you'd expect teams with good infield defenses to favor ground ball pitchers. I guess that wouldn't be a problem for forecasting unless that tendency is changing over time though. Your standard errors would be biased either way, unfortunately.

I suppose a good part of my concern is run-of-the-mill applied economist paranoia. Obviously here the important thing is the ability to forecast, not on the interpretation of the parameters.