To read Brian Cartwright’s Unfiltered post following up on one of the audience’s suggested topics, surf here.
Former Pirates pitching coach Ray Miller had a simple motto: ‘Work Fast, Throw Strikes, Change Speeds.’ But there’s more to not walking batters than just throwing strikes: the rate at which those strikes get put into play also has a large influence on bases on balls.
Back in college I had dreams of designing the next great baseball simulation, one even better than Strat-o-Matic. Many hours of Statistics and Economics classes were spent doodling in my baseball notebook, which is sitting on my lap as I type. One of my favorite ideas was modeling the batter-pitcher match up pitch by pitch, calculating the walks, strikeouts and balls in play as a function of two things:
- What percent of the pitches were thrown for strikes?
- What percent of those strikes were put in play?
Walks and strikeouts are interconnected. They both are a function of the strike percentage and the contact rate. In projecting a specific batter/pitcher match up, as in a simulation game, or in general how any observed batter or pitcher would project in a new environment (major league equivalencies) I believed it was not possible to make an adjustment to a player’s strikeout rate without a complementary change in the walk rate, and vice versa. I felt it necessary to do this analysis at the pitch level.
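To make the pitch-level idea concrete, here is a minimal sketch of a toy model. It is not the notebook model described above. It assumes every pitch is an independent draw with the same strike percentage and contact rate, and it ignores the rule that a foul with two strikes does not end the at-bat, so it overstates strikeouts. The function name `pa_outcomes` is purely illustrative.

```python
from functools import lru_cache


def pa_outcomes(strike_pct, contact_rate):
    """Return (walk, strikeout, in_play) probabilities for one plate appearance.

    Toy assumptions: pitches are independent, and every strike that is not
    put in play adds a strike to the count (two-strike fouls are ignored).
    """
    p_ball = 1.0 - strike_pct
    p_in_play = strike_pct * contact_rate
    p_strike = strike_pct * (1.0 - contact_rate)  # strike not put in play

    @lru_cache(maxsize=None)
    def from_count(balls, strikes):
        if balls == 4:
            return (1.0, 0.0, 0.0)  # walk
        if strikes == 3:
            return (0.0, 1.0, 0.0)  # strikeout
        bb_b, so_b, ip_b = from_count(balls + 1, strikes)
        bb_s, so_s, ip_s = from_count(balls, strikes + 1)
        return (
            p_ball * bb_b + p_strike * bb_s,
            p_ball * so_b + p_strike * so_s,
            p_in_play + p_ball * ip_b + p_strike * ip_s,
        )

    return from_count(0, 0)


# Hold the strike percentage fixed and vary only the contact rate.
for con in (0.23, 0.32, 0.40):
    bb, so, bip = pa_outcomes(0.62, con)
    print(f"Str% 0.62  Con% {con:.2f}  BB {bb:.3f}  SO {so:.3f}  BIP {bip:.3f}")
```

Even in this simplified form, changing either input moves the walk and strikeout rates together, which is the reason for working at the pitch level rather than adjusting one rate alone.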
In the early 1980s I didn’t have the data to verify that my mathematical models were accurate, so the notebook got shoved in a box. Unfortunately, 25 years later I struggle to understand those notes I myself wrote, but now I have something better: Retrosheet!
To find this relationship between balls and strikes and walks and strikeouts, I extracted all the plate appearances from the Retrosheet database where pitch records are present, generally all those beginning in 1988. For each batter and pitcher in each season, I counted the number of plate appearances, balls and strikes, and walks, strikeouts and balls in play. The strike percentage ((called strikes + swinging strikes + fouls + balls in play) / pitches) and the contact rate (balls in play / all strikes) were rounded to decimal places, with the walks and strikeouts per plate appearance calculated for each combination of Str% and Con%.
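The two definitions above can be sketched in a few lines. This is only an illustration: the outcome labels (`"ball"`, `"called_strike"`, and so on) and the function name `strike_and_contact` are hypothetical stand-ins, not Retrosheet's actual pitch codes, which would first have to be mapped onto these five categories.

```python
from collections import Counter

# Hypothetical outcome labels; every strike type counts as a strike,
# including balls put in play.
STRIKE_OUTCOMES = {"called_strike", "swinging_strike", "foul", "in_play"}


def strike_and_contact(pitch_outcomes):
    """Return (strike_pct, contact_rate) for a sequence of pitch outcomes."""
    counts = Counter(pitch_outcomes)
    pitches = sum(counts.values())
    strikes = sum(counts[name] for name in STRIKE_OUTCOMES)
    if pitches == 0 or strikes == 0:
        raise ValueError("need at least one pitch and one strike")
    strike_pct = strikes / pitches            # all strikes / all pitches
    contact_rate = counts["in_play"] / strikes  # balls in play / all strikes
    return strike_pct, contact_rate


# Made-up sample: 4 balls, 2 called strikes, 1 swinging strike, 2 fouls, 1 in play.
sample = (["ball"] * 4 + ["called_strike"] * 2 + ["swinging_strike"]
          + ["foul"] * 2 + ["in_play"])
print(strike_and_contact(sample))
```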
Modern box scores provide pitches and strikes for each pitcher, allowing calculation of the strike percentage. Subtract hit batters, walks and strikeouts from batters faced to determine balls in play, and divide that by strikes to get the contact rate.
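As a quick sketch of that box score calculation (the function name `box_score_rates` and the pitching line below are made up for illustration):

```python
def box_score_rates(pitches, strikes, batters_faced, walks, strikeouts, hit_batters):
    """Estimate (strike_pct, contact_rate) from a standard box score line."""
    # Balls in play = batters faced minus walks, strikeouts and hit batters.
    balls_in_play = batters_faced - walks - strikeouts - hit_batters
    strike_pct = strikes / pitches
    contact_rate = balls_in_play / strikes
    return strike_pct, contact_rate


# Hypothetical line: 100 pitches, 65 strikes, 27 batters faced, 2 BB, 6 SO, 0 HBP.
str_pct, con_rate = box_score_rates(100, 65, 27, 2, 6, 0)
print(f"Str% {str_pct:.3f}  Con% {con_rate:.3f}")
```

In this made-up line, 19 of the 27 batters put the ball in play, so the contact rate is 19 divided by the 65 strikes.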
These graphs illustrate the walk and strikeout rates for a low strike% (0.57), average (0.62) and high (0.67), as the contact rate ranges from 0.23 to 0.40.

As would be expected, the walk rate decreases as the strike rate increases. Notice that for each given Str%, as the contact rate increases, the walk rate decreases as well. Without throwing a higher percentage of strikes, a pitcher can greatly lower his walk rate by pitching more to contact.
Pitchers who throw a high percentage of strikes see very little change in their walk rate as their contact rate changes. If the pitcher is throwing strikes, whether with a high or low contact rate, he will not be walking many batters, and will have a higher strikeout rate than a pitcher with the same contact rate but a lower strike percentage.
Conversely, free swinging batters who advance a level will face pitchers who allow lower contact rates, ballooning the batter’s strikeouts while leaving his walks low. A patient batter with the same contact rate will not only keep more of his walk percentage, but also have less of an increase in strikeouts.
From 1988 through 2000, the major league average strike% was 0.615 and the contact rate 0.324, although the contact rate had been 0.335 to 0.336 up until 1990, then began a downward slide that accelerated in 1994 and 1995 (as home run rates rose), then settled into 0.314 to 0.316 from 1997 to 2000.
The strike zone was enlarged in 2001, with the strike% increasing from 0.615 to 0.627 and the contact rate dropping from 0.315 over the previous four seasons to 0.309.

Here are the individual pitching leaders and trailers in strike% from 2001 to 2008:

Highest Str%
Pitcher            Str%   Con%   BB%    SO%
Schilling, Curt    0.697  0.272  0.038  0.250
Rivera, Mariano    0.689  0.268  0.039  0.242
Byrd, Paul         0.688  0.346  0.040  0.122
Radke, Brad        0.687  0.331  0.031  0.141
Wells, David       0.684  0.343  0.036  0.131
Lieber, John       0.682  0.345  0.031  0.150
Smoltz, John       0.681  0.286  0.048  0.231
Towers, Josh       0.680  0.356  0.035  0.124
Oswalt, Roy        0.679  0.298  0.052  0.200
Maddux, Greg       0.669  0.362  0.032  0.148

Lowest Str%
Pitcher            Str%   Con%   BB%    SO%
Zambrano, Victor   0.573  0.311  0.123  0.167
Cabrera, Daniel    0.580  0.299  0.123  0.173
Rueter, Kirk       0.581  0.377  0.071  0.077
Romero, J.C.       0.582  0.297  0.114  0.187
Ishii, Kazuhisa    0.583  0.294  0.138  0.175
Estes, Shawn       0.586  0.336  0.105  0.140
Notice that a difference of only about 12 percentage points in strikes thrown, from best to worst, results in roughly three times as many bases on balls. The pitchers with the best control are able to maintain excellent walk rates regardless of their contact rate, but this is not so for the pitchers with poor control.
Kirk Rueter never had a reputation for poor control, as his 0.071 walk rate is below the MLB average of 0.078, but he threw the same low percentage of strikes as Daniel Cabrera or Kaz Ishii. The difference is that Rueter allowed balls in play at a very high rate of 37.7% of the strikes he did throw.
The pitching leaders and trailers in contact rate:

Lowest Con%
Pitcher            Str%   Con%   BB%    SO%
Prior, Mark        0.644  0.242  0.077  0.274
Kazmir, Scott      0.624  0.244  0.104  0.253
Wood, Kerry        0.619  0.247  0.102  0.271
Johnson, Randy     0.663  0.251  0.057  0.280
Harden, Rich       0.627  0.255  0.100  0.243
Perez, Oliver      0.618  0.257  0.118  0.235
Young, Chris       0.637  0.257  0.086  0.216
Martinez, Pedro    0.657  0.259  0.061  0.265
Santana, Johan     0.667  0.260  0.063  0.261
Peavy, Jake        0.644  0.262  0.073  0.242

Highest Con%
Pitcher            Str%   Con%   BB%    SO%
Cook, Aaron        0.628  0.384  0.060  0.093
Wang, Chien-Ming   0.620  0.379  0.065  0.108
Rueter, Kirk       0.581  0.377  0.071  0.077
Silva, Carlos      0.655  0.377  0.039  0.099
Mays, Joe          0.625  0.369  0.065  0.102
Anderson, Brian    0.634  0.368  0.051  0.106
Maddux, Greg       0.669  0.362  0.032  0.148
Maroth, Mike       0.609  0.361  0.064  0.111
Duke, Zach         0.632  0.359  0.059  0.118
Oliver Perez at 0.618 and Chien-Ming Wang at 0.620 both have nearly average rates of throwing strikes, but because they are at opposite ends of the contact rate list they have widely varying results, with Perez walking 83% more and striking out 118% more batters than Wang. Missing as many bats as Perez does, he cannot afford to be anywhere near below average in his rate of throwing strikes. In his best season, 2004 with Pittsburgh, Perez was above average at 64.4%. Battling a knee injury this season, his strike percentage has fallen to 57% and the walks have skyrocketed.
A low contact pitcher must pound the strike zone to keep his walk rate manageable. A pitcher who relies on batters chasing pitches outside the strike zone (such as Oliver Perez or Ian Snell) is likely to be far less consistent, dependent on the skill of the batters in laying off those pitches. This is an important concept to consider when projecting minor league pitchers and batters. Minor league hitters with poor strike zone judgment such as Brad Eldred will likely underperform standard minor league projections.
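The Perez and Wang comparison can be checked against the contact rate table. A minimal sketch using the rounded table values (the helper name `pct_more` is hypothetical); because the table is rounded to three decimals, the walk figure lands a point or so away from the 83% quoted above, which presumably came from unrounded data.

```python
def pct_more(a, b):
    """Percent by which rate a exceeds rate b."""
    return (a / b - 1.0) * 100.0


# BB% and SO% as listed in the contact rate table (rounded to three decimals).
perez_bb, perez_so = 0.118, 0.235
wang_bb, wang_so = 0.065, 0.108

print(f"Perez walks {pct_more(perez_bb, wang_bb):.0f}% more batters than Wang")
print(f"Perez strikes out {pct_more(perez_so, wang_so):.0f}% more batters than Wang")
```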
I want to continue this research by looking at how strike percentage and contact rate change for both batters and pitchers when comparing minor to major league performances. Pitch locations and results are available via Gameday for the Texas and Southern leagues in Double-A and for all of Triple-A for the past few seasons.
- Are major league equivalent walk and strikeout rates calculated from str% and con% more accurate than directly converting walks and strikeouts?
- Can str% and con% be used to identify players less likely to survive a promotion?
- Where pitches are not available, can a regression equation be used to reliably estimate str% and con% from walks and strikeouts?
- This might also allow a pitch count estimator that considers the interaction between walks and strikeouts.
Walks and strikeouts cannot be analyzed independently of one another. Pitch analysis offers a simple tool to better model this interaction when studying batter/pitcher match ups.
People think I've been harsh with Brian throughout the contest, but it's only because I see so much potential in his writings. He's already a great analyst, but he could be one of the voices that goes beyond analyzing into teaching and proselytizing. There's absolutely nothing wrong with being a stathead who speaks only to the smaller audience that craves that type of content. I think Brian could be much much more, creating statheads and explaining the principles to those, like me, that want to learn more and just don't have the math in our heads to grasp some of it.
In the interests of time, I'll print a question about the source material that I have for Brian, which he can answer in this space: maybe it was the presence of J.C. Romero in the table for leaders and trailers for strike percentage that got me to wondering, but these numbers utilize unintentional walk rates, no? As the presence of a situational lefty reminds us, the incidence of intentional walks isn't distributed evenly among pitchers. I *think* you used UBB%, subtracting intentional walks from total PA as well as walk totals; in calculating the 2001-2008 tallies for Romero, I got an unintentional walk percentage of 11.2% (247 unintentional walks against 2214 total batters faced), and a total walk percentage of 12.9% (285 total freebies), but subtracting the IBBs from the 2214 total batters faced, to take them out of the PA data set, gets me to 11.4%, the number you've reported.
If that's what you've done, I'm a very happy camper indeed, because I hate seeing an elective managerial decision dependent on the vagaries of an individual skipper get lumped into a pitcher's body of work when you're doing performance evaluation of said hurler. It would reflect a care and forethought in your work which others should take to heart, and while I expect nothing less given the quality of what you've done, it's the sort of little thing that might go underappreciated if it isn't brought up out loud.
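For readers following the arithmetic in the comment above, here is a small sketch that reproduces the three Romero percentages from the totals quoted there (247 unintentional walks, 285 total walks, 2214 batters faced). The variable names are illustrative only.

```python
batters_faced = 2214
unintentional_walks = 247
total_walks = 285
intentional_walks = total_walks - unintentional_walks  # 38 intentional walks

# Unintentional walks over all batters faced.
ubb_over_all_pa = unintentional_walks / batters_faced
# All walks over all batters faced.
bb_over_all_pa = total_walks / batters_faced
# Unintentional walks with intentional walks removed from the denominator too.
ubb_over_non_ibb_pa = unintentional_walks / (batters_faced - intentional_walks)

for label, rate in [
    ("UBB / BF", ubb_over_all_pa),
    ("BB / BF", bb_over_all_pa),
    ("UBB / (BF - IBB)", ubb_over_non_ibb_pa),
]:
    print(f"{label}: {rate * 100:.1f}%")
```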
I've always loved Brian's work, and this was a great read.
(I wish the graphs would pop out when I clicked them though, they are tough to read in this format)
Starting out this article I got really excited because Brian had an interesting topic and he used a great narrative device, which, to me, helps to show his maturation since the beginning of the contest. He was always a good analyst but we needed more personality.
Later, it fell a little flat for two reasons that have plagued Brian since the beginning:
1. He completely drops the narrative, which helps both to explain what he's doing and to make it into enough of a story to pull the less stat-oriented through.
2. I hate to seem like the old grammarian, but there is no thesis. This had 4-6 sentences that basically told me he is going to talk all about the relationship between balls and strikes, but you need one concise sentence that tells the reader what it is you intend to demonstrate/prove with your piece. I feel that a lot of the 'heavy eyelid' comments that Brian gets would go away if he was clear about what he was doing from the get-go. It gets hard to read when you don't know what you're reading about.
Brian is great at doing analysis and I feel like I can learn a lot from him, but the writing still needs to be tightened up.
A 1500 word article is short enough to read to its conclusion without being told all in the first couple of paragraphs.
The numbers speak for themselves; they don't need to be set up for the readers like that. All that does is colour the readers' perception of the numbers (which I would argue is a bad thing).
As Daisuke showed last year, walks don't kill you if you can prevent H and HR (he did, but was that skill or luck?). In 2009 his strike% went up, contact% went up, BB went down and SO/9 stayed high, but his H/9 and HR/9 exploded. Maybe Daisuke lost his lucky charm, or his arm problems prevented him from limiting H and HR; I'm not sure, but his lower BB/9 did not help.
That said, I enjoyed the article and look forward to seeing your further research on this subject.
My main interest will be in improving projections, seeing how these numbers change when a batter or pitcher is put in a new environment (which of course builds a better game engine).
Without getting into all the physics of the flight of the baseball available in Pitchf/x (Enhanced Gameday), what it does offer for this type of analysis is pitch type (curve, fastball, slider, etc) and velocity. For example, Bob Walk has talked a lot this year on Pirates' broadcasts about pitching coach Joe Kerrigan's instruction of Zach Duke - how to use the inside fastball to 'speed up the bats', setting the hitters up for the off speed pitch. FanGraphs data has shown Duke's off-speed pitches to be much more effective than before.
Standard GameDay, in use in all of Triple-A and the Texas and Southern Leagues in Double-A, gives the location and outcome (called ball, swing and miss, etc) for each pitch. Whether standard from the minor leagues, or enhanced in the majors, I would like to classify pitches as in or out of the strike zone, the swing rates in and out of zone, and how these change as batters and pitchers advance to a different level of competition (Double-A, Triple-A, Majors).
Brian, thank you for extending a hand to teach people like me.
I don't know Brian's previous work so I don't know if he ever wrote like this before. I just know that he's seemed to merge his storytelling style from the wooden bat league in a way that didn't distract, and in fact, created a good segue into his research/analysis portion. The tone shift was a bit subtle and it'd be nice to pepper some of the analysis with a bit more "tone", but I still liked it a lot.
I did a few articles at SeamHeads, but they don't seem to be available anymore :(
I'm also not sure about the StatSpeak archives; they may have lost some older articles when they switched software last year, and this link only seems to bring up the most recent article: http://mvn.com/profile/Brian%20Cartwright
FanGraphs archives look good http://www.fangraphs.com/blogs/index.php?author=11
I glanced briefly at your website prior to the competition (and I think, prior to you being announced as a finalist... it was from some comment in a thread you had made). Subsequently, there were various references to your (and other finalists') previous work outside the competition. I avoided reading what people had previously written elsewhere since I didn't want it to affect how I voted in terms of this competition. Needless to say, I have a lot of reading to do once this competition is over :)
I can see why BP is interested, but I am not.
I finished the article feeling confused. I understand there are three possible outcomes at each PA... a K, a BB, or contact. I guess I'll echo the "what's the thesis?" feedback. Is it this?
Higher strike % = Lower walk %
Higher contact % = Lower walk %
Is that real insight? Is there any relationship (or perhaps 2nd order interaction) between str% and con%?
Perhaps the point is more to introduce another set of statistics that may be useful for projections, without drawing any conclusions yet on what the relationships might be. I guess that is ok, but some kind of tip of the hat to the work that has already been done would be nice. Back in January, Eric Seidman showed some analysis on Oliver Perez demonstrating the importance of contact rate on pitches thrown in the zone vs those out of the zone. Shouldn't contact rate in Brian's article be broken down into "in zone" and "out of zone contact", or at least some acknowledgement be made that it could? (see http://www.baseballprospectus.com/article.php?articleid=8452)
I get a similar feeling from the Unfiltered post... why just post Oliver projections, when posting the PECOTA numbers alongside could be equally instructive and more familiar to readers here? Introducing new concepts is ok, but not if it is presented with tunnel vision to what others before have done.
Yes, there's probably a relationship between str% and con%, as on average pitches in the zone have a contact rate of .478, while those out of the zone have a rate of .299. However, my definition of strikes includes called strikes, as they are an opportunity for the batter. Adding those to the in-zone pitches brings that contact rate down to .315, not very different from out of zone, so on average I'm not considering it. It does need to be looked at for individual players.
Thanks for the response! For me, the article definitely works as a hook... even if I'm not sure I agree with where you are trying to take these numbers, there is enough substance to have a conversation about it. Other popular fantasy authors who focus on the game "by the numbers" could learn from that. Your willingness to engage your readers in the comments section is a big plus.
Good luck, and hopefully we will all get a chance to learn alongside you about ways of improving projections between levels.
I am not voting for it, since I wouldn't like to read this type of piece frequently, but I would like to ask BP's staff to consider offering him an infrequent but in-depth guest column.
But I'd like to understand the theory. To me a lot of the discussion on strikes is like "voo doo economics". I had this debate in week 1 with the fellow eliminated last week, who said that Voros "proved" that Ks are an accurate measure of pitcher effectiveness (he never "proved it" - he found some evidence to support his theory).
So when I read this: "A low contact pitcher must pound the strike zone to keep his walk rate manageable. A pitcher who relies on batters chasing pitches outside the strike zone (such as Oliver Perez or Ian Snell) is likely to be far less consistent, dependent on the skill of the batters in laying off those pitches."
My question is why isn't a low contact pitcher (or any pitcher) also dependent on the skill of the batter? Don't some batters K a lot? Aren't some batters better at hitting strikeout pitchers than others? A side question - would a strikeout pitcher prefer to face a low strikeout hitter or a high strikeout hitter? To me, this entire area, pitching, remains extremely unsophisticated, especially compared to statistical measurement of hitting.
Anyway, good luck to you and thanks for many good reads the last couple of months.
Also note that a pitcher with enough stuff to strike people out while remaining primarily in the strike zone would _not_ be a low-contact pitcher since, instead of walking people, he would either get a strikeout or the batter would put the ball in play. That's how someone like Maddux, in his prime, could get a fair share of strikeouts, walk few batters, yet give up few hits.
Going forward I am going to focus on using this as a tool to study batters and pitchers as they are, using str% and con% instead of bb% and so%, to see if it is a more accurate way to project minor leaguers.
Back in the early parts of the competition I offered some advice about using an introduction to frame your reader's attention on the topic. Since then, you've caught my attention in your introduction every time, and I think that's been a big improvement for 'readability'.
My next suggestion is that perhaps you can find a way to offer little mini-summaries and/or transitions with the same 'story-telling' perspective as you've been using in your introductions. And then, I think it would also help to get a bit of a summary at the end, leaving us with a solid and simple take-away from the article, in addition to the "more research" topics.
This would sort of follow the old maxim from Communications 101 - a) tell your audience what you're going to tell them, b) tell them, and c) tell them what you've told them.
In your follow-up post, above, you say "Going forward I am going to focus on using this as a tool to study batters and pitchers as they are, using str% and con% instead of bb% and so%, to see if it is a more accurate way to project minor leaguers." That right there is the critical piece of the story that you introduced us to in the first paragraph, but I forgot about it as I was reading along the way. It would be great to remind me why this is important as I'm reading it.
I don't know if this is a problem with your writing as much as it is my own reading comprehension in this internet format. That's one of the reasons that I'd love to see your work in printed format, so I can take notes in the margins and use a highlighter as I read through it.
At any rate, as I've been saying the past few weeks, I think your research is tremendous, and I hope BP has a spot for you, whether or not you win the competition! Good luck!
I won't lie - my vote goes to Ken, whose work I've enjoyed throughout, and who finished very strong. But as another commenter stated, there really should be a place at BP for a semi-regular piece by Brian, delving more deeply into the nuts and bolts of sabermetric analysis. I never thought I'd say that, but there it is.
That said, his research is interesting and potentially very important. The bullet at the end about major league equivalencies was intriguing. All in all, Brian is a fine analyst but needs a strong editor.
My main area of interest is player valuation - how to measure each player's batting, pitching, fielding, baserunning, etc. Part of that is being able to control for the ballparks and level of competition.
My thinking about these things over the years is largely grounded in two concepts - How good were the players I got to see play as amateurs, and can I design the best simulation game ever?
Much of what I write is about the process, giving the readers a peek into the black box of statistical procedures. I want to interact with other analysts to ensure that the methods are accurate and the best available, and to have the general fandom understand and trust the process.
Once the numbers are run, you have the players. Who's the best prospect? If two guys hit the same, can we separate them on their fielding and baserunning? Why does Derek Jeter have Gold Gloves? Which pitchers are toughest to steal on? etc, etc.
I have enjoyed the past year or so of writing on the internet, and especially the last few weeks here at BP. I know I still have many new things to learn, and I hope to be able to share them with you here.
As an O's fan I used to think that once Daniel Cabrera finally harnessed his great stuff and improved on his control he would be an ace. Looking back now that seems silly.
I've seen pitchers add a new pitch, I've seen pitchers adjust the grip and improve a particular pitch, I've seen pitchers adjust their strategy in terms of what pitch to use when, but I've never really seen a dramatic shift in control.