Baseball Therapy: Looking for Meaning Amid the Small-Sample Flukes

February 17, 2014

Last week, we discussed a new method for looking at how players change within a season. Baseball stats are normally denominated in years, but the problem with yearly stats is that they obscure any growth and development—or regression—that might have taken place. Six months is a long time. Players can change. This week, I want to look at how we might use that method to teach us something about the growth of individual players.

I’ve been focusing my attention on one specific decision that a hitter makes, the decision of whether or not to swing at a pitch. It’s a decision that the batter gets to make every time the ball is thrown his way, so we have a large number of data points to work with.

Warning! Gory Mathematical Details Ahead!
First off, if you haven’t read last week’s piece, you’ll need it to understand what’s going on here. The basic idea is that for some hitters, we can see evidence that their recent swing rate over their last certain number of pitches is a better predictor of what they’re about to do than is their seasonal average. I take that as evidence that within the season, something is changing. The nice thing is that we can track those changes.

I looked only at the 2013 season to see which players had a swing rate that could be best described, from pitch to pitch, by some sort of moving average. Out of 258 qualifying players—min. 250 PA and 1200 pitches faced—146 (64.4 percent) fell into this category. For some of them, the best moving average was the swing rate from the last 100 pitches. For others, 500. I tested everything from 50 to 1000 by 50s and let a regression pick the best fit for each player.
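For readers who want to see the mechanics, here is a minimal Python sketch of the window search. It is not the regression from last week's piece. As a stand-in, it scores each candidate window by log loss (how well the trailing swing rate predicts the next swing/take) and compares that against the flat seasonal average. The function names, the log-loss scoring, and the synthetic pitch data are all assumptions made for illustration.

```python
import math
import random


def trailing_rate(swings, window):
    """Swing rate over the `window` pitches before each pitch (None until enough history)."""
    rates, running = [], 0
    for i, s in enumerate(swings):
        rates.append(running / window if i >= window else None)
        running += s                       # the current pitch counts for later pitches
        if i >= window:
            running -= swings[i - window]  # drop the pitch that just left the window
    return rates


def log_loss(probs, outcomes):
    """Mean negative log-likelihood of 0/1 outcomes given predicted probabilities."""
    eps = 1e-6
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)      # keep log() finite at 0 and 1
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(outcomes)


def best_window(swings, windows=range(50, 1001, 50)):
    """Return (best predictor, all scores); 'season' means the flat seasonal average."""
    start = max(windows)                   # score every candidate on the same pitches
    target = swings[start:]
    season = sum(swings) / len(swings)
    scores = {"season": log_loss([season] * len(target), target)}
    for w in windows:
        scores[w] = log_loss(trailing_rate(swings, w)[start:], target)
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical hitter: swings 45 percent of the time early, 52 percent late.
    random.seed(2013)
    pitches = [int(random.random() < 0.45) for _ in range(1200)]
    pitches += [int(random.random() < 0.52) for _ in range(1200)]
    winner, _ = best_window(pitches)
    print("best predictor:", winner)
```

If "season" wins, the hitter looks stable by this yardstick. If a window wins, recent behavior carries extra information, which is the pattern described above.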

For each player in that 64.4 percent, I ran a logistic regression (and by that, I mean 146 separate logistic regressions) predicting the chances that each plate appearance they had would end in an on-base event (walk, HBP, or hit). I keyed the moving average to reflect what it was during the first pitch of the plate appearance. This told me whether there were players who had better (or worse) OBP results when they swung more. There were 17 players who had a significant association between in-season swing rate and their chances of getting on base.
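A rough sketch of one of those per-player regressions, again in Python and again under stated assumptions: a single predictor (the trailing swing rate at the first pitch of the plate appearance), a hand-rolled Newton's-method fit, and a Wald test on the slope. The function names and the toy numbers in the usage note are hypothetical, and the actual models may have been specified differently.

```python
import math


def rate_at_pa_start(rates, pa_first_pitch):
    """Pick the trailing swing rate as of the first pitch of each plate appearance."""
    return [rates[i] for i in pa_first_pitch]


def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton's method.

    Returns (b0, b1, se_b1), where se_b1 is the standard error of the slope.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p                   # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                       # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # Newton step: inverse(H) times gradient
        b1 += (h00 * g1 - h01 * g0) / det
    # Slope variance is the matching diagonal entry of the inverse information
    # matrix, taken from the final pass (the fit has converged by then).
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


def wald_p_value(coef, se):
    """Two-sided p-value for the null hypothesis that the coefficient is zero."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2))
```

For example, a made-up hitter who reached base in 30 of 100 PA at a 40 percent swing rate and 40 of 100 PA at a 50 percent swing rate would be fit with `fit_logistic([0.40] * 100 + [0.50] * 100, [1] * 30 + [0] * 70 + [1] * 40 + [0] * 60)`. A positive slope means more swinging went with more times on base; a negative slope means the opposite.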

Let me show you a few case studies and how we might use their examples.

The Success Story: Jedd Gyorko

Gyorko’s swing rate was best described by a moving average of 950 pitches, so we pick up his season 950 pitches in, which is roughly two months’ worth. The regression also said that when his swing rate was higher, he was more likely to get on base.

We see that he starts out at about a 47 percent swing rate. But then, something seems to happen in the red box, and he jumps by a couple of percentage points rather quickly and keeps going up. In the green box, he seems to make another jump, although by the end of the season, he is back to where he started before the second jump.

I’ll stop for a moment and let you think about how you would use this info.

Don’t peek.

Ready?

Okay, we know that something changed about what Gyorko was doing. He might very well have made a conscious decision to start swinging more often. Maybe he was just gaining confidence in his rookie year. Maybe he was completely unaware of what was going on. We could pull the date that the upswing started and ask him what was happening around that time. We don’t have that information on exactly what Gyorko was thinking, but the abrupt jump suggests that it was a discrete event. The Padres may want to check in on him and ask.

Maybe Gyorko had been working with hitting coach Phil Plantier and something finally stuck. How to work with the kid might be useful info to know as he continues his career in the bigs. We also know that he has shown evidence of being able to change his approach and to build on something when it showed success. Young players, even good ones, have a lot to learn, and you hope that they show a willingness to do so. Even if Gyorko looks at the graph and says “I have no idea where that came from,” he might go back to that time and see if there’s something else that seems to coincide with that time period. Was he sleeping better? Did he figure out a good game time prep routine that calmed him down? Had he just gotten more comfortable with being a big leaguer? Maybe it affected his approach at the plate as well, and in doing so made him a better hitter. Maybe there’s something he can do more of.

The Uh Oh: Justin Smoak

Smoak is another hitter who has a high span on his swing rate. His present swing rate is best described (in 2013) by his last 1000 pitches seen. The disappointing Mariners’ first baseman/DH shows a steady increase over time in his swing rate, with a few small bumps up along the way. There’s a small problem. Unlike Gyorko, who saw his OBP rise the more he swung, Smoak’s diagnostics suggest that the more he swung, the less he got on base. (Sure enough, his OBP in the month of July was .372, August was .324, and Everything After was .268.) He just sort of slipped away. That’s kinda scary, and maybe illustrative of why the Justin Smoak experience hasn’t been a good one for the Emerald City.

In this case, you can’t pin everything on Justin’s swinging tendencies. Swinging is one part of an overall approach. This is correlational research, and the swinging may not be causing the suck. The suck might be causing the swinging, which may in turn cause more suck…or maybe there’s some other factor at work causing both more swings and less quality. It’s also not a bad place to start looking. Maybe it’s as simple as him falling into bad habits and not knowing how to break them.

The Mountain Range: Howie Kendrick

Kendrick’s graph looks like one of the background images on the original Apple IIe version of Oregon Trail. Kendrick is a guy who is best described by his last 150 pitches, so his sample completely turns over every 40-50 PA (10-12 games?). His diagnostic suggests that more swinging is a bad idea for him as far as OBP goes, but at one point during the season he hit a peak of almost 62 percent swinging. Maybe he figured he’d try something different—and the graph tells the story of how different it was. In late May and June (May OBP, .353; June .414) Kendrick had a valley in his swing rate and put up his best months of the year. And then went back to his old ways (and on the DL in August).

Does that make Kendrick streaky? The fact that his swing rate is best described by only the past 150 pitches suggests that he changes his approach a lot. And the fact that it was correlated with his success suggests that his outcomes depend a lot on his approach. Let’s for a moment assume that these findings are “real.” His approach really did change, and that’s what made a difference in his OBP. It means that when he found something that worked, he wasn’t able to hold on to it. Either he was unaware of what was happening and fell back into his old ways, or he knew but didn’t have the ability to lock in those gains. In that case, we’re saying that Kendrick’s approach meanders all over the place, working its way in and out of places that work for him and places that don’t. Maybe there’s a case to be made for the existence of streaky players—not because they try to be streaky, but because they aren’t really in control of the steering wheel.

The “Sometimes, it’s Not Just a Small Sample Size Fluke”: Michael Saunders

Michael Saunders, first half of the season: .225/.303/.364. Second half of the season: .251/.350/.440.

Fantasy players, you are welcome.

The Hidden Message: Didi Gregorius

We now come to the chart of Mariekson Julius Gregorius. Gregorius, like Howie Kendrick, had a swing rate that was best described by his previous 150 pitches, but for him, an approach marked by more swinging seemed to coincide with the times when he was most likely to get on base. Gregorius’s chart fluctuates up and down and up and down and up and down and up and down over the course of the season. Near the end, he struggles to keep his head above water.

Gregorius never really spent too much time at his high or low points. I’m left to wonder whether he had enough time where his approach was working to fully appreciate it. Maybe he did. It might be worth the Diamondbacks showing Gregorius this chart and explaining what it says (or just going with “Y’know, when you were swinging more/were more aggressive, good things were happening.”) It’s tempting to just say “Hey, Didi, just go crazy swinging,” but that’s a little too simplistic. Instead, the D-Backs might be able to leverage information that they have from watching him up close, as well as talking to Gregorius himself, to figure out what the periods of higher swinging rate had in common. It’s possible that the answer is “I was just feeling more aggressive and wanted to swing more.” But more importantly, this is a teaching tool waiting to happen.

Becoming a Time Lord
Last week, I hoped to show how this method could model how players change within a season. This week, we see how we can link those changes to changes in performance, at least for some players. Here, we use a prior moving average of swinging percentage (and this is important) as a proxy for the approach that the player was taking at the time. His actual change in approach may or may not have centered around swinging, but whatever it was, it produced more (or fewer) swings. Using this method also gives us a way to time map when things were going well and when they were going poorly. It gives us the chance to see whether those “was it or wasn’t it” streaky periods were the result of actual growth or just a small sample size driven, lucky fluke of a month for the player. Not everything will yield actionable information, but it’s nice to have a few extra leads.

We are quick to dismiss just about anything as a small sample size because of some article written by a guy who named himself after an auxiliary kitchen utensil. We don’t ask the underlying question. Did something change? Can we reasonably make the case that the player figured something out and it made a huge difference? There might be information hidden in that streak.

The power of the method is that while it can’t tell us anything about baseball players as a collective, it can tell us something about specific players. Think of it as the difference of having information about “the market” writ large and an individual stock. There’s opportunity for arbitrage in both of those pieces of information. And of course, we’re also looking at only one possible proxy for a difference in approach (swing rate) and one outcome (OBP). Perhaps there are others out there.

Now, before I go, I should point out that this is all rather preliminary. Did I cherry pick the graphs that made for the best stories? You bet. Did I run a bunch of regressions and (holy mass of Type I error, Batman!) find a few of them that came up significant? Yup. Maybe I’m chasing a phantom. That’s what happens when you write things up in real time. On the flip side, I’d like to think that the theory at least makes sense. Hopefully, you’ll stick with me long enough to find out.
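To put a rough number on that Type I worry, here is a small sketch. The significance threshold isn't stated above, so the conventional 0.05 is an assumption, as is treating the 146 regressions as independent tests. Under those assumptions, it computes how many "significant" players pure chance would be expected to produce, and how likely it would be to see 17 or more.

```python
from math import comb


def expected_false_positives(n_tests, alpha):
    """Expected number of significant results if every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha):
    """P(X >= k) for X ~ Binomial(n_tests, alpha): k or more chance 'hits'."""
    return sum(
        comb(n_tests, j) * alpha**j * (1 - alpha) ** (n_tests - j)
        for j in range(k, n_tests + 1)
    )


if __name__ == "__main__":
    n_tests, hits, alpha = 146, 17, 0.05   # alpha is an assumed threshold
    print("expected by chance:", expected_false_positives(n_tests, alpha))
    print("P(at least %d by chance): %.5f" % (hits, prob_at_least(hits, n_tests, alpha)))
```

At the assumed 0.05 threshold, chance alone would be expected to hand over about 7 of the 146, so some of the 17 are probably noise even if the group as a whole is not.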

Russell A. Carleton

harrypav

2/17

I like the sausage making articles.

jharring59

2/17

It would seem to me that this has value. Specifically, since my initial impression is that batters with higher swing rates walk less often, any uptick in OBP along with the swing rate must be tied to a higher BA. Maybe these players are seeing the pitches more clearly, so they are making hard-hit contact (line drives, etc.) more often.

Obviously the problem is that it is impossible in real time to know when the approach change has been made, and only in obvious cases like Saunders and Gyorko is there even any possible usefulness, when looking at this as a past performance predicting future results.

dethwurm

2/17

Yeah, I don't think it's a coincidence that 4 of the 5 are pretty BB-averse, and the 5th (Smoak) is 1) a total mess under pressure to fix things quickly and 2) not exactly a BB-machine either. Of course there's going to be a notable correlation between Swing% and OBP; that's their ticket on base. It would be nice to see this method used with a more encompassing measure of offensive production like TAv or wOBA.

It would also be great to see this quasi-instantaneous Swing% correlated with quasi-instantaneous OBP and/or other measures throughout the year. Picking by month, especially when by the nature of the method there's only like 3 of them, is pretty coarse. Plus, it might be that the correlation with a moving average of a metric peaks at some particular averaging length. Like does this Swing% correlate best with monthly OBP? Daily? Last-10-PA-ly?

It would also also be great to see year-to-year results for a given player. (Does the method work across seasons? Like the last 150 pitches of 2012 predict Kendrick's early-2013 swing%?) It's interesting that Kendrick shows so much in-season variation (which I'd anecdotally kind of suspected from watching him...) since his overall season lines are absurdly consistent. Though 2012 was his worst year and 2013 probably his best, so I wonder if he changed his approach (or, like, meta-approach) to be more reactive between years, or something. Saunders, too, ended up being roughly as productive in 2013 as 2012, though with somewhat different "shapes" (more power less OBP in '12, vice-versa in '13).

So yeah, lots of great stuff here! Really looking forward to the rest of this series!

pjbenedict

2/17

This seems to be what you hope: A whole new field of study. As I did after your last article, I have a ton of questions, and a lot of interest in seeing further research and some answers.

Run with this!

backwardgalaxy

2/17

At the VERY least, it's worth knowing more.

beeker99

2/17

An AUXILIARY kitchen utensil?! Not in my house.

In all seriousness, though, this is fascinating. I can't wait to see what happens next.

jfcross

2/17

I think this stuff is super interesting and really cool... but at the same time, a little dangerous since it's all too easy to see trends where they don't exist particularly when looking at moving averages (which tend to make it look like there are trends even in a random walk).

jonhale

2/17

Bit of a straw man concerning small sample size concerns. I don't think anybody doesn't want to address the question "did something change" when faced with small samples; there are just very real concerns when dealing with results about the feasibility of picking out a real signal from all the noise. There might be hidden information, but there is also the possibility that no matter how far you drill down, there just isn't anything useful that can be squeezed out with all the different factors and mutually dependent influences that lead from peripherals like swing rate-->final results, like trying to predict short-term weather on a local level.

I think it's great that you have developed a way to find trends in swing rate (and visualize glaring ones like Saunders'), but going from that to OBP results has uncovered a ton of noise and in turn a lot of narrative and confirmation bias. For both Gyorko and Smoak, I find it a stretch to see an overall correlation either way between swing rate and OBP when binning the data differently suggests wildly conflicting conclusions (i.e. Smoak's OBP falling off a cliff in the last month or so vs his swing rate, Gyorko's second half OBP being much lower than his first vs. his swing rate). And even that leaves us in correlation/causation land, which is not so easily dismissed as requiring a dose of real-world information. When Gregorius struck out 20 times in 80 AB in August, something clearly went horribly wrong -- but was it really his lack of aggression that caused that, or any one of a number of plausible factors that would lead to the same results as well as a lower swing rate? Without more evidence strongly suggesting whether we are looking at a cause or effect, it would be hard to even use this information as a teaching tool.

Not trying to dismiss your work or this line of study; I think this sort of technique could be incredibly valuable if general correlations to performance, even for specific players, could be confirmed, perhaps in conjunction with other 'proxies', as you point out. And violent short-term changes in approach, especially if they fly way out of norms or coincide with a corresponding change in performance, could certainly be used as a red flag for further investigation and don't require much of a sample to be significant of *something*. But there are some inherent issues before this type of analysis makes the jump from providing somewhat uncertain food for thought about the adjustments a player made and how that might have led to changes in his performance in hindsight to being used as a serious diagnostic tool.

newsense

2/18

Is there any correlation between stability of swing rate and time in the league? For a rookie like Gregorius it would seem that pitchers would be trying different things, some of which would promote swinging and others that wouldn't.

dmarzo1

2/19

Great stuff
Please excuse the stupidity of my comment... but wouldn't it be helpful to graph OBP on the second y axis to track with your narrative? (E.g., your statement that Gyorko's OBP increased as his swing percentage increased?) Or perhaps I am confused on what the current y axis represents... does it already implicitly capture OBP?
