Believe it or not, most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.
Adam is the founder of Project Prospect, a scouting and statistical analysis website. He has been writing about baseball since 2006, when he began covering college baseball and conducting quantitative analysis of minor-league prospects.
Data is one of baseball’s purest byproducts. It’s interlaced with the past, present and future. It provides a platform for discussion. And just as the game has an effect on it, it has an effect on the game.
PITCHf/x data, which is part of a new breed of baseball statistics, can be intimidating and overwhelming. But thanks to amazing efforts by MLB Advanced Media, Sportvision (the creator of PITCHf/x), and a growing pool of analysts, PITCHf/x has made a mark on the game. That said, its roots are still shallow, and relatively few players, coaches and on-air personalities have fully embraced it.
"The only time I hear about that stuff is through the media," Tim Lincecum recently told me. "Reporters came to me early this season and said that I'd been throwing about 17% sliders. I hadn't thrown one slider up to that point."
Madison Bumgarner, Lincecum’s teammate, said his biggest concern with PITCHf/x is analysts aggregating data and masking the situational adjustments a pitcher must make.
As a primer for people seeking applications for PITCHf/x, I’ve detailed a few of my findings about how PITCHf/x can be utilized to improve your scouting eye. But first, let’s take a closer look at Lincecum’s concern about pitch categorization.
How PITCHf/x categorizes pitches
MLB.com’s Gameday application delivers near-live data that includes pitch type, speed, and movement information. Pitch types are defined by mathematical models that are built around velocity, spin, and movement. It’s a constantly evolving, sophisticated system.
“When we first started doing real-time classifications, we had one generic neural net [or mathematical model] for all pitchers, but we learned pretty quickly that wouldn’t work because one pitcher’s fastball can approximate another’s changeup,” Cory Schwartz, VP of Stats for MLB.com, explained in an email. “Ultimately, we built a custom neural net for each pitcher and now have one for over 1,100 pitchers.”
In addition to rookie pitchers and their unique arsenals, MLB.com’s models must also be adjusted for pitchers introducing new pitches and tweaking others, which happens regularly. In Lincecum’s case, he cut his slider out of his arsenal for a while, and MLB.com’s mathematical model still thought it saw some.
“It’s an extremely labor-intensive process, but we recognize the importance of accurate classifications, for fans, clubs and industry partners alike, and have invested literally hundreds of man hours into building the most accurate system possible,” Schwartz wrote. “While some pitchers do throw a very distinct repertoire that can be easily classified, many throw multiple pitches that blend together and are extremely difficult to differentiate from other pitch types.”
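Schwartz's point that one pitcher's fastball can approximate another's changeup is easy to see with a toy example. The sketch below is not MLB.com's system: it swaps the neural nets for a simple nearest-centroid rule, and the pitcher names, pitch profiles and observed pitch are all made-up numbers chosen for illustration. It only shows why a single shared model can mislabel a pitch that a per-pitcher model gets right.

```python
import math

# Hypothetical per-pitcher pitch profiles: (velocity in mph, vertical movement in inches).
# These numbers are invented for illustration; they are not real PITCHf/x data.
PROFILES = {
    "pitcher_a": {"fastball": (95.0, 10.0), "changeup": (87.0, 5.0)},
    "pitcher_b": {"fastball": (87.0, 9.0), "changeup": (79.0, 4.0)},
}


def classify(pitch, centroids):
    """Return the pitch type whose centroid is closest to the observed pitch."""
    return min(centroids, key=lambda name: math.dist(pitch, centroids[name]))


def generic_centroids(profiles):
    """Average each pitch type across all pitchers, mimicking one shared model."""
    pooled = {}
    for arsenal in profiles.values():
        for name, features in arsenal.items():
            pooled.setdefault(name, []).append(features)
    return {
        name: tuple(sum(axis) / len(axis) for axis in zip(*rows))
        for name, rows in pooled.items()
    }


# A hypothetical 87 mph pitch with 8 inches of vertical movement.
observed = (87.0, 8.0)

print("shared model:   ", classify(observed, generic_centroids(PROFILES)))
print("pitcher_a model:", classify(observed, PROFILES["pitcher_a"]))
print("pitcher_b model:", classify(observed, PROFILES["pitcher_b"]))
```

With these invented profiles, the shared model calls the pitch a fastball no matter who threw it, while the per-pitcher models disagree with each other: the same reading is closest to pitcher_a's changeup and to pitcher_b's fastball.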
Harry Pavlidis, founder of Pitch Info LLC, has devoted considerable time to formulating his own PITCHf/x classifications, which now appear at Brooks Baseball. Thanks to the efforts of Schwartz, Pavlidis, and others, pitch classifications have improved dramatically, and I expect them to continue to improve.
A couple of PITCHf/x findings
One of my biggest PITCHf/x projects to date has been creating an algorithm that grades a pitcher’s offerings on the 20-80 scouting scale. The first step of this project was gathering and combing through PITCHf/x data to study variables and compare them with visual data. A handful of scouts have also provided input to my study, particularly on which variables they’d focus on and which they wouldn’t. My objective has been to figure out what makes a pitch a swing-and-miss offering. And I’ve walked away from my initial study with three strong variables.
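For readers unfamiliar with the 20-80 scale, here is a minimal sketch of how a raw measurement could be mapped onto it. This is not the algorithm described above, whose details are not given here. It assumes one commonly described convention: 50 is major-league average and each 10 points is roughly one standard deviation. The function name `grade_20_80`, its parameters and the sample velocities are all hypothetical.

```python
import statistics


def grade_20_80(value, league_values, step=5):
    """Map a raw measurement onto a 20-80 grade.

    Assumption (a convention, not the article's method): 50 is the league
    mean and every 10 points is one standard deviation from it.
    """
    mean = statistics.mean(league_values)
    stdev = statistics.pstdev(league_values)
    if stdev == 0:
        return 50  # no spread in the sample, so everything grades as average
    z_score = (value - mean) / stdev
    raw_grade = 50 + 10 * z_score
    # Round to the nearest multiple of `step`, then clamp to the 20-80 range.
    rounded = step * round(raw_grade / step)
    return max(20, min(80, rounded))


# Made-up fastball velocities (mph) standing in for a league-wide sample.
sample_velocities = [88, 90, 92, 94, 96]

for velocity in (88, 92, 95):
    print(velocity, "mph ->", grade_20_80(velocity, sample_velocities))
```

A real grading system would presumably blend several variables rather than grade one in isolation; the sketch only shows the scale conversion.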
I’ll get the first one out of the way: velocity. The harder a pitcher throws, the more swing-throughs he tends to get. Glad the data agrees there.
The second is also pretty logical. The variable with the single strongest correlation coefficient—stronger than velocity—for what makes a pitch a swing-and-miss offering is the frequency with which the pitcher throws the pitch. Pitchers with good fastballs tend to throw them a lot. Pitchers with below-average fastballs use them more sparingly. Simple enough. Now let’s get to the juicy finding.
I’ve discussed quick-twitch ability, as it pertains to pitchers, with a number of people in baseball. (Hitters with quick-twitch ability are known for being able to generate elite bat speed.) A major-league pitching coach told me he thought pitcher quick twitch could be measured by spin rates, with faster arms imparting elite spin. This would then be anticipated to result in elite “life” that might not show up in raw velocity. I put his hypothesis to the test.
To my surprise, my research showed virtually no relationship between PITCHf/x spin rates and swing-through percentage. I was later cautioned by a front office member about the analytical value of PITCHf/x’s current spin rates.
But I discovered something stimulating and unexpected nonetheless: the correlation coefficient for vertical fastball movement is very similar to the correlation coefficient for fastball velocity.
Could vertical fastball movement be a way to roughly quantify fastball life? Do fastballs that remain on a relatively linear path get more swing-throughs than fastballs that suffer the effects of gravity more strongly on their way to the plate?
I don’t know how one pitcher could throw a fastball that decelerates less than others of the same velocity on its way to the plate, but maybe it is quick twitch. And perhaps our tendency to privilege starting versus finishing fastball velocity (out of the hand instead of over the plate) is a roadblock in the way of a deeper understanding of the data.
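The correlation coefficients discussed in this section can be computed with a few lines of code. The sketch below assumes the standard Pearson correlation coefficient, which may or may not be the exact statistic used in the study, and the per-pitcher rows are invented placeholders rather than real PITCHf/x numbers. The function name `pearson_r` and the field names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariance / math.sqrt(var_x * var_y)


# Invented per-pitcher fastball rows, for illustration only.
pitchers = [
    {"velocity": 89.5, "vertical_movement": 7.8, "swing_through_pct": 4.9},
    {"velocity": 91.2, "vertical_movement": 9.1, "swing_through_pct": 6.2},
    {"velocity": 93.0, "vertical_movement": 8.4, "swing_through_pct": 6.0},
    {"velocity": 94.6, "vertical_movement": 10.3, "swing_through_pct": 8.1},
    {"velocity": 96.1, "vertical_movement": 9.7, "swing_through_pct": 7.4},
]

swing_throughs = [row["swing_through_pct"] for row in pitchers]
for variable in ("velocity", "vertical_movement"):
    values = [row[variable] for row in pitchers]
    print(variable, round(pearson_r(values, swing_throughs), 3))
```

Running the same function for each candidate variable against swing-through percentage gives the kind of side-by-side comparison described above.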
PITCHf/x and scouting
I’ve been researching prospects for the last six years, mixing quantitative data and first-hand scouting to further my understanding of the game. PITCHf/x has helped me create a template of major-league pitchers that I can use to evaluate prospects.
Paired with video, PITCHf/x can be a great tool to learn to recognize pitches. When I’m first studying up on a big-league pitcher, I’ll watch him while I have Gameday and its pitch classifications open. It’s a quick and easy way to learn to identify his pitches and compare him to his counterparts. Remembering what the best pitches in baseball and their supporting data look like makes it easier to know what to look for from prospects and amateurs.
I also checked in with a few scouts—who have the luxury of reviewing minor league PITCHf/x data—to see how they use PITCHf/x in their scouting.
“Anything that provides supplemental information to blend with what we see is valuable,” the first scout said. “We're constantly comparing players to what ‘major-league average’ is, and PITCHf/x data for prospects can be no different.”
“It's a useful tool to obtain objective information on a pitcher to supplement the info we have from our scouts,” said the second scout. “It’s one of the first steps to help objectively measure pitchers in the way a scout would subjectively. As it gets put into more and more minor-league parks, the more valuable the information will be.”
The idea of objectifying major-league average is at the core of the 20-80 scouting scale and similar efforts. PITCHf/x gives fans and scouts alike an opportunity to quantify scouting. As the PITCHf/x database continues to grow and more information from it is studied, templates to objectively evaluate pitchers with data—like the algorithm I’m working on—will be written. The data is too good for it not to head in that direction.
I would imagine there is no relationship because there is so much that affects why a hitter swings and misses. The previous pitches in the at bat. The location of previous and current pitches. How the pitcher has set up the at bat. The count.
My study included thousands of pitchers and compared specific pitch types. Three variables had correlation coefficients of 0.4 or higher when compared to swing-through percentage (frequency, velocity and vertical movement). Spin had a correlation coefficient of almost exactly zero. Yes, there are many variables and situations to account for, but I don't think there's enough noise in the data, after all the attention that has been put into it, to completely dilute any one variable. If the spin rates that Pitchf/x generates correlate with a pitcher's swing-through rates, I'd expect to see some sign of that.
Has anyone tried a similar study, with or without Pitchf/x spin rates, and gotten different results?
I almost included Kyle Davies as an example of a pitcher with above-average fastball velocity and vertical movement who still struggles, with my hypothesis for why being his lack of deception -- I posted a Tweet about that when I first started my study with Brooks Baseball data.
There's absolutely more to a fastball, or any pitch, than what's captured by Pitchf/x. But with the data we have now, we're still turning a corner where merely referencing an MLB pitcher's average or peak fastball velocity and using that information alone to deem a pitch above or below average is not utilizing relevant, public information to make an informed judgment.
And translating our thinking to draft prospects and the minors, we're learning more about how to quantify the attributes of a pitch that make it special, beyond raw velocities. We have video to study those attributes. And we can apply some of our knowledge to subjects, even if we don't have Pitchf/x data on them.
Now, I will admit that vertical movement is more likely to lead to swing-and-miss than horizontal movement, given the dimensions of the bat and ball.
From a physics standpoint, the spin data should correlate, as you alluded to. It sounds possible that the Pitchf/x data collection method could be leading to this result.
Not much more. The actual spin is not measured nor is the actual release point. Everything else you would ever want to know about the pitch IS measured (i.e., the full trajectory). The trick is finding the right way to look at the data from a scouting point of view.
Also, we're not going to be able to quantify how close a pitcher was to his target. Some catchers put up early targets and we can approximate how close the pitch was, but others establish a target after the pitch is thrown or signal for a pitcher to throw in a general area. How well a pitcher can locate his pitches plays a huge role in his performance.
Don't you think measures of angular momentum would tell us more about the effect of spin on a pitch than frame-by-frame inferred movement?
http://webusers.npl.illinois.edu/~a-nathan/pob//SpinAxis.pdf
One side note: It's a little surprising that Trackman would spend time conducting studies that use ERA and batting average to evaluate hitters and pitchers. Maybe it was a quick thing someone put together for SI, but you'd hope someone with a data analysis product that's being implemented in baseball would home in on some better variables.