The PITCHf/x optical video and TrackMan Doppler radar sensors estimate parameters of pitches, including the speed, horizontal movement and vertical movement. The data recorded by these systems can be used to develop pitcher similarity measures. These measures are valuable not only for comparing major-league pitchers to each other, but also for allowing the direct comparison of pitchers in other leagues (minor, amateur and foreign) to their MLB counterparts.

A pitcher similarity measure can be employed for multiple purposes by analysts. The identification of groups of similar pitchers can be used to generate optimized projection models [18], or to generate larger samples for predicting the outcome of batter/pitcher matchups [3], [20]. In addition, a similarity measure allows for individual pitchers to be monitored over time in order to detect possible changes in pitch characteristics, health and throwing mechanics.

Previous methods for quantifying pitcher similarity have been limited to the comparison of pitches of the same type, which makes these methods highly dependent on the outcome of pitch-classification algorithms. Kalk [8], [9] developed a similarity measure that compared pitches of the same type using variables that included pitch frequency, speed and movement. Loftus [11], [12], [13] improved on Kalk's approach by separating pitchers by handedness while using the Kolmogorov-Smirnov distance to compare distributions. Like Kalk's method, however, this approach only considers comparisons between pitches of the same type.

A difficulty for these methods is that different pitch types for a single pitcher or across multiple pitchers can have similar properties. This causes the pitch-frequency statistics used by similarity algorithms to depend heavily on the classification process; it also prevents the comparison of similar pitches that are classified as different pitch types.

In 2016, for example, Ubaldo Jimenez's sinker averaged 91.12 mph, -7.35 inches of horizontal movement and 8.53 inches of vertical movement, while Jeremy Hellickson's four-seam fastball had nearly identical averages of 90.81 mph, -7.63 inches of horizontal movement and 8.44 inches of vertical movement. Due to this issue, Loftus [13] conceded that his own method is best suited for comparing individual pitches as opposed to comparing pitchers based on their entire arsenal. Gennaro [3] has proposed a more qualitative approach to measuring pitcher similarity by using a hand-selected set of features and weightings. The features used by this method include a pitcher's two most-common pitch types and his most-common two-pitch sequence.

In this work, we develop a pitcher similarity measure that considers the speed and movement of every pitch. We note that other factors that are less indicative of a pitcher's raw stuff, such as pitch location [4], sequencing [5] and deception [14], also play a role in determining performance.

Given pitch speed and movement, we can plot a pitch as a point in a cube. Using data from Brooks Baseball, for example, we can plot a thousand Jon Lester pitches from 2016 with the speed (s) in miles per hour, along with the horizontal and vertical movement parameters (x, z) in inches, and where different colors represent different pitch types:

Jon Lester pitches in 2016


We also can do this for 1,000 Chris Sale pitches:

Chris Sale pitches in 2016

Lester and Sale clearly have different pitch distributions. But how different are they?

Here's a puzzle: Suppose that each of Lester's pitches in the plot is a ten-pound weight. Without worrying about pitch types, move each of Lester's thousand pitches so that, as a group, they end up at the same location as Sale's thousand pitches. To make this more interesting, find the way to move the pitches that requires the least work.

Too busy to solve the puzzle right now? That's OK. There's an algorithm called the Earth Mover's Distance, or EMD [16], which can figure out the easiest way to move the pitches and how much work is required. The idea is that the less work that's needed to rearrange Lester's pitches to match Sale's pitches, the more similar the two pitchers are to each other. Even better, the EMD algorithm is efficient and can normalize the distributions so that we don't need the same number of pitches in each plot.
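The puzzle can be made concrete with a toy calculation. The sketch below is not the EMD algorithm of [16] and not our implementation. It assumes two tiny sets with the same number of equally weighted pitches, in which case the least-work rearrangement is a one-to-one pairing that can be found by trying every pairing. The function name `min_average_work` and the sample pitches are hypothetical, and "work" is taken to be the straight-line distance in (s, x, z) with no rescaling of the axes.

```python
from itertools import permutations
from math import dist


def min_average_work(pitches_a, pitches_b):
    """Least average distance needed to move set A onto set B.

    Assumes both sets hold the same number of equally weighted pitches,
    so the cheapest rearrangement is a one-to-one pairing. Brute force
    over all pairings is only practical for a handful of pitches.
    """
    if not pitches_a or len(pitches_a) != len(pitches_b):
        raise ValueError("this toy version needs equal-sized, non-empty sets")
    best = min(
        sum(dist(a, b) for a, b in zip(pitches_a, order))
        for order in permutations(pitches_b)
    )
    return best / len(pitches_a)


# Hypothetical pitches as (speed in mph, horizontal in., vertical in.).
pitcher_a = [(92.0, -6.0, 9.0), (85.0, 2.0, 1.0), (78.0, 5.0, -6.0)]
# The same arsenal shifted by 3 mph and 4 inches of horizontal movement.
pitcher_b = [(s + 3.0, x + 4.0, z) for s, x, z in pitcher_a]

print(min_average_work(pitcher_a, pitcher_a))  # identical arsenals
print(min_average_work(pitcher_a, pitcher_b))  # shifted arsenal
```

Comparing a pitcher to himself requires no work at all, while the shifted arsenal costs the length of the shift for every pitch. With a thousand pitches per pitcher, or with unequal pitch counts, trying every pairing is hopeless, which is why a dedicated EMD solver is needed in practice.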

Things get more complicated because some paths are more difficult to traverse as we move pitches around in the cube. To be more specific, let's look at a plot of the speed and vertical movement (again, represented by s and z) for a large set of pitches from different pitchers in 2016.

Scatterplot of Speed and Vertical Movement

We see that s and z have a significant correlation, so that a pitch thrown with a higher speed will tend to have a higher vertical movement. This means that moving a pitch with the flow from the orange spot toward the red spot is easier than moving it against the flow toward the green spot. We can address this issue by combining a whitening transform [1] with the Earth Mover's Distance to account for both differences in the variances of the s, x and z variables and their correlation structure.
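A whitening transform can be sketched in a few lines. The version below is an assumption for illustration: it uses a Cholesky factor of the sample covariance matrix, which is one of several valid whitening choices, and the text above does not say which one we use. Different whitening matrices differ only by a rotation, so straight-line distances between whitened pitches should not depend on the choice. The function name `whiten` and the sample pitches are hypothetical.

```python
from math import sqrt


def whiten(points):
    """Return the points after a Cholesky whitening transform.

    The output has zero mean, unit variance on every axis and no
    correlation between axes (sample covariance equal to the identity).
    """
    n, dim = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(dim)]
    centered = [[p[k] - mean[k] for k in range(dim)] for p in points]

    # Sample covariance matrix of the centered data.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]

    # Cholesky factor: lower-triangular matrix low with low * low^T = cov.
    low = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(i + 1):
            acc = cov[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = sqrt(acc) if i == j else acc / low[j][j]

    # Solve low * y = c for each centered point by forward substitution.
    whitened = []
    for c in centered:
        y = []
        for i in range(dim):
            partial = sum(low[i][k] * y[k] for k in range(i))
            y.append((c[i] - partial) / low[i][i])
        whitened.append(y)
    return whitened


# Hypothetical pitches as (speed in mph, horizontal in., vertical in.).
pitches = [
    (94.0, -5.0, 10.0),
    (92.5, -7.0, 8.5),
    (86.0, 2.0, 3.0),
    (79.0, 6.0, -5.0),
    (84.5, -8.0, 4.0),
]
in_mph = whiten(pitches)
# The same pitches with speed converted to km/h.
in_kmh = whiten([(s * 1.609344, x, z) for s, x, z in pitches])
```

Up to rounding error, `in_mph` and `in_kmh` hold the same coordinates, so a distance computed after whitening does not depend on the units chosen for each axis.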

Since a pitcher's approach depends on batter handedness, we use the whitened EMD to compare pitchers separately based on their pitch distributions against right-handed and left-handed batters. The two values are then combined into a single measure of similarity. More details on how this all works are given in [6].

Data Analysis

We will demonstrate the similarity measure for several applications including the identification of similar and dissimilar pitchers, the identification of unique pitchers, the quantification of year-to-year pitcher stability, and the quantification of pitcher variation with batter handedness and the count. All analysis in this article uses the pitch data from Brooks Baseball and the associated pitch classifications from Pitch Info. Pitch speed will be given in miles per hour, and the x and z movement parameters [15] will be specified in inches.

Similar Pitchers

For the 2016 season, we consider the 196 right-handed pitchers and the 63 left-handed pitchers who threw at least 1,000 pitches during the regular season. For each of these pitchers, the most similar pitcher and the corresponding distance are listed in [7]. Smaller values of the distance correspond to more similar pitchers.

The most similar pair of right-handed pitchers in 2016 was Matt Harvey and Shelby Miller. Both threw four-seam fastballs with similar parameters (speed, horizontal movement and vertical movement) at similar frequencies. In particular, each pitcher threw 59-60 percent four-seamers to right-handed batters, and 56-57 percent four-seamers to left-handed batters, with Harvey averaging 95.39 mph and Miller averaging 94.15 mph on these pitches. We also note that Harvey's slider (89.51 mph, 0.90 inches of horizontal movement, 4.28 inches of vertical movement) resembled Miller's cutter (89.41, 1.17, 3.89), and each pitcher used this respective pitch 25-26 percent of the time against right-handed batters. Similarity metrics that do not compare pitches of different types would be unaware of the similarity of these pitches.

The most similar pair of left-handed pitchers in 2016 was Jon Niese and Chris Rusin. The most frequent pitches for each left-hander against right-handed batters were the sinker and cutter, which they threw at similar frequencies and with similar properties. For their sinkers against RHB, we have 89.52 mph, 9.63 inches of horizontal movement, 4.30 inches of vertical movement and 27.2 percent frequency for Niese, and 90.32, 9.74, 4.88 and 24.4 percent frequency for Rusin. For their cutters against RHB, we have 86.74 mph, -0.30 inches of horizontal movement, 3.86 inches of vertical movement and 27.2 percent frequency for Niese, and 87.49, 1.62, 3.78 and 29.9 percent frequency for Rusin. Each pitcher's most frequent pitch to left-handed batters was the sinker, which Niese threw 40.7 percent of the time and Rusin threw 38.8 percent.

Dissimilar Pitchers

The most dissimilar pair of right-handed pitchers in 2016 was Brad Ziegler and Marco Estrada, with a distance of 5.688. The difference was largely due to an extreme discrepancy in the vertical movement on their pitches. Ziegler threw 57.7 percent sinkers with an average vertical movement of -6.72 inches, while Estrada threw 50.1 percent four-seam fastballs with an average vertical movement of 13.01 inches. Averaged over all of his pitches, Ziegler had the smallest vertical movement, -5.33 inches, and Estrada had the highest, 9.64 inches.

The most dissimilar pair of left-handed pitchers was Zach Britton and Tommy Milone, with a distance of 4.238. Britton threw more than 90 percent sinkers, averaging at least 97 mph and with 3.70 inches of vertical movement. Milone averaged only 88.19 mph on his hardest and most frequent pitch, a four-seam fastball, which he threw 45.5 percent of the time with an average vertical movement of 11.45 inches.

Unique Pitchers

The similarity measure can also be used to find the most unique major-league pitchers. The right-handed pitchers with the greatest distance to their most similar match in 2016 are:

| Unique RHP | Distance to nearest RHP |
| --- | --- |
| Brad Ziegler | 2.8651 |
| Jered Weaver | 1.7653 |
| Chris Young | 1.4429 |
| Steve Cishek | 1.3934 |
| Marco Estrada | 1.3896 |
| Lance McCullers | 1.3648 |
| Fernando Rodney | 1.2610 |
| Tyler Clippard | 1.2232 |
| Aaron Nola | 1.1660 |
| Bryan Shaw | 1.1258 |

Left-handed pitchers with the greatest distance to their most similar match in 2016:

| Unique LHP | Distance to nearest LHP |
| --- | --- |
| Zach Britton | 1.7251 |
| Rich Hill | 1.4946 |
| Clayton Kershaw | 1.4912 |
| Zach Duke | 1.4223 |
| Andrew Miller | 1.3264 |
| Drew Pomeranz | 1.2464 |
| Tommy Milone | 1.1309 |
| Clayton Richard | 1.0658 |
| Julio Urias | 0.9960 |
| John Lamb | 0.9782 |

Hard-throwing Aroldis Chapman fell short of the 1,000-pitch threshold, but would rank as the second-most unique left-hander behind Britton, with a distance of 1.5495 to the nearest left-hander, Tony Cingrani.
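Once all pairwise distances are available, a uniqueness ranking of this kind takes little code. The sketch below assumes the distances are already stored in a symmetric matrix. The function name `rank_by_uniqueness`, the pitcher labels and the distance values are hypothetical and are not taken from our data.

```python
def rank_by_uniqueness(names, distance):
    """Sort pitchers by the distance to their nearest neighbor, largest first.

    `distance` is a symmetric matrix (list of lists) of pairwise
    distances with zeros on the diagonal.
    """
    rows = []
    for i, name in enumerate(names):
        # Nearest other pitcher: skip the zero self-distance at k == i.
        j = min(
            (k for k in range(len(names)) if k != i),
            key=lambda k: distance[i][k],
        )
        rows.append((name, names[j], distance[i][j]))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical pitchers and distances, for illustration only.
names = ["A", "B", "C", "D"]
distance = [
    [0.0, 0.4, 1.9, 0.7],
    [0.4, 0.0, 2.1, 0.6],
    [1.9, 2.1, 0.0, 1.8],
    [0.7, 0.6, 1.8, 0.0],
]
ranking = rank_by_uniqueness(names, distance)
for name, nearest, d in ranking:
    print(f"{name}: nearest is {nearest} at distance {d}")
```

In this toy example, pitcher C is the most unique because even his closest match, D, is far away.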

Visualizing Similarity

The similarity structure for a group of pitchers can be visualized using non-metric multidimensional scaling (NMDS) [10]. We use NMDS to visualize properties of the similarity measure for unique right-handed and left-handed pitchers. The NMDS result for the ten most unique right-handed pitchers, plus the two most prominent knuckleballers, R.A. Dickey and Steven Wright, is:

NMDS Result for Unique Right-handed Pitchers in 2016

The most unique right-hander, Brad Ziegler, is in the far upper right in the figure. Ziegler's uniqueness is largely due to throwing a large share (57.7 percent) of sinkers with a low average velocity (84.74 mph) and heavy sink (-7.28 inches of vertical movement). The closest pitchers to Ziegler in the plot are Steve Cishek and Aaron Nola, who each threw 40-44 percent sinkers but at a higher velocity than Ziegler. The pitchers in the plot with the highest average velocity over their pitches (Rodney, McCullers, Shaw) are in the lower-right quadrant. In this group, Rodney appears closest to Cishek and Nola because he also threw a high percentage of sinkers (39.1 percent), but the high vertical movement on his pitches, particularly his four-seam fastball, pulls him to the left of these two. Bryan Shaw has the highest average velocity among pitchers in the figure and appears at the lowest point in the plot.

To the left of Rodney is a group of three pitchers (Estrada, Young, Clippard) who displayed the highest average vertical movement on their pitches among the pitchers in the figure. This high vertical movement was largely achieved by throwing 45-51 percent four-seam fastballs. Above this group is Jered Weaver, who threw pitches with a high average vertical movement, but also had the lowest average pitch velocity in the plot among the non-knuckleballers. Dickey and Wright appear together above Weaver and, as shown in [7], the two knuckleballers are the best match for each other over the 196 right-handed pitchers in the data set. We see that the most dissimilar right-handed pitchers in the entire data set, Ziegler and Estrada, are also the most separated in the plot.

The NMDS result for the ten most unique left-handed pitchers, plus Aroldis Chapman, is:

NMDS Result for Unique Left-handed Pitchers

The most unique left-hander, Zach Britton, is on the far-right edge of the plot. Britton achieved his uniqueness by throwing a high volume (92.0 percent) of very hard (97.44 mph) sinkers. The closest left-hander to Britton in the figure is Clayton Richard, who also threw a high volume (65.0 percent) of sinkers but at a lower velocity (91.59 mph). To the left of Richard and farther removed from Britton is Zach Duke, who also threw a large number of sinkers but at an even lower frequency (50.4 percent) and velocity (90.13 mph). The second-most unique left-hander in the group, Aroldis Chapman, who threw a lot (81.1 percent) of very hard (101.32 mph) four-seam fastballs, appears at the lowest point on the plot.

On the left side of the figure are four left-handers (Milone, Lamb, Urias, Kershaw) who all favored the four-seam fastball with frequencies varying between 45.5 percent for Milone and 55.3 percent for Urias. The average four-seam velocity for the pitchers increases from top to bottom with mph values of 88.19 (Milone), 90.49 (Lamb), 93.32 (Urias) and 93.74 (Kershaw). To the right of these four pitchers are Drew Pomeranz and Rich Hill, who both complemented their four-seam fastball with a large percentage of curves with sharp downward movement. Hill is the closest pitcher to Andrew Miller in the plot. Since Miller's four-seam fastball is harder than Hill's, and Miller's most frequent off-speed pitch is a slider that is thrown substantially harder than Hill's curve, Miller appears lower than Hill. We see that the most dissimilar left-handed pitchers in the full data set, Britton and Milone, are also the most separated in the plot.

Pitchers with Small Year-to-Year Variation

We can use the similarity measure to compare pitchers to themselves over time. For this purpose, we computed the similarity measure between 2015 and 2016 for each pitcher who threw at least 1,000 pitches in each regular season.

Right-handed pitchers who changed the least between 2015 and 2016 (with their age as of June 30, 2016):

| RHP | Distance | Age |
| --- | --- | --- |
| R.A. Dickey | 0.1280 | 41 |
| Fernando Salas | 0.2584 | 31 |
| Steven Wright | 0.2654 | 31 |
| Bartolo Colon | 0.2801 | 43 |
| Arquimedes Caminero | 0.2881 | 29 |
| Corey Kluber | 0.2995 | 30 |
| Adam Warren | 0.3040 | 28 |
| Jered Weaver | 0.3062 | 33 |
| Max Scherzer | 0.3107 | 31 |
| Scott Feldman | 0.3215 | 33 |

Left-handers:

| LHP | Distance | Age |
| --- | --- | --- |
| Jon Lester | 0.2581 | 32 |
| Carlos Rodon | 0.3056 | 23 |
| Jorge De La Rosa | 0.3357 | 35 |
| Francisco Liriano | 0.3572 | 32 |
| Drew Smyly | 0.3922 | 27 |
| Adam Conley | 0.3963 | 26 |
| Patrick Corbin | 0.4007 | 26 |
| Tony Watson | 0.4147 | 31 |
| Gio Gonzalez | 0.4150 | 30 |
| Chris Rusin | 0.4169 | 29 |

Many of the smallest changers are veterans, with 13 of the 20 pitchers in the tables being at least 30 years old at midseason 2016, and with all pitchers (except Carlos Rodon) being at least 26. Two of the smallest changers are the knuckleballers R.A. Dickey and Steven Wright. Unsurprisingly, Bartolo Colon is also one of the least-changing right-handers.

Pitchers with Large Year-to-Year Variation

Right-handed pitchers who changed the most between 2015 and 2016:

| RHP | Distance | Age | 2015 ERA | 2016 ERA |
| --- | --- | --- | --- | --- |
| David Phelps | 1.1081 | 29 | 4.50 | 2.28 |
| Trevor Bauer | 0.9869 | 25 | 4.55 | 4.26 |
| Kelvin Herrera | 0.9639 | 26 | 2.71 | 2.75 |
| Jesse Chavez | 0.9227 | 32 | 4.18 | 4.43 |
| Matt Shoemaker | 0.9156 | 29 | 4.46 | 3.88 |
| Joe Blanton | 0.9063 | 35 | 2.84 | 2.48 |
| Will Harris | 0.8785 | 31 | 1.90 | 2.25 |
| Lance McCullers | 0.8329 | 22 | 3.22 | 3.22 |
| Noah Syndergaard | 0.8240 | 23 | 3.24 | 2.60 |
| Aaron Nola | 0.8150 | 23 | 3.59 | 4.78 |

Left-handers:

| LHP | Distance | Age | 2015 ERA | 2016 ERA |
| --- | --- | --- | --- | --- |
| James Paxton | 1.4217 | 27 | 3.90 | 3.79 |
| Mike Montgomery | 1.0952 | 26 | 4.60 | 2.52 |
| Brad Hand | 1.0056 | 26 | 5.30 | 2.92 |
| Matt Boyd | 0.9570 | 25 | 7.53 | 4.53 |
| Adam Morgan | 0.9151 | 26 | 4.48 | 6.04 |
| Daniel Norris | 0.8312 | 23 | 3.75 | 3.38 |
| Drew Pomeranz | 0.8008 | 27 | 3.66 | 3.32 |
| Danny Duffy | 0.7765 | 27 | 4.08 | 3.51 |
| Jeff Locke | 0.7258 | 28 | 4.49 | 5.44 |
| Chris Sale | 0.6737 | 27 | 3.41 | 3.34 |

We see that these pitchers are younger than their more stable counterparts, with only three of the 20 pitchers being at least 30 years old at midseason 2016. Six of the 10 right-handers and eight of the 10 left-handers improved their ERA from 2015 to 2016. Several of these pitchers (Phelps, Chavez, Montgomery, Hand, Pomeranz) went from starting in 2015 to relieving in 2016. Others near the top of the lists include Trevor Bauer and Kelvin Herrera, who made significant changes to their pitch mix [2], [19], along with James Paxton, who made a significant change to his pitching mechanics [17].

Pitchers with Small Platoon Distances

We can use our similarity measure to compute the difference between a pitcher's distribution of pitches against right-handed and left-handed batters. We considered all pitchers who threw at least 1,000 pitches during the 2016 regular season.

Right-handed pitchers who changed the least with batter handedness:

| RHP | Distance | wOBA vs. R | wOBA vs. L |
| --- | --- | --- | --- |
| Addison Reed | 0.0781 | .229 | .228 |
| Cody Allen | 0.0970 | .222 | .292 |
| Will Harris | 0.1592 | .263 | .229 |
| Lance McCullers | 0.1780 | .324 | .327 |
| Chris Young | 0.2242 | .320 | .476 |
| Adam Warren | 0.2338 | .343 | .258 |
| Vance Worley | 0.2352 | .318 | .333 |
| R.A. Dickey | 0.2400 | .337 | .339 |
| Anthony DeSclafani | 0.2486 | .260 | .353 |
| Steven Wright | 0.2517 | .303 | .271 |

Left-handers:

| LHP | Distance | wOBA vs. R | wOBA vs. L |
| --- | --- | --- | --- |
| Adam Conley | 0.2157 | .316 | .334 |
| Dan Jennings | 0.2538 | .310 | .290 |
| Pat Dean | 0.2632 | .395 | .356 |
| J.A. Happ | 0.2781 | .292 | .287 |
| Madison Bumgarner | 0.2919 | .279 | .223 |
| Drew Smyly | 0.3004 | .328 | .305 |
| Steven Matz | 0.3076 | .296 | .307 |
| Tyler Anderson | 0.3166 | .333 | .270 |
| Zach Britton | 0.3253 | .180 | .226 |
| Andrew Miller | 0.3339 | .207 | .220 |

Several of these pitchers relied heavily on a single pitch type. Reed (72.2 percent), Allen (63.3 percent) and Conley (65.5 percent) threw a large fraction of four-seam fastballs. Dickey (87.6 percent) and Wright (83.1 percent) threw a large fraction of knuckleballs, while Harris (66.4 percent cutter), Britton (92.0 percent sinker) and Miller (60.7 percent slider) also threw a large fraction of a single pitch type. Throwing a similar distribution of pitches to right-handed and left-handed batters is a characteristic of a pitcher's approach, but is not necessarily indicative of his platoon results. While several of the pitchers (Reed, McCullers, Dickey, Happ) who had a similar approach against right-handed and left-handed batters exhibited a very small wOBA platoon split, others (Young, DeSclafani) had large wOBA platoon splits.

Pitchers with Large Platoon Distances

Right-handed pitchers who changed the most with batter handedness:

| RHP | Distance | wOBA vs. R | wOBA vs. L |
| --- | --- | --- | --- |
| Brad Ziegler | 1.8874 | .278 | .306 |
| Jered Weaver | 1.1993 | .365 | .365 |
| Raisel Iglesias | 1.0970 | .224 | .332 |
| Dustin McGowan | 1.0896 | .212 | .375 |
| Kelvin Herrera | 1.0802 | .268 | .246 |
| Kyle Hendricks | 0.9924 | .243 | .269 |
| Matt Wisler | 0.9723 | .313 | .334 |
| A.J. Ramos | 0.9458 | .287 | .262 |
| Jhoulys Chacin | 0.9152 | .317 | .327 |
| Alfredo Simon | 0.8719 | .412 | .454 |

Left-handers:

| LHP | Distance | wOBA vs. R | wOBA vs. L |
| --- | --- | --- | --- |
| Brad Hand | 1.1295 | .297 | .194 |
| Felipe Rivero | 1.0366 | .272 | .343 |
| Tony Watson | 0.9595 | .302 | .253 |
| Tommy Milone | 0.9032 | .362 | .357 |
| Sean Manaea | 0.8817 | .322 | .231 |
| Danny Duffy | 0.8480 | .325 | .201 |
| Kevin Siegrist | 0.8313 | .269 | .302 |
| Rich Hill | 0.8279 | .244 | .232 |
| Patrick Corbin | 0.7693 | .363 | .324 |
| Drew Pomeranz | 0.7610 | .287 | .284 |

We see that by using very different distributions of pitches to right-handed and left-handed batters, several of these pitchers (Weaver, Milone, Pomeranz) had very small wOBA platoon splits while others (Iglesias, McGowan, Duffy) had large wOBA platoon splits.

None of the right-handers and only two of the left-handers (Rivero and Siegrist) who changed the most in response to batter handedness threw a single pitch type at least 60 percent of the time. Seven of the right-handers (Ziegler, Weaver, Iglesias, McGowan, Herrera, Ramos, Chacin) contributed to their platoon variation by throwing a significantly higher fraction of sliders to right-handed batters and a significantly higher fraction of changeups to left-handed batters. For the purposes of this analysis, “significantly” refers to a fraction that is higher by at least 10 percent. Similarly, four of the left-handers (Rivero, Watson, Manaea, Corbin) threw a significantly higher fraction of sliders to left-handed batters and a significantly higher fraction of changeups to right-handed batters.

Another popular strategy used by six of the pitchers who changed the most (Weaver, McGowan, Hand, Duffy, Siegrist, Corbin) was to throw a significantly higher fraction of four-seam fastballs to same-side batters, and a significantly higher fraction of sinkers to opposite-side batters. Right-hander Kyle Hendricks employed the opposite approach by throwing a significantly higher fraction of sinkers to right-handed batters, and a significantly higher fraction of four-seam fastballs to left-handed batters. Left-handers Milone and Hill enhanced their platoon variation by throwing a significantly higher fraction of curveballs to left-handed batters.

Pitchers with Small Changes after Two Strikes

We can use the similarity measure to compute how much a pitcher changes his distribution of pitches as the count changes. For each pitcher who threw at least 1,000 pitches in 2016, we computed the similarity measure between the pitcher's distribution of pitches thrown before two strikes and his distribution of pitches thrown after two strikes.

Right-handers:

| RHP | Distance |
| --- | --- |
| Jason Grilli | 0.2231 |
| Addison Reed | 0.2744 |
| Chris Young | 0.2877 |
| Jered Weaver | 0.2944 |
| Fernando Salas | 0.3022 |
| Alex Wilson | 0.3100 |
| Steven Wright | 0.3101 |
| Kevin Quackenbush | 0.3113 |
| Seung-hwan Oh | 0.3137 |
| Jesse Chavez | 0.3192 |

Left-handers:

| LHP | Distance |
| --- | --- |
| Zach Britton | 0.2108 |
| Ryan Buchter | 0.2379 |
| Brett Oberholtzer | 0.3354 |
| Tony Cingrani | 0.3404 |
| Chris Rusin | 0.3549 |
| Tyler Anderson | 0.3730 |
| Jeff Locke | 0.3734 |
| Eduardo Rodriguez | 0.3756 |
| Antonio Bastardo | 0.3857 |
| Steven Matz | 0.4018 |

The two right-handers who changed the least (Grilli 62.4 percent four-seamer, Reed 72.2 percent four-seamer) and the two left-handers who changed the least (Britton 92 percent sinker, Buchter 84.7 percent four-seamer) each threw a large fraction of a single pitch type in 2016. In addition, several of the other pitchers in the two tables (Wright 83.1 percent knuckler, Quackenbush 63.2 percent four-seamer, Oh 60.6 percent four-seamer, Cingrani 87.4 percent four-seamer, Bastardo 65.5 percent four-seamer) each threw over 60 percent of a single pitch type in 2016.

Pitchers with Large Changes after Two Strikes

The right-handed and left-handed pitchers who changed the most after reaching two strikes in 2016 are listed below. Each of these pitchers threw a significantly higher fraction of a particular breaking ball with two strikes. The pitch with the largest increase in frequency after two strikes over all batters faced is referred to as the Delta Pitch in the lists. The Δf column indicates how much more frequently a pitcher threw the Delta Pitch after two strikes as compared to before two strikes. Brad Ziegler, for example, threw his slider 10.16 percent of the time before two strikes and 40.45 percent of the time after two strikes for a Δf of 30.29 percent.

Right-handers:

| RHP | Distance | Delta Pitch | Δf |
| --- | --- | --- | --- |
| Brad Ziegler | 2.4306 | slider | 30.29% |
| Dellin Betances | 1.4501 | curve | 20.86% |
| Paul Clemens | 1.3814 | curve | 26.17% |
| Carlos Martinez | 1.2009 | slider | 25.65% |
| Jerad Eickhoff | 1.1797 | curve | 29.98% |
| Mike Fiers | 1.0923 | curve | 18.28% |
| Lance McCullers | 1.0913 | curve | 31.83% |
| Raisel Iglesias | 1.0753 | slider | 26.01% |
| Yu Darvish | 1.0514 | slider | 12.19% |
| Aaron Nola | 1.0365 | curve | 20.94% |

Left-handers:

| LHP | Distance | Delta Pitch | Δf |
| --- | --- | --- | --- |
| Zach Duke | 1.3792 | curve | 29.97% |
| Clayton Kershaw | 1.3174 | curve | 17.31% |
| Jaime Garcia | 1.1124 | slider | 34.10% |
| Brad Hand | 1.0659 | slider | 25.61% |
| Carlos Rodon | 1.0431 | slider | 24.30% |
| Chris Sale | 1.0141 | slider | 22.79% |
| Patrick Corbin | 0.9607 | slider | 33.72% |
| Gio Gonzalez | 0.9532 | curve | 19.23% |
| Francisco Liriano | 0.9263 | slider | 31.01% |
| Blake Snell | 0.8758 | curve | 14.26% |

Among the pitchers in the lists with smaller values of Δf for their Delta Pitch, Fiers (six pitch types) and Darvish (seven pitch types) had a large set of possible pitch types with which to adjust frequencies. Left-handers Kershaw and Snell used a higher fraction of sliders with two strikes in addition to a higher fraction of their Delta Pitch curveballs.
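The Delta Pitch and Δf columns follow directly from pitch-type counts. The sketch below assumes each pitch has already been labeled with a pitch type and split by whether the count had reached two strikes. The function name `delta_pitch` and the pitch logs are hypothetical and are not drawn from the Pitch Info data.

```python
from collections import Counter


def delta_pitch(before, after):
    """Return (pitch type, Δf) for the pitch whose usage rises the most.

    `before` and `after` are lists of pitch-type labels thrown before
    and after the count reached two strikes. Δf is the difference in
    usage, expressed in percentage points.
    """
    freq_before = Counter(before)
    freq_after = Counter(after)
    deltas = {
        pitch: 100.0 * (freq_after[pitch] / len(after)
                        - freq_before[pitch] / len(before))
        for pitch in set(before) | set(after)
    }
    best = max(deltas, key=deltas.get)
    return best, deltas[best]


# Hypothetical pitch logs, for illustration only.
before = ["sinker"] * 70 + ["slider"] * 10 + ["changeup"] * 20
after = ["sinker"] * 30 + ["slider"] * 20

pitch, delta_f = delta_pitch(before, after)
print(pitch, round(delta_f, 2))
```

Here the slider goes from 10 percent of pitches before two strikes to 40 percent after, so it is the Delta Pitch with a Δf of about 30, computed the same way as the Ziegler example above.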

Conclusion

We have developed a new tool that analysts can exploit to study a range of application areas. The similarity measure allows the direct comparison of pitchers across various contexts, including MLB, MiLB, amateur and foreign leagues, which can improve predictions for how a pitcher will perform in a new environment. The identification of similar pitchers increases the sample sizes that can be used to forecast the outcome of batter/pitcher matchups and supports regression to more appropriate population means by projection models. The measure also can be used to monitor pitchers over time, and to develop improved models for the health risk and aging characteristics associated with different pitcher classes.

For fans, the new tool reveals similarities that we didn't know existed and shows us, once again, that there's more than one way to find success as a major-league pitcher.

* * *

Acknowledgment

The authors thank Tom Tango and Mitchel Lichtman for helpful comments on a previous draft of this article. All pitch data used in this study was obtained from Brooks Baseball.

References

[1] R. Duda, P. Hart and D. Stork. Pattern Classification. Wiley-Interscience, New York, 2001.

[2] A. Fagerstrom. (June 24, 2016). FanGraphs: Trevor Bauer looks like a completely different pitcher.

[3] V. Gennaro. The Big Data approach to baseball analytics. In SABR Analytics Conference, Phoenix, AZ, March 2013.

[4] G. Healey. The intrinsic value of a pitch. In SABR Analytics Conference, Phoenix, AZ, March 2017.

[5] G. Healey and S. Zhao. Using PITCHf/x to model the dependence of strikeout rate on the predictability of pitch sequences. Journal of Sports Analytics, 2017.

[6] G. Healey, S. Zhao and D. Brooks. Measuring pitcher similarity: Technical details.

[7] G. Healey, S. Zhao and D. Brooks. Most similar match tables, 2016.

[8] J. Kalk. (Feb. 12, 2008). Hardball Times: Pitcher similarity scores.

[9] J. Kalk. (Feb. 19, 2008). Hardball Times: Pitcher similarity scores (part 2).

[10] J. Kruskal. Multidimensional scaling by optimizing goodness of fit to a non-metric hypothesis. Psychometrika, 29:1-27, 1964.

[11] S. Loftus. (Apr. 15, 2013). Beyond the Box Score: Pitcher similarity scores.

[12] S. Loftus. (Apr. 25, 2013). Beyond the Box Score: Testing and visualizing similarity scores.

[13] S. Loftus. (Nov. 25, 2013). Beyond the Box Score: Pitcher similarity scores 2.0.

[14] J. Long, J. Judge and H. Pavlidis. (Jan. 24, 2017). Baseball Prospectus: Introducing pitch tunnels.

[15] A. Nathan. (Oct. 21, 2012). Determining pitch movement from PITCHf/x data.

[16] Y. Rubner, C. Tomasi and L. Guibas. The Earth Mover's Distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99-121, 2000.

[17] E. Sarris. (June 9, 2016). FanGraphs: James Paxton's new angle on life.

[18] N. Silver. Why was Kevin Maas a bust? In J. Keri, editor, Baseball between the numbers, pages 253-271. Basic Books, New York, 2006.

[19] J. Sullivan. (April 13, 2016). FanGraphs: Now Kelvin Herrera is almost impossible.

[20] T. Tango, M. Lichtman and A. Dolphin. The Book: Playing the Percentages in Baseball. Potomac Books, Dulles, Virginia, 2007.

johnwood427
7/10
Wow.
lichtman
7/11
Glenn, have you or will you be releasing some kind of spreadsheet of pitchers and their similarity scores, e.g. groups of most similar and dissimilar pitchers, other than the ones included in this article?

Am I correct in understanding that a score can only be computed/expressed vis a vis one pair of pitchers?

Can one impute the score between, for example, pitcher A and B if we know the score between A and C and then again, B and C?

How can one produce a list of pitchers who are all similar to one another? Is it some kind of "least squares" method whereby you look at all possible combinations of scores?
GHealey
7/11
Mitchel, Yes. We have all the distances so we can make them available. I believe that Dan is also working to make the output of the model available on BrooksBaseball.

The similarity measure D(A,B) is computed for a pair of pitchers A and B. If we know D(A,C) and D(B,C) we cannot determine D(A,B). We can, however, compute D(A,B) directly. The measure does satisfy the triangle inequality so that D(A,B) <= D(A,C) + D(B,C) or we could say in English that if A and B are both close to C then A can't be too far from B.

One way to define a "similarity group" is as a set of pitchers where the maximum distance between any pair of pitchers in the group is less than some value. We can use clustering techniques to try to identify appropriate groups.
harrypav
7/11
our plan is to make them available here as well (if Glenn is ok with that!)
GHealey
7/11
that's fine with me
lichtman
7/11
One of the first things I would want to do is to look at batter performance versus a large group of similar pitchers in order to test the "batter/pitcher matchup" hypothesis using larger samples than we usually work with.

Similar to what we did in The Book (looking at "families" of pitchers).

Have you done this at all?
GHealey
7/11
This is one of several applications we're considering. The idea is similar to what's described in The Book except that we have a new way to generate the "families". We don't have results for this yet.
lichtman
7/11
One other technical question: When using EMD to come up with similarity scores, doesn't that EMD depend on the scale of the 3 axes you are using and isn't that arbitrary?

What I mean is this: Your 3 axes are speed and x and z movement, right? Speed is scaled in mph and movement in inches. If you changed the speed axis to km/hour or feet per second, or anything else, wouldn't that change the EMD?

Same with the movement axes. What if it were cm and not inches? How does the EMD know how much to "weight" movement in one direction compared to another? I mean if we change the speed axis to miles per second, it wouldn't take much to move from Jered Weaver's FB to Aroldis Chapman's. Isn't EMD dependent on what arbitrary units/scale you choose for each parameter? And shouldn't you have some idea as to how you want to weight each parameter? What if, for example, you thought that in terms of similarity, speed differences mean almost nothing - that movement is the most important thing? How would you tweak the EMD algorithm?

But mostly I am confused and concerned with how the EMD algorithm "knows" how to weight each of the parameters? Is 1 inch of movement a lot? 6 inches? How does the EMD algorithm know what is a lot of movement and what is a little? Since it is relative to other pitchers, it doesn't matter, but that's only if there were only one parameter. But since there are 3, YOU have to tell it how to weight each movement, no?
GHealey
7/11
The whitening step transforms the original variables to a new set of variables which are uncorrelated and have unit variance. By applying the whitening transform before computing the EMD, the similarity measure is invariant to any scaling of the axes.
lichtman
7/11
Thanks for all the answers!