November 6, 2009
Checking the Numbers: Detecting Discipline
Ever since Billy Beane wrote Moneyball (right, Mr. Morgan?) in order to prove that the true path to success involved only seeking the services of high-OBP employees, analysts of several varieties have worked diligently to discover market inefficiencies worth exploiting. One of the areas that has risen to prominence recently, likely due to the increased availability of the data, focuses on plate discipline on both sides of the spectrum: shown by hitters, or induced by pitchers. Data providers such as Baseball Info Solutions record information based on the strike zone in a plate appearance, determining the percentages of swings and contact on balls both in and out of the zone, as well as the rate of pitches thrown or observed that fell in the zone itself. This type of granular information affords analysts the opportunity to track tendencies such as which hitters chase more pitches out of the zone or which pitchers induce these chases more often.

However, the numbers remain a tad ambiguous given that their application is largely contingent upon conventional wisdom; higher rates of out-of-zone swings are bad, mmmkay? This isn’t always the case, though, and the rarely discussed inverse, taking too many pitches inside the zone, could also be considered poor process. Luckily, with the ever-growing PITCHf/x dataset, we can apply a method known as signal detection theory to gauge discipline at the plate.

You might remember signal detection theory from such articles as "Is Walk the Opposite of Strikeout?" or "The Return of the Fisheye". The technique is commonly used in epidemiological studies, as well as in cognitive psychology and engineering. It hinges on the idea of a perfect test, one that codes all positive results as true positives and all negative results as true negatives. Unfortunately, such tests do not exist, and two kinds of error surface: false negatives (being told you are healthy when you really aren’t) and false positives (hearing some bad news in error).
The first linked article above, written by Russell Carleton, applied this technique to Retrosheet data in order to measure plate discipline in a results-based fashion. Since Retrosheet lacks data for pitch location, the study was restricted to the actual results: swings and misses, balls put in play, and called pitches. Cue the wonderful dataset that is PITCHf/x. Essentially, the goal here is to apply signal detection theory to PITCHf/x by coding the processes in and out of the strike zone, as opposed to just the end results.

In that regard, a pitch thrown in the strike zone at which the batter swung becomes a true positive. A pitch in the zone that is taken is a Type II error, or a false negative. Moving outside of the zone, swings are Type I errors, or false positives; taken pitches become true negatives. With the pitches classified in this fashion, we basically treat every major league hitter as if he is his own epidemiological study.

Then, a series of calculations (to be discussed in further detail in the coming paragraphs) will explain which hitters are more prone to mistakes, as well as whether they are biased more towards freely swinging or towards taking pitches. The former statistic is known as sensitivity, while the latter is called the response bias. Ideally, sensitivity will be high, as higher numbers correspond to fewer mistakes. The goal for response bias is to get as close to 1.0 as possible, since that mark indicates balance. A mark below 1.0 means the hitter’s success in being disciplined is biased towards keeping the bat on his shoulder, with the opposite true for numbers above that threshold.

As an example, over 2008-09, Luis Castillo posted a sensitivity rating slightly above the major league average, but with a very low .252 response bias that suggests his ability to make fewer mistakes in the box relied heavily upon a seeming refusal to swing.
Because he rarely swung, he received some extra ball calls, but they came at the expense of many more called strikes. Hunter Pence had a sensitivity rating almost identical to Castillo’s, but with a response bias of .961, extremely close to 1.0, which indicates that Pence has been more balanced in making errors and perhaps is not as easily exploitable as Castillo. In fact, Pence will actually make fewer mistakes than Castillo, because he is truly optimizing his balance. He is not costing himself anything extra in either direction.
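
For readers who want to try the pitch coding described above, here is a minimal Python sketch. It assumes each pitch has already been reduced to a hypothetical `(in_zone, swung)` pair; the function names and the sample counts are made up for illustration and are not PITCHf/x data. The `d_prime` function uses the textbook sensitivity index from signal detection theory (the z-score of the in-zone swing rate minus the z-score of the out-of-zone swing rate), which is an assumption and may differ from the calculation used in this article.

```python
from collections import Counter
from statistics import NormalDist


def classify(in_zone: bool, swung: bool) -> str:
    """Code one pitch using the article's four signal-detection categories."""
    if in_zone:
        # Swinging at a strike is correct; taking it is a Type II error.
        return "true_positive" if swung else "false_negative"
    # Chasing a ball is a Type I error; taking it is correct.
    return "false_positive" if swung else "true_negative"


def tally(pitches):
    """Count each category over an iterable of (in_zone, swung) pairs."""
    return Counter(classify(in_zone, swung) for in_zone, swung in pitches)


def swing_rates(counts):
    """Return (hit rate, false-alarm rate): swing rate in and out of the zone."""
    in_zone = counts["true_positive"] + counts["false_negative"]
    out_zone = counts["false_positive"] + counts["true_negative"]
    return counts["true_positive"] / in_zone, counts["false_positive"] / out_zone


def d_prime(hit_rate, false_alarm_rate):
    """Textbook sensitivity index; an assumption, not the article's formula."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Made-up sample: 7 swings and 3 takes in the zone, 2 swings and 8 takes outside it.
sample = ([(True, True)] * 7 + [(True, False)] * 3
          + [(False, True)] * 2 + [(False, False)] * 8)
counts = tally(sample)
hit_rate, false_alarm_rate = swing_rates(counts)
sensitivity = d_prime(hit_rate, false_alarm_rate)
print(counts, hit_rate, false_alarm_rate, round(sensitivity, 2))
```

Two caveats on the sketch: a swing rate of exactly 0 or 1 makes `inv_cdf` raise an error, and a hitter with no pitches in or out of the zone causes a division by zero, so real use would need guards for both. Response bias is left out on purpose, since its formula is not given in the text above.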
Seems like a good method to check whether players have or haven't made adjustments or improvements over time, and where those improvements came from (with your future expansion into pitch type). I think that if you translate this into an "aging" curve, you could identify why certain prospects blossom into stars and why some fail. I'm sure there are other reasons besides the ones you are examining here, but I would expect at least some of them would.
Gordon, exactly! Right now we just have 2008 and 2009, but something in my queue is a comparison. Did Luis Castillo have a response bias of 0.5 in 2008 and 0.1 in 2009, etc.? Did Player X see the drastic improvement in results that his improved signal-detection process indicated?