Industry friend and former colleague Patrick DiCaprio (@pdicaprioFP911) posted something on Facebook recently that spurred an interesting discussion I wanted to share with you today:

As you can tell, Patrick falls into the “strict Moneyball” camp, in which scouting, in any form, is worthless. I’ll admit that I once fell into this camp years ago, when I first began writing and following baseball seriously, although now I couldn’t be more opposed to it. Ideally, scouting is an integral part of player analysis both for MLB clubs and for fantasy players, a view I’ve expressed in the past.
Derek Carty: Do you really believe that scouting is all bad, Pat?
Patrick DiCaprio: No, I believe it is unscientific and completely subjective. Horse players used to do the exact same thing in handicapping until they found out what they didn’t even know they didn’t know. Anything that is so bad at predicting how players will do is, at best, marginally useful. The value is in knowing who will not make it, but that is about as far as it has objective predictive value.
Derek Carty: Interesting. I didn't realize that was your stance. I disagree completely. Scouting isn't "bad at predicting how players will do" at all. Of course it's limited if it's all you're using, but it can be very valuable when utilized properly. Just because something is subjective and unquantified doesn't make it useless.
One example: bat speed. Do you believe that a player with faster bat speed will be a better hitter than one with a slower bat, ceteris paribus? While we don't have this data quantified (at least publicly), it's something scouts take note of. Maybe it's not 100 percent perfect and accurate down to six decimals, but it still has value, no?
Patrick DiCaprio: Bat speed is a good example of what I mean. Scouting bat speed makes no sense. Measuring bat speed makes sense. That is the difference.
Derek Carty: How does it make no sense? Surely measuring it is preferable, but absent the tools to do so, having a bit rougher gauge of bat speed is still going to have value. Yes, there's an element of subjectivity to it, but are you saying people are incapable of differentiating between a player who swings the bat fast versus one who swings the bat slow?
Patrick DiCaprio: Come on Derek, you know what we are talking about, and it is not "fast or slow."
Derek Carty: Honestly, Pat, I don't know. Just because we can't measure it precisely doesn't mean we can't measure it accurately, to one extent or another.
Patrick DiCaprio: We can bat this around ad infinitum, but I see no difference between horseplayers that scouted horses based on form and class and the scouting of players. Is there some success? Yes. Is there cause and effect? No. Could it pass any reasonable standard of scientific evidence? No. Should you rely on subjective opinion versus hard evidence? No. Has there been any scientific proof of validity of individual scouting of a player as having future predictive value as compared to random chance? No. Does it fall prey to psychological bias such as the Texas Sharpshooter Fallacy or Tacit Communication? Yes. Can scouting tell us whether a player is unlikely to succeed? Yes. Can individual player tweaks help a player? Yes. It has value. But not in fantasy baseball and not in predicting the future of major leaguers in any way other than as anecdotal proof. We need to completely change the way we think.
Derek Carty: "Should you rely on subjective opinion versus hard evidence?" It's not an either/or situation. Stats tell you what happened, but they can't tell you why it happened. And (as you always say, Pat) it's the process that matters, not the results. Scouting helps us better understand the process, even if it is imprecise. Certainly the stats are important, especially at the major league level—you’ll get no argument from me on this—but having extra (scouting) data will improve those projections further.
Patrick DiCaprio: To be honest, I really don’t care why something happened as much as I care "will we be able to predict the future?" At some point, things like PITCHf/x, GPS, algorithms and systems and processes are going to render scouting obsolete, just as they did in horse racing. And the sooner this happens, the better. There is a reason why these things perform better than humans in predicting the future, and baseball is not exempt.
Derek Carty: I’m all for the advancement of technology and quantifying whatever we possibly can, but there’s no guarantee that PITCHf/x and similar systems will be able to capture everything that scouting can. They aren’t able to capture mechanics, for example, especially the many nuances of mechanics. Scouting will never be obsolete unless we attach sensors to players that capture their every kinematic movement. I believe Rick Peterson has actually done some work like this, but it's obviously done in laboratories under test conditions. Bringing this out to the field seems unlikely to ever happen. Unless the camera technology we’re using now advances far enough to capture these things without sensors, I don’t see how we’ll ever be able to get the kind of detailed mechanical data you’re calling for, Pat (not that it wouldn’t be a godsend).
Patrick DiCaprio: I cannot disagree with you more about kinetics. I am 99 percent certain that it won’t be long before we can measure everything like that. But what this debate really comes down to is this:
Scout: Joe Blow is hitting the outside pitch better.
Data/stats/science: Joe Blow has improved his contact rate by 10 percent on pitches outside the specified zone.
One is subjective and haphazard. The other is verifiably true or false. Which do you prefer? And which is more likely to be better at explanation/prediction? If you think it is the former then you underestimate what technology will do in the future and are valuing opinion over hard science. In the long run, I bet I win more often relying on the latter than on the former.
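To make Pat's "verifiably true or false" example concrete, here is a minimal Python sketch of how an out-of-zone contact rate could be computed from pitch-level records. The `Pitch` record, its fields, and the fixed rectangular zone bounds are hypothetical simplifications chosen for illustration, not the schema of any real tracking feed or the rule-book strike zone.

```python
import math
from dataclasses import dataclass

# Hypothetical, simplified pitch record loosely modeled on plate-location
# coordinates; real tracking feeds carry many more fields.
@dataclass
class Pitch:
    px: float       # horizontal location at the plate, feet from center (assumed)
    pz: float       # height at the plate, feet (assumed)
    swung: bool     # did the batter swing?
    contact: bool   # did the swing make contact (fair or foul)?

# Assumed fixed rectangular zone; the real zone depends on the batter.
ZONE_HALF_WIDTH = 0.83
ZONE_BOTTOM, ZONE_TOP = 1.5, 3.5

def in_zone(p: Pitch) -> bool:
    """True if the pitch crossed inside the assumed rectangular zone."""
    return abs(p.px) <= ZONE_HALF_WIDTH and ZONE_BOTTOM <= p.pz <= ZONE_TOP

def outside_zone_contact_rate(pitches: list[Pitch]) -> float:
    """Share of swings at out-of-zone pitches that made contact; NaN if none."""
    chases = [p for p in pitches if p.swung and not in_zone(p)]
    if not chases:
        return math.nan
    return sum(p.contact for p in chases) / len(chases)
```

Computing this rate for two seasons yields the kind of checkable statement Pat describes. Note that "improved by 10 percent" could mean ten percentage points or a 10 percent relative gain; the calculation itself doesn't settle which.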
Derek Carty: I would never advocate the use of scouting where there is a verifiable, statistical alternative that examines the exact same thing, such as the case with outside-the-zone hitting. But that’s rarely the case. Let's say that same hitter is hitting more fly balls this year, and my scout notes that he's altered his swing plane so as to get more loft on the ball. If you ignore the scout when you run your statistical projection, you're going to overweight the past data and underpredict his future fly ball rate. And things like swing plane are measurable; they just aren't being measured yet. Just because they're not being measured using cameras or motion sensors doesn't mean they can't be noticed by watching the player in person or on video, though.
Patrick DiCaprio: It’s funny because I think scouting can only tell you useful info ex post facto. But stats can predict the future! That is what regression to the mean is all about.
Derek Carty: I'm curious why you say this, Pat. If we recorded the things scouts said and had a large enough sample to examine, you don't think that data would prove useful for predicting future performance? And if we were to have such a database of scouting data (as I'm sure many teams do), that wouldn't conflict with concepts like regression to the mean. In fact, we could use the scouting data to create means to regress to—i.e. 22-year-old players with fast bats, short swings, level swing planes, and high contact rates post an aggregate .330 BABIP or whatever.
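A minimal Python sketch of Derek's idea, assuming scouting observations were stored as simple categorical tags next to each player-season's stats. The tag names, the `PlayerSeason` layout, and the `group_babip` helper are all hypothetical, invented for illustration. The function pools every player-season carrying the required tags into one baseline that could serve as the mean to regress toward.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PlayerSeason:
    name: str
    age: int
    scout_tags: frozenset[str]   # hypothetical tags taken from scouting reports
    hits_in_play: int            # hits on balls in play (home runs excluded)
    balls_in_play: int           # AB - K - HR + SF

def group_babip(seasons: list[PlayerSeason],
                required_tags: frozenset[str],
                age: Optional[int] = None) -> Optional[float]:
    """Pooled BABIP for player-seasons carrying every required tag.

    Returns None when no balls in play match, so callers can fall back
    to a broader baseline.
    """
    hits = bip = 0
    for s in seasons:
        if required_tags <= s.scout_tags and (age is None or s.age == age):
            hits += s.hits_in_play
            bip += s.balls_in_play
    return hits / bip if bip else None
```

Pooling hits and balls in play, rather than averaging each player's BABIP, weights every player by how many balls he actually put in play.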
Patrick DiCaprio: No, I don’t think it would prove useful if it were done with the human eye. But when you say "scouting data," what do you mean? I take it to mean info generated by technological measurement of skill/performance and not the subjective opinions of people watching.
Derek Carty: No, I mean simply taking the things the scouts see with their eyes and putting it into an organized, digital format. Obviously the latter is preferable, but I don't see how that happens anytime soon. And I don't think it's necessary in order to derive value from scouting. So essentially, just because the data wouldn't be systematically and objectively recorded, you think it would be completely useless?
Let me pose another question: how do you feel about batted ball data? Balls being classified as grounders, liners, flies, etc. You use this in your analysis of a player, no? What's the difference between the subjectivity of how these are recorded and the subjectivity of the things scouts record?
Patrick DiCaprio: I trust groundball and fly ball rates but not line drive rates. We all are aware of the issues with subjectivity of line drive rates. I rarely even mention line drive rate unless it is a cross-exam point for someone on the Roundtable Show.
Derek Carty: While line drive rates have more noise in them than groundball and fly ball rates, the latter are still subjectively scored themselves. Groundball rates from MLB, Stats, and BIS can all vary quite a bit from each other. What I'm not grasping is why it's OK to accept the subjectivity in this sort of data but not the subjectivity in, say, bat speed.
Patrick DiCaprio: Because in one case we have a clearly better option, namely measurement of the speed at which a ball comes off the bat. Also, I bet the noise in ground and fly balls is not statistically significant. And let’s be fair; we all know what a groundball and a fly ball look like. Seems like nitpicking. There are degrees of uncertainty, but the mere existence of uncertainty does not make two uncertain things equally uncertain.
Derek Carty: You’re spot on with your last point. No matter what we’re looking at, there is going to be some amount of uncertainty since we’re always dealing with a finite sample size. But once we understand this and regress the proper amount based on the size of the uncertainty, we’re in the clear. Groundballs have less uncertainty than fly balls, which have much less uncertainty than line drives. Somewhere on that spectrum, subjectively-measured (but digitally-organized) scouting data would find its place. Just because the subjective nature of their measurement decreases their accuracy somewhat, it doesn’t mean their accuracy drops to zero. Partial accuracy is better than nothing at all, especially when the thing we’re examining is immeasurable any other way (at least for the time being).
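One common way to "regress the proper amount" is to blend the observed rate with a prior mean, treating the prior as if it were k extra observations; a noisier measurement warrants a larger k. The sketch below assumes that simple shrinkage form. The k values are made-up placeholders that only preserve the ordering Derek describes (groundballs least noisy, line drives most), not published estimates.

```python
def regress_to_mean(observed_rate: float, n: int, prior_mean: float, k: float) -> float:
    """Shrink an observed rate toward a prior mean.

    k acts like k phantom observations at the prior mean: the larger k is
    relative to the real sample size n, the harder the pull toward prior_mean.
    """
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    if n + k == 0:
        raise ValueError("need n + k > 0")
    return (n * observed_rate + k * prior_mean) / (n + k)

# Illustrative placeholders only: a larger k for a noisier, more subjectively
# scored stat. Digitized scouting grades could be given their own k the same way.
ASSUMED_K = {"groundball_rate": 100, "fly_ball_rate": 150, "line_drive_rate": 600}
```

With the same 300 balls in play and the same 10-point gap above the mean, the line drive rate gets pulled much closer to the mean than the groundball rate does. Partial accuracy becomes a larger k, not a discarded input.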
***
That’s where the conversation has fizzled out for the time being, but I think there are some interesting things in there that were worth sharing. I’ve also been invited to go on The Fantasy Baseball Roundtable Show, Pat’s radio show, sometime in the future to discuss this topic further. So what do you think?
When you go on Pat's show, I'd be interested to hear what he thinks about defensive metrics... Is there any defensive metric (advanced or not) that does NOT have some subjective element? If not, does that make them all worthless?
The most telling moment of the conversation was when Derek said, "Let's say that same hitter is hitting more fly balls this year, and my scout notes that he's altered his swing plane so as to get more loft on the ball." And this was essentially dismissed out of hand for the sake of adhering to a strict, if not entirely realistic, dogma.
And, of course, defense.
It is very, very easy to be wrong. You'd think that the whole "defense doesn't matter just play sluggers" debacle would have taught the only-stats crowd some humility.
And there are jerks on both sides of the stats versus scouting fence who need humility.
I wonder how many “sweet swing” guys never make it because their out-of-the-zone swing percentage is 35%. One of these things is measurable, the other is not.
On the other hand, scouting can be done at thousands of amateur ballparks around the world. How many of the tools used to generate precisely "measured" data are available at all of those parks? I think your points about precision vs. accuracy are spot on here... a skilled scout should be able to deliver accurate, if imprecise, information from any place on earth that a player throws a ball, swings a bat, etc. Is improving the precision of those data worth what it would cost?
Pat's example of 10% better contact outside the zone is helpful. Is that information predictive of anything? Is it small sample size? That requires judgment. Who provides that judgment is to be determined, but it won't be a computer.
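One conventional first check on the "is it small sample size?" question is a two-proportion z-test. This is a minimal sketch under stated assumptions: the swing counts below are hypothetical, a 10-point gain is taken as one reading of Pat's "10%," and the test treats swings as independent, which real plate appearances only approximate. It answers "is this change bigger than chance usually produces?", not "will it persist?", so the judgment the commenter describes is still needed.

```python
import math

def two_proportion_z(successes_a: int, n_a: int,
                     successes_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the change from rate A to rate B."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical: out-of-zone contact goes from 50% to 60%.
print(two_proportion_z(50, 100, 60, 100))      # ~100 chases per season
print(two_proportion_z(500, 1000, 600, 1000))  # ~1,000 chases per season
```

The same 10-point jump is unremarkable on about 100 swings per season (p near 0.16) but very unlikely to be noise on about 1,000, which is why the sample behind a number matters as much as the number.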
I love stats, but you need to watch games and observe the players, too. There's no way simply one or the other will do.
The biggest problem with using only numbers, at the expense of actually watching games, is that numbers are only probabilities. You could refine and refine to the 50th decimal, but how would that really help? The player being analyzed does not play in a vacuum, and the opposing pitcher or hitter has detailed tendencies of his own.
Furthermore, the percentages are on a broad level and will never let us know how a player will perform in a specific at bat. Baseball is played (scouted) with whole numbers and analyzed (statistics) in partial numbers. No matter how you look at it, you get an incomplete picture. The best method can only be to discuss both the whole and the partial.
What occurs to me here, though, is that the distinction between a "good scout" and an "eh scout" will matter more and more. As the game relies more on objective measures, the scouts whose eyes, instincts, and ability to predict allow them to keep pace will necessarily grow scarcer, and more valuable.
Makes me wonder how umps are going to be judged going forward as well...
It reminds me of an old economics joke [shortened here]: Three people, one an economist, are stranded on a desert island with a bunch of canned food and no can opener. The other two cannot open the cans, so they turn to the economist, who says "Let's start by imagining a perfect can opener..."
Yes, it is what he is saying.
"But when you say "scouting data," what do you mean? I take it to mean info generated by technological measurement of skill/performance and not the subjective opinions of people watching."
In other words, he is denying that scouting reports are data at all. But the fact that something is subjective doesn't make it not data.
Overconfidence in either objective or subjective data alone will fuel a limited view, and baseball epiphany lies in the intersection of what we see and what we measure. Much like scouting, performance stats rely on input variables that are imperfect, which will be true as long as a bunt-single is weighted equally with a laser off the Monsta' in the box score.