Industry friend and former colleague Patrick DiCaprio (@pdicaprioFP911) posted something on Facebook recently that spurred an interesting discussion I wanted to share with you today:

As you can tell, Patrick falls into the “strict Moneyball” camp, wherein scouting, in any form, is worthless.  I’ll admit that I once fell into this camp, years ago when I first began writing about and following baseball seriously, although now I couldn’t be more opposed to it.  Ideally, scouting is an integral part of player analysis both for MLB clubs and for fantasy players—a view I’ve expressed in the past.
Derek Carty: Do you really believe that scouting is all bad, Pat?
Patrick DiCaprio: No, I believe it is unscientific and completely subjective. Horse players used to do the exact same thing in handicapping until they found out what they didn’t even know they didn’t know. Anything that is so bad at predicting how players will do is, at best, marginally useful. The value is in knowing who will not make it, but that is about as far as its objective predictive value goes.
Derek Carty: Interesting. I didn't realize that was your stance. I disagree completely. Scouting isn't "bad at predicting how players will do" at all. Of course it's limited if it's all you're using, but it can be very valuable when utilized properly. Just because something is subjective and unquantified doesn't make it useless.
One example: bat speed. Do you believe that a player with faster bat speed will be a better hitter than one with a slower bat, ceteris paribus? While we don't have this data quantified (at least publicly), it's something scouts take note of. Maybe it's not 100 percent perfect and accurate down to six decimals, but it still has value, no?
Patrick DiCaprio: Bat speed is a good example of what I mean. Scouting bat speed makes no sense. Measuring bat speed makes sense. That is the difference.
Derek Carty: How does it make no sense? Surely measuring it is preferable, but absent the tools to do so, having a bit rougher gauge of bat speed is still going to have value. Yes, there's an element of subjectivity to it, but are you saying people are incapable of differentiating between a player who swings the bat fast versus one who swings the bat slow?
Patrick DiCaprio: Come on Derek, you know what we are talking about, and it is not "fast or slow."
Derek Carty: Honestly, Pat, I don't know. Just because we can't measure it precisely doesn't mean we can't measure it accurately, to one extent or another.
Patrick DiCaprio: We can bat this around ad infinitum, but I see no difference between horseplayers that scouted horses based on form and class and the scouting of players. Is there some success? Yes. Is there cause and effect? No. Could it pass any reasonable standard of scientific evidence? No. Should you rely on subjective opinion versus hard evidence? No. Has there been any scientific proof of validity of individual scouting of a player as having future predictive value as compared to random chance? No. Does it fall prey to psychological bias such as the Texas Sharpshooter Fallacy or Tacit Communication? Yes.  Can scouting tell us whether a player is unlikely to succeed? Yes. Can individual player tweaks help a player? Yes. It has value. But not in fantasy baseball and not in predicting the future of major leaguers in any way other than as anecdotal proof. We need to completely change the way we think.
Derek Carty: "Should you rely on subjective opinion versus hard evidence?" It's not an either/or situation. Stats tell you what happened, but they can't tell you why it happened. And (as you always say, Pat) it's the process that matters, not the results. Scouting helps us better understand the process, even if it is imprecise. Certainly the stats are important, especially at the major league level—you’ll get no argument from me on this—but having extra (scouting) data will improve those projections further.
Patrick DiCaprio: To be honest, I really don’t care why something happened as much as I care "will we be able to predict the future?" At some point, things like PITCHf/x, GPS, algorithms and systems and processes are going to render scouting obsolete, just as it did in horseracing. And the sooner this happens, the better. There is a reason why these things perform better than humans in predicting the future, and baseball is not exempt.
Derek Carty: I’m all for the advancement of technology and quantifying whatever we possibly can, but there’s no guarantee that PITCHf/x and such systems will be able to capture everything that scouting can. PITCHf/x and such systems are not able to capture mechanics, for example, especially the many nuances of mechanics. Scouting will never be obsolete unless we attach sensors to players that capture their every kinematic movement. I believe Rick Peterson has actually done some work like this, but it's obviously done in laboratories under test conditions. Bringing this out to the field seems unlikely to ever happen.  Unless the camera technology we’re using now advances far enough to capture these things minus sensors, I don’t see how we’ll ever be able to get the kind of detailed mechanical data you’re calling for, Pat (not that it wouldn’t be a godsend).
Patrick DiCaprio: I cannot disagree with you more about kinetics. I am 99 percent certain that it won’t be long before we can measure everything like that. But what this debate really comes down to is this:
Scout: Joe Blow is hitting the outside pitch better.
Data/stats/science: Joe Blow has improved his contact rate by 10 percent on pitches outside the specified zone.
One is subjective and haphazard. The other is verifiably true or false. Which do you prefer? And which is more likely to be better at explanation/prediction? If you think it is the former then you underestimate what technology will do in the future and are valuing opinion over hard science. In the long run, I bet I win more often by the latter than the former.
Derek Carty: I would never advocate the use of scouting where there is a verifiable, statistical alternative that examines the exact same thing, such as the case with outside-the-zone hitting. But that’s rarely the case.  Let's say that same hitter is hitting more fly balls this year, and my scout notes that he's altered his swing plane so as to get more loft on the ball. If you ignore the scout, when you run your statistical projection, you're going to overweight the past data and underpredict his future fly ball rate. And things like swing plane are measurable; they just aren't being measured yet. Just because they're not being measured using cameras or motion sensors doesn't mean they can't be noticed by watching the player in person or on video, though.
Patrick DiCaprio: It’s funny because I think scouting can only tell you useful info ex post facto. But stats can predict the future! That is what regression to the mean is all about.
Derek Carty: I'm curious why you say this, Pat. If we recorded the things scouts said and had a large enough sample to examine, you don't think that data would prove useful for predicting future performance?  And if we were to have such a database of scouting data (as I'm sure many teams do), that wouldn't conflict with concepts like regression to the mean. In fact, we could use the scouting data to create means to regress to—i.e. 22-year-old players with fast bats, short swings, level swing planes, and high contact rates post an aggregate .330 BABIP or whatever.
Patrick DiCaprio: No, I don’t think it would prove useful if it were done with the human eye. But when you say "scouting data," what do you mean? I take it to mean info generated by technological measurement of skill/performance and not the subjective opinions of people watching.
Derek Carty: No, I mean simply taking the things the scouts see with their eyes and putting it into an organized, digital format. Obviously the latter is preferable, but I don't see how that happens anytime soon. And I don't think it's necessary in order to derive value from scouting. So essentially, just because the data wouldn't be systematically and objectively recorded, you think it would be completely useless? 
Let me pose another question: how do you feel about batted ball data? Balls being classified as grounders, liners, flies, etc. You use this in your analysis of a player, no? What's the difference between the subjectivity of how these are recorded and the subjectivity of the things scouts record?
Patrick DiCaprio: I trust groundball and fly ball rates but not line drive rates. We all are aware of the issues with subjectivity of line drive rates. I rarely even mention line drive rate unless it is a cross-exam point for someone on the Roundtable Show.
Derek Carty: While line drive rates have more noise in them than groundball and fly ball rates, those are still subjectively scored themselves. Groundball rates from MLB, STATS, and BIS can all vary quite a bit from each other. What I'm not grasping is why it's OK to use the subjectivity of this sort of thing but not OK to use the subjectivity of, say, bat speed.
Patrick DiCaprio: Because in one case we have a clearly better option, namely measurement of the speed at which a ball comes off the bat. Also, I bet the noise in ground and fly balls is not statistically significant. And let’s be fair; we all know what a groundball and fly ball looks like. Seems like nitpicking. There are degrees of uncertainty, but mere existence of uncertainty does not make two uncertain things equally uncertain.
Derek Carty: You’re spot on with your last point.  No matter what we’re looking at, there is going to be some amount of uncertainty since we’re always dealing with a finite sample size.  But once we understand this and regress the proper amount based on the size of the uncertainty, we’re in the clear.  Groundballs have less uncertainty than fly balls, which have much less uncertainty than line drives.  Somewhere on that spectrum, subjectively-measured (but digitally-organized) scouting data would find its place.  Just because the subjective nature of their measurement decreases their accuracy somewhat, it doesn’t mean their accuracy drops to zero.  Partial accuracy is better than nothing at all, especially when the thing we’re examining is immeasurable any other way (at least for the time being).
That’s where the conversation has fizzled out for the time being, but I think there are some interesting things in there that were worth sharing.  I’ve also been invited to go on The Fantasy Baseball Roundtable Show, Pat’s radio show, sometime in the future to discuss this topic further.  So what do you think?

Thank you for reading

Really interesting discussion. I agree with you, Derek, that data with partial accuracy is better than no data at all. And I'm guessing you'd be hard pressed to find even a scout who would call scouting a "hard science".

When you go on Pat's show, I'd be interested to hear what he thinks about defensive metrics... Is there any defensive metric (advanced or not) that does NOT have some subjective element? If not, does that make them all worthless?
It's just so easy and so convenient to approach this conversation as though scouting and sabermetrics are diametrically opposed to one another. Patrick seems to argue under the assumption (genuine or affected, I'm uncertain) that all scouts' opinions always stand in direct opposition to what scientific data say.

The most telling moment of the conversation was when Derek said, "Let's say that same hitter is hitting more fly balls this year, and my scout notes that he's altered his swing plane so as to get more loft on the ball." And this was essentially dismissed out of hand for the sake of adhering to a strict, if not entirely realistic, dogma.

And, of course, defense.
Check out PECOTA predictions for Jose Bautista for the past four years and compare those to scouting reports. The stats may be scientific, but they don't know to look for something like a changed batting stance.
This is an important point. Statistics right now are not that great at predicting success. This might be because we don't measure enough factors/well enough. It might also be that there are inherently unmeasurable factors, contingency, and path dependency that render future performance of players inherently UNpredictable.

It is very, very easy to be wrong. You'd think that the whole "defense doesn't matter just play sluggers" debacle would have taught the only-stats crowd some humility.
I’m not sure that anyone suggested defense doesn’t matter. I believe that scoring more runs than your opponent should be the only thing that matters; and if you win 12-8 or 3-1 it’s still a ‘W’. For a small window of time it appeared that offense-first beer-league-looking guys were cheaper to acquire than athletic guys who could field. That may not be the style of baseball that you like, but that doesn’t mean it was a debacle.

And there are jerks on both sides of the stats versus scouting fence that need humility.
For every one player that changed their batting stance and became Jose Bautista there are hundreds that have changed their batting stance and haven’t. I believe that you will rarely be able to point to a changed batting stance and say “that’ll work” ahead of time.

I wonder how many “sweet swing” guys never make it because their out-of-the-zone swing percentage is 35%. One of these things is measurable, the other is not.
One point that isn't being made in the discussion above is with regard to allocation of resources. If a team has a specific budget for player projection, how much of that is being spent on accumulating enough scouting opinions to distill out meaningful conclusions? Is there a more optimal allocation that moves some of those resources into longer-term technology development, even if it means less scouting data today?

On the other hand, scouting can be done at 1000's of amateur ballparks around the world. How many of the tools used to generate precisely "measured" data are available at all of those parks? I think your points about precision vs accuracy are spot on here... a skilled scout should be able to deliver accurate, if imprecise, information from any place on earth that a player throws a ball, swings a bat, etc. Is improving the precision of those data worth what it would cost?
A human being needs to interpret the data. That human being may be a scout or an analyst. If Pat is interested in the predictive value of the information, you'll still need a human to develop/adjust the formulas (regression analysis, etc.) that turn data into useful, predictive insights. Maybe that's a scout; maybe that's an analyst. But it's a human either way.

Pat's example of 10% better contact outside the zone is helpful. Is that information predictive of anything? Is it small sample size? That requires judgment. Who provides that judgment is to be determined, but it won't be a computer.
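The judgment call this commenter describes can be put in rough quantitative terms. As an illustrative sketch (the sample sizes and contact rates below are invented, not from the article), a two-proportion z-test gives one quick sense of whether a 10-point jump in out-of-zone contact rate stands out from noise:

```python
# Illustrative sketch (assumed numbers): is a 10-point improvement in
# contact rate on out-of-zone pitches distinguishable from random noise?
import math

def two_prop_z(p1, n1, p2, n2):
    """z-score for the difference between two observed proportions,
    using the pooled estimate for the standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

# 60% contact on 150 out-of-zone swings last year vs. 70% on 150 this year:
z = two_prop_z(0.60, 150, 0.70, 150)
print(round(z, 2))  # 1.82
```

With these made-up samples, z falls just short of the conventional 1.96 threshold, which is exactly the commenter's point: the number alone doesn't settle it, and someone still has to judge whether the change is real.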
The dismissive attitude of scouting is what I find most troubling. Pat still sees it as an us vs. them debate. I think it's the condescension and superiority of "stats guys" that rubs those who are on the fence or not yet in the stats camp the wrong way.

I love stats, but you need to watch games and observe the players, too. There's no way simply one or the other will do.
Whether he is right or wrong, God help us if that is a prevailing view in more than the ultra-minority. Baseball would die out otherwise.

The biggest problem with using only numbers - at the expense of actually watching games - is that numbers are only probabilities. You could refine and refine to the 50th decimal, but how would that really help? The player being analyzed does not play within a vacuum and the opposing pitcher/hitter also would have detailed tendencies.

Furthermore, the percentages are on a broad level and will never let us know how a player will perform in a specific at bat. Baseball is played (scouted) with whole numbers and analyzed (statistics) in partial numbers. No matter how you look at it, you get an incomplete picture. The best method can only be to discuss both the whole and the partial.
It does not seem like Patrick is interested in having a discussion. He wants to tell you why he is right and you are wrong. I wouldn't waste any more time with this.
I couldn't agree more. Very well said.
Clearly the "it takes both" view is the one most of us share.

What occurs to me here though, is that the distinction between "good scout" and an "eh-scout" will matter more and more. As the game relies more on objective measures, the number of scouts whose eyes, instincts and ability-to-predict allow them to keep pace will necessarily grow scarcer ---and more valuable.
Totally agreed, but which tools will be employed to measure the accuracy of the scouts?!

Makes me wonder how umps are going to be judged going forward as well...
While I understand the horseplayer's analogy, it also comes with its baggage. There are numerous stats collected in racing, and multitudes of ways to use those stats to predict the future. Early data collectors thought they had found the road to enlightenment (and early on, the view of the goal must have seemed clear), but that road has become congested with fellow travellers blocking the view.
OK, last time I was at a track, one of the races had only six horses. I know nothing whatever about horses. But I could see that five of the horses were standing around before the race, basically looking bored. And one horse was racing around like crazy, looking eager to get started. I put $10 down on the only frisky horse -- the longshot in the field -- and won about $200. It may be a fluke. But I think I was seeing that one of the horses was really ready to race. In my inept, uneducated way I was scouting the horses. It worked.
Back in the 70's, I was in graduate school and remember reading a monograph entitled something like "The Art of Muddling Through". As this was before computers were readily available, the study took quantifiable business decisions that experienced business managers had made by feel/intuition because the data was not readily available. After the decision was made, the data was then collected, and the correctness of the decisions made on intuition/feel was relatively high - I think around 80%. Sure you want to improve your accuracy, but until that data is readily available, subjectivity is better than nothing.
Most decisions are trivially easy. Once the decisions cease to be trivial, people are generally no better than chance (and frequently worse due to bias). If you're in a position to leverage analytics, you'll crush people basing decisions on that same data every time. Do you remember how terrible the original Yahoo web directory was?
This seems like a discussion about the means of collecting data. Scouts are in many ways attempting to codify behavior for analysis, in the same way PITCHf/x does. Currently, using human collection is much more feasible for most data collection, especially at lower levels. As the infrastructure of digital data collection grows, perhaps scouting will become irrelevant in some ways. There is also a question of proprietary data collection that enters into play, i.e., a team doesn't want to just share a collective infrastructure of data - there is no competitive advantage. So, in the "near future" when we can quantify everything about all players, then Patrick will be correct.

It reminds me a of an old economics joke [shortened here]: Three people, one an economist, are stranded on a desert island with a bunch of canned food and no can opener. The two cannot open the cans, so they turn to the economist who says "Let's start by imagining a perfect can opener..."
All I really want to say is this: anyone who actually does statistical analysis for a living--like I do--knows that DiCaprio is way, way, way out of line in his lauding of quantitative over qualitative data. Real researchers use all data--for example, using qualitative data at the beginning of a project to help identify potential hypotheses for testing, numbers for statistical testing, and back to qualitative sources for a reality check of results. Throwing out a huge source of information--scout reports--simply because it does not hew to a comically narrow notion of "data" is just ridiculous.
That's not what he's saying at all.
"Anything that is so bad at predicting how players will do is, at best, marginally useful."

Yes, it is what he is saying.

"But when you say "scouting data," what do you mean? I take it to mean info generated by technological measurement of skill/performance and not the subjective opinions of people watching."

In other words, he is denying that scouting reports are data at all. But the fact that something is subjective doesn't make it not data.
I wouldn't believe in ghosts unless one slapped me in the face. That said, I've been beaten senseless by scouting ghosts for the past eight years, so I choose to believe in them.

Overconfidence in either objective or subjective data alone will fuel a limited view, and baseball epiphany lies in the intersection of what we see and what we measure. Much like scouting, performance stats rely on input variables that are imperfect, which will be true as long as a bunt-single is weighted equally with a laser off the Monsta' in the box score.