There never was a stats vs. scouts war. If there was, it was silly. A good researcher knows that you never throw away perfectly good data.
It seems that because of that scene in Moneyball, people believe that there is still some sort of war against scouts going on. As someone who prefers to promote peace, love, and understanding (what's so funny about that?), perhaps what's needed is a better understanding of how we might be misunderstanding scouting reports. They are data, just in a different form than those of us who usually work the numbers are used to. May I present a short five-step plan to understanding how scouting data can (and perhaps, should) be used to further our understanding of baseball in a perfectly scientific manner.
Step 1: Recognize a data collector when you see one.
Retrosheet data files are wonderful. PITCHf/x is beautiful. If FIELDf/x or HITf/x make it out into the wild, that will be some kind of awesome. Heck, I get excited looking at a good box score. Just because a data stream does not come pre-packaged as a spreadsheet does not make it useless.
Consider the scout, particularly the one who draws the 16-year-old amateur beat. He has spent a good chunk of his adult life watching really bad baseball, all in the hopes of finding one or two kids who are slightly less bad than the others, and projecting out the fact that in 10 years (!), this special kid will turn into one of the five best baseball players on the planet. He spends a lot of time thinking about how to accomplish this goal, because it’s his job. One of the things that happens when you spend a lot of time either thinking about or doing something is that you begin to, sometimes without even knowing it, pick up on subtle differences that no one else even knows to look for. The little tilt in the left elbow that distinguishes a guy who might develop some power rather than the guy who will not develop any further.
Let's give scouts credit for the skills that they have developed. They need to be experts in how the human body develops from late adolescence onward. They also need to break a pitching motion or a swing down into parts, understand how all those parts work together, figure out which pieces can be altered, and, if so, what the swing or motion would look like with the changes in place. Then the scout has to piece it all back together at the macro level and visualize whether that would work against major-league pitching or hitting. He must do this in a vacuum, because the high school pitcher whom the prospect is currently facing has a career ahead of him of watching MLB games when he gets home from his real job. Also, scouts have to play amateur psychologist and try to get some idea of what makes this guy tick and whether he'll even take the teaching that he'd be offered. That's a very specialized skill set and one that's very hard to master.
In the analytical community, we're used to placing things within a horizontal context. How good has Miguel Cabrera been? We can compare his numbers to those of others who have been in the same league at the same time, which answers our question fairly well. What happens when you delete that context? Suppose that we only ever saw Miguel Cabrera bat against pitchers who hailed from one small geographical area. And then there were guys to whom we wanted to compare him and they all faced a different set of pitchers, and each set from a different geographical area. That design would fail every research methodology class out there. Except that's how things actually have to work in the real world.
The scout is the context. Through his previous experience of watching bad baseball, he's able to compare talent across time and place. He can look back over his years of doing this, make comparisons that way, and put things into that framework. Yes, that will have problems, but all data collection methods have problems. Let's recognize scouts for what they really are: highly skilled data collectors who are using a different research paradigm than we’re used to.
Step 2: Realize that while those data are going to be biased, bias can be overcome.
Most of the critiques of scouts as data sources are actually correct. Scouts are human and, like all humans, they will have cognitive blind spots of which they may not be aware. Humans are far too confident in their own predictions. The same ability that leads them to be able to detect subtle differences between players allows them to be influenced by factors that have nothing to do with anything. (There's a study that shows that applicants to medical school were more likely to be admitted if they interviewed on a sunny day, rather than a rainy one.) Many of the advances of the statistical revolution came from exploiting the fact that people remained unaware of these biases and didn't have the ability to provide the appropriate horizontal context to test them. There's still plenty of room for that. The problem with data isn't bias itself. It's hidden bias.
There are ways, methodologically and statistically, to overcome bias. In fact, a process evaluation of the way in which scouting is actually done would be a fascinating project. Teams already know one method for reducing bias: the cross-checker. If a team is seriously considering a player to sign or draft or trade for, they'll often send out a second scout (or more?) to take a look. In research, we call this inter-rater reliability. If scouts agree with one another, we can at least know that they are all on the same page. That doesn’t mean it’s the right page, but that's another discussion.
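For the curious, here is a minimal sketch of one common inter-rater statistic, Cohen's kappa, which asks how often two raters agree beyond what chance alone would produce. The grades below are invented for illustration, the names `area_scout` and `cross_checker` are hypothetical, and I am using the unweighted version, which treats each grade as a category and makes no claim about the spacing between grades.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters grading the same players."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Share of players on whom the two raters gave the identical grade.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, given how often each rater uses each grade.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters gave every player the same single grade
    return (observed - expected) / (1 - expected)

# Hypothetical grades for the same eight players (made-up data).
area_scout = [50, 60, 40, 50, 70, 50, 40, 60]
cross_checker = [50, 60, 50, 50, 60, 50, 40, 60]

kappa = cohens_kappa(area_scout, cross_checker)
print(round(kappa, 3))
```

A kappa near 1 means the two scouts are on the same page; a kappa near 0 means they agree about as often as two people grading at random would. As noted above, neither number says anything about whether it is the right page.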
Step 3: Learn some new research techniques.
One of the biggest frustrations that quantitative folks face when looking at scouting data is that a good amount of it is in narrative form. Scouts tend to describe players in these strange combinations of lines and dots put together in sequence to make a form of communication known as "words." There's a language that goes with it, and while a "plus-plus fastball" sounds ever more worryingly like Newspeak, I don't quite get why ideas borrowed from mixed-methodology research or text-based analysis have never been well represented among baseball researchers. Are there descriptions or words that appear over and over in scouting reports? (Yes.) Do these descriptions tend to clump together in some noticeable ways? (Probably.) Do any of these clumps predict success in any meaningful way? Now, there's a question worth looking into, but it'll require some skills in QUALitative research methods.
Do Jason Parks and Zach Mortimer rate players highly on their level of #want at the same rate? Do they rate the same players highly? Can we calibrate them against one another? Do they comment on the same things about the same players? Which is more powerful: the fact that a player got a high rating, or the fact that the scout noticed it enough to comment on it? We’ve now entered the world of content analysis.
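A first pass at content analysis can be as humble as counting. The sketch below tallies how often each descriptor shows up and which descriptors show up together in the same report. The report snippets are invented, the comma-separated format is an assumption made to keep the parsing trivial, and real reports would need a proper coding scheme built by someone who reads them closely.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, made-up report snippets (not real scouting reports).
reports = [
    "plus arm, loose actions, raw power, needs reps",
    "raw power, long swing, plus arm",
    "loose actions, advanced approach, fringy arm",
]

def descriptors(report):
    """Split a report into a set of lower-cased, comma-separated phrases."""
    return {phrase.strip().lower() for phrase in report.split(",") if phrase.strip()}

term_counts = Counter()  # how many reports mention each descriptor
pair_counts = Counter()  # how many reports mention both descriptors in a pair
for report in reports:
    terms = descriptors(report)
    term_counts.update(terms)
    # Sort so each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(terms), 2))

print(term_counts.most_common(3))
print(pair_counts.most_common(3))
```

The pair counts are the raw material for the "clumps" question: descriptors that travel together across many reports are candidates for a shared underlying theme, which can then be tested against outcomes.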
No, teams won't release their scouting data directly to you, but there are plenty of websites out there that review prospects for a living. It's not exactly the same thing (especially since those sites only write up the "interesting" prospects), but an enterprising researcher could tap that data source, get his or her feet wet, and still run some truly groundbreaking analyses.
Step 4: Repeat after me: The 20/80 scale is an ordinal measure.
Okay, so there are numbers on a scouting report. There's the 20/80 scale. A 20 is "You tried hard." An 80 is elite. A 50 is theoretically MLB average. Some folks use 2 to 8. Some use 1 to 10. Some give out letter grades. Some give out stars or rainbows or stickers. It doesn't matter. They are all ordinal scales. Statistically-minded folk are generally used to dealing with ratio and interval data, and there's a temptation to simply use the 20/80 scale in the same way. You can't do that.
For those of you who missed that day in stats class, the gory details are as follows. An ordinal variable is one in which the values tell you information about the order in which the things being rated fall. We know that 60 is better than 50 is better than 40. We know that four stars are better than three. But by how much? Baseball research has generally focused around measures that are on interval and ratio scales. In an interval scale, the “distance” between 40 and 50 would be the same as the “distance” between 50 and 60. In a ratio scale, not only are the distances between numbers the same, but the number zero means “the absence of” and one can make ratio comparisons between numbers (e.g., a “60” is twice as good as a “30.”) On-base percentage, which answers the question, “What percentage of the time did Smith not make an out?”, is a ratio scale. An OBP of .000 means that he was never able to reach base during the period in question, and a player with a .400 OBP was twice as successful as a player with a .200 OBP.
The biggest mistake that I see people make (and have made myself!) in working with data is pretending that ordinal variables are actually interval or ratio variables. Because of the mathematical properties that interval/ratio variables have that ordinal variables do not, there are things that one can do with a ratio scale, but not with an ordinal scale. It’s not that ordinal variables are useless; there are methods developed for dealing with ordinal variables. It’s just that many of the favored methods in research (OLS regression, Pearson correlation, taking a simple average) don’t actually work when you shove an ordinal variable into them. Oh, Excel or R or whatever program you like to use will run the procedure and spit out a number, but it’s a garbage number. Caveat number cruncher!
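To make that concrete, here is a sketch of two tools that use only the ordering of the grades: the median in place of the mean, and Spearman's rank correlation in place of Pearson's. Spearman is my choice of illustration, not the only ordinal method out there, and the grades and OBP figures are invented.

```python
from statistics import median

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation computed on ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (vx * vy) ** 0.5

# Hypothetical hit-tool grades and later OBP for five players (made-up data).
grades = [40, 50, 50, 60, 70]
obp = [0.290, 0.310, 0.330, 0.320, 0.380]

rho = spearman_rho(grades, obp)

# Relabel the grades with wildly uneven spacing but the same order.
relabeled = [1, 2, 2, 10, 100]
rho_relabeled = spearman_rho(relabeled, obp)

print(median(grades), round(rho, 3), round(rho_relabeled, 3))
```

The relabeling is the point. Stretch or squeeze the gaps between grades however you like and the rank-based answer does not move, because it never pretended to know how far apart a 50 and a 60 are. A mean or a Pearson correlation would change with every relabeling.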
Step 5: Learn to live with (and love) error bars.
There are methods for figuring out whether someone is good at predicting things. They mostly boil down to pulling old predictions and seeing whether or not they happened. And, of course, there will be mistakes. Predicting the future is hard, and those of us who do quantitative work don’t have a 100 percent accuracy rate either. Fear not, those errors are fascinating data unto themselves.
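One simple way to score old predictions, assuming they can be written as probabilities (a big assumption for scouting grades, and one I am making only for illustration), is the Brier score: the average squared gap between the predicted probability and what actually happened. Lower is better. The forecasts and outcomes below are invented.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("need two equal-length, non-empty lists")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical forecasts that each prospect reaches the majors (made-up data),
# and whether he did (1) or did not (0).
forecasts = [0.8, 0.6, 0.3, 0.1]
reached_majors = [1, 0, 0, 0]

score = brier_score(forecasts, reached_majors)

# Baseline: give every prospect the overall base rate instead.
base_rate = sum(reached_majors) / len(reached_majors)
baseline = brier_score([base_rate] * len(reached_majors), reached_majors)

print(round(score, 4), round(baseline, 4))
```

Comparing against the base-rate forecast matters. A forecaster only shows skill to the extent that he beats the lazy prediction of "everyone has the same chance," and the individual misses (the 0.6 that did not pan out, in this toy example) are exactly the errors worth studying.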
Scouting is an ongoing process, and studying any process in depth will reveal inefficiencies. Once you can spot an inefficiency, you might be able to address it. With the scouting process, the fixes can be implemented across a large system and in a high-leverage part of a baseball organization. I'd wager that teams do spend a good amount of time trying to address any weaknesses in their scouting processes, and for good reason. A well-functioning talent identification system produces cost-controlled players who put up real value.
But publicly, there's a surprising lack of investigation into these types of research questions, and a lack of the use of these types of research methods. There are, of course, plenty of errors that teams make, so perhaps there is a brilliant mind out there who can identify a few holes in how the system works. I'm often asked where the next frontier is in sabermetrics (there are several!), and I believe that the field is wide open for someone who wants to formally and systematically study the talent identification process.