April 25, 2011
Cracking the Scouting Code
Consistency: the word itself a food metaphor, irony dripping from it like ice cream from a half-melted cone. Despite the rhetoric, consistency doesn’t matter much in baseball. What matters is being good. In the process of evaluating ballplayers, however, consistency is all that matters.
Scouts grade prospects based on a 20-80 scale where 50 is average, and, according to one scout*, “one grade is a standard deviation. Think of it as a bell curve.”
*Kevin Goldstein, Managing Partner of Baseball Prospectus.
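Taken literally, that bell curve can be turned into percentiles. The sketch below assumes a normal distribution centered at 50 with one grade (10 points) per standard deviation. The names `SCALE` and `grade_to_percentile` are illustrative only, and actual tools need not follow this idealized curve.

```python
from statistics import NormalDist

# Assumption: grades follow a normal curve with mean 50 and
# one grade (10 points) per standard deviation, as the quote describes.
SCALE = NormalDist(mu=50, sigma=10)


def grade_to_percentile(grade: float) -> float:
    """Share of the reference population at or below this grade."""
    return SCALE.cdf(grade)


for grade in (20, 30, 40, 50, 60, 70, 80):
    z = (grade - 50) / 10  # standard deviations from average
    print(f"grade {grade}: z = {z:+.1f}, percentile = {grade_to_percentile(grade):.1%}")
```

Under that assumption, a 60 sits at about the 84th percentile, and an 80 sits three standard deviations above average, roughly the top 0.1 percent of whatever population defines "average."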
But that bell curve is meaningless without further processing. The fundamental building block of baseball is runs, from which you get wins (and from there, you can derive dollars). The 20-80 scale contains numbers without backing, and the “five tools”—hit, hit for power, run, throw, field—all have separate weightings.
“Different tools have different curves,” Goldstein says. “The range of ability is much wider in something like speed than hitting. You can only really have a .200 hitter or a .300 hitter, but speed-wise you have Billy Butler and Brett Gardner—the range is crazy. There’s no such thing as a 70 or 80 hitter who’s not an every-day big leaguer. There is such thing as a 70 or 80 runner who won’t be a big leaguer. I’d say 15 percent of major leaguers have average speed, and I know that doesn’t sound like it makes sense.”
While he understands how nonsensical it might seem to call 15 percent of a population “average,” Goldstein has nevertheless written an article on the subject, with the “staggering” finding that “nearly 92 percent of all right-handers have at least average velocity.” Indeed, sometimes scoutspeak falters upon closer inspection.
Goldstein and fellow scout Jason Parks, who co-hosts the BP Podcast, agree that Mariano Rivera’s cutter has exceptional “late life.” But according to BP’s PITCHf/x expert Mike Fast, “Sportvision tests that Alan Nathan helped with confirmed that constant acceleration is a very good approximation for nearly all, and perhaps completely all, pitches.” So what to make of the late break phenomenon?
“That pitch was filthy,” Parks says. “It buckled his knees. The cutter on a righty to a righty, you aim for the hip, and that’s what he did. Kinsler thought it was going to hit him in the crotch.”
“It may not be in compliance with the laws of physics, it may be an optical illusion, but that cutter breaks late,” says Goldstein. “It comes down to where the pitch is at the time of decision for a hitter and where it ends up.”
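The two views can be reconciled with a little arithmetic. If the acceleration is constant, as Fast describes, the deflection of a pitch grows with the square of elapsed time, so most of it accumulates near the plate. The sketch below shows this. The six-inch total break is a hypothetical figure, not a measurement of any pitch, and `fraction_of_break` is an illustrative name.

```python
import math


def fraction_of_break(fraction_of_flight_time: float) -> float:
    """Share of the total deflection accumulated so far.

    Under constant acceleration a, deflection is x(t) = 0.5 * a * t**2,
    so x(t) / x(T) = (t / T)**2 regardless of the value of a.
    """
    return fraction_of_flight_time ** 2


TOTAL_BREAK_IN = 6.0  # hypothetical total deflection in inches, not a measured value

for pct in (25, 50, 75, 100):
    moved = TOTAL_BREAK_IN * fraction_of_break(pct / 100)
    print(f"{pct:3d}% of flight time: {moved:.2f} in of break")

# Point in the flight at which half of the break has happened.
half_break_point = math.sqrt(0.5)
print(f"Half the break arrives after {half_break_point:.0%} of the flight time")
```

In this idealized picture, only a quarter of the break shows up in the first half of the flight, and half of it arrives in the last 29 percent or so. A pitch on a perfectly smooth path can still end up far from where it appeared to be headed at the moment of decision.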
Goldstein points to Justin Masterson’s sinker and Cole Hamels’ changeup as other examples of pitches with late life. The cutter, sinker, and changeup all have something in common—they work best off the traditional four-seam fastball. They can all be hidden in that fastball’s plane during the first moments when the batter is trying to recognize the pitch.
From 2008 to 2010, there were just 500 pitches thrown by right-handed pitchers that traveled at least 90 miles per hour with as much cutting action as Mo’s Sunday nighter. Combined, Rivera and Jared Burton have thrown over half of them. The point is, even if a hitter knows it’s coming, there’s no way he can adjust to that type of horizontal movement on a pitch. Just when he expects the pitch to zig, it zigs slightly more than he expected.
“Late life” might not mean much in the physical world, but as a descriptor, it appears to get the job done. Maybe a scout’s “average” doesn’t apply only to the major leagues. There are many levels of amateur and professional ball, and many averages across levels. So long as it’s understood which average is being referenced, the terminology works.
Some terms are even harder to nail down, such as pitchability and deception. You just know them when you see them.
“Pitchability is something that gets thrown out a lot, and there’s something to it,” Goldstein says. “His pitches wouldn’t grade out crazy high, but Greg Maddux is one of the best pitchers in the history of baseball, and there’s a reason for that. It goes beyond his 80 command and control. Understanding sequencing, getting hitters to guess wrong, the mind game. Tom Glavine and Jamie Moyer were huge pitchability guys. Guys pitching well beyond their stuff and still do it for a reason other than luck.”
Picking up on “pitchability,” whatever it is, in prospects and being able to spot a 250-game winner might be helpful. But actually knowing the importance of pitchability is irrelevant. Not every pitcher with pitchability will become Greg Maddux. The process is to find a trait that exists in a set of players and then, after that, analyze the significance of that trait.
As for deception, “PITCHf/x can’t judge Tim Collins at all,” Goldstein says. “Deception is a big part of his game. He twists, goes behind his head; it’s hard to pick up the release. It’s like, ‘Oh, there it is.’ It depends on the guy, but [deception] can add somewhere between zero and four clicks. 92 seems like 95 for Collins.”
Data on tangibles and biographical information still needs to be processed. Collins’ height can’t simply be taken as a negative. The general wisdom is that shorter pitchers are at a disadvantage, but by how much?
Parks believes that the difference in impact on a game between Brett Gardner’s and Billy Butler’s speed (20-80) is roughly equivalent to the difference between a 50 hitter (.275-.280) and a 60 hitter (.290-.295). Goldstein sees tools impacting the game differently depending on the position. “Shortstop might be 30 or 40 percent glove, and the first baseman might be five percent. Speed is important for the center fielder, but maybe it’s one percent for a catcher.”
So long as Goldstein and Parks have conviction in their beliefs and compile data in that manner, when they assemble a body of work, their hypotheses can be tested. It’s all about consistency, and not just when it comes to one scout. Ideally, that consistency would apply to an entire scouting department.
“That’s why teams get different reports on different sources throughout a season,” Parks says. “You get another report, have another set of eyes on it, they can see the conceptual space between grades, if those numbers have grown closer together or farther apart. Even if I’m wrong, I’m providing a baseline through which all other reports are going to be judged.”
Imagine this: Jim Callis and John Manuel suddenly realize what Victor Wang already realized—that they systematically overrate the potential of top pitching prospects. Baseball America’s scouting paradigm is overhauled—an apocalypse in player evaluation. Baseball America improves its batting average, but at what cost?
In this scenario, their new rankings don’t proceed from the same tried-and-untrue system. That system, with years and years of data, is practically invaluable. If you churn out consistent data, analysis can find the bias. Analysis can make sense of the noise. It’s the Costanza corollary: if everything you do is wrong, then the opposite must be right. Baseball America is valuable not because it’s always right, but because it always provides a reliable and comprehensible baseline.
Tangotiger has been compiling the Fans Scouting Reports for a number of years. Fans rate fielders based on criteria such as arm strength and arm accuracy. Incidentally, sabermetricians have developed metrics to evaluate outfield arms. I’m a Fan. I think I can do a decent job of judging an outfielder’s arm, without having had any training. I also think Roberto Clemente’s arm won the Pirates ten games a year. But the data shows that one grade in the FSR, the difference between a 50 and a 60, equates to about one run per year in the value of an outfield arm.
This type of analysis can be performed on every single data point a scout collects. Generally, OFP, Overall Future Potential, is seen as the most important predictor of a prospect’s success. Scouts base their reputations on that number, and decision-makers give that number the most weight. But really, who knows which scouting numbers are the most reliable without actual analysis? Why not throw all of those numbers into a regression? OFP might as well be an OLS-Founded Projection.
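For readers who have not run one, here is a minimal sketch of what throwing grades into a regression might look like. Everything in it is an assumption for illustration: the eight sets of tool grades are invented, the outcome is generated from a made-up rule so the fit can be checked, and `ols` is a bare-bones helper written for this example. A real study would use actual reports and noisy outcomes.

```python
def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    An intercept column is added automatically; returns [intercept, b1, b2, ...].
    """
    X = [[1.0, *r] for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * t for x, t in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Invented (hit, power, run) grades on the 20-80 scale. Not real prospects.
grades = [(60, 50, 40), (50, 60, 50), (40, 40, 70), (70, 55, 30),
          (45, 65, 60), (55, 45, 55), (50, 50, 50), (65, 70, 45)]

# Express each grade as "grades above or below average" (50 -> 0, 60 -> +1).
rows = [tuple((g - 50) / 10 for g in player) for player in grades]

# Made-up rule standing in for a measured outcome such as career value.
TRUE_WEIGHTS = (2.0, 3.0, 1.5, 0.5)  # intercept, hit, power, run
outcome = [TRUE_WEIGHTS[0] + sum(w * v for w, v in zip(TRUE_WEIGHTS[1:], r))
           for r in rows]

fitted = ols(rows, outcome)
for name, b in zip(("intercept", "hit", "power", "run"), fitted):
    print(f"{name:9s} {b:6.3f}")
```

Because the outcome here was built from a known rule with no noise, the fit simply hands those weights back. With real data, the fitted weights would be estimates of how much each grade has actually mattered.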
Sky Andrecheck, among many others, has shown that the expected value of the No. 1 draft pick is by far the highest when it yields a college hitter. Andrecheck, Wang, BP alum Keith Woolner, and other Prospectus veterans do work for the Indians. Yet the Indians selected a pitcher, Drew Pomeranz, fifth overall last year, and so far his performance ranks up there with that of any 2010 pick. There are no hard-and-fast rules to scouting.
This year, the choice for 1-1 appears to be between Rice infielder Anthony Rendon and UCLA RHP Gerrit Cole.
“Cole is a no-brainer,” Parks says. “Even though all the data backs up taking Rendon, Cole gives me the opportunity to go get a legit No. 1. You can’t acquire that on the free agent market, it’s difficult to trade for one.”
Goldstein: “I’d wonder what the value is if you took away the players who never reached the majors. You’re measuring value in total instead of players who make it to the big leagues.”
But why would you do that?
“In the investment game, there’s the concept of risk,” Goldstein continues. “If you put your money in savings bonds, you’re going to get these little checks every year, and it’s going to be awesome. But if you take risks in the international market, in tech stocks, you’ll lose your hat sometimes, but other days it will let you buy a house. The expected value is lower, I’ll admit you’re right, but you’re never going to win. I don’t like the concept of expected value. I don’t care about expected value. It’s about the chances of finding an impact player. If you get 12 years of big-league-average baseball, you’re going to do well on an expected-value spreadsheet, but it’s not about finding average value. It’s about finding impact players. You have to take those chances.”
Between 1987 and 2007, 12 position players were chosen first overall and 10 became All-Stars. Of the nine pitchers, only two became All-Stars.
“It’s funny, you can know the data and still give the same answer,” Parks adds.
And that’s really all it is: consistency. If the scouting director is providing his own analysis on his scouts’ data, all he needs is reliability, and the rest falls into place. In the end, it comes down to what you’re asking your scouts to be: data providers or data analysts?
“I’d call myself both,” Goldstein says. “Scouting reports are at their very core data, but there is narrative to it. There will be 20 or so numbers on, say, fastball velocity, makeup, bio info, and then there will be paragraphs writing about what he is.”
“You are collecting, analyzing what you collect, articulating it for other people to analyze,” Parks says. “But you analyze it first to have it make sense to you. You’re basically a salesman—you are convincing yourself of the validity of some product. Everything in life takes some form of convincing. The fact he can throw 98 convinces me he can throw a good fastball. You need to be able to articulate your thoughts. They need to know that I think Cole has a good fastball, a good slider, and an improving changeup. It’s all about collecting data, processing data, and putting it in a nice little package.”
The language spoken by PITCHf/x analysts can be lost in translation just as easily.
When it comes to classifying pitches, what do you call a pitch like Rivera’s non-cut fastball, Angel Guzman’s cut fastball, or Aroldis Chapman’s breaking pitch? They may be gripped one way or have the physical properties of a typical pitch type, but who determines the label?
“A lot of the time, PITCHf/x won’t properly identify a breaking ball, but ultimately it really doesn’t matter,” Parks says. “Call it a 78 mile-per-hour breaking ball. If it features more tilt than vertical action, then, depending on the arm angle, it will be a slider versus a curveball. The label doesn’t matter. I care about the effectiveness of the pitch.”
Clustering algorithms used for pitch classification are essential for analysis, but trying to label pitches in English right off the bat might be a mistake. Until these pitch types have to be communicated, they should be thought of in terms of their properties. Everyone throws a fastball, and off-speed pitches work off the fastball. You can describe the physical properties of a pitcher's fastball and how his off-speed pitches relate to it, but it doesn’t make much sense to compare pitches thrown by different pitchers because they are named the same. It makes sense to compare them because they have similar speed and movement or are similarly separated from a fastball.
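One way to picture that idea: describe each off-speed pitch by how it differs from its own pitcher’s fastball, then compare those differences instead of the labels. The sketch below is a rough illustration under stated assumptions. The `Pitch` fields, the two unnamed pitchers, and all of the numbers are hypothetical, and the plain Euclidean distance mixes miles per hour with inches, which a real analysis would rescale.

```python
import math
from dataclasses import dataclass


@dataclass
class Pitch:
    speed_mph: float
    horiz_in: float  # horizontal movement in inches
    vert_in: float   # vertical movement in inches


def relative_to_fastball(pitch: Pitch, fastball: Pitch) -> Pitch:
    """Describe a pitch by its separation from the same pitcher's fastball."""
    return Pitch(pitch.speed_mph - fastball.speed_mph,
                 pitch.horiz_in - fastball.horiz_in,
                 pitch.vert_in - fastball.vert_in)


def distance(a: Pitch, b: Pitch) -> float:
    """Unscaled Euclidean distance; mph and inches are treated alike here."""
    return math.sqrt((a.speed_mph - b.speed_mph) ** 2
                     + (a.horiz_in - b.horiz_in) ** 2
                     + (a.vert_in - b.vert_in) ** 2)


# Hypothetical pitcher A: a fastball and a pitch someone labeled a "cutter".
rel_a = relative_to_fastball(Pitch(90.0, 2.0, 6.0), Pitch(93.0, -4.0, 9.0))

# Hypothetical pitcher B: a fastball and a pitch someone labeled a "slider".
rel_b = relative_to_fastball(Pitch(91.5, 0.5, 6.5), Pitch(95.0, -6.0, 10.0))

print(rel_a)
print(rel_b)
print(f"distance between the two pitches: {distance(rel_a, rel_b):.2f}")
```

In this made-up case the two pitches carry different names but sit almost the same distance off their fastballs, which is the comparison the labels would have hidden.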
“Baseball is so old-school, it might confuse a lot of people, but I’d be all for it,” Parks says. “All I care is what the pitch does, how it affects people, if the pitcher has command over it.”
Players and scouts and coaches are always accurate. That doesn’t mean that I necessarily agree with what they’re saying, but if I introduce a PITCHf/x concept and they either disagree or don’t understand, it has to be my fault. They’ve been speaking in their own language for ten times longer than PITCHf/x has existed. They already have definitions that they understand. PITCHf/x is still searching for its own language.