June 12, 2012
Scouting with PITCHf/x
Believe it or not, most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.
Adam is the founder of Project Prospect, a scouting and statistical analysis website. He has been writing about baseball since 2006, when he began covering college baseball and conducting quantitative analysis of minor-league prospects.
Data is one of baseball’s purest byproducts. It’s interlaced with the past, present and future. It provides a platform for discussion. And just as the game has an effect on it, it has an effect on the game.
PITCHf/x data, which is part of a new breed of baseball statistics, can be intimidating and overwhelming. But thanks to amazing efforts by MLB Advanced Media, Sportvision (the creator of PITCHf/x), and a growing pool of analysts, PITCHf/x has made a mark on the game. That said, its roots are still shallow, and relatively few players, coaches and on-air personalities have fully embraced it.
"The only time I hear about that stuff is through the media," Tim Lincecum recently told me. "Reporters came to me early this season and said that I'd been throwing about 17% sliders. I hadn't thrown one slider up to that point."
Madison Bumgarner, Lincecum’s teammate, said his biggest concern with PITCHf/x is analysts aggregating data and masking the situational adjustments a pitcher must make.
As a primer for people seeking applications for PITCHf/x, I’ve detailed a few of my findings about how PITCHf/x can be utilized to improve your scouting eye. But first, let’s take a closer look at Lincecum’s concern about pitch categorization.
How PITCHf/x categorizes pitches
MLB.com’s Gameday application delivers near-live data that includes pitch type, speed, and movement information. Pitch types are defined by mathematical models that are built around velocity, spin, and movement. It’s a constantly evolving, sophisticated system.
“When we first started doing real-time classifications, we had one generic neural net [or mathematical model] for all pitchers, but we learned pretty quickly that wouldn’t work because one pitcher’s fastball can approximate another’s changeup,” Cory Schwartz, VP of Stats for MLB.com, explained in an email. “Ultimately, we built a custom neural net for each pitcher and now have one for over 1,100 pitchers.”
In addition to rookie pitchers and their unique arsenals, MLB.com’s models must also be adjusted for pitchers introducing new pitches and tweaking others, which happens regularly. In Lincecum’s case, he cut his slider out of his arsenal for a while, and MLB.com’s mathematical model still thought it saw some.
“It’s an extremely labor-intensive process, but we recognize the importance of accurate classifications, for fans, clubs and industry partners alike, and have invested literally hundreds of man hours into building the most accurate system possible,” Schwartz wrote. “While some pitchers do throw a very distinct repertoire that can be easily classified, many throw multiple pitches that blend together and are extremely difficult to differentiate from other pitch types.”
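To make the per-pitcher idea concrete, here is a minimal sketch of why one shared model struggles. It is not MLB.com's system: it swaps the neural nets Schwartz describes for a much simpler nearest-centroid rule, and every number, label, and function name in it is made up for illustration. Each hypothetical pitcher gets his own set of pitch-type averages, and a new pitch is assigned to whichever average it sits closest to.

```python
from collections import defaultdict
from math import sqrt

def fit_centroids(pitches):
    """Average (speed, horizontal break, vertical break) per pitch label.

    pitches: iterable of (label, speed_mph, horiz_in, vert_in) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for label, speed, horiz, vert in pitches:
        s = sums[label]
        s[0] += speed
        s[1] += horiz
        s[2] += vert
        s[3] += 1
    return {label: (s[0] / s[3], s[1] / s[3], s[2] / s[3])
            for label, s in sums.items()}

def classify(centroids, speed, horiz, vert):
    """Return the label whose centroid is nearest (plain Euclidean distance)."""
    def distance(c):
        return sqrt((speed - c[0]) ** 2 + (horiz - c[1]) ** 2 + (vert - c[2]) ** 2)
    return min(centroids, key=lambda label: distance(centroids[label]))

# Invented training pitches for two hypothetical pitchers.
# "FF" = four-seam fastball, "CH" = changeup.
hard_thrower = [
    ("FF", 95.0, -4.0, 10.0), ("FF", 94.0, -5.0, 9.0),
    ("CH", 86.0, -7.0, 5.0), ("CH", 85.0, -8.0, 4.0),
]
soft_tosser = [
    ("FF", 87.0, -5.0, 8.0), ("FF", 86.0, -6.0, 7.0),
    ("CH", 79.0, -8.0, 4.0), ("CH", 78.0, -9.0, 3.0),
]

# The same 86 mph pitch gets a different label under each pitcher's model.
new_pitch = (86.0, -6.5, 6.0)
print(classify(fit_centroids(hard_thrower), *new_pitch))  # CH
print(classify(fit_centroids(soft_tosser), *new_pitch))   # FF
```

The sketch leaves the features unscaled and ignores spin entirely, so it is only meant to show the "one pitcher's fastball is another's changeup" problem, not to classify real pitches.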
Harry Pavlidis, founder of Pitch Info LLC, has devoted considerable time to formulating his own PITCHf/x classifications, which now appear at Brooks Baseball. Thanks to the efforts of Schwartz, Pavlidis, and others, pitch classifications have improved dramatically, and I expect them to continue to improve.
A couple of PITCHf/x findings
One of my biggest PITCHf/x projects to date has been creating an algorithm that grades a pitcher’s offerings on the 20-80 scouting scale. The first step of this project was gathering and combing through PITCHf/x data to study variables and compare them with visual data. A handful of scouts have also provided input to my study, particularly on which variables they’d focus on and which they wouldn’t. My objective has been to figure out what makes a pitch a swing-and-miss offering. And I’ve walked away from my initial study with three strong variables.
I’ll get the first one out of the way: velocity. The harder a pitcher throws, the more swing-throughs he tends to get. Glad the data agrees there.
The second is also pretty logical. The variable with the single strongest correlation coefficient—stronger than velocity—for what makes a pitch a swing-and-miss offering is the frequency with which the pitcher throws the pitch. Pitchers with good fastballs tend to throw them a lot. Pitchers with below-average fastballs use them more sparingly. Simple enough. Now let’s get to the juicy finding.
I’ve discussed quick-twitch ability, as it pertains to pitchers, with a number of people in baseball. (Hitters with quick-twitch ability are known for being able to generate elite bat speed.) A major-league pitching coach told me he thought pitcher quick twitch could be measured by spin rates, with faster arms imparting elite spin. That, in turn, would be expected to produce elite “life” that might not show up in raw velocity. I put his hypothesis to the test.
To my surprise, my research showed virtually no relationship between PITCHf/x spin rates and swing-through percentage. I was later cautioned by a front office member about the analytical value of PITCHf/x’s current spin rates.
But I discovered something stimulating and unexpected nonetheless: the correlation coefficient for vertical fastball movement is very similar to the correlation coefficient for fastball velocity.
Could vertical fastball movement be a way to roughly quantify fastball life? Do fastballs that remain on a relatively linear path get more swing-throughs than fastballs that suffer the effects of gravity more strongly on their way to the plate?
I don’t know how one pitcher could throw a fastball that decelerates less than others of the same velocity on its way to the plate, but maybe it is quick twitch. And perhaps our tendency to privilege starting fastball velocity over finishing velocity (out of the hand instead of over the plate) is a roadblock to a deeper understanding of the data.
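For readers who want to run this kind of comparison themselves, the findings in this section rest on correlation coefficients between a pitch variable and swing-through percentage. What follows is a rough sketch of that calculation, assuming the Pearson coefficient (the article does not say which coefficient was used). The function name and the sample numbers are invented and are not drawn from the study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, 2+ values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return covariance / sqrt(spread_x * spread_y)

# Invented per-pitcher fastball numbers, one entry per hypothetical pitcher.
velocity_mph = [89.5, 91.0, 92.3, 93.8, 95.1, 96.4]
vertical_movement_in = [7.2, 9.8, 8.1, 10.4, 9.0, 11.3]
swing_through_pct = [4.8, 6.1, 5.9, 7.4, 7.0, 9.2]

# Compare how strongly each variable tracks swing-through percentage.
print("velocity:", round(pearson_r(velocity_mph, swing_through_pct), 3))
print("vertical movement:", round(pearson_r(vertical_movement_in, swing_through_pct), 3))
```

A coefficient near 1 or -1 means the variable and swing-through rate move together closely, while one near 0 means little linear relationship, which is what the spin-rate test above turned up.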
PITCHf/x and scouting
I’ve been researching prospects for the last six years, mixing quantitative data and first-hand scouting to further my understanding of the game. PITCHf/x has helped me create a template of major-league pitchers that I can use to evaluate prospects.
Paired with video, PITCHf/x can be a great tool to learn to recognize pitches. When I’m first studying up on a big-league pitcher, I’ll watch him while I have Gameday and its pitch classifications open. It’s a quick and easy way to learn to identify his pitches and compare him to his counterparts. Remembering what the best pitches in baseball and their supporting data look like makes it easier to know what to look for from prospects and amateurs.
I also checked in with a few scouts, who have the luxury of reviewing minor-league PITCHf/x data, to see how they use PITCHf/x in their scouting.
“Anything that provides supplemental information to blend with what we see is valuable,” the first scout said. “We're constantly comparing players to what ‘major-league average’ is, and PITCHf/x data for prospects can be no different.”
“It's a useful tool to obtain objective information on a pitcher to supplement the info we have from our scouts,” said the second scout. “It’s one of the first steps to help objectively measure pitchers in the way a scout would subjectively. As it gets put into more and more minor-league parks, the more valuable the information will be.”
The idea of objectifying major-league average is at the core of the 20-80 scouting scale and similar efforts. PITCHf/x gives fans and scouts alike an opportunity to quantify scouting. As the PITCHf/x database continues to grow and more information from it is studied, templates to objectively evaluate pitchers with data—like the algorithm I’m working on—will be written. The data is too good for it not to head in that direction.
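As a rough illustration of what "objectifying major-league average" could look like, here is a minimal sketch that places a single measurement on the 20-80 scale. It is not the algorithm described in this article, whose details aren't given here. It assumes the commonly described convention that 50 is major-league average and every 10 points is one standard deviation, and the function name and sample velocities are hypothetical.

```python
from statistics import mean, pstdev

def grade_20_80(value, league_values, step=5):
    """Map a measurement onto the 20-80 scale relative to a league sample.

    Assumes 50 = league average and 10 points = one standard deviation,
    clamps to the 20-80 range, and rounds to the nearest `step`.
    """
    average = mean(league_values)
    spread = pstdev(league_values)
    if spread == 0:
        raise ValueError("league sample has no spread to grade against")
    z_score = (value - average) / spread
    raw_grade = 50 + 10 * z_score
    clamped = max(20, min(80, raw_grade))
    return int(step * round(clamped / step))

# Invented sample of average fastball velocities (mph) standing in for the league.
league_fastball_mph = [88.0, 90.0, 92.0, 94.0, 96.0]

print(grade_20_80(92.0, league_fastball_mph))   # 50, exactly league average
print(grade_20_80(100.0, league_fastball_mph))  # 80, far above average
```

A real grading model would presumably blend several variables, such as velocity, movement, and usage, rather than grade one number in isolation. The sketch only shows the scale conversion itself.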