Baseball Prospectus

May 16, 2013

The Stats Go Marching In

Catcher Framing Before PITCHf/x

by Max Marchi

Analysis of framing has intensified over the past couple of years: Joe Maddon has talked about it on the radio, and (via Ben Lindbergh) Clubhouse Confidential and MLB Network’s Diamond Demo series have featured discussions of the issue with guests like Jonathan Lucroy. Ben has been running a weekly column on the subject since the start of the season. In the first installment (as well as in this piece for Grantland) he provided some background on the research so far, so you’re invited to have a look at that article before you read the rest of this one.

Framing evaluation is one of those research subjects that has been made possible by PITCHf/x data, which means that we’re now into the sixth full season for which catcher framing can be measured. However, for quite some time, I’ve been thinking about this: if one could get a good approximation of the framing numbers just using Retrosheet pitch sequences, 20 years of catcher framing could be added to the discussion. When Ben jogged my memory recently, I decided it was time to stop thinking about it and start doing some number-crunching.

The Method
Going back to 1988, Retrosheet has data with a fair degree of completeness for pitch sequences, indicating the outcome (ball, called strike, swinging strike, foul, and so on) of every pitch thrown.

For each plate appearance, I counted the number of pitches not featuring a swing by the batter (basically balls and called strikes), with the useful Chadwick Tools saving me a lot of time and work.
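The counting step can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the article: it assumes each plate appearance comes with a Retrosheet-style pitch-sequence string in which `B` marks a ball and `C` a called strike, and it treats every other character (swings, fouls, balls in play, intentional balls, pitchouts, pickoff markers) as not a taken pitch. The function names are hypothetical.

```python
# Assumed subset of Retrosheet pitch codes: B = ball, C = called strike.
# Every other character is ignored here: swings and fouls, balls in play,
# special pitches (intentional ball, pitchout, hit-by-pitch), and
# non-pitch markers such as pickoff throws.
TAKEN_BALL = {"B"}
TAKEN_STRIKE = {"C"}


def count_taken_pitches(pitch_sequence):
    """Return (called_strikes, balls) for one plate appearance."""
    called_strikes = sum(1 for ch in pitch_sequence if ch in TAKEN_STRIKE)
    balls = sum(1 for ch in pitch_sequence if ch in TAKEN_BALL)
    return called_strikes, balls


def called_strike_rate(pitch_sequence):
    """Share of taken pitches called strikes, or None if none were taken."""
    strikes, balls = count_taken_pitches(pitch_sequence)
    taken = strikes + balls
    return strikes / taken if taken else None
```

For the sequence `"CBFBX"` (called strike, ball, foul, ball, ball in play), this counts one called strike and two balls. Whether intentional balls and pitchouts should be excluded is a design choice made here for illustration only.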

In the original model I created with PITCHf/x data, in addition to using the location coordinates as measured by the camera system and the pitch type as classified by the MLBAM algorithm, I controlled for the effect of the ball/strike count, the home plate umpire, the pitcher, and the batter—plus, obviously, the catcher.

Since that model requires a lot of computing time, in order to update my numbers once in a while, I switched to a simpler but quicker model in which the pitcher and the batter are not accounted for. In fact, once the location and pitch type are factored in, the batter has very little effect on the call by the umpire (mostly due to his stance and proximity to the plate, I suppose). The effect of the pitcher is also reduced, and I decided that the tradeoff between accuracy and computing time was worth the exclusion. However, with Retrosheet data, we have no information on pitch location and type, so throwing the pitcher and the batter back into the model was necessary.

In short, for every plate appearance I have the percentage of strikes on pitches not swung at as the outcome variable and the four actors involved (pitcher, catcher, umpire, batter) as the predictors. As I have done many other times in my baseball analysis, I have used a Cross-Classified Multilevel Mixed Model, which for saber-oriented people I’ll call WOWY-on-steroids.
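To give a feel for the "WOWY-on-steroids" idea, here is a deliberately simplified stand-in in Python. It is not the cross-classified multilevel mixed model itself: it works on the raw rate scale, uses a single hand-picked `shrinkage` constant in place of estimated variance components, and fits the four sets of effects by repeated backfitting. All names and the toy data layout are assumptions made for illustration.

```python
from collections import defaultdict

ROLES = ("pitcher", "catcher", "umpire", "batter")


def fit_shrunken_effects(rows, shrinkage=50.0, sweeps=20):
    """Estimate additive, shrunken effects on the called-strike rate.

    rows: list of dicts with one key per role plus 'strikes' (called
    strikes) and 'taken' (pitches with no swing) for a plate appearance.
    """
    total_taken = sum(r["taken"] for r in rows)
    league_rate = sum(r["strikes"] for r in rows) / total_taken
    effects = {role: defaultdict(float) for role in ROLES}

    for _ in range(sweeps):
        for role in ROLES:
            resid_sum = defaultdict(float)
            weight = defaultdict(float)
            for r in rows:
                # Expected strikes given the league rate and the other roles.
                others = sum(effects[o][r[o]] for o in ROLES if o != role)
                expected = (league_rate + others) * r["taken"]
                resid_sum[r[role]] += r["strikes"] - expected
                weight[r[role]] += r["taken"]
            for name in weight:
                # Dividing by (n + shrinkage) pulls small samples toward 0.
                effects[role][name] = resid_sum[name] / (weight[name] + shrinkage)
    return league_rate, effects
```

Each sweep credits a participant only with what the other three roles leave unexplained, which is the with-or-without-you logic; the shrinkage term plays the part that the random-effects assumption plays in a true mixed model.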

Note that when using PITCHf/x data, an extra strike is more or less attributable to something framing-related, be it a good reception by the catcher, the pitcher hitting the target, or the umpire being deceived (or, more likely, a combination of the three). However, when no information is available about location, several other factors come into play: among the called strikes are, for example, pitches thrown right down Broadway that may not have been swung at because of the batter’s tendencies (partly accounted for as the batter is in the model) or because great sequencing has fooled the batter. Thus, this version of framing might include at least some pitch-sequencing effect as well.

Comparing Retrosheet and PITCHf/x numbers
Obviously, the first thing to do before calculating and showing numbers going back to 1988 is to test how the rankings based on Retrosheet-only data compare with the PITCHf/x version for the years that have the more detailed data.

Let’s start by showing a scatterplot featuring framing runs saved (prorated to 5,000 pitches caught*) by catchers in the seasons from 2008 to 2012. The darker dots denote a higher number of pitches caught, signifying more reliable estimates.

* Keep in mind that from here on, when I write “pitches caught” I really mean “pitches caught with no swing attempt by the batter.”

Not a bad start. The chart displays a good agreement between the two different models; the Pearson correlation coefficient, weighted for the number of pitches caught, is a healthy 0.72.
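For readers who want to reproduce this kind of comparison, a weighted Pearson correlation can be computed as below. This is a generic sketch of the standard formula; the function name is hypothetical, and which pitch count serves as the weight for each catcher-season is an assumption left to the user.

```python
import math


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variances (the common 1/total factor cancels).
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)
```

Here `x` and `y` would hold the two framing estimates for each catcher-season and `w` the number of pitches caught, so that seasons with small samples count for less.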

One important difference between the two methods is the distribution of ratings. The PITCHf/x-based numbers are more dispersed: when one considers catcher-seasons with at least 1500 pitches caught, the standard deviation is close to 13 runs for the PITCHf/x numbers and about 7.5 for the Retrosheet ones. That means the Retrosheet-based values (I’ll call them “RetroFraming”) will yield more conservative results.

Given the good agreement of RetroFraming with the PITCHf/x-based numbers, we can move on to showing some numbers going back to 1988, keeping in mind that we’re less likely to see extreme values with this metric.

Single-season achievements
The best catcher-framing season of the last quarter century belongs to Brad Ausmus, with 36 runs saved for the 2000 Detroit Tigers.

A note is due here. In the previous section, I warned that RetroFraming numbers give more conservative results: in fact, there is no trace of a 50-run season. A recent revision of my algorithm has changed Jose Molina’s PITCHf/x framing value for 2012 to 41 runs, which would still rank it above Ausmus’ 2000. RetroFraming has Molina’s 2012 at 25 runs saved, which is quite a difference.

I know such discrepancies can be enough for some people to turn away altogether from this article and others on framing, as they often do when two play-by-play-based fielding metrics disagree on an evaluation of any position player. However, what I make of these numbers is this:

  • There are two metrics that strongly agree: no catcher over the past five years is rated above average by one and below average by the other.
     
  • According to either method, a good framing catcher can be expected to bring his team a handful of extra wins in a single season.
     
  • The PITCHf/x-based method is more precise and less likely to be pulling in other aspects of a catcher’s defensive performance, so for seasons where both methods are available, I would tend to trust its output over the Retrosheet estimate. If you’re skeptical that the big numbers associated with the PITCHf/x approach could be accurate, Mitchel Lichtman’s testing from last year might lay some of your concerns to rest.
     
  • Teams with analytically minded front offices are already making seven-figure decisions based on numbers like these.

Enough talk—here are the 20 best RetroFraming seasons since 1988.

Catcher            Season   Pitches   Run Value
Brad Ausmus        2000     10863     36
Russell Martin     2008     10861     30
Jason Varitek      2007      9404     26
Jose Molina        2012      6347     25
Joe Mauer          2010      7778     23
Paul LoDuca        2003      9057     23
Javier Lopez       2000      9615     23
Brad Ausmus        2005      8576     22
Jose Molina        2008      6665     22
Johnny Estrada     2004      9127     21
Brad Ausmus        2006      9282     20
Jonathan Lucroy    2011      9637     20
Jason Varitek      2002      9202     19
Ramon Hernandez    2002      9298     19
Tony Pena          1991      9954     18
Joe Mauer          2005      7461     18
Mike Scioscia      1989      8353     18
Ramon Hernandez    2001      9756     18
Russell Martin     2010      7138     18
Carlton Fisk       1990      7733     18

At age 42, Carlton Fisk was still capable of a top-20 season. In a subsequent section, I’ll take a look at aging curves for the framing skill.

In case you’re wondering, the worst season belongs to framing whipping-boy Ryan Doumit (2008) by a mile, with Jason Kendall (2000) and Jorge Posada (2005) just a bit better.

Career framers
Ausmus also gets the career laurel as the cumulative king of framing for the past quarter century. In an 18-year career behind the plate, he added roughly one win per season through his ability to earn extra strike calls. Once more, the purported divide between scouting and statistical analysis is revealed to be a false one: long before numbers-based discussions of framing took place, teams were willing to give playing time to weak-hitting catchers like Ausmus because of their defensive ability.

Catcher            Pitches   Run Value
Brad Ausmus        135045    179
Jose Molina         49116    122
Jason Varitek      107444    111
Joe Mauer           58510    102
Russell Martin      67441     99
Javier Lopez        94920     89
Yadier Molina       76486     87
Tony Pena           68627     83
Mike Piazza        113843     78
Charlie O'Brien     47143     73

Jose Molina is a solid second, despite much more limited playing time. In fact, over the same amount of playing time, we’d estimate Molina to be close to twice as valuable as Ausmus. Below is the Top 10 list for prorated (to 5,000 pitches caught) values, minimum 25,000 pitches.

Catcher            Pitches   Run Value / 5,000 pitches
Alberto Castillo    25686    13
Jose Molina         49116    12
Sal Fasano          26642    10
Joe Mauer           58510     9
Johnny Estrada      41128     8
Charlie O'Brien     47143     8
Todd Pratt          33520     7
Ryan Hanigan        26178     7
Russell Martin      67441     7
Mike Scioscia       37835     7

At the bottom of the list, depending on whether you prefer the counting stat or the prorated version, are either Charles Johnson (costing more than a win per year for 12 seasons) or, once more, Ryan Doumit.
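The prorated figures in this section are simple rescalings of the career totals. As a quick check, the short Python sketch below applies that scaling to the career numbers listed above for Ausmus (179 runs over 135,045 pitches) and Jose Molina (122 runs over 49,116 pitches); the function name is hypothetical.

```python
def runs_per_5000(run_value, pitches, per=5000):
    """Scale a framing run total to a common number of taken pitches."""
    return run_value * per / pitches


ausmus = runs_per_5000(179, 135045)  # roughly 6.6 runs per 5,000 pitches
molina = runs_per_5000(122, 49116)   # roughly 12.4 runs per 5,000 pitches
ratio = molina / ausmus              # a little under 1.9
```

The ratio of just under 1.9 is where the "close to twice as valuable" comparison comes from.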

Year-to-year correlation
So what do we do with 25 seasons of ratings? The first thing I thought of was running a year-to-year correlation. I did the usual matching of every catcher with his previous-year self and produced the following plot, which shows the year-to-year correlation for runs saved per 5,000 pitches caught. Again, the shading of dots indicates the underlying number of pitches (the minimum of the two seasons considered). The weighted Pearson correlation coefficient is 0.52.
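The matching step can be sketched as follows. This is an illustrative Python outline under assumed data structures, not the code behind the plot: seasons are keyed by a hypothetical `(catcher, year)` tuple, and each pair is weighted by the smaller of the two pitch counts, as described above. The sample values come from the single-season table earlier in the article.

```python
def pair_consecutive_seasons(seasons):
    """Match each catcher-season with the same catcher's previous season.

    seasons: dict mapping (catcher, year) -> (runs_per_5000, pitches).
    Returns a list of (previous_rate, current_rate, weight) tuples, where
    the weight is the smaller pitch count of the two seasons.
    """
    pairs = []
    for (catcher, year), (rate, pitches) in sorted(seasons.items()):
        previous = seasons.get((catcher, year - 1))
        if previous is None:
            continue  # no adjacent season to compare against
        prev_rate, prev_pitches = previous
        pairs.append((prev_rate, rate, min(prev_pitches, pitches)))
    return pairs


seasons = {
    ("Ausmus", 2005): (22 * 5000 / 8576, 8576),
    ("Ausmus", 2006): (20 * 5000 / 9282, 9282),
    ("Molina", 2008): (22 * 5000 / 6665, 6665),
}
pairs = pair_consecutive_seasons(seasons)
```

With this input only Ausmus 2005 and 2006 form a pair, weighted by 8,576 pitches. The resulting triples are what a weighted correlation would then be computed over.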

A look at aging
The second analysis that made sense to perform with 25 available seasons is an exploration of aging. I looked at the subject through a few different statistical lenses, but the results were fairly consistent. Basically, the aging effect is very small, with no more than two runs separating the prime from the career nadir. Below is a chart showing an estimated career curve, featuring a slight improvement until age 25, followed by a gentle decline.

Below are charts for a few interesting careers. In each one of them, the dots indicate the seasonal ratings, the thinner line is a smooth curve through the data points based on the displayed catcher’s data only, and the thicker line makes use of data coming from the other catchers as well (sort of regressing the curve).

Here’s Jose Molina, who just keeps getting better:

Ausmus also improved throughout his career:

Posada, on the other hand, displayed a declining trend:

 

Finally, Piazza’s numbers were consistent throughout his career:

What’s next?
So far I’ve been reluctant to combine game-calling numbers with PITCHf/x-based framing ratings because they’re derived from different sources, with different levels of granularity. But with the framing approach presented here, I now feel more comfortable subtracting framing from what I termed game-calling, which was actually the sum of framing and calling. Thus, in the future I plan to explore the quantification of game-calling further.

In this article I’ve used pitch-by-pitch data without PITCHf/x information to generate historical leaderboards. However, this kind of data is also available for Minor League Baseball going back a handful of years, so numbers like those shown above can be calculated for lower levels of baseball as well. In that way, good framing catchers might be identified before they reach The Show. And while it might be a long time before we see ubiquitous pitch-tracking technology in the college game, recording pitch outcomes is much more feasible, meaning that teams might even use this information for drafting purposes.

Incidentally, while refining this article, I mentioned its contents to a baseball insider (who obviously will go unnamed here), and he stated, “It's an idea potentially worth millions of dollars.” So, clubs with college pitch-by-pitch data: feel free to knock at my door.
