Analysis of framing has intensified over the past couple of years, with Joe Maddon talking about it on the radio and (via Ben Lindbergh) Clubhouse Confidential and MLB Network’s Diamond Demo series featuring discussions of the issue with guests like Jonathan Lucroy. Ben has been running a weekly column on the subject since the start of the season: in the first installment (as well as this piece for Grantland) he provided some background on the research so far, so you’re invited to have a look at that article before you read the rest of this one.
Framing evaluation is one of those research subjects that has been made possible by PITCHf/x data, which means that we’re now into the sixth full season for which catcher framing can be measured. However, for quite some time, I’ve been thinking about this: if one could get a good approximation of the framing numbers just using Retrosheet pitch sequences, 20 years of catcher framing could be added to the discussion. When Ben jogged my memory recently, I decided it was time to stop thinking about it and start doing some number crunching.
The Method
Going back to 1988, Retrosheet has data with a fair degree of completeness for pitch sequences, indicating the outcome (ball, called strike, swinging strike, foul, and so on) of every pitch thrown.
For each plate appearance, I counted the number of pitches not featuring a swing by the batter (basically balls and called strikes), with the useful Chadwick Tools saving me a lot of time and work.
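As a rough illustration of that counting step, here is a minimal sketch that works directly on a Retrosheet-style pitch-sequence string. It assumes that only plain balls (`B`) and called strikes (`C`) count as pitches without a swing; the function names are made up for this example, and the actual processing relied on Chadwick Tools rather than on code like this.

```python
from typing import Optional, Tuple

# Retrosheet pitch codes treated here as "no swing" pitches.
# Assumption: intentional balls, pitchouts, and hit-by-pitches are ignored.
CALLED_STRIKE = "C"
BALL = "B"


def count_taken_pitches(pitch_sequence: str) -> Tuple[int, int]:
    """Return (called_strikes, balls) for one plate appearance."""
    called_strikes = pitch_sequence.count(CALLED_STRIKE)
    balls = pitch_sequence.count(BALL)
    return called_strikes, balls


def called_strike_rate(pitch_sequence: str) -> Optional[float]:
    """Share of called strikes among pitches taken, or None if none were taken."""
    called_strikes, balls = count_taken_pitches(pitch_sequence)
    taken = called_strikes + balls
    if taken == 0:
        return None  # the batter swung at (or fouled off) every pitch
    return called_strikes / taken


# Called strike, ball, foul, ball, ball in play: one called strike in three taken pitches.
example_rate = called_strike_rate("CBFBX")
```

Plate appearances with no taken pitches return `None`, so they can be dropped before any modeling.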
In the original model I created with PITCHf/x data, in addition to using the location coordinates as measured by the camera system and the pitch type as classified by the MLBAM algorithm, I controlled for the effect of the ball/strike count, the home plate umpire, the pitcher, and the batter—plus, obviously, the catcher.
Since that model requires a lot of computing time, in order to update my numbers once in a while, I switched to a simpler but quicker model in which the pitcher and the batter are not accounted for. In fact, once the location and pitch type are factored in, the batter has very little effect on the call by the umpire (mostly due to his stance and proximity to the plate, I suppose). The effect of the pitcher is also reduced, and I decided that the tradeoff between accuracy and computing time was worth the exclusion. However, with Retrosheet data, we have no information on pitch location and type, so throwing the pitcher and the batter back into the model was necessary.
In short, for every plate appearance I have the percentage of strikes on pitches not swung at as the outcome variable and the four actors involved (pitcher, catcher, umpire, batter) as the predictors. As I have done many other times in my baseball analysis, I have used a cross-classified multilevel mixed model, which for saber-oriented people I’ll call WOWY-on-steroids.
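For readers who like to see it written down, a model of this kind can be sketched as below. This is only one plausible form: the notation is mine, and a linear specification with normally distributed random intercepts is an assumption, since nothing here pins down the link function or any extra covariates.

```latex
y_i = \beta_0 + u^{(P)}_{p(i)} + u^{(C)}_{c(i)} + u^{(U)}_{m(i)} + u^{(B)}_{b(i)} + \varepsilon_i,
\qquad
u^{(k)}_{j} \sim \mathcal{N}\!\left(0, \sigma_k^2\right), \quad
\varepsilon_i \sim \mathcal{N}\!\left(0, \sigma^2\right)
```

Here $y_i$ is the share of called strikes among the pitches taken in plate appearance $i$, and $p(i)$, $c(i)$, $m(i)$, and $b(i)$ identify the pitcher, catcher, umpire, and batter involved. "Cross-classified" means the four groupings are not nested within each other: any catcher can be paired with any pitcher, umpire, or batter. Under this sketch, the estimated catcher intercept $u^{(C)}$ is the quantity that would later be converted into runs.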
Note that when using PITCHf/x data, an extra strike is more or less attributable to something framing-related, be it a good reception by the catcher, the pitcher hitting the target, or the umpire being deceived (or, more likely, a combination of the three). However, when no information is available about location, several other factors come into play: among the called strikes are, for example, pitches thrown right down Broadway that may not have been swung at because of the batter’s tendencies (partly accounted for, as the batter is in the model) or because great sequencing has fooled the batter. Thus, this version of framing might include at least some pitch-sequencing effect as well.
Comparing Retrosheet and PITCHf/x numbers
Obviously, the first thing to do before calculating and showing numbers going back to 1988 is to test how the rankings based on Retrosheet-only data compare with the PITCHf/x version for the years that have the more detailed data.
Let’s start by showing a scatterplot featuring framing runs saved (prorated to 5,000 pitches caught*) by catchers in the seasons from 2008 to 2012. The darker dots denote a higher number of pitches caught, signifying more reliable estimates.
* Keep in mind that from here on, when I write “pitches caught” I really mean “pitches caught with no swing attempt by the batter.”
Not a bad start. The chart displays a good agreement between the two different models; the Pearson correlation coefficient, weighted for the number of pitches caught, is a healthy 0.72.
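For anyone wanting to reproduce that kind of figure, a weighted Pearson correlation can be computed as in the sketch below, where each catcher-season would be weighted by its number of pitches caught. The function name and argument layout are illustrative assumptions, not the code behind the 0.72.

```python
import math
from typing import Sequence


def weighted_pearson(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Pearson correlation of x and y with non-negative observation weights w."""
    if not (len(x) == len(y) == len(w)) or len(x) == 0:
        raise ValueError("x, y and w must be non-empty and of equal length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    # Weighted means
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total

    # Weighted covariance and variances
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")

    return cov / math.sqrt(var_x * var_y)
```

With equal weights this reduces to the ordinary Pearson coefficient; with pitch counts as weights, small-sample catcher-seasons pull the estimate around less.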
One important difference between the two methods is the distribution of ratings. The PITCHf/x-based numbers are more dispersed: when one considers catcher-seasons with at least 1,500 pitches caught, the standard deviation is close to 13 runs for the PITCHf/x numbers and about 7.5 for the Retrosheet ones. That means the Retrosheet-based values (I’ll call them “RetroFraming”) will yield more conservative results.
Given the good agreement of RetroFraming with the PITCHf/x-based numbers, we can move on to showing some numbers going back to 1988, keeping in mind that we’re less likely to see extreme values with this metric.
Single-season achievements
The best catcherframing season of the last quarter century belongs to Brad Ausmus, with 36 runs saved for the 2000 Detroit Tigers.
Here a note is due. In the previous section, I warned that RetroFraming numbers give more conservative results: in fact, there is no trace of a 50-run season. A recent revision of my algorithm has changed Jose Molina’s PITCHf/x framing value for 2012 to 41 runs, but that would still make it higher than Ausmus’ 2000. RetroFraming has Molina’s 2012 at 25 runs saved, which is quite a difference.
I know such discrepancies can be enough for some people to turn away altogether from this article and others on framing, as they often do when two play-by-play-based fielding metrics disagree on an evaluation of any position player. However, what I make of these numbers is this:
 There are two metrics that strongly agree: no catcher over the past five years is rated above average by one and below average by the other.
 According to either method, a good framing catcher can be expected to bring his team a handful of extra wins in a single season.
 The PITCHf/x-based method is more precise and less likely to be pulling in other aspects of a catcher’s defensive performance, so for seasons where both methods are available, I would tend to trust its output over the Retrosheet estimate. If you’re skeptical that the big numbers associated with the PITCHf/x approach could be accurate, Mitchel Lichtman’s testing from last year might lay some of your concerns to rest.
 Teams with analytically minded front offices are already making seven-figure decisions based on numbers like these.
Enough talk—here are the 20 best RetroFraming seasons since 1988.
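| Catcher | Season | Pitches | Runs saved |
|---|---|---|---|
| Brad Ausmus | 2000 | 10863 | 36 |
| [name missing] | 2008 | 10861 | 30 |
| [name missing] | 2007 | 9404 | 26 |
| Jose Molina | 2012 | 6347 | 25 |
| [name missing] | 2010 | 7778 | 23 |
| Paul LoDuca | 2003 | 9057 | 23 |
| [name missing] | 2000 | 9615 | 23 |
| Brad Ausmus | 2005 | 8576 | 22 |
| Jose Molina | 2008 | 6665 | 22 |
| [name missing] | 2004 | 9127 | 21 |
| Brad Ausmus | 2006 | 9282 | 20 |
| [name missing] | 2011 | 9637 | 20 |
| Jason Varitek | 2002 | 9202 | 19 |
| [name missing] | 2002 | 9298 | 19 |
| [name missing] | 1991 | 9954 | 18 |
| Joe Mauer | 2005 | 7461 | 18 |
| [name missing] | 1989 | 8353 | 18 |
| Ramon Hernandez | 2001 | 9756 | 18 |
| Russell Martin | 2010 | 7138 | 18 |
| [name missing] | 1990 | 7733 | 18 |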
At age 40 Carlton Fisk was still capable of a top-20 season. In a subsequent section, I’ll take a look at aging curves for the framing skill.
In case you’re wondering, the worst season belongs to framing whipping boy Ryan Doumit (2008) by a mile, with Jason Kendall (2000) and Jorge Posada (2005) just a bit better.
Career framers
Ausmus also gets the career laurel as the cumulative king of framing for the past quarter century. In an 18-year career behind the plate, he added roughly one win per season through his ability to earn extra strike calls. Once more, the purported divide between scouting and statistical analysis is revealed to be a false one: long before numbers-based discussions of framing began, teams were willing to give playing time to weak-hitting catchers like Ausmus because of their defensive ability.
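| Catcher | Pitches | Run value |
|---|---|---|
| Brad Ausmus | 135045 | 179 |
| Jose Molina | 49116 | 122 |
| Jason Varitek | 107444 | 111 |
| Joe Mauer | 58510 | 102 |
| Russell Martin | 67441 | 99 |
| Javier Lopez | 94920 | 89 |
| [name missing] | 76486 | 87 |
| Tony Pena | 68627 | 83 |
| [name missing] | 113843 | 78 |
| [name missing] | 47143 | 73 |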
Jose Molina is a solid second, despite much more limited playing time. In fact, over the same amount of playing time, we’d estimate Molina to be close to twice as valuable as Ausmus. Below is the Top 10 list for prorated (to 5,000 pitches caught) values, minimum 25,000 pitches.
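| Catcher | Pitches | Run value / 5000 pitches |
|---|---|---|
| [name missing] | 25686 | 13 |
| Jose Molina | 49116 | 12 |
| [name missing] | 26642 | 10 |
| Joe Mauer | 58510 | 9 |
| Johnny Estrada | 41128 | 8 |
| Charlie O'Brien | 47143 | 8 |
| [name missing] | 33520 | 7 |
| [name missing] | 26178 | 7 |
| Russell Martin | 67441 | 7 |
| Mike Scioscia | 37835 | 7 |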
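The prorating itself is simple arithmetic, and the "close to twice as valuable" comparison between Molina and Ausmus can be checked from the career totals listed earlier. The sketch below uses only those published totals; the function name is made up for the illustration.

```python
def runs_per_5000(runs: float, pitches: int, base: int = 5000) -> float:
    """Prorate a run value to a fixed number of pitches caught (no swing)."""
    if pitches <= 0:
        raise ValueError("pitches must be positive")
    return runs / pitches * base


# Career totals as listed in the career leaderboard.
molina_rate = runs_per_5000(122, 49116)    # roughly 12.4 runs per 5,000 pitches
ausmus_rate = runs_per_5000(179, 135045)   # roughly 6.6 runs per 5,000 pitches

# How much more valuable Molina is per pitch caught: a bit under 1.9 times.
molina_to_ausmus = molina_rate / ausmus_rate
```

Because the leaderboard values are rounded to whole runs, rates recomputed this way can differ slightly from the published prorated figures.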
At the bottom of the list, depending on whether you prefer the counting stat or the prorated version, are either Charles Johnson (costing more than a win per year for 12 seasons) or, once more, Ryan Doumit.
Year-to-year correlation
So what do we do with 25 seasons of ratings? The first thing I thought of is running a year-to-year correlation. I did the usual matching of every catcher with his previous-year self and produced the following plot, which shows the year-to-year correlation for runs saved per 5,000 pitches caught. Again, the shading of dots indicates the underlying number of pitches (minimum between the two seasons considered). The weighted Pearson correlation coefficient is 0.52.
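The matching step can be sketched as follows. The data layout (a dictionary keyed by catcher and year) and the sample values are invented purely for illustration; only the pairing rule and the use of the smaller pitch count as the weight come from the description above.

```python
from typing import Dict, List, Tuple

# (catcher, year) -> (runs saved per 5,000 pitches, pitches caught)
SeasonTable = Dict[Tuple[str, int], Tuple[float, int]]


def pair_consecutive_seasons(seasons: SeasonTable) -> List[Tuple[float, float, int]]:
    """Match each catcher-season with the same catcher's previous season.

    Returns (previous_rating, current_rating, weight) tuples, where the weight
    is the smaller of the two seasons' pitch counts.
    """
    pairs = []
    for (catcher, year), (rating, pitches) in sorted(seasons.items()):
        previous = seasons.get((catcher, year - 1))
        if previous is None:
            continue  # no season the year before, so nothing to pair with
        prev_rating, prev_pitches = previous
        pairs.append((prev_rating, rating, min(prev_pitches, pitches)))
    return pairs


# Made-up ratings for two hypothetical catchers, "A" and "B".
toy_seasons: SeasonTable = {
    ("A", 2001): (4.0, 6000),
    ("A", 2002): (6.0, 8000),
    ("A", 2004): (5.0, 7000),   # 2003 is missing, so 2004 forms no pair
    ("B", 2001): (-3.0, 9000),
    ("B", 2002): (-1.0, 2500),
}
toy_pairs = pair_consecutive_seasons(toy_seasons)
```

The resulting triples are what a weighted correlation routine would take as input: previous-year rating, current-year rating, and the weight.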
A look at aging
The second analysis it made sense to perform with 25 available seasons is an exploration of aging. I looked at the subject through a few different statistical lenses, but the results were fairly consistent. Basically, the aging effect is very small, with no more than two runs separating the prime from the career nadir. Below is a chart showing an estimated career curve, featuring a slight improvement until age 25, followed by a gentle decline.
Below are charts for a few interesting careers. In each one of them, the dots indicate the seasonal ratings, the thinner line is a smooth curve through the data points based on the displayed catcher’s data only, and the thicker line makes use of data coming from the other catchers as well (sort of regressing the curve).
Here’s Jose Molina, who just keeps getting better:
Ausmus also improved throughout his career:
Posada, on the other hand, displayed a declining trend:
Finally, Piazza’s numbers were consistent throughout his career:
What’s next?
So far I’ve been reluctant to combine game-calling numbers with PITCHf/x-based framing ratings because they’re derived from different sources, with different levels of granularity. But with the framing approach presented here, I now feel more comfortable subtracting framing from what I termed game-calling, which actually was more of a sum of framing plus calling. Thus, in the future I plan to explore the quantification of game-calling further.
In this article I’ve used pitch-by-pitch data without PITCHf/x information to generate historical leaderboards. However, this kind of data is also available for Minor League Baseball going back a handful of years, so numbers like those shown above can be calculated for lower levels of baseball as well. In that way, good framing catchers might be identified before they reach The Show. And while it might be a long time before we see ubiquitous pitch-tracking technology in the college game, recording pitch outcomes is much more feasible, meaning that teams might even use this information for drafting purposes.
Incidentally, while refining this article, I mentioned its contents to a baseball insider (who obviously will go unnamed here), and he stated, “It's an idea potentially worth millions of dollars.” So, clubs with college pitch-by-pitch data: feel free to knock at my door.