It’s been an unexpectedly good offseason to be a veteran catcher in Major League Baseball. The Washington Nationals gave 38-year-old Ivan Rodriguez a two-year contract worth $6 million. True, Pudge did win the AL MVP in 1999, but a decade later, he is no longer much of a threat offensively, and his once-legendary throwing arm behind the plate has lost some of its thunder. In 2009, 65 percent of would-be basestealers took their base against Pudge, still a good rate, but up from a mere 45 percent early in his career.

A few days later, the Kansas City Royals announced the signing of 35-year-old Jason Kendall to a similar contract. Kendall also had some offensive success as late as 2004, a year in which he posted a .390 OBP. Of late, though, he hasn’t exactly been doing a good Johnny Bench impersonation. Bob Boone, perhaps?

Two guaranteed two-year, multi-million-dollar contracts for two catchers who have lost their offensive groove? What gives? After the Rodriguez deal, BP’s Kevin Goldstein suggested that the Nationals might want Rodriguez not so much as a hitter, but as a teacher/mentor to their young pitchers (read: Strasburg, Stephen). One might imagine that the Royals were thinking the same thing about Jason Kendall. The idea is that Rodriguez’s own production might not justify his salary, but if he helps Strasburg and some of the other Nationals pitchers to pitch better, then that will justify the money. At the end of his Unfiltered post, Kevin mentioned that this indirect mentorship effect has never truly been studied.

I considered myself challenged.

No Pitcher Left Behind

How do we know if a teacher is any good? Movie clichés about "giving his all" aside, how do we tell the truly outstanding from the duds? It’s tempting to say, "Look at what their students do," but that can be deceptive. Suppose that I was given a room full of super-geniuses to teach, despite the fact that I have no teaching ability. The kids all ace whatever the standardized test du jour is in spite of me. We could look at their rate of improvement from year to year, but suppose that I were in a district where all the kids improved a great deal, not just the ones in my class. At that point, it’s probably something inherent in the district, rather than my teaching.

Thankfully, there’s a statistical tool that can get around many of these issues. It’s called hierarchical linear modeling (HLM), and it’s how educational policy experts look into whether your kid’s teacher is doing a good job. The gory mathematical details are a little complicated, but here’s the idea: your kid is in a classroom. That classroom is in a school. That school is in a district. On a test, if we see that all of the kids in one classroom did well, but the kids in the other classrooms did poorly, we might logically assume that the teacher in that classroom had something to do with it (or got unbelievably lucky). HLM has a way to parcel out not only how much of the variance can be explained by each level, but also estimate exactly how much of an effect having that teacher/being in that district had. Well, why not substitute "catcher" for "teacher" and see what happens?
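To make the "how much of the variance sits at each level" idea concrete, here is a minimal sketch using invented test scores for three classrooms. It uses the one-way ANOVA (method-of-moments) estimator for equally sized groups, which is a much simpler stand-in for a fitted hierarchical model; the data, names, and function are all assumptions made for illustration.

```python
from statistics import mean

# Toy test scores, entirely made up for illustration: three classrooms
# (one per teacher) with four students each.
scores = {
    "teacher_a": [82, 85, 88, 85],
    "teacher_b": [70, 73, 67, 70],
    "teacher_c": [76, 79, 73, 76],
}


def variance_shares(groups):
    """Return (between-group share, within-group share) of total variance.

    Uses the one-way ANOVA estimator for balanced groups. This is a
    simplification, not a full hierarchical linear model.
    """
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equally sized groups")
    n = sizes.pop()          # students per classroom
    k = len(groups)          # number of classrooms
    grand = mean(x for values in groups.values() for x in values)
    group_means = {g: mean(values) for g, values in groups.items()}

    # Mean square between classrooms and mean square within classrooms.
    msb = n * sum((m - grand) ** 2 for m in group_means.values()) / (k - 1)
    msw = sum(
        (x - group_means[g]) ** 2 for g, values in groups.items() for x in values
    ) / (k * (n - 1))

    between = max((msb - msw) / n, 0.0)  # classroom-level variance component
    total = between + msw
    return between / total, msw / total


teacher_share, student_share = variance_shares(scores)
print(f"classroom level: {teacher_share:.1%}, student level: {student_share:.1%}")
```

In this toy data the classrooms differ far more than the students within them, so most of the variance lands at the classroom level. Swap "classroom" for "catcher/mentor" and "student" for "young pitcher" and the same question carries over.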

I located all teams from 1989-2008 who had a catcher on their roster who was age 32 or older (on Opening Day) and who caught at least 360 innings (40 games) during that season. Just being on the roster was sufficient, as the idea tends to be that the catcher will counsel the baby pitchers in between innings or games, whether or not he actually plays. If a team had more than one catcher/mentor, I took the elder of the two. Next, I looked for all pitchers on those teams who were age 27 and under (again, as of Opening Day), faced a minimum of 250 batters within the season, and did not switch teams during the season.
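The selection rules above can be sketched as a small filter. The record types, field names, and placeholder players below are hypothetical; only the thresholds (age 32 or older and 360 innings for catchers; age 27 and under, 250 batters faced, and one team for pitchers) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Catcher:
    name: str
    team: str
    season: int
    age_opening_day: int
    innings_caught: float


@dataclass
class Pitcher:
    name: str
    team: str
    season: int
    age_opening_day: int
    batters_faced: int
    teams_in_season: int


def pick_mentor(catchers, team, season):
    """Return the oldest qualifying catcher for a team-season, or None."""
    eligible = [
        c for c in catchers
        if c.team == team and c.season == season
        and c.age_opening_day >= 32 and c.innings_caught >= 360
    ]
    return max(eligible, key=lambda c: c.age_opening_day, default=None)


def qualifies(p):
    """Young pitcher filter: 27 or under, 250+ batters faced, one team."""
    return p.age_opening_day <= 27 and p.batters_faced >= 250 and p.teams_in_season == 1


# Placeholder records, not real players or teams.
catchers = [
    Catcher("Catcher A", "AAA", 2005, 34, 900.0),
    Catcher("Catcher B", "AAA", 2005, 36, 400.0),
    Catcher("Catcher C", "BBB", 2005, 29, 1100.0),  # too young to count as a mentor
]
pitchers = [
    Pitcher("Pitcher 1", "AAA", 2005, 24, 600, 1),
    Pitcher("Pitcher 2", "AAA", 2005, 30, 700, 1),  # too old
    Pitcher("Pitcher 3", "BBB", 2005, 25, 300, 1),  # qualifies, but no mentor on roster
    Pitcher("Pitcher 4", "BBB", 2005, 23, 120, 1),  # too few batters faced
]

sample = []
for p in pitchers:
    if qualifies(p):
        mentor = pick_mentor(catchers, p.team, p.season)
        sample.append((p.name, mentor.name if mentor else "none"))
print(sample)
```

Pitchers with no qualifying catcher are kept and labeled "none" here, so that they can serve as a comparison group rather than being dropped.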

Warning: the following technobabble is given first in Slavonic, then in English. Viewer Discretion is advised.

I then ran a hierarchical model that pulled apart the contribution of the catcher/mentor just being there. The gory details, for the interested: I used the pitcher as the subject, age during the season as the index/repeated variable, and an auto-regressive first order – AR(1) – covariance matrix. I used walk rate as my first dependent variable and set the intercept to vary randomly. I set the identity of the catcher/mentor as a fixed effect and asked for parameter estimates for each qualifying catcher. A catcher had to appear as a mentor in ten player-seasons to qualify. I ran separate models for walk rate (BB/BFP) and strikeout rate (K/BFP) as the dependent variables.
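For readers who prefer symbols, one way to write the model described above is sketched below. This is a reading of the prose, not the exact specification that was run, and all of the notation ($y_{it}$, $u_i$, $\gamma_c$, $\rho$, and so on) is introduced here for illustration.

```latex
% y_{it}: walk rate (or strikeout rate) of pitcher i in his age-t season
% u_i: random intercept for pitcher i
% \gamma_c: fixed effect for catcher/mentor c
% AR(1) errors: seasons closer together in age are more strongly correlated
y_{it} = \beta_0 + u_i + \sum_{c} \gamma_c \, \mathbf{1}[\mathrm{mentor}_{it} = c] + \varepsilon_{it},
\qquad u_i \sim N(0, \tau^2),
\qquad \operatorname{Cov}(\varepsilon_{it}, \varepsilon_{is}) = \sigma^2 \rho^{|t - s|}
```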

Translation: I wanted to find out what effect the catcher/mentor had, if any, on the strikeout and walk rates of the young pitchers on his team. Since we have a longitudinal model (a pitcher can be in it for his age-24, -25, and -26 seasons), we can see how pitchers change over time. If a pitcher comes into MLB already striking out 20 percent of the batters that he faces, the model will know that because of the AR(1) covariance matrix. This is the tweak that allows us to correct for a pitcher’s established level and not to over-credit the catcher. The fixed effect for a catcher sounds a lot more complicated than it is. If you’ve run a simple linear regression, you’ve run a fixed-effect regression. I want to know what the coefficients are for "catcher = Jason Kendall" or "catcher = Ivan Rodriguez." If the catcher is having an effect across all (or most) of the pitchers with whom he interacts, then it will show up here.
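As a rough illustration of what an AR(1) covariance structure looks like for one pitcher's age-24, -25, and -26 seasons, here is a minimal sketch. The variance and correlation values are placeholders chosen for readability, not estimates from the model.

```python
def ar1_covariance(ages, sigma2, rho):
    """Covariance matrix with entries sigma2 * rho ** |age_i - age_j|."""
    return [[sigma2 * rho ** abs(a - b) for b in ages] for a in ages]


ages = [24, 25, 26]
# sigma2 and rho are illustrative assumptions, not fitted values.
cov = ar1_covariance(ages, sigma2=1.0, rho=0.5)
for row in cov:
    print(row)
# [1.0, 0.5, 0.25]
# [0.5, 1.0, 0.5]
# [0.25, 0.5, 1.0]
```

Adjacent seasons share the most information and seasons two years apart share less, which is how a pitcher's established level gets carried from one season to the next.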

The output that this particular configuration gives can be read like this. Let’s say that I took an average pitcher from this group (broke into the majors before 28, faced 250 batters during the season). If all I knew was the identity of his catcher/mentor, what would I predict the pitcher’s walk and strikeout rate to be? The results, from 1989-2008:

Best for strikeouts       Worst for strikeouts 
Jason Varitek  21.57%     Jeff Reed      13.60%
Joe Girardi    19.52%     Chad Kreuter   14.71%
Bengie Molina  19.05%     Carlton Fisk   14.68%
Henry Blanco   18.73%     Gary Bennett   14.88%
Jason Kendall  18.71%     Lance Parrish  15.21%

Best for walks            Worst for walks 
Mike Redmond    7.67%     Javy Lopez     10.64%
Gary Bennett    8.13%     Bengie Molina  10.06%
Gregg Zaun      8.15%     Jeff Reed       9.93%
Jason LaRue     8.22%     Chad Kreuter    9.91%
Jason Kendall   8.30%     Henry Blanco    9.90%

I also included a category for a pitcher who had no catcher/mentor as a baseline. That is, a pitcher whose team didn’t employ a catcher age 32 or older. That group had anticipated rates of 9.21 percent for walks and 16.27 percent for strikeouts. Jason Kendall appears on both lists in the best-of column, which suggests that, over time, his presence has made the pitchers with whom he has worked better. Luke Hochevar and Kyle Davies will be happy. (Kendall will also catch Zack Greinke, who technically fits the "under 27" category, but I think Greinke’s doing OK for himself.) Pudge Rodriguez, on the other hand, checked in with a walk rate of 9.29 percent and a strikeout rate of 17.05 percent. The strikeout rate was good enough for 13th place (of 38 catchers and the "blank" category). The walk rate was below average, and slightly worse than the 9.21 percent no-mentor baseline. While he may carry a good reputation as a catcher/mentor, the numbers just don’t bear it out.

The free-agent catcher no one has mentioned makes an interesting guest appearance: Mike Redmond. He appears to be very good at teaching pitchers how to not walk batters. He’s never been a gifted hitter, but when the other catcher on your team is named Joe Mauer, it’s not that big a deal. Redmond turns out to be in the middle of the pack in boosting strikeouts, but he might make a good pick for a team that’s worried about its young pitchers issuing too many free passes.


A few words on taking those numbers and running with them: first, consider that we are dealing with a very small sample size. Putting faith in 12 pitcher-seasons’ worth of data is always a little dubious. Standard warnings about small sample sizes apply, both in terms of the confidence to place in those numbers and the scale of the numbers. According to these numbers, Jason Kendall’s presence is worth a drop of about one percentage point in walk rate for the young pitchers with whom he works, and a jump of about 2.5 percentage points in strikeout rate. Intuitively, that seems like a lot to ask. Indeed, highly structured models can get a little wacky when faced with small sample sizes.
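The size of those gaps is easy to check against the no-mentor baseline reported earlier (9.21 percent walks, 16.27 percent strikeouts). The sketch below uses only figures quoted in the article; the function name and the choice of 250 batters faced as a yardstick (the study's minimum) are illustrative assumptions.

```python
# Predicted rates from the article, as a percent of batters faced.
BASELINE = {"bb": 9.21, "k": 16.27}   # pitchers with no catcher/mentor
KENDALL = {"bb": 8.30, "k": 18.71}
RODRIGUEZ = {"bb": 9.29, "k": 17.05}


def versus_baseline(rates, batters_faced=250):
    """Compare a catcher's predicted rates with the no-mentor baseline."""
    out = {}
    for stat, rate in rates.items():
        points = rate - BASELINE[stat]                 # percentage points
        out[stat] = {
            "points": points,
            "relative_pct": 100 * points / BASELINE[stat],
            "events": points / 100 * batters_faced,    # per batters_faced
        }
    return out


for name, rates in (("Kendall", KENDALL), ("Rodriguez", RODRIGUEZ)):
    for stat, d in versus_baseline(rates).items():
        print(f"{name} {stat}: {d['points']:+.2f} points, "
              f"{d['relative_pct']:+.1f}% relative, "
              f"{d['events']:+.1f} per 250 batters faced")
```

For Kendall this works out to about 0.91 percentage points fewer walks (roughly a 10 percent relative drop) and 2.44 points more strikeouts (roughly 15 percent relative), or on the order of two fewer walks and six more strikeouts for every 250 batters faced.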

Second, there’s a giant confounding variable. One reason that Mike Redmond might be such a powerful force for reducing walks is that "Mike Redmond after age 32 working with young pitchers" is a decent proxy for "the young kids that the Twins have brought up over the past few years." It’s hard to disentangle those variables. The model assumes that young pitchers are assigned randomly to the various catchers, which is absurd. Some teams do a better job stocking up on young pitching talent. Others draft and promote based on a statistical profile. The Twins in particular brought up Nick Blackburn, Scott Baker, and Matt Garza (before the trade), all of whom have relatively low walk rates. Is that Redmond’s influence, or is the model over-crediting him for the decisions of Twins management? It’s hard to tell.

Finally, I ran another HLM model, asking this time for the computer to tell me what percentage of the variance in the model was accounted for by the identity of the catcher/mentor. (For the initiated, I switched it over to a random effect.) The answer turned out to be a little bit less than one percent. However, the model suggested that about 40 percent of the variance was due to the pitchers’ own abilities. Presumably, the rest can be explained by extra-model variables and random error. So, even if Jason Kendall is as good as advertised by the model, his contributions are only a small part of the equation in helping young pitchers to develop.

Russell A. Carleton, the writer formerly known as 'Pizza Cutter,' is a contributor to Baseball Prospectus.

While it might make for a cute little section of writing, I have to call the un-sourced, over-simplified two-paragraph section "No Pitcher Left Behind" irresponsible writing. The reason, as you say at the end, you can't separate Redmond's influence on the Twins' pitchers from those pitchers themselves is the same reason why HLM is a dead end for educational policy "experts." Class-by-class achievement is inherently small sample size, and, when measuring by poorly designed standardized testing produced by companies working for profit and looking at the performance of students that are themselves the product of a system that influences them by over-many variables (not simply the teacher of their subject in that year), the information received is inconclusive and, often, misleading.
A cute little tool, but it seems if you'd run the second scenario (percentage of variance accounted for by mentor identity) then it makes the first chart pretty irrelevant.

Nonetheless, I'm kind of interested in seeing a best/worst HR rate on the theory that a catcher/mentor with a high home run rate is calling for the wrong pitch more often.
This was going to be my point exactly. I know that if I ran a multiple linear regression model and found the r-squared (adjusted or otherwise) to be a mere 1%, I'd be looking elsewhere before I cared much about the significant factors in the model. But I can't say I'm an expert statistician.

Russell: I think this is important to address. It is sort of like putting a lot of work out there, and then saying "oh, but the data is deeply flawed so you can't trust any of it".
I think there's some value in that process though. One of the things that I think has happened in Sabermetrics is that we've valued construct validity (it makes sense in our heads) over environmental validity (it actually makes a difference out there in the real world.)
Well, I'd get slaughtered in my business if I weren't up front about the conclusion. Hey, even a "no findings" conclusion is worth telling a story and thought process, so that others can think about how else to look at the issue.
This does seem like measuring the effects of butterfly drafts on hurricanes. When a team makes an inexplicable signing, the simplest explanation is often the best.

Importing talent validated by others as MLB caliber reduces the perception that people are actually watching a bunch of homegrown players that aren't. The trick is to get them while they still have recognizable names, but are actually willing to play for you.

Catchers tend to stick around. Don't have to pull them out of retirement.
IIRC, the idea that Rodriguez was signed simply to be a draw at the gate was floated as well. Seems likely that it's about the only benefit that the Nats will derive from him.
I don't really buy the fan argument. Rodriguez didn't sell out Rangers/Tigers games based on some yearning desire to see him, even when he was elite.

If it's a Latino factor, stealing from Wikipedia, as of 2006, 51.7% of the Washington DC metro area is white, 26.3% is black and 11.6% is Hispanic. I don't know if more of that 11.6% will show up in droves to see Rodriguez ground weakly to second base. Even if they did, are there enough of those Latino fans coming to the park to offset the price of his contract?

That might be an interesting bit of math... how many more fans need to come to the park for a team to justify an extra $1 million in salary? How many Japanese fans in the LAX area, who were not going to baseball games before, will now attend Angels games because of Hideki Matsui?
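A back-of-the-envelope version of that math is sketched below. The net revenue per additional fan is a pure assumption (no figure is given anywhere in this discussion), so the output only shows how the arithmetic works; 81 is the number of home games in a standard MLB season.

```python
import math


def breakeven_fans(extra_salary, net_revenue_per_fan, home_games=81):
    """Extra fans needed (season total, per home game) to cover a salary."""
    total = math.ceil(extra_salary / net_revenue_per_fan)
    per_game = math.ceil(total / home_games)
    return total, per_game


# ASSUMPTION: $40 of net revenue per additional fan (tickets, concessions).
# This is a placeholder, not a sourced figure.
total, per_game = breakeven_fans(1_000_000, net_revenue_per_fan=40.0)
print(total, per_game)  # 25000 309
```

Under that assumption, each extra $1 million in salary needs about 25,000 additional fans over a season, or a little over 300 per home game. A different revenue assumption scales the answer proportionally.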
Or how about, How many fans buy a ticket just to see a particular player?

Assumption no. 2: Demographically, I'd bet the older the fan, the more likely they pay to see the game, versus a particular player.
I go to 5-10 Nationals games a year, and I find it hard to believe that any of the few dozen people in the stands are there because a particular National is playing.

(Caveat: not including Stephen Strasburg when he first comes up)
This is another way to test whether there's a "Catcher's ERA" -- a hypothesis that has not found support in previous research by Woolner, among others. Brief summary here in Wikipedia:
I love technobabble in Slavonic! Hospodin pomiluj...

I have to agree, though, with the people noting that explaining 1% of observed variation as a random effect pretty much makes any conclusions drawn the science equivalent of tea leaves and goat entrails.
Christos Razdajetsja! (a few days early.)
As in any of these sorts of analyses, the key question is whether your model is correctly specified: if you have correctly included all of the variables that might lead the 2004 Twins pitchers to be different from the 2007 Rangers pitchers, then the remaining difference might be attributable to the catcher. This is especially important, though, when one is running a fixed-effects model since the fixed-effects tend to soak up quite easily the effects of omitted variables. The fact that the random-effects model suggested catcher had little importance makes me think it is omitted variable bias that leads the fixed-effects model to suggest catchers do matter. (There are tests for random vs. fixed effects, like the Hausman test. Have you tried these?)

Given how unlikely it is to correctly specify this type of model, though, I'd think a matching strategy might be better where you match similar pitchers (based on minor-league track record to get rid of endogeneity) and see the extent to which catcher identity affects development.
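A bare-bones version of that matching idea might look like the sketch below: pair each mentored pitcher with the unmentored pitcher whose minor-league walk and strikeout rates are closest, then compare MLB walk rates. Every record, field name, and number here is invented for illustration, and a real analysis would standardize the covariates or use propensity scores rather than raw Euclidean distance.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class YoungPitcher:
    label: str
    minors_bb: float  # minor-league walk rate, percent
    minors_k: float   # minor-league strikeout rate, percent
    mlb_bb: float     # MLB walk rate, percent


def matched_difference(treated, controls):
    """Average MLB walk-rate gap between each mentored pitcher and his
    nearest unmentored neighbor (nearest-neighbor, with replacement)."""
    gaps = []
    for t in treated:
        nearest = min(
            controls,
            key=lambda c: dist((t.minors_bb, t.minors_k), (c.minors_bb, c.minors_k)),
        )
        gaps.append(t.mlb_bb - nearest.mlb_bb)
    return sum(gaps) / len(gaps)


# Hypothetical pitchers; none of these numbers come from real data.
mentored = [
    YoungPitcher("M1", 7.0, 20.0, 8.0),
    YoungPitcher("M2", 10.0, 24.0, 10.0),
]
unmentored = [
    YoungPitcher("U1", 7.5, 19.0, 8.5),
    YoungPitcher("U2", 10.5, 25.0, 10.5),
    YoungPitcher("U3", 9.0, 15.0, 9.5),
]

effect = matched_difference(mentored, unmentored)
print(effect)  # -0.5
```

A negative value would mean the mentored pitchers walked fewer batters than their closest look-alikes. Matching on minor-league numbers is what keeps the comparison from simply rewarding organizations that draft low-walk pitchers.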
And right there, you nailed down the struggle that I have when I do this sort of work. I did have the thought of trying to use some sort of MLE at the time of being brought up as a control on the model, but the truth is I'm awful at MLE's.

I did some earlier work of this type with managers and tried specifying for pitcher age, home ballpark, and year-league, as well as using an AR(1) covariance matrix, but even then, the model was acting a little screwy. I'm not familiar with the test you suggested (I have to admit, it's been a while since I was fully immersed in HLM) but I'll check it out. Thanks for the tip.

The biggest problem, though, is that the omitted variables that "catcher mentor" picks up on, as I mentioned in connection with M. Redmond, are organizational variables. What sort of guys do the Twins draft? Whom do they promote? Whom do they keep around? I suppose that an MLE approach might answer pieces of those questions. But how to quantify the rest?

Maybe I should just stick to t-tests!
Since you started the slavonic technobabble, here goes:

A Hausman test isn't anything specific to HLMs, but rather is a more general specification test: essentially, it compares the coefficients in the fixed-effects model to the coefficients in the random-effects model (or any two models) and if it determines that they're different from each other, concludes that one set of estimates must be inconsistent. (In this context, a Hausman test assumes that fixed-effects estimates are consistent and tests random-effects against them. But if it finds that random-effects aren't consistent, there's a good chance this means that you should be controlling for additional variables in the model and that the fixed-effects are sucking up the explanatory value of these uncontrolled variables.) In many stats packages, like Stata, it's actually a simple post-estimation command.
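For a single coefficient, the comparison described above reduces to a short calculation: the squared gap between the two estimates, divided by the difference in their variances, compared against a chi-square distribution with one degree of freedom. The sketch below shows that special case; the input numbers are hypothetical, and a real test would use the full coefficient vectors and covariance matrices from the two fitted models.

```python
import math


def hausman_1d(b_fe, var_fe, b_re, var_re):
    """Hausman statistic and p-value for one coefficient.

    H = (b_fe - b_re)^2 / (var_fe - var_re), chi-square with 1 df.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("variance difference must be positive in this sketch")
    h = (b_fe - b_re) ** 2 / diff_var
    # With one degree of freedom, P(chi2 > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value


# Hypothetical estimates of one catcher's effect on walk rate.
h, p = hausman_1d(b_fe=-0.9, var_fe=0.25, b_re=-0.1, var_re=0.09)
print(f"H = {h:.2f}, p = {p:.3f}")
```

A small p-value says the two sets of estimates disagree by more than sampling noise would explain, which is the signal that the random-effects estimates are inconsistent and that omitted variables may be involved.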

Also, there's no reason why you can't include additional covariates in an HLM set-up. Most software packages that estimate HLMs allow for pretty flexible specification of which level each control variable should enter (and even allow for interacting them across levels).

But I think in the end, your suspicion is correct that it will be impossible to control for everything, which is why I think matching is the way to go. Matching techniques can get horribly complicated, though.
Is not being able to disentangle Redmond from the Twins organization intractable?

If he's only been with them, something like a dummy variable for team wouldn't work.

Also, if I understand correctly, the premise of mentoring is that the catcher wouldn't necessarily have to be in that many games (360 innings), so pitch calling and its results on walk, strike out, or home run rate (as suggested by Richard) is only one contributory activity, with dugout and clubhouse "presence" and teaching being a major one. Maybe split the sample more and look at the true "mentors" (maybe between 40 and 81 games equivalent) vs. starting catchers?

Also made me wonder if there's a better metric, maybe one that would need to be devised or derived, for measuring the mentoring effect.
Why do the catchers have to be (in baseball terms) elderly in order to be considered for this "mentor effect"?

What about the universe of catchers?
Hi Russell:

I understand what you are trying to do, but I simply don't see a correlation in the variables as postulated.

For the study to be relevant, you would have to find out from Nat pitchers:

1. What IRod taught them, i.e., additional pitches, or
2. Variance in pitching patterns with IRod behind the plate.

3. The impact on the staff of items #1 and #2 above. You might be able to compare pitch selection with IRod vs. the other Nat catchers, maybe using PITCHf/x data.