December 20, 2009
Will You Be My Mentor?
It’s been an unexpectedly good offseason to be a veteran catcher in Major League Baseball. The Washington Nationals gave 38-year-old Ivan Rodriguez a two-year contract worth $6 million. True, Pudge did win the AL MVP in 1999, but a decade later, he is no longer much of a threat offensively, and his once legendary throwing arm behind the plate has lost some of its thunder. In 2009, 65 percent of would-be basestealers succeeded against Pudge, still a good rate, but up from a mere 45 percent early in his career.
A few days later, the Kansas City Royals announced the signing of 35-year-old Jason Kendall to a similar contract. Kendall also had some offensive success as late as 2004, a year in which he posted a .390 OBP. Of late, though, he hasn’t exactly been doing a good Johnny Bench impersonation. Bob Boone, perhaps?
Two guaranteed two-year, multi-million-dollar contracts for two catchers who have lost their offensive groove? What gives? After the Rodriguez deal, BP’s Kevin Goldstein suggested that the Nationals might want Rodriguez not so much as a hitter, but as a teacher/mentor to their young pitchers (read: Strasburg, Stephen). One might imagine that the Royals were thinking the same thing about Jason Kendall. The idea is that Rodriguez’s own production might not justify his salary, but if he helps Strasburg and some of the other Nationals pitchers to pitch better, then that will justify the money. At the end of his Unfiltered post, Kevin mentioned that this indirect mentorship effect has never truly been studied.
I considered myself challenged.
No Pitcher Left Behind
How do we know if a teacher is any good? Movie clichés about "giving his all" aside, how do we tell the truly outstanding from the duds? It’s tempting to say, "Look at what their students do," but that can be deceptive. Suppose that I was given a room full of super-geniuses to teach, despite the fact that I have no teaching ability. The kids all ace whatever the standardized test du jour is in spite of me. We could look at their rate of improvement from year to year, but suppose that I were in a district where all the kids improved a great deal, not just the ones in my class. At that point, it’s probably something inherent in the district, rather than my teaching.
Thankfully, there’s a statistical tool that can get around many of these issues. It’s called hierarchical linear modeling (HLM), and it’s how educational policy experts look into whether your kid’s teacher is doing a good job. The gory mathematical details are a little complicated, but here’s the idea: your kid is in a classroom. That classroom is in a school. That school is in a district. On a test, if we see that all of the kids in one classroom did well, but the kids in the other classrooms did poorly, we might logically assume that the teacher in that classroom had something to do with it (or got unbelievably lucky). HLM has a way to parcel out not only how much of the variance can be explained by each level, but also estimate exactly how much of an effect having that teacher/being in that district had. Well, why not substitute "catcher" for "teacher" and see what happens?
I located all teams from 1989-2008 who had a catcher on their roster who was age 32 or older (on Opening Day) and who caught at least 360 innings (40 games) during that season. Just being on the roster was sufficient, as the idea tends to be that the catcher will counsel the baby pitchers in between innings or games, whether or not he actually plays. If a team had more than one catcher/mentor, I took the elder of the two. Next, I looked for all pitchers on those teams who were age 27 and under (again, as of Opening Day), faced a minimum of 250 batters within the season, and did not switch teams during the season.
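The selection rules above can be sketched in a few lines of Python. This is only an illustration of the filters as described, not the code behind the study: the record layouts, field names, and the players in the toy data are all made up, and the innings threshold is a parameter because the text is ambiguous about how strictly it applies.

```python
from dataclasses import dataclass


@dataclass
class Catcher:  # hypothetical record layout
    name: str
    team: str
    season: int
    age_opening_day: int
    innings_caught: float


@dataclass
class Pitcher:  # hypothetical record layout
    name: str
    team: str
    season: int
    age_opening_day: int
    batters_faced: int
    switched_teams: bool


def pick_mentor(catchers, team, season, min_age=32, min_innings=360):
    """Return the eldest qualifying catcher on a team-season, or None."""
    eligible = [
        c for c in catchers
        if c.team == team and c.season == season
        and c.age_opening_day >= min_age
        and c.innings_caught >= min_innings
    ]
    return max(eligible, key=lambda c: c.age_opening_day, default=None)


def qualifies(p, max_age=27, min_bfp=250):
    """Young pitcher, enough batters faced, stayed with one team all season."""
    return (p.age_opening_day <= max_age
            and p.batters_faced >= min_bfp
            and not p.switched_teams)


def build_sample(pitchers, catchers):
    """Pair each qualifying pitcher-season with its mentor (None = no mentor)."""
    rows = []
    for p in pitchers:
        if qualifies(p):
            mentor = pick_mentor(catchers, p.team, p.season)
            rows.append((p.name, p.season, mentor.name if mentor else None))
    return rows


# Made-up toy data, purely to exercise the filters.
catchers = [
    Catcher("Veteran A", "AAA", 2005, 36, 700.0),
    Catcher("Veteran B", "AAA", 2005, 33, 400.0),
    Catcher("Youngster C", "BBB", 2005, 28, 900.0),
]
pitchers = [
    Pitcher("Pitcher W", "AAA", 2005, 24, 600, False),
    Pitcher("Pitcher X", "BBB", 2005, 25, 300, False),
    Pitcher("Pitcher Y", "AAA", 2005, 30, 800, False),  # too old
    Pitcher("Pitcher Z", "AAA", 2005, 23, 120, False),  # too few batters faced
]
sample = build_sample(pitchers, catchers)
```

In the toy data, Pitcher W is paired with the elder of his team's two veteran catchers, while Pitcher X, whose team has no catcher of qualifying age, is kept with no mentor at all.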
Warning: the following technobabble is given first in Slavonic, then in English. Viewer Discretion is advised.
I then ran a hierarchical model that pulled apart the contribution of the catcher/mentor just being there. The gory details, for the interested: I used the pitcher as the subject, age during the season as the index/repeated variable, and an auto-regressive first order – AR(1) – covariance matrix. I used walk rate as my first dependent variable and set the intercept to vary randomly. I set the identity of the catcher/mentor as a fixed effect and asked for parameter estimates for each qualifying catcher. A catcher had to appear as a mentor in ten player-seasons to qualify. I ran separate models for walk rate (BB/BFP) and strikeout rate (K/BFP) as the dependent variables.
Translation: I wanted to find out what effect the catcher/mentor had, if any, on the strikeout and walk rates of the young pitchers on his team. If a pitcher comes into MLB already striking out 20 percent of the batters that he faces, the model will know that because of the AR(1) covariance matrix. Since we have a longitudinal model (a pitcher can be in it for his age-24, -25, and -26 seasons), we can see how pitchers change over time. This is the tweak that allows us to correct for a pitcher’s own established level and not to over-credit the catcher. The fixed effect for a catcher sounds a lot more complicated than it is. If you’ve run a simple linear regression, you’ve run a fixed-effect regression. I want to know what the coefficients are for "catcher = Jason Kendall" or "catcher = Ivan Rodriguez." If the catcher is having an effect across all (or most) of the pitchers with whom he interacts, then it will show up here.
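For readers who prefer symbols, one way to write down a model of this kind is shown here. It is a reading of the description above rather than the exact specification that was fit, and every symbol in it is notation introduced for this illustration.

```latex
y_{it} = \beta_0 + \sum_{c} \beta_c \, \mathbf{1}[\mathrm{mentor}_{it} = c] + u_i + \varepsilon_{it},
\qquad u_i \sim N(0, \tau^2),
\qquad \operatorname{Cov}(\varepsilon_{it}, \varepsilon_{is}) = \sigma^2 \rho^{|t - s|}
```

Here y_{it} is the walk rate (or strikeout rate) of pitcher i in his age-t season, u_i is that pitcher's random intercept, and the AR(1) term says that a pitcher's seasons close together in age resemble each other more than seasons far apart. The catcher coefficients \beta_c are the fixed effects, so under this reading the predicted rate for an average pitcher working with catcher c is \beta_0 + \beta_c.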
The output that this particular configuration gives can be read like this. Let’s say that I took an average pitcher from this group (broke into the majors before 28, faced 250 batters during the season). If all I knew was the identity of his catcher/mentor, what would I predict the pitcher’s walk and strikeout rate to be? The results, from 1989-2008:
Best for strikeouts          Worst for strikeouts
Jason Varitek    21.57%      Jeff Reed        13.60%
Joe Girardi      19.52%      Chad Kreuter     14.71%
Bengie Molina    19.05%      Carlton Fisk     14.68%
Henry Blanco     18.73%      Gary Bennett     14.88%
Jason Kendall    18.71%      Lance Parrish    15.21%

Best for walks               Worst for walks
Mike Redmond      7.67%      Javy Lopez       10.64%
Gary Bennett      8.13%      Bengie Molina    10.06%
Gregg Zaun        8.15%      Jeff Reed         9.93%
Jason LaRue       8.22%      Chad Kreuter      9.91%
Jason Kendall     8.30%      Henry Blanco      9.90%
I also included a category for a pitcher who had no catcher/mentor as a baseline. That is, a pitcher whose team didn’t employ a catcher age 32 or older. That group had anticipated rates of 9.21 percent for walks and 16.27 percent for strikeouts. Jason Kendall appears on both lists in the best-of column, which suggests that, over time, his presence has made the pitchers with whom he has worked better. Luke Hochevar and Kyle Davies will be happy. (Kendall will also catch Zack Greinke, who technically fits the "under 27" category, but I think Greinke’s doing OK for himself.) Pudge Rodriguez, on the other hand, checked in with a walk rate of 9.29 percent and a strikeout rate of 17.05 percent. The strikeout rate was good enough for 13th place (of 38 catchers and the "blank" category). The walk rate was slightly worse than the no-mentor baseline. While he may carry a reputation as a good catcher/mentor, the numbers just don’t bear it out.
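To see how far each catcher sits from the no-mentor baseline, the arithmetic can be written out as a short Python snippet. The rates are the ones quoted in this article; the dictionary layout and the function name are just illustrative choices.

```python
# Predicted rates (percent of batters faced), as quoted in the article.
BASELINE = {"bb": 9.21, "k": 16.27}  # pitchers with no catcher/mentor
ESTIMATES = {
    "Jason Kendall": {"bb": 8.30, "k": 18.71},
    "Ivan Rodriguez": {"bb": 9.29, "k": 17.05},
}


def vs_baseline(name):
    """Difference from the no-mentor baseline, in percentage points."""
    est = ESTIMATES[name]
    return {stat: round(est[stat] - BASELINE[stat], 2) for stat in BASELINE}


kendall = vs_baseline("Jason Kendall")     # {'bb': -0.91, 'k': 2.44}
rodriguez = vs_baseline("Ivan Rodriguez")  # {'bb': 0.08, 'k': 0.78}
```

Negative is good for walks and positive is good for strikeouts, so Kendall comes out ahead on both counts, while Rodriguez is a hair worse than the baseline on walks and modestly better on strikeouts.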
The free-agent catcher no one has mentioned makes an interesting guest appearance: Mike Redmond. He appears to be very good at teaching pitchers how to not walk batters. He’s never been a gifted hitter, but when the other catcher on your team is named Joe Mauer, it’s not that big a deal. Redmond turns out to be in the middle of the pack when it comes to strikeouts, but he might make a good pick for a team that’s worried about its young pitchers issuing too many free passes.
A few words on taking those numbers and running with them: first, consider that we are dealing with a very small sample size. Putting faith in 12 pitcher-seasons’ worth of data is always a little dubious. Standard warnings about small sample sizes apply, both in terms of the confidence to place in those numbers and the scale of the numbers. According to these numbers, Jason Kendall’s presence is worth roughly a one-percentage-point drop in walk rate for the young pitchers with whom he works, and a 2.5-percentage-point jump in strikeout rate. Intuitively, it seems like a lot to ask. Indeed, highly structured models can get a little wacky when faced with small sample sizes.
Second, there’s a giant confounding variable. One reason that Mike Redmond might be such a powerful force for reducing walks is that "Mike Redmond after age 32 working with young pitchers" is a decent proxy for "the young kids that the Twins have brought up over the past few years." It’s hard to disentangle those variables. The model assumes that young pitchers are assigned randomly to the various catchers, which is absurd. Some teams do a better job stocking up on young pitching talent. Others draft and promote based on a statistical profile. The Twins in particular brought up Nick Blackburn, Scott Baker, and Matt Garza (before the trade), all of whom have relatively low walk rates. Is that Redmond’s influence, or is the model over-crediting him for the decisions of Twins management? It’s hard to tell.
Finally, I ran another hierarchical model, asking this time for the computer to tell me what percentage of the variance in the model was accounted for by the identity of the catcher/mentor. (For the initiated, I switched it over to a random effect.) The answer turned out to be a little bit less than one percent. However, the model suggested that about 40 percent of the variance was due to the pitchers’ own abilities. Presumably, the rest can be explained by extra-model variables and random error. So, even if Jason Kendall is as good as advertised by the model, his contributions are only a small part of the equation in helping young pitchers to develop.
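The "percentage of variance" idea can be illustrated with a much simpler calculation than the full model. The sketch below is not the HLM run described above; it swaps in a textbook one-way ANOVA estimate of the between-group share of variance for equal-sized groups, and the function name and the numbers fed to it are invented for the example.

```python
from statistics import mean


def variance_share(groups):
    """Share of total variance attributable to group membership.

    `groups` is a list of equal-length lists of observations, one list per
    group (think: one list of pitcher rates per catcher). Uses the one-way
    ANOVA variance-components estimate for a balanced design.
    """
    k = len(groups)
    n = len(groups[0]) if groups else 0
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 groups of equal size >= 2")

    grand = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Mean squares between and within groups.
    msb = n * sum((m - grand) ** 2 for m in group_means) / (k - 1)
    msw = sum((x - m) ** 2
              for g, m in zip(groups, group_means)
              for x in g) / (k * (n - 1))

    between = max(0.0, (msb - msw) / n)  # truncate negative estimates at zero
    total = between + msw
    return between / total if total > 0 else 0.0


# Invented numbers: the group explains everything, nothing, or most of it.
all_group = variance_share([[1, 1, 1], [2, 2, 2]])
no_group = variance_share([[1, 2, 3], [1, 2, 3]])
mostly_group = variance_share([[0, 2], [10, 12]])
```

A share near zero, like the sub-one-percent figure reported for catcher identity, means that knowing which group an observation belongs to tells you very little about its value.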