“My own view is that the SABR mania (and I speak as a member of the organization) has gone out of control. The Bill James approach, of cloaking totally subjective views (to which he is entirely entitled) in some sort of asserted ‘statistical evidence,’ is divorced from reality.”
– Leonard Koppett, The Thinking Fan’s Guide to Baseball

As promised last week, in today’s column (split into two parts) we’ll report on the most interesting research presentations (assuredly full of thinking, not divorced from reality) that yours truly attended at the 36th annual Society for American Baseball Research (SABR) convention in Seattle last week, where over 500 researchers, scholars, and baseball fanatics gathered. For a recap of the convention as a whole, including the fine panel discussions such as those featuring members of the Seattle Pilots and former Pacific Coast League players and the one on the upcoming collective bargaining agreement, you can check out the reports filed on my blog last week from the Emerald City.

But as they say on late night television infomercials, that’s not all! In the finale we’ll conclude the convention review and then rethink a few bits of last week’s column regarding salary variation within payroll.

Statistical Evidence and Reality
The convention was held from Wednesday night through Saturday, but really got rolling on Thursday morning. As is typical, Thursday and Saturday were reserved for research presentations, with a few committee meetings sprinkled in.

SABR has over 7,000 members whose interests range from understanding and preserving the history of nineteenth century baseball to exploring the physics of baseball, and yes, even quantitative analysis. As a result, there were nearly fifty presentations given overall and their topics covered the gamut of SABR activities. Because the presentations ran concurrently, one person could not have taken them all in. So what follows are the handful of presentations with a more quantitative bent that I was able to attend and found interesting enough to discuss a bit further. Of course, this list shouldn’t be construed as the only studies presented worthy of further conversation–and let me also note in passing that far from cloaking subjectivity in a blanket of numbers as the late Mr. Koppett suggested, I found that all of those individuals presenting studies truly exemplified the spirit of scientific inquiry in addition to being both gracious and open-minded.

So with that, let’s work our way through the first three presentations.

The Effect of Managers
One of the presentations I found interesting for a variety of reasons was given by Chris Jaffe (no relation to our own Jay Jaffe) and was titled “Evaluating Managers: Which Men Get the Most and Which Get the Least out of Their Players.” One of the reasons it attracted me is that its conclusions are diametrically opposed to those found by James Click in the chapter “Is Joe Torre a Hall of Fame Manager?” in Baseball Between the Numbers (BBTN).

In short, Jaffe analyzed five components of individual and team play from 1894 to 2001 and credited managers with both positive and negative changes in each of these components. They included improvement and decline in individual hitters and in individual pitchers under each manager (measured using the algorithm Phil Birnbaum created for his SABR35 presentation; warning: zip file), Pythagorean versus actual record, team runs scored compared to that predicted by their offensive elements, and team runs allowed compared to the aggregate statistics of their pitchers.
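For readers who haven’t worked with the Pythagorean component before, here is a minimal sketch of the comparison in Python. It assumes the classic Bill James formula with an exponent of 2, and the team totals are made up for illustration. Jaffe’s study may well use a different exponent or adjustment.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and allowed.

    Assumption: the classic exponent of 2; other exponents are in common use.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def wins_above_pythagorean(wins, losses, runs_scored, runs_allowed):
    """Actual wins minus the wins the Pythagorean formula expects."""
    games = wins + losses
    expected_wins = pythagorean_win_pct(runs_scored, runs_allowed) * games
    return wins - expected_wins


# Hypothetical team: 90-72 while scoring 800 runs and allowing 700.
gap = wins_above_pythagorean(90, 72, 800, 700)
print(f"Wins above Pythagorean expectation: {gap:+.1f}")
```

A positive gap means the team won more games than its run differential predicts. In a study like Jaffe’s, that gap is what ends up credited to (or charged against) the manager.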

What he finds is that managers who managed more than 2,000 games show improvement in each of these categories with decreasing improvement for those managing 1,000, 500, and fewer games, respectively. Click, on the other hand, concluded using correlation coefficients for individual seasons and groups of seasons that managers had no impact on individual batter performance or on differences between actual and Pythagorean record. And while Jaffe assigns player improvement or decline to the manager (although admittedly while ignoring players who played their entire career or the majority of it under a particular manager), Click uses in-season performance to determine whether managers, through the judicious allocation of roles and evaluation, eke out greater performance. He concludes that batter performance “over the course of a season show virtually no correlation to the same manager in subsequent seasons” and that there is no advantage held by more experienced managers.

Jaffe then breaks down each category before adding them all back up again. His top and bottom 10 as published in part 1 of the study include:

Top 10 Managers (1894-2001)
1. Joe McCarthy     +1798.24
2. Al Lopez          +991.91
3. John McGraw       +954.87
4. Bill McKechnie    +877.55
5. Tony LaRussa      +775.04
6. Billy Southworth  +761.57
7. Walt Alston       +727.34
8. Earl Weaver       +700.61
9. Miller Huggins    +677.65
10. Billy Martin     +646.87

Bottom 10 Managers (1894-2001)
1. Connie Mack       -891.63
2. Jimmie Wilson     -845.92
3. Art Fletcher      -519.13
4. Don Baylor        -506.51
5. Johnny McCloskey  -484.56
6. Billy Meyer       -454.56
7. Rogers Hornsby    -439.76
8. Fred Tenney       -432.74
9. Zach Taylor       -378.05
10. Lee Fohl         -367.18

The values in the tables above are given in terms of total runs. This would mean that McCarthy was worth about 180 wins over the course of his 24-year managerial career, which would equate to 7.5 wins per season, or as much as a truly elite player. Obviously, this is absurd, and Jaffe goes to great pains to explain the weaknesses of his approach. The most important is that managers of good teams like McCarthy benefit disproportionately. In fact, rather than looking at these numbers as the number of runs somehow contributed by the manager, we can instead view them simply as the cumulative changes in the five components for teams managed by these individuals.
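The arithmetic behind the McCarthy figure is easy to reproduce. The sketch below assumes the common rule of thumb of roughly 10 runs per win. That conversion rate is my assumption, not something taken from Jaffe’s paper, though it is the one implied by the numbers quoted here.

```python
RUNS_PER_WIN = 10.0  # assumption: the usual rule-of-thumb conversion


def runs_to_wins(total_runs, seasons, runs_per_win=RUNS_PER_WIN):
    """Convert a career run total to total wins and wins per season."""
    total_wins = total_runs / runs_per_win
    return total_wins, total_wins / seasons


# Joe McCarthy: +1798.24 runs over a 24-year managerial career.
total_wins, wins_per_season = runs_to_wins(1798.24, 24)
print(f"{total_wins:.0f} wins, or {wins_per_season:.1f} per season")
```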

In the end, however, the primary weakness of Jaffe’s study, as he freely admits in the published version, is that he assumes that managers have an impact and therefore assigns the credit or blame to each manager without providing evidence that the man at the helm is a key causal component. As a result, the analysis Click performs by first attempting to establish a basis for Jaffe’s assumptions in various categories, to me anyway, is the better approach. And yet, there is something tantalizing in Jaffe’s lists since in many cases they pass the “reasonableness test.”

But the primary reason to highlight this presentation is that it illustrates that quantitative (and often qualitative) evaluation of managers is clearly a topic on which there is wide divergence. And where there is divergence of opinion there is always opportunity for more information to help us see a little more clearly.

The Walk Year
Phil Birnbaum is the editor of the SABR Statistical Analysis committee’s newsletter By The Numbers. His presentation at last year’s convention was the inspiration for my look at so-called “lucky” and “unlucky” teams published in The Hardball Times Baseball Annual.

Well, this year Birnbaum took a look at the performance of players in their “walk year” (link is PowerPoint file), in a presentation titled appropriately enough “Do Players Outperform In Their Free-Agent Year?” We can all point to cases where players clearly appeared to step it up just before they became eligible for the big contract such as John Burkett’s 2001 campaign or Jack Clark’s 1987 season. We can also point to players who fell flat (Jeff Fassero in 1999 and Terry Pendleton in 1990); Birnbaum does no such thing, as he undertook a more systematic study of the issue.

He applied the weighted performance algorithm he developed for last year’s presentation and looked at all walk years from 1977 to 2001. What he found can be summarized in the following table:

Hitters (1977-2001)
Category                          Avg Run Differential
All                                   -0.1
300+ Batting Outs                     +1.9
Above Normalized to 400 Batting Outs  +2.2

Pitchers (1977-2001)
Category                          Avg Run Differential
All                                   -0.2
100+ Innings                          +0.6
Above Normalized to 200 Innings       -1.1

A positive value for both hitters and pitchers means that they improved in their walk year.

As you can see, these differences are not very large. In fact, even though hitters seemed to improve slightly while pitchers declined, Birnbaum noted that the test of statistical significance was not met and so the results are in fact indistinguishable from zero. In short, he found no measurable effect of walk year on performance. This result is also in opposition to Dayn Perry’s conclusion in the chapter “Do Players Perform Better in their Contract Years?” of Baseball Between the Numbers where he found that players perform slightly better in their contract years than they do in the two preceding seasons.
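To make “indistinguishable from zero” a bit more concrete, here is a minimal sketch of one way such a check could be run: a one-sample t statistic on the walk-year run differentials. The choice of test and the ten differentials below are my own assumptions for illustration. I don’t know which test Birnbaum actually applied, and none of these numbers come from his data.

```python
import math
from statistics import mean, stdev


def one_sample_t(diffs):
    """t statistic for the hypothesis that the mean differential is zero."""
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error


# Hypothetical walk-year run differentials for ten players (actual minus expected).
diffs = [4.0, -3.0, 1.5, -2.5, 6.0, -5.0, 0.5, 2.0, -1.0, 1.5]
t_stat = one_sample_t(diffs)

# Two-tailed 5% critical value for 9 degrees of freedom.
CRITICAL_T_DF9 = 2.262
print("significant" if abs(t_stat) > CRITICAL_T_DF9 else "not significant")
```

The point is simply that a small average improvement can sit well inside the noise when individual seasons swing by several runs in either direction.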

The major difference between Perry’s study and Birnbaum’s is that while the former used 212 “prominent” free agents the latter used all free agents, both of which could bias the results in different ways. Choosing prominent names may lead to selection bias where better players are chosen subconsciously, biasing the results upwards, whereas including all free agents incorporates marginal players who sign a string of one-year contracts, not really players expecting a big payoff. But of course the differences here are not that large to begin with as Perry’s results indicate at most a 5 run improvement in the walk year, subject to a bit of mitigation because of playing time differences. Other studies cited by Birnbaum also found no measurable effect in walk year performance.

In the end the results are about what you’d expect intuitively. Major League Baseball is so competitive at all times that one would assume that it is rare indeed if a player can turn it on one season and turn it off the next. With the average Major League career historically spanning just over four years, players are aware of just how quickly they can be out of baseball if their performance slips. So if there are indeed players with the “talent flexibility” to perform at will, they are likely few and far between.

Missed it by That Much
Our own Keith Woolner has done plenty of analysis on how catchers affect, or don’t affect, the outcome of games with their game calling ability. And yet in the press box as I peruse the game notes that a team’s PR department prepares before each game, I’ll almost always see wasted space noting how the team’s pitchers performed with each of their catchers behind the plate.

Why not evaluate catchers in other ways?

That’s the basic question Sean Forman of Baseball Reference sought to answer in his talk “Better Defense Through Bruising,” where he examined the relationship between missed pitches (defined as wild pitches plus passed balls not counting dropped third strikes) and catchers. Using missed pitches has the two-fold advantage of crediting catchers who not only give up fewer passed balls but help their pitchers by being able to block the plate, and removing the variation in how official scorers rule on such events.

To provide a basis for evaluation, Forman used linear regression to create equations that predict the number of missed pitches for knuckleballers and non-knuckleballers (since missed pitch rates are obviously much higher for the likes of Hoyt Wilhelm and Charlie Hough) using their strikeout, walk, and hit batsmen rates as well as their missed pitch rate as inputs. The idea is that the equations can be used to effectively remove the influence of the actual catchers who caught each pitcher and therefore predict the number of missed pitches that should occur when a particular pitcher is on the mound. That prediction can then be compared with what actually happened when each catcher was behind the plate for that pitcher, and the catcher credited accordingly.
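The crediting step can be sketched in a few lines of Python. This sketch assumes the regression has already produced an expected missed-pitch rate (per 100 opportunities) for each pitcher. The function name, the data layout, and the numbers are hypothetical and are not taken from Forman’s study.

```python
def saved_missed_pitches(pairings):
    """Credit a catcher with expected minus actual missed pitches.

    pairings: list of (opportunities, expected_rate_per_100, actual_missed)
    tuples, one per pitcher the catcher worked with. The expected rate is
    assumed to come from a regression like the one described above.
    """
    total_opps = sum(opps for opps, _, _ in pairings)
    expected = sum(opps * rate / 100 for opps, rate, _ in pairings)
    actual = sum(missed for _, _, missed in pairings)
    saved = expected - actual
    return saved, 100 * saved / total_opps


# Hypothetical catcher working with three pitchers.
pairings = [(400, 2.0, 6), (250, 3.0, 7), (350, 1.0, 4)]
saved, saved_rate = saved_missed_pitches(pairings)
print(f"Saved {saved:.1f} missed pitches ({saved_rate:+.2f} per 100 opportunities)")
```

A positive rate means fewer balls got away than the pitchers’ profiles would predict, which is the sense in which a catcher “saves” missed pitches.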

After doing the calculations, he comes up with the following best and worst catchers over the course of their careers (using Retrosheet data encompassing 1957-2005):

Name                G    saved MPrate
Brian Downing     659         1.00
Bruce Benedict    959         0.94
Sherm Lollar      648         0.85
Mike Piazza      1384         0.71
Charles Johnson  1018         0.68


Name                G    saved MPrate
Dick Dietz        521        -1.33
J.C. Martin       658        -1.21
Bob Brenly        685        -0.80
Mike Macfarlane   954        -0.76
Junior Ortiz      668        -0.63

Here, MPrate is the number of missed pitches per 100 opportunities (defined as plate appearances with runners on base). Once again, this passes the test of reasonableness although some were surprised that Mike Piazza would rate so highly. On the other end of the spectrum Steve Treder (who placed second in the annual trivia contest) commented during the presentation that for those like himself who watched Dick Dietz play, his standing comes as no surprise whatsoever. Incidentally, Bob Uecker walked away with the worst single-season mark at -32.95 saved missed pitches for his 1967 effort for the Phillies. Forman ran some correlations and found the year to year correlation coefficients in MPrate to be around .40 even when catchers changed teams, indicating that there is indeed reason to believe that his study reflects a true ability.
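The year-to-year check is a plain Pearson correlation between a catcher’s rate in one season and his rate in the next. Here is a minimal sketch with made-up rates for six catchers. The numbers are purely illustrative and are not Forman’s data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical saved MPrate for the same six catchers in consecutive seasons.
year1 = [0.9, 0.4, -0.2, -0.8, 0.1, -0.5]
year2 = [0.3, 0.6, -0.4, -0.1, -0.3, 0.2]
r = pearson_r(year1, year2)
print(f"Year-to-year correlation: {r:.2f}")
```

A coefficient near zero would suggest the rates are mostly noise, while a value well above zero suggests a repeatable skill.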

The surprise of seeing Piazza on the list was echoed when Forman showed the list of the worst active catchers, which included Benito Santiago (“tops” on the list at -0.34) and Ivan Rodriguez (-0.19, good for third). What this illustrates is that rather than thinking of catchers as either good or bad based only on their more visible ability to throw out base stealers, their skills can be evaluated separately (another example might be fielding bunts) much like those of hitters where power, plate discipline, and contact are all looked at individually. That said, when evaluated in the context of run expectancy, good catchers save their teams an average of four runs per year while bad ones cost their team an equivalent amount. Of course, that’s about a quarter to a third the magnitude of the impact a good catcher has on the running game just in terms of throwing out runners.

In the end, this study (which you can review online) is a great example of using play by play data to help quantify what was formerly a pretty subjective topic, which is probably why Forman walked away with the Doug Pappas Research Award.

More to Come

In the interests of space we’ll let you chew on these for a while and tomorrow review three more presentations, along with a rethinking of salary variation within payroll.