Baseball Prospectus home
  
  
October 21, 2004

Lies, Damned Lies

Comparables

by Nate Silver


I feel guilty writing anything these days that doesn't involve the postseason, but my reactions this time of year tend to stray far from the realm of objectivity and well into the territory of shock (Phil Garner did what?) and awe (Johnny Damon did what?). Let's take a quick break from all the catastrophe and success and answer a PECOTA question that I've been long overdue in addressing.

S.B. writes:

I look and wonder why Curt Schilling is nowhere to be found on your Ben Sheets Similarity Index. And then look at Brad Lidge's Index and see a guy at the top who had a 5.87 ERA, a 2:1 K/BB ratio and a 6.74 K/9 and I have to wonder about what's going on?

  1. PECOTA comparables do not take into account performance in the current season, but rather the three previous seasons.

    Sheets had a promising campaign a year ago, and PECOTA assigned him a 19% chance of having a breakout, finding favorable comparables in folks like Robin Roberts, Dave Stieb and Mike Mussina. Still, the system expected baby steps forward, rather than the quantum leap that Sheets took this year, one that is succinctly expressed in two numbers: 264 strikeouts, 32 walks.

    The problem is that PECOTA is a predictive tool and not a retrospective one. It selected Sheets' comparables based on the pitchers whose performance was most similar to his track record heading into the season. That suggested a pitcher, like Mussina, who was pretty durable and had some pretty good strikeout and walk numbers, but not a pitcher who had reached the elite circle occupied by folks like Schilling, Bret Saberhagen and so forth.

    A similar argument holds for Lidge. Yes, the 10.27 K/9 that Lidge managed last year was impressive. It is nowhere near the 14.93 K/9--Fourteen Point Nine Three!--that Lidge posted this year. Players whose performance departed radically from PECOTA's expectations, especially in a key category like pitcher strikeouts, are likely to draw a very different group of comparables when the 2005 PECOTAs are posted this winter.

    Similarly, the years listed next to a player's comparables reflect their performance heading into the year in question. So, when PECOTA lists Dan Naulty, 1997, as Lidge's best comparable, it means that it thinks that Naulty's performance in 1997 would tell us something about how Lidge would perform in 2004. In 1996, Naulty posted a 3.79 ERA against a league average of 5.15--similar to the 3.60 ERA that Lidge recorded last season against a league average of 4.61. This particular comparison turned out not to be prescient--Naulty regressed in 1997, and never had much of a career, while Lidge took a big step forward--but the pitchers looked pretty similar heading into their age-27 seasons.
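
To make the Naulty/Lidge ERA comparison concrete, here is a minimal sketch that expresses each pitcher's ERA as a fraction of his league's average, using only the four numbers quoted above. The function name `era_ratio` is an illustrative assumption, and the sketch ignores park effects, which PECOTA also adjusts for; it is not PECOTA's own calculation.

```python
def era_ratio(era: float, league_era: float) -> float:
    """Return ERA as a fraction of the league-average ERA (lower is better)."""
    return era / league_era


# Figures quoted in the article; no park adjustment is applied here.
naulty_1996 = era_ratio(3.79, 5.15)
lidge_2003 = era_ratio(3.60, 4.61)

print(f"Naulty 1996: {naulty_1996:.2f} of league ERA")
print(f"Lidge 2003:  {lidge_2003:.2f} of league ERA")
```

Both pitchers come out roughly a quarter better than their leagues, which is why the raw ERAs of 3.79 and 3.60 are closer than they first appear.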

  2. All statistics are adjusted for league context. The league context has changed more than you'd think.

    As with almost all of our statistical tools, PECOTA adjusts all its statistics for park and league averages. This is crucially important for a prediction engine. It wouldn't be very fair to compare a hitter who hit 35 homers playing his home games in Coors Field in 2003 to one who hit the same number as an Astro in 1972.

    Let's take a look at Lidge and Naulty once again. Here are their performances in their previous seasons in three critical categories--strikeout rate, walk rate and home run rate--compared against league averages:

    
    League averages

                         K/G        BB/G        HR/G
    2003 NL             6.59        3.40        1.10
    1996 AL             6.20        3.80        1.21

    Pitcher rates in the previous season

                         K/G        BB/G        HR/G
    Lidge 2003         10.27        4.48        0.76
    Naulty 1996         8.84        5.52        0.80

    Indexed to League Average (100 = average)

                         K/G        BB/G        HR/G
    Lidge 2003           156         132          69
    Naulty 1996          143         145          66
    
    Naulty struck out fewer batters than Lidge did, and walked more. He also pitched in a league environment in which walks were more common and strikeouts less so. When adjusted for league averages, their rates look much closer.
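
The indexed figures in the table can be reproduced with a short sketch: divide each pitcher's rate by his league's rate and multiply by 100. The dictionary layout and the name `league_index` are assumptions made for illustration; only the rates themselves come from the table.

```python
# League-average rates from the table above.
LEAGUE_AVERAGES = {
    "2003 NL": {"K": 6.59, "BB": 3.40, "HR": 1.10},
    "1996 AL": {"K": 6.20, "BB": 3.80, "HR": 1.21},
}

# Each pitcher's league and his own rates in that season.
PITCHERS = {
    "Lidge 2003": ("2003 NL", {"K": 10.27, "BB": 4.48, "HR": 0.76}),
    "Naulty 1996": ("1996 AL", {"K": 8.84, "BB": 5.52, "HR": 0.80}),
}


def league_index(rate: float, league_rate: float) -> int:
    """Scale a rate so that the league average equals 100."""
    return round(100 * rate / league_rate)


indexed = {
    name: {cat: league_index(rates[cat], LEAGUE_AVERAGES[league][cat]) for cat in rates}
    for name, (league, rates) in PITCHERS.items()
}

for name, row in indexed.items():
    print(name, row)
```

On the indexed scale the strikeout gap shrinks from about 1.4 per game to 13 points, which is the sense in which the two pitchers "look much closer."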

    This is particularly important to consider in the case of categories such as home runs and pitcher strikeouts, which have grown significantly in frequency over time. In the 1951 American League, for example, pitchers averaged 3.72 strikeouts per game and 4.00 walks. A Red Sox pitcher named Mickey McDermott struck out 6.64 batters per nine innings, a figure less than the National League average this year, but it was good enough to lead the league. McDermott's 127 strikeouts and 92 walks in the 1951 AL would translate to about 225 strikeouts and 78 walks in the 2003 NL.
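
One way to arrive at the McDermott translation is simple proportional scaling: multiply each raw total by the ratio of the target league's rate to the original league's rate. The article does not spell out the exact method, so treat the sketch below as an assumption that happens to match the quoted figures; it also assumes the same number of innings in both contexts. The name `translate` is hypothetical.

```python
def translate(count: float, from_league_rate: float, to_league_rate: float) -> float:
    """Rescale a raw total from one league context to another, innings held fixed."""
    return count * to_league_rate / from_league_rate


# 1951 AL averages (3.72 K, 4.00 BB) and 2003 NL averages (6.59 K, 3.40 BB),
# as quoted in the article.
strikeouts_2003 = translate(127, 3.72, 6.59)
walks_2003 = translate(92, 4.00, 3.40)

print(f"Translated strikeouts: {strikeouts_2003:.0f}")
print(f"Translated walks: {walks_2003:.0f}")
```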

  3. The small stuff adds up.

    While PECOTA places the most emphasis on factors like strikeouts, walks and isolated power, it considers 13 categories total for pitchers and 12 total for hitters. Dan Naulty was 6'6" and weighed 210 pounds. Brad Lidge is 6'5" and weighs 200 pounds. Both Naulty and Lidge had about one full major-league season under their belts heading into their age-27 years. They also had similar groundball/flyball tendencies. These aren't the most important factors that PECOTA considers, but they do have an influence on a player's comparable list, in an amount proportional to the predictive value of the statistic in question. Because we don't have a firm memory of some of these secondary characteristics for older players--Naulty, for all I remember, could have been built like Cliff Politte--they can sometimes have an unexpected and even counterintuitive impact on a player's comparables list.
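
The idea that each category counts in proportion to its predictive value can be illustrated with a weighted distance. This is only a sketch: the article does not give PECOTA's formula, and the weights, the scaling constants and the function name below are all hypothetical. Only the player attributes (indexed rates, height, weight) come from the article. Note also that this is a distance, where smaller means more alike, whereas PECOTA's similarity score runs the other way.

```python
import math

# Attributes taken from the article: indexed rates, height in inches, weight in pounds.
lidge = {"k_index": 156, "bb_index": 132, "hr_index": 69, "height_in": 77, "weight_lb": 200}
naulty = {"k_index": 143, "bb_index": 145, "hr_index": 66, "height_in": 78, "weight_lb": 210}

# HYPOTHETICAL weights: larger values stand in for more predictive categories.
WEIGHTS = {"k_index": 3.0, "bb_index": 2.0, "hr_index": 2.0, "height_in": 0.5, "weight_lb": 0.5}

# HYPOTHETICAL scales: rough typical spreads, used to put categories on a common footing.
SCALES = {"k_index": 20.0, "bb_index": 20.0, "hr_index": 20.0, "height_in": 2.0, "weight_lb": 15.0}


def weighted_distance(a: dict, b: dict, weights: dict, scales: dict) -> float:
    """Weighted Euclidean distance between two players; 0.0 means identical."""
    total = 0.0
    for category, weight in weights.items():
        diff = (a[category] - b[category]) / scales[category]
        total += weight * diff * diff
    return math.sqrt(total)


print(f"Lidge vs. Naulty: {weighted_distance(lidge, naulty, WEIGHTS, SCALES):.2f}")
```

Even with small weights, height and weight still contribute to the total, which is how secondary traits can nudge one candidate ahead of another when the main categories are nearly tied.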

  4. Comparability is relative.

    It was relatively easy to identify comparables for Lidge and Sheets heading into the season. Lidge had five comparables who registered a score of 50 or higher, which translates as "extremely comparable" in the PECOTA system, while Sheets had seven.

    Both pitchers, by virtue of their outstanding performances, are going to find things a little bit lonelier next time around. Lidge's 2005 forecast is likely to look quite a bit like Eric Gagne's forecast this year. Gagne had no comparables with a similarity score of 50 or higher, and just three with a similarity score of 30 or higher. There aren't many pitchers who put up numbers like Gagne's.

    What does PECOTA do in situations like these? Well, it does the best it can. Gagne's third-best comparable is the 2000 version of Pedro Martinez, who was then long removed from his tenure in the Dodgers' bullpen. Would you rather compare Gagne to another relief pitcher? All else being equal, sure. But Pedro's performance certainly tells us more about Gagne than, say, Bobby Thigpen's. When a player is truly unique, PECOTA sacrifices the small stuff in an effort to get the big stuff right.

    Of course there are a few players--like that guy in San Francisco--for whom finding appropriate comparables is impossible. Not only is the small stuff wrong, but the big stuff is wrong too. In those cases, PECOTA makes like a drunken frat boy and lowers its standards. In the case of Bonds, for example, it's willing to bed pretty much any 40-year-old with some power and some plate discipline. That isn't an ideal solution--Bonds is quite a bit different from someone like Edgar Martinez, his fourth-best comparable--but it's still better than comparing a player to everyone in his age cohort, which is what other projection systems do. Bonds isn't Edgar Martinez, but he sure as hell isn't Brett Butler.

Thanks for the question, S.B.

Nate Silver is an author of Baseball Prospectus. 