February 7, 2008
The Toughest of Them All?
"Never make predictions, especially about the future."
While Punxsutawney Phil has spoken--or rather pointed--to inform us that we have six more weeks of winter, we can see the light at the end of the tunnel: pitchers and catchers report in just a few more days. Not surprisingly, at this time of year the obsessive baseball fan begins to grow restless, becoming more than a little bored with the endless speculation around the last few free agent signings, trade packages that are typically more fantasy than reality, and just who did or did not inject who (or is it whom?). This week, I offer two topics that occupied a little of my time between viewings of the first four games of the 1975 World Series on DVD.
The Toughest Division in Baseball
Amidst the two major deals of this offseason a common theme was repeated ad nauseam. In the trade that sent Dontrelle Willis and Miguel Cabrera to Detroit in exchange for Cameron Maybin and Andrew Miller (among others), it was often said that the "toughest division in baseball" (meaning the AL Central) just got tougher. And when Johan Santana was dealt from the AL Central Twins to the NL East Mets, pundits from around the world of baseball were opining on how dominant Santana might be now that he's freed from the "toughest division in baseball" on his way to the much weaker National League. To cite just one example of this latter hype, consider this quote from USA Today reporter Bob Nightengale, who said last week on MLB Radio:
Santana won two Cy Young Awards and he was doing it in the AL Central. I think when he goes to the National League he's got a chance to put up Bob Gibson type numbers... You're going from the AL Central, the toughest division in baseball, just to the National League... he has a chance to have an ERA of low twos if not below two.
Fortunately, calmer heads sometimes prevail: our own Nate Silver provided Santana's PECOTA projections for both teams, and they reveal that the difference, although noticeable, will not turn Santana into the reincarnation of Bob Gibson circa 1968.
All this talk about the toughest division in baseball got me to wondering just which division, when we look at on-field performance, actually has been the toughest in baseball. To do this, and to crown the toughest division for each of the last eleven years, we'll look at two simple measures: intradivisional records by league, and interleague records overall.
To begin, let's take a look at the National League intradivisional records since 1997 in graphic form:
Figure 1. National League Division Strength 1997-2007
Here you can see that the NL West and NL East beat up on the NL Central in 1997, although in 1998 and 1999 the Central held its own against the East, even edging them out slightly in the latter season. From there, the Central became the doormat of the league; collectively, its teams only approached the .500 mark in 2004 and 2007 in this millennium. During that time, the West was dominant from 2001-2003 and again in 2007--led by the Diamondbacks and Rockies--while the East took the title in 2000 and from 2004-2006. If we break the time frame into three periods, we see the East as dominating in the 1997-2000 and 2005-2007 periods, with the West squeaking by in 2001-2004:
Table 1: National League Intradivisional Results

Period      East   West   Central
1997-2000   .518   .494   .489
2001-2004   .506   .518   .478
2005-2007   .522   .496   .484
This probably comes as no great shock, with excellent Braves and Mets teams and the improving Phillies in the East, while there have been some pretty mediocre Astros, Cardinals, and Cubs teams in the Central. Overall the divisions can clearly be ranked East, West, Central.
On the AL side, you may find the results a little more surprising:
Figure 2. American League Division Strength 1997-2007
The AL East and West outpaced the Central in 1997, and after gaining some ground in 1998 the Central sank to rock bottom in 1999 and would finish last in four of the next five seasons. A rise in 2005--fueled by the White Sox and the surging Indians--saw the division best the West, and another rise in 2006, due to the Tigers and Twins (with the White Sox also winning 90 games), took it past the East. Then the East rebounded in 2007 to regain the title. This shows that in the entire eleven-year period the Central has never been the best division in the American League in terms of record in games played against other AL divisions. While that may change in 2008 with an AL Central breakthrough, as the Tigers and Indians are expected to be among the best in the league, it should serve to remind us that what you hear "ain't necessarily so."
Contrary to the popular wisdom, and once again breaking things down into three periods, we find that the AL West--led by the A's and Angels--has been the best division over the last eleven years, pacing the junior circuit from 2001-2004 and by a nose in 2005-2007. Meanwhile, the AL East was better from 1997 through 2000. But more to the point, over the last three years, no division in the AL has dominated:
Table 2: American League Intradivisional Results

Period      East   West   Central
1997-2000   .525   .502   .473
2001-2004   .489   .545   .465
2005-2007   .499   .501   .500
Despite these results, recall that this approach doesn't consider the relative strengths of the teams within the division. As such it can reasonably be argued that the current AL East is tougher than it appears, because of the presence of two elite teams that severely limits the opportunity for the rest of the division. Likewise, it can also be said that when a division like the AL Central in 2006 contains three teams with 90-plus wins, that division is indeed very competitive.
In the final analysis, perhaps all of this talk related to the AL Central being the "toughest division in baseball" is in large part a product of the fact that since 2005 (when they had the World Series winner) they've been much more competitive than at any time since the late 1990s. Sometimes the key to success is low expectations. It's also probably the case that there is some amount of the "illusion-of-truth" effect in play, where people are more likely to believe a familiar statement, and therefore repeat it, perpetuating its claims, regardless of empirical support.
We're not done yet. Since we've also had interleague play since 1997, we can combine the interleague record with the intradivision records to crown one of the six divisions as the toughest in baseball. To do this, we'll simply look at the interleague record for each season and make the simplistic assumption that the league with the best record in that season was in fact superior. By extension, the best division in the best league is therefore the mythical "toughest division in baseball." Below you'll see the results of interleague play over its history: the AL bested the NL in five of the eleven seasons, the NL returned the favor five times, and 2007 produced the first tie, as each league won 126 times.
Figure 3. Interleague Play Results 1997-2007
Over the course of the history of interleague play, the AL holds the edge 1381-1323, for a winning percentage of .511, but obviously holds a large edge in the last three years due to the large imbalance in 2006. When we combine these two pieces of information, the end result is the following table:
Table 3: The Toughest Division in Baseball?

Year   Division
1997   NL East
1998   AL East
1999   NL Central
2000   AL West
2001   AL West
2002   NL West
2003   NL West
2004   NL East
2005   AL East
2006   AL West
2007   AL East and NL West
Overall, the AL East, AL West, and NL West each take the top spot three times (counting the shared 2007 title), the NL East claims it twice, the NL Central once, and, as mentioned previously, the AL Central is shut out.
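For readers who want to check the tally, the crowning rule and the count can be sketched in a few lines of Python. This is only an illustration: the function name `toughest_divisions` and its input layout are my own assumptions, and since the year-by-year inputs behind Table 3 are not reproduced in this column, the script simply tallies the published table and recomputes the AL's overall interleague winning percentage.

```python
from collections import Counter


def toughest_divisions(interleague_wins, division_pct):
    """Best division in the league with the best interleague record.

    interleague_wins: {"AL": wins, "NL": wins} for one season.
    division_pct: {"AL East": 0.525, ...}, each division's record against
        the other divisions in its own league.
    If the leagues tie, one division per league is returned.
    """
    top = max(interleague_wins.values())
    winners = []
    for league in sorted(interleague_wins):
        if interleague_wins[league] != top:
            continue
        in_league = {d: p for d, p in division_pct.items() if d.startswith(league)}
        winners.append(max(in_league, key=in_league.get))
    return winners


# Table 3 as published: season -> division(s) crowned.
TABLE_3 = {
    1997: ["NL East"],
    1998: ["AL East"],
    1999: ["NL Central"],
    2000: ["AL West"],
    2001: ["AL West"],
    2002: ["NL West"],
    2003: ["NL West"],
    2004: ["NL East"],
    2005: ["AL East"],
    2006: ["AL West"],
    2007: ["AL East", "NL West"],  # interleague tie, so one per league
}

titles = Counter(d for divisions in TABLE_3.values() for d in divisions)

# Overall interleague record quoted above: AL 1381, NL 1323.
al_interleague_pct = 1381 / (1381 + 1323)

if __name__ == "__main__":
    for division, count in titles.most_common():
        print(division, count)
    print("AL Central", titles["AL Central"])  # Counter gives 0 for a missing key
    print(f"AL interleague pct: {al_interleague_pct:.3f}")
```

Counting the shared 2007 title for both divisions is a choice on my part; dropping it would leave the AL East and NL West with two titles apiece.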
More Projections on the Cheap
Earlier this week Nate Silver released the PECOTAs, which I know has many of you all abuzz as you prepare for seasons both real and fantastic. Inspired by that effort, and in response to the final question in a chat a couple weeks back, I've re-worked the cheap projection system described in a column from back in November.
Some readers will recall that the system calculates a projected Normalized OPS value (NOPS/PF, where 100 is the park-adjusted league average offensive output for the year and league in question) based on the weighting of performance over the previous three seasons, regression to the mean, aging, and league difficulty. When looking at nearly 17,000 player seasons from 1903 through 2006, the correlation coefficient calculated for the comparison between the actual NOPS/PF value and the projected value was a healthy 0.64 for players who accumulated 300 or more plate appearances in a season.
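To make those four ingredients concrete, here is a minimal sketch of how such a projection could be assembled. It is not the actual system: the season weights, the amount of regression, the peak age, the aging rate, and the function name `project_nops_pf` are all placeholder assumptions chosen for illustration, so it will not reproduce the numbers in the tables that follow.

```python
LEAGUE_AVERAGE = 100.0  # NOPS/PF is scaled so that 100 is league average


def project_nops_pf(seasons, age, weights=(5, 4, 3), regression_weight=1200,
                    peak_age=27, aging_rate=0.003, league_factor=1.0):
    """Project NOPS/PF from up to three prior seasons.

    seasons: [(nops_pf, plate_appearances), ...], most recent first.
    age: the player's age in the season being projected.
    Every default value is a placeholder, not a figure from the column.
    """
    pairs = list(zip(weights, seasons))

    # 1. Weight recent seasons more heavily, and full seasons more than partial ones.
    weighted_sum = sum(w * pa * nops for w, (nops, pa) in pairs)
    weighted_pa = sum(w * pa for w, (_, pa) in pairs)

    # 2. Regress to the mean by blending in a fixed amount of
    #    league-average performance; thin track records move the most.
    regressed = (weighted_sum + regression_weight * LEAGUE_AVERAGE) / (
        weighted_pa + regression_weight
    )

    # 3. Aging: a small linear nudge, upward before the peak and downward after.
    aged = regressed * (1.0 + aging_rate * (peak_age - age))

    # 4. League difficulty: a factor above 1.0 means a tougher league.
    return aged / league_factor


if __name__ == "__main__":
    # Hypothetical hitter: 140 NOPS/PF in a brief debut versus a full season.
    print(round(project_nops_pf([(140, 150)], age=25), 1))
    print(round(project_nops_pf([(140, 650)], age=25), 1))
```

The point of the sketch is the shape of the calculation: the same 140 NOPS/PF is pulled much closer to 100 when it comes from 150 plate appearances than from 650.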
The limitation of the algorithm used in that article was its basis on "projecting" seasons that had already been played, since it was developed to look at the concept of "booms and busts" in seasons already played. In other words, I couldn't use it to project the 2008 season since, somewhat counter-intuitively, it has not yet been played. A few simple adjustments now allow us to do just that, so without further ado I've run the system for 2008. Table 4 lists the top 15 offensive players in Projected NOPS/PF for the upcoming season:
Table 4: The Best of 2008? Top 15 Players in Projected NOPS/PF

                             2007 Results     2008 Projection
Name              Team Lg  NOPS/PF   PA  Age     PA NOPS/PF
Albert Pujols     SLN  NL      133  679   27    668     138
David Ortiz       BOS  AL      136  667   31    679     133
Alex Rodriguez    NYA  AL      140  708   31    698     132
Miguel Cabrera    FLO  NL      130  680   24    679     131
Ryan Howard       PHI  NL      126  648   27    619     128
Chipper Jones     ATL  NL      138  600   35    536     127
David Wright      NYN  NL      129  711   24    687     126
Matt Holliday     COL  NL      129  713   27    670     125
Mark Teixeira     ATL  NL      137  240   27    645     123
V. Guerrero       LAA  AL      124  660   31    651     123
Mark Teixeira     TEX  AL      121  335   27    645     122
Lance Berkman     HOU  NL      120  668   31    645     122
Magglio Ordonez   DET  AL      134  678   33    616     121
Prince Fielder    MIL  NL      133  681   23    575     120
Barry Bonds       SFN  NL      138  477   42    416     120
Chase Utley       PHI  NL      126  613   28    654     120
There's probably not too much to be surprised about here: good hitters in 2007 are projected to be good hitters in 2008. However, looking closer, you'll note that players in their thirties are generally predicted to fall off a little, with Chipper Jones and Magglio Ordonez (the subject of the question from our chatter) being the most likely candidates. Interestingly, Prince Fielder, despite heading into his age-24 season, also shows a projected decline, primarily because of the large number of plate appearances he had at age 22 when his NOPS/PF was just 108, which tends to bring down his weighted NOPS/PF from the previous three seasons. You'll also notice that there are two projections here for Mark Teixeira because he split 2007 between Texas and Atlanta. Each projection is based solely on his time played at that stop and makes the assumption that he'll be playing in the same league in 2008. When combined, they come out to a projection of 122 in 645 plate appearances. It should be noted that I'm not applying any aging curve to the plate appearance projections; they are instead simply calculated as the weighted mean of the previous three seasons.
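The column doesn't spell out how the two Teixeira lines are combined. One plausible reading, and it is only my assumption, is a mean weighted by his 2007 plate appearances at each stop; the helper name `combine_projections` is likewise hypothetical.

```python
def combine_projections(parts):
    """PA-weighted mean of (projected_nops_pf, plate_appearances) pairs."""
    total_pa = sum(pa for _, pa in parts)
    return sum(nops * pa for nops, pa in parts) / total_pa


# Teixeira's two Table 4 lines: 123 (240 PA with Atlanta) and 122 (335 PA with Texas).
teixeira = combine_projections([(123, 240), (122, 335)])

if __name__ == "__main__":
    print(round(teixeira))  # prints 122
```

Under that assumption the result is about 122.4, which rounds to the 122 quoted above.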
The case of Teixeira leads directly to that of Miguel Cabrera. Here the projection is for the NL, since I don't have a database of 2008 rosters, but if we tweak the system just for him, we find that his projected NOPS/PF falls by less than a point, using the estimate that the 2008 AL will be 1.5 percent better than the 2007 NL.
Next, let's take a look at the 15 top and bottom players whose projections differ most from their 2007 performance and who accumulated 100 or more plate appearances in 2007:
Table 5: Largest Projected Differences in Performance from 2007 to 2008

                             2007 Results     2008 Projection
Name              Team Lg  NOPS/PF   PA  Age     PA NOPS/PF  Diff
Cody Ross         FLO  NL      143  197   26    201     106   -37
Milton Bradley    SDN  NL      139  169   29    304     108   -31
Daryle Ward       CHN  NL      125  133   32    187     102   -23
Ryan Braun        MIL  NL      132  492   23    264     112   -20
Carlos Pena       TBA  AL      137  612   29    386     117   -20
Ramon Castro      NYN  NL      119  157   31    165     100   -19
David Murphy      TEX  AL      120  110   25     68     101   -19
Barry Bonds       SFN  NL      138  477   42    416     120   -18
Cristian Guzman   WAS  NL      116  192   29    179      99   -17
Josh Hamilton     CIN  NL      120  337   26    181     104   -16
Hank Blalock      TEX  AL      118  232   26    432     102   -16
Jacoby Ellsbury   BOS  AL      116  127   23     68     101   -15
Matt Stairs       TOR  AL      120  405   39    410     105   -15
Jack Cust         OAK  AL      124  507   28    273     109   -15
Mark Teixeira     ATL  NL      137  240   27    645     123   -14
-----------------------------------------------------------------
Chris Woodward    ATL  NL       71  151   31    188      92    21
Joe Crede         CHA  AL       75  178   29    348      97    22
Paul Bako         BAL  AL       70  174   35    152      92    22
Jose Molina       LAA  AL       70  131   32    215      94    24
Andy Gonzalez     CHA  AL       69  215   25    115      94    25
Alexi Casilla     MIN  AL       69  204   22    111      94    25
Michael Barrett   SDN  NL       72  136   30    399      97    25
Shane Costa       KCA  AL       72  109   25    149      97    25
Alberto Callaspo  ARI  NL       69  156   24     98      95    26
Jason LaRue       KCA  AL       66  195   33    240      92    26
Ramon Martinez    LAN  NL       62  147   34    158      92    30
Josh Paul         TBA  AL       64  115   32    118      94    30
Koyie Hill        CHN  NL       65  105   28     70      96    31
Toby Hall         CHA  AL       60  120   31    226      92    32
Ben Zobrist       TBA  AL       52  105   26    117      94    42
At the top of our list we find Cody Ross, who put up some monster numbers (.335/.411/.653) for the Marlins in 2007 despite over 300 very poor plate appearances in his career prior to 2007. His previous record--coupled with relatively sporadic plate appearances that regress his projection pretty sharply towards the mean--still results in a respectable Projected NOPS/PF of 106. This list includes veterans like Milton Bradley, Daryle Ward, Carlos Pena, Cristian Guzman, and Matt Stairs, all of whom performed well in 2007 but also well above the weighted performance level of their previous three seasons. That, combined with their advancing age, depresses their projections. We also have younger players like Ryan Braun, Josh Hamilton, and Jacoby Ellsbury, whose fine performances are heavily regressed to the mean because 2007 was their first season in the major leagues.
On the flip side, we find a whole collection of players who performed poorly in limited playing time in 2007, but who either have little experience and so their projections are regressed up towards the mean (Alberto Callaspo and Alexi Casilla, for example) or who have some track record that makes it seem likely that they'll improve on their 2007 results (Joe Crede, Michael Barrett, Jason LaRue, Ramon Martinez).
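The selection behind Table 5 (a floor of 100 plate appearances in 2007, then a sort on the gap between projection and 2007 result) is easy to sketch. The function name `biggest_movers` and the tuple layout are my own illustrative choices; the first five rows are taken from the tables above, while the last row is a made-up player included only to show the plate appearance floor at work.

```python
def biggest_movers(players, min_pa=100, n=15):
    """Return (largest projected declines, largest projected gains).

    players: [(name, nops_2007, pa_2007, nops_2008_projected), ...]
    """
    eligible = [p for p in players if p[2] >= min_pa]
    # Diff = projected 2008 NOPS/PF minus actual 2007 NOPS/PF.
    ranked = sorted(eligible, key=lambda p: p[3] - p[1])
    return ranked[:n], ranked[-n:]


sample = [
    ("Cody Ross", 143, 197, 106),
    ("Milton Bradley", 139, 169, 108),
    ("Mark Teixeira (ATL)", 137, 240, 123),
    ("Toby Hall", 60, 120, 92),
    ("Ben Zobrist", 52, 105, 94),
    ("Hypothetical Callup", 150, 80, 100),  # invented row: under 100 PA, so excluded
]

decliners, improvers = biggest_movers(sample, n=2)

if __name__ == "__main__":
    for name, actual, _, projected in decliners + improvers:
        print(f"{name:<20} {projected - actual:+d}")
```

With only a handful of rows, `n` has to be small enough that the two slices don't overlap; with a full season's worth of hitters the default of 15 matches the table.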
Finally, in order to give you a feel for what these projections look like over the course of careers, I'll leave you with a few graphs of some interesting players that show their age, actual, and projected NOPS/PF using this system.
Figure 4. Magglio Ordonez, Actual versus Projected
While the projections followed his peak at age 28 and his subsequent decline, they didn't anticipate his monster 2007 campaign.
Figure 5. Alex Rodriguez, Actual versus Projected
A-Rod hit an early peak at 24, but has since upped the ante in two of the last three seasons, causing his projections to wiggle.
Figure 6. Andruw Jones, Actual versus Projected
Jones has been up and down a bit with his age 25 and 28 seasons being his best. He fell off dramatically last year, but the projection expects him to come back a little.
Figure 7. Gary Sheffield, Actual versus Projected
Sheffield certainly showed a meteoric rise from ages 24 through 27 and, after treading water from ages 28 through 30, enjoyed a couple of his finest seasons at ages 31 and 32. Since then his performance has predictably declined, with his projections keeping pace.
Figure 8. Torii Hunter, Actual versus Projected
Sought-after free agent Torii Hunter enjoyed his best season at age 26 although he's been on the rise since his disappointing age-27 campaign.
Figure 9. Ken Griffey Jr., Actual versus Projected
Ken Griffey Jr. had some of his best seasons before the age of 25 and, despite a steady decline from age 27 through age 32, was able to rebound at ages 33 through 35, though his projections were depressed because of his advancing age.
Let the (Exhibition) Games Begin
Despite regular snowfall in my neck of the woods, I get to take solace in the fact that it really is just a week until pitchers and catchers report, with games starting soon after. That should give me just enough time to finish the 1975 World Series.