May 11, 2011
Prospectus Hit and Run
Hot Starts Revisited (Call the Doctor)
This past weekend in Chicago, I bent elbows with several fellow baseball writers at Miller's Pub, a venue lined with old baseball and celebrity photos, chosen because it was a favorite haunt of the late Bill Veeck. Among those I had the opportunity to talk to was Baseball Prospectus co-founder Rany Jazayerli. Quite rightly, BP's resident dermatologist/Royals fan was buzzing about his team's decision to recall elite hitting prospect Eric Hosmer. "Hosmer is now a Royal, the future is now the present, and The Process is now being judged by results at the major-league level," wrote Rany at his blog earlier that day.
As our discussion turned to the Royals' last winning season (2003), I brought up my recent citation of a favorite work of Rany's, namely his "Hot Starts" series (Parts I, II, and III), which I'd mentioned during a couple of recent radio hits. In that series, Rany celebrated the 2003 Royals' 14-3 start and attempted to answer the question he posed in Part I's subtitle, "Should Royals Fans Get Excited Yet?" He pored over nearly 70 seasons' worth of data (1930-1999, less the 1981 and 1994 strike seasons) in an attempt to determine the point at which a strong burst from the gate attains significance, and found that the 30-game mark represented a useful milestone:
With every team having passed the 30-game mark sometime in the past week, it's a fine time to break out the doctor's kit and examine this year's slate. What Rany found via a series of regressions is that a team's final winning percentage for the current season could be projected with some accuracy via a two-step process involving their record thus far as well as those of the previous three seasons.
For the first part, a team's final winning percentage (P) for the current season as projected at the outset of the year, the formula is P = .1557 + (.4517 * X1) + (.1401 * X2) + (.0968 * X3), where X1 is last year's winning percentage, X2 is that from two years ago, and X3 is that from three years ago. Also incorporated into the mix is a .500 record, which accounts for what Bill James called "the Whirlpool Effect," the tendency of teams to be drawn towards .500 over time like water drawn to a drain. That tendency, represented in the first term of the equation (.1557 is .3114 * .500), accounts for 31.14 percent of a team's projected winning percentage. Last year's winning percentage is weighted at 45.17 percent, the percentage from two years ago at 14.01 percent, and that from three years ago at 9.68 percent. A team playing .600 ball in each of the previous three seasons would be pulled towards .500, expected to win at a .569 clip.
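To make the arithmetic concrete, here is a minimal Python sketch of the preseason formula as quoted above. The function and constant names are my own, chosen for illustration; only the coefficients come from Rany's series.

```python
# Coefficients as quoted in the text. The intercept is the "whirlpool"
# term: a .500 record weighted at 31.14 percent (0.3114 * 0.500).
INTERCEPT = 0.1557
W_LAST_YEAR = 0.4517
W_TWO_YEARS_AGO = 0.1401
W_THREE_YEARS_AGO = 0.0968


def preseason_projection(x1, x2, x3):
    """Projected final winning percentage P.

    x1, x2, x3 are the winning percentages from one, two, and three
    seasons ago (hypothetical parameter names, matching X1, X2, X3).
    """
    return (INTERCEPT
            + W_LAST_YEAR * x1
            + W_TWO_YEARS_AGO * x2
            + W_THREE_YEARS_AGO * x3)


# A team that played .600 ball in each of the last three seasons.
print(f"{preseason_projection(0.600, 0.600, 0.600):.3f}")  # 0.569
```

Feeding in three straight .500 seasons returns .500, which is a quick check that the four weights sum to 100 percent.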
For the second part, the team's projected year-end winning percentage (Y) as seen from a given point in that season, the formula is Y = P + ((S - P) * (.0415 + (.0096 * G))), where P is the aforementioned projected winning percentage, S is the team's actual winning percentage at the point in question, and G is the number of games the team has played to date. Basically, it's the projected percentage plus an over/underachievement factor whose impact increases with the number of games played. If a team initially projected to finish at .500 plays at a .600 clip through the first 30 games, it would thus be expected to finish at .533 (that's .500 + (.100 * (.0415 + (.0096 * 30)))). If the team is still playing .600 ball at the 40-game mark, the final expected percentage bumps to .543; at the 50-game mark, it becomes .552.
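The in-season step can be sketched the same way. This is an illustrative Python version under the same caveat: the function name and parameter names are mine, and the coefficients are the ones quoted in the formula for Y.

```python
def in_season_projection(p, s, g):
    """Projected year-end winning percentage Y.

    p: preseason projected winning percentage (P)
    s: actual winning percentage so far (S)
    g: games played to date by the team (G)
    """
    # Share of the over/underachievement (s - p) that is believed;
    # it grows linearly with games played.
    weight = 0.0415 + 0.0096 * g
    return p + (s - p) * weight


# A team projected at .500 that is playing .600 ball.
for games in (30, 40, 50):
    print(games, f"{in_season_projection(0.500, 0.600, games):.3f}")
# 30 0.533
# 40 0.543
# 50 0.552
```

One thing the sketch makes visible: the weight passes 1.0 at roughly the 100-game mark, so the formula as quoted appears suited to the early part of the season rather than the stretch run.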
Bearing all of that in mind, here are the projected final standings based on the previous three-years-plus records through Monday:
The system projects all of the teams currently atop their divisions to end up winning them, a surprising enough result (we'll get to some history momentarily). That is to say, their strong starts are significant, a finding which should particularly buoy Indians fans given their team's sprint from the gate. Not only are the Indians overachieving by the widest margin of any team (214 points), they're doing so in a division that also features the majors' two biggest underachievers, the Twins (-181 points) and White Sox (-133 points). While the Tigers are hitting their mark exactly, the Royals don't appear to have the juice to escape another sub-.500 season despite overachieving by the AL's second-highest margin (86 points). Of course, this method takes no notice of personnel changes, and while the Hosmer move is something to get excited about in the grand scheme, it's worth noting that PECOTA's 2011 outlook for Hosmer (.274/.329/.443) is less sunny than that for the man he replaced, Kila Ka'aihue (.263/.387/.472), in case you're waiting for the Royals' Depth Chart-based Playoff Odds to leap skyward.
In the AL East, the system suggests not only that the Yankees will hold their spot atop the division, but that the Rays will claim the Wild Card by staving off the Red Sox, who dug themselves a 2-10 hole and have yet to spend a day at .500, let alone above it. In the AL West, the Angels appear poised to return to the catbird seat they occupied five times in a six-year span from 2004-2009, with the Rangers returning to earth in large part due to Josh Hamilton's injury, Adrian Beltre's struggles, and the inability of Colby Lewis to live up to last year's strong showing.
Turning to the NL, note that while nobody is actually surprised that the Phillies lead the East given the strength of their rotation, they nonetheless rate as the league's most overachieving team, exceeding their expected winning percentage by 115 points. In the only deviation from the current standings, the Braves are expected to surpass the Marlins for the NL Wild Card spot, though it's worth noting that the final three-point gap between the two teams is only about half a game. In the Central, the Cardinals are overachieving to a greater extent than the defending champion Reds, while the Brewers have dug themselves a significant hole and rate as the league's third-worst underachievers, behind the Astros, who still have that 86-win season from 2008 propping them up, and the Padres, who are burdened by the expectations produced last year when Adrian Gonzalez still wore their colors. The NL West appears to have the majors' tightest race on its hands, with the Rockies rating as the slight favorite, in this context at least, over the defending world champion Giants.
Given that it's been eight years since Rany cobbled this system together, and 11 since any new data went into it, it's fair to ask how well the system works. To evaluate this, I calculated pre-season projected winning percentages (P, from above — wait, that sounds even kinkier than the stuff of Emma Span's debut) for every team going back to 2003, which is to say, using full-season winning percentages back to 2000. I then delved into Baseball-Reference's historical standings for the years 2003-2010 to find the best match for the current point in the schedule. Through Monday, a total of 518 games had been played, 21.3 percent of the season; the dates I selected featured totals ranging from 515 to 525 games, with an average of 520, 21.4 percent of the schedule—roughly 35 games per team.
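For anyone who wants to check the schedule math, here is a small Python sketch. It assumes the standard 2011 setup of 30 teams playing 162 games apiece (2,430 games in all); the function names are mine.

```python
TEAMS = 30
GAMES_PER_TEAM = 162
# Each game involves two teams, so the league schedule is half the
# sum of every team's games: 2,430.
TOTAL_GAMES = TEAMS * GAMES_PER_TEAM // 2


def share_of_schedule(games_played):
    """Fraction of the league-wide schedule completed."""
    return games_played / TOTAL_GAMES


def games_per_team(games_played):
    """Average games played per team for a league-wide game count."""
    return 2 * games_played / TEAMS


print(f"{share_of_schedule(518):.1%}")  # 21.3%
print(f"{share_of_schedule(520):.1%}")  # 21.4%
print(f"{games_per_team(520):.1f}")     # 34.7
```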
I then tallied the number of correct playoff predictions—ignoring the distinction between Wild Card and division winner, because it's a moot point come October—via three methods: the as-is standings on a given date, the preseason projected winning percentages (Rany's method, not PECOTA's), and the in-season projections produced by the method. The results show the power of a system that can account for multiple types of information:
Half of the teams in position to reach the playoffs at a point in the season comparable to this one did so, though the performance was uneven from year to year, ranging from a low of two teams (2008) to a high of six (2003). The correlation between those teams' records at that juncture and their final records was a strong .66. That's better than what was predicted using only the preseason projections, which weren't even right half the time, and which ranged from a low of one (2007) to a high of six (2003 and 2009); not surprisingly, the more recent data about a team's performance does have value. When the preseason projections and records at the 35-ish game mark are combined via Rany's method, the rate of correct projections improves, as does the correlation.
The method is hardly foolproof; based upon track record, we should expect 4.5 of the eight teams in line for the playoffs to hold serve. But it's still more accurate than a simple eyeballing of the standings, or reliance solely upon recent history. Considering we're not yet at the one-quarter mark of the season, that's not too shabby at all.