December 2, 2009
Checking the Numbers
Knocking in Those Ducks
One of my favorite things to do throughout the baseball season is to call upon the Extra Innings television package and watch a multitude of games for teams I would otherwise never watch. Last season, one of my teams was the Washington Nationals, which afforded me the opportunity to see Josh Willingham display his early-season gift for hitting solo home runs. For the first two and a half months of the season, it seemed that was the only situation in which he could launch a long ball. On June 1, Willingham had hit nine home runs, and none of them had come with a runner on base, giving him a jarring lower-third graphic: 9 HR, 12 RBI. The streak eventually extended to 11 solo home runs before he "regressed," finishing the year with 15 solo jacks out of 24 total, a rate of 62.5 percent. What made this rather curious was that, in spite of the Nationals struggling mightily as a team, their offense really could produce runs. Willingham had the benefit of on-base luminaries like Adam Dunn and Nick Johnson hitting in front of him, as well as Ryan Zimmerman.
At the time, I researched the phenomenon to find the highest percentages of solo home runs at different cut-off points. For instance, of those players with 15 or more blasts in a year, Ken Singleton has a perfect score, knocking exactly 15 balls out of the yard in 1975, none of them with a man aboard. Up the minimum to 20 home runs, and Curtis Granderson's 2007 season leads with 21 of his 23 homers being solo shots; Dave Winfield came in second place with 18 of his 20 being one-man shows in 1974. Of those with 25 or more home runs in a season, Toby Harrah's 1982 campaign featured a ratio of 22/25 going solo, with Bobby Bonds' 22 one-run blows out of 26 total in 1970 just a smidge behind. Getting really out there, when the minimum is set to 40 home runs, Richard Hidalgo leads the pack with a 35/44 rate in 2000; if Barry Bonds had been more selfish in the 2003 season than his 35-of-45 split, he'd have another record.
The leaders in each scenario tended to lead games off, which makes intuitive sense since, aside from batting first with nobody on base, they would need the eighth- and ninth-place hitters to reach base later in games; in the senior circuit, those spots are generally reserved for the worst-hitting regular and the pitcher. This is all anecdotal and fun, but it got my mind motoring about RBI in general. Most avid fans understand the dependency of the statistic on teammates. Reverting to Willingham, we would expect a player in his situation to knock in a decent number of runners, especially given the success at reaching base of the hitters before him in the order.
To that end, Baseball Prospectus publishes the statistics OBI and OBI%. OBI measures the raw number of RBI minus individual RBI that result from home runs, and OBI% is the percentage of baserunners knocked in out of the total number of runners on base for a hitter in a given season. In 2009, the entire league averaged 13.9 percent of their baserunners knocked in, which broke down further to 5.0 percent of runners on first, 15.1 percent of runners on second, and 36.4 percent of runners on third. There were a grand total of 187,079 plate appearances, 84,248 of which came with runners on base in any of the various configurations, providing batters the opportunity for knocking in others in 45 percent of the aggregate trips to the dish. One of the uses of these stats is being able to find efficient run-producers whose abilities may be masked by batting behind poor hitters, or who hit in a batting slot less traditionally known for cranking out ribbies.
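As a rough illustration of the two definitions above, here is a minimal Python sketch. It assumes that "individual RBI that result from home runs" means the batter's own run on each homer, so OBI is simply RBI minus home runs; the function names are hypothetical and not part of any Baseball Prospectus tool, and the second hitter's totals are invented for the example.

```python
def obi(rbi, home_runs):
    """Others batted in: RBI minus the batter's own run on each homer.

    Assumption: one RBI per home run is credited to the batter himself.
    """
    return rbi - home_runs


def obi_pct(rbi, home_runs, runners_on):
    """Percentage of baserunners driven in out of all runners on base."""
    return 100.0 * obi(rbi, home_runs) / runners_on


# Willingham's June 1 line from the article: 9 HR (all solo), 12 RBI.
willingham_obi = obi(rbi=12, home_runs=9)

# League-wide share of plate appearances with runners on, 2009 totals.
pa_total = 187_079
pa_with_runners = 84_248
opportunity_share = 100.0 * pa_with_runners / pa_total

# Hypothetical hitter (invented numbers): 90 RBI, 20 HR, 400 runners on.
example_pct = obi_pct(rbi=90, home_runs=20, runners_on=400)

print(willingham_obi, round(opportunity_share, 1), example_pct)
```

Under that reading, Willingham's nine solo homers and 12 RBI leave him with just three teammates driven in, which is what made the graphic so jarring.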
For an example of such a situation, look no further than the 2009 leaders in OBI% with at least 400 plate appearances:
Player             PA    OBI%
Andrew McCutchen   493   19.8
Bobby Abreu        667   19.8
Howie Kendrick     400   19.8
Gordon Beckham     430   19.8
Gerardo Parra      491   19.2
Torii Hunter       506   19.2
Ryan Howard        703   19.2
Hanley Ramirez     652   19.2
Joe Mauer          606   19.2
Todd Helton        645   19.1
McCutchen hit in the leadoff spot in all 108 of the games he played in, yet he managed to lead the league in the percentage of baserunners knocked in. His more traditional RBI total of 54 comes off as solid given the partial season and lineup position, but it understates his in-season success at plating runners. If he'd been able to keep it up while batting with more runners on base in front of him, either through an improved offense or a drop in the batting order, the Bucco rookie's batting line might have been that much more aesthetically pleasing. We also break the OBI% down by specific bases, leading to another important point worth discussing after the boards:
Player           R1BI%   Player          R2BI%   Player             R3BI%
Ryan Howard      12.9    Yunel Escobar   25.4    Andrew McCutchen   57.5
Jim Thome        11.4    Derrek Lee      25.0    Miguel Tejada      53.7
Pablo Sandoval   10.7    Carlos Lee      24.7    Craig Counsell     53.7
Lyle Overbay     10.4    Torii Hunter    24.4    Todd Helton        50.6
Adam Lind        10.3    Joe Mauer       24.1    Four Tied*         50.0
*: Orlando Cabrera, Delmon Young, Kurt Suzuki, Pedro Feliz
Note Ryan Howard's inclusion atop the R1BI% leader board. Howard hits the ball harder than any other player in the game, and the bulk of these RBI come on the heels of homers, which automatically plate runners. However, Howard has the benefit of three smart and speedy runners hitting before him in Jimmy Rollins, Shane Victorino, and Chase Utley. If Rollins stands on first and Howard hits a double down the right-field line, it is usually a 'gimme' for Rollins; the same cannot be said for, say, Jermaine Dye or Carlos Quentin, or anyone else who happened to hit in front of Thome. By calculating OBI% on the basis of total runners knocked in out of total runners on base, I fear we run the risk of over-crediting the ability of the hitter while understating the value of the speed of the baserunners. Additionally, as has been noted before, the 2007-09 Phillies are the most efficient base-stealing team in major-league history, making the pre-Howard hitters more than capable of putting themselves in a better position to score on one of his hits. For Willingham, teammates like Dunn and Johnson might have reached base aplenty, but they're also famously lead-footed; with one of them on second, a single to the outfield is in no way a sure thing to result in a run scored, or an RBI.
Kicking the concept around in a conversation with Christina Kahrl led to the idea of calculating OBI% a bit differently, by taking the total number of plate appearances in which a baserunner scored and dividing that figure by the total number of plate appearances when runners were on base. How does this change things? As noted, Bobby Abreu tied McCutchen for the league lead in OBI% at 19.8 percent, but changing gears and using this calculation would peg his success rate at 23.1 percent, having knocked in runs other than himself in 77 PAs out of a grand total of 332 with at least one runner on base. This would not completely rectify the situation (perhaps Abreu did not hit the ball hard enough to plate multiple runners as frequently as Howard), but it does bypass the speed-of-the-runners issue by putting hitters on equal footing in this regard.
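To make the difference between the two calculations concrete, the following Python sketch runs both over the same small log of plate appearances. The log is invented for illustration, and the function names are hypothetical; only Abreu's 77 and 332 come from the article.

```python
# Each tuple: (runners on base, runners other than the batter who scored).
# Invented sample of six plate appearances with at least one runner on.
plate_appearances = [(1, 0), (2, 2), (3, 3), (1, 0), (2, 1), (1, 0)]


def runner_based_rate(pas):
    """OBI%-style rate: runners driven in divided by runners on base."""
    runners = sum(on for on, _ in pas)
    scored = sum(s for _, s in pas)
    return 100.0 * scored / runners


def pa_based_rate(pas):
    """Alternative rate: PAs in which any runner scored, divided by
    PAs with at least one runner on base."""
    chances = [p for p in pas if p[0] > 0]
    successes = [p for p in chances if p[1] > 0]
    return 100.0 * len(successes) / len(chances)


runner_rate = runner_based_rate(plate_appearances)
pa_rate = pa_based_rate(plate_appearances)

# Abreu's 2009 figures from the article: 77 successful PAs out of 332.
abreu_rate = 100.0 * 77 / 332

print(runner_rate, pa_rate, round(abreu_rate, 2))
```

In the sample, clearing the bases once counts three times in the runner-based rate but only once in the PA-based rate, which is why the second version rewards frequency of success rather than the number of runners plated per hit. Abreu's ratio works out to roughly 23 percent.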
The ideal situation, to potentially bypass all of these issues, would involve an expected value approach that works to incorporate the probabilities of knocking in a runner given the averages at specific bases and comparing that to the actual tally. This would not necessarily replace the current iteration of OBI% but would rather add some clarity; a hitter's OBI% could be adjusted for the poor speed of preceding hitters and runners on base. If he hits a single up the middle on which an average runner would have scored from second base, but Dunn fails to do so, he could still be credited with knocking in a runner.
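A bare-bones version of the expected value idea might look like the Python sketch below. It uses the 2009 league rates by base quoted earlier (5.0, 15.1, and 36.4 percent) and compares a hitter's actual OBI to what an average hitter would have produced with the same runners. The hitter's runner counts and OBI total are invented, the function names are hypothetical, and the speed adjustment is left out entirely, so treat this as a starting point rather than a finished metric.

```python
# 2009 league-average rates of driving in a runner, by base (from the article).
LEAGUE_RATE = {"first": 0.050, "second": 0.151, "third": 0.364}


def expected_obi(runners_by_base, rates=LEAGUE_RATE):
    """Runners an average hitter would drive in given these opportunities."""
    return sum(rates[base] * count for base, count in runners_by_base.items())


def obi_above_expected(actual_obi, runners_by_base, rates=LEAGUE_RATE):
    """Actual runners driven in minus the league-average expectation."""
    return actual_obi - expected_obi(runners_by_base, rates)


# Hypothetical hitter (invented numbers): runners seen on each base, and OBI.
hypo_runners = {"first": 200, "second": 120, "third": 60}
hypo_actual_obi = 60

hypo_expected = expected_obi(hypo_runners)
hypo_surplus = obi_above_expected(hypo_actual_obi, hypo_runners)

print(round(hypo_expected, 2), round(hypo_surplus, 2))
```

A speed adjustment could, in principle, replace the flat league rates with rates that depend on who the runner is, but that requires play-by-play data not shown here.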
Ultimately, however, two of the goals of any statistic are to accurately model the game of baseball, and to be consistent on some level from a predictive point of view. Front-office types in both real and fantasy leagues want to know what to expect next year, which is one of the reasons to use component-based ERAs as opposed to plain vanilla ERA. The OBI approach makes much more sense than raw RBI totals given the dependency of the individual on the team, but what does an intra-class correlation think of the three different methods here: the raw RBI total, the OBI%, or the percentage of plate appearances with runners on resulting in any number of runs knocked in? From 2005-09, OBI% produced an ICC of 0.34, with the newer idea falling slightly behind it at 0.29.
The raw RBI total was each of their "daddies," with an ICC of 0.63. At that juncture, it dawned on me that the raw RBI total involved home runs, which are consistent in nature and should be subtracted out before being put through the wringer. The resulting raw OBI figure managed to beat its predecessor with a 0.72 ICC, essentially suggesting that over a five-year span, regardless of how the player rates as far as knocking in baserunners by a percentage, his raw figure is going to be more consistent than anything else. This does not imply that he is as skilled at driving in runners, especially with the much lower correlations for the OBI percentages, but rather that the actual raw tally should stay the same.
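For readers unfamiliar with the intra-class correlation, here is a minimal Python sketch of one common form, the one-way random-effects ICC for a balanced table (every player has the same number of seasons). The article does not say which ICC variant was used, so this choice is an assumption, and the sample data are invented.

```python
def icc_oneway(rows):
    """One-way random-effects ICC for a balanced table.

    rows: one list per player, each holding the same number k of season values.
    Returns (MSB - MSW) / (MSB + (k - 1) * MSW).
    """
    n = len(rows)
    k = len(rows[0])
    grand_mean = sum(sum(r) for r in rows) / (n * k)
    player_means = [sum(r) / k for r in rows]

    # Variation between players versus variation within each player's seasons.
    ss_between = k * sum((m - grand_mean) ** 2 for m in player_means)
    ss_within = sum((x - m) ** 2 for r, m in zip(rows, player_means) for x in r)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented data: three players, two seasons each.
stable = [[10, 10], [20, 20], [30, 30]]      # every player repeats exactly
unstable = [[10, 30], [20, 20], [30, 10]]    # all variation is within players

icc_stable = icc_oneway(stable)
icc_unstable = icc_oneway(unstable)

print(icc_stable, icc_unstable)
```

Values near 1 mean players differ from each other far more than they differ from themselves year to year, which is the sense in which raw OBI at 0.72 is more "consistent" than OBI% at 0.34.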
Of course, it would not be accurate to declare a "winner," because we have yet to build a metric based on the aforementioned expected value and speed adjustment approach, which in theory would more accurately model reality and describe a batter's ability to drive in runs. Given the various team dependencies, perhaps RBI more so than batting average is equivalent to pitcher W-L records in terms of limited utility, but while the latter can be rectified to an extent through adjustments, OBI% on its own does not necessarily succeed 100 percent. It needs a bit more clarity to better reflect what happens when a ball is put in play with runners on base.
So I throw it out to the BP faithful, assuming you even care about metrics measuring the ability to plate runners, for your thoughts on the ideas involving expected values and adjustments for speed. They seem worth exploring and worthy of perfecting in order to bypass the various adjustments needed to make the performance area quantifiably meaningful.