Checking the Numbers: Knocking in Those Ducks

December 2, 2009

One of my favorite things to do throughout the baseball season is to call upon the Extra Innings television package and watch a multitude of games involving teams I would otherwise never see. Last season, one of my teams was the Washington Nationals, which afforded me the opportunity to see Josh Willingham display his early-season gift for hitting solo home runs. For the first two and a half months of the season, it seemed that was the only situation in which he could launch a long ball. On June 1, Willingham had hit nine home runs, and none of them had come with a runner on base, giving him a jarring lower-third graphic: 9 HR, 12 RBI. The streak eventually extended to 11 solo home runs before he “regressed,” finishing the year with 15 solo jacks out of 24 total, a rate of 62.5 percent. What made this rather curious was that, in spite of the Nationals struggling mightily as a team, their offense really could produce runs. Willingham had the benefit of on-base luminaries like Adam Dunn and Nick Johnson hitting in front of him, as well as Ryan Zimmerman.

At the time, I researched the phenomenon to find the highest percentages of solo home runs at different cut-off points. For instance, of those players with 15 or more blasts in a year, Ken Singleton has a perfect score, knocking exactly 15 balls out of the yard in 1975, none of them with a man aboard. Up the minimum to 20 home runs, and Curtis Granderson’s 2007 season featured 21 of his 23 homers being solo shots; Dave Winfield came in second place with 18 of his 20 being one-man shows in 1974. Of those with 25 or more home runs in a season, Toby Harrah’s 1982 campaign featured a ratio of 22/25 going solo, with Bobby Bonds’ 22 one-run blows out of 26 total in 1970 just a smidge behind. Getting really out there, when the minimum is set to 40 home runs, Richard Hidalgo leads the pack with a 35/44 rate in 2000; if Barry Bonds had been more selfish in the 2003 season than his 35-of-45 split, he’d have another record.

The leaders in each scenario tended to lead games off, which makes intuitive sense since, aside from batting first with nobody on base, they would need the eighth- and ninth-place hitters to reach base later in games; in the senior circuit, those spots are generally reserved for the worst-hitting regular and the pitcher. This is all anecdotal and fun, but it got my mind motoring about RBI in general. Most avid fans understand the dependency of the statistic on teammates. Returning to Willingham, we would expect a player in his situation to knock in a decent number of runners, especially given how often the hitters ahead of him in the order reached base.

To that end, Baseball Prospectus publishes the statistics OBI and OBI%. OBI is the raw number of RBI minus the RBI a hitter collects by driving himself in on home runs; in other words, it counts the runners other than himself that he knocks in. OBI% is the percentage of baserunners knocked in out of the total number of runners on base for a hitter in a given season. In 2009, the entire league averaged 13.9 percent of their baserunners knocked in, which broke down further to 5.0 percent of runners on first, 15.1 percent of runners on second, and 36.4 percent of runners on third. There were a grand total of 187,079 plate appearances, 84,248 of which came with runners on base in any of the various configurations, giving batters the opportunity to knock in others in 45 percent of the aggregate trips to the dish. One of the uses of these stats is finding efficient run producers whose abilities may be masked by batting behind poor hitters, or who hit in a batting slot less traditionally known for cranking out ribbies.
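As a rough illustration of how these percentages fit together, here is a minimal Python sketch. The hitter's counts are invented for the example and the function names are hypothetical rather than anything Baseball Prospectus publishes; only the league plate-appearance totals come from the text above.

```python
from typing import Dict, Tuple

# (runners on that base when the hitter batted, runners he drove in from it)
# These counts are made up for illustration; they are not real player data.
hypothetical_hitter: Dict[str, Tuple[int, int]] = {
    "first": (200, 10),
    "second": (150, 24),
    "third": (80, 30),
}


def pct(numerator: float, denominator: float) -> float:
    """Return numerator/denominator as a percentage (0.0 if no opportunities)."""
    return 100.0 * numerator / denominator if denominator else 0.0


def obi_pct(by_base: Dict[str, Tuple[int, int]]) -> float:
    """Overall OBI%: all runners driven in over all runners on base."""
    runners = sum(on for on, _ in by_base.values())
    driven_in = sum(rbi for _, rbi in by_base.values())
    return pct(driven_in, runners)


def per_base_pct(by_base: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Base-specific rates, analogous to R1BI%, R2BI%, and R3BI%."""
    return {base: pct(rbi, on) for base, (on, rbi) in by_base.items()}


overall = obi_pct(hypothetical_hitter)        # 64 of 430 runners driven in
by_base = per_base_pct(hypothetical_hitter)

# Share of 2009 plate appearances that came with at least one runner on,
# using the league totals quoted in the text.
runners_on_share = pct(84_248, 187_079)

print(f"OBI% {overall:.1f}; by base {by_base}; PA with runners on {runners_on_share:.1f}%")
```

The overall figure is weighted by opportunities, so a hitter who sees an unusual share of runners on third will post a higher OBI% than his base-by-base rates alone would suggest.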

For an example of such a situation, look no further than the 2009 leaders in OBI% with at least 400 plate appearances:


Player               PA    OBI%
Andrew McCutchen    493    19.8
Bobby Abreu         667    19.8
Howie Kendrick      400    19.8
Gordon Beckham      430    19.8
Gerardo Parra       491    19.2
Torii Hunter        506    19.2
Ryan Howard         703    19.2
Hanley Ramirez      652    19.2
Joe Mauer           606    19.2
Todd Helton         645    19.1

McCutchen hit in the leadoff spot in all 108 of the games he played in, yet he managed to lead the league in the percentage of baserunners knocked in. His more traditional RBI total of 54 comes off as solid given the partial season and lineup position, but it understates his in-season success at plating runners. If he’d been able to keep it up while batting with more runners on base in front of him, either through an improved offense or a drop in the batting order, the Bucco rookie’s batting line might have been that much more aesthetically pleasing. We also break the OBI% down by specific bases, leading to another important point worth discussing after the boards:


Player         R1BI%    Player         R2BI%    Player            R3BI%
Ryan Howard    12.9     Yunel Escobar  25.4     Andrew McCutchen  57.5
Jim Thome      11.4     Derrek Lee     25.0     Miguel Tejada     53.7
Pablo Sandoval 10.7     Carlos Lee     24.7     Craig Counsell    53.7
Lyle Overbay   10.4     Torii Hunter   24.4     Todd Helton       50.6
Adam Lind      10.3     Joe Mauer      24.1     Four Tied*        50.0

*: Orlando Cabrera, Delmon Young, Kurt Suzuki, Pedro Feliz

Note Ryan Howard’s inclusion atop the R1BI% leader board. Howard hits the ball harder than any other player in the game, and the bulk of the runners he plates from first come on the heels of homers, which automatically score everyone on base. However, Howard also has the benefit of three smart and speedy runners hitting before him in Jimmy Rollins, Shane Victorino, and Chase Utley. If Rollins stands on first and Howard hits a double down the right-field line, it is usually a ‘gimme’ for Rollins to score; the same cannot be said for, say, Jermaine Dye or Carlos Quentin, or anyone else who happened to hit in front of Thome. By calculating OBI% on the basis of total runners knocked in out of total runners on base, I fear we run the risk of over-crediting the ability of the hitter while understating the value of the speed of the baserunners. Additionally, as has been noted before, the 2007-09 Phillies are the most efficient base-stealing team in major-league history, making the pre-Howard hitters more than capable of putting themselves in a better position to score on one of his hits. For Willingham, teammates like Dunn and Johnson might have reached base aplenty, but they’re also famously lead-footed; with one of them on second, a single to the outfield is in no way a sure thing to result in a run scored, or an RBI.

Kicking the concept around in a conversation with Christina Kahrl led to the idea of calculating OBI% a bit differently, by taking the total number of plate appearances in which a baserunner scored and dividing that figure by the total number of plate appearances when runners were on base. How does this change things? As noted, Bobby Abreu tied McCutchen for the league lead in OBI% at 19.8 percent, but changing gears and using this calculation would peg his rate at 23.1 percent, as he knocked in runners other than himself in 77 PAs out of a grand total of 332 with at least one runner on base. This would not completely rectify the situation (perhaps Abreu did not hit the ball hard enough to plate multiple runners as frequently as Howard), but it does bypass the speed-of-the-runners issue by putting hitters on equal footing in this regard.
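To make the difference between the two denominators concrete, here is a small Python sketch. The six-PA sample is invented and the function names are hypothetical; only Abreu's 77 and 332 come from the paragraph above.

```python
from typing import List, Tuple

# One entry per plate appearance with men on:
# (runners on base, runners other than the batter driven in)
PlateAppearance = Tuple[int, int]


def runner_based_obi_pct(pas: List[PlateAppearance]) -> float:
    """Published-style OBI%: runners driven in over runners on base."""
    runners = sum(on for on, _ in pas)
    driven_in = sum(rbi for _, rbi in pas)
    return 100.0 * driven_in / runners if runners else 0.0


def pa_based_rate(pas: List[PlateAppearance]) -> float:
    """Alternative rate: share of runners-on PAs in which any runner scored."""
    successes = sum(1 for _, rbi in pas if rbi > 0)
    return 100.0 * successes / len(pas) if pas else 0.0


# Invented sample, not real play-by-play data.
sample = [(1, 0), (2, 2), (1, 1), (3, 0), (2, 1), (1, 0)]
runner_rate = runner_based_obi_pct(sample)  # 4 of 10 runners
pa_rate = pa_based_rate(sample)             # 3 of 6 plate appearances

# Abreu's 2009 counts as quoted in the text: 77 successful PAs out of 332.
abreu_rate = 100.0 * 77 / 332

print(f"runner-based {runner_rate:.1f}%, PA-based {pa_rate:.1f}%, Abreu {abreu_rate:.1f}%")
```

Abreu's ratio works out to roughly 23 percent, in line with the figure quoted above. Because the PA-based version counts a two-run double and a one-run single as the same single success, it rewards hitters for cashing in opportunities rather than for the number of runners who happen to cross the plate.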

The ideal situation, to potentially bypass all of these issues, would involve an expected value approach that incorporates the probability of knocking in a runner given the averages at specific bases and compares that to the actual tally. This would not necessarily replace the current iteration of OBI% but would rather add some clarity; a hitter’s OBI% could be adjusted for the poor speed of the preceding hitters who are on base. If he hits a single up the middle on which an average runner would have scored from second base, but Dunn fails to do so, he could still be credited with knocking in a runner.
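A bare-bones version of the expected value idea might look like the following Python sketch. It uses the 2009 league rates by base quoted earlier, applies them to an invented hitter, and compares the result to an invented actual total. It does not attempt the speed adjustment or any breakdown by outs, both of which a fuller model would need; the function names are hypothetical.

```python
from typing import Dict

# League-wide 2009 rates by base, taken from the figures quoted in the article.
LEAGUE_RATE: Dict[str, float] = {"first": 0.050, "second": 0.151, "third": 0.364}


def expected_obi(runners_on: Dict[str, int]) -> float:
    """Runners a league-average hitter would drive in given these opportunities."""
    return sum(LEAGUE_RATE[base] * count for base, count in runners_on.items())


def obi_above_expected(runners_on: Dict[str, int], actual_obi: int) -> float:
    """Actual OBI minus the league-average expectation for the same chances."""
    return actual_obi - expected_obi(runners_on)


# Invented opportunity counts and OBI total, for illustration only.
runners_on = {"first": 200, "second": 150, "third": 80}
actual_obi = 64

expected = expected_obi(runners_on)
surplus = obi_above_expected(runners_on, actual_obi)

print(f"expected {expected:.2f}, actual {actual_obi}, above expected {surplus:+.2f}")
```

Swapping the single league rate per base for a table keyed on base, outs, and some measure of runner speed would move this closer to the approach described above, at the cost of needing play-by-play data.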

Ultimately, however, two of the goals of any statistic are to accurately model the game of baseball and to be consistent on some level from a predictive point of view. Front-office types in both real and fantasy leagues want to know what to expect next year, which is one of the reasons to use component-based ERAs as opposed to plain vanilla ERA. The OBI approach makes much more sense than raw RBI totals given the dependency of the individual on the team, but what does an intra-class correlation think of the three different methods here: the raw RBI total, the OBI%, or the percentage of plate appearances with runners on that result in any number of runs knocked in? From 2005-09, OBI% produced an ICC of 0.34, with the newer idea falling slightly behind it at 0.29.

The raw RBI total was each of their “daddies,” with an ICC of 0.63. At that juncture, it dawned on me that the raw RBI total includes home runs, which are consistent in nature and should be subtracted out before the figure is put through the wringer. The resulting raw OBI figure managed to beat its predecessor with a 0.72 ICC, essentially suggesting that over a five-year span, regardless of how the player rates at knocking in baserunners by percentage, his raw figure is going to be more consistent than anything else. This does not imply that he is as skilled at driving in runners, especially given the much lower correlations for the OBI percentages, but rather that the actual raw tally should stay roughly the same.
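For readers who want to try this kind of consistency check themselves, here is a minimal Python sketch of an intra-class correlation. The article does not say which ICC form was used, so this assumes the simplest one-way random-effects version, with each row a player and each column one of his seasons. The numbers are invented and the function name is hypothetical.

```python
from typing import List


def icc_oneway(rows: List[List[float]]) -> float:
    """One-way random-effects ICC: (MSB - MSW) / (MSB + (k - 1) * MSW).

    rows holds one list per player, each with the same number k of seasons.
    """
    n = len(rows)
    k = len(rows[0]) if rows else 0
    if n < 2 or k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least 2 players with the same 2+ seasons each")

    grand_mean = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]

    # Between-player and within-player sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for r, m in zip(rows, row_means) for x in r)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("no variation in the data; ICC is undefined")
    return (ms_between - ms_within) / denominator


# Invented OBI totals for three players over two seasons each.
obi_by_player = [[10, 12], [20, 18], [30, 33]]
icc = icc_oneway(obi_by_player)
print(f"ICC = {icc:.3f}")
```

A value near 1 means most of the variation is between players rather than between one player's seasons, which is the sense in which a stat is called consistent here.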

Of course, it would not be accurate to declare a “winner,” because we have yet to build a metric based on the aforementioned expected value and speed adjustment approach, which in theory would more accurately model reality and describe a batter’s ability to drive in runs. Given the various team dependencies, perhaps RBI, more so than batting average, is equivalent to pitcher W–L records in terms of limited utility, but while the latter can be rectified to an extent through adjustments, OBI% on its own does not necessarily succeed 100 percent. It needs a bit more clarity to better reflect what happens when a ball is put in play with runners on base.

So I throw it out to the BP faithful, assuming you even care about metrics measuring the ability to plate runners, for your thoughts on the ideas involving expected values and adjustments for speed. They seem worth exploring and worthy of perfecting in order to bypass the various adjustments needed to make the performance area quantifiably meaningful.

Eric Seidman

dianagramr

12/02

This kind of analysis, and tinkering with it as time goes on, is one of the biggest reasons I love BP.

newsense

12/02

The expected values should probably be based upon the components of EqBRR for each runner that address advancing on hits and outs. The problem is that these are confounded by factors that also affect OBI%: whether a runner on second scores on a single has as much to do with the quality of the hit (line drive vs. infield slow roller) as the runner's skill in advancing.

hjw099

12/02

Perhaps a simple average of the old OBI% divisor (number of men on) and the new OBI% divisor (PAs with men on) would work, with an EqBRR+ (team's average relative to league) weight added. Maybe you'd have to weight it more toward baserunners rather than using the simple average because of the significant difference between 1 man on (usually a single/walk) versus multiple men (always at least 1 RISP). That seems too simple though, and doesn't address the problem of distinguishing Michael Bourn singles-type runners versus Granderson power-type runners who would already be advanced.

Regardless, I very much enjoyed the article and look forward to minds more capable than mine of working this BI statistic problem.

joeboxr36

12/02

These ads that expand themselves when you mouse over are extremely annoying. Just saying.

MJMcC0

12/02

I had complained about them some time ago and got a very positive reply from BP's tech staff, but they said they needed a screen print to isolate which ads were causing the problem. Needless to say, it didn't happen again for several days, by which time I had discarded the staff's E-mail.

eighteen

12/02

Same here.

Does anyone else think the pop-ups have reduced significantly the past couple weeks? Seems that way to me.

MJMcC0

12/02

An alternative adjustment to the ones Eric discusses is to normalize for the kind of batted-in opportunities batters have faced. It's a little like what we know is true of BABIP, that line drives generate more hits than fly balls. Similarly, runners on third are more likely to be batted in than runners on first. If a particular batter faces an above-average number of R3 opportunities, we should expect him to have a higher OBI% but it would not be meaningful. For example, I took the OBI numbers from last September and recomputed an Adjusted OBI%, giving every batter the same proportion of runners on 1st, 2nd, and 3rd (49%, 33%, and 18% respectively). Among the big movers upward in performance were Brian McCann (from 18.6% to 19.8% [rank: 25 to 8]) and Yorvit Torrealba (18.2% to 20.8% [36 >> 5]). The downward movers include Bobby Abreu -- of the famed bat control -- (19.9% >> 18.2% [7 >> 35]) and Todd Helton (18.8% >> 17.0% [24 >> 68]). Because the proportion of runners in any particular configuration is largely a matter of chance, I would anticipate that this adjusted OBI% is more consistent than straight OBI%.
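The normalization described in this comment can be sketched in a few lines of Python. The 49/33/18 mix is the one the commenter quotes; the hitter's counts are invented and the function names are hypothetical, so this is an illustration of the idea rather than the commenter's actual calculation.

```python
from typing import Dict, Tuple

# Standard mix of runners on first, second, and third quoted in the comment.
STANDARD_MIX: Dict[str, float] = {"first": 0.49, "second": 0.33, "third": 0.18}


def raw_obi_pct(by_base: Dict[str, Tuple[int, int]]) -> float:
    """Unadjusted OBI%: total runners driven in over total runners on base."""
    runners = sum(on for on, _ in by_base.values())
    driven_in = sum(rbi for _, rbi in by_base.values())
    return 100.0 * driven_in / runners


def adjusted_obi_pct(by_base: Dict[str, Tuple[int, int]]) -> float:
    """Reweight the hitter's own per-base rates to the standard mix."""
    adjusted = 0.0
    for base, weight in STANDARD_MIX.items():
        on, rbi = by_base[base]
        if on == 0:
            raise ValueError(f"no runners on {base}; per-base rate is undefined")
        adjusted += weight * (100.0 * rbi / on)
    return adjusted


# Invented hitter who saw an unusually large share of runners on third:
# (runners on that base, runners driven in from it)
r3_heavy_hitter = {"first": (150, 8), "second": (120, 18), "third": (130, 50)}

raw = raw_obi_pct(r3_heavy_hitter)
adjusted = adjusted_obi_pct(r3_heavy_hitter)
print(f"raw OBI% {raw:.1f}, adjusted OBI% {adjusted:.1f}")
```

In this made-up case the raw figure is inflated by the extra chances with a man on third, and the adjusted figure drops once every base is weighted the same way for everyone.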

DrDave

12/02

I did some work a few years back using linear programming to identify, for a fixed OPS, what batting line would lead to the most/least RBI for a player who ended up batting in a league-average mix of base-out situations, and got league-average runner advancement on his singles and doubles. I don't have those results with me, but there was a surprising amount of 'play' in RBI rate for a fixed OPS, even if you limited yourself to batting lines that might occur in MLB.

blcartwright

12/02

Reminds me of way back in 1973, before anyone had heard of computers, but during a multi-year stretch when I scored every Pirates game, then spent the winter (class time) divining some kind of stats from them.

In that year of 1973, Willie Stargell hit 30 come-from-behind home runs out of 44. Thirty times his homer came with the Pirates trailing, and either tied the score or put the Pirates ahead. And Pete Rose won the MVP.

BurrRutledge

12/03

Apparently 'clutch' goes in and out of vogue, but 'grit' is always in style.

ckahrl

12/04

That's because it always gets caught in the gears, whereas a clutch doesn't always pop.

davestasiuk

12/04

*rimshot*

brucegilsen

12/06

I'm probably being a moron here, but I've seen nothing about the number of outs. Doesn't that matter? If you hit with less than 2 out, you can get an RBI without a hit, which is *much* easier than with 2 out. Given sample sizes for a season of plate appearances, couldn't that matter?

EJSeidman

12/07

Bruce,

Yeah, that would be included in any expected value approach. If RBIs occur 4% of the time with runners on third and two outs, then such a runner knocked in would be counted differently than if the runner was on third with one out.
