If you’ve followed us at Baseball Prospectus for any length of time, you’re probably familiar with Equivalent Average, or EqA, one of our signature hitting stats. If you’re not, here’s the skinny: it expresses how many runs a player creates per plate appearance, translated to the familiar and easy-to-understand scale of batting average.
A .350 mark is outstanding; last year Albert Pujols (.368) and Joe Mauer (.346) led their respective leagues. A .300 mark is very good; last year Justin Upton and Jorge Posada both put up .301 EqAs. A .260 EqA is, by definition, league average; Rafael Furcal (.262) and Stephen Drew (.259) were both right around that mark. A .230 mark is replacement level, the caliber of what a waiver-wire pickup or a Class AAA player could provide; a team has almost nothing to lose by trying someone else in place of a player at this level. Note that the Rockies’ Garrett Atkins (.230) and the Marlins’ Emilio Bonifacio (.228) both lost their starting jobs last year.
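The benchmarks above can be treated as a rough lookup table. The sketch below is a hypothetical helper (the name `nearest_benchmark` and the "closest mark wins" rule are our assumptions, not anything BP publishes) that labels a TAv by whichever quoted benchmark it sits nearest, which matches how the examples are grouped: Drew's .259 reads as average, Bonifacio's .228 as replacement level.

```python
# Benchmarks quoted in the article; the "nearest mark" rule below is an
# illustrative assumption, not an official BP classification.
BENCHMARKS = {
    0.350: "outstanding",
    0.300: "very good",
    0.260: "league average",
    0.230: "replacement level",
}


def nearest_benchmark(tav: float) -> str:
    """Return the label of the benchmark closest to the given TAv/EqA."""
    closest = min(BENCHMARKS, key=lambda mark: abs(mark - tav))
    return BENCHMARKS[closest]


# 2009 figures cited in the text.
examples = {"Pujols": 0.368, "Upton": 0.301, "Furcal": 0.262,
            "Drew": 0.259, "Atkins": 0.230, "Bonifacio": 0.228}
for player, tav in examples.items():
    print(f"{player:<10} {tav:.3f}  {nearest_benchmark(tav)}")
```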
There’s a lot of sausage grinding involved in turning hits, walks, total bases, stolen bases, caught stealing and other data into this batting average-like form. We even build park and league adjustments into the formula, so that a .300 EqA in hitter-friendly Coors Field has the same impact on scoring as it does in pitcher-friendly Petco Park, and a .300 today has the same impact as it did in the low-scoring 1960s or the high-scoring 1930s. It’s all worthwhile, because EqA does a much better job of predicting scoring levels than batting average, on-base percentage, slugging percentage, OPS (on-base plus slugging), OPS+, and more complicated run estimators.
This spring, we at BP have chosen to rebrand EqA as True Average (abbreviated TAv). Why? Because we feel strongly that the new name underscores how the stat gives a “truer” grasp of a hitter’s quality than the traditional and more modern stats mentioned above. Quite frankly, we hope this simple, easy-to-remember name can reach a wider audience.
The best way for those unacquainted to understand True Average is to look at several examples using our 2010 PECOTA hitter projections. Below are five players whose True Averages are higher than you might expect given their batting averages, OBPs, SLGs and other stats, and five whose True Averages are lower.
TAv higher than you might think
Prince Fielder, 1B, Brewers
Forecast: .287 AVG/.409 OBP/.586 SLG, 41 HR
True Average: .326
Say what you will about the impact of Fielder’s unique physique on his long-term prospects, but the man can hit, and PECOTA loves him for it, forecasting Fielder to bop more homers than any other major-leaguer and to post the highest True Average of any hitter this side of Albert Pujols. What boosts his TAv so far above his batting average is the 101 walks that accompany it. Not only does he draw about 20 intentional passes per year, but his unintentional walk rate has risen from 8.3 percent in his 2006 rookie season to 12.4 percent last year.
Adrian Gonzalez, 1B, Padres
Forecast: .287/.393/.533, 34 HR
True Average: .325
At first glance, Gonzalez’s rate stats and homer total don’t look like they belong in the same ballpark as Fielder’s, and they don’t. While the Brewers’ Miller Park is somewhat pitcher-friendly, reducing scoring by about three percent, the Pads’ Petco Park curbs scoring by an MLB-high 11 percent, visibly depressing hitting stats. What True Average tells us is that, relative to their environments, Prince and Gonzo are essentially equal in their productivity with the lumber.
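The article doesn't spell out how True Average's park adjustment is computed, so the sketch below uses a deliberately simple multiplicative model of our own: divide a raw run value by the park's scoring factor. The function name and the 100-run input are hypothetical; only the three and 11 percent park effects come from the text.

```python
def park_neutral(raw_value: float, park_effect_pct: float) -> float:
    """Scale a run value to a neutral park.

    park_effect_pct is the percent change the park applies to scoring
    (negative = pitcher-friendly). A simple multiplicative model, assumed
    for illustration; it is not BP's actual adjustment.
    """
    return raw_value / (1 + park_effect_pct / 100)


MILLER_PARK = -3.0   # "reducing scoring by about three percent"
PETCO_PARK = -11.0   # "curbs scoring by an MLB-high 11 percent"

raw_runs = 100.0  # hypothetical raw run value, identical for both hitters
print(f"Miller Park: {park_neutral(raw_runs, MILLER_PARK):.1f}")
print(f"Petco Park:  {park_neutral(raw_runs, PETCO_PARK):.1f}")
```

Under this toy model, 100 raw runs are worth about 103 neutral runs in Milwaukee but about 112 in San Diego, which is why a Petco hitter can post a visibly weaker raw line than a Miller Park hitter and still grade out as equally productive.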
Alex Rodriguez, 3B, Yankees
Forecast: .288/.403/.578, 39 HR
True Average: .320
According to PECOTA, A-Rod gets a 38-point boost in slugging percentage from the new Yankee Stadium, but even when his numbers are adjusted for context (and really, how often is the ever-controversial Rodriguez left in proper context?), he’s still among the game’s top sluggers. Only Fielder projects to hit more homers. Boosting Rodriguez beyond those already-impressive stats is his projection for 17 steals in 20 attempts, good for a few extra runs. True Average properly credits him for that small but measurable gain.
Grady Sizemore, CF, Indians
Forecast: .271/.387/.491, 26 HR
True Average: .306
Elbow woes and a sports hernia turned Sizemore’s 2009 into a painful proposition, but prior to that, he was a top-notch hitter, and PECOTA expects a rebound. Like Rodriguez, Sizemore gets a boost beyond his strong OBP and SLG rates via a projection of 26 steals at a 79 percent clip.
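To see why a projection like 17 steals in 20 attempts is worth "a few extra runs," here is a small sketch that values steals and caught stealings with linear weights. The run values used (about +0.2 runs per steal and -0.4 per caught stealing) are rough approximations we're assuming for illustration; they are not the weights inside True Average. Sizemore's caught-stealing total is our rounding of 26 steals at a 79 percent clip (roughly 33 attempts).

```python
# Assumed, approximate run values per event; not BP's internal weights.
SB_RUNS = 0.2
CS_RUNS = -0.4


def steal_runs(sb: int, cs: int) -> float:
    """Net runs added by base stealing under the assumed weights."""
    return sb * SB_RUNS + cs * CS_RUNS


def success_rate(sb: int, cs: int) -> float:
    """Fraction of steal attempts that succeed."""
    return sb / (sb + cs)


# Success rate at which steals and caught stealings cancel out.
BREAK_EVEN = -CS_RUNS / (SB_RUNS - CS_RUNS)  # 0.4 / 0.6, about 67 percent

for player, sb, cs in [("Rodriguez", 17, 3), ("Sizemore", 26, 7)]:
    print(f"{player}: {success_rate(sb, cs):.0%} success, "
          f"{steal_runs(sb, cs):+.1f} runs (break-even {BREAK_EVEN:.0%})")
```

Under these assumptions, both players clear the break-even rate comfortably and net a couple of runs each.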
Adam Dunn, 1B, Nationals
Forecast: .250/.387/.493, 31 HR
True Average: .302
A classic low-average hitter who strikes out a ton (he set a single-season record with 195 in 2004), Dunn also provides plenty of power (41 homers per year since 2004). Additionally, he draws a huge number of walks (over 100 per year since 2004) because of his ability to work the count, not to mention the fear factor: pitchers would rather try to get the next guy out than risk him hitting a homer.
TAv lower than you might think
Yuniesky Betancourt, SS, Royals
Forecast: .289/.313/.428, 10 HR
True Average: .239
Betancourt’s .289 average with 10 homers may pass muster at first glance, considering that he’s a shortstop, but his unwillingness to take a walk (one projected for every 30.5 plate appearances) and his ineptitude on the basepaths (five projected steals in nine attempts) erode his value to the point where he’s well below average with the stick and a significant drag on the Royals’ already-wheezing offense.
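A projection like "one walk every 30.5 plate appearances" and a rate like Fielder's 12.4 percent unintentional walk rate are reciprocal views of the same kind of quantity. A minimal conversion sketch (the helper names are ours); note that Betancourt's figure counts all walks while Fielder's excludes intentional ones, so the two aren't measured on exactly the same basis.

```python
def rate_from_frequency(pa_per_walk: float) -> float:
    """'One walk every N plate appearances' -> walks per plate appearance."""
    return 1 / pa_per_walk


def frequency_from_rate(walk_rate: float) -> float:
    """Walks per plate appearance -> plate appearances per walk."""
    return 1 / walk_rate


betancourt = rate_from_frequency(30.5)  # projected walks (all types), from the text
fielder_ubb = 0.124                     # Fielder's 2009 unintentional walk rate

print(f"Betancourt walks in {betancourt:.1%} of his plate appearances")
print(f"Fielder draws an unintentional walk every "
      f"{frequency_from_rate(fielder_ubb):.1f} plate appearances")
```

Betancourt's projected rate works out to about 3.3 percent, roughly a quarter of Fielder's unintentional rate alone.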
Delmon Young, LF, Twins
Forecast: .296/.335/.439, 16 HR
True Average: .258
Superficially, the batting average and home run totals look like solid progress for the 24-year-old Young, who has yet to live up to his one-time top prospect status. Alas, his hacktastic ways (a projected 101/28 K/BB ratio, still better than last year’s 92/12) undercut his contributions considerably, leaving him a tick below league average at an offense-first position.
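The K/BB comparison is simple division; a quick check shows how large the projected improvement is even though the projected ratio is still poor.

```python
def k_bb_ratio(strikeouts: int, walks: int) -> float:
    """Strikeouts per walk; higher means a less disciplined hitter."""
    return strikeouts / walks


projected = k_bb_ratio(101, 28)  # Young's 2010 PECOTA projection
last_year = k_bb_ratio(92, 12)   # Young's 2009 line
print(f"Projected {projected:.2f} K per BB vs. {last_year:.2f} last year")
```

That is roughly 3.6 strikeouts per walk, down from about 7.7.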
Placido Polanco, 3B, Phillies
Forecast: .305/.355/.425, 9 HR
True Average: .264
The contact-oriented Polanco is very difficult to strike out, but his lack of power or patience means he needs to hit above .330, not .300, to be a real offensive asset. He did that as recently as 2007, but he’s now 34. In a hitter’s park such as Citizens Bank Park, the numbers above won’t go very far.
Robinson Cano, 2B, Yankees
Forecast: .297/.338/.493, 26 HR
True Average: .270
That batting average and homer total make Cano look like the second coming of Jeff Kent, but the combination of a power-friendly home park, a low walk rate (projected 5.2 percent) and atrocious baserunning (two steals in eight attempts) chips away considerably at the true value of his offensive contributions.
Ichiro Suzuki, RF, Mariners
Forecast: .317/.364/.414, 7 HR
True Average: .272
There’s no getting around the fact that Ichiro annually bedevils PECOTA, because the system simply doesn’t expect a man of his profile to keep putting up batting averages on balls in play north of .380, as he’s done twice in the past three years. Ignoring the actual projections for a moment, the take-home is that even with last year’s .352 batting average and 26 steals, Ichiro’s low walk and homer totals (32 and 11) kept his True Average at just .298. He’s risen above .300 just once, in 2004. An excellent hitter, if not an elite one.
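Batting average on balls in play, the figure PECOTA struggles to project for Ichiro, is usually computed with the standard formula below. The stat line in the example is hypothetical, chosen only to exercise the function; it is not Ichiro's actual line.

```python
def babip(h: int, hr: int, ab: int, so: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - SO - HR + SF)."""
    return (h - hr) / (ab - so - hr + sf)


# Hypothetical contact hitter's season, invented for illustration.
print(f"{babip(h=220, hr=10, ab=640, so=70, sf=3):.3f}")
```

A mark north of .380 means nearly two of every five balls in play fall in for hits, the kind of figure the system doesn't expect to be sustained.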
A version of this story originally appeared on ESPN Insider.
2. EqA is park adjusted, wOBA isn't, at least as I understand it.
3. The two have virtually identical correlations to runs scored, but TAv produces a smaller RMSE. I'll leave the defense of that statement and the grisly math to Clay Davenport, who's got data showing that. He'll have an article on the topic soon once he gets the PECOTA cards up, but perhaps I can get him to chime in here as well.
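Clay's data aren't reproduced here, but the general point, that two run estimators can correlate equally well with runs scored while one has a smaller RMSE, is easy to demonstrate. Correlation is unaffected by a constant offset in the predictions; RMSE is not. The team run totals below are invented toy numbers, not results from either stat.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root-mean-square error of predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Toy data: actual team runs and two hypothetical estimators.
actual = [650, 700, 750, 800, 850]
estimator_a = [655, 698, 752, 797, 851]
estimator_b = [x + 40 for x in estimator_a]  # same pattern, 40 runs too high

for name, pred in [("A", estimator_a), ("B", estimator_b)]:
    print(f"{name}: r = {pearson(actual, pred):.4f}, RMSE = {rmse(actual, pred):.1f}")
```

Both estimators produce the same correlation, yet B misses by about 40 runs per team, which is why RMSE can separate two stats that correlation alone cannot.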
And, as Tango points out in his rebuttal, the wOBA published at StatCorner is park-adjusted - both are available so users can choose which they prefer.
Is BP actively trying to alienate the knowledgeable audience here?
Consider also that information on batted ball types occupies just a tiny sliver of baseball history, and that the power of TAv (which by its original name has been in existence since BP's gestation days in the rec.sport.baseball newsgroup) is that it's built to enable cross-era and cross-environment comparisons, including the century-plus swath for which we have no batted ball data.
I would read that before asserting that it's better or worse than anything. It's an accurate analysis (as far as I can tell) of the basic construction of the stat, and gets at some technical issues.
From the past, also see:
From the more recent past, see the excellent wOBA/EqA analysis done by current BP writer Colin Wyers (sorry if this has already been linked here, seems like a natural place for it):
I imagine the data and discussion will be presented along the lines of Clay's "About EqA" piece from 2004 which was linked above: http://www.baseballprospectus.com/article.php?articleid=2596
One of the key take-home points from both that and Colin's linked THT piece above is the time range of comparison, because these formulas have been "tuned" to a given period. I'm not sure if this has changed, but at the time, Fangraphs only had wOBA going back to 1974. In Clay's piece, which was written in 2004, before wOBA was unveiled, he noted that there were ranges of time where EqA was essentially on par with other systems, and ranges where it was significantly superior, and that he could improve its performance over recent eras with a greater number of category inputs (remember, stats like sacrifices, intentional walks and caught stealing have relatively limited histories). I imagine all of that will find its way into the discussion.
I assume Colin will be publishing his results however they turn out?
I'm curious to see if he finds his earlier article to have been wrong. Of course, since wOBA is just straight linear weights converted to a rate stat, it could be adapted to any set of weights.
Perhaps it's true of TAv, but I don't know that it's accurate to say wOBA is "tuned" to a given time period. Instead I believe it is tuned to the run environment of the league for that season. In other words, we don't (or at least shouldn't) apply the correct linear weights from 2009 to figure out Babe Ruth's wOBA in 1920...we apply the correct linear weights from the AL in 1920 to figure Babe's wOBA in 1920. There are also different machinations you can go through to adjust for team-specific run-environments if you like (and I believe Rally at Baseball Projection does this for his linear weights if I'm not mistaken).
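The description above, season-specific linear weights divided by plate appearances, can be sketched directly. The weights below are placeholders of roughly the magnitude usually published, not the correct weights for any particular league-season; the point is that they're passed in as a parameter, so each season (or league) supplies its own set. The denominator follows the commonly published AB + BB - IBB + SF + HBP form.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int
    ubb: int  # unintentional walks (BB - IBB)
    hbp: int
    sf: int
    singles: int
    doubles: int
    triples: int
    hr: int


def woba(line: BattingLine, weights: dict[str, float]) -> float:
    """Linear-weights rate stat: weighted events / (AB + uBB + SF + HBP)."""
    numerator = (weights["ubb"] * line.ubb
                 + weights["hbp"] * line.hbp
                 + weights["1b"] * line.singles
                 + weights["2b"] * line.doubles
                 + weights["3b"] * line.triples
                 + weights["hr"] * line.hr)
    denominator = line.ab + line.ubb + line.sf + line.hbp
    return numerator / denominator


# Placeholder weights for illustration only; real ones are derived
# separately for each league-season's run environment.
PLACEHOLDER_WEIGHTS = {"ubb": 0.72, "hbp": 0.75, "1b": 0.90,
                       "2b": 1.24, "3b": 1.56, "hr": 1.95}

hypothetical = BattingLine(ab=500, ubb=60, hbp=5, sf=5,
                           singles=100, doubles=30, triples=2, hr=25)
print(f"{woba(hypothetical, PLACEHOLDER_WEIGHTS):.3f}")
```

Computing Babe Ruth's 1920 wOBA, as the comment suggests, would mean swapping in weights derived from the 1920 AL rather than reusing modern ones.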
I look forward to reading the internal studies that have been floating around BP, but I think it would take quite a substantial result to trump wOBA. In my opinion the tie should go to the system that has better logical foundations, and I think wOBA fits that bill. It finds the "correct" value for each type of batting event and adds them together. If TAv has a tiny edge in RMSE that wouldn't make it better than wOBA because wOBA has a significant edge in principled foundations (also note that wOBA was not engineered for the purpose of minimizing RMSE of predicted team runs scored).
Your other points are important as well.
I realize it wasn't meant as a shocking placement... still, it makes me laugh at how bad he is.
And Yuniesky Betancourt GAINED his starting job back when he was traded to the Royals for actual okay prospects.
I love Dayton Moore.
I do like the idea of using simple words to describe a new age stat instead of acronyms. One thing I'm tired of is guys making fun of new age stats because they sound funny -- VORP; WAR; stuff like that.
I'd stick to the core relationship that it describes and call it Run Scoring Average, or some variant of that which is simple and descriptive.
I'll second it.
BP's use of mixed-case abbreviations hurts broad acceptance. They make the stats look weird and complex, and while BP's particular audience probably tends to love weird and complex, the broad population is going to shy away from weird and complex. So tAVG suffers from the same problem, plus it breaks the long-established two- or three-letter convention of the most widely-accepted baseball stats.
The obvious problem BP faced with just using TA is referred to in computing circles as namespace, and the TA namespace is already well-used in ways unprintable in a family journal. So you threw a lower-case "v" on the end to try to avoid that association. Nice try, no cigar. TBA also has namespace issues with "To Be Announced."
I'd suggest burning the upper/lower rules and going with TAV, and in verbal use call it T-AV (TEE-ahv). Actually sounds kind of cool, imo, which should help it get picked up by announcers..."yeah, his ribbies are up there, but when you look at that lineup's tee-ahvs it's clear the guys in front of him really deserve a lot of the credit."
The "True" question quickly dives into Plato and shadows on cave walls. My college girlfriend wrote my Philosophy papers (I wrote her History papers), so I'll stay out of that, but buffum's right, the Total Average namespace is already gone anyway.
Comprehensive Average would be ok, I guess, but it's not as slick as True Average, and you're heading into the extreme syllable range of some folks.
These are available every year via a link that looks like:
Basically, it's possible to look at a player's raw stats (hits, doubles, triples, homers, plate appearances, walks, K's, stolen bases, etc.) and determine how good he is. But few can do it, and it's almost impossible to communicate.
The first level of description has been to take a couple of "important" statistics, combine them with a fairly arbitrary rate stat and say, "This is how you judge hitters!" It didn't take long for people to realize the limitations of batting average, home runs, and RBIs. Compare, say, Joe Carter vs. Brian Downing:
Batting average, home runs and RBIs: .259 vs .267, 396 vs 275, and 1445 vs 1073. You might think Carter was a bit better, considering his huge advantage in homers and RBIs with a roughly similar batting average.
The second level of description is to look instead at OBA, SLG, and plate appearances. With these, it's easy to see the winner in our matchup: Downing's .370/.425 in 9309 PAs vs. Carter's .306/.464 in 9154, especially if you accept that OBA is more important than slugging. Or is it? You could argue that Carter's slugging outweighs Downing's OBA, especially given that Carter stole bases and Downing really didn't.
That takes us to a third level of description, which is what you are presenting. With that, it's Downing's .287 vs. Carter's .265. OK, game over, end of discussion.
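The "Or is it?" question in the second level is about how to weigh OBA against SLG. A quick sketch using the career figures quoted above: plain OPS weights them equally, and a variant that counts OBA more heavily widens Downing's edge. The 1.8 multiplier is an assumption for illustration (estimates of the right ratio vary), not a figure from the comment.

```python
def ops(oba: float, slg: float) -> float:
    """On-base plus slugging, equal weights."""
    return oba + slg


def weighted_ops(oba: float, slg: float, oba_weight: float = 1.8) -> float:
    """OPS variant that counts OBA more heavily (weight is an assumption)."""
    return oba_weight * oba + slg


downing = (0.370, 0.425)  # career OBA, SLG from the comment
carter = (0.306, 0.464)

print(f"OPS:      Downing {ops(*downing):.3f}, Carter {ops(*carter):.3f}")
print(f"Weighted: Downing {weighted_ops(*downing):.3f}, Carter {weighted_ops(*carter):.3f}")
```

Neither version accounts for Carter's stolen bases, which is part of what the third level of description adds.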
to look at single years from each of them, say, Joe Carter's 1996 vs Brian Downing's 1975.
Stat      Carter   Downing
BA        .253     .240
HR        30       7
RBI       107      41
Runs      84       58
PA        682      516
OBA       .306     .356
SLG       .475     .324
SB        7        13
CS        6        4
Pos       LF-1B    C
TAv       .251     .259
BRAR      1        18
FRAA      -21      4
WARP(3)   -2.2     2.5
Yes, Downing was a catcher, and apparently not a bad one, at least for that year. By Triple Crown stats, Carter is a runaway, with a fairly respectable .253-30-107 line, but in the end, it's obvious who was more valuable, and by nearly 5 wins, no less.
(I'm aware that "Runs Created" has a long history, but is there a specific reason that the "RCA space" has been taken already?)
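The "nearly 5 wins" figure is just the gap between the two WARP(3) entries in the table above; a quick check of that and the other value columns:

```python
# Single-season figures from the table above (Carter 1996, Downing 1975).
carter_1996 = {"TAv": 0.251, "BRAR": 1, "FRAA": -21, "WARP3": -2.2}
downing_1975 = {"TAv": 0.259, "BRAR": 18, "FRAA": 4, "WARP3": 2.5}

for stat in carter_1996:
    gap = downing_1975[stat] - carter_1996[stat]
    print(f"{stat:<6} Downing minus Carter: {gap:+.3g}")
```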
In short, Jay Jaffe, get out of my head!
What a colossal waste of time.
I've been at BP long enough to know that we're never, ever going to please all of the people all of the time. It's a great stat by any name, and this was the opening salvo of an effort to further its reach significantly. Our partners at ESPN Insider didn't want an article with any math in it, they wanted something to introduce the stat to new readers.
To the extent that there is substantive criticism out there - and of course there is, because within our audience is a hardcore contingent which does care about the third decimal - it will be addressed in an article in the near future after we get the PECOTA card situation ironed out.
As for the point above about batted ball types being superior for a stat such as this, if you think there's no noise in that, holy ****. See Colin Wyers' work on line drive rates, press box heights, and scorer bias (http://www.hardballtimes.com/main/article/when-is-a-fly-ball-a-line-drive/) or our lengthy roundtable on BABIP and line drives (http://www.baseballprospectus.com/article.php?articleid=9928).