Prospectus Hit and Run: Call it True Average

February 25, 2010

If you’ve followed us at Baseball Prospectus for any length of time, you’re probably familiar with Equivalent Average, or EqA, one of our signature hitting stats. If you’re not, here’s the skinny: it’s the expression of how many runs a player created per plate appearance, translated to the familiar and easy-to-understand scale of batting average.

A .350 mark is outstanding; last year Albert Pujols (.368) and Joe Mauer (.346) led their respective leagues. A .300 mark is very good; last year Justin Upton and Jorge Posada both put up .301 EqAs. A .260 EqA is league average by definition; Rafael Furcal (.262) and Stephen Drew (.259) were both right around that mark. A .230 mark is replacement level, the caliber of what a waiver-wire pickup or a Class AAA player could provide; a team has almost nothing to lose by trying something different than a player at this level. Note that the Rockies’ Garrett Atkins (.230) and the Marlins’ Emilio Bonifacio (.228) both lost their starting jobs last year.
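
As a quick illustration of the scale, the sketch below maps a TAv to the nearest of the four benchmarks quoted above. The function name `nearest_benchmark` and the nearest-match rule are assumptions made for illustration; no formal cutoffs between these tiers are given here.

```python
# Benchmarks quoted in the article. Matching a player to the *nearest*
# benchmark is an assumption of this sketch, not an official BP scale.
BENCHMARKS = [
    (0.350, "outstanding"),
    (0.300, "very good"),
    (0.260, "league average"),
    (0.230, "replacement level"),
]


def nearest_benchmark(tav: float) -> str:
    """Return the label of the benchmark closest to the given TAv."""
    return min(BENCHMARKS, key=lambda b: abs(b[0] - tav))[1]


# 2009 figures cited in the article.
for name, tav in [("Pujols", 0.368), ("Posada", 0.301),
                  ("Drew", 0.259), ("Bonifacio", 0.228)]:
    print(f"{name}: {tav:.3f} -> {nearest_benchmark(tav)}")
```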

There’s a lot of sausage grinding involved in turning hits, walks, total bases, stolen bases, caught stealing and other data into this batting average-like form. We even build park and league adjustments into the formula, so that a .300 EqA in hitter-friendly Coors Field has the same impact on scoring as it does in pitcher-friendly Petco Park, and a .300 today has the same impact as it did in the low-scoring 1960s or the high-scoring 1930s. It’s all worthwhile, because EqA does a much better job of predicting scoring levels than batting average, on-base percentage, slugging percentage, OPS (on-base plus slugging), OPS+, and more complicated run estimators.

This spring, we at BP have chosen to rebrand EqA as True Average (abbreviated TAv). Why? Because we feel strongly that the new name underscores our ability to get a “True-r” grasp on the quality of a hitter than the aforementioned traditional or more modern stats do. Quite frankly, we’re hopeful that this simple, easy-to-remember name can reach a wider audience.

The best way for those unacquainted to understand True Average is to look at several examples using our 2010 PECOTA hitter projections. Below are five players whose True Averages are higher than you might expect given their batting averages, OBPs, SLGs and other stats, and five whose True Averages are lower.

TAv higher than you might think

Prince Fielder, 1B, Brewers
Forecast: .287 AVG/.409 OBP/.586 SLG, 41 HR
True Average: .326
Say what you will about the impact of Fielder’s unique physique on his long-term prospects, but the man can hit, and PECOTA loves him for it, forecasting Fielder to bop more homers than any other major leaguer and to post the highest True Average of any hitter this side of Albert Pujols. What boosts his TAv so far above his batting average is the 101 walks that accompany it. Not only does he draw about 20 intentional passes per year, but his unintentional walk rate has risen from 8.3 percent in his 2006 rookie season to 12.4 percent last year.

Adrian Gonzalez, 1B, Padres
Forecast: .287/.393/.533, 34 HR
True Average: .325
At first glance, Gonzalez’s rate stats and homer total don’t look like they belong in the same ballpark as those of Fielder, and they don’t. While the Brewers’ Miller Park is somewhat pitcher-friendly, reducing scoring by about three percent, the Pads’ Petco Park curbs scoring by an MLB-high 11 percent, visibly depressing hitting stats. What True Average tells us is that relative to their environments, Prince and Gonzo are essentially equal in their productivity with the lumber.

Alex Rodriguez, 3B, Yankees
Forecast: .288/.403/.578, 39 HR
True Average: .320
According to PECOTA, A-Rod gets a 38-point boost in slugging percentage from the new Yankee Stadium, but even when his numbers are adjusted for context (and really, how often is the ever-controversial Rodriguez left in proper context?), he’s still among the game’s top sluggers. Only Fielder projects to hit more homers. Boosting Rodriguez beyond those already-impressive stats is his projection for 17 steals in 20 attempts, good for a few extra runs. True Average properly credits him for that small but measurable gain.

Grady Sizemore, CF, Indians
Forecast: .271/.387/.491, 26 HR
True Average: .306
Elbow woes and a sports hernia turned Sizemore’s 2009 into a painful proposition, but prior to that, he was a top-notch hitter, and PECOTA expects a rebound. Like Rodriguez, Sizemore gets a boost beyond his strong OBP and SLG rates via a projection of 26 steals at a 79 percent clip.

Adam Dunn, 1B, Nationals
Forecast: .250/.387/.493, 31 HR
True Average: .302
A classic low-average hitter who strikes out a ton (he set a single-season record with 195 in 2004), Dunn also provides plenty of power (41 homers per year since 2004). Additionally, he draws a huge number of walks (over 100 per year since 2004) because of his ability to work the count, not to mention the fear factor; pitchers would rather try to get the next guy out than risk him hitting a homer.
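
To see why raw lines from Milwaukee and San Diego are not directly comparable, here is a deliberately naive sketch. It treats the park effects quoted in the Gonzalez note (about 3 percent for Miller Park, 11 percent for Petco Park) as flat multipliers on run production. The function `neutralize_runs` and the 100-run input are hypothetical, and the actual adjustment built into TAv is not spelled out in this article and is surely more involved.

```python
# Scoring effects quoted in the article, expressed as multipliers.
# Treating them as flat multipliers is an assumption of this sketch.
PARK_FACTOR = {"Miller Park": 0.97, "Petco Park": 0.89}


def neutralize_runs(runs: float, park: str) -> float:
    """Scale runs produced in a park to a neutral (1.00) environment."""
    return runs / PARK_FACTOR[park]


# Hypothetical input: the same 100 runs of raw production in each park.
for park in PARK_FACTOR:
    neutral = neutralize_runs(100, park)
    print(f"{park}: 100 raw runs ~ {neutral:.1f} neutral runs")
```

The same raw output is worth more once the tougher environment is accounted for, which is the intuition behind Gonzalez and Fielder landing at nearly the same True Average.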

TAv lower than you might think

Yuniesky Betancourt, SS, Royals
Forecast: .289/.313/.428, 10 HR
True Average: .239
Betancourt’s .289 average with 10 homers may pass muster at first glance considering that he’s a shortstop, but his unwillingness to take a walk (one projected for every 30.5 plate appearances) and his ineptitude on the base paths (five projected steals in nine attempts) erode his value to the point where he’s well below average with the stick, and a significant drag on the Royals’ already-wheezing offense.

Delmon Young, LF, Twins
Forecast: .296/.335/.439, 16 HR
True Average: .258
Superficially, the batting average and home run totals look like solid progress for the 24-year-old Young, who has yet to live up to his one-time top prospect status. Alas, his hacktastic ways (a projected 101/28 K/BB ratio, still better than last year’s 92/12) undercut his contributions considerably, leaving him a tick below league average at an offense-first position.

Placido Polanco, 3B, Phillies
Forecast: .305/.355/.425, 9 HR
True Average: .264
The contact-oriented Polanco is very difficult to strike out, but his lack of power or patience means he needs to hit above .330, not .300, to be a real offensive asset. He did that as recently as 2007, but he’s now 34. In a hitter’s park such as Citizens Bank Park, the above numbers won’t go very far.

Robinson Cano, 2B, Yankees
Forecast: .297/.338/.493, 26 HR
True Average: .270
That batting average and homer total make Cano look like the second coming of Jeff Kent, but the combination of playing in a power-friendly park, a low walk rate (projected 5.2 percent) and atrocious base running (two steals in eight attempts) chips away considerably at the true value of his offensive contributions.

Ichiro Suzuki, RF, Mariners
Forecast: .317/.364/.414, 7 HR
True Average: .272
There’s no getting around the fact that Ichiro annually bedevils PECOTA, because the system simply doesn’t expect a man of his profile to keep putting up batting averages on balls in play north of .380, as he’s done twice in the past three years. Ignoring the actual projections for a moment, the take-home is that even with last year’s .352 batting average and 26 steals, Ichiro’s low walk and homer totals (32 and 11) kept his True Average at just .298. He’s risen above .300 just once, in 2004. An excellent hitter, if not an elite one.
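
The baserunning and walk figures quoted in the player notes above reduce to simple rates. The short sketch below works them out; the helper name `rate` is just an illustrative choice, and the inputs are the projections stated in the notes.

```python
def rate(successes: float, chances: float) -> float:
    """Return successes per chance as a percentage."""
    return 100.0 * successes / chances


# (steals, attempts) as projected in the player notes above.
steal_projections = {
    "Rodriguez": (17, 20),
    "Betancourt": (5, 9),
    "Cano": (2, 8),
}

for name, (steals, attempts) in steal_projections.items():
    print(f"{name}: {rate(steals, attempts):.1f}% stolen-base success")

# Betancourt: one walk projected per 30.5 plate appearances.
print(f"Betancourt walk rate: {rate(1, 30.5):.1f}%")
```

Rodriguez's 85 percent success rate sits at one end and Cano's 25 percent at the other, which is why the same stat can credit one Yankee and dock another for their work on the bases.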

A version of this story originally appeared on ESPN Insider.

Jay Jaffe

ils4O1

2/25

Why is TAv better than wOBA?

jjaffe

2/25

1. The fact that the stat is scaled to batting average makes it easier for the average fan to understand than wOBA being scaled to OBP. ".300 is good" is a notion with over 100 years of baseball history behind it.

2. EqA is park adjusted, wOBA isn't, at least as I understand it.

3. The two have virtually identical correlations to runs scored, but TAv produces a smaller RMSE. I'll leave the defense of that statement and the grisly math to Clay Davenport, who's got data showing that. He'll have an article on the topic soon once he gets the PECOTA cards up, but perhaps I can get him to chime in here as well.
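
For readers wondering how two estimators can correlate equally well with runs scored yet differ in RMSE, here is a minimal sketch. The team totals and both sets of estimates are made up purely for illustration (they are not EqA, wOBA, or real team data), and `pearson` and `rmse` are plain textbook implementations.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rmse(predicted, actual):
    """Root mean squared error of predictions against actual values."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))


# Made-up team run totals and estimates, for illustration only.
actual = [700, 750, 800, 650, 820]
estimate_a = [710, 740, 790, 660, 815]
estimate_b = [p + 20 for p in estimate_a]  # same shape, constant 20-run bias

for label, est in [("A", estimate_a), ("B", estimate_b)]:
    print(f"{label}: r = {pearson(est, actual):.4f}, RMSE = {rmse(est, actual):.1f}")
```

Correlation ignores a constant offset while RMSE penalizes it, so the two measures can disagree about which estimator is closer to actual scoring.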

llewdor

2/26

Though EqA is derived from results, not batted ball types, so there's more noise in the signal on a player-by-player basis.

And, as Tango points out in his rebuttal, the wOBA published at StatCorner is park-adjusted - both are available so users can choose which they prefer.

Is BP actively trying to alienate the knowledgeable audience, here?

jjaffe

2/27

There's plenty of noise when it comes to relying on batted ball types for ANYTHING. See Colin Wyers' work on line drive rates, press box heights, and scorer bias (http://www.hardballtimes.com/main/article/when-is-a-fly-ball-a-line-drive/) or our lengthy roundtable on BABIP and line drives (http://www.baseballprospectus.com/article.php?articleid=9928), or the Seidman/Swartz series on SIERA.

Consider also that information on batted ball types occupies just a tiny sliver of baseball history, and that the power of TAv (which by its original name has been in existence since BP's gestation days in the rec.sport.baseball newsgroup) is that it's built to enable cross-era and cross-environment comparisons, including the century-plus swath for which we have no batted ball data.

lukiewerle

2/26

For those interested, you can read Tango's response here: http://www.insidethebook.com/ee/index.php/site/article/eqa_renamed_tav_true_average/

devilfingers72

2/26

One classic analysis of EqA/EqR is found here:

http://walksaber.blogspot.com/2008/05/analysis-of-clay-davenports-eqr-and-eqa.html

I would read that before asserting that it's better or worse than anything. It's an accurate analysis (as far as I can tell) of the basic construction of the stat, and gets at some technical issues.

From the past, also see:

http://www.insidethebook.com/ee/index.php/site/comments/why_is_eqa_so_complicated/

From the more recent past, see the excellent wOBA/EqA analysis done by current BP writer Colin Wyers (sorry if this has already been linked here, seems like a natural place for it):

http://www.hardballtimes.com/main/blog_article/is-eqa-better-than-woba/

And,

jjaffe

2/27

I'm familiar with that work, and I'm also familiar with data that's been circulated internally within BP which will rebut that. As I said before, I'm leaving the math-level details regarding the formula and its construction to Clay Davenport.

devilfingers72

2/27

I was just putting it out there for people to read, not to make any particular claim. I eagerly await the publication of the internal studies you mentioned, since transparency benefits everyone. I assume that you wouldn't refer to those studies if they weren't going to be made available to everyone, given your comment above. It would be really interesting for all to see. Was Colin convinced?

jjaffe

2/27

I won't presume to know what Colin thinks, but I can tell you that he's been crunching numbers on this, too.

I imagine the data and discussion will be presented along the lines of Clay's "About EqA" piece from 2004 which was linked above: http://www.baseballprospectus.com/article.php?articleid=2596

One of the key take-home points from both that and Colin's linked THT piece above is the time range of comparison, because these formulas have been "tuned" to a given period. I'm not sure if this has changed, but at the time, Fangraphs only had wOBA going back to 1974. In Clay's piece, which was written in 2004, before wOBA was unveiled, he noted that there were ranges of time where EqA was essentially on par with other systems, and ranges where it was significantly superior, and that he could improve its performance over recent eras with a greater number of category inputs (remember, stats like sacrifices, intentional walks and caught stealing have relatively limited histories). I imagine all of that will find its way into the discussion.

devilfingers72

2/27

If you're saying that people are going to publish on this, I look forward to it. It will be good to know what data sets there are, so that the results can be independently checked by disinterested parties.

I assume Colin will be publishing his results however they turn out?

I'm curious to see if he finds his earlier article to have been wrong. Of course, since wOBA is just straight linear weights converted to a rate stat, it could be adapted to any set of weights.

mickeyg13

3/01

Currently the "Leaders" on FanGraphs will only show wOBA back to 1974, but you can go back more than that for individual players. For instance, Babe Ruth had a wOBA of .600 in 1920 (and keep in mind that wOBA is purposefully set to be in the OBP scale).

Perhaps it's true of TAv, but I don't know that it's accurate to say wOBA is "tuned" to a given time period. Instead I believe it is tuned to the run environment of the league for that season. In other words, we don't (or at least shouldn't) apply the correct linear weights from 2009 to figure out Babe Ruth's wOBA in 1920...we apply the correct linear weights from the AL in 1920 to figure Babe's wOBA in 1920. There are also different machinations you can go through to adjust for team-specific run-environments if you like (and I believe Rally at Baseball Projection does this for his linear weights if I'm not mistaken).

I look forward to reading the internal studies that have been floating around BP, but I think it would take quite a substantial result to trump wOBA. In my opinion the tie should go to the system that has better logical foundations, and I think wOBA fits that bill. It finds the "correct" value for each type of batting event and adds them together. If TAv has a tiny edge in RMSE that wouldn't make it better than wOBA because wOBA has a significant edge in principled foundations (also note that wOBA was not engineered for the purpose of minimizing RMSE of predicted team runs scored).
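
The idea described above (assign each batting event a run value for that season's environment, add them up, and divide by opportunities) can be sketched as follows. Every number here is a hypothetical placeholder: the weights are not the published wOBA coefficients for any season, the stat line belongs to no real player, and dividing by plain plate appearances is a simplification of this sketch.

```python
def linear_weights_rate(events: dict, weights: dict, pa: int) -> float:
    """Sum (event count x run weight) over all events, per plate appearance."""
    return sum(weights[event] * count for event, count in events.items()) / pa


# Hypothetical stat line (not a real player).
events = {"BB": 50, "1B": 100, "2B": 30, "3B": 5, "HR": 20}

# Hypothetical weights for two different run environments; in practice
# each league-season would get its own empirically derived set.
weights_low_scoring = {"BB": 0.70, "1B": 0.90, "2B": 1.25, "3B": 1.60, "HR": 2.00}
weights_high_scoring = {"BB": 0.68, "1B": 0.87, "2B": 1.22, "3B": 1.55, "HR": 1.90}

for label, weights in [("low-scoring", weights_low_scoring),
                       ("high-scoring", weights_high_scoring)]:
    value = linear_weights_rate(events, weights, pa=600)
    print(f"{label} weights: {value:.3f}")
```

Swapping in a different season's weights changes the result for the same stat line, which is the sense in which the stat is fitted to each league-season's run environment rather than to one fixed period.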

devilfingers72

3/01

I believe you are correct about the FanGraphs implementation of wOBA, which I think is based on

http://www.insidethebook.com/ee/index.php/site/article/woba_year_by_year_calculations/

Your other points are important as well.

millerho

2/25

Maybe people reading at ESPN think more highly of Ichiro and less highly of Adam Dunn than they should, but I don't think the same can be said of the average reader here at BP.

alskor

2/25

I was thinking the same thing when I saw Betancourt atop the "TAv lower than you might think" category! I bet he would win a poll for who BP readers think is the worst regular in baseball.

I realize it wasn't meant as a shocking placement... still makes me laugh at how bad he is.

drewsylvania

2/25

"Note that the Rockies' Garrett Atkins (.230) and the Marlins' Emilio Bonifacio (.228) both lost their starting jobs last year."

And Yuniesky Betancourt GAINED his starting job back when he was traded to the Royals for actual okay prospects.

I love Dayton Moore.

mswain784

2/25

Don't love the acronym. tAVG? TBA (true batting average)? Not great I know, but TAv just seems weird and not like something that will catch on with an average fan.

phellad

2/25

came here to say this. Although it contains steals, True Batting Average makes more sense if we're scaling it to BA... Actually, the "true" may even come off as comic book guy-style condescending. How about Overall Average or Hitter's Average?

yadenr

2/25

I second the second guessing of "true." Is it REALLY true? Or is it more "comprehensive" or maybe "total". Total works since you wouldn't have to change any acronyms.

smitty99

2/25

I thought that too. But I think someone already came up with Total Average. Indeed, it was Tom Boswell way back in the 70s.

I do like the idea of using simple words to describe a new age stat instead of acronyms. One thing I'm tired of is guys making fun of new age stats because they sound funny -- VORP; WAR; stuff like that.

iljabetlem

2/25

I think it's a great stat, and it's a good thing to ditch EqA; it becomes a nice little marketing exercise. But I agree with others: TAv is hardly the answer. It does sound condescending, and that little v makes it look too 'mathematical', and with that come knee-jerk responses to reject it.

I'd stick to the core relationship that it describes and call it Run Scoring Average, or some variant of that which is simple and descriptive.

PaddyE

2/25

RSA is very good--it more accurately describes what it is than TAV, there are no significant namespace conflicts, it avoids the side-quibble with the style guide, and it sticks to three characters.

I'll second it.

PaddyE

2/25

Agreed, but I'd go further: any abbreviation with mixed upper & lower case has got a very steep uphill climb to broad acceptance.

BP's use of mixed-case abbreviations hurts broad acceptance. They make the stats look weird and complex, and while BP's particular audience probably tends to love weird and complex, the broad population is going to shy away from weird and complex. So tAVG suffers from the same problem, plus it breaks the long-established two- or three-letter convention of the most widely-accepted baseball stats.

The obvious problem BP faced with just using TA is referred to in computing circles as namespace, and the TA namespace is already well-used in ways unprintable in a family journal. So you threw a lower-case "v" on the end to try to avoid that association. Nice try, no cigar. TBA also has namespace issues with "To Be Announced."

I'd suggest burning the upper/lower rules and going with TAV, and in verbal use call it T-AV (TEE-ahv). Actually sounds kind of cool, imo, which should help it get picked up by announcers..."yeah, his ribbies are up there, but when you look at that lineup's tee-ahvs it's clear the guys in front of him really deserve a lot of the credit."

PaddyE

2/25

My response above was to mswain784's comment, btw.

The "True" question quickly dives into Plato and shadows on cave walls. My college girlfriend wrote my Philosophy papers (I wrote her History papers), so I'll stay out of that, but buffum's right, the Total Average namespace is already gone anyway.

Comprehensive Average would be ok, I guess, but it's not as slick as True Average, and you're heading into the extreme syllable range of some folks.

phellad

2/25

yeah, if you go with TA, does Christina have to get royalties?

wcarroll

2/25

It's pronounced "True Average."

buffum

2/25

Tom Boswell had a "Total Average" in the '70s, so using Total in the term might sound misleading.

yadenr

2/25

They had statistics in the 70's? :)

chabels

2/25

Am I correct that TAv is not position adjusted? If that is correct, what are the average TAvs for each position?

jjaffe

2/25

For last year:

Pos   EqA
P     .114
C     .254
1B    .293
2B    .267
3B    .270
SS    .258
LF    .277
CF    .269
RF    .276
Oth   .263

These are available every year via a link that looks like:
http://www.baseballprospectus.com/statistics/eqa2009.php#postot
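
One way to use the table above is to set a hitter's TAv against the average at his position. The sketch below does that for several of the PECOTA projections from the article. Note the mismatch in seasons, which is an assumption of this illustration: the positional averages are last year's figures, while the player numbers are 2010 forecasts. The helper name `vs_position` is hypothetical.

```python
# Positional averages from the table above (last year's figures).
POSITION_AVG = {
    "P": 0.114, "C": 0.254, "1B": 0.293, "2B": 0.267, "3B": 0.270,
    "SS": 0.258, "LF": 0.277, "CF": 0.269, "RF": 0.276, "Oth": 0.263,
}


def vs_position(tav: float, position: str) -> float:
    """Return TAv minus the positional average, rounded to three places."""
    return round(tav - POSITION_AVG[position], 3)


# 2010 projections quoted in the article: (player, position, TAv).
projections = [
    ("Fielder", "1B", 0.326),
    ("Betancourt", "SS", 0.239),
    ("Young", "LF", 0.258),
    ("Polanco", "3B", 0.264),
    ("Cano", "2B", 0.270),
]

for name, position, tav in projections:
    print(f"{name} ({position}): {vs_position(tav, position):+.3f} vs. position")
```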

PBuEsq

2/25

What will happen to archive articles that reference EQA? Will they still contain the term?

jjaffe

2/25

We're not going to go back and change what's been published.

regfairfield

2/25

So since you can't make new stats without major screwups, you just make new names for old stuff?

BeplerP

2/25

Sir: Snark becomes you, sir, like the conical cap on the dunce. GO AWAY! Thank you very much...

irablum

2/25

Love the article! I specifically love the way it's presented. When talking about how a stat relates its judgment of a player, it has to pass the "sniff" test (and you understand that). A stat that jumps out and says something ridiculous, like Ichiro is a below-average outfielder or that Yuniesky Betancourt was an MVP candidate, needs to be rethought. By taking the stat and putting it up against some noteworthy names, the article shows exactly why the stat is useful and important.

Basically, it's possible to look at a player's raw stats (hits, doubles, triples, homers, plate appearances, walks, K's, stolen bases, etc.) and determine how good he is. But few can do it, and it's almost impossible to communicate.

The first level of description has been to take a couple of "important" statistics and combine them with a fairly arbitrary rate stat and say, "This is how you judge hitters!" It didn't take long for people to realize the limitations of batting average, home runs, and RBIs. Compare, say, Joe Carter and Brian Downing:
.259 vs .267, 396 vs 275, and 1445 vs 1073. You might think that Carter was a bit better, considering the huge advantage in homers and RBIs with a roughly similar batting average.

The second level of description is to look instead at OBA, SLG, and plate appearances. With this it's easy to see the winner in our above confrontation: Downing's .370/.425 in 9309 PAs vs Carter's .306/.464 in 9154. Especially when you say that OBA is certainly more important than slugging. Or is it? You could argue that Carter's slugging outweighs Downing's OBA, especially given that Carter stole bases and Downing really didn't.

That takes us to a third level of description, which is what you are presenting. With that, it's Downing's .287 vs Carter's .265. OK, game over, end of discussion.

Or look at single years from each of them, say, Joe Carter's 1996 vs Brian Downing's 1975:

Stat      Carter   Downing
BA        .253     .240
HR        30       7
RBI       107      41
Runs      84       58
PA        682      516
OBA       .306     .356
SLG       .475     .324
SB        7        13
CS        6        4
Pos       LF-1B    C
TAv       .251     .259
BRAR      1        18
FRAA      -21      4
WARP(3)   -2.2     2.5

Yes, Downing was a catcher, and apparently not a bad one, at least for that year. By triple crown stats, Carter is a runaway, with a fairly respectable .253-30-107 line, but in the end, it's obvious who was the more valuable, and by nearly 5 wins no less.

joelefkowitz

2/25

On a related note, (as far as sabermetrics becoming more accessible to the masses) MLB 2k10 the video game tracks WPA as you play through the game. After big events, a little chart called "Pepsi's Win Probability" pops up and shows the change.

dgreene007

2/25

I share the concerns about "true" anything and awkward acronyms. What the stat describes is "how many runs a player created per plate appearance," scaled to the familiar batting average. Why not just call it RCA (runs created average)?

(I'm aware that "Runs Created" has a long history, but is there a specific reason that the "RCA space" has been taken already?)

PaddyE

2/25

Radio Corporation of America.

dgreene007

2/25

Oops! I suppose that qualifies as taking the space. The company hasn't really existed since 1986 but the trademark still has lots of cultural and economic resonance. Oh well...

Gugilymugily

2/25

Seriously, yesterday, a friend and I got into a discussion of why Adrian Gonzalez and Prince Fielder had similar EqAs, which led to much furious scribbling of their numbers to get the raw EqA, and then concluding it was probably a result of Petco.

In short, Jay Jaffe, get out of my head!

lkarns6135

2/25

Is EqA not sortable for pitcher handedness or am I just being not smart?

llewdor

2/26

Umm, why? You went to all this trouble to change the label on a stat line? There's no new information here.

What a colossal waste of time.

nosybrian

2/27

Jay: A lot of nit-picking on the name here but none of the criticisms is substantive that I see. TAV is as good as any alternative.

jjaffe

2/27

Thanks.

I've been at BP long enough to know that we're never, ever going to please all of the people all of the time. It's a great stat by any name, and this was the opening salvo of an effort to further its reach significantly. Our partners at ESPN Insider didn't want an article with any math in it, they wanted something to introduce the stat to new readers.

To the extent that there is substantive criticism out there - and of course there is, because within our audience is a hardcore contingent which does care about the third decimal - it will be addressed in an article in the near future after we get the PECOTA card situation ironed out.

As for the point above about batted ball types being superior for a stat such as this, if you think there's no noise in that, holy ****. See Colin Wyers' work on line drive rates, press box heights, and scorer bias (http://www.hardballtimes.com/main/article/when-is-a-fly-ball-a-line-drive/) or our lengthy roundtable on BABIP and line drives (http://www.baseballprospectus.com/article.php?articleid=9928).

redspid

2/28

There's way too many math dorks on this site. Nice article

bmmcmahon

3/01

FWIW, I don't see the point of changing the name of a stat that already has a lot of history. Then again, I think there should be a federal ban on changing the names of ballparks.

holgado

3/01

I have to agree with this. Bill James said that baseball statistics, or at least the best ones, are distinct from those of other sports because and to the extent they have "acquired the power of language." I assume it is this concept (and BP's ever-growing desire to expand its horizons to reach a broader audience) that led ESPN to recommend (and BP to embrace) a change of the name of this stat from EqA to TAv. Somewhat ironically, however, I think the change ignores that -- for the narrower audience of long time BP subscribers, at least -- EqA had *already* acquired the power of language. I'm sure in a few years TAv will acquire a similar power for many, though I'm at a loss for why anyone would think that it has greater potential to do so than EqA. And more to the point, I'm a bit frustrated by the fact that BP would make this change without recognizing that, for that narrower audience, this is akin to telling us that the word "pickle" has been stricken from the English language, and we now have to call pickles... I don't know... fermented cucumbers. I'm'na say pickle, if that's alright.
