Checking the Numbers: MauerQuest!

July 17, 2009

Prior to the 2006 season, during which Joe Mauer produced a gaudy .347/.429/.507 triple-slash line, no junior circuit catcher had ever won a batting title. Last season, Mauer once again found his name atop the batting-average charts thanks to a .328/.413/.451 showing, and this season he is in solid position for a third batting title, hitting at a .373/.447/.622 clip. The 26-year-old backstop has always been lauded for his plate discipline, gap power, and defense behind the plate, but as a hitter he has kicked the gears into overdrive this season, reaching base and smashing the ball around at Pujolsian levels. After going hitless in four at-bats against the Astros on June 21, Mauer’s line rested at .407/.475/.727, naturally inviting speculation, similar to that surrounding Chipper Jones at this time last season, as to whether or not the .400 batting average could be sustained over the season’s remaining balance.

Mauer’s playing time has thrown a wrench into the mix, however: catchers naturally accrue fewer plate appearances because of the rest prescribed against the normal wear and tear of catching, and he also spent the first month of the season on the disabled list. These playing-time issues are generally considered to benefit the Twins catcher, as an extremely high batting average would be much tougher to sustain over a larger number of at-bats.

Probabilities have been discussed in this space before, per our earlier look into the likelihood that someone with as historically poor a record of reaching base as Pedro Feliz would shoot up to a .360 OBP through a third of the season. The methodology behind determining how likely it is for Mauer to finish the season at, or above, .400 is a tad different. Mauer entered this season with a specific projection, and in order to calculate the probability of an event occurring, we need to reconcile what has occurred so far with pre-season expectations. Mauer is not a true .373 hitter, but with the season past the halfway point, it can be safely assumed that his estimate moving forward has increased. In order to find the probability that he will hit .400 we need to first wed the current stretch and prior knowledge of his performance, with regression to the mean serving as best man, in order to determine his likely success rate in any given at-bat.

As with any form of regression to the mean, the more we know about a player, the less weight the regression toward the league average carries. Even with the knowledge of Mauer’s past performance, until he has an infinite supply of plate appearances we will never know if his exact talent level is an eHarmony match for his actual numbers. As more appearances are amassed, Mauer’s own statistics become more prominent, similarly to us being more comfortable in stating that a 1,000-for-3,000 player is more likely a .333-level talent than someone who went 1-for-3 in a single game, but the regression will always carry some weight. Players drastically above or below the mean in a particular area are likely better or worse than the average, but luck cannot be ignored as a factor. Regressing to the mean affords analysts the opportunity to bypass the luck factor to a certain extent, and gauge a player’s true level of effectiveness.

Research from Tom Tango showed that the recipe to find the true talent estimate for a player of whom no prior knowledge exists calls for the addition of approximately 400 league-average at-bats to the pot. With a league average of .269 (right around the AL’s batting average since 2007), Mauer would add 107 hits out of 400 at-bats to his current totals of 90 and 241, respectively. Therefore, if we knew nothing about Mauer other than that he had gone 90-for-241 to date in a league that hits .269, his true talent level would be 197-for-641, or .307. For what it’s worth, PECOTA pegged him as a .307 hitter entering the season.
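As a rough sketch of the arithmetic above, the no-prior-knowledge estimate simply pools Mauer's line with the block of league-average at-bats. The function and parameter names are illustrative assumptions, and the 107-for-400 ballast is taken directly from the text rather than recomputed:

```python
def regress_to_mean(hits, at_bats, league_hits=107, league_at_bats=400):
    """Pool observed results with a block of league-average at-bats.

    The default ballast (107-for-400) is the figure quoted in the text
    for a .269 league; the names here are hypothetical.
    """
    return (hits + league_hits) / (at_bats + league_at_bats)


# Mauer's 2009 line to date: 90 hits in 241 at-bats.
estimate = regress_to_mean(90, 241)
print(f"{estimate:.3f}")  # 197-for-641
```

The fewer at-bats a player has, the more the 400 league-average at-bats dominate the pooled total, which is the sense in which the regression carries more weight for small samples.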

We do have prior knowledge about Mauer, though, given his marks of .347, .293, and .328 over the last three years. Progressively weighting the last three years introduces an additional 380 hits in 1173 at-bats. Therefore, to gauge Mauer’s talent level at this point in time, we add together 90-for-241, 380-for-1173, and 107-for-400 to ultimately arrive at .318. Given the league, his track record, and his current performance, Mauer is a true .318 hitter, but one with an actual mark of .373. If his current 90-for-241 stretch were eliminated and we merely utilized the other components, the resulting talent level would hover right around .309; Mauer entered the season as a true .309 hitter and has since jumped up to .318 with his fantastic data to date. Of course, with information such as his balls in play rates, age, position, and physical type we can get much more granular, but for the purposes of our look today, the above methodology works fine.
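The pooling step can be sketched as follows. This is a minimal illustration under the assumption that the three components are simply summed as hits and at-bats, as the text describes; the variable and function names are made up for the example, and the 380-for-1173 weighting is taken as given rather than derived:

```python
def pooled_average(components):
    """Sum hits and at-bats across (hits, at_bats) pairs and divide."""
    parts = list(components)
    total_hits = sum(h for h, _ in parts)
    total_at_bats = sum(ab for _, ab in parts)
    return total_hits / total_at_bats


this_season = (90, 241)          # 2009 to date
weighted_2006_08 = (380, 1173)   # progressively weighted prior seasons (from the text)
league_ballast = (107, 400)      # 400 league-average at-bats at .269

current_talent = pooled_average([this_season, weighted_2006_08, league_ballast])
preseason_talent = pooled_average([weighted_2006_08, league_ballast])

print(f"current:   {current_talent:.4f}")
print(f"preseason: {preseason_talent:.4f}")
```

Dropping `this_season` from the list reproduces the preseason figure, which shows how much the 90-for-241 stretch has moved the estimate.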

With the true talent in tow, the next step involves the determination of how many hits he would need in the remaining seasonal at-bats in order to hit .400. Due to previously mentioned playing-time constraints on his year, I am comfortable in estimating that Mauer will finish this specific season with right around 480 at-bats, meaning he would then need a grand total of at least 192 hits in order to boast a .400+ batting average; since Mauer has already gone 90-for-241, he will need to record 102 hits in his final 239 at-bats in order to hit .400. It is tempting to plug in the current .373 batting average here, since a success rate of .373 is much more likely to yield 102 successes in 239 chances, but doing so would be a mistake. The .373 merely acts as one of several ingredients in the ‘true talent’ stew; it is not the sole determining factor in his likelihood of hitting .400, serving rather as a reviser of expectations.
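The hits-needed calculation is simple enough to sketch directly. The helper below is hypothetical, and it uses exact fractions only to avoid floating-point surprises when rounding the hit total up; the 480 at-bat season length is the estimate from the text:

```python
from fractions import Fraction
from math import ceil


def remaining_requirement(target, season_at_bats, hits, at_bats):
    """Return (hits still needed, at-bats remaining) to reach `target`.

    `target` is passed as a string such as "0.400" so it converts to an
    exact fraction.
    """
    total_hits = ceil(Fraction(target) * season_at_bats)
    return total_hits - hits, season_at_bats - at_bats


need, left = remaining_requirement("0.400", 480, 90, 241)
print(need, left)  # hits needed, at-bats remaining
```

Changing the season length (say, to 600 at-bats) shows how quickly the required hit total grows.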

The goal then becomes determining the probability that someone with a .318 success rate of getting a hit in a given at-bat would experience at least 102 successes out of 239 chances. Since we are interested in at least 102 successes in 239 chances as opposed to at most, it actually becomes easier to input the reverse: the probability of at most 137 failures in 239 chances. The probability of a failure in any given at-bat is 1-0.318, or 0.682, making the formula equal to the cumulative distribution function: BINOMDIST(137,239,0.682,TRUE). The result will inform us of the likelihood that a player who makes outs in 68.2 percent of his at-bats would make outs in no more than 57.3 percent of the chances in this experiment. Once entered, the formula outputs 0.000267, or a 0.0267 percent chance that this occurs. In terms of odds, that equates to 3744 to 1, meaning that Mauer would be expected to hit .427 (the batting-average representation of 102-for-239) over the theoretically remaining 239 at-bats this year just once every 3745 such stretches. So, you’re saying there’s a chance!?
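For readers without a spreadsheet handy, the same quantity can be sketched in Python by summing binomial terms directly. The function names are assumptions made for illustration; `prob_at_most` is meant to mirror what a cumulative BINOMDIST call returns, and the two framings (at least 102 hits, at most 137 outs) should agree up to floating-point noise:

```python
from math import comb


def prob_at_least(successes, trials, rate):
    """P(X >= successes) for X ~ Binomial(trials, rate), summed term by term."""
    return sum(
        comb(trials, k) * rate**k * (1 - rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def prob_at_most(successes, trials, rate):
    """P(X <= successes): the cumulative form a spreadsheet BINOMDIST gives."""
    return sum(
        comb(trials, k) * rate**k * (1 - rate) ** (trials - k)
        for k in range(0, successes + 1)
    )


TRUE_TALENT = 0.318  # regressed estimate from the text

p_hits = prob_at_least(102, 239, TRUE_TALENT)      # at least 102 hits
p_outs = prob_at_most(137, 239, 1 - TRUE_TALENT)   # at most 137 outs

print(f"{p_hits:.6f} {p_outs:.6f}")
print(f"roughly 1 in {1 / p_hits:,.0f}")
```

The printed values can be compared against the BINOMDIST figure quoted above; the model treats every at-bat as an independent trial at the same success rate, which is the same simplification the article makes.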

Now, if Mauer had been in action since the start of the season, holding all else constant, he would be more likely to finish the year with 600 at-bats. Running through the same formulas, Mauer, who starts the year at 90-for-241 in this hypothetical, would need to go 150-for-359 from that point on to end the season at .400 on the dot. The probability of at least 150 successes in 359 chances given a .318 success rate amounts to 4.222 x 10^-5, or 22,612:1. So, yeah, the lost at-bats certainly play in his favor, but the difference is essentially akin to comparing the likelihood that Adam Eaton throws two straight no-hitters to his chances of throwing one, period.
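To see how the season length drives the result, the two steps can be wrapped into one hypothetical helper and run for both the 480 and 600 at-bat scenarios. This is a sketch under the article's assumptions (a fixed .318 hit probability and independent at-bats), not a substitute for a fuller simulation:

```python
from math import comb


def chance_of_400(season_at_bats, hits=90, at_bats=241, talent=0.318):
    """Return (hits needed, at-bats left, probability of finishing at .400+)."""
    total_needed = -(-2 * season_at_bats // 5)  # ceil(0.400 * AB) using integers
    need = total_needed - hits
    left = season_at_bats - at_bats
    tail = sum(
        comb(left, k) * talent**k * (1 - talent) ** (left - k)
        for k in range(need, left + 1)
    )
    return need, left, tail


for season in (480, 600):
    need, left, tail = chance_of_400(season)
    print(f"{season} AB: needs {need}-for-{left}, probability {tail:.3e}")
```

The longer season requires the same hot streak to be sustained over more at-bats, so its probability comes out smaller.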

Nate Silver penned a post similar in content last season when Chipper Jones got off to his magical 92-for-219 start, running through a complex simulation in order to better replicate real-life working conditions, incorporating aspects such as the types of pitchers Jones would face. While such a simulation would be fantastic to utilize, we need not go to such great lengths in this instance, as the probability is going to remain incredibly low regardless, and it isn’t as if Mauer will face extreme opposites of pitcher quality in every trip to the dish. He is in the midst of a remarkable season, one that might finally convince those with voting privileges to give him more credit than Justin Morneau, but he is not going to hit .400, even with the decreased playing time that comes with a missed month and catcher rest patterns. This should not detract from his seasonal merits in any way, shape or form, but let’s not get too carried away with the .400 talk, especially given that he has already begun his slow climb down from that mark.

A version of this story originally appeared on ESPN Insider.

Eric Seidman

arcee555

7/17

Let's not get carried away. Joe Mauer would need approx. 441 ABs to qualify for the batting title (assumes a 14% walk rate, his career avg.). Joe currently sits at 90 hits in 241 ABs; to hit .3995 at 441 ABs he needs to get 176 hits. Simply put, that means he would have to go 86-for-200 the rest of the season, a .430 avg. Ain't gonna happen. I don't care how big of a Joe Mauer fan you are.

EJSeidman

7/17

Please read the article, not the little one sentence synopsis, before commenting.

crafty35a

7/17

Love the Dumb and Dumber reference.

newsense

7/17

I think you underestimate Mauer's "true" level. If you take a Bayesian approach using his PECOTA as a prior, you get a "true" expectation based on his performance this year of a .332 batting average, which makes the probability of hitting .400 to be 0.135% or 1 in 740.

EJSeidman

7/17

There was a ton of debate about this on Tango's blog when Nate's article came out last year, but ultimately I feel that the regression approach utilized here is the most accurate method. But that is one of the great things about the numbers - you can adjust them in this case and re-calculate. I am much more comfortable in saying he is now a true talent .318 hitter with a .373 clip than using the PECOTA as a prior with a Bayesian approach. But it sort of boils down to sample size taste - you might be convinced that with his 2006-08 seasons and current numbers that Mauer really is a .330 hitter, which he very well may be. I might be more inclined to regress him to the mean more than simply accept the .330 as gospel.

newsense

7/17

Not to reopen the debate, but the chance of a "true" .318 hitter hitting .373 over half a season is pretty low (less than 4%), as opposed to about 10% for a .332 hitter.

EJSeidman

7/17

Right, I'm not debating that at all. He very well COULD be a true talent .325-.330 hitter, but in going through the appropriate regression to the mean methodology we arrive at .318 right now. I don't have as big an issue with Nate's Bayesian approach last year as some others did, but this is a different approach. No matter what approach you use, however, you are going to get something well below 1 percent.

smokeyjoewood

7/17

Great article. Two questions (i'll post separately):

First, would an age weighting to the components of Mauer's true talent level be appropriate? He's 26 now and presumably right about at his peak given his athleticism at a skill position. Those input ABs came at age 23-25. Does BA develop significantly, like ISO, as a player peaks or is it independent? Put another way, if a 24 year old player hits .300 over 400 ABs, is that any more predictive of true talent level than if a 31 year old does so?

smokeyjoewood

7/17

Second, rather than regressing with league-average ABs, why not regress using ABs at the BA his LD% predicts (or BA predicted by regressing his Line, Fly, Ground rates to the league averages)?

I had the same question when I read The Book - it seems that the Tango method, while statistically accurate over the entire population of hitters, will always under-estimate true talent level for good hitters - Jeter and Pujols and Mauer will see their "True Talent Level" go up a few points almost every season as they continue to out-hit their regressed projections until they've accumulated 10,000 ABs or so.

I'd love to see data on how career .300 hitters, or the top 10% of hitters in the past 3 seasons, or some other measure of who the very good hitters are, fare relative to Tango-style regressed projections. And whether using the LFG components to generate an average to regress towards is more predictive than a flat league-average.

Thanks again for the article; it's interesting to compare it to Nate's approach in the Chipper article :)

EJSeidman

7/17

With the second question first, are you referring to weighted regression towards some form of an expected BA based on the number of different balls in play hit? As in, if Mauer hit .347, .293, .328 from 2006-08, instead of adding in the 380-1173 (his weighted prior three seasons), find his EXPECTED HITS given the number of each batted ball multiplied by the 0.73, 0.24, 0.15 expected values, and divide that by the 1173?

That would certainly be a worthwhile study, as your second paragraph hit the nail on the head, or whatever that expression may be: Mauer MIGHT be a true .325-.330 hitter right now but the regression approach isn't going to agree until he gets more and more ABs at such a high level.

smokeyjoewood

7/18

That's actually not what I was thinking, although I wish I had.. good idea to remove most of the luck from Mauer's historical ABs.

I was suggesting that the 107 for 400 (400 league average ABs) should be replaced with 400 ABs using his actual HR and K rates and a BABIP generated using his actual batted ball rates and multipliers. (Or regress each of his batted ball rates individually towards the league averages and use these modified rates to generate an "average Mauer" BABIP).

In other words, don't regress him towards a league average hitter when he has a skill set that is not league average. The first approach is effective in a sample of all hitters, but I suspect the latter approach would be as good overall and better at projecting performance in each quartile of hitters - it would allow very good hitters better projections, and bad ones (or TTO) worse.

rowenbell

7/17

Very good, Eric.

It's nice to see a BP author be receptive to Tango's criticism of Silver's methodology from last year's Chipper article. Frankly, I think that type of openness only improves BP.

EJSeidman

7/17

Rowen, I agree... now trade me Chase Utley. Is there such a thing as Strat-tampering?

In all seriousness, being receptive and opening discussions is of paramount importance to growing as an organization. Otherwise, an echo chamber is created wherein no new ideas can develop. This is what teams like the Royals and the Bavasi-led Mariners were essentially criticized for.

ckahrl

7/20

Amen to that. ;)

sbnirish77

7/17

Hell ... Mauer is even putting up numbers comparable to Matt Wieters PECOTA projection ... so you know he must be good

blcartwright

7/18

But then you would be regressing Mauer's historical stats to a different set of Mauer's historical stats...and how do you know that the estimate of BA (or BABIP) from batted ball components is any less lucky than the weighted historical record?

blcartwright

7/18

Sometimes when I click 'reply' it still puts the post at the very bottom...the above was in reply to

"I was suggesting that the 107 for 400 (400 league average ABs) should be replaced with 400 ABs using his actual HR and K rates and a BABIP generated using his actual batted ball rates and multipliers."

oneofthem

7/20

Was that bit about calculating the failure rate really necessary? This is not an intro stats lecture.

EJSeidman

7/20

While you may be well-versed in it, others might not be, and some others may find it useful to learn that for their own "experiments."
