Prior to the 2006 season during which Joe Mauer produced a gaudy .347/.429/.507 triple-slash line, no junior circuit catcher had ever won a batting title. Last season, Mauer once again found his name atop the batting-average charts thanks to a .328/.413/.451 showing, and this season he is currently in solid position for a third batting title by hitting at a .373/.447/.622 clip. The 26-year-old backstop has always been lauded for his plate discipline, gap power, and defense behind the plate, but as a hitter he has kicked the gears into overdrive this season, reaching base and smashing the ball around at Pujolsian levels. After going hitless in four at-bats against the Astros on June 21, Mauer’s line rested at .407/.475/.727, naturally inviting speculation, similar to that surrounding Chipper Jones at this time last season, as to whether or not the .400 batting average could be sustained over the balance of the season.
Mauer’s playing time has thrown a wrench into the mix, however, mainly in that catchers naturally accrue lower tallies of plate appearances due to the rest that’s prescribed against the normal wear and tear of catching, but also because he spent the first month of the season on the disabled list to boot. These playing-time issues are generally considered to be of benefit to the Twins catcher, as an extremely high batting average would be much tougher to sustain over a larger number of at-bats.
Probabilities have been discussed in this space before, per our earlier look into the likelihood that someone with such a historically poor record of reaching base like Pedro Feliz would shoot up to a .360 OBP through a third of the season. The methodology behind determining how likely it is for Mauer to finish the season at, or above, .400 is a tad different. Mauer entered this season with a specific projection, and in order to calculate the probability of an event occurring, we need to reconcile what has occurred so far with pre-season expectations. Mauer is not a true .373 hitter, but with the season past the halfway point, it can be safely assumed that his estimate moving forward has increased. In order to find the probability that he will hit .400 we need to first wed the current stretch and prior knowledge of his performance, with regression to the mean serving as best man, in order to determine his likely success rate in any given at-bat.
As with any form of regression to the mean, the more we know about a player, the less weight the regression toward the league average carries. Even with the knowledge of Mauer’s past performance, until he has an infinite supply of plate appearances we will never know if his exact talent level is an eHarmony match for his actual numbers. As more appearances are amassed, Mauer’s own statistics become more prominent, just as we are more comfortable stating that a 1,000-for-3,000 player is a .333-level talent than we are saying the same of someone who went 1-for-3 in a single game, but the regression will always carry some weight. Players drastically above or below the mean in a particular area are likely better or worse than the average, but luck cannot be ignored as a factor. Regressing to the mean affords analysts the opportunity to bypass the luck factor to a certain extent, and gauge a player’s true level of effectiveness.
Research from Tom Tango showed that the recipe to find the true talent estimate for a player of whom no prior knowledge exists calls for the addition of approximately 400 league-average at-bats to the pot. With a league average of .269 (right around the AL’s batting average since 2007), Mauer would add 107 hits out of 400 at-bats to his current totals of 90 and 241, respectively. Therefore, if we knew nothing about Mauer other than that he had gone 90-for-241 to date in a league that hits .269, his true talent level would be 197-for-641, or .307. For what it’s worth, PECOTA pegged him as a .307 hitter entering the season.
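The arithmetic in that paragraph can be sketched in a few lines of Python. The function name `regress_to_mean` and its parameters are illustrative assumptions rather than anything taken from Tango's research, and the code truncates 400 x .269 to 107 hits only to match the figure used above.

```python
def regress_to_mean(hits, at_bats, league_avg=0.269, ballast_ab=400):
    """Blend observed results with `ballast_ab` league-average at-bats.

    Hypothetical helper: the names and defaults are assumptions for
    illustration, using the .269 league average and 400 at-bats cited above.
    """
    # 400 * .269 = 107.6; truncate to 107 to match the article's figure.
    ballast_hits = int(league_avg * ballast_ab)
    total_hits = hits + ballast_hits
    total_ab = at_bats + ballast_ab
    return total_hits, total_ab, total_hits / total_ab


hits, ab, avg = regress_to_mean(90, 241)
print(f"{hits}-for-{ab} = {avg:.3f}")  # 197-for-641 = 0.307
```

Raising `ballast_ab` pulls the estimate harder toward the league average, which is the sense in which a small sample "carries less weight" than a large one.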
We do have prior knowledge about Mauer, though, given his marks of .347, .293, and .328 over the last three years. Progressively weighting the last three years introduces an additional 380 hits in 1,173 at-bats. Therefore, to gauge Mauer’s talent level at this point in time, we add together 90-for-241, 380-for-1,173, and 107-for-400 to ultimately arrive at 577-for-1,814, or .318. Given the league, his track record, and his current performance, Mauer is a true .318 hitter, but one with an actual mark of .373. If his current 90-for-241 stretch were eliminated and we merely utilized the other components, the resulting talent level would hover right around .309; Mauer entered the season as a true .309 hitter and has since jumped up to .318 with his fantastic data to date. Of course, with information such as his balls-in-play rates, age, position, and physical type we can get much more granular, but for the purposes of our look today, the above methodology works fine.
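As a rough check on those sums, the three components can be pooled in Python. The dictionary labels and the `blended_average` helper are assumptions made for illustration; the hit and at-bat counts are the ones quoted above, and the weighting that produced 380-for-1,173 is taken as given rather than recomputed.

```python
# (hits, at_bats) for each component, using the figures quoted in the text.
components = {
    "season to date": (90, 241),
    "prior three seasons (weighted)": (380, 1173),
    "league-average ballast": (107, 400),
}


def blended_average(parts):
    """Pool hits and at-bats across components and return the combined average."""
    parts = list(parts)  # allow generators, since we iterate twice
    total_hits = sum(h for h, _ in parts)
    total_ab = sum(ab for _, ab in parts)
    return total_hits / total_ab


current = blended_average(components.values())
preseason = blended_average(
    v for name, v in components.items() if name != "season to date"
)
print(f"current estimate:   {current:.3f}")    # 0.318
print(f"preseason estimate: {preseason:.3f}")  # 0.310
```

The preseason figure is 487-for-1,573, which is .3096 before rounding; that is the value described above as hovering around .309.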
With the true talent in tow, the next step involves the determination of how many hits he would need in the remaining seasonal at-bats in order to hit .400. Due to previously mentioned playing-time constraints on his year, I am comfortable in estimating that Mauer will finish this specific season with right around 480 at-bats, meaning he would then need a grand total of at least 192 hits in order to boast a .400+ batting average; since Mauer has already gone 90-for-241, he will need to record 102 hits in his final 239 at-bats in order to hit .400. It is tempting to plug the current .373 batting average into the calculation, since a .373 success rate is far more likely to yield 102 successes in 239 chances, but doing so would be misleading. The .373 merely acts as one of several ingredients in the ‘true talent’ stew; it is not the sole determining factor in his likelihood of hitting .400, serving rather as a reviser of expectations.
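The hits-needed step is simple enough to sketch directly. The helper name `hits_needed` is hypothetical, and the 480 at-bat season total is the estimate made above, not a known quantity.

```python
import math
from fractions import Fraction


def hits_needed(target_avg, projected_ab, hits_so_far, ab_so_far):
    """Return (hits still needed, at-bats remaining) to finish at target_avg.

    `target_avg` is passed as a string so Fraction can represent it exactly,
    which avoids floating-point surprises when taking the ceiling.
    """
    total_hits = math.ceil(Fraction(target_avg) * projected_ab)
    return total_hits - hits_so_far, projected_ab - ab_so_far


needed, remaining = hits_needed("0.400", 480, 90, 241)
print(needed, remaining, round(needed / remaining, 3))  # 102 239 0.427
```

In other words, a .400 season from this starting point requires hitting roughly .427 the rest of the way.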
The goal then becomes determining the probability that someone with a .318 success rate of getting a hit in a given at-bat would experience at least 102 successes out of 239 chances. Since we are interested in at least 102 successes in 239 chances as opposed to at most, it actually becomes easier to input the reverse: the probability of at most 137 failures in 239 chances. In that reversed framing, the per-at-bat probability, now of making an out, is 1 - 0.318, or 0.682, making the formula equal to the cumulative distribution function: BINOMDIST(137,239,0.682,TRUE). The result will inform us of the likelihood that a player who makes outs in 68.2 percent of his at-bats would make outs in no more than 57.3 percent of the 239 chances in this experiment. Once entered, the formula outputs 0.000267, or roughly a 0.0267 percent chance that this occurs. In terms of odds, that equates to 3744 to 1, meaning that Mauer would be expected to hit .427 (the batting average that 102-for-239 represents) over the theoretically remaining 239 at-bats this year just once every 3745 such stretches. So, you’re saying there’s a chance!?
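For readers without a spreadsheet handy, the same binomial calculation can be sketched with nothing but the Python standard library. The function names are assumptions for illustration, and the model treats every at-bat as an independent trial with a fixed .318 chance of a hit, exactly as the spreadsheet formula does. Both framings, at least 102 hits and at most 137 outs, are computed to show that they describe the same event.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k, n, p):
    """Upper tail: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def prob_at_most(k, n, p):
    """Cumulative distribution: P(X <= k), the spreadsheet-style cumulative form."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


p_hits = prob_at_least(102, 239, 0.318)  # at least 102 hits
p_outs = prob_at_most(137, 239, 0.682)   # at most 137 outs
print(f"P(>=102 hits) = {p_hits:.6f}")
print(f"P(<=137 outs) = {p_outs:.6f}")
print(f"odds against: about {1 / p_hits - 1:,.0f} to 1")
```

The two probabilities should agree to within floating-point error and land near the 0.000267 quoted above; small differences in the last digits can come from rounding the .318 input.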
Now, if Mauer had been in action since the start of the season, holding all else constant, he would be more likely to finish the year with 600 at-bats. Running through the same formulas, Mauer, who starts at 90-for-241 in this hypothetical, would need to go 150-for-359 from that point on to end the season at .400 on the dot. The probability of at least 150 successes in 359 chances given a .318 success rate amounts to 4.422 x 10^-5, or 22,612:1. So, yeah, the lost at-bats certainly work in his favor, but the difference is essentially akin to comparing the likelihood Adam Eaton throws two straight no-hitters to his chances of throwing one, period.
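The two playing-time scenarios can be placed side by side in a short sketch. Everything here rests on the assumptions already stated: a .318 true talent level, a 90-for-241 starting point, independent at-bats, and season totals of 480 or 600 at-bats. The constant and function names are hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


TRUE_TALENT = 0.318      # regressed estimate from the text (an estimate, not a fact)
HITS, AT_BATS = 90, 241  # season to date

results = {}
for season_ab in (480, 600):
    # Ceiling of 0.4 * season_ab in integer arithmetic: hits required for .400.
    target_hits = -(-2 * season_ab // 5)
    needed = target_hits - HITS
    remaining = season_ab - AT_BATS
    prob = prob_at_least(needed, remaining, TRUE_TALENT)
    results[season_ab] = (needed, remaining, prob)
    print(f"{season_ab} AB: {needed}-for-{remaining}, "
          f"p = {prob:.3e}, about {1 / prob - 1:,.0f} to 1 against")
```

The longer season comes out several times less likely, though both probabilities are tiny in absolute terms, which is the point of the comparison above.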
Nate Silver penned a post similar in content last season when Chipper Jones got off to his magical 92-for-219 start, running through a complex simulation in order to better replicate real-life working conditions, incorporating aspects such as the types of pitchers Jones would face. While such a simulation would be fantastic to utilize, we need not go to such great lengths in this instance, as the probability is going to remain incredibly low regardless, and it isn’t as if Mauer will face extreme opposites of pitcher quality in every trip to the dish. He is in the midst of a remarkable season, one that might finally convince those with voting privileges to give him more credit than Justin Morneau, but he is not going to hit .400, even with the decreased playing time that comes with a missed month and catcher rest patterns. This should not detract from his seasonal merits in any way, shape or form, but let’s not get too carried away with the .400 talk, especially given that he has already begun his slow climb down from that mark.
A version of this story originally appeared on ESPN Insider.
First, would an age weighting to the components of Mauer's true talent level be appropriate? He's 26 now and presumably right about at his peak given his athleticism at a skill position. Those input ABs came at age 23-25. Does BA develop significantly, like ISO, as a player peaks or is it independent? Put another way, if a 24 year old player hits .300 over 400 ABs, is that any more predictive of true talent level than if a 31 year old does so?
I had the same question when I read The Book - it seems that the Tango method, while statistically accurate over the entire population of hitters, will always under-estimate true talent level for good hitters - Jeter and Pujols and Mauer will see their "True Talent Level" go up a few points almost every season as they continue to out-hit their regressed projections until they've accumulated 10,000 ABs or so.
I'd love to see data on how career .300 hitters, or the top 10% of hitters in the past 3 seasons, or some other measure of who the very good hitters are, fare relative to Tango-style regressed projections. And whether using the LFG components to generate an average to regress towards is more predictive than a flat league-average.
Thanks again for the article, it's interesting to compare it to Nate's approach in the Chipper article :)
That would certainly be a worthwhile study, as your second paragraph hit the nail on the head, or whatever that expression may be: Mauer MIGHT be a true .325-.330 hitter right now but the regression approach isn't going to agree until he gets more and more ABs at such a high level.
I was suggesting that the 107 for 400 (400 league average ABs) should be replaced with 400 ABs using his actual HR and K rates and a BABIP generated using his actual batted ball rates and multipliers. (Or regress each of his batted ball rates individually towards the league averages and use these modified rates to generate an "average Mauer" BABIP).
In other words, don't regress him towards a league average hitter when he has a skill set that is not league average. The first approach is effective in a sample of all hitters, but I suspect the latter approach would be as good overall and better at projecting performance in each quartile of hitters - it would allow very good hitters better projections, and bad ones (or TTO) worse.
It's nice to see a BP author be receptive to Tango's criticism of Silver's methodology from last year's Chipper article. Frankly, I think that type of openness only improves BP.
In all seriousness, being receptive and opening discussions is of paramount importance to growing as an organization. Otherwise, an echo chamber is created wherein no new ideas can develop. This is what teams like the Royals and the Bavasi-led Mariners were essentially criticized for.