I recently heard through the BP grapevine that a prominent figure within the Reds organization has been giving us a hard time about Wily Mo Pena’s PECOTA projection. The problem, it seems, is not that the initial projection was too optimistic–Pena’s current .285 Equivalent Average matches his 60th percentile projection–but that we were too quick to disclaim it before the season began. Pena’s projection, in fact, became something of a running joke within the group, and just as surely outside of it.
I think it’s a mistake to assume that your tools are smarter than you are. At the same time, the advantage of an objective system of measurement–be it VORP, PECOTA, or, hell, SAT scores–is that it prevents you from being fooled by biases that you might not even be aware of. Prior to this season, the name “Wily Mo Pena” conjured up an image of a young, chubby ballplayer with terrible plate discipline and a goofy name, someone who had been a “prospect” for seemingly forever (a friend of mine drafted him in a Scoresheet Baseball league way back in 1999), and who was only in the big leagues as the result of an ill-advised contract that had been conceived years ago. It’s easy to be dismissive of this sort of player; it seems like there are hundreds of them who have come and gone over the years. At one time or another, we’ve all been fooled by a big performance in a hitters’ park by a player repeating the Southern League, or a hot run by an old rookie during a September cup of coffee.
But PECOTA saw some things that it liked in Wily Mo Pena, some strengths lurking amidst all the negatives. Let me talk a little bit about those strengths; why did PECOTA like Pena so much better than any rational observer might have?
- Power and youth. PECOTA pegged Pena as having isolated power equivalent to that of an average major leaguer. This is, more or less, a straight application of the equivalent statistics prepared by Clay Davenport and Keith Woolner. We can come up with an Equivalent Isolated Power (EqISO) statistic for Pena by subtracting his Equivalent Batting Average (EqBA) from his Equivalent Slugging Percentage (EqSLG). Here were his figures at various levels over the three seasons previous to this one:
| Year | Level | AB | EqISO |
|---|---|---|---|
| 2001 | Dayton (A) | 511 | .179 |
| 2002 | Chattanooga (AA) | 388 | .147 |
| 2003 | Louisville (AAA) | 51 | .269 |
| 2003 | Cincinnati (MLB) | 165 | .143 |
| 2003 | Combined | 216 | .173 |
| | MLB AVERAGE | | .170 |
A league-average EqISO figure is .170. Pena did a little bit better than that as a 19-year-old in A ball, a little bit worse as a 20-year-old in Double-A, and matched the .170 almost exactly when combining his 2003 performances at Louisville and Cincinnati. One thing that’s worth remembering is that, if we’re seeking to represent power production, isolated power is a far more reliable metric than slugging average is. Pena hadn’t hit for good batting averages, and that’s a bad thing, but we shouldn’t allow the low slugging averages those BAs create to cloud our perception of his power.
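To make the arithmetic concrete, here is a minimal Python sketch of the two calculations described above: EqISO as EqSLG minus EqBA, and a combined figure for a split season. The function names are hypothetical, and the at-bat weighting is an assumption about how the combined line was produced, not a description of the actual equivalent-statistics code; it does reproduce the .173 shown for 2003.

```python
def eq_iso(eq_slg, eq_ba):
    """Equivalent Isolated Power: EqSLG minus EqBA."""
    return eq_slg - eq_ba


def combined_eq_iso(stints):
    """At-bat-weighted EqISO across several (at_bats, eq_iso) stints.

    Weighting by at-bats is an assumption made for this sketch.
    """
    total_ab = sum(ab for ab, _ in stints)
    if total_ab == 0:
        raise ValueError("need at least one at-bat")
    return sum(ab * iso for ab, iso in stints) / total_ab


# Pena's 2003 stints, taken from the table: Louisville, then Cincinnati.
stints_2003 = [(51, 0.269), (165, 0.143)]
combined = combined_eq_iso(stints_2003)
print(f"{combined:.3f}")  # 0.173
```

The 51 at-bats at Louisville carry less than a quarter of the weight, which is why the combined figure sits much closer to the Cincinnati line than to the Triple-A one.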
A player who demonstrates an ability to hit for a major-league average level of power at the age of 21 has a very good chance to have a solid major-league career, unless he has absolutely nothing else going for him. (Pena doesn’t have a hell of a lot else going for him, but PECOTA found a few other things that it liked; we’ll get to that in a moment.) The reason is that power ability continues to develop rapidly throughout a player’s early twenties in a way that the other statistical categories do not. (Walk rate also increases throughout a player’s career, but doesn’t show the rapid growth that power output does.) If a player is capable of hitting for a league-average level of power at age 21, there’s a very good chance that he’ll be capable of hitting for an elite level of power by the time that he’s 24 or 25.
I should stop for a moment to discuss what I call the Sean Burroughs Exception. This power thing is not an equal opportunity phenomenon; while a young player with good power has a strong chance to develop great power, and a young player with okay power has a strong chance to develop good power, the odds are against a player who has displayed almost no power ever becoming a significant power threat. Schematically, we’re looking at something like this:
Notice that the gap between the plus power hitters and the minus power hitters tends to widen, rather than diminish, as the player ages. Of course there are exceptions on both sides of the coin–Burroughs devotees were quick to cite the example of George Brett–but for the most part a player needs to have a little bit of power to begin with in order to get on a favorable growth curve.
- Secondary characteristics, e.g. “athleticism”. One of the ironic aspects of PECOTA is that it rewards certain characteristics that statheads have long derided. PECOTA measures things like height, weight and speed through statistics rather than subjective observation, but its conclusion is the same one that scouts have reached for years: bigger, taller, faster players have a better chance of developing favorably than their shorter, scrawnier, slower counterparts. Wily Mo Pena is no Milt Cuyler, but the system does figure him for about league-average speed, and his big frame makes PECOTA more confident that his power will develop.
- Strikeouts. Strikeout rate can have a material impact on the development pattern that we can expect from a player. However, the impact isn’t always negative. In fact, in retooling PECOTA this past winter, I discovered a positive predictive relationship between strikeout rate and power output. That is, a player with a higher strikeout rate, all else being equal, is expected to produce more home runs going forward than a player with a lower strikeout rate. Strikeout rate has the opposite effect on base hits, diminishing a player’s projected batting average, but for a player like Pena, whose value derives from his power, the higher strikeout rate has a positive impact on his forecast.
It’s possible, I suppose, that the effect of strikeout rate is non-linear; that is, a reasonably high strikeout rate is more favorable for a power hitter than a relatively low strikeout rate, but an extremely high strikeout rate, such as Pena has, is not favorable. As much as I played around with PECOTA’s formulas, however, I could not find any evidence of this, and many of the players who had very high strikeout numbers and very optimistic PECOTAs–Pena, Adam Dunn, Hee Seop Choi–have had outstanding seasons.
That still leaves open the question of just how and why a high strikeout rate can be a good thing. I think the best way to think about it is not as a negative–how often does this guy fail to make contact?–but rather as a positive: what does this guy do when he does make contact? Pena, in an otherwise very poor season with the Reds last year, hit .321 and slugged .527 when he didn’t strike out. Compare that to Nomar Garciaparra, who hit .332 and slugged .578 when not striking out, or Brian Giles (.339/.583). Pena’s numbers aren’t quite as good, but the gap is nowhere near as wide as it is in the overall figures, which count the strikeouts.
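The "when he didn’t strike out" figures can be sketched in a few lines of Python by removing strikeouts from the at-bat denominator. The function name is hypothetical. The 165 at-bats come from the Cincinnati line in the table above, but the strikeout, hit and total-base counts are illustrative assumptions chosen to be consistent with the .321/.527 quoted here; they are not taken from this article.

```python
def rates_on_contact(at_bats, strikeouts, hits, total_bases):
    """Batting average and slugging over at-bats that did not end in a strikeout."""
    contact_ab = at_bats - strikeouts
    if contact_ab <= 0:
        raise ValueError("no at-bats left after removing strikeouts")
    return hits / contact_ab, total_bases / contact_ab


# 165 AB is from the table; the other three counts are assumed for illustration.
avg, slg = rates_on_contact(at_bats=165, strikeouts=53, hits=36, total_bases=59)
print(f"{avg:.3f} {slg:.3f}")  # 0.321 0.527
```

Under those assumed counts, roughly a third of the at-bats drop out of the denominator, which is what lifts a poor overall line to something close to the contact numbers of much better hitters.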
As it happens, Pena’s breakout this season has come without a reduction in his strikeout rate, which raises some questions about whether he can sustain the production. If Pena is able to reduce his strikeouts, the rewards could be considerable, because the ball goes a long way when he hits it.
- Ballpark effects. A final factor in Pena’s optimistic projection is the shape of the ballpark effect in Cincinnati. Great American Ball Park, at least in its first season, tended to increase performance in the power categories while reducing batting average. This fact is helpful to a player like Pena, whose value is concentrated in his power; some players are better “suited” to certain ballparks than others. Pena’s weighted mean EqA projection, for example, would have been about eight points lower if his home park were Comerica Park, which reduces power but increases batting average, rather than GABP.
So there you have it. There’s no black magic in PECOTA, only a lot of science and hard work. Sometimes a number of factors come together to produce a forecast wherein a player is predicted to perform radically better or worse than he has in the past. We’re not going to bat 100 percent with those, but we hope to get more of them right than wrong, and so far the Wily Mo Pena projection is looking pretty good.