Believe it or not, most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.
Without further ado, let's kick off the series by extending a warm BP welcome to Jeff Sullivan. Jeff has been writing about the Mariners since 2003 and has been running Lookout Landing since 2005. He also briefly ran Beyond the Box Score and serves as an editor at SBNation/MLB. He can be found in Oregon bars.
—
I’ve been granted the honor of going first in the brand-new ProGUESTus series, which is one of those things I probably find a lot cooler than any of you do. I don’t know why I was asked to lead off, since I was a pitcher in high school who wasn’t allowed to hit, but that’s more a problem with the manager than with me. That’s a baseball joke. All right, good start.
To much fanfare, Colin Wyers released the 2011 PECOTA projections earlier this week. PECOTA is probably the most well-publicized, well-known, and complex projection system on the planet, and so the date of its release is always one of the most exciting days in the early part of the year. When there isn’t any baseball, the best substitute is thinking about and analyzing the baseball to come, and PECOTA grants us that opportunity.
As soon as I heard the projections were available, I downloaded the spreadsheet and, like most other people with access to it, scrambled to find out what PECOTA thinks of Rajai Davis. (Not much.) This is where a lot of the fun with PECOTA lies. Once you get the numbers, you want to see what the system thinks about certain players. A common activity is to group the projections for players on your favorite team, figure out projected playing time, and then turn that into a projected record. That’s what we all care about, right? How many wins and losses our teams are going to end up with?
And that’s great. That’s PECOTA’s strength, and that’s PECOTA’s purpose. But the thing about PECOTA that I don’t think gets enough attention is the inclusion of player comparisons. As many of you know, the whole PECOTA system is built upon these comparisons, and for reader convenience, they’re included as part of the output. Scroll to the right of the 2011 spreadsheet and you’ll find a “Comparables” column. There you’ll see a selection of the players throughout baseball history to whom the current player is being compared.
There’s fun to be had here. Jose Guillen as Julio Franco? Jason Bergmann as Shaun Marcum? Twenty-two different pitchers as Antonio Bastardo? The comps add some color and make the final product more entertaining. And they’re not without their analytical value, either. The strength of a given player’s comps, one figures, ought in theory to be directly related to the strength of a given player’s projection, and said strength, or confidence, or whatever you want to call it, is shown here in parentheses after the comparable names.
So I’m here now to make an effort to get people to pay more attention to this part of PECOTA. That number in parentheses is what Baseball Prospectus calls the “Similarity Index,” and it’s defined in the site glossary. The higher the number, the easier it is to find similar players with similar performances. The lower the number, the harder it is to find similar players with similar performances.
That’s good and sensible, but let’s go ahead and call it the "Ordinary Index" instead. Changing the name doesn’t change the meaning; it just makes it more descriptive. It follows, then, that the higher the number, the more ordinary the player, and the lower the number, the more extraordinary the player.
Who are the most ordinary and extraordinary players in baseball? Before, this would’ve been a difficult—if not impossible—question to answer. There are so many things to consider. But now we have an Ordinary Index. It’s right there in the name. All of a sudden, coming up with an answer to the question couldn’t be easier, and the results are shown below. I hope we can all learn a thing or two from this exercise.
—–
Most Ordinary Hitter, Minors: John Murphy (Ordinary Index: 88)
The Yankees’ second-round pick in the 2009 draft, Murphy is a right-handed catcher who last year posted a .703 OPS with A-ball Charleston. Ordinary from birth, he was given the most common first name in the United States. Additionally, Baseball-Reference lists his middle name as “R.,” a sign that, while his parents understood the necessity of providing a middle name, they in no way intended to suggest that their son was unique, so they opted for an all-encompassing initial.
Most Ordinary Hitter, Majors: Danny Worth (OI: 87)
Worth is a righty middle infielder who crawled his way up the ladder despite mediocre numbers and broke into the bigs with the Tigers last season. He’s fond of vanilla ice cream, the color blue, Jeff Dunham, hanging out with his friends, Bud Light commercials, and fast food french fries that are salty but not too salty.
Most Ordinary Pitcher, Minors: Mason Tobin (OI: 93)
Tobin has been drafted three times—once by the Braves in 2005, again by the Braves in 2006, and once by the Angels in 2007. The righty pitcher missed most of 2009 and all of 2010 after undergoing Tommy John surgery on his elbow. As a bullied teenager in middle school, all Tobin wanted was to be like everyone else. In time, he got his wish.
Most Ordinary Pitcher, Majors: Cesar Jimenez, Yhency Brazoban (Tie! OI: 91)
Former big league relievers Cesar Jimenez and Yhency Brazoban are so ordinary that they can’t even get a rank to themselves on a list. Jimenez is a lefty with an injury history and a high-80s fastball, and Brazoban is a righty with an injury history and a low-90s fastball. Oooh.
Most Extraordinary Hitter, Minors: Brandon Belt (OI: 57)
The Giants’ fifth-round draft choice in 2009, Belt broke into the professional ranks last season and, between A-ball San Jose, Double-A Richmond, and Triple-A Fresno, batted .352 with a 1.075 OPS. PECOTA had so much trouble finding comparable players that his third listed comp on the spreadsheet is someone named “Curt Blefary,” who I’m pretty sure didn’t exist.
Most Extraordinary Hitter, Majors: Ichiro Suzuki (OI: 43)
The perennial bane of any and every projection system, Ichiro has continued to excel despite repeated statistical assertions that he would be Matty Alou. Ichiro is so historically unusual that he once consulted his dog before signing a contract extension.
Most Extraordinary Pitcher, Minors: Tyler Matzek (OI: 47)
Matzek was selected 11th overall by the Rockies in 2009 and handed a large signing bonus. Fresh out of high school, the lefty reported to A-ball Asheville and struck out a batter per inning while allowing a ton of walks. PECOTA projects a six percent chance of improvement and a one percent chance of collapse, meaning that PECOTA projects a 93 percent chance that Matzek either stays the same or gets a little bit worse. Thus, Matzek is evidently so extraordinary that PECOTA seems to think he ages at twice the normal rate.
Most Extraordinary Pitcher, Majors: Winston Abreu (OI: 32)
Abreu narrowly beat out Craig Kimbrel and Jamie Moyer for the honor. He hasn’t thrown a pitch in the majors since July 2009, but what makes him extraordinary is that, over his last three years as a reliever in Triple-A, he’s struck out 241 batters in 158.2 innings. The other thing that makes him extraordinary is that he’s a 6-foot-2, approximately-20-pound Dominican with half the name of a British prime minister.
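Mechanically, the awards above boil down to taking the maximum and minimum of the Similarity Index within each group, and keeping ties so nobody as ordinary as Jimenez and Brazoban gets left out. Here is a minimal sketch, assuming the spreadsheet has already been read into (name, group, index) tuples; the rows below are only the values quoted in this piece, not the full PECOTA file, and the function name `extremes` is hypothetical.

```python
from collections import defaultdict

# Only the Similarity Index values quoted in this article; the real
# PECOTA spreadsheet has far more rows. Group labels are hypothetical.
PLAYERS = [
    ("John Murphy", "Hitter, Minors", 88),
    ("Brandon Belt", "Hitter, Minors", 57),
    ("Danny Worth", "Hitter, Majors", 87),
    ("Ichiro Suzuki", "Hitter, Majors", 43),
    ("Mason Tobin", "Pitcher, Minors", 93),
    ("Tyler Matzek", "Pitcher, Minors", 47),
    ("Cesar Jimenez", "Pitcher, Majors", 91),
    ("Yhency Brazoban", "Pitcher, Majors", 91),
    ("Winston Abreu", "Pitcher, Majors", 32),
]


def extremes(players):
    """Map each group to (most ordinary, most extraordinary) names.

    Highest index = most ordinary, lowest = most extraordinary.
    Each side is a sorted list so that ties are all kept.
    """
    groups = defaultdict(list)
    for name, group, index in players:
        groups[group].append((name, index))

    result = {}
    for group, rows in groups.items():
        top = max(index for _, index in rows)
        bottom = min(index for _, index in rows)
        result[group] = (
            sorted(name for name, index in rows if index == top),
            sorted(name for name, index in rows if index == bottom),
        )
    return result


for group, (ordinary, extraordinary) in extremes(PLAYERS).items():
    print(f"{group}: ordinary={ordinary}, extraordinary={extraordinary}")
```

Returning lists rather than single names is the design choice that matters here: a plain `max()` would quietly pick one of the two tied relievers.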
—–
PECOTA, more than anything else, is a projection system, and like all the other projection systems, it’s popular for its statistical forecasts. But unlike all the other projection systems, PECOTA’s got more to it than a triple slash line. PECOTA’s an animal, and while you could simply strip the meat from its bones, doing so leaves behind plenty of other bits you could put to handy use.
Look at the player comps section. Savor the comps, and consider the Similarity Index (whatever you choose to call it). They’re included in part for your enjoyment. Enjoy them.
And thank you, BP, for kicking off this column.
Sorry, but us older folk who know that baseball didn't start in 1990 expect a bit more.
Thanks for reminding me of the maturity level of too many of the readers of this website. I am amused.
As Brian DewBerry-Jones pointed out, it seems odd that the current version of PECOTA seems to compare minor leaguers only to major leaguers, rather than to other minor leaguers. This, perhaps, is why so many current minor leaguers who are unlikely ever to rise as far as AA have as their top comps either Willie Mays or Mickey Mantle.
The comps for Carlos Santana are David Wright, Alvin Davis and Carl Yastrzemski. Players this year who have Gary Sheffield as a comp include Gordon Beckham, Bobby Abreu, Kosuke Fukudome, Shin-Soo Choo, Magglio Ordonez (who also has Stan Musial as a comp), Jacob Smolinski (who also has Ron Santo), and Carlos Beltran. Jay Austin's comps are Roberto Clemente, Robin Yount and Adrian Beltre.
Rich Poythress's top comps are Adrian Gonzalez, Prince Fielder, Kent Hrbek. Clint Robinson's are Adrian Gonzalez, Joey Votto, Ryan Garko. Gerald Sands' are Prince Fielder, Willie McCovey and Mark McGwire. Matt Carpenter's top comp is Chipper Jones. Trent Oeltjen's comps are Carlos Beltran, Vada Pinson, and Roberto Clemente. Thomas Field's top comp is Tim Raines.
As for body type, it's rare for a player's listed weight to be accurate, so that wouldn't be of any help. Having someone go through baseball history and label every single player as "thin," "stocky," etc. would be subjective, and just weird.
Anyone feel confident enough about the current version of PECOTA to take me up on either of those offers?
That doesn't mean PECOTA has no value. As with any system, its value depends on the user's expertise.
I'm not familiar enough with PECOTA to comment on what you perceive to be its limitations/shortcomings for players under 25, but I'm certain those who are can ably defend its methodology and conclusions.
Let's take a particular example: Matt Wieters's PECOTA projection from two years ago, which BP has admitted was based on faulty Davenport translations (as were all of PECOTA's minor league projections from the two minor leagues in which he played that year). How were those projections, based on faulty analysis, useful to anyone?
If, as I suspect, this year's PECOTA projections for younger players suffer from some similarly overarching flaw, how are those projections anything other than misleading?
Randall Butts, Charlie Smith, and Mark Merchant.
This year Adrian Gonzalez, Prince Fielder and Kent Hrbek.
In between he did go all .315/.381/.580 on the California League, but is that really that unusual for the Cal League? And it's not like he was young for the Cal League; he turned 23 in August.
There was very little data on him for 2009, and he was a 2nd round pick, but it feels like PECOTA is a touch excitable about guys in the minors.
(I'm writing up a blog post that explains a bit more about how we come up with the comps and the resulting age curves right now.)
It seems to me that where we're heading with Colin's increasingly transparent explanations of what PECOTA does and doesn't do is that a lot of its supposedly differentiating features, like the comps, simply don't matter very much at all.
If changing a player's comps from Butts, Smith, Merchant to Gonzalez, Fielder, Hrbek doesn't make much of a difference, then it's hard to buy the argument that the entire comps system makes much of a difference. As a result, what was once portrayed as a key feature could be revealed as nothing more than pretty bells and whistles.
We'll see though.
We have a 1B who, at age 22, played well in the Calif. League and is projected as a below-replacement-level guy if he played in the majors at age 23.
At age 22, Hrbek hit .301/.363/.485 IN THE MAJORS in over 500 AB's.
Presumably there is some growth built into Poythress's 2011 projection, which means if he is supposed to be below replacement level in 2011, he probably was in 2010 as well. So how on earth can Poythress, a below-replacement-level guy, have as one of his best comps Hrbek, who was an All-Star and 2nd in Rookie of the Year voting at the same age?
Fielder, by the way, hit .271/.347/.483 at age 22 in the majors. Gonzalez posted an .821 OPS in AAA at age 22. I don't think either would be classified as below replacement level.
Oh, one other thing: my understanding is that the comps only affect the arc of the player's career. Given that we only have the weighted means for 2011, it seems unlikely that the comps would be having a HUGE impact on the results. IF there is a problem, my guess is that it would relate to the way the minor league translations are being done. For example, if the minor league run environment didn't change but the ML one did (which it has in recent years), how would that affect translations of minor league stats?
That's not a statement that it is broken -- on the contrary, I would love for Colin to engage in this thread, or with jrmayne in the other thread, and explain why the hall-of-fame comps are correct, why Kila, Dan Johnson, and others are going to be huge this year, and why it's more likely that young stars like Longoria will regress with age rather than grow. For the same reason that it was useful to discuss the Wieters and McLouth predictions (as well as the Colby Lewis predictions), and what did and didn't make sense about them, it will be useful to engage in that conversation this time around -- as I am sure that Colin will do, once he's finished with the hard work he is putting in now to meet subscriber requests for player cards, PFM, etc.
If PECOTA's not broken, but has just reached a new level of insight, I'll be the first guy cheering as I ride to the league title with Dan Johnson at 1B and Kila hitting DH. I'd just like to understand if BP's really confident that those results are intentional, and not the result of something being broken in PECOTA.