May 31, 2009
Prospectus Idol Entry
You Can Beat PECOTA Without a Computer Model
"You... can be a millionaire... and never pay taxes! You can be a millionaire... and never pay taxes! You say... 'Steve... how can I be a millionaire... and never pay taxes?' First... get a million dollars."
--Steve Martin, Saturday Night Live, 1978
I can't tell you how to be a millionaire without paying taxes, but I can tell you how to beat PECOTA without a computer model. First, get the PECOTA projections. Here, I will explain how you can beat PECOTA once you do.
Many of the people in your fantasy league have heard of PECOTA and use its projections while drafting. Certainly many of the people who win their leagues do. So if almost everyone has a mini version of Nate Silver with them on draft day, how can you get an edge?
The key is to stay one step ahead by knowing PECOTA's strengths and weaknesses. But let's make one thing clear: PECOTA is very smart. It knows a lot of things that the naked eye does not. It goes through a hundred years' worth of baseball players, finds the ones most similar to the player it wants to project, and generates a projection. You and I don't have that kind of memory. However, we can incorporate some information that PECOTA can't, and that gives us an advantage over our competition.
WHAT PECOTA DOESN'T KNOW
A lot of my personal research has been on batted ball statistics, such as groundball, flyball, and line drive rates, as well as BABIP on each of these three types of batted balls. I have found that this information can be incredibly useful in projecting player performance, but here's something you probably don't know about PECOTA: as strong as it is, it does not use batted ball statistics. That's not an error, but a sacrifice that had to be made. There are no records of batted ball statistics for the 1950s or consistently measured statistics for the 1980s, nor are there records of BABIP on groundballs, flyballs, and line drives. That type of information is only available from 2003 on. To use it, PECOTA would have to give up choosing from hundreds of thousands of pre-2003 player-seasons to find comparables, and it would fail miserably.
However, you do have that information at your disposal, and you can use it to your advantage. If you find the more recent players in PECOTA's list of comparables for a particular player, you can check some of their basic statistics and compare them with the player in question. If you do this, you can improve on PECOTA, resulting in better player projections, and a better chance at winning your fantasy league.
In my recent research, I developed a quick and dirty method to project BABIP and projected 277 players for 2009. Of those with sufficient plate appearances, my quick and dirty method has a .45 correlation with BABIP so far this year, and PECOTA has a .42 correlation with BABIP this year. Though this is clearly too small a sample size to judge conclusively, it's worth noting that I have done as well on that component using a model that includes no comparables, doesn't adjust for age, doesn't adjust for position, doesn't adjust for handedness, and only uses data starting in 2003.
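For readers who want to run this kind of comparison themselves, here is a minimal sketch of the correlation calculation. It uses only the four hitters profiled later in this article, who were hand-picked as cases where PECOTA looks off, so it illustrates the mechanics and says nothing about the .45 and .42 figures above. The function and variable names are my own for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Order: Soto, Francoeur, Castillo, Young (BABIP figures quoted in this article).
pecota = [0.334, 0.307, 0.300, 0.323]
quick_and_dirty = [0.306, 0.286, 0.324, 0.342]
actual_so_far = [0.270, 0.271, 0.322, 0.366]

r_pecota = pearson(pecota, actual_so_far)
r_quick = pearson(quick_and_dirty, actual_so_far)
print(f"PECOTA: {r_pecota:.2f}, quick and dirty: {r_quick:.2f}")
```

With a full list of projected players in place of these four, the same function would give the kind of league-wide comparison described above.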
This isn't to say you should use my model in place of PECOTA for BABIP. Rather, what you should do is use a hybrid of the two by looking at batted ball statistics to see where PECOTA is being tricked and can be adjusted.
Geovany Soto and BABIP on line drives, groundballs, and flyballs:
PECOTA's projected BABIP for Soto: .334
My projected BABIP for Soto: .306
Current 2009 BABIP for Soto: .270
On the heels of Soto's .337 BABIP in 2008, PECOTA projected him to nearly repeat that number this year. However, what PECOTA doesn't know is that last year Soto's BABIP on line drives was .805, which is 87 points above the 2008 league-wide average. You might expect something like that out of a monster power hitter, but not Soto. In fact, line drive BABIP has far lower year-to-year correlation than groundball BABIP and flyball BABIP (.11 for LD-BABIP, .32 for GB-BABIP, and .32 for FB-BABIP), so we should expect Soto's inflated BABIP on line drives to regress to the mean. If Soto had a league average LD-BABIP, he would have had a BABIP of .313. A comparison with his most recent top comparable, Chris Shelton, is instructive. Shelton has a career .334 BABIP, but that is due to a high line drive rate of 23.1% (Soto's is 20.5%). Shelton did not have the high BABIP on line drives that Soto had last year (Shelton's was .746 for the comparable year, and Soto's was .805). PECOTA doesn't know Soto and Shelton differ in this way, which explains why it thought Soto's BABIP luck would stick when it was much more likely to disappear this year.
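To put a rough number on "regress to the mean," here is a minimal sketch that shrinks an observed rate toward the league average, using the year-to-year correlation as the shrinkage factor. That shortcut is a common simplification and an assumption on my part, not the method behind the projections in this article. The league LD-BABIP of .718 is backed out from the figures above (.805 minus .087).

```python
def regress_to_mean(observed, league_mean, year_to_year_r):
    """Shrink an observed rate toward the league mean.

    Assumes next year's expected deviation from the mean is the
    year-to-year correlation times this year's deviation.
    """
    return league_mean + year_to_year_r * (observed - league_mean)

soto_ld_babip = 0.805                     # Soto's 2008 BABIP on line drives
league_ld_babip = soto_ld_babip - 0.087   # implied 2008 league average
ld_babip_r = 0.11                         # year-to-year correlation for LD-BABIP

expected_ld_babip = regress_to_mean(soto_ld_babip, league_ld_babip, ld_babip_r)
print(f"Expected LD-BABIP next year: {expected_ld_babip:.3f}")
```

Under that assumption, almost all of Soto's 87-point edge on line drives would be expected to vanish, which is why his overall BABIP projection deserves a haircut.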
Jeff Francoeur and infield fly rate:
PECOTA's projected BABIP for Francoeur: .307
My projected BABIP for Francoeur: .286
Current 2009 BABIP for Francoeur: .271
Francoeur's most recent comparables (the ones for whom there are infield fly data) are Paul Konerko and Torii Hunter. They have 11.9% and 13.9% career infield fly rates, respectively. Francoeur's is 15.4%, which is far higher than the MLB average of about 10% and higher than both Konerko's and Hunter's. PECOTA is right to compare him to players with above average infield fly rates, but even those players' rates are not as high as Francoeur's. It is extremely difficult to have a .307 BABIP when you hit so many infield flies, since they are so easy to catch. This means that you can scale back his projected average and his runs and RBIs as well.
Luis Castillo and groundball rate:
PECOTA's projected BABIP for Castillo: .300
My projected BABIP for Castillo: .324
Current 2009 BABIP for Castillo: .322
Castillo is an excellent example of the importance of groundball rate. For his career, Castillo's groundball rate is 64%. That's probably about 22 percentage points more grounders and 22 points fewer flyballs than the average player. Statistically, groundballs go for hits about 10 percentage points more often than flyballs. This translates to roughly 2.2% more hits per ball in play, or .022 of BABIP. Castillo's top two comparables are Mark McLemore and Tom Herr, who had 47% and 49% career groundball rates, respectively, well short of Castillo's 64%. A higher groundball rate than his comparables means Castillo should outperform his PECOTA BABIP projection this year.
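The arithmetic behind that .022 can be written out as a short sketch. It assumes every extra groundball replaces a flyball and that the hit-rate gap between the two is a flat 10 percentage points, which are the rough figures used above. The variable names are mine.

```python
# Rough figures from the paragraph above, expressed as fractions.
extra_groundball_share = 0.22   # Castillo's extra grounders vs. the average player
hit_rate_gap = 0.10             # how much more often grounders become hits than flyballs

babip_boost = extra_groundball_share * hit_rate_gap
print(f"BABIP boost vs. average hitter: {babip_boost:.3f}")

# The same arithmetic against his comparables' career groundball rates.
castillo_gb = 0.64
comparables_gb = {"Mark McLemore": 0.47, "Tom Herr": 0.49}
for name, gb_rate in comparables_gb.items():
    gap = (castillo_gb - gb_rate) * hit_rate_gap
    print(f"BABIP edge over {name}: {gap:.3f}")
```

Even against his own comparables, the same back-of-the-envelope math suggests an edge of roughly 15 to 17 points of BABIP.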
Michael Young and line drive rate:
PECOTA's projected BABIP for Young: .323
My projected BABIP for Young: .342
Current 2009 BABIP for Young: .366
Michael Young has a career line drive rate of 25.1% (MLB average is 20%). Looking at his comparables, we see that they do not have similar line drive rates. Jeff Cirillo's is 19.3%, Edgar Renteria's is 22.7%, and Mark Grudzielanek's is 23.6%. Having 5 percentage points more line drives than the average hitter, holding everything else constant, is going to lead to a .030 higher BABIP. Line drive rate has less persistence than most people think: its year-to-year correlation is only about .17, while groundball/flyball ratio has a correlation of .77. However, if certain players are line drive rate standouts every year, then unless their comparables are too, PECOTA will underestimate them, as it has with Young.
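As a rough sketch of what those gaps are worth, the rule of thumb above (5 points of line drive rate for .030 of BABIP) can be applied to each comparable. Treating the relationship as a straight line at every gap size is my simplifying assumption, and the variable names are hypothetical.

```python
# Slope implied by the rule of thumb: .030 of BABIP per 5 points of line drive rate.
babip_per_ld_share = 0.030 / 0.05

young_ld = 0.251  # Michael Young's career line drive rate
comparables_ld = {
    "Jeff Cirillo": 0.193,
    "Edgar Renteria": 0.227,
    "Mark Grudzielanek": 0.236,
}

# BABIP edge Young's line drive rate implies over each comparable.
babip_edges = {
    name: (young_ld - ld_rate) * babip_per_ld_share
    for name, ld_rate in comparables_ld.items()
}
for name, edge in babip_edges.items():
    print(f"{name}: {edge:+.3f}")
```

The edge is largest against Cirillo and smallest against Grudzielanek, but it points the same way for all three.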
PECOTA projections are some of the best tools fantasy baseball players can use. But you can combine PECOTA and a little bit of extra knowledge to beat competitors that rely on PECOTA alone. The key is to know how the system is created and how its deficiencies can be exploited.
For hitters, there are a few key things to look for that can help you identify overestimated and underestimated projections.
Check baseball-reference.com to see whether a young player's BABIP was high due to high BABIP on line drives or high BABIP on groundballs or flyballs. If a player's BABIP on line drives was significantly different from .720, then he may be due for a regression to the mean, and PECOTA may not know this.
PECOTA doesn't know a player's infield fly rate. Go to fangraphs.com and check it out. Average infield fly rate is about 10%. If his recent comparables have infield fly rates very different from his, that's a sign his projection might be off.
PECOTA doesn't know a player's groundball rate. Check whether there is a major difference between his groundball rate and those of his comparables using fangraphs.com (or calculate it from baseball-reference.com).
PECOTA doesn't know a player's line drive rate. If a player consistently puts up high line drive rates and his comparables don't, then PECOTA will probably underestimate him.
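If you track these numbers in a spreadsheet or script, the checklist can be sketched as a simple comparison of a player against the average of his recent comparables. The function name, the stat keys, and the 2-point threshold are all hypothetical choices of mine, and you would need to enter the rates by hand from the sites above.

```python
def flag_gaps(player, comparables, threshold=2.0):
    """Return stats where the player differs from his comparables' average.

    Rates are in percentage points; `threshold` is an arbitrary cutoff
    for what counts as a gap worth a second look.
    """
    flags = {}
    for stat, value in player.items():
        comp_values = [c[stat] for c in comparables if stat in c]
        if not comp_values:
            continue  # no batted ball data for any comparable
        gap = value - sum(comp_values) / len(comp_values)
        if abs(gap) >= threshold:
            flags[stat] = round(gap, 1)
    return flags

# Francoeur's infield fly rate vs. Konerko and Hunter (figures from this article).
francoeur = {"iffb_pct": 15.4}
francoeur_comps = [{"iffb_pct": 11.9}, {"iffb_pct": 13.9}]
print(flag_gaps(francoeur, francoeur_comps))

# Castillo's groundball rate vs. McLemore and Herr.
castillo = {"gb_pct": 64.0}
castillo_comps = [{"gb_pct": 47.0}, {"gb_pct": 49.0}]
print(flag_gaps(castillo, castillo_comps))
```

A positive gap on infield flies argues for lowering the BABIP projection, while a positive gap on groundball or line drive rate argues for raising it.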
The Steve Martin quote at the beginning says you can be a millionaire and never pay taxes, but the catch is that you first need a million dollars. PECOTA is your million dollars in this case, and you already have it. But PECOTA has taxes of its own: shortcomings due to limited information. With the tips here, you won't need to pay them.
Matt Swartz is an author of Baseball Prospectus.