September 27, 2010

Reintroducing PECOTA

What A Long, Strange Trip It Has Been

by Dave Pease

Welcome to PECOTA week here at Baseball Prospectus. All week, we'll be running content on the state of our projection system, covering where we're at and where we're going. To kick things off, let's pull back the curtain and have a look at the history of PECOTA production, which should answer a lot of questions readers have asked.

The original PECOTA process, as long-time readers know, was designed by Nate Silver and first offered at Baseball Prospectus when we went to a subscription model in 2003. The basis of the system was groundbreaking for the time: similarity modeling of all possible comparables in baseball history to determine likely future performance, separated into percentile bands. We heard from people who just loved that they could take guys who "they had a feeling about" and kick them up or down the percentile ranges a little, while taking the weighted mean for players they didn't have a feel for. We had some neat graphs on the PECOTA cards which helped break out of the sea-of-tables presentation that was common for player cards of the day. BP staffers and subscribers alike had fun using PECOTA projections to do mean things to their fantasy leagues and predicting, quite seriously, that Nate had a future in PR or politics.
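
For readers who want a concrete picture of "comparables in, percentile bands out," here is a minimal sketch in Python. It is not Nate's method, which lived in Stata and Excel as described in this article. The `Comparable` class, the `similarity` weights, and the sample numbers are all hypothetical, and the only idea it borrows is that more-similar historical players count for more when ranking possible outcomes.

```python
from dataclasses import dataclass


@dataclass
class Comparable:
    name: str            # hypothetical label for a historical player
    similarity: float    # assumed similarity score to the target player, > 0
    next_season: float   # what that comparable did in his following season


def weighted_percentile(comps, pct):
    """Outcome at percentile `pct` (0-100), weighting each comp by similarity."""
    ordered = sorted(comps, key=lambda c: c.next_season)
    total = sum(c.similarity for c in ordered)
    threshold = total * pct / 100.0
    running = 0.0
    for comp in ordered:
        running += comp.similarity
        if running >= threshold:
            return comp.next_season
    return ordered[-1].next_season


def weighted_mean(comps):
    """Similarity-weighted average of the comparables' outcomes."""
    total = sum(c.similarity for c in comps)
    return sum(c.similarity * c.next_season for c in comps) / total


def percentile_bands(comps, bands=(10, 25, 50, 75, 90)):
    """Map each requested percentile to its similarity-weighted outcome."""
    return {pct: weighted_percentile(comps, pct) for pct in bands}


if __name__ == "__main__":
    # Made-up comparables, purely for illustration.
    comps = [
        Comparable("Comp A", 1.0, 0.250),
        Comparable("Comp B", 1.0, 0.260),
        Comparable("Comp C", 1.0, 0.270),
        Comparable("Comp D", 1.0, 0.280),
        Comparable("Comp E", 1.0, 0.290),
    ]
    print(percentile_bands(comps))
    print(weighted_mean(comps))
```

A reader with "a feeling" about a player would take a band above or below the 50th percentile; everyone else would take the weighted mean.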

Behind the scenes, the PECOTA process has always been like Von Hayes: large, complex, and full of creaky interactions and pinch points. It started with Clay Davenport and Keith Woolner delivering Nate the data he needed, which took some time to compile after every season. Once Nate got the data, he picked up a case of Red Bull. Then he took that data and built constants and preprocessed the dataset with STATA, which took days for each iteration. He took that output and loaded the results into the heart of PECOTA—a Rube Goldberg contraption of an Excel spreadsheet. He'd finish making his changes to the PECOTA methodology for the year to the spreadsheet, kick off the macros, and generate the output on a player-by-player basis on his laptop. Just the Excel portion of the processing took days, barring a memory leak or computer crash that might necessitate starting the entire Excel process over. The output would be a monster CSV and a bunch of image files that I'd run a Perl script on to build the PECOTA cards. The output seemed to change in small ways every year, so the script had to change to accommodate those. For my part, I wrote the card-building script in late 2002, and just thinking of looking at the code today makes me cringe.
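
The last step of that chain, turning a big CSV into cards, is the part that kept breaking when the output changed shape from year to year. The original script was Perl and is not reproduced here; what follows is only a rough Python sketch of one way to make that step less brittle, by reading columns by name rather than position. The column names `player` and `weighted_mean` are assumptions for illustration, not the real file layout.

```python
import csv
import io

# Hypothetical column names; the real PECOTA output layout is not documented here.
REQUIRED = ("player", "weighted_mean")


def build_cards(csv_text):
    """Turn projection rows into plain-text cards, looking columns up by name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = [col for col in REQUIRED if col not in fieldnames]
    if missing:
        # Fail loudly up front instead of producing half-built cards.
        raise ValueError(f"projection file is missing columns: {missing}")
    cards = []
    for row in reader:
        lines = [
            f"Player: {row['player']}",
            f"Weighted mean: {row['weighted_mean']}",
        ]
        # Extra columns are printed as-is, so a field added one year
        # appears on the card without a code change.
        for col in fieldnames:
            if col not in REQUIRED:
                lines.append(f"{col}: {row[col]}")
        cards.append("\n".join(lines))
    return cards


if __name__ == "__main__":
    sample = "pct_90,player,weighted_mean\n0.310,Example Player,0.280\n"
    for card in build_cards(sample):
        print(card)
```

Reordered or added columns pass through untouched, and a missing required column stops the run immediately rather than days into a batch.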

Every year, there would be errors and omissions, which really isn't surprising for a system of this level of complexity. With the system constructed as it was, though, we were especially ill-prepared to fix them because of the turnaround time, and because Nate couldn't use his computer while it was steaming through its Excel gyrations. Nate didn't own another computer, and he was writing for and managing the operations of Baseball Prospectus during most of the PECOTA generation time, so even if he didn't have any other interests or hobbies online, this was a problem. One obvious avenue of relief would have been to put the process on dedicated, non-laptop-form-factor hardware, but there are people in this world who think nothing of configuring, maintaining, and using multiple computers in their homes, and Nate Silver, who has often led off discussions about the PECOTA process with "now, I'm not a programmer," is not one of these people.

The number crunching for PECOTA ended up taking weeks upon weeks every year, making for a frustrating delay for both authors of the Baseball Prospectus annual and fantasy baseball players nationwide. Bottlenecks where an individual was working furiously on one part of the process while everyone else was stuck waiting for them were not uncommon. To make matters worse, we were dealing with multiple sets of numbers. The 'official' Baseball Prospectus statistics lived on our database server by the middle of the decade, in permutations and schema originally designed by Keith Woolner. The Davenport Translations and many of the eventual inputs to PECOTA came from Clay Davenport, who has his own statistics, processes, and player identification scheme. Like a Bizarro world subway system where texting while drunk is mandatory for on-duty drivers, there were many possible points of derailment, and diagnosing problems across a set of busy people in different time zones often took longer than it should have. But we plowed along with the system with few changes despite its obvious drawbacks; Nate knew the ins and outs of it, in the end it produced results, and rebuilding the thing sensibly would be a huge undertaking. We knew that we weren't adequately prepared in the event that Nate got hit by a bus, but such is the plight of the small partnership.

Nate didn't get hit by a bus, but he did get crazy famous—you might have heard—and that was close to the same thing as far as a predictable and orderly PECOTA generation process went.

The 2009 season was a tough year for us PECOTA-wise. There was the infamous Matt Wieters projection, but we’ll have more on that later in the week. From a process standpoint, we continued to use the original spreadsheet, but it took even longer than usual to get the projections run considering Nate's other obligations. We got the code running on a dedicated machine, but the lack of organizational expertise in the PECOTA generation process gave away the processing time advantage and then some. We ended up giving Fantasy subscribers a free upgrade to Premium for the delay.

As the season progressed, we had some of our top men—not in the Raiders of the Lost Ark meaning of the term—look at the spreadsheet to see how we could wring the intellectual property out of it and chuck what was left. But in addition to the copious lack of documentation, the measurables from the latest version of the spreadsheet I've got include nice round numbers like 26 worksheets, 532 variables, and a 103MB file size. The file takes two and a half minutes to open on this computer, a fairly modern laptop. The file takes 30 seconds to close on this computer. There's some color coding, and a few notes, but you're not going to sit down with a nice cup of tea and pick this thing up in an afternoon. More than one of the big brains on the team threw up their hands while saying uncle. Finally, Clay Davenport stepped up and, essentially by himself, produced PECOTAs based on the logic from the original spreadsheet, and Baseball Prospectus 2010 was saved. We thought we’d reached the promised land.

Then January and February rolled around, and we still didn't have PECOTA cards. The complexity of generating the multi-year projections and producing the expanded output of the player cards, versus just the book projections, was proving to be a much more difficult problem to solve, and the well-documented issues we were having re-rolling the depth charts processes from scratch were just screwing things up further. Clay works in Fortran, and those on staff with Fortran experience didn't want to admit how long ago we last used it because we've gotten self-conscious about sounding old, so collaborative problem-solving wasn’t going to happen. Clay was still working with his own data on his own systems, and the linkages between our database server and the PECOTA data were as shifty and error-prone as ever. Worst of all, even if everything was working tip-top, all we'd done is switch victims in the Murphy's Law BP-staffer-getting-hit-by-a-bus scenario.

We eventually produced a release of our standard Fantasy package, and there were some tantalizing big-picture advantages to the new PECOTAs versus the old—the more automated process and better integration with Clay's raw stats meant we could run PECOTA projections and cards for over twice as many players as we did with the Excel process, for example. Still, it was late, there was understandable uncertainty about the product, and we ended up giving Fantasy subscribers the free Premium upgrade and extended Premium subscribers by a month for the trouble... and it was such a hectic time I don't think we actually announced this. Enjoy the free baseball coverage, folks; when we screw up, we try to make things right, and the suits can't stop us from doing it because we are the suits.

We’ve continued to push out PECOTA updates throughout the 2010 season, but we haven’t been happy with their presentation or documentation, and it’s become clear to everyone that it’s time to fix the problem once and for all. The year 2003 seems like an eternity ago; we’ve undergone a huge amount of change since then, and so has the competitive marketplace for baseball analysis. We want PECOTA to be hands-down the best baseball performance projection system in the world, and over the next few days we’re going to break down what we’re going to do, and what we’ve already done, to get there. Stay tuned.

Dave Pease is an author of Baseball Prospectus. 

42 comments have been left for this article.


PECOTA week! Awesome. Let me take this opportunity to request a 2011 feature where PECOTA projections of prospects are analyzed, critiqued, and compared with Kevin Goldstein's projections. Pretty please?

Sep 27, 2010 06:16 AM
rating: 8
Patrick M

A couple of years back, when Nate described his process for cranking out the PECOTAs, I offered my assistance in trying to optimize the macros in return for a free subscription. Nate seemed interested, and said he'd get back to me, but he never did.

Based on Dave's recounting above, I am really, really glad Nate never got back in touch :)

Sep 27, 2010 06:18 AM
rating: 7

Thanks for the history. Puts things into perspective. That said, if anyone ran their baseball teams the way you guys ran this process you guys would have been all over them calling for their heads!

Sep 27, 2010 06:47 AM
rating: 3

Of course, if BP had the resources of a baseball team, it would be quite a different organization to begin with. Above all though, particularly with this community and online technology, transparency is a powerful tool in the customer service game.

Sep 27, 2010 09:49 AM
rating: 1

As a guy who built a fantasy draft day database completely around the PECOTA spreadsheets, I would really like to continue using PECOTA instead of having to start over with something else.

Progress is good, thanks for keeping us in the loop.

Sep 27, 2010 08:32 AM
rating: 1
Shaun P.

Fortran? Really?


My respect and admiration for Clay Davenport just doubled.

And I say that with all the respect of someone who came thisclose to taking a class on Fortran, before at the last minute, my college decided that it really was better to teach future engineers and computer scientists C/C++.

Sep 27, 2010 09:21 AM
rating: 2

Now, I'm not a programmer, but I would like to know how Nate McLouth's projections could get so wrong.

Sep 27, 2010 09:26 AM
rating: -2
BP staff member Colin Wyers
BP staff

Well, that one's pretty easy. Looking at TAv:

2007: 0.292
2008: 0.298
2009: 0.285
PECOTA Weighted Mean: 0.295

PECOTA was basically saying that a guy at age 28 (where the age curve is typically pretty flat) is going to do about what he's done the past three seasons. I mean, that ain't what I'd call sticking your neck out. And any other projection system I've seen said roughly the same thing.

The only trouble is Nate McLouth didn't cooperate. And it may be a while before we know if it was just him having one bad season or if he's the newest Marcus Giles.

Sep 27, 2010 09:53 AM

He definitely didn't cooperate. I'm just glad I dropped him before the Braves did.

Sep 27, 2010 10:13 AM
rating: -2

That concussion didn't help matters either. Let's hope for an Aaron Hill style resurgence next year.

Sep 27, 2010 13:28 PM
rating: 0

Shandler cautioned against McLouth, but I ignored him to my peril. To be more specific, I understand how he failed to fall in the weighted mean projection, but how many others fell outside their outlier 10 or 90% projections? Anything to be gleaned from that?

Sep 28, 2010 11:07 AM
rating: 0

I won't be satisfied unless it can make my toast in the morning.

Sep 27, 2010 09:27 AM
rating: 6

and what about the pony? I thought we were all getting PECOTA ponies? :-)

Sep 27, 2010 10:22 AM
rating: 2

Just slap a digital clock on it and call it the "New and Improved" PECOTA.

Sep 27, 2010 10:50 AM
rating: 0

As my co-worker often points out, it's impossible to be both new and improved.

Sep 27, 2010 12:34 PM
rating: 4

As one of the pitchfork-and-torch wielders from March Madness this year, I'm glad to see the effort's being made. The proof, as ever, will be in the results (I don't subscribe to Shandler's accuracy-is-irrelevant theory).

I look forward to the remaining articles. As ever, the proof will be in the results - I think PECOTA, once in front, now has a little catching up to do, and I think CHONE is going to try to stay ahead - but this should be fun.


Sep 27, 2010 11:29 AM
rating: 5

Perhaps we could just re-name it Bloomquist after everyone's favorite replacement player :)

Sep 27, 2010 11:33 AM
rating: 2

Just not Miles, please ... ANYTHING but Miles ...

Sep 27, 2010 12:22 PM
rating: 2

Transparency in any sort of commercial venture is so completely rare, that I just want to thank you all for your honesty. I’ve had a hard time figuring out what separates BP from some other sites (aside from KG and the great analysis) but the level of honesty with your readers might just be that separating factor.

Sep 27, 2010 11:37 AM
rating: 11

I've used PECOTA since 2007 in my drafts. the debacle in 2009 was so bad that I ignored PECOTA completely in 2010 in favor of other systems which I could trust a whole lot more. I will again ignore PECOTA in 2011, see how it does, and potentially use it in 2012.

I mean, I'm glad you guys are taking the time to really fix this thing, but trust is a hard thing to earn back, and perhaps even more importantly the CHONE's and ZIP's of the world have become solid and free contenders

Sep 27, 2010 11:47 AM
rating: 4

Thanks for the extra month on my premium subscription. Thanks even more for the candor.

I do have a quibble though. Late PECOTA projections last spring were a major problem. However, an even bigger problem was poor communications with subscribers about (i) when various PECOTA related projects were going to be released or fixed and (ii) what the quality problems were. Just admitting that you should have put the word "BETA" on the projections for a couple months doesn't begin to scratch the surface of the problems. You may consider this ancient history, but I read today's article and I question whether the miscommunication and noncommunication lessons were learned.

Sep 27, 2010 12:02 PM
rating: 6
BP staff member Dave Pease
BP staff

I didn't mean to give the communication problems we had this year the short shrift. Colin will be enumerating specific changes to our production process going forward this week which I believe will resolve your concerns, but please do let us know after you see what he has to say.

Sep 27, 2010 22:42 PM

Yes, my fantasy teams have suffered enough over the past 2 years. I think I'll give CHONE a chance next year.

Sep 27, 2010 12:31 PM
rating: 3
Morris Greenberg

You can't solely rely on PECOTA. Like any projection, there will always be error. Marc Normandin doesn't only do his fantasy projections based off of PECOTA. If he finds a player whose projections seem quite high, he may rank the player higher than he would have before he saw the projections, but he won't automatically put him somewhere based on his numbers. There is no point to playing fantasy baseball if the owner doesn't put any input into his draft.

Sep 27, 2010 13:08 PM
rating: 2

...and there is no point in using a projection system as an input unless it is expected to be one of the very best available. A state-of-the-art algorithm from five years ago, awkwardly dragged through several obsolete/inappropriate technologies while putting little effort in actually improving the algorithm, is not necessarily one of the best available inputs in 2011.

That said, I look forward to future installments of this series to see if sincere and thorough efforts to improve the system are in process.

Sep 27, 2010 15:50 PM
rating: 0
BP staff member Colin Wyers
BP staff

Tomorrow, you're going to see a number of accuracy tests of the 2010 PECOTAs, compared to some of the other forecasting systems available. I'll let those results speak for themselves.

What I don't believe in is the Ron Shandler attitude that projection accuracy doesn't matter. As long as I'm the guy running the PECOTAs (and right now, I am that guy) we're going to be forthright about what our performance has been and diligent in making sure that we're doing the best we can going forward.

Sep 27, 2010 16:14 PM

Shandler's position seems more subtle than that. He says for fantasy purposes exact ordinal listing of each player is not necessary as injuries/luck/playing time fudge the margins such that baskets of approximately similar players is enough---the talent is in bidding strategy, draft strategy, guesses on roles, etc.

Sep 28, 2010 15:46 PM
rating: 1

Shandler makes an excellent point with his article.

As I've shown, there's a half-dozen legitimate ways to test for accuracy, all depending on exactly what it is that you want. In one test, I had Marcel as #1 in a group of 22 forecasting systems. In another, it was middle of the pack.

My preferred method is to run all half-dozen ways, report the results, and let the reader choose which way most closely aligns with his needs.

Sep 28, 2010 19:55 PM
rating: 2

... or HER needs. :-)

Sep 28, 2010 20:29 PM
rating: 0

I've also suffered the last two years. Some of which is bad luck and some I have a theory on. My opponents are more equipped. The more internet-based projection systems out there the more fantasy players will use them to build their rosters. It's not the edge it once was and it probably has little to do with the accuracy of the systems. Using a projection system has become the baseline from which players develop their draft lists. You need to go above and beyond to get better.

Sep 27, 2010 13:31 PM
rating: 4

Agree. I developed a calculation a few years ago for my standard 5x5 league that uses the stats from PECOTA to give me a single 5x5 score for all players. Until this year, I have finished in the money every year of my league, but I don't blame PECOTA. I try to build my staff around Ks which usually bodes well for me. This year, while my Ks were high, ERA and WHIP were in the basement and never recovered.

Sep 27, 2010 13:53 PM
rating: 0

I was brought to BP through PECOTA, via the fantasy baseball side of the house. Through a couple of years worth of data and working with the numbers provided here I have come to a few conclusions.

First off, anyone who has abandoned PECOTA, I think you have put too much faith in the system in the first place. If this was an exact science, every member of the staff would have retired by now.

Finally, from all of my preparation, I don't think any one person or site has nailed fantasy coverage or projections. Ultimately, I just don't see how people can put blind faith into anything, be it PECOTA, CHONE, ZIP, or any other flavor of the month.

Sep 27, 2010 13:06 PM
rating: 1

correct, I don't put blind faith in anything.

however, PECOTA had enough errors that I ignored it completely

probably even more important than PECOTA though is the $ valuation engine on this site. I have begun to greatly favor an alternative source which makes a lot more intuitive sense, and gives 1 truer number. the $ engine here has all kinds of inputs that don't make any sense - at a very base level I think they are going about fantasy valuation incorrectly. you shouldn't have to pick "moderate" or "aggressive" or decide what kind of positional adjustment you want. these aren't questions an end user should have to answer, that's approaching the problem the wrong way. standard scores based off projections and projected playing time are all an end user should want or need for a basic $ valuation

Sep 27, 2010 14:02 PM
rating: 0

I do not think proper resources are being directed to developing a valuation engine (it is not a trivial process), nor to updating playing time projections as accurately as possible. There is no quantitative way to project playing time; it requires significant expertise and time. I would think this expertise exists somewhere on the staff, but not convinced it is being applied fully at this time.

Sep 27, 2010 16:00 PM
rating: 2

Guys, I can see both of your points on that front. I do not use dollar amounts. Based on playing the system to come up with the calculations I need for my drafts, I could definitely see that as being an issue with the system.

Another item, and I know this will fall on deaf ears, because no one ever really responds. A forum where people who use the software could talk would do wonders for both the system, and the customers. It might put PECOTA into the proactive category as opposed to the reactive category. I understand it's complex, I read the article.

Couple of PECOTA experiences.....

While I know it's not perfect, I nailed it this year. First off, pitchers have been stellar for my entire duration of using the software. I never once had an issue and we run 12 teams, 10 starters per. Some teams slightly more, some less.

Bats, I have had issues with this before. I flat out ignore any PECOTA SB projection. Way over valued in my opinion. I just think there is way too much weight put on SB, and especially as categories increase. The more you have, the more they should dilute, and they do not. But again, I worked around that and just ignore them. I learned my lesson last year on that one.

Other than that, you are going to have injuries and bad luck in baseball. For the most part I think the projections were decent. The system allows you to identify value through the later rounds, which is generally what you need to win leagues.

Sep 28, 2010 06:36 AM
rating: 0

Have to totally disagree with this comment, as I found both dollar projection systems to be useful. It helps you identify, for one, what type of draft you're in. The idea isn't to end up with the best bargains for $200, the idea is to end up with the best team for $260. If you don't know what sort of draft you're in it's entirely possible to be left holding the bag with $30 at the end of the draft, have the best "value" players, and yet lose to someone else who got more total value out of his $260.

Sep 28, 2010 12:51 PM
rating: 0
Luke in MN

Humility, transparency, intellectual and regular-type honesty. These are things I like to see from BP. I think a lot of us have seen a gap between what BP has been recently and what an industry leader in this department should be. I'm looking forward to seeing that gap shrinking.

Sep 27, 2010 16:19 PM
rating: 1

I think a big helping of humility is essential in reference to any system of projecting human affairs. It's simply not possible to achieve more than slightly better accuracy than educated guessing. As long as we all -- BP staff particularly -- treat PECOTA as a suggestion rather than as destiny, we'll all be a lot saner.

This goes double for Jay's playoff odds.

Sep 27, 2010 19:16 PM
rating: 3

So why didn't you use some of these 40 dollar subscriptions to buy Nate a second computer?

Sep 27, 2010 17:09 PM
rating: 3
BP staff member Dave Pease
BP staff

In addition to what was said in the article itself, I would put this thought in the "If I only knew then what I know now" file.

Sep 27, 2010 22:38 PM

It would be interesting to know how you scope out the production process now. And how much time you envision for each.

Data collection.

DT's of current player stats; historical player stats.

Generation of baseline PECOTA numbers for the book. Does this still require STATA estimation, or is this integrated into a single large program, e.g., using R?

Generation of depth-chart adjusted PECOTA numbers for spreadsheet. Requires depth charts, of course, which will be done when?

Generation of charts, etc., for PECOTA cards.

Question 1: In one communication last year, Dave referred to it taking days for Clay to process a full run -- but was that due to slow processor time or low capacity? Or was he having to take out his wrenches and fix the data during that stage?

Question 2: What's your projected timetable for 2011 season data releases for internal use (BP2011 authors) or for us consumers of the 2011 PECOTAs?

Sep 28, 2010 10:30 AM
rating: 0

Thanks, Dave. I appreciate the integrity it takes to put all of it out there this week. As you all know, PECOTA is one of the legs upon which BP stands. It's nice to see that you're not going to let it atrophy.

I also appreciate the effort from this season to identify the issues that PECOTA was experiencing, and even for putting together your beta-test team from subscriber-volunteers. Seems so long ago already, hard to believe it was just a few months ago.

FWIW, I was suckered into several of the notorious fantasy busts this year. Fielder in the first round, and other later round picks like Beckett, Nolasco, McLouth, Figgins, Lopez, and Iannetta. And yet, I'm still going to finish in the money in my league.

I'm very much looking forward to reading what the BP team has planned for BP!

Sep 28, 2010 21:03 PM
rating: 0