Changes to the art of projecting player performance are afoot, so you'll want to talk to Colin Wyers about the shape of things to come.
Colin Wyers: There was a man with seven wives. Each wife had seven sacks, each sack had seven cats, each cat had seven kittens. Kittens, cats, sacks, wives - how many were going to Saint Ives? My phone number is 5-5-5 and the answer.
JRMayne (Above ground): Great series!
Any thoughts as to using velocity or other stats for pitching projections? It seems that's a mostly untapped resource.
Colin Wyers: I don't know if velocity tells us much of anything - there are a lot of crafty guys who are very successful and some fireballers who aren't. We have considered using a pitcher's repertoire - what pitches he throws, essentially - as an input to the PECOTA projections. Doing it in a way that I feel would add significant value would be rather resource intensive. That isn't to say we won't do it; there are just some other things that are a higher priority, and we are working against a deadline. So we'll see.
Mountainhawk (Salem, MA): Based on today's PECOTA article, it really looks like PECOTA is dramatically underestimating the variability of the results around the estimate. If this is all playing time based, you should be able to select a subsample of players with enough ABs to get something looking uniform, but based on the discrepancies we see in the graphs, that seems unlikely.
Any thoughts on what could be missing in the variability estimates, and what you need to do to fix it?
Colin Wyers: I go into this in the article, I think, but the short version is you have player projections with more uncertainty than the comps' projections, and so you underestimate the spread. That's something that will be fixed before this year's (next year's? The 2011 PECOTA forecasts) cards are published.
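A minimal sketch of why that makes the percentiles too tight, assuming projection errors are normal and that the comps' spread and the player's own projection uncertainty are independent (so they add in quadrature). The SD values are hypothetical, not PECOTA's: a band built only from the comps' spread catches noticeably fewer outcomes than it advertises.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def actual_coverage(nominal: float, sd_used: float, sd_true: float) -> float:
    """Share of outcomes inside a central band built for `nominal` coverage
    from sd_used, when outcomes really scatter with sd_true (normal errors assumed)."""
    # z for the band's edge, e.g. about 1.28 for a nominal 80% (10th-90th percentile) band
    z = STD_NORMAL.inv_cdf(0.5 + nominal / 2)
    half_width = z * sd_used
    # Probability that a draw with spread sd_true lands within +/- half_width
    return 2 * STD_NORMAL.cdf(half_width / sd_true) - 1


# Hypothetical spreads for some rate stat
sd_comps = 0.020          # spread implied by the comparable players alone
sd_player_extra = 0.015   # extra uncertainty in the player's own baseline projection
sd_true = math.hypot(sd_comps, sd_player_extra)  # independent errors add in quadrature

print(f"{actual_coverage(0.80, sd_comps, sd_true):.3f}")  # well short of 0.80
```

Building the band from sd_true instead restores the advertised 80%.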
Matt (Chicago): Given the state of their team, are the Cubs better off going after Dunn (OBP and major power) or trolling the trade waters for another team's stalled prospect at that spot?
Colin Wyers: I've made no secret that I really like Adam Dunn (and the Cubs), so I'm probably not the most unbiased person here.
Matt (Chicago): Please give me some hope for next year's Cubs rotation. I see some back-end guys (Gorz, Wells, Silva), a big ? (Z), and a solid #2-3 (Dempster). Shouldn't they stretch out Cashner?
Colin Wyers: The Cubs have plenty of starting pitching depth (enough to where they could stop trying to make Jeff Samardzija a starting pitcher, if nothing else for my health and well-being). They have some guys with front-end talent. It's not the biggest hole on a team riddled with holes.
Randy K (Kansas): My unsubstantiated observation is that free-swinging young players (e.g., Pablo Sandoval, Delmon Young) who don't take many walks peak earlier in their careers than others (Vlad would obviously be an exception). This would be the result of pitchers doing a better job of not giving them good pitches to hit. Do you think my hypothesis would bear out empirically? And can Carlos Gonzalez be an exception if that's indeed the case?
Colin Wyers: Walks are key for a few reasons. One, they're valuable in their own right. Two, they're a sign that a player is able to be selective, and won't swing at as many pitcher's pitches. When a player is younger, they can get away with it more, because they can compensate for it with good bat speed. (This is why walk rate tends to be one of the skills that peak late.) As far as Carlos Gonzalez - his batting line is so, so weird. And it's hard to learn discipline when you're getting results like this.
Matt (Chicago): Are you buying Z's resurgence? I must admit that I'm a little concerned about the walk rate.
Colin Wyers: I don't think it's a resurgence, I think the Cubs just overreacted in the first place - they didn't want to displace Carlos Silva from the rotation. Zambrano won't be as good as he was a few years ago, but he's still an asset as a starting pitcher.
myshkin (Santa Clara, CA): What are the other things that contribute to the uncertainty of a player forecast besides playing time and skill set?
Colin Wyers: The distribution of that playing time is a big one - if you have two players with 3000 PA in the past 3 years, the one with more PA in the most recent season is easier to predict.
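One way to picture why recency matters, as a hypothetical sketch rather than PECOTA's actual weighting: weight each season's PA by how recent it is, borrowing the 5/4/3 hitter weights of Tom Tango's Marcel system purely for illustration, and treat the weighted total as an effective sample size. Two players with the same raw PA then carry different amounts of usable information. The PA figures are made up.

```python
import math
from typing import Sequence

# Assumption: Marcel-style recency weights, most recent season first.
RECENCY_WEIGHTS = (5, 4, 3)


def weighted_pa(pa_by_season: Sequence[int]) -> int:
    """Recency-weighted PA; pa_by_season lists the most recent season first."""
    return sum(w * pa for w, pa in zip(RECENCY_WEIGHTS, pa_by_season))


def relative_uncertainty(pa_by_season: Sequence[int]) -> float:
    """Hypothetical proxy: uncertainty shrinks like 1/sqrt(effective sample size)."""
    return 1 / math.sqrt(weighted_pa(pa_by_season))


recent_heavy = [700, 600, 500]  # 1800 PA, weighted toward the latest season
recent_light = [500, 600, 700]  # same 1800 PA, weighted toward the oldest season

for label, pa in (("recent-heavy", recent_heavy), ("recent-light", recent_light)):
    print(label, weighted_pa(pa), round(relative_uncertainty(pa), 5))
```

Same raw total, but the recent-heavy player carries more weighted PA (7400 vs. 7000), so the proxy uncertainty comes out smaller.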
KRS (Loop): John mentioned today that the Red Sox may be interested in trading for Aramis Ramirez if they can't sign Adrian Beltre - I'm thinking the Cubs should be all over this, especially if they don't have to pick up much (or any) of his salary. Considering Ricketts says they may decrease payroll, that would give them some flexibility to sign Adam Dunn. Guess they have to come up with another third baseman, though (Vitters?)... Thoughts?
Colin Wyers: At this point it doesn't look like Vitters is ready for AA, much less the majors. DeWitt could play third, and they could use some combination of Barney/Baker/etc. at second base (the Cubs organization isn't known for its lack of second basemen). Is it a GOOD idea? I dunno. It's hard to see a scenario where they contend next year anyway.
frampton (persnicketyville): Of course, the historic first line of the riddle was, "As I was going to St. Ives, I met a man with seven wives." So the answer was, one; all the wives and cats etc. were coming from St. Ives.
Definitely appreciate the work you all are putting into PECOTA, though I think some folks have a higher expectation for projections than is ever going to be warranted.
Colin Wyers: ...huh, yeah, I kinda wrecked that.
Rex Little (Big Bear, CA): In my Strat-O-Matic league, Hellickson, Garcia, Bumgarner, Colby Lewis and Niese will be in next year's rookie draft along with Strasburg. Teams are allowed unlimited keepers, and next year's games are played using this year's stats. Given that, would you take any (or all) of those guys ahead of Strasburg?
Colin Wyers: I'm afraid I don't know how Strat works well enough to answer this - does that mean that Strasburg will be limited to his actual innings pitched, or can you use him for a full season?
dianagramr (NYC): So Colin ... what are you drinking?
Colin Wyers: I have developed a backstock of Pepsi Throwback - the one with real sugar. I am hopeful, not necessarily optimistic, that Pepsi will start selling it again before I run out.
crperry13 (Houston): Will there be a "PECOTA for Dummies" explanation at a future date that can summarize how this all works for readers without a master's degree in economics and statistical analysis?
Colin Wyers: There will certainly be something where we give an explanation of PECOTA top to bottom here at some point - one of the big problems is that the best explanation of PECOTA (Nate's original essay in BP '03, I think it was) is no longer in print. So we definitely want something we can point to that's still readily available.
Matt (Chicago): What type of market do you see developing for Fielder this offseason? The A's stand out as an obvious trade partner but I'm guessing MIL's haul might not be that great.
Colin Wyers: Very few teams will be inclined to give up a lot for him, since he's made no secret of the fact that he intends to become a free agent when he gets the chance.
Joe Lefkowitz (NJ): JRMayne kinda took the question I wanted to ask, but can't pitch f/x data be used to come up with better comps for pitchers? Also, while there are successful crafty pitchers and poor power pitchers, wouldn't a *change* in an individual pitcher's velocity be able to tell us something about him going forward?
Colin Wyers: You need more years to do comps than what Pitch F/X alone gives us. As for change in velocity - you have to be careful, given questions about the calibrations of the cameras and pitch classifications. Is a pitcher's velocity up because of something he's doing, because a camera moved, or because the classification algorithm is calling more pitches "changeups" than it used to? It's not impossible to do, but there are a lot of things you have to be careful of.
jdouglass (Chicago): Hi Colin,
Re: projecting pitchers based on what they control vs. the noise that exists. Is it possible to project pitchers on only what they can control, and then, separately, do team noise projections that give us a factor by which we can expect a pitcher's non-DIPS stats to inflate/deflate during the upcoming season?
Colin Wyers: It's certainly possible. With the percentiles for pitchers, a lot of the questions I have are what's most useful to look at - definitional questions, in other words. And I'm hearing different things from different people, but so far the consensus seems to be to include "noise," that is sequencing and batted ball variance.
EJSeidman (Accountingville, USA): Chester A. Arthur. 1881 to 1885. Nominated vice-president in 1880. Did you know he was collector of customs right here in
Colin Wyers: Well, now I know who my second least favorite customs official in American history is. Thanks!
The Flying Bernard (Acton, MA): You're drinking Pepsi Throwback? I thought you said you were drinking St. Ides.
Colin Wyers: That stuff is probably more suitable for cleaning driveway grease stains than it is for human consumption.
Wendy (Chicago): I'm sure you're getting this a lot, but... What are you guys going to be doing to replace Will? In the last year, you've lost Joe, Nate and Will. Who do you see stepping up to replace them? I like Perrotto's stuff, but he's nothing more than a complementary piece. So you're down to Kevin and Christina. I don't mean to be coming down on you guys, but please reassure us.
Colin Wyers: I think it's pretty clear that, well, I'm intended to fill the niche Nate did. Those are some big shoes to fill, and I'm sure y'all will let me know how I'm doing. We've talked a bit about integrating injury stuff into PECOTA, and I think you'll find some of that overlaps with what Will was doing. But yes, there's a cycle where you have to keep replenishing the talent. If there's someone out there you see that you think ought to be part of that discussion, please drop us a line.
myshkin (Santa Clara, CA): I find both this past week of PECOTA articles and the news of a PECOTA overhaul (with more time allotted to the schedule, and now with unit testing!) very heartening. What database and programming tools are you using to develop the new methodology?
Colin Wyers: We use a lot of MySQL. There are some things that right now are still heavily in flux, and so once I get them a bit more mature, they may be replaced with something a bit more math-heavy to improve performance. I work in GNU R (which is primarily for scientific and statistical computing) and Python, so those are candidates. For quick one-off stuff I don't think I'm likely to do again, I sometimes work in Excel (I'm not proud) and Gretl.
Christina Kahrl (BP Volcano Hideout): Hey now, President Arthur did something of value as far as civil service reform goes. As a man with intimate, rapacious knowledge of how to make the system work for him, he was extremely well positioned to reform it so that it would be difficult for others to do as he'd done on his way up. Perhaps there should be a medal for patriotic hypocrisy in his honor?
Colin Wyers: It should be pointed out that my knowledge of historic customs officials is pretty shallow.
batts40 (IL): I liked the departed BP members as much as anyone. But I think the new guys are doing outstanding work, and the quality of BP is as high as ever.
Colin Wyers: Not surprisingly, we like hearing comments like this. Thanks.
Eric (Denver): Who should the A's target either in terms of trades or free agency to build an offense this offseason?
Colin Wyers: I think the A's are a little closer to a respectable offense than most people would believe - it's their pitcher's park that makes them look worse than they really are. That said, it's hard to read the market this far in advance. I think the key for a good front office is to be flexible, and not fixate on any one player to the extent that you lose sight of the rest of the options. (Or to be the Yankees, and be able to fixate because you know you can top any bid.)
The Flying Bernard (Acton, MA): As the Cubs took 3 of 4 from the Padres, I couldn't help but wonder if it made Lou Piniella happy to give Bobby Cox a little help getting to the postseason one last time.
Colin Wyers: I doubt he cares - Lou Piniella was a very competitive person. He's probably happy the Cubs are winning more (I don't know how much attention he's paying - he left the team to attend to some more important things, after all).
Matt (Whippleville, NY): How feasible would a James Shields for Lucas Duda and Kirk Nieuwenhuis trade be for the Mets and Rays? With Johan out, the Mets could use a guy like Shields, and there's plenty of reason to believe Duda can be most of what Carlos Pena is/was.
Colin Wyers: I have no idea.
BJ (Newark, CA): Hi Colin,
It seems to me that in previous years what I liked best about PECOTA was its ability to (successfully) identify players who were ready for a breakout season that other systems missed. Just based on my experience, this year's PECOTA didn't seem to be taking a stand on certain players the way it did in the past--and the season seemed to bear out the thought that it didn't identify many surprise players.
Just wondering whether you think that's true; if so, why; and will next year's system identify more of these types of players?
Colin Wyers: Identifying "surprise" players is hard, by definition. And I think as sabermetrics moves more into the mainstream, you'll see it get harder - some of the guys who would surprise you a few years ago might not, now that things like PECOTA are being used to drive expectations.
Matt (Chicago): The Cardinals need some serious tinkering, no? That middle IF is unacceptable and they could use another OFer after foolishly giving away Ludwick.
Colin Wyers: The Cardinals always seem to carry about a third of a great team and two thirds of a mediocre team. So long as they're keeping that 1/3rd on the field, they always have a chance to contend.
Greg27 (San Fran): Thoughts on the PECOTA projections for Sandoval next year? I can't get any kind of read on this guy.
Colin Wyers: In a down year, he's been able to be pretty close to a league average hitter. I suspect he'll be alright.
The Flying Bernard (Acton, MA): Oops, you're right. I somehow forgot that Piniella left the team last month to be with his ailing mother, so I thought he was still with the team. As a Braves fan I don't tend to follow the Cubs closely, but I was happy to see them beating the Padres!
Colin Wyers: You've caught me. The reason I've been cagey about what I'm drinking is because it's really Marc Normandin's tears.
garsonf (Chicago): I promise not to hold your answer over your head two months from now, but do you have any sense of the time frame (just generally) in which we'll see initial projections for next year?
Colin Wyers: I think last year we had the first PECOTA spreadsheet out around the end of January. I'm hoping to beat that by at least a month or so this year, and then all the other timelines follow from that.
Damon Rutherford (Purdue University): My mother and your mother went out to hang clothes. My mother hit your mother right in the nose. What color was the blood?
Also, any intention to provide results of projections of *past* seasons with today's current methodology? For example, it should be fairly straightforward to crank out projections for the past 20 seasons and then compare them to reality, yes?
Colin Wyers: Right now I'm doing 60 years of back forecasts for testing purposes. It seems a lot of people are interested in this, so yes, when we have PECOTA ready to publish, we'll find a way to make something available that shows the back-forecasts.
Bernal Diaz (Ohio): Do vampires poop?
Colin Wyers: Of course they don't. They don't "eat" in the biological sense - they subsist off the very life force of the living.
Martin (Waco): Jose Bautista: Great player, or greatest player?
Colin Wyers: Only 649 behind the career home run leader. He's a guy to keep an eye on.
Joe Bivens (SEMASS): Hey
Colin Wyers: Hey!
Michael K (NY, NY): Do you use back forecasts as a way to test possible improvements?
Colin Wyers: It's an iterative process - run the back forecasts, test them, make improvements, run the back forecasts, test them, make improvements. Sometimes by "improvements" I mean "totally break everything," but that's why you do the testing.
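The loop described above can be sketched as a walk-forward test: forecast each season using only the seasons before it, score the forecasts against what actually happened, and compare candidate methods on the same footing. Everything here is hypothetical (the toy data, the two stand-in forecasters, and RMSE as the score); PECOTA's own test harness isn't described in the chat.

```python
import math
from typing import Callable, Dict, List

# Toy data (hypothetical): season -> {player: observed rate stat}
History = Dict[int, Dict[str, float]]
Forecaster = Callable[[List[float]], float]


def back_forecast_rmse(history: History, forecaster: Forecaster, first_target: int) -> float:
    """Forecast every player-season from first_target on, using only earlier seasons."""
    sq_errors = []
    for season in sorted(history):
        if season < first_target:
            continue
        past_seasons = [s for s in sorted(history) if s < season]
        for player, actual in history[season].items():
            prior = [history[s][player] for s in past_seasons if player in history[s]]
            if not prior:
                continue  # no track record to forecast from
            sq_errors.append((forecaster(prior) - actual) ** 2)
    if not sq_errors:
        raise ValueError("no player-seasons could be back-forecast")
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def last_season(prior: List[float]) -> float:
    """Candidate A: repeat the most recent season."""
    return prior[-1]


def mean_of_prior(prior: List[float]) -> float:
    """Candidate B: average every earlier season."""
    return sum(prior) / len(prior)


history: History = {
    2007: {"a": 0.300, "b": 0.250},
    2008: {"a": 0.320, "b": 0.230},
    2009: {"a": 0.280, "b": 0.270},
}

for name, model in (("last season", last_season), ("mean of prior", mean_of_prior)):
    print(name, round(back_forecast_rmse(history, model, first_target=2008), 4))
```

On this toy data the averaging candidate scores better (about 0.0255 vs. 0.0316), so it would be the one kept before the next round of changes.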
lucasjthompson (Mpls. MN): Which pitching stat that PECOTA projects shows how good pitchers are in an entirely neutral context? (I can never keep all the different blankERAs clear despite many trips to the glossary)
Colin Wyers: Honestly, neither can I. Some of them are being sent off to a nice farm where baseball stats can run and play all day, and there's plenty of sunshine and warm air and they get to play with other baseball stats. We'll have more detail on this in the coming weeks.
Thufir Hawat (Arrakis): Any interesting news on Hit f/x you can pass along to us? Remember, the universe is full of doors.
Colin Wyers: Right now, it doesn't seem like public Hit F/X is in the cards - some teams are buying the data, but I don't really know what they're doing with it. I have some Hit F/X data from the two Summits Sportvision has held, which is enough to see what you might be able to do with the data, but not enough to actually do any of it.
Craig N (Akron): Do you think Greinke is still an elite level pitcher, or was 2009 a bit of a tease?
Colin Wyers: He'd been good - just not as good - the past two seasons before that. I think next year he'll be closer to '09 than '10.
The Flying Bernard (Acton, MA): Speaking of the career home run leader, every time I chmod a file to 755 I think of Hank Aaron. Why did Barry Bonds have to stop at an ugly number like 762? Couldn't somebody have signed him so he could hit a few more and get to a (somewhat) common access mode like 775 or 777? Sorry if that was too nerdy.
Colin Wyers: 764 would even be better, because it's conceivable that someone, at some time, would want to do that.
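For anyone who doesn't read octal modes by sight, a small decoder shows why 762 is the odd one out: its last digit, 2, gives other users write permission without read, which is rarely what anyone wants, while 764 gives them read-only access. This is a plain illustrative helper written for this example; Python's standard `stat.filemode` does something similar for full `st_mode` values.

```python
def mode_string(octal_mode: str) -> str:
    """Render a three-digit octal permission mode, e.g. '755' -> 'rwxr-xr-x'."""
    if len(octal_mode) != 3:
        raise ValueError("expected three octal digits (owner, group, other)")
    chars = []
    for digit in octal_mode:
        bits = int(digit, 8)  # raises ValueError for 8 or 9
        chars.append("r" if bits & 4 else "-")
        chars.append("w" if bits & 2 else "-")
        chars.append("x" if bits & 1 else "-")
    return "".join(chars)


for mode in ("755", "762", "764", "775", "777"):
    print(mode, mode_string(mode))
```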
Mahasamatman (Celestial City): One thing that's always bugged me about minor league translations is how harsh they seem to be, and even someone succeeding at a young age at a high level is often translated to be a barely useful player. An obvious exception to this is Matt Wieters.
Anyway, what would someone have to hit in low A to project to be Pujols in the majors? .600/.850/1.450?
Colin Wyers: Well, translations are all based on historical data on how players at each level have actually done once they reached the majors. It's not that we don't like minor leaguers. And I'm not even sure that line would do it.
dantroy (Davis): Can you talk about how PECOTA utilizes college #s?
Colin Wyers: It really doesn't. College numbers are difficult - they really violate the assumption that strength of schedule is a minor effect. Without at least game level detail, it's really hard to make good use of them.
Not a stat guy (Chicago, IL): Can't you just 100% accurately tell me who to pick on my fantasy team so that everyone will rely on BP and have the exact same valuations and draft order?
But seriously, I had a tough time following today's article. Can you tell me if this year's projections for players who had significant playing time generally fell within the expected bell curve for standard deviations (i.e. a normal distribution)? (Or, it sounds like, with a slight push toward the "better than predicted" side of the curve because of playing time bias).
Colin Wyers: Right, there's a selective sampling effect that pushes observed results toward the high side of the percentiles (which you can see if you look at the projected PA at each percentile). Observed results are mostly a normal distribution in practice, at least if you weight by PA. The PECOTA percentiles were a little too tight, but the distribution seems about right - we'll loosen those up a bit.
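A minimal sketch of that kind of check, assuming each projection can be summarized as a normal distribution with a mean and SD (an assumption made here for illustration, not a description of how PECOTA builds its percentiles): find the percentile where each observed result fell, then tally those percentiles into deciles weighted by PA. Well-calibrated percentiles put about 10% of the PA in each decile; percentiles that are too tight pile weight into the bottom and top deciles, and a selective-sampling effect shows up as extra weight in the upper deciles. The row layout and sample values are hypothetical.

```python
from statistics import NormalDist
from typing import Iterable, List, Tuple

# Row layout (hypothetical): projected mean, projected SD, observed result, PA
Row = Tuple[float, float, float, int]


def pa_weighted_deciles(rows: Iterable[Row]) -> List[float]:
    """PA-weighted share of observed results landing in each projected decile."""
    weights = [0.0] * 10
    total_pa = 0
    for mean, sd, observed, pa in rows:
        pct = NormalDist(mean, sd).cdf(observed)  # percentile of the observed result
        decile = min(int(pct * 10), 9)            # 0 = bottom 10%, 9 = top 10%
        weights[decile] += pa
        total_pa += pa
    if total_pa == 0:
        raise ValueError("no plate appearances to weight")
    return [w / total_pa for w in weights]


sample = [
    (0.270, 0.020, 0.270, 600),  # right on the projection: decile 5
    (0.260, 0.020, 0.300, 200),  # two SDs above: top decile
    (0.280, 0.020, 0.250, 100),  # 1.5 SDs below: bottom decile
]
print([round(share, 3) for share in pa_weighted_deciles(sample)])
```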
Ghost of SFP (Lounge): Colin, what do you think the Red Sox will do this winter? How about the Mariners?
Colin Wyers: The Red Sox are a good team that somehow managed to come pretty close to the wild card while suffering through a number of Biblical plagues. Their biggest off-season quandary is how to avoid paying Papelbon $12 million while still extracting some value from him. The Mariners, on the other hand, have a Chosin Reservoir roster - they can pretty much attack in any direction.
Damon Rutherford (Purdue University): Using RMSE as the be-all, end-all evaluation tool for comparing various prediction methods seems to be lacking in information gained.
Perhaps multiple tools can be used? For example, what percentage of player predictions were within X% of reality? What percentage were X% away from reality? Did the predictions over-predict or under-predict, since RMSE treats them equal and cannot reveal that?
I understand why RMSE is used to evaluate the predictions, but I still feel there must be a better way to do the evaluation. It is missing something key, something not mentioned above, and I still cannot quite figure out what it is.
Colin Wyers: It's basically a high-powered average error, yeah. I agree there's more that can be done to test projections. The nice thing about the RMSE test is that it's standard, easily repeatable and pretty common. If I come up with a more involved, but less transparent, test that shows PECOTA as the best projection system, it raises questions - by sticking to simpler tests, I avoid that. I'd love to see someone without a dog in the fight do some more involved tests, though.
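The extra checks Damon suggests are easy to report next to RMSE. A minimal sketch with made-up numbers and an arbitrary 10% tolerance: signed bias shows whether a system over- or under-predicts, which RMSE cannot, and the share of projections within a relative tolerance answers the "within X%" question. These illustrate his suggestions; they are not tests BP has published.

```python
import math
from typing import Dict, Sequence


def evaluate(projected: Sequence[float], actual: Sequence[float],
             tolerance: float = 0.10) -> Dict[str, float]:
    """RMSE plus two diagnostics RMSE hides: signed bias and share within tolerance."""
    if len(projected) != len(actual) or not projected:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - a for p, a in zip(projected, actual)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n  # > 0: over-predicted on average; < 0: under-predicted
    share_within = sum(abs(e) <= tolerance * abs(a) for e, a in zip(errors, actual)) / n
    return {"rmse": rmse, "bias": bias, "share_within": share_within}


projected = [0.300, 0.250, 0.280, 0.260]  # hypothetical projections
actual = [0.280, 0.250, 0.320, 0.300]     # hypothetical results
print(evaluate(projected, actual))

# Mirror every error around the actual values: same RMSE, opposite bias.
mirrored = [2 * a - p for p, a in zip(projected, actual)]
print(evaluate(mirrored, actual))
```

Both systems post the same RMSE (0.03), but one under-predicts by 0.015 on average and the other over-predicts by the same amount, which is exactly the information Damon points out RMSE throws away.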
Ken (The Sconnie Office): Sleep: A big thing you miss since joining BP, or the biggest thing you miss since joining BP?
Colin Wyers: What's sleep?
TFTIOhio (Cleveland): Are you working on those playoff stats? Preview?
Colin Wyers: I will be putting together some stuff for the playoffs - one of the projects I have is a set of "playoff PECOTAs." They won't have the aging adjustments built in that the regular PECOTAs do (you don't age much between now and October), so they should be simpler to run than the full PECOTAs.
Michael K (NY, NY): Have aging patterns changed enough that this might pose a problem when you use comparable players from prior eras? Or do you account for that?
Colin Wyers: It's an interesting question. I haven't gotten that far with things yet, but I'll be looking into it.
Colin Wyers: Alright, folks, it's been a pleasure chatting with you all, and I hope to do this again soon. Thanks!