Nate Silver is an author of Baseball Prospectus.
Nate Silver: Folks, Nate's experiencing some connectivity issues, so he'll be along as soon as he can. Sorry for the inconvenience, and thanks for your patience.
Mike (Salem): Wily Mo Pena. Explain.
Nate Silver: Folks, you can thank Mike's question for causing serious problems with Nate's internet connection. He'll be along as soon as possible.
Anthony (Long Island): Is there any player PECOTA likes better than Adam Dunn? A 59.3% breakout rate...are you kidding me?
Nate Silver: Hi gang. Couple things before we start:
1. The focus today is on PECOTA stuff, and fantasy.bp stuff, though I can't promise that I won't be tempted by a Cubs question or two if I get them.
2. I know that I say this every time, but I've been real busy lately, so if you sent me an e-mail within the last couple of weeks, there's a good chance that I haven't even read it, and I almost certainly haven't responded to it. Trust me, it's nothing personal; I've also neglected my friends, my Scoresheet leagues, my laundry ... for a variety of reasons, I should be much better about this stuff going forward.
3. M y b a n d w i d t h i s t e r r i b l e right now, which is why this thing is starting late. If there are any further unexpected delays, that's why. I'll run through 2:30 Eastern or so barring further technical nightmares.
re: the Adam Dunn question.
It sure does like Adam Dunn, and when he's drawing comparables like Jim Thome, Harmon Killebrew, and Fred McGriff, it's easy to see why. Players who hit for as much power as Dunn at such a young age are an elite group, and have a very high success rate. Also, Dunn's value last year was held down by his very low batting average ... batting average is the flukiest component of offensive performance, which also implies that he should rebound.
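To put a rough number on how fluky batting average is, one can treat each at-bat as an independent trial with a fixed true-talent hit probability. That is an idealization, not how PECOTA models it, and the .260 true-talent average and 500 at-bats below are hypothetical inputs.

```python
import math


def batting_average_sd(true_avg: float, at_bats: int) -> float:
    """Binomial standard deviation of an observed batting average.

    Simplifying assumption: every at-bat is an independent trial with the
    same hit probability (true_avg).
    """
    return math.sqrt(true_avg * (1.0 - true_avg) / at_bats)


# Hypothetical hitter: .260 true-talent average over 500 at-bats.
sd = batting_average_sd(0.260, 500)
print(f"one SD of pure chance: {sd:.4f}")
```

Under those assumptions one standard deviation is about 20 points, so a true .260 hitter could land anywhere from roughly .240 to .280 in about two seasons out of three on chance alone.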
I think the optimistic scenario probably involves Dunn becoming more aggressive at the plate. Interestingly, PECOTA actually sees his walk rate going down, but his batting average way up, and his isolated power up a hair. That seems plausible to me. The guy is only 24.
That said, I think the system *might* not be as concerned as it should be about his strikeout rate. Having a moderately high strikeout rate is not that much of a developmental concern for a hitter, especially if it's accompanied by a high walk rate; in fact, strikeout rate in year n is *positively* correlated with isolated power in year n+1. However, players with an *extremely* high strikeout rate - guys like Dunn and, for that matter, Wily Mo Pena - perhaps belong in a different class, and I'm not sure if PECOTA is doing a good job of picking up on that. At some point, it has to be a concern if a guy simply isn't making contact, especially if he has as large a strike zone as Dunn has.
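For reference, the rate stats in this exchange can be computed from a season line with the standard definitions: isolated power is slugging minus batting average (extra bases per at-bat), and here walk and strikeout rates are taken per plate appearance, which is one common convention. The stat line is hypothetical, loosely shaped like a high-walk, high-strikeout slugger, and is not anyone's actual 2003 season.

```python
from dataclasses import dataclass


@dataclass
class StatLine:
    """Season counting stats (sample values below are hypothetical)."""
    pa: int       # plate appearances
    ab: int       # at-bats
    h: int        # hits
    doubles: int
    triples: int
    hr: int
    bb: int       # walks
    so: int       # strikeouts

    @property
    def avg(self) -> float:
        return self.h / self.ab

    @property
    def slg(self) -> float:
        # Total bases: every hit counts once, plus the extra bases on XBH.
        total_bases = self.h + self.doubles + 2 * self.triples + 3 * self.hr
        return total_bases / self.ab

    @property
    def iso(self) -> float:
        # Isolated power = SLG - AVG = extra bases per at-bat.
        return (self.doubles + 2 * self.triples + 3 * self.hr) / self.ab

    @property
    def bb_rate(self) -> float:
        return self.bb / self.pa

    @property
    def k_rate(self) -> float:
        return self.so / self.pa


line = StatLine(pa=600, ab=500, h=130, doubles=25, triples=0, hr=30, bb=90, so=170)
print(f"AVG {line.avg:.3f}  ISO {line.iso:.3f}  "
      f"BB% {line.bb_rate:.3f}  K% {line.k_rate:.3f}")
```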
sfhubbard (Cleveland, OH): Nate, just saw the forecaster spreadsheets in the new Fantasy section. I noticed that the playing time estimates (and likewise, VORP) are different from those in the PECOTA cards section. Can you please explain the differences? Thanks.
Nate Silver: Yep, the playing time estimates are different. We went through all thirty teams, and had the chapter author for each team put together a depth chart of projected playing time based on a combination of factors, one of which is the PECOTA playing time estimates, but also taking into account things like managerial tendencies, current information about injuries (a.k.a. Will Carroll), organizational depth, and so forth. There was a *lot* of work involved in putting these together, and we think they should prove to be pretty darn reliable. We should have the depth charts themselves available for your perusal by tonight or tomorrow morning, and they'll be updated continually throughout the season.
Eric (Somerville, MA): Hi Nate,
Thanks again for releasing the fantasy projections. I have two methodological questions:
1. Are the fantasy projections built from the 50% PECOTA forecast, the weighted average, or something else?
2. How did you derive team-dependent counting stats from the team-independent rate measures that PECOTA provides? I'm particularly concerned about the saves forecasts -- and the resulting dollar values -- your system forecasts for closers. They feel too high to me.
Nate Silver: 1. They're based on the weighted mean projections.
2. The team dependent stats required a whole new set of models. In order to derive the runs and RBI projections, we took projected batting orders for each team, and developed a Markov chain process in order to calculate run scoring ... the results are as accurate as what you'd get if you simulated the season an infinite number of times, but the process is quite a bit more elegant, and much easier to update on a real time basis. I suspect that most of you don't care about these nuances, but the point is that, by actually accounting for batting order, and the players that surround a given hitter in the lineup, we should have considerably more accurate projections for runs and RBI than what you'd get if you simply dumped a player's team-independent statistics into some kind of one-size-fits-all equation, which is I think what a lot of the other forecasters do.
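The actual model isn't spelled out here, so what follows is only a minimal sketch of the general idea under loudly simplified assumptions: each hitter is described by hypothetical per-plate-appearance event probabilities, runners move station to station, and there are no double plays, sacrifice flies, steals, or errors. Rather than simulating games, it pushes a probability distribution over (outs, base state, batter due up) forward one plate appearance at a time, which yields the same expectation an infinite number of simulations would converge to under those rules. "RBI" here are simply the runs that score on a hitter's own plate appearance. The names `Hitter`, `advance`, `play_inning`, and `play_game` are illustrative, not BP's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hitter:
    """Hypothetical per-plate-appearance event probabilities for one hitter."""
    name: str
    bb: float      # walks (and HBP)
    single: float
    double: float
    triple: float
    hr: float

    def probs(self) -> dict[str, float]:
        p = {"bb": self.bb, "single": self.single, "double": self.double,
             "triple": self.triple, "hr": self.hr}
        p["out"] = 1.0 - sum(p.values())
        if p["out"] <= 0:
            raise ValueError("each hitter needs a positive out probability")
        return p


def advance(bases: int, event: str) -> tuple[int, int]:
    """Return (new_bases, runs_scored); bases is a bitmask: 1=1B, 2=2B, 4=3B.

    Assumes station-to-station running: on a hit every runner moves up as
    many bases as the batter; on a walk only forced runners move.
    """
    if event == "bb":
        for bit in (1, 2, 4):
            if not bases & bit:
                return bases | bit, 0
        return bases, 1  # bases loaded: runner on third is forced home
    k = {"single": 1, "double": 2, "triple": 3, "hr": 4}[event]
    moved = (bases << k) | (1 << (k - 1))  # shift runners, then place batter
    return moved & 0b111, bin(moved >> 3).count("1")


def play_inning(lineup: list[Hitter], leadoff: dict[int, float]):
    """Push the (outs, bases, batter) distribution through one half-inning.

    Returns expected runs, expected RBI by lineup slot, and the probability
    distribution of which slot leads off the next inning.
    """
    probs = [h.probs() for h in lineup]
    dist = {(0, 0, i): p for i, p in leadoff.items()}
    runs, rbi, next_leadoff = 0.0, [0.0] * len(lineup), {}
    while sum(dist.values()) > 1e-12:  # stop once unresolved mass is negligible
        new_dist: dict[tuple[int, int, int], float] = {}
        for (outs, bases, i), p_state in dist.items():
            nxt = (i + 1) % len(lineup)
            for event, p_event in probs[i].items():
                p = p_state * p_event
                if p == 0.0:
                    continue
                if event == "out":
                    if outs == 2:  # third out: record who leads off next
                        next_leadoff[nxt] = next_leadoff.get(nxt, 0.0) + p
                        continue
                    key = (outs + 1, bases, nxt)
                else:
                    new_bases, scored = advance(bases, event)
                    runs += p * scored
                    rbi[i] += p * scored  # credited to the batter at the plate
                    key = (outs, new_bases, nxt)
                new_dist[key] = new_dist.get(key, 0.0) + p
        dist = new_dist
    return runs, rbi, next_leadoff


def play_game(lineup: list[Hitter], innings: int = 9):
    """Expected runs and RBI by lineup slot over `innings` innings."""
    leadoff = {0: 1.0}
    total, rbi = 0.0, [0.0] * len(lineup)
    for _ in range(innings):
        runs, inning_rbi, leadoff = play_inning(lineup, leadoff)
        total += runs
        rbi = [a + b for a, b in zip(rbi, inning_rbi)]
    return total, rbi


# Usage with an invented hitter profile in every slot.
average = Hitter("average", bb=0.09, single=0.16, double=0.045, triple=0.005, hr=0.03)
runs, rbi = play_game([average] * 9)
print(f"expected runs per 9 innings: {runs:.2f}")
print("expected RBI by lineup slot:", [round(x, 2) for x in rbi])
```

Because the state carries the batter due up, swapping hitters around the order changes both the team total and how the RBI are spread across slots, which is the effect a one-size-fits-all equation can't capture.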
In order to come up with the saves projections, we have to work backward. It turns out that if you know two things - the number of games that a team is expected to win, and the number of runs that it is expected to allow - you can do a pretty good job of estimating the number of saves that team will finish with (as you'd anticipate, teams that win more games have more saves, and lower scoring teams have more saves than higher scoring teams if the number of wins is held equal). The model that we've developed knows - or thinks that it knows - these things; it comes up with an estimate of team RS, RA and W-L record. After estimating the number of *team* saves, we assign them to individual pitchers based on the depth chart process that I described earlier. So the model might tell us, say, that we expect the Indians to have 42 saves this year, and we use our more subjective forms of knowledge to figure out how best to allocate them.
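The second, subjective step can be pictured as a simple proportional split. In this sketch the team total is taken as given (the W/RA-to-saves model isn't described in enough detail to reproduce), the 42-save figure is the example from the answer, and the per-pitcher shares and the `allocate_saves` helper are hypothetical.

```python
def allocate_saves(team_saves: float, shares: dict[str, float]) -> dict[str, float]:
    """Split a projected team saves total across relievers.

    shares are hypothetical depth-chart weights; they are normalized here,
    so they need not sum to exactly 1.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must have a positive total")
    return {name: team_saves * w / total for name, w in shares.items()}


# Hypothetical bullpen split for a 42-save team.
bullpen = {"closer": 0.80, "setup man": 0.15, "everyone else": 0.05}
for name, sv in allocate_saves(42, bullpen).items():
    print(f"{name:>14}: {sv:.1f}")
```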
Angelo Grasso (New York, NY): What percentage of players under PECOTA get a 50% or higher chance of improving, i.e., the system believes that there's a better chance than not they'll improve on their baseline? Am I correct in guessing that most of these players are concentrated among those under 27 years old?
Nate Silver: Hi Angelo,
The overall percentage really is close to 50-50; if not, the system would not be calculating the baseline correctly. Certainly, younger players have a relatively better chance to improve, but the difference is not as great as you might think for a couple of reasons:
1. There's a *lot* of noise/randomness/unpredictability in a player's progress from year to year, which often overwhelms the importance of age.
2. Many players who are very young (e.g. 21 years old and under) simply never develop at all. The age 26/age 27 peak theory is more or less correct *for players who have gone on to become established major leaguers*, but may not hold true for scrubby minor leaguers.
Art (London, UK): Nate, what do you (or PECOTA) have against Sosa? Did he steal your girlfriend?
Nate Silver: Is Sosa's projection all that poor? The system thinks he'll be notably better than he was a year ago, especially in the second half, and he's old enough now that we shouldn't take that as a given. Subjectively speaking, since I see Sammy play a lot, my impression was that he had started bailing out on a lot of pitches after he took his beaning back in May, which really hurt him, especially when he was behind in the count. My guess is that the time off will do him some good - and he appeared to have a much better approach at the plate in the post-season - but I don't think we're going to see another year like 1998 in his career.
Rob (Wisconsin): Hey Nate, please answer my Cubs question?
Can we expect Todd Walker to hit his way into PT at 2nd? And (I haven't checked PECOTA) what should we expect from a full season of Aramis?
Nate Silver: I think there's some hope that Todd Walker will get significant playing time, but it will depend more on Mark Grudzielanek's badness than Walker's goodness. Then again, Dusty ran Shawn Estes out there for like 27 starts last year. The man can be stubborn; it's just a question of whether Grudz is "one of his guys".
Ramirez gets a nice PECOTA; he's one of those players who has displayed all sorts of different offensive skills at various points in time, and it wouldn't shock me if he put a very good season together. He seemed to be taking a somewhat improved approach at the plate toward the end of the year, improving his patience a little bit.
Rudy (Decatur, Ga.): Thanks for chatting, Nate. What rookie position player does PECOTA like best? Is it too soon for a Jeremy Reed projection?
Nate Silver: Reed is the one hitter that PECOTA likes the best, though I think he's probably buried for this year unless one of the expensive guys is dealt, since he's not really a center fielder. Justin Morneau falls in something of the same category. Among guys that are more or less assured of playing time, it seems to like Bobby Crosby the best.
Mike (Salem): Derek Jeter's stats as a 19-year-old look almost identical to B.J. Upton's stats as a 19-year-old (though Upton showed more power and plate discipline.) It seems odd, then, that Jeter isn't listed among the top 20 in similarity scoring to Upton. I realize it probably can't be done in a forum like this, but can you briefly explain how similarity scores are determined?
Nate Silver: I'll answer only the simpler half of your question; there's a good explanation of how similarity scores are derived in this year's book. I agree that Jeter represents a good snapshot of Upton's Up-side. Problem is that Jeter was in the minor leagues at the time he was 19, which was back in 1994, but the PECOTA database only includes minor league seasons going back to 1998.
Chris Hartjes (Toronto): Just got my copy of BP2004 (thank you Amazon.ca) and enjoyed reading your section on PECOTA, and wanted to ask you how much you think factoring in defensive ratings has altered how PECOTA works. My concern is that there doesn't seem to be enough variance in the FRAA numbers to really make a difference.
Nate Silver: You're correct that the defensive numbers are more an accessory than anything else; we have enough trouble quantifying defense, let alone projecting it. Still, I think the defensive projections add something in terms of completeness, and they can have a meaningful influence on comparable player selection, especially for minor leaguers.
Felton (New Orleans): Shazam! You are Paul DePodesta. What freely available talent might you sign to supplement the offense? Would you think about playing Adrian Beltre at short sometimes? Thanks.
Nate Silver: I think the Dodgers are about 80 runs shy of having an offense that can win that division, so this might *not* be a case in which the marginal gains to be had from freely available talent would do much for them. Apart from hiring DePo, the team had a pretty poor winter; adding Magglio Ordonez would have been a good fit, but that deal was one of the casualties of the pilot version of the A-Rod trade. My guess is that DePodesta keeps quiet this year and hopes that Edwin Jackson pulls a Fernando Valenzuela or something, and then regroups next winter to strive for a roster with a bit more balance.
NotArnold (Oakland, CA): I need your help with today's California primary ... who does PECOTA project to have a better career four years from now, Kerry or Edwards?
Nate Silver: It depends on whether you think that Michael Dukakis or Al Gore had a better career path. Other prescient PECOTA comparables:
Howard Dean <-> Joe Charboneau
Ralph Nader <-> Rex Hudler
George W. Bush <-> Rube Waddell
Scott (Seattle, WA): Nate,
Care to comment on PECOTA's five-year forecasts? I'm no math major, but there seems to be a flaw in the system. Every top young player who had a better 2003 than 2002 seems to have a forecast that shows a regression to the mean, and then slight attrition for the next few years. I'm sorry, but I just can't believe that Albert Pujols is all of a sudden going to plateau back to a 6-win level through his 26-29 years. Or a guy like Vernon Wells, who PECOTA says will revert back to 2002 levels, and pretty much show no growth after that? Maybe the worst is Mark Prior, who is projected to go from a 6.1-win level to 2.9 in 2007 - I just don't see that happening.
I know that the lack of comparables for guys like Prior is a part of it, but am I just reading the forecasts wrong, or is there something weird about EVERY player regressing to their mean and leveling off from there...
Nate Silver: I think that it's easy to underestimate the number of things that can go badly with a baseball player. Take somebody like Pujols, who is already performing at such a high level ... there are basically two things that can happen with Albert Pujols. He can hold steady, perhaps improving a little bit, and be the next Hank Aaron (his #3 comparable). That's entirely likely. But he could also have a cataclysmic decline on the order of somebody like Bob Horner (his #10 comparable). If you take the average of the career values of ten Hank Aarons and ten Bob Horners - that's essentially what PECOTA is doing - the result is somewhat worse than the level at which Pujols has currently been performing, because the Aarons improve only a little bit, whereas the Horners decline tremendously. The thing is, it's tough to see the Horners before they happen; nobody thought, in 1983, that Horner would be out of professional baseball by the time that he was 30. But there are a lot of Bob Horners.
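The averaging argument reduces to a few lines of arithmetic. The win values are hypothetical, chosen only to show the asymmetry, and PECOTA's actual weighting of comparables is more involved than a plain mean.

```python
from statistics import mean

# Hypothetical: a player currently worth 8 wins, ten comparables who improve
# a little (+0.5 wins) and ten who collapse (-4.0 wins).
current_wins = 8.0
aaron_like = [current_wins + 0.5] * 10
horner_like = [current_wins - 4.0] * 10

projection = mean(aaron_like + horner_like)
print(projection)  # 6.25
```

Half the comparables improve, yet the mean projection lands well below the current level, because the downside is so much larger than the upside.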
harry (springfield, ma): What do you think of Garrett Atkins, and specifically, who do you foresee getting the majority of the PT at third for the rockies this year, him or Vinny Castilla?
Nate Silver: Castilla. If Atkins could really pick it at third base, it might be a different story.
Mind you, I have no idea what the Rockies are doing. I don't think either Castilla or Atkins is a good long-term solution.
Jonny K (Boston, MA): Hey Nate! I have a question about PECOTA's predictions for players who break out in their age 24 or 25 seasons. It seems like most players who finally got the playing time they deserved and busted out with solid VORP seasons at these ages have PECOTA projections going down into their age 27 season (examples: Alex Cintron, Angel Berroa, Jody Gerut, Ty Wigginton). The consensus view is that most players have their best seasons when they are 27 or 28, but it looks like PECOTA only expects players who break into the majors at 21-23 to have that peak. Have you noticed this trend? The decline is far more pronounced for pitchers...but I assumed that they were being penalized for injury risk.
Nate Silver: One thing that it's important to keep in mind when evaluating young players is that they're capable of having flukishly good seasons, just like veterans are. Berroa is a good example. There's no doubt that he's improved a ton as a hitter, and I expect him to put together a decent career, as does PECOTA ... but the degree of improvement he exhibited from 2002 to 2003 was so high that it was almost certainly a combination of development in his intrinsic skill level, as well as a healthy dose of luck. He could well be a better player in 2004 than he was in 2003, but nevertheless wind up with stats that are notably worse.
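A generic way to express "part development, part luck" is to regress a breakout season toward a prior estimate of skill. This is a textbook shrinkage formula, swapped in for illustration and not PECOTA's method (which works through comparable players); the OPS values, the 0.6 reliability, and the helper name are hypothetical.

```python
def regressed_estimate(prior: float, observed: float, reliability: float) -> float:
    """Shrink an observed season toward a prior skill estimate.

    reliability in [0, 1] is the assumed share of the observed deviation
    that reflects real skill change; the remainder is treated as luck.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return prior + reliability * (observed - prior)


# Hypothetical OPS: .700 skill estimate before the breakout, .800 observed.
estimate = regressed_estimate(prior=0.700, observed=0.800, reliability=0.6)
print(f"{estimate:.3f}")  # 0.760
```

The estimate sits above the prior (the player really did get better) but below the observed season, which is how a hitter can be better in 2004 than in 2003 and still be expected to post worse numbers.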
Medea's Child (Los Angeles): Since someone has already beaten me to the Wily Mo Pena question, let me ask which PECOTA projections you subjectively disagree with the most. Who do you like better than PECOTA? And who does PECOTA like better than you?
Nate Silver: Yo MC,
I think that Joe Crede will outhit his projection. Joe Mauer: I don't see him being an overnight star, but the system has trouble identifying comparables for him and I think he'll do a little better than indicated. Roy Halladay. If Mark Prior has added 3 MPH to his fastball this winter, as Will Carroll reports that he has, then I think his projection is low. I'm a big Johan Santana fan. Ben Sheets.
On the downside, I love Javier Vazquez, but that projection seems a little bit aggressive to me. Shawn Green worries me a bit. A lot of the Montreal guys might not do as well as advertised, since PECOTA applied a pretty goofy park factor to them.
michael (San Jose, CA): Great to see the 2004 PECOTA forecasts out. But where did the 2003 PECOTA forecasts go?
The article "PECOTA Takes on the Field" was a great read, but for those of us who want to look a little deeper in how well PECOTA did on various subsets of players and/or how well it did in each of its percentile estimates is there some way to get this 2003 data?
Nate Silver: We hope to make the 2003 PECOTA cards available in an archive somewhere. It's certainly our intention to have those available to subscribers. Probably not in the next week or two, though, as this is a ridiculously busy time of year for us.
Jimy James (Boston): I am highly interested in many of the processes involved with the prospectus book and even more with statistical analysis. I find myself wishing more things in life could be analyzed, like my average with women/9*(dates+declines). The question is, how can I, as a college student, enter the field of stats, baseball and otherwise?
Nate Silver: Just start doing stuff. Write, study, analyze ... if you think it's good, publish it, push it, annoy people until you get a larger audience. Pretty much every member of BP was doing stuff on their own before being recruited into the group. PECOTA was developed before I knew that I'd have an audience for it. Same with VORP, EqA, and so forth. Derek Zumsteg was subjecting Usenet to his badassness long before he joined the group.
p.s. Back in my college days, I found the following to be pretty reliable: Pr(booty) = ln(BAC) + n.
Amos (Madison, Wisconsin): Other than Wily Mo Pena, who are some of your favorite PECOTA projections for comedic reasons?
Nate Silver: My favorite PECOTA projection is Alex Gonzalez, since his third best comparable is the other Alex Gonzalez. Nothing will ever beat the serendipitous moment that occurred last year, however, when Jose Lima drew Oil Can Boyd as his #1 comp.
Evan (Vancouver): Any chance you guys can release the projected batting orders you used in the BP fantasy Markov chain calculations?
Nate Silver: Yes, look for that stuff within the next 24-48 hours.
Steven (Cleveland): Nate, a direct quote from the recent prospect roundtable discussion: "Nate Silver: PECOTA does not yet use minor league comps for forecasting pitching prospects." Given this fact, how much confidence can we have in forecasts for minor league pitchers (Hamels, Kazmir, etc)?
Nate Silver: That's a fair question, especially when we're dealing with minor league pitchers under the age of 21. For those guys, I do think that you need to take into account a higher-than-usual amount of non-statistical information. That said, I think the same would be true even if PECOTA did use minor league pitching comps.
Laurence (New York): How were the changes in production expected as a result of ballpark changes in Kansas City and the new ballparks in Philadelphia and San Diego figured into PECOTA projections, especially for pitchers? Thanks for all your great work.
Nate Silver: We made Philly a neutral park.
San Diego we gave a park factor of ~95, figuring that about half the reason that Jack Murphy (which had a park factor of 90 or so) had played as such a pitchers' park was due to factors intrinsic to San Diego's geographical conditions, which would carry over to the new facility.
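The ~95 figure follows from keeping roughly half of Jack Murphy's deviation from a neutral 100: 100 + 0.5 × (90 − 100) = 95. A minimal sketch of that arithmetic; the function name is hypothetical and the 0.5 carryover is the "about half" from the answer.

```python
def carried_over_park_factor(old_pf: float, carryover: float) -> float:
    """Keep only part of an old park's deviation from neutral (100).

    carryover is the assumed fraction of the old effect that comes from
    conditions (climate, geography) that move with the team.
    """
    return 100.0 + carryover * (old_pf - 100.0)


print(carried_over_park_factor(90.0, 0.5))  # 95.0
```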
Kansas City ... we didn't change the park factor at all. I'm not entirely convinced that moving the fences back is going to make that much difference. The park hasn't played as a good park only for HR, but also for singles, doubles, and triples, and larger dimensions are generally good for non-HR types of hits. They've got a good hitters' background there, the weather conditions are conducive to offense, and there aren't as many hitters' parks in the AL as there used to be, all of which are good reasons to believe that KC will continue to inflate offense.
Paul Covert (Lynnwood, WA): Suppose I take each team's predicted total VORP in the BP Fantasy spreadsheet, subtract the league average predicted team VORP, divide by 10 (runs/win), and add 81 (average wins). Would that produce a reasonable league standings forecast? (I.e. is that a fair use of your method?)
Nate Silver: That'd produce a pretty reliable figure. We should have our own version of the team standings available soon, though.
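Paul's recipe translates directly into code. The team names and VORP totals are placeholders rather than real spreadsheet values, and the 10 runs-per-win constant is the rule of thumb from the question.

```python
def standings_from_vorp(team_vorp: dict[str, float],
                        runs_per_win: float = 10.0,
                        average_wins: float = 81.0) -> dict[str, float]:
    """Convert projected team VORP into projected wins.

    wins = average_wins + (team VORP - league-average team VORP) / runs_per_win
    """
    league_avg = sum(team_vorp.values()) / len(team_vorp)
    return {team: average_wins + (v - league_avg) / runs_per_win
            for team, v in team_vorp.items()}


# Hypothetical three-team "league".
wins = standings_from_vorp({"Team A": 300.0, "Team B": 200.0, "Team C": 250.0})
for team, w in sorted(wins.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {w:.0f}")
```

Since each team is measured against the league average, the projected wins always sum to 81 times the number of teams.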
Anthony (Long Island): PECOTA doesn't like Derek Jeter (29.8% improve, 34.4% collapse). 1) How could his expected decline affect the Jeter/ARod scenario; and 2) has the projection prompted death threats from Tim McCarver yet?
Nate Silver: Last question. I haven't eaten lunch yet and I'm starving.
Jeter is another guy that I like a bit better than the system does. PECOTA is suspicious of him because a lot of his value came from his BA last year, but his power was held down because of his injury. If he's healthy - and Will Carroll gave him a yellow light - I think he'll post numbers somewhere along the lines of .300/.370/.470.
Later, guys. Check out the BP Player Forecast Manager if you haven't yet; it's really cool stuff!
Nate Silver: FIN.