
Some changes are coming down the pipe for the Depth Charts (and by extension, the Player Forecast Manager) and PECOTA that I wanted to share with you:

  • We are now projecting players’ rest-of-season stats to a 2011 baseline, rather than a 2010 baseline,
  • We are now using an updated PECOTA that incorporates a player’s 2011 performance (major league numbers only–minor league translations will be incorporated at a later date), and
  • Rest-of-Season PECOTA will be updated daily.

We’re going to be pushing these updated forecasts into the Depth Charts and PFM (as well as the tops of the player cards) on a daily basis. The last updated fields in the DC and PFM will still only change when a player’s playing time has been manually tweaked by us. (If you come across any player news or have an update for our depth charts, you can let our Fantasy crew know at our comments page, by sending an e-mail to dc@baseballprospectus.com, or by sending @BProDepthCharts a message on Twitter.)

I’m sure you’re all more interested in the PECOTA aspects of things, and I’ll be going into more detail on that in just a moment, but I do want to emphasize that we’ve shifted baselines as well. If you see a pitcher with a lower ERA forecasted than what you’ve seen in the past, but he hasn’t pitched well to date, that’s because we’re expecting all pitchers to have a lower ERA than we would have prior to the start of the season. The change in baselines is not necessarily meant to reflect our expectations of what the rest of the season will be in terms of offensive levels (as it gets warmer, we should see the baselines rise), but to facilitate easier comparison between a player’s season-to-date performance and his rest-of-season projection.

Now, as to the PECOTA updates: We are not rerunning the entire PECOTA process on a daily basis. First off, that would simply be impractical; by the time we got done, the next day’s stats would already be waiting for us. Secondly, it would be the wrong tool for the job. Much of the computational horsepower behind PECOTA is spent figuring out how a player will change with age. The effects of age between now and late August are minimal enough to be ignored, and the aging process used to figure a player’s aging between seasons would be very ill-suited to help us capture them anyway.

Instead, we are taking a player’s season-to-date numbers and, in effect, “regressing” them toward the pre-season PECOTA forecast. The weighting is determined by two things: (1) a player’s playing time so far this season and (2) the reliability of a player’s preseason forecast. The more a player plays this season, the more the rest-of-season forecast can move, but at the same time, the forecast for a rookie is more likely to move than that of an established veteran.
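To make the idea concrete, here is a minimal sketch of that kind of weighting, written as a plain weighted average. It is only an illustration and not PECOTA's actual code: the function name `rest_of_season_rate`, the parameter `forecast_weight_pa` (the preseason forecast's reliability expressed as an equivalent number of plate appearances), and the sample numbers are all assumptions made up for this example.

```python
def rest_of_season_rate(observed_rate, pa, preseason_rate, forecast_weight_pa):
    """Blend season-to-date performance with the preseason forecast.

    forecast_weight_pa is a hypothetical stand-in for forecast reliability:
    the number of plate appearances the preseason forecast is treated as
    being worth. An established veteran would get a large value, a rookie
    a small one.
    """
    if pa < 0 or forecast_weight_pa <= 0:
        raise ValueError("pa must be >= 0 and forecast_weight_pa must be > 0")
    total = pa + forecast_weight_pa
    return (observed_rate * pa + preseason_rate * forecast_weight_pa) / total


# Made-up example: the same 200 PA of .300 hitting against a .260 forecast.
veteran = rest_of_season_rate(0.300, 200, 0.260, forecast_weight_pa=1500)
rookie = rest_of_season_rate(0.300, 200, 0.260, forecast_weight_pa=400)
print(f"veteran: {veteran:.3f}, rookie: {rookie:.3f}")
```

With identical playing time and performance, the rookie's number moves further from the preseason forecast than the veteran's, and with zero plate appearances the result is just the preseason forecast.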

I’m anticipating some questions from y’all, so I’ll start off with the first one I expect to see: Why aren’t we projecting Jose Bautista to hit like Barry Bonds for the rest of the season?

It’s a good question. Let’s say this up front: Bautista is very nearly a singular case in the history of baseball, insofar as his transformation from journeyman to premier hitter. It is possible that a model like PECOTA, based on historic data, is having a hard time coping with a player as unique as Bautista. The trouble is that since Bautista is unique, it is impossible for us to test this proposition any other way than to just let Bautista play and see what he does next. But the updated PECOTA is by nature conservative, not just for Bautista but for all hitters. This clashes with our expectations because of a phenomenon called recency bias. Humans have a tendency to overweight more recent information at the expense of older information. One of the big benefits of using a forecasting system like PECOTA is that it forces us to confront our recency bias and to account for all of the information we know about a player.

The next question I expect is, “So why does Fangraphs have a much higher rest-of-season projection for Bautista than you do?” My answer is simple: they are wrong. Since this is obviously a statement from interest, I shall explain why this is the case:

Fangraphs uses a stat called wOBA as their all-encompassing batting rate; conceptually it and TAv are very similar. For our purposes, the main difference between them is that wOBA is baselined to the OBP scale rather than the batting average scale. (In Fangraphs’ implementation, it is reconciled to the league OBP for that season, unlike TAv where the average is held constant over time.) Prior to the season, ZiPS (the projection system designed by Dan Szymborski) projected Bautista to have a .381 wOBA. This is not too far from where PECOTA had him; depending on how you want to handle converting between wOBA and TAv, these could be identical forecasts in terms of overall batting productivity.

Since the start of the season, Bautista has hit for a .516 wOBA (by Fangraphs’ reckoning; other sites such as Statcorner figure wOBA slightly differently and thus come to different results) in 235 plate appearances. ZiPS combines that with the preseason forecast to give Bautista a projected .415 wOBA for the rest of the season, equivalent to a TAv somewhere around .330 (depending on the assumed OBP for the league rest of season), significantly higher than what rest-of-season PECOTA says. If we were to take a weighted average of his preseason forecast and his season-to-date performance, then in order for the result to equal his rest-of-season projection you'd have to treat the preseason forecast as worth 698 plate appearances, right around the number of PAs Bautista had in 2010 alone, whereas a projection that took into account the previous three seasons would be closer to 1500 PAs. ZiPS is underweighting Bautista's preseason projection in favor of his most recent performance. If the point of a projection system is to help overcome recency bias, this kind of rest-of-season forecast helps less than it hurts–instead of combatting recency bias, it reinforces it.
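The 698 figure can be checked with a little algebra. Assuming the rest-of-season number is a simple weighted average, (pre × N + obs × PA) / (N + PA) = ROS, solving for N gives N = PA × (obs − ROS) / (ROS − pre). The short sketch below does that arithmetic with the wOBA figures quoted above; the function name `implied_forecast_weight` is just a label for this example, and the weighted-average form is an assumption about how to read the numbers, not a description of how ZiPS actually works internally.

```python
def implied_forecast_weight(observed_rate, pa, preseason_rate, ros_rate):
    """Solve (pre*N + obs*PA) / (N + PA) = ros for N.

    N is the number of plate appearances the preseason forecast would have
    to be "worth" for a simple weighted average to land on ros_rate.
    """
    if ros_rate == preseason_rate:
        raise ValueError("ROS equals the preseason forecast; weight is unbounded")
    return pa * (observed_rate - ros_rate) / (ros_rate - preseason_rate)


# Bautista's figures from the text: .516 observed wOBA in 235 PA,
# .381 preseason ZiPS wOBA, .415 rest-of-season ZiPS wOBA.
n = implied_forecast_weight(0.516, 235, 0.381, 0.415)
print(round(n))  # prints 698
```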

And this is not an issue related to Bautista’s singular nature; let's look at the top twenty players in terms of absolute change between the preseason projection and the current rest-of-season forecast:

| Name | PA | wOBA_Obs | wOBA_Pred | wOBA_ROS | wOBAscale_diff | TAvscale_diff |
|---|---|---|---|---|---|---|
| Russell Branyan | 93 | .243 | .374 | .339 | .035 | .028 |
| Jose Bautista | 235 | .516 | .381 | .415 | .034 | .027 |
| Reed Johnson | 69 | .452 | .295 | .328 | .033 | .026 |
| Matt Joyce | 206 | .440 | .327 | .357 | .030 | .024 |
| Greg Dobbs | 150 | .369 | .298 | .325 | .027 | .021 |
| Jose Molina | 75 | .391 | .264 | .290 | .026 | .020 |
| Alberto Gonzalez | 83 | .215 | .277 | .253 | .024 | .019 |
| Matt Kemp | 254 | .432 | .337 | .361 | .024 | .019 |
| Eric Chavez | 39 | .357 | .255 | .279 | .024 | .019 |
| Laynce Nix | 139 | .389 | .312 | .335 | .023 | .018 |
| Jason Michaels | 44 | .215 | .316 | .295 | .021 | .017 |
| Ryan Raburn | 182 | .256 | .347 | .326 | .021 | .017 |
| Jhonny Peralta | 201 | .391 | .318 | .339 | .021 | .017 |
| Brett Hayes | 39 | .427 | .259 | .280 | .021 | .017 |
| Lance Berkman | 207 | .433 | .359 | .380 | .021 | .017 |
| Paul Janish | 180 | .237 | .293 | .273 | .020 | .016 |
| Alex Avila | 180 | .373 | .306 | .326 | .020 | .016 |
| Chone Figgins | 224 | .209 | .311 | .291 | .020 | .016 |
| Dan Uggla | 242 | .244 | .353 | .334 | .019 | .015 |

Now, looking at in-season PECOTA:

| Name | PA | TAv_Obs | TAv_Pred | TAv_ROS | wOBAscale_diff | TAvscale_diff |
|---|---|---|---|---|---|---|
| Michael Saunders | 152 | .182 | .247 | .231 | .020 | .016 |
| James Loney | 230 | .226 | .269 | .254 | .019 | .015 |
| John Jaso | 127 | .221 | .264 | .250 | .018 | .014 |
| Chase Headley | 227 | .286 | .266 | .253 | .017 | .013 |
| Nate Schierholtz | 129 | .266 | .265 | .252 | .017 | .013 |
| Emmanuel Burriss | 45 | .220 | .230 | .217 | .017 | .013 |
| Gordon Beckham | 211 | .249 | .263 | .251 | .015 | .012 |
| Buster Posey | 185 | .265 | .296 | .285 | .014 | .011 |
| Brandon Wood | 93 | .210 | .249 | .238 | .014 | .011 |
| Jerry Hairston | 171 | .237 | .242 | .231 | .014 | .011 |
| Mark Ellis | 222 | .214 | .247 | .236 | .014 | .011 |
| Jeff Mathis | 118 | .212 | .213 | .202 | .014 | .011 |
| Omar Infante | 241 | .239 | .262 | .251 | .014 | .011 |
| Drew Butera | 111 | .156 | .209 | .198 | .014 | .011 |
| Ramon Castro | 50 | .229 | .259 | .248 | .014 | .011 |
| Skip Schumaker | 96 | .184 | .256 | .245 | .014 | .011 |
| Brandon Belt | 67 | .235 | .291 | .280 | .014 | .011 |
| Cesar Izturis | 29 | .171 | .218 | .208 | .013 | .010 |

The ZiPS projections, first of all, show a lot more movement, equivalent to .028 points of TAv at its most extreme, compared to .016 for PECOTA. Saunders, in fact, is the only player on the PECOTA list with a larger change than the lowest player on the ZiPS list. The next notable thing is that the players on the ZiPS list seem to be much more likely to be established veterans, while the PECOTA list leans much more heavily towards rookie players. Veterans, as a rule, should be less amenable to projection changes than younger, inexperienced players – when you have three full seasons of a guy in the majors, it should take more information to change your mind than it should for someone whose projection is based on less than a full MLB season and some translated minor league data; these lists show PECOTA behaving that way but not ZiPS.

The next question I anticipate is, when will these rest-of-season forecasts be available outside of the PFM? Right now I am working on incorporating the rest-of-season forecasts into the rest of our PECOTA offerings, including the 10-year forecasts, which I anticipate being able to debut sometime next week. We will also be offering updated in-season numbers for players who are not included on the depth charts in the very near future.

cdmyers
6/07
Interesting. Does this mean that these updates are going to be reflected in the playoff odds reports too?
markpadden
6/07
Was research done as to the optimal weight to give to in-season stats vs. prior-season stats [that is, what weighting system has best predicted ROS stats historically]? Or was it assumed that current year stats are exactly as informative as those from three years ago (for veterans)? I.e., is there evidence that Fangraphs' weighting system is not optimal, and yours is?

Also, I think there is a typo in Headley's numbers above. He has outperformed projections thus far, but his ROS is lower than his pre-season projection.

mrdannyg
6/07
I'm curious as to the level of testing as well. It is a significant statement to say that fangraphs is "wrong", then simply point out different methodologies. By this logic, PECOTA without any 2011 data would be even more accurate, since it has less recency to it. Obviously that is not true, as 2011 data will add to the accuracy.

The point of a projection is not to overcome recency bias. Sure it is a benefit of a robust projection system, but hardly the point of it. A significant measure of the accuracy of a projection system is how well it weights current and previous performance. Fangraphs' measure appears to more heavily weight in-season performance than BP's. That doesn't make it wrong.

I'd like to see some actual analysis as to why more or less in-season weighting is more accurate, though I'd imagine the reason each measure uses different weights is because each site's analysis yields different results. It is a complex problem, and hopefully we can see more analysis, instead of presumptions.

The statement about veterans being less susceptible to variation is almost certainly accurate, but the analysis done in the last couple paragraphs/charts is far from robust.
herron
6/07
"I'd like to see some actual analysis as to why more or less in-season weighting is more accurate, though I'd imagine the reason each measure uses different weights is because each site's analysis yields different results. It is a complex problem, and hopefully we can see more analysis, instead of presumptions. "


This is what I was trying to get across in my post below.
Guancous
6/07
Wow, this is quite a bombshell to drop on Draft Day.

It's the equivalent to seeing a politician admit to lewd pictures before a holiday weekend.
herron
6/07
"My answer is simple: they are wrong."

ZiPS may be wrong to weight recent performance more heavily than PECOTA, but it's certainly not wrong about Bautista. I will take the over on .414 wOBA all day.

And considering you readily admit (and wouldn't we all?) that Bautista is a once in a lifetime (or rarer!) situation, why bother to even remark on, let alone launch a full fledged defense of PECOTA revolving around the projection of maybe the biggest breakout in the history of baseball? We'll give you a pass for whiffing on that one.

What would be much more interesting to me is a full exploration into whether PECOTA is too conservative or ZiPS is too aggressive, with respect to weighting recent performance. These numbers you've posted show me that this is indeed true, but not which system is more accurate.
triebs2
6/07
Is this why all of the PECOTA cards are broken this morning? OK my sample size for "all" is a handful of Twins starting pitchers, but they were all messed up.
cwyers
6/07
Right now the server is in the middle of running the update process (I had to rerun it this morning to catch some corrections I made to the raw play-by-play data feed we get). PECOTA finished updating a few minutes ago - the cards I checked look okay to me now, but if you see any that are behaving improperly, let me know and I'll take a look.
doog7642
6/07
Two things:

1. I'm really looking forward to PECOTA's work on minor leaguers, which feels long overdue. PECOTA's love for Dustin Pedroia well before he hit the bigs had me well ahead of my dynasty league-mates. I'm looking forward to an objective look at the minors, and I'll say again that a "PECOTA takes on the prospects" series would be a fantastic complement to Kevin Goldstein's work. Of course, some of that hope is tempered by...

2. ...your statement, backed by the evidence of ridiculously low breakout scores, that PECOTA is very conservative for hitters. I suppose people are looking for different things from PECOTA, but I am nervous that making it so conservative has taken away its edge in making bold predictions that help us to find the diamond in the rough. Please don't be so shell-shocked by the Matt Wieters thing that you throw the baby out with the bathwater. Was PECOTA overconfident about Howie Kendrick and Wily Mo Pena? Yeah, but it nailed Pedroia and Zobrist and perhaps even Bruce and Asdrubal, if I remember correctly from a couple iterations ago. I ask that you consider laying off the reins when it comes to hitter projections, and let us take the wheat with the chaff.
makewayhomer
6/07
agree that before you say PECOTA ROS >> ZIPS ROS, you need data to back that up. maybe more recency bias can be correct? but it's not incorrect just because you say so
markpadden
6/07
You should take a snapshot of ROS projections from Fangraphs and from PECOTA right now, and then evaluate at the end of the season. A single year won't prove anything conclusively, but if the difference in accuracy is huge, it will certainly suggest which method is superior.
fieldofdreams
6/07
Can someone explain Jonny Venters' projection? Guy has been lighting it up for two years and PECOTA thinks he's going to put up a 4.00 ERA??
TheRedsMan
6/08
Are you regressing the aggregate TAv or the underlying components? I ask because I assume that the components gain reliability at different rates. For example, a high TAv driven by a batting average spike is less likely to represent a shift in true talent than is one driven by walks or power. Perhaps BP and Fangraphs are handling this differently?
mikefast
6/08
Colin gives more detail on this subject here:
http://www.baseballprospectus.com/article.php?articleid=14171