September 30, 2010
Aches and Pains
Any forecasting system is only as good as the inputs that go into it. A model can certainly end up far worse than its data, but the quality and amount of data you have is a fundamental constraint on how good it can be.
So if you want to beat a forecaster, one fundamental question you can ask is, “What does the forecaster know and what doesn’t he know?” You’re far more likely to beat a competent forecaster on the second point than the first point.
One thing PECOTA hasn’t traditionally known is who was playing hurt and who wasn’t. Injuries can mean a number of things—sometimes you think a player who was nursing an injury is due for a bounceback season. Other times you think they’re likely to do worse than you’d otherwise expect, due to lingering injury effects.
For this to be useful to PECOTA, there needs to be a way to systematically capture this sort of information, study it objectively, and quantify the effect.
So we have taken a publicly accessible injury database, created by Josh Hermsmeyer of RotoBase, and have been proofing and improving it for incorporation into PECOTA. (Once we’ve finished updating the database, we will release it during the offseason for other researchers to use.) The database tells us when a player goes on the disabled list, how long he’s there, and what injury put him there.
As an example, let’s consider hitters who went to the disabled list with an injury to the lower arm (hand, wrist, or forearm). It’s widely accepted in baseball that wrist injuries have a lingering impact on a hitter’s ability to hit for power. The database gives us 77 such hitters to study, with 32,763 total plate appearances the following season.
Using the same method we used to look at Ichiro Suzuki yesterday, we can come up with an expected batting line for these hitters. As a group, weighted by playing time, they were expected to hit .266/.333/.427 the following year. Instead, they hit .270/.344/.439. So we can see that these hitters as a group tended to exceed their baseline forecasts.
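To make the "weighted by playing time" comparison concrete, here is a minimal sketch of how a group line can be built from individual hitters. The `HitterSeason` record, its field names, and the sample rows are assumptions for illustration only; they are not PECOTA's actual data structures or numbers.

```python
from dataclasses import dataclass


@dataclass
class HitterSeason:
    """Hypothetical record: one hitter's season after a DL stint."""
    pa: int               # plate appearances in the following season
    projected_obp: float  # baseline forecast
    actual_obp: float     # what the hitter actually did


def weighted_mean(values, weights):
    """Average of values, each weighted by the matching weight."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def group_comparison(seasons):
    """Return (projected, actual, difference), weighted by plate appearances."""
    weights = [s.pa for s in seasons]
    projected = weighted_mean([s.projected_obp for s in seasons], weights)
    actual = weighted_mean([s.actual_obp for s in seasons], weights)
    return projected, actual, actual - projected


# Made-up rows, for illustration only.
sample = [
    HitterSeason(pa=600, projected_obp=0.340, actual_obp=0.350),
    HitterSeason(pa=200, projected_obp=0.320, actual_obp=0.300),
]
projected, actual, diff = group_comparison(sample)
```

Weighting by plate appearances means a full-time player moves the group line more than a part-timer, which is why the same calculation can be repeated for each of the three slash-line rates.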
Digging down to the component lines that form the “guts” of PECOTA, what we see is a significant effect on home runs on contact: projected to have a .039 HR/CON rate, these hitters instead had a .047 rate. We also see an increase in unintentional walk rate (per plate appearance, minus intentional walks and hit by pitch), from .083 to .086. Statistically speaking, that difference is less likely to be significant than the finding on home runs on contact. But given how strong the home run result is, I’m inclined to think the walk result has practical significance. (My feeling is that the causal relationship is that pitchers are more likely to challenge hitters whose power has been sapped by wrist injuries, so walks dip while the hitter is hurt and come back along with the power.)
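One rough way to check whether a gap like .039 versus .047 is bigger than chance would produce is a one-sample z-test on a proportion. This is a sketch of a standard textbook test, not necessarily the test used for PECOTA. It treats the projected rate as a fixed expectation, and the number of balls in contact below is a hypothetical placeholder because the article does not report it.

```python
import math


def rate_z_score(observed_rate, expected_rate, trials):
    """z-score of an observed rate against a fixed expected rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 < expected_rate < 1:
        raise ValueError("expected_rate must be strictly between 0 and 1")
    # Standard error of a binomial proportion under the expected rate.
    std_error = math.sqrt(expected_rate * (1 - expected_rate) / trials)
    return (observed_rate - expected_rate) / std_error


def two_sided_p_value(z):
    """Two-sided p-value for a z-score under the normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# HR/CON rates are from the article; the contact count is HYPOTHETICAL.
hypothetical_contact = 10_000
z = rate_z_score(0.047, 0.039, hypothetical_contact)
p = two_sided_p_value(z)
```

The same function can be pointed at the walk rates. Because the gap there is much smaller relative to its standard error, it takes far more trials to clear the same bar, which matches the intuition that the walk finding is the shakier of the two.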
On one hand, this isn’t a particularly interesting finding, since it pretty much confirms our expectations. What is interesting is that now we have a way to quantify what our intuition tells us about player injuries, and incorporate it into the forecasts in a systematic way.
What we can do from there is take the component batting lines, as well as the projections, and come up with the difference. We then regress those differences to come up with a set of adjustments to the baseline projections.
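The article does not spell out how the differences are regressed. One common reading is regression toward zero, where an observed difference is shrunk according to how much data stands behind it. The sketch below assumes that reading. The `ballast` constant and the sample size are hypothetical values chosen only to show the arithmetic.

```python
def regressed_adjustment(observed_diff, sample_size, ballast):
    """Shrink an observed difference toward zero.

    The weight sample_size / (sample_size + ballast) approaches 1 as the
    sample grows, so well-supported differences are kept nearly whole.
    """
    if sample_size < 0 or ballast <= 0:
        raise ValueError("sample_size must be >= 0 and ballast must be > 0")
    reliability = sample_size / (sample_size + ballast)
    return observed_diff * reliability


def adjusted_projection(baseline_rate, adjustment):
    """Apply an injury adjustment to a baseline component rate."""
    return baseline_rate + adjustment


# Observed HR/CON gap from the article; sample size and ballast are HYPOTHETICAL.
observed_gap = 0.047 - 0.039
adjustment = regressed_adjustment(observed_gap, sample_size=1000, ballast=1000)
new_rate = adjusted_projection(0.039, adjustment)
```

With equal sample size and ballast, half of the observed gap survives as the adjustment. A smaller injury group would keep less of its gap, which guards against overreacting to a handful of players.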
We can also use this record of how much time a player has missed due to injury to figure out which players are most likely to miss playing time with an injury down the road. Let’s face it, some injuries are a product of circumstance, but there are some players who are more likely to get injured than others. And now we have the data to see who those players are.
This is also something we can deploy in-season. When a player goes on the disabled list, we can search the database for players with similar injuries. We can then use that information to estimate how long he’ll be out and update his rest-of-season forecast accordingly, using data on how players with similar injuries have performed upon their return.
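The in-season lookup described above can be sketched as a simple filter over past disabled list stints. The `DLStint` record, its fields, and the sample rows are hypothetical stand-ins, not the schema or contents of the actual injury database. Using the median of past stints is also an assumption; it is just one reasonable summary.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class DLStint:
    """Hypothetical record of one disabled list stint."""
    player: str
    body_part: str
    days_missed: int


def estimate_days_missed(history, body_part):
    """Median days missed for past stints with the same injured body part.

    Returns None when the history holds no comparable injuries.
    """
    days = [s.days_missed for s in history if s.body_part == body_part]
    if not days:
        return None
    return median(days)


# Made-up stints, for illustration only.
history = [
    DLStint("Player A", "wrist", 20),
    DLStint("Player B", "wrist", 30),
    DLStint("Player C", "wrist", 40),
    DLStint("Player D", "hamstring", 15),
]
wrist_estimate = estimate_days_missed(history, "wrist")
elbow_estimate = estimate_days_missed(history, "elbow")
```

The median is less sensitive than the mean to one season-ending case in a small group of comparable injuries. A fuller version would likely match on more than body part, such as injury type and severity.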
PECOTA week continues Friday with our final article, and then at 1 p.m. Eastern I’ll be fielding your questions in a live chat. That’s not the end of the discussion, though—we’ll be talking about PECOTA more leading into the offseason and all the way through to the start of next season.