At the beginning of each season, I sit down with a ridiculous number of resources and attempt to project each team’s runs scored and allowed, and from that, their final record. It’s a silly pastime, to be honest, and I don’t know if doing it the way I do now (over a period of days, surrounded by data) is any better than how I did it when I was a kid doing it for kicks with my best friend.

Last spring, I took the process one step further by evaluating the predictions, to see if there were any lessons to be learned, perhaps biases to be caught or blind spots to be narrowed. It turned out I’d had a better season than I thought I had, although there was something to be said for keeping an eye on how teams align their talent (White Sox), improve their defense (also White Sox), or perhaps have one signature skill that pushes them forward (like the Astros’ rotation).

I ran the numbers on the 2006 predictions last night, and what I found was that I shouldn’t have. Once again, I missed on the league run level by a significant amount, 4.4%, predicting 22,601 runs in a season that saw 23,599. Offense was up over ’05 in MLB, and might have been up more if not for four months of odd results out of Denver. I didn’t see that coming, nor did many others in the third first year of Sooper Serious Steroid Sternness. For purposes of this exercise, we’ll adjust everyone’s predicted runs scored and allowed upward by 4.4%; that’s the global error from missing the run context of the league.
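A minimal sketch of that league-level adjustment, using only the two league totals quoted above. The helper name `adjust` is hypothetical, and scaling by the exact ratio (rather than a rounded 4.4%) is an assumption about how the adjustment was applied.

```python
# League run totals quoted in the column.
predicted_league_runs = 22_601
actual_league_runs = 23_599

# Scale factor that lifts every raw prediction to the actual league run level.
adjustment = actual_league_runs / predicted_league_runs

def adjust(predicted_runs: float) -> int:
    """Rescale one raw team prediction (RS or RA) to the actual run context.

    Hypothetical helper: assumes the exact league ratio is applied uniformly.
    """
    return round(predicted_runs * adjustment)

print(f"league-level miss: {adjustment - 1:.1%}")
print(f"a raw prediction of 800 runs becomes {adjust(800)}")
```

Applying one factor to every team removes the shared miss on run context, so what remains in each team's error is the team-specific part.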

Once you do that, you find that I didn’t have as good a year as I did in 2005. The Net Error Score (NES), the total of all the misses between my adjusted RA and RS predictions and the actual ones, was 3,606, or 803 runs worse than 2005. In other words, if my average prediction missed by about 47 runs in ’05, that jumped to about 60, or about 28% higher, in 2006. I was off across the board: my best NES in ’06 was 25, with the Mariners, whereas in 2005 I had a microscopic NES of seven for the Mets. At the high end, I might as well have been pulling numbers out of a hat. The max NES in 2005 was 220, for the Devil Rays; I missed four teams by more than that in ’06, including three by at least 300 net runs. Let’s look at those three monster misses:

  • Washington Nationals, 587 AdjRS, 722 AdjRA predicted; 746 RS and 872 RA, actual. The Nats illustrate the difference between using Net Error Score, which dings you for symmetrical misses, and just looking at the gap between run differentials. The latter would show that I pegged the Nats pretty well. They were outscored by 126 runs, while I had them getting beat up to the tune of -128. I’m still not sure which method is best, and I think both have some value. In any event, I think two things happened here: one, I missed on the park effect; after playing like a canyon in ’05, RFK Stadium was a bit less daunting for hitters in ’06. Second, I never saw Alfonso Soriano coming, and his season was worth at least 30-40 runs more than I would have predicted. Those were the two biggest effects.

    The lesson? I didn’t ding Soriano as much for his home/road splits as a Ranger as for his undisciplined approach (which changed a little bit in 2006), but the splits were a factor. In truth, one-year or even multi-year park splits probably aren’t meaningful at the individual player level. With so much player movement this winter, I’ll have to watch for this kind of simplistic analysis, which isn’t very helpful.

  • Cincinnati Reds, 941 AdjRS, 918 AdjRA vs. 749 RS, 801 RA. I just missed on both sides of the equation again, as injuries to Ken Griffey Jr., a self-defeating midseason trade and a late-season fade by Adam Dunn dragged down what I thought was a great offense. At the same time, though, the Reds stayed competitive thanks to surprising pitching by Bronson Arroyo and the continued development of Aaron Harang, keeping their run prevention solid. As with the Nats, midseason roster changes were a factor, one for which I may never be able to fully account. It’s hard to identify a systemic reason for the miss here beyond erroneous evaluation of individual players.

  • Tampa Bay Devil Rays, 847 AdjRS, 999 AdjRA vs. 689 RS, 856 RA. This is becoming a bit of a tradition, as I missed badly on the Devil Rays in 2005 as well. I made the same mistake again, getting the differential right (off by 15 runs) but vastly overestimating both their offense and their lack of pitching. The latter problem is primarily underestimating Scott Kazmir, as well as the pitching they got from both ends of the Mark Hendrickson/Jae Seo trade, and some random relievers pitching well in a lot of innings. The offense…well, I saw things in Jorge Cantu and Jonny Gomes that never quite came to light. Delmon Young’s early-season suspension kept him in Durham longer than expected, and B.J. Upton remains more an asset in theory than reality right now.

    As with the Reds, there’s not too much systemic there, although I may pull back on the high end of my runs-allowed estimates in general. There may be some irrational exuberance for young teams, especially young teams who’ve hired off some of BP’s finest.
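The two yardsticks used on the three teams above can be sketched as follows. The team lines are the figures quoted in the bullets; the type and function names (`TeamLine`, `net_error_score`, `differential_error`) are illustrative, not anything from the original spreadsheet.

```python
from typing import NamedTuple

class TeamLine(NamedTuple):
    adj_rs: int  # predicted runs scored, after the league adjustment
    adj_ra: int  # predicted runs allowed, after the league adjustment
    rs: int      # actual runs scored
    ra: int      # actual runs allowed

def net_error_score(t: TeamLine) -> int:
    """NES: the miss on offense plus the miss on defense, signs ignored."""
    return abs(t.adj_rs - t.rs) + abs(t.adj_ra - t.ra)

def differential_error(t: TeamLine) -> int:
    """Err: the gap between predicted and actual run differential."""
    return abs((t.adj_rs - t.adj_ra) - (t.rs - t.ra))

teams = {
    "Nationals": TeamLine(587, 722, 746, 872),
    "Reds": TeamLine(941, 918, 749, 801),
    "Devil Rays": TeamLine(847, 999, 689, 856),
}

for name, line in teams.items():
    print(f"{name}: NES {net_error_score(line)}, Err {differential_error(line)}")
```

Misses in the same direction on both sides (the Nationals, the Devil Rays) pile up in NES but largely cancel in Err, which is why a team can be a 300-run miss by one measure and a near-hit by the other.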

I’m not seeing any magic bullets here the way I did last year, although hopefully analyzing the data this way imprints some lessons that I can integrate, even if I’m not entirely sure what they are yet.

Some would argue that analyzing won-loss records is a better way to do this. I disagree. Consider those Reds. Because they outperformed their runs scored and allowed, I only missed their won-loss record by three games (83-79 projected, 80-82 actual), and there’s no way I think that accurately reflects my evaluation. These two things generally move in step, but not always, and I’m inclined to get as close to the predictable element, the runs, as I can.

Looking at record, we find that I nailed the Pirates, Angels and White Sox, and was off by two or fewer games on seven others. I don’t buy it, though; I had a 141 NES on the Sox, and nailed the record because I got the differential mostly right. All that means is that I missed badly on both sides of the equation.

What else can we learn…well, I was closest (by NES) on the Mariners, Astros, Blue Jays and Phillies, all under 30 NES. I don’t see a common thread there…wait, I do. All four teams occupy the middle ground of the game, and all are largely veteran teams. It’s easier to predict the inside part of any curve, to be sure. I’m thinking older teams should be easier to predict not because all the players have reliable track records to go on, but because figuring out the distribution of playing time is much easier with a veteran team. I dare say that the four teams above were among the most stable last year in terms of who played and how often, save for trades that altered the rosters in season.

If you just look at differentials, you find that I was within two on the Dodgers, within three on the Mariners and within nine on the Nationals. This figure, the error score (Err), actually came down between ’05 and ’06 by more than 200 runs. The closing gap indicates that while I’m not doing a good job of projecting overall RS and RA, I am getting better at sorting out which teams will be ahead in the runs department and which will not. This, to me, is a very basic skill; if I can’t reliably say which teams will be outscored and which ones will not, it’s time to write a resume.

I was disappointed by my work in this area in ’05, when I missed the direction of a team’s run differential in 10 of 30 cases. In 2006, I got that down to eight, and of those, five were near misses, involving teams who were within 10 runs of the line. I still don’t think that’s good enough, because it’s not 22 of 30, it’s more like nine of 17 once you consider that a third to a half of MLB teams are locks in this category.
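The "nine of 17" arithmetic can be reproduced with a few lines. This sketch assumes every lock counts as a correct call, so locks are removed from both the hits and the total; the function name `contested_record` is hypothetical.

```python
TEAMS = 30
misses = 8             # wrong-direction calls in 2006
hits = TEAMS - misses  # 22 correct calls overall

def contested_record(locks: int) -> tuple[int, int]:
    """Hits and attempts left after setting aside the locks.

    Assumes every lock was called correctly.
    """
    return hits - locks, TEAMS - locks

# A third to a half of 30 teams is 10 to 15 locks; 13 gives the column's figure.
for locks in (10, 13, 15):
    right, total = contested_record(locks)
    print(f"{locks} locks: {right} of {total}")
```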

You’ll notice that I haven’t mentioned the Tigers yet. I missed them pretty badly, projecting a 78-84 mark for a team that went 95-67, and a -28 run differential for a team that ended up at +147. They were the second biggest miss by Err, behind only the Rockies, who I’m taking a mulligan on for 2006. In the Tigers’ case, I had the run environment right (1536 projected runs scored in Tigers games, 1497 actual) but the distribution wrong. That’s why the NES of 175 doesn’t crack the top five.
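The Tigers' numbers can be backed out of the totals and differentials quoted above. This is a sketch under the assumption that the 1536 total and the -28 differential are on the same (adjusted) basis; `split_totals` is an illustrative helper.

```python
def split_totals(total_runs: int, differential: int) -> tuple[float, float]:
    """Recover (RS, RA) from RS + RA and RS - RA."""
    rs = (total_runs + differential) / 2
    ra = (total_runs - differential) / 2
    return rs, ra

proj_rs, proj_ra = split_totals(1536, -28)  # projected
act_rs, act_ra = split_totals(1497, 147)    # actual

# Net Error Score: offense miss plus defense miss.
nes = abs(proj_rs - act_rs) + abs(proj_ra - act_ra)
# Err: gap between projected and actual run differential.
err = abs((proj_rs - proj_ra) - (act_rs - act_ra))

print(f"projected {proj_rs:.0f} RS / {proj_ra:.0f} RA, "
      f"actual {act_rs:.0f} RS / {act_ra:.0f} RA")
print(f"NES {nes:.0f}, Err {err:.0f}")
```

When the offense is underestimated and the runs allowed are overestimated, the two misses point in opposite directions, so nothing cancels in the differential and Err comes out equal to NES. That is the mirror image of the Nationals and Devil Rays cases.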

The two scores each provide valuable information, and I’m not sure I’ll be dropping either any time soon.

Let’s see, anything else notable…after adjusting for the offensive level, I nailed the runs allowed by the Cards, and was within two runs of both the Astros’ and Royals’ marks in that category. I’d said last year that predicting offense was easier than predicting defense, but that didn’t hold in ’06: I was off by 1825 runs in the aggregate on offense, and just 1781 on defense. Neither score is impressive after last year’s 1312 and 1491.
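These offense and defense aggregates tie back to the overall Net Error Score totals. A quick check, assuming "average prediction" means one of the 60 team-level RS or RA projections (30 teams, two numbers each):

```python
TEAMS = 30
PREDICTIONS = 2 * TEAMS  # one RS and one RA projection per team (assumption)

# Aggregate misses quoted in the column, by season.
offense = {2005: 1312, 2006: 1825}
defense = {2005: 1491, 2006: 1781}

# Total NES per season is the offense miss plus the defense miss.
total_nes = {year: offense[year] + defense[year] for year in offense}
avg_miss = {year: total_nes[year] / PREDICTIONS for year in total_nes}

for year in sorted(total_nes):
    print(f"{year}: total NES {total_nes[year]}, "
          f"average miss {avg_miss[year]:.1f} runs")
print(f"change: {total_nes[2006] - total_nes[2005]} runs")
```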

There may not be a great lesson in this, but I think the exercise is valuable. Predictions don’t have to just be interesting late-March content; they can be a guide to how we evaluate teams. To make them valuable, we have to go back and analyze them, and analyze our own performance, the same way we would analyze the performance of players. Nate Silver does this with PECOTA, which is how it gets better and better each year. That’s my goal as well.