March 25, 2008
Before I kick off the 2008 season-preview series, with records and runs scored/runs allowed predictions for all 30 teams, I want to take a look back at last year's set to see if I can learn anything. This is sometimes an interesting exercise, and sometimes an excuse for navel-gazing, and I'm never sure which one it will be until I fire up the spreadsheet.
The first thing I noticed is that I absolutely nailed the run environment in 2007. My preseason predictions had 23,193 runs being scored in the major leagues last year. In reality, 23,305 runs were scored, a difference of just 112, or less than half a percent. For purposes of this exercise, I went ahead and adjusted the runs scored and runs allowed totals of every prediction by that half-percent (it's the "global error" in the predictions) to be consistent with past years, but that's splitting hairs. This is mostly luck, although I suppose it's evidence that I have some sense of where the game is at the moment. Nah, it's luck.
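For anyone who wants to check the arithmetic, here is a minimal Python sketch of that global adjustment. It assumes the correction is a single multiplicative factor (actual league runs divided by predicted league runs) applied to every predicted total; the names `scale` and `adjust` are made up for illustration.

```python
# League-wide run totals quoted in the text.
PREDICTED_TOTAL = 23_193
ACTUAL_TOTAL = 23_305

# The "global error": how far the predicted run environment was off.
gap = ACTUAL_TOTAL - PREDICTED_TOTAL
gap_pct = gap / ACTUAL_TOTAL

# Assumption: the adjustment is one multiplicative factor for all teams.
scale = ACTUAL_TOTAL / PREDICTED_TOTAL


def adjust(predicted_runs: float) -> float:
    """Scale a predicted RS or RA total to the actual run environment."""
    return predicted_runs * scale


print(f"gap: {gap} runs ({gap_pct:.2%} of actual)")
print(f"scale factor: {scale:.6f}")
```

Under that assumption a prediction of 800 runs moves up by a little under four runs, which is why the adjustment barely matters this year.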
Note that the 2007 total is just 294 runs fewer than the 2006 total, which itself was up over 2005. If you're looking for a connection between "penalties for PED use" and run scoring, you'll have to look elsewhere, because the data doesn't support the hypothesis.
I wish I could say I was so accurate in predicting team performance. Keep in mind that my metric here is runs; while we measure success in wins and losses, those vary from runs scored and allowed in unpredictable ways during a season. We can't predict which teams will, like the Mariners and Diamondbacks, outperform their runs scored and allowed. So with that in mind, I measure my performance by the RS and RA columns, not W and L.
The Net Error Score, which is the total gap between my estimates of runs scored and runs allowed and the actual figures, was 3,345, more or less splitting the difference between my 2005 and 2006 marks. Spread over 30 teams and two columns, that means I was off by an average of 55.8 runs per team in each of runs scored and runs allowed, which seems pretty lousy to me. My best and worst projections:
                 Actual        Pred
Team             RS    RA     RS    RA    NES
Twins           718   725    727   723     14
Mets            804   750    806   758     18
Yankees         968   777    942   788     36
Padres          733   657    701   671     46
Dodgers         735   727    777   731     53

Royals          706   778    825   901    250
Tigers          887   797    751   721    205
Marlins         790   891    701   791    182
White Sox       693   839    806   775    177
Diamondbacks    712   732    843   765    172
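The NES column can be rechecked with a short Python sketch. It assumes NES is the absolute miss on runs scored plus the absolute miss on runs allowed, taken after scaling each prediction by the global factor (actual league runs over predicted league runs). That reading is an inference rather than a stated formula, but it reproduces all ten published values; the function name is hypothetical.

```python
# Rows copied from the table above:
# (actual RS, actual RA, predicted RS, predicted RA, published NES)
TABLE = {
    "Twins": (718, 725, 727, 723, 14),
    "Mets": (804, 750, 806, 758, 18),
    "Yankees": (968, 777, 942, 788, 36),
    "Padres": (733, 657, 701, 671, 46),
    "Dodgers": (735, 727, 777, 731, 53),
    "Royals": (706, 778, 825, 901, 250),
    "Tigers": (887, 797, 751, 721, 205),
    "Marlins": (790, 891, 701, 791, 182),
    "White Sox": (693, 839, 806, 775, 177),
    "Diamondbacks": (712, 732, 843, 765, 172),
}

# Assumption: predictions are scaled to the actual run environment first.
SCALE = 23_305 / 23_193


def net_error_score(actual_rs, actual_ra, pred_rs, pred_ra, scale=SCALE):
    """Absolute RS miss plus absolute RA miss, after the global adjustment."""
    return abs(actual_rs - pred_rs * scale) + abs(actual_ra - pred_ra * scale)


computed = {team: round(net_error_score(*row[:4])) for team, row in TABLE.items()}

for team, row in TABLE.items():
    print(f"{team:<13}{computed[team]:>5}  (published {row[4]})")
```

Without the scaling step the Twins would come out at 11 rather than 14, which is the main hint that the published scores use adjusted predictions.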
The Diamondbacks' listing as one of my worst predictions points out the difference between evaluating on runs versus record. I had picked the D'backs to go 89-73 for the best mark in the NL. They ended up 90-72 for the best mark in the NL. However, they failed to outscore their opponents, and the disappointing seasons by Stephen Drew, Carlos Quentin, and others helped me miss their runs scored by nearly a run per game, which is an enormous miss. The Marlins finished with exactly the 71-91 I predicted they would, only they did so by scoring about 90 more runs and allowing 100 more than I projected. I shouldn't get credit for this.
If you wanted to, though, you could look at it this way. Instead of using RS and RA figures, consider that what you're really trying to get at is the difference between the two, the margin. In that case, you'd have the Error Score. By this measure, I was third-closest on the Marlins (-90 predicted, -101 actual), with the Twins and Mets retaining their spots as the teams I came closest on. The Royals, however, jump from worst by NES to best by Error Score, with a mark of 4.
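The Royals' flip is easy to see in a small Python sketch of the Error Score. It assumes the score is the absolute gap between the actual run margin and the predicted run margin, with the predicted margin scaled by the same global factor used elsewhere; whether the scaling is applied here is a guess, though the Royals round to 4 either way. The function name is hypothetical.

```python
# Assumption: same global adjustment as for the RS/RA comparison.
SCALE = 23_305 / 23_193

# (actual RS, actual RA, predicted RS, predicted RA), from the figures above.
TEAMS = {
    "Royals": (706, 778, 825, 901),
    "Marlins": (790, 891, 701, 791),
    "Twins": (718, 725, 727, 723),
    "Mets": (804, 750, 806, 758),
}


def error_score(actual_rs, actual_ra, pred_rs, pred_ra, scale=SCALE):
    """Absolute gap between the actual and predicted run margins."""
    actual_margin = actual_rs - actual_ra
    pred_margin = (pred_rs - pred_ra) * scale
    return abs(actual_margin - pred_margin)


scores = {team: round(error_score(*row)) for team, row in TEAMS.items()}

for team, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{team:<8}{score:>4}")
```

Missing high by well over 100 runs on both offense and defense cancels out almost entirely in the margin, which is how the worst NES in the table becomes a near-perfect Error Score.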
I've said in the past that I'm not sure which of these methods is best for evaluating a prediction, and until I can figure it out, I'll continue presenting both.
One area in which I need to improve is something simple: identifying which teams will outscore their opponents, and which ones will be outscored. It's one thing to miss on one side of the line, but if you can't accurately estimate which teams will be (and it is this simple) "good" or "bad", it's time to write a resume. Well, I missed on 40 percent of the field. Some of my errors were small and either meaningless or explainable; the Twins, for example, were -7 last year, whereas I had them as +4. That's insignificant. On the other side of the ledger, though, there were some massive mistakes that take some explaining:
We'll kick off the 2008 previews later this week. As always, consider the final numbers interesting, but pay more attention to the analysis around them. That, rather than the number, is what has the most value.