
In his 1984 Bill James Baseball Abstract, the third mass-market Abstract, Bill James introduced what he called “The World Series Prediction System.” Actually, he re-introduced it—the section in the Abstract was entitled “The World Series Prediction System, Revisited.” He’d developed it in 1972 and updated it in a 1982 Inside Sports magazine article that ran shortly before Inside Sports folded.

James’ system, he reported, picked 70 percent of World Series winners. His system was a Franken-stat that combined hitting, pitching, and fielding features, assigning points to various metrics, and selected the team with the most points as the likely winner.

His system was:

  • Give the team with the better record one point for each half-game difference in won-lost record.
  • Give three points to the team that scored the most runs.
  • Give 14 points to the team that hit fewer doubles.
  • Give 12 points to the team that hit more triples.
  • Give 10 points to the team that hit more home runs.
  • Give 8 points to the team with the lower batting average.
  • Give 8 points to the team with fewer errors.
  • Give 7 points to the team that had more double plays.
  • Give 7 points to the team that allowed more walks.
  • Give 19 points to the team that threw more shutouts.
  • Give 15 points to the team whose ERA was more below the league average.
  • Give 12 points to the team with the most recent postseason experience. (In case of a tie, give the points to the team that had greater success.)
  • For intraleague series, give 12 points to the team with the better head-to-head record.

I know some of those weights seem screwy, but that’s how the numbers worked out. He looked at every postseason series and checked how often the winning team exhibited certain characteristics. Shutouts got a weight of 19 because, among the series he considered, the team with more shutouts won 19 more times than it lost. The team with the fewer doubles won 14 more times, the team with the lower relative ERA won 15 more times, etc. And there’s an element of intuitive sense; high-average offenses may be dependent on stringing a lot of singles and doubles together, while scoring in the postseason is often long ball-dependent.
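The weighting rule described above (series won minus series lost by the team holding a given edge) can be sketched in a few lines. This is only an illustration of the arithmetic, not James's actual worksheet; the function name `series_weight` and the sample record are hypothetical.

```python
def series_weight(outcomes):
    """Net series wins for the team holding the edge in one category.

    `outcomes` holds one boolean per series: True if the team with the
    edge (say, more shutouts) won the series, False if it lost. Series in
    which the two teams tied in the category are assumed to be left out.
    """
    wins = sum(1 for won in outcomes if won)
    losses = len(outcomes) - wins
    return wins - losses


# Hypothetical record: the team with the edge won 6 of 10 series.
sample = [True] * 6 + [False] * 4
print(series_weight(sample))  # 2
```

A negative result would mean the team with the edge lost more often than it won, so the points go to the other side, which is how a category like "fewer doubles" ends up in the formula.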

As an example of James' system, consider the famous 1969 World Series between the Mets and the Orioles. The Mets had fewer doubles (14 points), more triples (12 points), a lower batting average (8 points), more double plays (7 points), more shutouts (19 points), and allowed more walks (7 points). The Orioles had a better record by nine games (18 points), scored more runs (3 points), had more home runs (10 points), fewer errors (8 points), a lower relative ERA (15 points), and more recent (i.e., ever) postseason experience (12 points). That’s 67 points for New York and 66 for Baltimore. The Mets won the World Series in five games.
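The 1969 tally can be reproduced directly from the point values listed earlier. The sketch below simply adds up the categories named in the paragraph above; the dictionary keys are labels chosen for this example.

```python
# Points per category in James's original system, as listed above.
POINTS = {
    "more runs": 3, "fewer doubles": 14, "more triples": 12,
    "more home runs": 10, "lower batting average": 8, "fewer errors": 8,
    "more double plays": 7, "more walks allowed": 7, "more shutouts": 19,
    "lower relative ERA": 15, "recent postseason experience": 12,
}

# Categories each team won, per the paragraph above.
mets = ["fewer doubles", "more triples", "lower batting average",
        "more double plays", "more shutouts", "more walks allowed"]
orioles = ["more runs", "more home runs", "fewer errors",
           "lower relative ERA", "recent postseason experience"]

games_ahead = 9                   # Orioles' edge in the standings
record_points = games_ahead * 2   # one point per half game

mets_total = sum(POINTS[c] for c in mets)
orioles_total = sum(POINTS[c] for c in orioles) + record_points
print(mets_total, orioles_total)  # 67 66
```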

Now, there’s a significant limitation to James’ system. His formula was printed in the 1984 Abstract, which means he had data through the 1983 season. That’s only 30 Championship Series to analyze (two each from 1969 through 1983), all in a best-of-five format. (The CS expanded to seven games in 1985.) The Division Series didn’t start until 1995, unless you count the oddball split-season 1981 postseason.

So James’ system, which is based on actual postseason results, is missing:

  • 31 World Series from 1984 to 2015, excluding 1994
  • 62 Championship Series from 1984 to 2015, excluding 1994
  • 84 Division Series from 1995 to 2015

That’s a lot of data!
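Those counts follow from simple season arithmetic: one World Series and two Championship Series per season from 1984 on, four Division Series per season from 1995 on, and nothing in 1994. A quick check, with variable names chosen for this example:

```python
# Seasons 1984-2015 inclusive, skipping 1994 (no postseason).
seasons = [year for year in range(1984, 2016) if year != 1994]

world_series = len(seasons)              # one per season
championship_series = 2 * len(seasons)   # ALCS and NLCS
division_series = 4 * len(range(1995, 2016))  # four per season from 1995

print(world_series, championship_series, division_series)  # 31 62 84
print(world_series + championship_series + division_series)  # 177
```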

So I decided to freshen up James’ formula, using data through the 2015 season. I included only seasons in the divisional-play era, from 1969 to present, for two reasons. First, I think one can make a persuasive case that the game has changed a lot since, say, the 1916 season, when the Brooklyn Robins got 10 points under James’ system for out-homering the Boston Red Sox, 28-14. Second, there’s an argument that the multiple-tier playoff system—Championship Series plus World Series beginning in 1969, with the Division Series added in 1995 and the Wild Card play-in starting in 2012—creates different determinants of postseason success, as fatigue and depth become factors.

I also added a few categories that weren’t in James’ initial formula (batter walks, batter and pitcher strikeouts, on-base percentage, and slugging percentage), just to see whether they worked out. (By and large, they didn’t.) And I excluded the strike-shortened 1981 split season. I’ll present the results as a series of questions.

Have the weights changed?

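Yes, they have, by quite a bit. Here are the categories James identified, plus the ones I added, with their original weights and those calculated by looking exclusively at 1969-2015. A weight in parentheses is negative: the team with that trait lost more series than it won.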

| Hitting category | Original Weight | Revised Weight |
|---|---|---|
| More runs | 3 | 14 |
| Fewer doubles | 14 | 13 |
| More triples | 12 | (17) |
| More homers | 10 | 13 |
| Fewer walks | | 14 |
| Fewer strikeouts | | 26 |
| Lower BA | 8 | (23) |
| Higher OBP | | 13 |
| Higher SLG | | 11 |
| Defensive category | | |
| Fewer errors | 8 | 11 |
| More DPs | 7 | (2) |
| Pitching category | | |
| More strikeouts | | 3 |
| More walks | 7 | 8 |
| More shutouts | 19 | 18 |
| Lower relative ERA | 15 | 13 |
| Overall category | | |
| Better overall record | 1/.5 gm | 8 |
| Recent experience | 12 | 22 |
| Better head-to-head | 12 | (3) |

Some weights have changed significantly. For example, when James did his analysis, teams with a lower batting average had done better in the postseason than teams with a higher batting average, by a little. Since 1969, teams with a higher batting average have done better, by a lot. Where applicable, head-to-head record was meaningful; it’s not so much anymore. Hitting more triples was a good thing; now it isn’t. Scoring more runs was mildly positive; now it’s a big positive.

But before we go too far with this, let’s move on to our second question.

Does the type of series make a difference?

Yes, it turns out, it does. James lumped together World Series and Championship Series, because there weren’t many of the latter. He didn’t have sufficient data to break them apart. Since 1969, there have been (excluding the 1981 and 1994 strike seasons) 45 World Series, 90 Championship Series, and 84 Division Series. How do they differ?

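| Hitting category | Original Weight | Revised Weight | Div. Series | Champ. Series | World Series |
|---|---|---|---|---|---|
| More runs | 3 | 14 | (5) | 10 | 9 |
| Fewer doubles | 14 | 13 | 12 | 6 | (5) |
| More triples | 12 | (17) | (13) | (14) | 10 |
| More homers | 10 | 13 | 4 | 13 | (4) |
| Fewer walks | | 14 | 12 | (1) | 3 |
| Fewer strikeouts | | 26 | 21 | 2 | 3 |
| Lower BA | 8 | (23) | (6) | (12) | (5) |
| Higher OBP | | 13 | 4 | 10 | (1) |
| Higher SLG | | 11 | (2) | 8 | 5 |
| Defensive category | | | | | |
| Fewer errors | 8 | 11 | 10 | (3) | 4 |
| More DPs | 7 | (2) | 4 | (9) | 3 |
| Pitching category | | | | | |
| More strikeouts | | 3 | 4 | (10) | 9 |
| More walks | 7 | 8 | 0 | (7) | 15 |
| More shutouts | 19 | 18 | 23 | (3) | (2) |
| Lower relative ERA | 15 | 13 | 6 | 8 | (1) |
| Overall category | | | | | |
| Better overall record | 1/.5 gm | 8 | 0 | 12 | (4) |
| Recent experience | 12 | 22 | 10 | 0 | 12 |
| Better head-to-head | 12 | (3) | (5) | 2 | NA |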

That’s a lot of variance. The team with more triples has won the World Series more often, but has been at a disadvantage in the Division Series and Championship Series. Having fewer batter strikeouts and more shutouts is a big advantage in the Division Series, but isn’t much of a factor beyond that. Recent postseason experience has translated into more success in Division Series and World Series, but not Championship Series. Those and other differences, it would seem, argue for different formulae for different postseason series.
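One way to read the table above: in every row, the three series-type weights appear to add up to the revised weight, so the split is a straight decomposition of the combined figure. The check below is a sketch that assumes parenthesized entries are negative and treats the "NA" entry as contributing nothing; the dictionary keys are labels chosen for this example.

```python
# (revised, division, championship, world) weights from the table above.
# Parenthesized entries are written as negatives; "NA" becomes None.
TABLE = {
    "more runs": (14, -5, 10, 9),
    "fewer doubles": (13, 12, 6, -5),
    "more triples": (-17, -13, -14, 10),
    "more homers": (13, 4, 13, -4),
    "fewer batter walks": (14, 12, -1, 3),
    "fewer batter strikeouts": (26, 21, 2, 3),
    "lower BA": (-23, -6, -12, -5),
    "higher OBP": (13, 4, 10, -1),
    "higher SLG": (11, -2, 8, 5),
    "fewer errors": (11, 10, -3, 4),
    "more DPs": (-2, 4, -9, 3),
    "more pitcher strikeouts": (3, 4, -10, 9),
    "more walks allowed": (8, 0, -7, 15),
    "more shutouts": (18, 23, -3, -2),
    "lower relative ERA": (13, 6, 8, -1),
    "better overall record": (8, 0, 12, -4),
    "recent experience": (22, 10, 0, 12),
    "better head-to-head": (-3, -5, 2, None),
}

# Rows where the series-type weights do not sum to the revised weight.
mismatches = [
    name for name, (revised, *by_series) in TABLE.items()
    if revised != sum(w for w in by_series if w is not None)
]
print(mismatches)  # []
```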

James' original formula had 12 variables, plus one for intraleague head-to-head records. So let’s develop new formulae with roughly the same number of inputs, based on the table above.

DIVISION SERIES:

  • Give 5 points to the team that scored fewer runs
  • Give 12 points to the team that hit fewer doubles
  • Give 13 points to the team that hit fewer triples
  • Give 12 points to the team whose batters had fewer walks
  • Give 21 points to the team whose batters had fewer strikeouts
  • Give 6 points to the team that had a higher batting average
  • Give 10 points to the team that had fewer errors
  • Give 23 points to the team that pitched more shutouts
  • Give 6 points to the team with the lower ERA
  • Give 10 points to the team with more recent postseason experience
  • Give 5 points to the team with the worse head-to-head record

CHAMPIONSHIP SERIES:

  • Give 10 points to the team that scored more runs
  • Give 6 points to the team that hit fewer doubles
  • Give 14 points to the team that hit fewer triples
  • Give 13 points to the team that hit more home runs
  • Give 12 points to the team with the higher batting average
  • Give 10 points to the team with the higher on-base percentage
  • Give 8 points to the team with the higher slugging percentage
  • Give 9 points to the team that turned fewer double plays
  • Give 10 points to the team whose pitchers had fewer strikeouts
  • Give 7 points to the team whose pitchers allowed fewer walks
  • Give 8 points to the team with the lower ERA
  • Give 12 points to the team with the better record

WORLD SERIES:

  • Give 9 points to the team that scored more runs
  • Give 5 points to the team that hit more doubles
  • Give 10 points to the team that hit more triples
  • Give 4 points to the team that hit fewer home runs
  • Give 5 points to the team with the higher batting average
  • Give 5 points to the team with the higher slugging percentage
  • Give 4 points to the team that committed fewer errors
  • Give 9 points to the team whose pitchers had more strikeouts
  • Give 15 points to the team whose pitchers allowed more walks
  • Give 9 points to the team with the higher ERA relative to league average
  • Give 4 points to the team with the worse overall record
  • Give 12 points to the team with the more recent postseason experience

Two small notes: I found almost no evidence that postseason success related to overall record is scaled by the magnitude of the difference, so I didn’t assign more points, as James did, to teams based on the size of the difference in won-lost records. And I ignored interleague won-lost record for World Series contestants, since the sample sizes, if nonzero, are tiny.
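For anyone who wants to apply the three lists above to a matchup, here is a minimal sketch. The point values are transcribed from the lists; everything else is an assumption made for illustration: the stat keys (`bat_bb`, `pit_so`, `h2h_wins`, and so on), the encoding of recent experience as the year of the last postseason appearance, and the choices to award nothing on a tie and to skip any category missing from either team.

```python
# stat -> (points, sign): sign +1 rewards the higher value, -1 the lower.
FORMULAE = {
    "DS": {"runs": (5, -1), "doubles": (12, -1), "triples": (13, -1),
           "bat_bb": (12, -1), "bat_so": (21, -1), "avg": (6, +1),
           "errors": (10, -1), "shutouts": (23, +1), "era": (6, -1),
           "recent_exp": (10, +1), "h2h_wins": (5, -1)},
    "CS": {"runs": (10, +1), "doubles": (6, -1), "triples": (14, -1),
           "hr": (13, +1), "avg": (12, +1), "obp": (10, +1),
           "slg": (8, +1), "dp": (9, -1), "pit_so": (10, -1),
           "pit_bb": (7, -1), "era": (8, -1), "win_pct": (12, +1)},
    "WS": {"runs": (9, +1), "doubles": (5, +1), "triples": (10, +1),
           "hr": (4, -1), "avg": (5, +1), "slg": (5, +1),
           "errors": (4, -1), "pit_so": (9, +1), "pit_bb": (15, +1),
           "rel_era": (9, +1), "win_pct": (4, -1), "recent_exp": (12, +1)},
}


def score(series_type, team_a, team_b):
    """Return (points_a, points_b) for one matchup."""
    a_pts = b_pts = 0
    for stat, (points, sign) in FORMULAE[series_type].items():
        if stat not in team_a or stat not in team_b:
            continue  # no data for this category: skip it
        diff = (team_a[stat] - team_b[stat]) * sign
        if diff > 0:
            a_pts += points
        elif diff < 0:
            b_pts += points
        # diff == 0 is a tie: nobody gets the points
    return a_pts, b_pts


# Hypothetical teams with only two stats filled in.
team_a = {"runs": 800, "hr": 200}
team_b = {"runs": 750, "hr": 180}
print(score("WS", team_a, team_b))  # (9, 4)
```

In the usage example, team A earns 9 points for scoring more runs, while team B earns 4 for hitting fewer home runs, as the World Series list prescribes.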

Before we see what the revised system says about 2016, let’s backtest:

  • Division Series: Correct 59, incorrect 24 (no selection in one series due to a tie)–71.1 percent
  • Championship Series: Correct 57, incorrect 33–63.3 percent
  • World Series: Correct 32, incorrect 13–71.1 percent
  • Overall: Correct 148, incorrect 70–67.9 percent

That’s pretty good! The 68 percent overall success rate compares to James’ 70 percent reported in the spring of 1984. Let’s apply it to this season.
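The backtest percentages are just correct picks divided by decided picks; the one tied Division Series is left out of the denominator. A quick check of the arithmetic, using only the counts reported above:

```python
# (correct, incorrect) picks per series type, from the backtest above.
results = {"DS": (59, 24), "CS": (57, 33), "WS": (32, 13)}

rates = {
    name: round(100 * right / (right + wrong), 1)
    for name, (right, wrong) in results.items()
}
total_right = sum(right for right, _ in results.values())
total_wrong = sum(wrong for _, wrong in results.values())
overall = round(100 * total_right / (total_right + total_wrong), 1)

print(rates)  # {'DS': 71.1, 'CS': 63.3, 'WS': 71.1}
print(total_right, total_wrong, overall)  # 148 70 67.9
```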

What does the system say about 2016?

Well, the system got both ALDS series right, favoring Toronto over Texas and Cleveland over Boston. It got the Giants-Cubs series wrong, assigning 71 points to San Francisco (fewer runs, fewer doubles, fewer walks, fewer strikeouts, higher BA, fewer errors, worse head-to-head) and 52 to Chicago (fewer triples, more shutouts, lower ERA, more recent postseason), but it saw the Dodgers beating the Nationals.
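The Giants-Cubs totals can be rebuilt from the Division Series list by adding up the categories each team won, as named in the paragraph above. The dictionary keys are labels chosen for this example.

```python
# Division Series point values, from the list earlier in the article.
DS_POINTS = {
    "fewer runs": 5, "fewer doubles": 12, "fewer triples": 13,
    "fewer batter walks": 12, "fewer batter strikeouts": 21,
    "higher batting average": 6, "fewer errors": 10, "more shutouts": 23,
    "lower ERA": 6, "more recent postseason experience": 10,
    "worse head-to-head record": 5,
}

giants = ["fewer runs", "fewer doubles", "fewer batter walks",
          "fewer batter strikeouts", "higher batting average",
          "fewer errors", "worse head-to-head record"]
cubs = ["fewer triples", "more shutouts", "lower ERA",
        "more recent postseason experience"]

giants_total = sum(DS_POINTS[c] for c in giants)
cubs_total = sum(DS_POINTS[c] for c in cubs)
print(giants_total, cubs_total)  # 71 52
```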

For the ALCS, the system gives a narrow edge to the Blue Jays over the Indians. Toronto gets points for fewer doubles, fewer triples, more homers, higher OBP, fewer pitcher strikeouts, and a lower ERA. The Indians get credit for a better record, more runs, higher BA and SLG, and fewer double plays, and the two teams issued the same number of walks.

It favors the Cubs (better record, more runs, more homers, higher BA, OBP, and SLG, fewer pitcher strikeouts, lower ERA) over the Dodgers (fewer doubles, fewer triples, fewer pitcher walks, fewer double plays) in the NLCS.
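The article doesn't print point totals for the two Championship Series, but they can be tallied from the categories named above and the Championship Series list. The sketch below does that; the key names and the `tally` helper are labels invented for this example, and the totals are derived here rather than reported by the author.

```python
# Championship Series point values, from the list earlier in the article.
CS_POINTS = {
    "more runs": 10, "fewer doubles": 6, "fewer triples": 14,
    "more homers": 13, "higher BA": 12, "higher OBP": 10, "higher SLG": 8,
    "fewer double plays": 9, "fewer pitcher strikeouts": 10,
    "fewer pitcher walks": 7, "lower ERA": 8, "better record": 12,
}


def tally(categories):
    """Add up the points for the categories a team won."""
    return sum(CS_POINTS[c] for c in categories)


# ALCS: pitcher walks were tied, so nobody gets those 7 points.
blue_jays = tally(["fewer doubles", "fewer triples", "more homers",
                   "higher OBP", "fewer pitcher strikeouts", "lower ERA"])
indians = tally(["better record", "more runs", "higher BA", "higher SLG",
                 "fewer double plays"])

# NLCS
cubs = tally(["better record", "more runs", "more homers", "higher BA",
              "higher OBP", "higher SLG", "fewer pitcher strikeouts",
              "lower ERA"])
dodgers = tally(["fewer doubles", "fewer triples", "fewer pitcher walks",
                 "fewer double plays"])

print(blue_jays, indians)  # 61 51
print(cubs, dodgers)       # 83 36
```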

In the World Series, there are four possible scenarios. The system likes the Cubs against both the Indians and the Blue Jays. It favors Cleveland over Los Angeles. And the Dodgers over the Jays.

We’ll see how the system works as the postseason moves forward. And it goes without saying that any complaints should be addressed to Bill James, c/o Boston Red Sox.

lipitorkid
10/14
Was it refreshing to assign stats to fit a percentage goal? While time intensive, seems easier (and a little more fun) than coming up with a brand new statistical model. Thanks for updating this.
mainsr
10/14
Yeah, lipitorkid, this may be the most unwieldy spreadsheet I've ever developed. Not the largest or most complex by any means, but man, a lot of false starts and error correction and whatnot. Once I had all the data, though, it was easy to figure out which variables and which coefficients to use. The nearly 70% success rate was the last thing I calculated and a nice dividend.
jdg995
10/14
Fun article. Thanks.
mainsr
10/14
I appreciate it!
Grasul
10/14
It would be amusing to know how James' original system would have worked for the 31 seasons after it was published.
mainsr
10/14
That's a good idea. Now that I have all the data, I'll work on this. Check back here in a few days.
mainsr
10/18
I ran the numbers. Here's James's original system vs. the three revised figures in this report:
  • DS: James 46-38 (54.8%), new 59-24 (70.2%)
  • CS: James 50-40 (55.6%), new 57-33 (63.3%)
  • WS: James 24-21 (53.3%), new 32-13 (71.1%)
None of this, of course, should be viewed as an indictment of James. He had the creativity to come up with a cool Frankenstat, and I'm sure he would've updated the weights similarly to the way I did if he felt so inclined.
collins
10/14
Rather than supposing that it is an advantage in the division series to have scored fewer runs while an advantage in the championship series and world series, isn't it likelier that this is a small sample size issue? I think it would be better to lump all the postseason series together to get a better sample size.
mainsr
10/15
Yeah, John, I thought about that, particularly with regard to the WS, as I had 45 WS in my study. But I had a pretty robust dataset for the DS (84 series) and CS (90 series). By contrast, if I did my math right, James did his study in '84 based on 105 series, total, and seven of those are from 1981, which, as you could gather from my text, I don't view as a legitimate postseason. Combining the series together would absolutely yield a larger sample size but with lower accuracy, of course. I'm OK with the DS and CS sample sizes compared to what James had, and since the results are in some cases so different (suggesting, to me, e.g., that contact hitting is important in the DS, while overall offense is more important in the CS), I decided to separate them. As with Grasul's comment, I'll try to run the numbers using the combined numbers and let you know what I get; check back early next week.
collins
10/15
Thanks.
mainsr
10/18
OK, using this equation implied by the first table above:
  • 14 points for more runs
  • 13 points for fewer doubles
  • 17 points for fewer triples
  • 13 points for more homers
  • 14 points for fewer batter walks
  • 26 points for fewer batter strikeouts
  • 23 points for higher BA
  • 13 points for higher OBP
  • 11 points for higher SLG
  • 11 points for fewer errors
  • 18 points for more shutouts
  • 13 points for lower relative ERA
  • 22 points for most recent experience
Here's what I get:
  • DS 55-29 (65.5%) using this method, 59-24 (70.2%) using mine
  • CS 51-39 (56.7%) using this method, 57-33 (63.3%) using mine
  • WS 24-21 (53.3%) using this method, 32-13 (71.1%) using mine
  • Overall 130-89 (59.4%) using this method, 148-70 (67.9%) using mine
You're right, not that far apart. I'm not surprised the all-in-one formula breaks down for the WS, which is the smallest sample size.