Image credit: Steve Mitchell-USA TODAY Sports

Introduction

Building a winning team from the ground up is a difficult proposition. Add constraints such as salary ceilings, position requirements, and player incompatibilities, and it’s practically impossible. Until now.

In this article, we’ll show you an easy technique to define and build your optimal team, taking into account any limitations on your team structure. You’ll gain insights that will allow you to view baseball with a new perspective, whether you’re a baseball historian or want to make picks for your own fantasy baseball team. 

Today, player value is often captured using metrics such as WARP (Wins Above Replacement Player). These metrics are effective but one-dimensional: they don’t let you layer constraints on top of them. By using mathematical and algorithmic techniques, you can add conditions to your player selections and see the game in a new light. What if you picked modern players and capped their combined salary or their average age? What if you built an All-Star team that had to include at least one player from each franchise? Or what if you were building a team for the Japanese league and were allowed only four foreign players? The possibilities are endless, and the results are often non-obvious, shocking, or even heretical to baseball fans.

In this article, we’ll explore one scenario: selecting an all-time All-Star team with a twist. We will determine the greatest team ever, composed of the most exceptional players to have played the game. We will pick exactly ten players: eight fielders (C, 1B, 2B, 3B, SS, and three outfielders) plus a LHP and a RHP. However, we want this team to be representative of baseball’s history, so we’ll add a condition that it must include exactly one player from each of the last ten decades.

In the text below, we’ll introduce a powerful technique called integer programming, which lets you express your conditions as mathematical constraints and then finds the best solution that satisfies all of them. We’ll use this technique to select our All-Star team of the ten best players, each from a different decade. Finally, we’ll point you to code that lets you examine the data, run your own analyses, and experiment with picking a team subject to your own conditions.

If you want to skip ahead, view the MLB All-Century Team below.


Try building your own all-decade team here


The Rules for the Team

Our team will be composed of 10 players, including:

1 left-handed pitcher (LHP)

1 right-handed pitcher (RHP)

3 outfielders (OF)

1 each of C, 1B, 2B, SS, 3B

We will select exactly one player from each decade from the 1920s through the 2010s. We chose this ten-decade period for two reasons: it gives a 1:1 match between decades and the ten roster spots, and pitchers before 1920 pitched many more innings and games than modern-day pitchers, which makes their performances difficult to compare.

We measure the quality of our team as the sum of each player’s WARP for his selected decade. Our goal is to maximize the team’s total WARP while satisfying the restrictions above.
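Here is a minimal Python sketch of the rules and the objective. It assumes each candidate is a `(name, decade, position, warp)` tuple. `ROSTER_SLOTS`, `DECADES`, `is_valid_team`, and `team_warp` are hypothetical names of our own, not taken from the authors’ notebook.

```python
from collections import Counter

# Hypothetical encoding of the roster rules: position -> players required (10 total).
ROSTER_SLOTS = {"LHP": 1, "RHP": 1, "OF": 3, "C": 1, "1B": 1, "2B": 1, "SS": 1, "3B": 1}

# Decades labelled by their first year: 1920, 1930, ..., 2010.
DECADES = list(range(1920, 2020, 10))


def is_valid_team(team):
    """team: list of (name, decade, position, warp) tuples."""
    positions = Counter(pos for _, _, pos, _ in team)
    decades = Counter(dec for _, dec, _, _ in team)
    # Exactly the required number of players at each position...
    if positions != Counter(ROSTER_SLOTS):
        return False
    # ...and exactly one player from each decade.
    return decades == Counter(DECADES)


def team_warp(team):
    """The objective: the sum of each player's WARP for his selected decade."""
    return sum(warp for _, _, _, warp in team)
```

The optimization problem is to find, among all teams for which `is_valid_team` returns `True`, the one with the largest `team_warp`.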

The Fine Print

Luck plays a role here. To make this team, a player has to dominate both his position and the decade in which he played. Some players were unlucky in how the decade boundaries fell. For instance, Randy Johnson is one of the most dominant left-handed pitchers of all time, but his best years, from 1993 to 2005, spanned two decades, so he will have trouble making our team.

We also choose players’ positions based on where they played the majority of their games in a given decade. Thus, we consider Alex Rodriguez a shortstop in the 1990s, but a third baseman in the 2000s.

Although we evaluate each player using WARP, it isn’t a perfect measure, for many reasons: differences in games per season across eras, limitations of recorded data in earlier periods, differences in the methods used to calculate WARP, and the effects of segregation on the league’s talent level. WARP is also just one way to measure a player’s value; a different measure would undoubtedly lead to a different team.

We use the best WARP metrics we have available based on recorded statistics. For the years from 1950 to today, we use BWARP and WARP from Baseball Prospectus for batters and pitchers, respectively. For the years prior to 1950, we use WAR from Baseball Reference. These measures take into account both offensive and defensive statistics. 

We did not prohibit the same player from appearing twice on a team. For instance, the Giants all-decade team has both a 1990s Barry Bonds and a 2000s Barry Bonds. We could have eliminated such duplicates by adding a new constraint.
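If you did want to forbid duplicates, one extra "at most 1" row per repeated player is enough. The sketch below makes some assumptions. Candidates are `(name, decade, position, warp)` tuples, and `x[i] = 1` means `candidates[i]` is selected. `duplicate_player_rows` is a hypothetical helper. Real data should key on a unique player ID rather than a display name.

```python
from collections import defaultdict


def duplicate_player_rows(candidates):
    """Build one constraint row per player who appears in more than one decade.

    Each returned row r encodes sum(r[i] * x[i]) <= 1, i.e. that player
    can be picked for at most one of his decades.
    """
    indices = defaultdict(list)
    for i, (name, _, _, _) in enumerate(candidates):
        indices[name].append(i)

    rows = []
    for idx in indices.values():
        if len(idx) > 1:
            row = [0] * len(candidates)
            for i in idx:
                row[i] = 1
            rows.append(row)
    return rows


cands = [
    ("Barry Bonds", 1990, "OF", 79.6),
    ("Willie Mays", 1960, "OF", 84.0),
    ("Barry Bonds", 2000, "OF", 64.0),
]
print(duplicate_player_rows(cands))  # [[1, 0, 1]]
```

Each such row, paired with a right-hand side of 1, is simply appended to A and b.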

Examining the Players

First, let’s look at the players with the highest WARP per decade at each position. Notice that, to form the best team, we can’t just take the best player at each position. For example, Babe Ruth and Rogers Hornsby produced dominant WARPs in the OF and at 2B, respectively, but our constraints do not allow us to pick two players from the 1920s. Also, note that the 1950s have a dearth of high-WARP players. We could take Warren Spahn as our 1950s player at LHP, but that would mean selecting him instead of Lefty Grove, who has by far the highest WARP of any LHP. The best selection is not obvious from manual examination.
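Here is a quick sketch of why the naive approach fails. It takes the top few entries from the tables that follow and greedily picks the best player at each position. The tuple layout and the helper `greedy_by_position` are our own illustration. The result immediately contains two players from the 1920s.

```python
from collections import Counter

# Top entries from the 2B, OF, and LHP tables: (name, decade, position, WARP).
candidates = [
    ("Rogers Hornsby", 1920, "2B", 93.1),
    ("Charlie Gehringer", 1930, "2B", 61.2),
    ("Babe Ruth", 1920, "OF", 102.4),
    ("Willie Mays", 1960, "OF", 84.0),
    ("Barry Bonds", 1990, "OF", 79.6),
    ("Hank Aaron", 1960, "OF", 79.5),
    ("Lefty Grove", 1930, "LHP", 80.8),
    ("Randy Johnson", 1990, "LHP", 66.2),
]
slots = {"2B": 1, "OF": 3, "LHP": 1}


def greedy_by_position(candidates, slots):
    """Take the highest-WARP players at each position, ignoring decades."""
    team = []
    for pos, n in slots.items():
        at_pos = [c for c in candidates if c[2] == pos]
        at_pos.sort(key=lambda c: c[3], reverse=True)
        team.extend(at_pos[:n])
    return team


team = greedy_by_position(candidates, slots)
decade_counts = Counter(dec for _, dec, _, _ in team)
clashes = {dec: n for dec, n in decade_counts.items() if n > 1}
print(clashes)  # {1920: 2}
```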

As examples, the tables below show the best second basemen, outfielders, and left-handed pitchers since 1920.

Second Base

Name Dec Pos WARP BA OBP SLG H HR RBI AB R
Rogers Hornsby 1920 2B 93.1 0.382 0.46 0.637 2085 250 1153 5451 1195
Charlie Gehringer 1930 2B 61.2 0.331 0.414 0.507 1865 146 1003 5629 1179
Joe Morgan 1970 2B 58.2 0.282 0.408 0.455 1451 173 720 5139 1005
Frankie Frisch 1920 2B 54.1 0.326 0.377 0.456 1808 77 738 5554 992
Joe Gordon 1940 2B 45.7 0.27 0.358 0.459 1165 181 710 4314 680
Bobby Doerr 1940 2B 41.9 0.286 0.361 0.468 1407 164 887 4924 764
Eddie Collins 1920 2B 39.4 0.346 0.436 0.444 1333 22 520 3853 684
Rod Carew 1970 2B 39.2 0.343 0.411 0.454 1787 60 628 5211 837
Craig Biggio 1990 2B 38 0.297 0.389 0.441 1728 136 641 5823 1042
Roberto Alomar 1990 2B 38 0.308 0.386 0.46 1678 135 732 5443 951

Outfield

Name Dec Pos WARP BA OBP SLG H HR RBI AB R
Babe Ruth 1920 OF 102.4 0.355 0.488 0.74 1734 467 1338 4884 1365
Willie Mays 1960 OF 84 0.3 0.379 0.559 1635 350 1003 5459 1050
Barry Bonds 1990 OF 79.6 0.302 0.439 0.602 1478 361 1076 4894 1091
Hank Aaron 1960 OF 79.5 0.308 0.379 0.565 1819 375 1107 5912 1091
Frank Robinson 1960 OF 70.4 0.304 0.405 0.56 1603 316 1011 5265 1013
Rickey Henderson 1980 OF 69.9 0.291 0.405 0.436 1507 137 535 5173 1122
Mel Ott 1930 OF 68.6 0.313 0.42 0.56 1673 308 1135 5341 1095
Ted Williams 1940 OF 65.9 0.356 0.496 0.647 1303 234 893 3656 951
Barry Bonds 2000 OF 64 0.322 0.52 0.724 925 317 697 2871 772
Mike Trout 2010 OF 63.7 0.305 0.423 0.581 1324 285 752 4340 903

Left-Handed Pitchers

Name Dec Pos WARP Record ERA WHIP K IP
Lefty Grove 1930 LHP 80.8 199-76 2.91 1.243 1313 2399
Randy Johnson 1990 LHP 66.2 150-75 3.14 1.197 2538 2063
Clayton Kershaw 2010 LHP 64.3 156-61 2.31 0.962 2179 1996
Randy Johnson 2000 LHP 63.7 143-78 3.34 1.114 2182 1885
Warren Spahn 1950 LHP 62 202-131 2.92 1.18 1464 2823
Sandy Koufax 1960 LHP 60.5 137-60 2.36 1.005 1910 1808
Billy Pierce 1950 LHP 59.3 155-121 3.06 1.237 1487 2383
Carl Hubbell 1930 LHP 56 188-104 2.71 1.118 1281 2597
Hal Newhouser 1940 LHP 54.3 170-118 2.84 1.302 1579 2453
Fernando Valenzuela 1980 LHP 49.9 128-103 3.19 1.265 1644 2145

The final team (WARNING: Gory math ahead)

To select our best team, we draw on a technique called integer programming. Integer programming is closely related to linear programming, which is used in a wide variety of optimization problems, including scheduling, crop rotation, and maximizing marketing and R&D effectiveness. Linear programming is a powerful technique for choosing values of a set of variables that maximize or minimize a linear function of those variables, subject to linear constraints on their values.

A linear programming problem can be formally stated as:

Given vectors b and c of known coefficients and a matrix A of known coefficients, choose values for the vector x such that we:

Maximize $c^{T}x$

Subject to: $Ax \le b$

and $x \ge 0$

When all values of x are required to be integers, this type of optimization problem is called integer programming. When every value of x must be either 0 or 1, the problem is called 0-1 integer programming. Integer programming is NP-hard (its decision version is NP-complete), which means no efficient general-purpose algorithm is known. Even so, in practice many integer programming problems, including ours, can be solved in reasonable time using open source libraries and heuristics.
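To make the 0-1 definition concrete, here is a deliberately naive solver that tries every 0/1 vector x and keeps the best feasible one. `solve_01_ip` is a hypothetical helper for illustration only. Its 2^n loop shows why the problem is hard in general. Real solvers, such as the ones cvxpy can call, avoid full enumeration with techniques like branch and bound.

```python
from itertools import product


def solve_01_ip(c, A, b):
    """Maximize c^T x subject to A x <= b with every x_j in {0, 1}.

    Checks all 2**n candidate vectors, so it is only usable for tiny n.
    The condition x >= 0 holds automatically for 0/1 values.
    Returns (best_x, best_value), or (None, -inf) if nothing is feasible.
    """
    n = len(c)
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=n):
        # Feasible if every row of A x <= b holds.
        feasible = all(
            sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i
            for row, b_i in zip(A, b)
        )
        if feasible:
            value = sum(c_j * x_j for c_j, x_j in zip(c, x))
            if value > best_val:
                best_x, best_val = x, value
    return best_x, best_val


# Toy instance: pick at most one of x0, x1 and at most one of x1, x2.
print(solve_01_ip([5, 4, 3], [[1, 1, 0], [0, 1, 1]], [1, 1]))
# ((1, 0, 1), 8)
```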

In our case, we formulate the problem in the following way:

x is the vector we are solving for. Each entry of x is either 0 or 1, indicating whether that particular [player, decade] pair is on the All-Star team.

c is the vector of WARP values, one for each [player, decade] pair.

A and b together encode the restrictions on the All-Star team.

We have two types of restrictions for our team:

  1. Position Restrictions: we must choose exactly three outfielders and exactly one each of LHP, RHP, C, 1B, 2B, 3B, and SS.
  2. Decade Restrictions: we must choose exactly one player from each decade from the 1920s through the 2010s.
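Written out in full, the model is roughly as follows. There is one 0-1 variable $x_i$ per candidate [player, decade] pair $i$, and $c_i$ is that pair's WARP. The labels $\mathrm{pos}(i)$ and $\mathrm{dec}(i)$, its position and decade, are introduced here for illustration.

$$
\begin{aligned}
\text{maximize}\quad & \sum_i c_i x_i \\
\text{subject to}\quad & \sum_{i:\ \mathrm{pos}(i) = \mathrm{OF}} x_i = 3, \\
& \sum_{i:\ \mathrm{pos}(i) = p} x_i = 1 \quad \text{for } p \in \{\mathrm{LHP}, \mathrm{RHP}, \mathrm{C}, \mathrm{1B}, \mathrm{2B}, \mathrm{3B}, \mathrm{SS}\}, \\
& \sum_{i:\ \mathrm{dec}(i) = d} x_i = 1 \quad \text{for } d \in \{1920, 1930, \dots, 2010\}, \\
& x_i \in \{0, 1\} \quad \text{for all } i.
\end{aligned}
$$

Each equality fits the $Ax \le b$ form as a pair of rows, for example $\sum x_i \le 3$ and $-\sum x_i \le -3$.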

Using a Python library called cvxpy, we can express our constraints and goals in fewer than 30 lines of code. We’ve included our source code and data here. The code is a Jupyter Notebook hosted on Kaggle, which makes it easy to examine the data and run the optimization. You can try it out yourself.
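To experiment without installing cvxpy and a mixed-integer solver, you can solve a small instance exactly with plain Python. This is not the authors’ cvxpy model. It is a stdlib-only backtracking search over decades that enforces the same position and decade rules. Its cost grows roughly as (candidates per decade) raised to the number of decades, so it is only practical for short candidate lists. `best_team` and its `every_decade` flag are hypothetical names, and the toy candidates are taken from the tables above.

```python
# Toy candidates from the tables above: (name, decade, position, WARP).
toy = [
    ("Rogers Hornsby", 1920, "2B", 93.1),
    ("Babe Ruth", 1920, "OF", 102.4),
    ("Charlie Gehringer", 1930, "2B", 61.2),
    ("Mel Ott", 1930, "OF", 68.6),
    ("Lefty Grove", 1930, "LHP", 80.8),
    ("Willie Mays", 1960, "OF", 84.0),
    ("Sandy Koufax", 1960, "LHP", 60.5),
]


def best_team(candidates, slots, decades, every_decade=True):
    """Exhaustive search for the highest-WARP team.

    slots: position -> number of players required at that position.
    every_decade=True  -> exactly one player from each decade.
    every_decade=False -> at most one player from each decade.
    Returns (total_warp, team), or (-inf, None) if no team fits the rules.
    Like the article's model, it does not stop one player from being
    picked in two different decades.
    """
    by_decade = {d: [c for c in candidates if c[1] == d] for d in decades}
    best = (float("-inf"), None)

    def search(k, open_slots, picked, total):
        nonlocal best
        slots_left = sum(open_slots.values())
        # Prune: too few decades remain to fill the open slots.
        if slots_left > len(decades) - k:
            return
        if k == len(decades):
            if slots_left == 0 and total > best[0]:
                best = (total, list(picked))
            return
        # Option 1: take one candidate from decade k whose position is still open.
        for cand in by_decade[decades[k]]:
            pos = cand[2]
            if open_slots.get(pos, 0) > 0:
                open_slots[pos] -= 1
                picked.append(cand)
                search(k + 1, open_slots, picked, total + cand[3])
                picked.pop()
                open_slots[pos] += 1
        # Option 2 (only when decades are optional): leave decade k unused.
        if not every_decade:
            search(k + 1, open_slots, picked, total)

    search(0, dict(slots), [], 0.0)
    return best


total, team = best_team(toy, {"2B": 1, "OF": 1, "LHP": 1}, [1920, 1930, 1960])
print(round(total, 1), [name for name, *_ in team])
# 257.9 ['Rogers Hornsby', 'Lefty Grove', 'Willie Mays']
```

Even on this three-slot toy, the search leaves Babe Ruth off in favor of Rogers Hornsby. Taking Ruth would cost the team Hornsby’s 1920s slot, and the best remaining 2B and LHP choices cannot make up the difference. Passing `every_decade=False` gives the "at most one per decade" rule used for the Yankees team.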

The Results 

MLB All-Century Team (1920 – 2020)

Name Dec Pos WARP BA OBP SLG H HR RBI AB R
Rogers Hornsby 1920 2B 93.1 0.382 0.46 0.637 2085 250 1153 5451 1195
Jimmie Foxx 1930 1B 73.1 0.336 0.44 0.652 1845 415 1403 5495 1244
Lou Boudreau 1940 SS 60 0.3 0.385 0.422 1578 62 692 5268 758
Willie Mays 1960 OF 84 0.3 0.379 0.559 1635 350 1003 5459 1050
Rickey Henderson 1980 OF 69.9 0.291 0.405 0.436 1507 137 535 5173 1122
Barry Bonds 1990 OF 79.6 0.302 0.439 0.602 1478 361 1076 4894 1091
Alex Rodriguez 2000 3B 73 0.304 0.405 0.587 1740 435 1243 5732 1190
Buster Posey 2010 C 51.2 0.302 0.375 0.458 1378 140 673 4558 594

Name Dec Pos WARP Record ERA WHIP K IP
Warren Spahn 1950 LHP 62 202-131 2.92 1.18 1464 2823
Tom Seaver 1970 RHP 76.9 178-101 2.61 1.073 2304 2652

The results are surprising! We chose Rickey Henderson in the outfield and left off Babe Ruth, even though Ruth’s WARP for the 1920s (102.4) was almost 10 wins higher than that of the next-best player (Rogers Hornsby) and almost 20 wins higher than that of the next-best outfielder (Willie Mays). Ruth was left off because Rogers Hornsby so thoroughly dominated his own position in the 1920s. If we had selected Ruth, we could not have selected Hornsby (because of our one-player-per-decade constraint), and the team would have lost even more WARP elsewhere. Similarly, Rickey Henderson made the team even though his 1980s WARP (69.9) was only the sixth highest among outfielders, because the 1980s lacked other high-WARP players.

Other teams

The methods we used to create our all-time All-Star team can easily be modified to build other teams with unique conditions. Below is an example of a Yankees all-decade team. In this case, we’ve gone back to 1900 to choose our players. As in the previous example, we allow at most one player per decade, but this time we do not require a player from every decade, since there are more decades than roster spots.
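In integer-programming terms, the change is small. Let $x_i$ be the 0-1 variable for candidate [player, decade] pair $i$ and $\mathrm{dec}(i)$ its decade (labels introduced here for illustration). Each decade constraint becomes an inequality instead of an equality, roughly as follows. The position constraints still fill all 10 roster spots.

$$
\sum_{i:\ \mathrm{dec}(i) = d} x_i \le 1 \qquad \text{for each } d \in \{1900, 1910, \dots, 2010\}
$$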

Yankees All Time Team (1900 – 2020)

Name Dec Pos WARP BA OBP SLG H HR RBI AB R
Kid Elberfeld 1900 SS 27.2 0.275 0.351 0.348 952 8 416 3464 505
Babe Ruth 1920 OF 102.4 0.355 0.488 0.74 1734 467 1338 4884 1365
Lou Gehrig 1930 1B 73.1 0.343 0.453 0.638 1802 347 1358 5255 1257
Joe DiMaggio 1940 OF 43.8 0.325 0.404 0.568 1156 180 786 3562 684
Mickey Mantle 1950 OF 60.2 0.311 0.426 0.569 1392 280 841 4478 994
Thurman Munson 1970 C 42.7 0.292 0.35 0.411 1536 112 692 5258 690
Willie Randolph 1980 2B 32.6 0.276 0.38 0.351 1326 37 402 4798 754
Alex Rodriguez 2000 3B 73 0.304 0.405 0.587 1740 435 1243 5732 1190

Name Dec Pos WARP Record ERA WHIP K IP
Whitey Ford 1960 LHP 36.5 115-56 2.83 1.173 1041 1609
David Cone 1990 RHP 55.3 141-85 3.21 1.21 1928 2017

While this team has fewer surprises than our first, there are still some unexpected results. For instance, although most of the big names like Ruth, Gehrig, and Mantle made this team, Derek Jeter is missing. At least as far as this exercise is concerned, Jeter had the misfortune of having his best decade coincide with that of Alex Rodriguez.

Conclusion

Decisions made in baseball, like in the real world, often involve making tradeoffs to reach an end goal. Whether optimizing a team roster to meet salary constraints or choosing this year’s All-Star team while making sure that each franchise is represented, integer programming can be used as a straightforward technique to let your baseball imagination run wild. We encourage you to look at the code, be creative, and design your own teams. 

Watch for future Singlearity articles and tools for baseball analysis and decision making.

Thank you for reading

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.

menthol
3/09
Cool! Can you do this for all the franchises?
Joshua
3/09
Yes. Click on the link that says "TRY BUILDING YOUR OWN ALL-DECADE TEAM HERE".
Craig Goldstein
3/09
Hey Menthol, you can actually check it out and fiddle around with each franchise here: https://www.baseballprospectus.com/all-decade-team-by-warp/
Matthew Gold
3/09
Fun article! How different would the results be if the decades started with years ending in a 5? That doesn't seem to be an option in the "create your own".
joshua.silver
3/09
Interesting question, and I think the answer is "really different". As an example, look at Randy Johnson, who was absolutely dominant from 1995-2004 (close to 90 WARP for that 10-year period). Also, Buster Posey would undoubtedly be dropped from the team, since he has only played from 2009-2019, so his career would not align well with decades starting in years ending in 5.

The "create your own" option doesn't let you change the 10-year boundary years, although it is a good suggestion.
Will
3/10
TIL Babe Ruth had a .740 SLG for an entire decade...