
Introduction
Building a winning team from the ground up is a difficult proposition. Add constraints such as salary ceilings, player positioning, and player incompatibilities and it’s practically impossible. Until now.
In this article, we’ll show you an easy technique to define and build your optimal team, taking into account any limitations on your team structure. You’ll gain insights that will allow you to view baseball with a new perspective, whether you’re a baseball historian or want to make picks for your own fantasy baseball team.
Today, player value is often captured using metrics such as WARP (Wins Above Replacement Player) that are effective, but one-dimensional in that they don’t allow for layering constraints. By using mathematical and algorithmic techniques, you can add conditions to your player selections and see the game in a new light. What if you were to pick modern players and dictate their maximum combined salary or maximum average age? Or you create an All-Star team and require that it includes at least one player from each franchise? Or you want to build a team for the Japanese league and are allowed only four foreign players? The possibilities are endless, and the results are often non-obvious, shocking, or even heretical to baseball fans.
In this article, we’ll explore one scenario by attempting to select an all-time All-Star team with a twist. We will determine the greatest team ever, composed of the most exceptional players to have played the game. Like most teams, we will pick exactly one player in each of the 10 positions (eight fielders plus a LHP and RHP). However, we want to make this team representative of baseball’s history so we’ll add a condition that our team must include exactly one player from each of the last ten decades.
In the text below, we’ll introduce a powerful technique called integer programming. Integer programming allows for creating different conditions and provides corresponding solutions that satisfy the conditions. We’ll use this technique to select our All-Star team of the ten best players, each from a different decade. And finally, we’ll point you to code that will allow you to examine the data, run your own analyses, and experiment with picking a team subject to your own conditions.
If you want to skip ahead, view the MLB All-Century Team below.
Try building your own all-decade team here
The Rules for the Team
Our team will be composed of 10 players, including:
1 left-handed pitcher (LHP)
1 right-handed pitcher (RHP)
3 outfielders (OF)
1 each of C, 1B, 2B, SS, 3B
We will select exactly one player from each decade from the 1920s through the 2010s. We chose this ten-decade period because it allows for a 1:1 match for a ten-person team, and because pitchers prior to 1920 pitched many more innings and games than modern-day pitchers, which makes their performances difficult to compare.
The quality of our team will be measured using the sum of each player’s WARP for the selected decade. Our goal is to maximize our team’s WARP while choosing a team subject to the restrictions above.
The Fine Print
Luck plays a role here. In order to make this team, a player has to dominate both his position and the decade in which he played. Some players were unlucky in how their careers fall across decade boundaries. For instance, Randy Johnson is one of the most dominant left-handed pitchers of all time, but his best years spanned two decades, running from 1993 to 2005, so he will have trouble making our team.
We also choose players’ positions based on where they played the majority of their games in a given decade. Thus, we consider Alex Rodriguez a shortstop in the 1990s, but a third baseman in the 2000s.
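The position rule above can be sketched in a few lines of Python. This is a minimal illustration, not the article's actual data pipeline: the function name `primary_positions`, the row layout, and the game counts are all hypothetical, and "majority" is approximated as "the position with the most games in the decade".

```python
from collections import defaultdict

def primary_positions(season_rows):
    """Map (player, decade) -> the position with the most games in that decade."""
    games = defaultdict(lambda: defaultdict(int))
    for player, year, position, n_games in season_rows:
        decade = year // 10 * 10  # e.g. 1998 -> 1990
        games[(player, decade)][position] += n_games
    return {key: max(by_pos, key=by_pos.get) for key, by_pos in games.items()}

# Hypothetical, made-up game counts purely for illustration.
rows = [
    ("Player A", 1998, "SS", 150),
    ("Player A", 1999, "SS", 140),
    ("Player A", 2003, "SS", 160),
    ("Player A", 2004, "3B", 155),
    ("Player A", 2005, "3B", 162),
]
positions = primary_positions(rows)
print(positions)
```

With these invented numbers, the same player is labeled a shortstop for the 1990s and a third baseman for the 2000s, which mirrors the Alex Rodriguez case described above.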
Although each player is evaluated using WARP, it’s not a perfect measure due to many factors, including differences in games per season in different eras, limitations of recorded data in earlier time periods, differences in methods to calculate WARP, and the effects of segregation on the league talent level. Note that using WARP is just one method to measure players’ value; different measures will undoubtedly lead to selection of a different team.
We use the best WARP metrics we have available based on recorded statistics. For the years from 1950 to today, we use BWARP and WARP from Baseball Prospectus for batters and pitchers, respectively. For the years prior to 1950, we use WAR from Baseball Reference. These measures take into account both offensive and defensive statistics.
We did not prohibit a single player from appearing twice on the same team. For instance, the Giants all-decade team has both a 1990s Barry Bonds and a 2000s Barry Bonds. We could have eliminated this duplicate by creating a new constraint.
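One way such a no-duplicates constraint could be written is shown below. The notation is our own assumption for illustration: x_i is 1 if [player, decade] pair i is selected and 0 otherwise, and player(i) is the player that pair i refers to.

```latex
\sum_{i \,:\, \mathrm{player}(i) = k} x_i \;\le\; 1
\qquad \text{for each player } k
```

Under this constraint, a 1990s Barry Bonds and a 2000s Barry Bonds could not both be selected, because both pairs belong to the same player k.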
Examining the Players
First, let’s consider the players with the highest WARP per decade at each position, using the tables below as examples. Notice that to form the best team we can’t just take the best player at each position. For example, Rogers Hornsby and Babe Ruth produced dominant WARPs at 2B and OF, respectively, but both did so in the 1920s, and our constraints do not allow us to pick multiple players from the same decade. Also, note that the 1950s have a dearth of high-WARP players. We could take Warren Spahn as our 1950s/LHP player, but that would mean passing over LHP Lefty Grove, who has far and away the highest WARP of any LHP. The best team is not obvious from manual examination.
The tables below show the best second basemen, outfielders, and left-handed pitchers since 1920.
Second Base
Name | Dec | Pos | WARP | BA | OBP | SLG | H | HR | RBI | AB | R |
Rogers Hornsby | 1920 | 2B | 93.1 | 0.382 | 0.46 | 0.637 | 2085 | 250 | 1153 | 5451 | 1195 |
Charlie Gehringer | 1930 | 2B | 61.2 | 0.331 | 0.414 | 0.507 | 1865 | 146 | 1003 | 5629 | 1179 |
Joe Morgan | 1970 | 2B | 58.2 | 0.282 | 0.408 | 0.455 | 1451 | 173 | 720 | 5139 | 1005 |
Frankie Frisch | 1920 | 2B | 54.1 | 0.326 | 0.377 | 0.456 | 1808 | 77 | 738 | 5554 | 992 |
Joe Gordon | 1940 | 2B | 45.7 | 0.27 | 0.358 | 0.459 | 1165 | 181 | 710 | 4314 | 680 |
Bobby Doerr | 1940 | 2B | 41.9 | 0.286 | 0.361 | 0.468 | 1407 | 164 | 887 | 4924 | 764 |
Eddie Collins | 1920 | 2B | 39.4 | 0.346 | 0.436 | 0.444 | 1333 | 22 | 520 | 3853 | 684 |
Rod Carew | 1970 | 2B | 39.2 | 0.343 | 0.411 | 0.454 | 1787 | 60 | 628 | 5211 | 837 |
Craig Biggio | 1990 | 2B | 38 | 0.297 | 0.389 | 0.441 | 1728 | 136 | 641 | 5823 | 1042 |
Roberto Alomar | 1990 | 2B | 38 | 0.308 | 0.386 | 0.46 | 1678 | 135 | 732 | 5443 | 951 |
Outfield
Name | Dec | Pos | WARP | BA | OBP | SLG | H | HR | RBI | AB | R |
Babe Ruth | 1920 | OF | 102.4 | 0.355 | 0.488 | 0.74 | 1734 | 467 | 1338 | 4884 | 1365 |
Willie Mays | 1960 | OF | 84 | 0.3 | 0.379 | 0.559 | 1635 | 350 | 1003 | 5459 | 1050 |
Barry Bonds | 1990 | OF | 79.6 | 0.302 | 0.439 | 0.602 | 1478 | 361 | 1076 | 4894 | 1091 |
Hank Aaron | 1960 | OF | 79.5 | 0.308 | 0.379 | 0.565 | 1819 | 375 | 1107 | 5912 | 1091 |
Frank Robinson | 1960 | OF | 70.4 | 0.304 | 0.405 | 0.56 | 1603 | 316 | 1011 | 5265 | 1013 |
Rickey Henderson | 1980 | OF | 69.9 | 0.291 | 0.405 | 0.436 | 1507 | 137 | 535 | 5173 | 1122 |
Mel Ott | 1930 | OF | 68.6 | 0.313 | 0.42 | 0.56 | 1673 | 308 | 1135 | 5341 | 1095 |
Ted Williams | 1940 | OF | 65.9 | 0.356 | 0.496 | 0.647 | 1303 | 234 | 893 | 3656 | 951 |
Barry Bonds | 2000 | OF | 64 | 0.322 | 0.52 | 0.724 | 925 | 317 | 697 | 2871 | 772 |
Mike Trout | 2010 | OF | 63.7 | 0.305 | 0.423 | 0.581 | 1324 | 285 | 752 | 4340 | 903 |
Left-Handed Pitchers
Name | Dec | Pos | WARP | Record | ERA | WHIP | K | IP |
Lefty Grove | 1930 | LHP | 80.8 | 199-76 | 2.91 | 1.243 | 1313 | 2399 |
Randy Johnson | 1990 | LHP | 66.2 | 150-75 | 3.14 | 1.197 | 2538 | 2063 |
Clayton Kershaw | 2010 | LHP | 64.3 | 156-61 | 2.31 | 0.962 | 2179 | 1996 |
Randy Johnson | 2000 | LHP | 63.7 | 143-78 | 3.34 | 1.114 | 2182 | 1885 |
Warren Spahn | 1950 | LHP | 62 | 202-131 | 2.92 | 1.18 | 1464 | 2823 |
Sandy Koufax | 1960 | LHP | 60.5 | 137-60 | 2.36 | 1.005 | 1910 | 1808 |
Billy Pierce | 1950 | LHP | 59.3 | 155-121 | 3.06 | 1.237 | 1487 | 2383 |
Carl Hubbell | 1930 | LHP | 56 | 188-104 | 2.71 | 1.118 | 1281 | 2597 |
Hal Newhouser | 1940 | LHP | 54.3 | 170-118 | 2.84 | 1.302 | 1579 | 2453 |
Fernando Valenzuela | 1980 | LHP | 49.9 | 128-103 | 3.19 | 1.265 | 1644 | 2145 |
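The decade clash described at the start of this section can be checked directly from the top rows of these three tables. The sketch below is our own illustration, with a hypothetical helper `naive_pick`. It takes the highest-WARP players at each position and then counts how many picks land in each decade. Only three positions are covered, since those are the only tables shown here.

```python
from collections import Counter

# (name, decade, position, WARP) rows copied from the tables above.
rows = [
    ("Rogers Hornsby", 1920, "2B", 93.1),
    ("Charlie Gehringer", 1930, "2B", 61.2),
    ("Babe Ruth", 1920, "OF", 102.4),
    ("Willie Mays", 1960, "OF", 84.0),
    ("Barry Bonds", 1990, "OF", 79.6),
    ("Hank Aaron", 1960, "OF", 79.5),
    ("Lefty Grove", 1930, "LHP", 80.8),
    ("Randy Johnson", 1990, "LHP", 66.2),
]
slots = {"2B": 1, "OF": 3, "LHP": 1}  # roster spots per position

def naive_pick(rows, slots):
    """Take the top-WARP players at each position, ignoring decades."""
    picked = []
    for pos, n in slots.items():
        candidates = sorted(
            (r for r in rows if r[2] == pos), key=lambda r: r[3], reverse=True
        )
        picked.extend(candidates[:n])
    return picked

team = naive_pick(rows, slots)
decade_counts = Counter(decade for _, decade, _, _ in team)
clashes = {d: n for d, n in decade_counts.items() if n > 1}
print(clashes)  # decades that were used more than once
```

The naive pick uses the 1920s twice (Hornsby and Ruth), so it is not a valid team under our rules.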
The final team (WARNING: Gory math ahead)
To select our best team, we draw on a technique called integer programming. Integer programming is closely related to linear programming, which is used in a wide variety of optimization problems including scheduling, crop rotation, and maximizing marketing and R&D effectiveness. Linear programming is a powerful technique that can be used to select optimal values for variables when there are linear constraints on the values of variables, and such that a linear function based on the variables is maximized or minimized.
A linear programming problem can be formally stated as:
Given vectors of known coefficients b and c and a matrix A of known coefficients, choose values for the vector x so as to:
Maximize $c^{T}x$
Subject to: $Ax \le b$
and $x \ge 0$
When all values of x are required to be integers, this type of optimization problem is called integer programming. When all values of x are required to be a value from {0,1}, it is called 0-1 integer programming. Although integer programming is NP-hard in general (the decision version of 0-1 integer programming is NP-complete), in practice many integer programming problems can be solved in reasonable time using open-source libraries and heuristics.
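To make the 0-1 formulation concrete, here is a minimal sketch that solves a four-player toy problem by exhaustive enumeration. This is our own illustration and not what a real solver does: real solvers use far smarter search, and enumeration doubles in cost with every added variable. The toy also uses equality constraints (Ax = b) rather than the Ax ≤ b of the standard form, since each group must contain exactly one pick. The pool is a hand-picked subset of rows from the tables above.

```python
from itertools import product

# Toy pool taken from the tables above: (name, decade, position, WARP).
pool = [
    ("Rogers Hornsby", 1920, "2B", 93.1),
    ("Babe Ruth", 1920, "OF", 102.4),
    ("Charlie Gehringer", 1930, "2B", 61.2),
    ("Mel Ott", 1930, "OF", 68.6),
]

c = [row[3] for row in pool]  # objective coefficients: WARP per pair

# Each row of A marks the members of one group; b is the required count.
groups = [(1, 1920), (1, 1930), (2, "2B"), (2, "OF")]  # (column index, value)
A = [[int(row[col] == value) for row in pool] for col, value in groups]
b = [1, 1, 1, 1]

best_x, best_value = None, float("-inf")
for x in product((0, 1), repeat=len(pool)):  # all 2**4 candidate vectors
    feasible = all(
        sum(a * xi for a, xi in zip(A_row, x)) == b_i
        for A_row, b_i in zip(A, b)
    )
    value = sum(ci * xi for ci, xi in zip(c, x))
    if feasible and value > best_value:
        best_x, best_value = x, value

team = [row[0] for row, xi in zip(pool, best_x) if xi]
print(team, round(best_value, 1))
```

In this tiny pool, Ruth plus Gehringer (163.6) narrowly beats Hornsby plus Ott (161.7). The full problem can reach a different answer because every other position and decade also competes for the same slots.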
In our case, we formulate the problem in the following way:
x represents the vector we are trying to solve, where each value of x is either a 0 or a 1 representing whether that particular [player, decade] pair should be on the All-Star team.
c represents the vector of the WARP for each [player, decade] pair.
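A, together with b, represents the restrictions on the All-Star team. For example, each row of A can mark the [player, decade] pairs that belong to one position or one decade, with the matching entry of b giving how many of them may be chosen.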
We have two types of restrictions for our team:
- Position Restrictions: we must choose exactly three outfielders and exactly one each of LHP, RHP, C, 1B, 2B, 3B, SS.
- Decade Restrictions: we must choose exactly one player for each decade from the 1920s through the 2010s.
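Written out, the two restriction types give the following 0-1 program. The symbols pos(i), dec(i), and n_p are our own shorthand, not notation from the source code: pos(i) and dec(i) are the position and decade of [player, decade] pair i, and n_p is the number of roster spots at position p.

```latex
\begin{aligned}
\text{maximize}\quad & \sum_{i} c_i x_i \\
\text{subject to}\quad
  & \sum_{i \,:\, \mathrm{pos}(i) = p} x_i = n_p
    && \text{for each position } p,\ \text{with } n_{\mathrm{OF}} = 3 \text{ and } n_p = 1 \text{ otherwise} \\
  & \sum_{i \,:\, \mathrm{dec}(i) = d} x_i = 1
    && \text{for each decade } d \in \{1920, 1930, \dots, 2010\} \\
  & x_i \in \{0, 1\}
    && \text{for each [player, decade] pair } i
\end{aligned}
```

Each equality can be expressed in the Ax ≤ b form used earlier as a pair of inequalities (one ≤ and one ≥), so the problem still fits the standard statement.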
By using a Python library called cvxpy, we can express our constraints and goals in fewer than 30 lines of code. We’ve included our source code and data here. This code is in the form of a Jupyter Notebook hosted on Kaggle, which makes it easy to examine the data and run the optimization. You can try it out yourself.
The Results
MLB All-Century Team (1920 – 2020)
Name | Dec | Pos | WARP | BA | OBP | SLG | H | HR | RBI | AB | R |
Rogers Hornsby | 1920 | 2B | 93.1 | 0.382 | 0.46 | 0.637 | 2085 | 250 | 1153 | 5451 | 1195 |
Jimmie Foxx | 1930 | 1B | 73.1 | 0.336 | 0.44 | 0.652 | 1845 | 415 | 1403 | 5495 | 1244 |
Lou Boudreau | 1940 | SS | 60 | 0.3 | 0.385 | 0.422 | 1578 | 62 | 692 | 5268 | 758 |
Willie Mays | 1960 | OF | 84 | 0.3 | 0.379 | 0.559 | 1635 | 350 | 1003 | 5459 | 1050 |
Rickey Henderson | 1980 | OF | 69.9 | 0.291 | 0.405 | 0.436 | 1507 | 137 | 535 | 5173 | 1122 |
Barry Bonds | 1990 | OF | 79.6 | 0.302 | 0.439 | 0.602 | 1478 | 361 | 1076 | 4894 | 1091 |
Alex Rodriguez | 2000 | 3B | 73 | 0.304 | 0.405 | 0.587 | 1740 | 435 | 1243 | 5732 | 1190 |
Buster Posey | 2010 | C | 51.2 | 0.302 | 0.375 | 0.458 | 1378 | 140 | 673 | 4558 | 594 |
Name | Dec | Pos | WARP | Record | ERA | WHIP | K | IP |
Warren Spahn | 1950 | LHP | 62 | 202-131 | 2.92 | 1.18 | 1464 | 2823 |
Tom Seaver | 1970 | RHP | 76.9 | 178-101 | 2.61 | 1.073 | 2304 | 2652 |
The results are surprising! We chose Rickey Henderson in the outfield and left off Babe Ruth, even though Ruth’s WARP for the 1920s (102.4) was almost 10 wins higher than that of the next highest player (Rogers Hornsby) and almost 20 wins higher than that of the next highest outfielder (Willie Mays). Ruth was left off the team because Rogers Hornsby thoroughly dominated his position in the 1920s. If we selected Ruth, we could not select Hornsby (due to our one-player-per-decade constraint) and we’d lose even more WARP. Similarly, Rickey Henderson made the team despite his 1980s WARP (69.9) being only the sixth highest among outfielders, because the 1980s had a lack of high-WARP players.
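As a sanity check, the short sketch below confirms that the published team satisfies both types of restriction and adds up its WARP. It is our own illustration with rows copied from the results table. It checks feasibility only and does not prove that the team is optimal.

```python
from collections import Counter

# (name, decade, position, WARP) rows copied from the results table above.
team = [
    ("Rogers Hornsby", 1920, "2B", 93.1),
    ("Jimmie Foxx", 1930, "1B", 73.1),
    ("Lou Boudreau", 1940, "SS", 60.0),
    ("Willie Mays", 1960, "OF", 84.0),
    ("Rickey Henderson", 1980, "OF", 69.9),
    ("Barry Bonds", 1990, "OF", 79.6),
    ("Alex Rodriguez", 2000, "3B", 73.0),
    ("Buster Posey", 2010, "C", 51.2),
    ("Warren Spahn", 1950, "LHP", 62.0),
    ("Tom Seaver", 1970, "RHP", 76.9),
]

required_positions = {
    "LHP": 1, "RHP": 1, "OF": 3, "C": 1, "1B": 1, "2B": 1, "SS": 1, "3B": 1,
}
required_decades = set(range(1920, 2020, 10))  # 1920, 1930, ..., 2010

position_counts = Counter(pos for _, _, pos, _ in team)
decade_counts = Counter(dec for _, dec, _, _ in team)

positions_ok = dict(position_counts) == required_positions
decades_ok = (
    set(decade_counts) == required_decades
    and all(n == 1 for n in decade_counts.values())
)
total_warp = round(sum(warp for *_, warp in team), 1)

print(positions_ok, decades_ok, total_warp)
```

Both checks pass, and the team's combined WARP comes to 722.8.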
Other teams
The methods we used to create our all-time All-Star team can easily be modified to build other teams with unique conditions. Below is an example of a Yankees all-decade team. In this case, we’ve gone back to 1900 to choose our players. As in the previous example, we allow at most one player from each decade, but here we do not require a player from every decade, since there are more decades than there are roster spots.
Yankees All Time Team (1900 – 2020)
Name | Dec | Pos | WARP | BA | OBP | SLG | H | HR | RBI | AB | R |
Kid Elberfeld | 1900 | SS | 27.2 | 0.275 | 0.351 | 0.348 | 952 | 8 | 416 | 3464 | 505 |
Babe Ruth | 1920 | OF | 102.4 | 0.355 | 0.488 | 0.74 | 1734 | 467 | 1338 | 4884 | 1365 |
Lou Gehrig | 1930 | 1B | 73.1 | 0.343 | 0.453 | 0.638 | 1802 | 347 | 1358 | 5255 | 1257 |
Joe DiMaggio | 1940 | OF | 43.8 | 0.325 | 0.404 | 0.568 | 1156 | 180 | 786 | 3562 | 684 |
Mickey Mantle | 1950 | OF | 60.2 | 0.311 | 0.426 | 0.569 | 1392 | 280 | 841 | 4478 | 994 |
Thurman Munson | 1970 | C | 42.7 | 0.292 | 0.35 | 0.411 | 1536 | 112 | 692 | 5258 | 690 |
Willie Randolph | 1980 | 2B | 32.6 | 0.276 | 0.38 | 0.351 | 1326 | 37 | 402 | 4798 | 754 |
Alex Rodriguez | 2000 | 3B | 73 | 0.304 | 0.405 | 0.587 | 1740 | 435 | 1243 | 5732 | 1190 |
Name | Dec | Pos | WARP | Record | ERA | WHIP | K | IP |
Whitey Ford | 1960 | LHP | 36.5 | 115-56 | 2.83 | 1.173 | 1041 | 1609 |
David Cone | 1990 | RHP | 55.3 | 141-85 | 3.21 | 1.21 | 1928 | 2017 |
While this team has fewer surprises than our first, there are still some unexpected results. For instance, although most of the big names like Ruth, Gehrig, and Mantle made this team, Derek Jeter is missing. At least as far as this exercise is concerned, Jeter had the misfortune of having his best decade coincide with that of Alex Rodriguez.
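The relaxed decade rule used for this team amounts to changing one equality into an inequality. In the sketch below, the notation is our own assumption: x_i is 1 if [player, decade] pair i is selected, and dec(i) is the decade of pair i.

```latex
\sum_{i \,:\, \mathrm{dec}(i) = d} x_i \;\le\; 1
\qquad \text{for each decade } d \in \{1900, 1910, \dots, 2010\}
```

The position restrictions stay as equalities, so the roster still has exactly ten players. The twelve decades simply cannot all be represented, which is why the 1910s and 2010s are absent from the table above.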
Conclusion
Decisions made in baseball, like in the real world, often involve making tradeoffs to reach an end goal. Whether optimizing a team roster to meet salary constraints or choosing this year’s All-Star team while making sure that each franchise is represented, integer programming can be used as a straightforward technique to let your baseball imagination run wild. We encourage you to look at the code, be creative, and design your own teams.
Watch for future Singlearity articles and tools for baseball analysis and decision making.