February 10, 2004
Baseball's Hilbert Problems
23 Burning Questions
Baseball Prospectus 2000.
"Who of us would not be glad to lift the veil behind which the future lies hidden, to cast a glance at the next advances of our science and at the secrets of its development during future years? What particular goals will there be toward which the leading sabermetric spirits of coming generations will strive? What new methods and new facts in the wide and rich field of sabermetric thought will the new years disclose?"
Here at Baseball Prospectus, we're not completely immune to the general fascination with the recent turn of the world's odometer. So, with this edition marking the final year of the second millennium, let's take a look forward at what the third holds for us seamheads.
Our inspiration comes from a similar effort nearly 100 years ago. In 1900, a mathematician named David Hilbert addressed the International Congress of Mathematicians in Paris and delivered what was to become history's most influential speech about mathematics. Hilbert outlined 23 major problems to be studied in the coming century. In doing so he expressed optimism about the field, sharing his feeling that unsolved problems were a sign of vitality, encouraging more people to do more research.
The above quote is, in fact, a bastardization of the opening statements of Hilbert's speech. Hilbert referred to mathematics instead of sabermetrics and spoke in terms of "centuries" instead of "years." Given the relative youth of sabermetrics and baseball analysis compared to math, it's appropriate to use a period of smaller scope than Hilbert. The quotes that appear periodically throughout this essay are similarly taken from Hilbert's speech and altered to refer to baseball analysis.
Hilbert's address was much more than a collection of problems. It outlined a philosophy of mathematics, and the problems put forth were ones relevant to that philosophy. By putting forth our own "Hilbert problems" for baseball analysts of the future, Baseball Prospectus is outlining our philosophy for how and why this kind of work ought to be done--our attempt to provide inspiration and guidance to the baseball community at large.
"It is difficult and often impossible to judge the value of a problem correctly in advance; for the final award depends upon the gain which science obtains from the problem. Nevertheless we can ask whether there are general criteria which mark a good sabermetric problem."
We used the following criteria to guide our selection of the baseball research problems for the future:
1) To be relevant, sabermetrics must inform a decision: data for data's sake is not useful. Bill James once defined sabermetrics as the search for objective knowledge about baseball. While this is still true, it doesn't cut to the heart of the matter. A list of players, cross-referenced by preferred breakfast cereal and astrological sign, is objective knowledge, but it isn't what anyone would call useful information. There is already too much irrelevant data clogging up the airwaves and the Web. Baseball analysis must focus on knowledge that can lead to an action or a commitment of resources (time, effort, money) by someone who wants to study the game. The decision can be anything from in-game tactical moves to judging player acquisitions. It can be prospect evaluation or an MVP or Hall of Fame ballot. It can even be some personal idiosyncratic award for things you might think are important. But, in order to produce useful information, you have to start with a relevant question that needs answering.
2) The industry of baseball encompasses more than just the action on the field. To be relevant to the sport as it's practiced today, baseball analysis must expand to explicitly consider the economic, social, technological, competitive and governmental contexts in which the game operates.
3) The amount of potential information is larger than the amount of information that is available today. In raising some of these research questions, we acknowledge that the resources to answer them may not yet be available. It may be years or decades before there's sufficient effort, technology or understanding to create a systematic collection of observations needed to resolve some of these issues. However, by recognizing the importance of the problem itself, we can hope to guide the efforts to acquire new information in a manner that is consistent with the problems we want to solve.
4) Numbers alone are not data, and solving equations is not analysis. Some data can be expressed as numbers, and judicious use of mathematics can yield analytical insights, but we should not abandon a line of reasoning for lack of quantification or a failure to find a tidy formula.
"If we would obtain an idea of the probable development of sabermetric knowledge in the immediate future, we must let the unsettled questions pass before our minds and look over the problems which the science of today sets and whose solution we expect from the future. To such a review of problems the present day, lying at the meeting of the centuries, seems to me well adapted. For the close of a great epoch not only invites us to look back into the past but also directs our thoughts to the unknown future."
Baseball Prospectus' Hilbert Problems for the Next Century
1) Separating defense into pitching and fielding.
This is one of the oldest and most vexing problems in baseball analysis. Pitching and fielding are so intertwined that they seem impossible to separate. That doesn't mean we shouldn't try.
2) Evaluating interrelationships among teammates' defensive performances.
Does having a good shortstop make the second baseman or third baseman better? Does it show up in the numbers? Does a Gold Glove center fielder cut into the apparent defensive performance of a corner outfielder? Can a poor defensive player's shortcomings be covered for by pairing him with a stellar glove man at an adjacent position?
3) Measuring the catcher's role in run prevention.
In Baseball Prospectus 1999, Keith Woolner presented a compelling case that catchers do not have a noticeable effect on a pitcher's performance. If there is no "game-calling" effect, what impact does a catcher have? Is it primarily controlling the running game? If so, how much of that is attributable to the pitching staff? Is it in preventing wild pitches and passed balls, thus giving the pitcher more confidence to keep the ball low? What about reading a pitcher's physical state and helping to keep his pitch count low? We've made some important first steps, but there's still a lot we don't know about evaluating catcher defense. (Ed. note: A follow-up essay on catchers' defense, by Keith Woolner, appears in the soon-to-be-released Baseball Prospectus 2004.)
4) Mapping career trajectories for defensive performance.
The phenomenon of the "Age 27" peak for offensive performance is well documented. However, while we still struggle with developing a reliable assessment of defensive performance, little attention has been paid to how a player's defensive skills change as he ages. Do a player's defensive skills peak earlier in his career? Later? Do strong arms last longer than quick feet? Does defensive longevity vary by position, or by the particular mix of skills a player has? Do difficult positions such as shortstop and catcher wear a player out faster?
5) Making an assessment of relative positional difficulties.
Much is made of the "defensive spectrum," where positions are thought of as if they were laid out on a ruler, with shortstop at the high/difficult end of the spectrum and first base at the low end. This makes intuitive sense and matches well with the observed differences in offensive performance: Generally, there are fewer good-hitting shortstops from which to choose, which implies a lower average offensive performance level for all shortstops. That also means that first basemen generally out-hit shortstops.
6) Quantifying the value of positional flexibility.
A player who plays two positions at a league-average level gives his manager flexibility, both in setting up the team's roster and using in-game strategies. Positional value methods that are based on playing time at a position are inadequate because they would penalize a player for time spent at the lesser position, even if a comparable offensive player who plays full-time at the more difficult position was unable to play the easier position. Because roster spots are scarce, a team gets value from a player's ability to play multiple positions, but we do not yet have an understanding of how much value there is to having a Mark Loretta or Jose Hernandez on your roster.
7) Measuring the value of non-range-based aspects of defense.
This means measuring skills like an outfielder's arm, a middle infielder's ability to hang in while turning the double play or a first baseman's ability to scoop low throws. To date, the effort spent on assessing defensive performance has focused on converting batted balls into outs, essentially measuring a player's range and sure-handedness. Fielding percentage, range factor, Sherri Nichols' Defensive Average, STATS, Inc.'s Zone Rating--these all focus on opportunities to turn batted balls into outs. While that is important, there are other, less-studied aspects of baseball defense.
We'd want to start measuring the impact of an outfielder's arm, both in terms of cutting down baserunners and whether an outfielder with a cannon-arm reputation intimidates runners. We'd like to establish ways to determine a middle infielder's ability to turn the double play, a first baseman's ability to handle poor throws, an outfielder's reliability in hitting the cutoff man and a catcher's success in blocking the plate. These are all non-range-based factors in defense; they're all important skills, and they've all been ignored, for the most part.
8) Evaluating the impact on offensive performance of changing defensive positions up/down the defensive spectrum.
This is the flip side to understanding the defensive spectrum. Here, we ask whether a player's expected offensive production is influenced by the position he's asked to play, and whether changing his position would alter his performance. Would Jeff Bagwell be the same hitter had he stayed at third base? Or Matt Williams at shortstop? Ron Gant at second base? How much is a player helped by moving to an easier position?
9) Predicting the impact on career length from changing positions.
How much longer are players effective after changing positions, particularly if they're moved to an easier position? Is the trade-off in preventing an offensive decline worth the sacrifices you might be making on defense? Do you burn a productive catcher out by leaving him behind the plate and having him retire with knee problems at 32, or are you better off moving him to first base and keeping his bat on the team until he approaches 40? Mike Piazza would like to know the answer, and so would we.
10) Projecting minor league pitchers accurately.
One of the holy grails of sabermetrics is creating useful projections of major league pitcher performance based on minor league performance. While strikeout-to-walk ratios and other means of assessment can give us rough guides to good and bad young pitchers, we're nowhere near the level of certainty we want to achieve. Given the lack of progress from purely statistical approaches, this would be an ideal place to marry the analysis of player-development professionals with sabermetric methods to develop a more powerful predictor than either approach has produced alone.
11) Creating a way to better analyze mechanics.
The precision and consistency needed to be an effective major league pitcher are exceptional. Minute variations in a pitcher's release point, arm angle and body position make the difference between Cy Young and Matt Young. While game film and frame analysis help capture nuances in a pitcher's delivery that escape the naked eye, there's much more that could be done. Advances in data storage make it possible to record and analyze every pitch thrown in every game by a pitcher. Cataloging this information, and measuring the angles, velocities and timing of movements, could open up new worlds to help instructors improve pitching. Computer-aided analysis could measure consistency in release points. You could help improve a young pitcher's consistency by having him throw 100 fastballs to the same place and measuring the variance around his optimal release point. Pitchers with greater command should see a smaller standard deviation than someone as wild as Brad Pennington. Lessons gained from biomechanics could suggest new delivery methods that improve effectiveness while reducing strain on a pitcher's arm. These kinds of approaches may help identify pitchers who should be converted to knuckleballers, submariners, or other non-traditional delivery styles.
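As a rough illustration of the release-point idea, here is a minimal Python sketch. It assumes each pitch has already been reduced to a hypothetical (horizontal, vertical) release coordinate in feet, and it uses the pitcher's average release point as a stand-in for his "optimal" one. The function name and the sample numbers are made up for the example and are not real tracking data.

```python
import math
import statistics


def release_point_spread(points):
    """Return (mean_x, mean_y, rms_distance) for a list of (x, y) release points.

    rms_distance is the root-mean-square distance of the release points
    from their average location, in the same units as the input.
    """
    mean_x = statistics.fmean([x for x, _ in points])
    mean_y = statistics.fmean([y for _, y in points])
    squared = [(x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in points]
    return mean_x, mean_y, math.sqrt(statistics.fmean(squared))


# Hypothetical release points (feet) for two pitchers; illustrative only.
steady_pitcher = [(1.50, 6.00), (1.52, 6.01), (1.48, 5.99), (1.50, 6.00)]
wild_pitcher = [(1.2, 5.7), (1.8, 6.3), (1.5, 6.0), (1.5, 6.0)]

_, _, steady_spread = release_point_spread(steady_pitcher)
_, _, wild_spread = release_point_spread(wild_pitcher)
```

With these invented numbers the second pitcher's spread is far larger than the first's, which is the kind of gap an instructor might track from one bullpen session to the next.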
12) Identifying and quantifying good coaching.
Most of sabermetrics focuses on the players, and the largest portion of any remaining attention goes to managers. But a team's coaches influence the game as well, and they have rarely been studied in any systematic way. Hitting and pitching coaches affect the development of young players and may help avoid prolonged slumps for all players. Pitching coaches often influence a manager's use (or abuse) of his pitching staff. Coaches at third base make split-second calls on whether to send a runner home or hold him up. Are they doing a good job? Frankly, we have no evidence on which to base useful assessments yet. We all think Leo Mazzone is doing a great job with the Braves, but how great?
13) Assessing the "coachability" of players.
Professional baseball players, both in the majors and the minors, possess tremendous physical gifts. However, not every "five-tool" player matures into an effective ballplayer. Is the difference in the quality of his talent, his ability to learn, or both? How much patience can be taught to a free-swinger like Garret Anderson? Can you train any player with blazing speed to read pitchers well enough to become a base-stealing threat? Do draft picks who sign late diminish their peak value by missing out on a season's worth of instruction?
14) Assessing developmental strategies for minor league pitchers.
What is the optimal strategy for developing young pitching? What kind of usage pattern and what quantity of work balances the need for experience against the possibility of physical damage? Some teams have studied the issue and come up with innovative approaches; at A-ball, the A's use eight-man rotations, where matched pairs of pitchers pitch every four days with low pitch counts. By contrast, some teams still think young pitchers need to get as many innings as possible. Should a team try to expose a prospect to as many different hitters and parks as possible? Is that good for development, or bad? Was folding the American Association into the Pacific Coast League and the International League good, because it gave Triple-A pitchers more potential opponents and parks to work in, or was it more important to hone their skills against the same seven or nine opposing teams? Are the competitive structures of the minors good for player development or not? What are some potential improvements?
15) Clarifying the win/dollar trade-off preferences for major league decision-makers.
Winning has never been the only thing in baseball. The fact that baseball is a business is not news to anyone reading this book--in fact, it hasn't been news for the last 125 years. As the ultimate decision-maker for a franchise, the owner of a team values two types of outcomes: on-field success and profitability. The relationship of one to the other isn't objectively knowable, as it comes from the personal--or professional--preferences of the owner. It's perfectly rational for an owner to refuse to risk an $80 million payroll for a 10% chance at a World Series, while another would spend $90 million for a 7% chance. The second owner has a stronger preference for wins over dollars than the first, or you could say he's more willing to take risks. For baseball analysis to move to the next level of relevancy for baseball teams, it must be ready to deal with varying preferences and tolerances and account for them rationally in assessing desirable trades, transactions and contracts. (Ed. note: Doug Pappas breaks down teams by marginal wins/dollar in BP 2004, and will further expand on the topic at baseballprospectus.com in the coming weeks.)
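The arithmetic behind the two owners can be made explicit. The sketch below treats the whole payroll as the price of the title chance, which is a deliberate oversimplification, and the function name is ours rather than any established measure.

```python
def payroll_per_title_point(payroll_millions, title_chance_pct):
    """Payroll (in $ millions) per percentage point of World Series chance."""
    if title_chance_pct <= 0:
        raise ValueError("title chance must be positive")
    return payroll_millions / title_chance_pct


# The two owners from the example above.
declined = payroll_per_title_point(80, 10)  # first owner says no at this price
accepted = payroll_per_title_point(90, 7)   # second owner says yes at this price
```

Under that simplification, the first owner turns down a price of $8 million per percentage point, while the second accepts a price of nearly $13 million per point. Neither is wrong; they simply weigh wins against dollars differently.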
16) Creating a framework for evaluating trades.
Whether a trade is a good or bad decision is something that should be assessed based on the information known (or that should have been known) at the time of the trade. Analyzing trades with any consistency is difficult, as there are always several reasons and factors that go into every trade. Salary dumps, stretch-drive pickups, overcoming key injuries, getting rid of a troublemaker or somebody the manager just can't stand, exchanging excess talent at one position to fill some other hole on the team--these have all motivated transactions of various kinds over the years. To be successful, a framework for evaluating trades must be ready to consider financial factors (including the overall health of the club), current and future expected production from the players involved, the team's current and future competitive situation, and the premium ownership places on winning.
17) Determining the value of draft picks, Rule 5 picks, player-to-be-named-later arrangements, and other non-specific forms of compensation in transactions.
The more esoteric forms of compensation in trades are usually ignored, but they must have some real value if teams continue to exchange talent for them. What does a team give up when it signs a Type A free agent? How much is that draft pick worth? Is a typical Rule 5 pick worth the $50,000 and the roster spot? Are teams taking full advantage of the Rule 5 draft? What kinds of PTBNL deals make sense for both teams?
18) Evaluating the effect of short- and long-term competitiveness on attendance and demand elasticity.
Scholars like Andrew Zimbalist and Gerald Scully have done pioneering work in measuring the relationship between on-field success and attendance. Building on that work, we should study second-order effects in more detail, such as the impact of five or more losing seasons on long-term attendance trends. How long does it take attendance to recover from a bunch of lousy seasons in a row? How quickly will fickle fans abandon a former champion? Does fan apathy catch up with a team that consistently contends year after year, but never quite wins the pennant? Is it worth overspending in the short term to build long-term fan loyalty, thus ensuring greater financial resources to devote to the team in the future?
19) Optimizing the competitive ecology of the game.
Some issues are bigger than any single team's problems. The long-term survivability of "small-market" clubs has made headlines in the past couple of years. One theory is that "small-market" teams can't hold onto their own farm-developed talent, which supposedly departs through free agency for major media markets like New York and Los Angeles. The current argument is that the Minnesotas of the world can never retain the players produced by their farm system long enough to contend. However, if we went back to making it easier for teams to retain their own players, we would risk creating long-term dynasties like the Yankees of the 1950s, which diminish interest in baseball in the cities without the dynasty. So what's the best way to achieve league-wide competitiveness?
20) Determining optimal pitcher usage strategies.
Ideally, a manager wants his best pitchers to throw the most and most important (or highest-leverage) innings. However, he also doesn't want to abuse his pitchers' arms, risking short-term fatigue and long-term injury. There's uncertainty about when high-leverage opportunities will present themselves, yet a regular and tolerable workload is necessary to keep any pitcher sharp. Would a return to four-man rotations with stricter pitch counts lead to greater success? Should teams use a designated closer, or use their best reliever in game-critical situations even if they aren't save situations? Does Tony La Russa's ill-fated experiment with the three-inning starter warrant a longer trial? Should relievers make fewer, longer appearances, or should they be mixed and matched as platoon differentials dictate?
21) Determining optimal roster design.
Any team's range of in-game strategic options originates with the decisions about which players are on the roster, yet the strategies for constructing a roster have undergone little scrutiny. During the season, a team has many objectives that sometimes conflict with one another. Winning the division or qualifying for the postseason is the ultimate goal, but throughout the season there are other smaller goals: seasoning a rookie, sorting out bullpen roles, assessing a player's readiness after an injury. If a team finds itself in contention, should it ignore potential future payoffs and focus on using established veterans deemed most likely to contribute this year? How would the failure to play that rookie impact the team's competitiveness next year or the year after? Is it worthwhile to carry a player whose primary talent is pinch-running? How important is a third-string catcher? Or a second left-handed specialist out of the bullpen?
22) Quantifying the manager's impact on winning.
Bill James published his Guide to Baseball Managers in 1997, and in it set forth some nifty tools for estimating a manager's effectiveness based on seasonal statistics. Careful observation and recording of managerial moves (e.g., roster management, in-game tactics, pitcher usage) set the stage for a detailed assessment of a manager's impact on his team. Unifying this data into a coherent whole is a challenge, but the payoff would be a much better understanding of the value of a manager's contributions to his team.
23) Developing a game-theoretic framework for analyzing elective strategies.
In the offense-crazed world of the late 1990s, it seems quaint to concern ourselves with little-ball strategies like the sacrifice bunt and the hit-and-run. These strategies are widely derided in sabermetrics, largely on the basis of expected-run analysis. Tables, indexed by the number of outs and the location of baserunners, give an expected number of runs scored for the rest of the inning, based on the results of actual games. If the expected number of runs declines after the play, given a typical success rate for it, then the strategy is deemed to be bad. While this approach was an important and useful first step, there are two major problems with it: First, this method yields an answer for a league-average team because it's based on the results of the league as a whole; and second, it ignores the changes in the shape of run scoring that can be important for many game contexts.
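For readers who haven't seen the expected-run method in action, here is a minimal Python sketch of it applied to a sacrifice bunt with a runner on first and nobody out. The table values and the 80% success rate are invented for illustration and are not drawn from actual league data. The outcome model is also simplified: a success puts a runner on second with one out, and a failure leaves a runner on first with one out.

```python
# Hypothetical run-expectancy table: (runner location, outs) -> expected runs
# for the rest of the inning. These values are made up for illustration.
RUN_EXPECTANCY = {
    ("first", 0): 0.90,
    ("first", 1): 0.55,
    ("second", 1): 0.70,
}


def bunt_run_change(table, success_rate):
    """Expected change in runs from a sacrifice attempt, runner on first, no outs.

    Simplifying assumption: a success moves the runner to second with one out;
    a failure leaves a runner on first with one out.
    """
    before = table[("first", 0)]
    after = (success_rate * table[("second", 1)]
             + (1 - success_rate) * table[("first", 1)])
    return after - before


change = bunt_run_change(RUN_EXPECTANCY, success_rate=0.80)
verdict = "bad" if change < 0 else "good"
```

With these made-up numbers the change is negative even at a 100% success rate, so the method labels the bunt a bad play. Note that the calculation says nothing about who is batting, who is pitching or what the score is, which is exactly the limitation described above.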
To truly understand where and when to use these strategies, they must be studied not just with an expected-run analysis, but with a true assessment of how much more likely the game is to be won using such a strategy. There is a branch of mathematics called game theory which is ideally suited for studying not only the direct impact of little-ball strategies on the outcome of games, but the move-countermove nature of two managers trying to gain whatever advantages they can against one another. An in-depth treatment of little-ball strategies that recognizes the true richness of managerial decisions and counter-decisions should be welcomed.
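To give a flavor of what a game-theoretic treatment might look like, here is a toy Python sketch of a single bunt-or-swing decision modeled as a two-by-two zero-sum game. Everything in it is an assumption made for illustration: the offense chooses between bunting and swinging away, the defense chooses between charging and playing at normal depth, and the four win probabilities for the offense are invented. The solver uses the standard closed-form mixed-strategy solution for a two-by-two zero-sum game with no saddle point.

```python
def solve_2x2_zero_sum(a, b, c, d):
    """Mixed-strategy solution of a 2x2 zero-sum game with no saddle point.

    Payoffs to the row player are [[a, b], [c, d]].
    Returns (p, q, value): p is the probability the row player picks row 1,
    q is the probability the column player picks column 1.
    """
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate game: no unique mixed solution")
    p = (d - c) / denom
    q = (d - b) / denom
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("no interior mixed solution; check for a saddle point")
    value = (a * d - b * c) / denom
    return p, q, value


# Hypothetical win probabilities for the offense (illustrative only).
# Rows: bunt, swing away. Columns: defense charges, defense plays normal depth.
p_bunt, q_charge, win_prob = solve_2x2_zero_sum(0.48, 0.56, 0.58, 0.52)
```

In this toy game the offense should bunt about 43% of the time and the defense should charge about 29% of the time, and neither manager can improve his chances by deviating on his own. A real treatment would need win probabilities estimated from actual games and many more options on both sides.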
"The deep significance of certain problems for the advance of sabermetric science in general and the important role which they play in the work of the individual investigator are not to be denied. As long as a branch of science offers an abundance of problems, so long is it alive; a lack of problems foreshadows extinction or the cessation of independent development. Just as every human undertaking pursues certain objects, so also sabermetric research requires its problems. It is by the solution of problems that the investigator tests the temper of his steel; he finds new methods and new outlooks, and gains a wider and freer horizon."
The range of interesting avenues of exploration is larger than a single publication--even Baseball Prospectus--can possibly hope to explore. Fortunately, the community of interested and knowledgeable baseball fans and analysts is large and getting larger, and many of these Hilbert problems will be solved by researchers nobody has even heard of yet. We look forward to seeing the solutions, and with them the posing of more interesting questions, in the century to come.