April 28, 2017
The Natural Conclusion of Baseball Statistics
There was once a time, not so terribly long ago, when a person interested in scientific pursuits might require little more than a (quill) pen and paper, bottles, a net, and a magnifying glass. Undiscovered wisdom was everywhere, on every beach and every country stroll, and people of different vocations would devote their weekends and evenings to the collection and collation of all sorts of data. This pursuit was dubbed “natural history,” and societies sprang up to further combine and organize this knowledge into shared productivity. Clergymen and merchants would meet to compare notes, distinguish between different types of insects, and occasionally argue about the names.
After the intellectual stagnation of the medieval era, this scientific revolution was an awakening of creativity and curiosity. Not all branches were equally accessible: fields like optics, anatomy, and chemistry required specialized equipment and education. But others, particularly biology and agriculture, held secrets for anyone to find given sufficient ingenuity. An abbot named Mendel discovered genetics in the peapods of his garden; Darwin and his associates flung themselves across the globe, seeking new species and new habitats.
There’s something of the Wild West to this era of science, an open frontier with few restrictions and guidelines. It’s a culture of infinite potential, of endless possibility and disappointment, that corresponds well to the early days of baseball research. The ideas were there for anyone to find, if they knew how to look.
(The rudimentary beginnings of the WAR leaderboard.)
Science, around the turn of the 20th century, began to change, relocating from the fields to the laboratory. Astronomy created uncountable numbers; doctors created complicated tests to verify our growing knowledge of the human body. As our knowledge in physics and biology grew, it drove people to look closer, exploring under stronger and stronger microscopes to understand the realm of bacteria and atoms. Training and formal education became necessary; measurement grew increasingly expensive. These progressions were neither good nor bad; they were a simple evolution, the next step.
Since its unveiling at the MIT Sloan Sports Analytics Conference three short years ago, Statcast has transformed the nature and scope of baseball data. The sum of the immeasurable now measurable is dizzying: exit speed, launch angle, running speed, leadoff distance, spin rate, transfer speed on catch-and-throws. The secrets of these new veins of research are extracted and refined in 30 different laboratories, and measurement of players has already been altered. It’s no longer batting average, it’s batting velocity.
As with science, this is only the latest step in the natural progression of baseball analytics, a miniaturization of exploitable advantage. Just as secondary average and Win Shares pioneered the ability to accurately measure what a player had accomplished, fielding-independent metrics and BABIP drove us to think about not what happened, but what should have. That there was a talent level separate from production, something that existed on a separate plane. In one sense, such a shift in thought was liberating; finally, every discussion of a player didn’t have to end in “how many rings/All-Star berths/MVPs did the guy get?” But in another way, the complexity and aethereality of modern statistics could portend an unhappy future.
How many baseball diamonds can you spot in the picture above?
Science, like baseball, does not progress in a steady motion; it jumps forward in fits and starts, swept clean and rebuilt by the ideas of great men. Idealists like Aristotle, Newton, and Bill James transformed the landscape in their fields, but ironically, these upheavals are often followed by lulls. It’s difficult to exist under such great shadows, to pick through and understand the aftermath of their thought.
In a way, we’re still recovering from Newton all this time later; we still think in his terms, despite the fact that in reality—the reality that exists beyond the scope of the naked eye—his most famous tenets don’t really apply. It’s hard to shrug off the ideas that we were taught as children, ideas that, in our everyday lives, appear so intuitive. And so we as laymen have taken on a split personality in science: understanding the real, and the practical. So much of what we “know,” in the theoretical sense, is no longer attached to our own experiential learning.
It’s this, I believe, and not just a political movement of anti-intellectualism or conservative religious thought, that has created the recent pushback against science. The problem is exacerbated in an educational system that has slowly distanced itself from critical thinking, and emphasized testing and memorization. Once teaching becomes an authoritative act, a forceful transfer of knowledge and value, it robs the student of the ability to trust or mistrust, and then verify. Skepticism is replaced by cynicism. It results in a loss of ownership, a perceived disenfranchisement over what is considered “truth,” that leads people to believe in flat earths and vaccination-autism links. Such beliefs, as ridiculous as they are from a scientific standpoint, are at least in formulation a creative act.
Batting average is not an intuitive stat. Neither is earned run average. Both of these statistics are considered meaningful because baseball has grown to love them over the course of a century and a half. BP’s own True Average is what it is because people know what a “good” or a “bad” batting average looks like. The internalization has come slowly, and if baseball were to start over there’s no way we arrive at the same numbers; they were forged through arguments and power shifts. But they’re what we have, what our grandparents gave us.
(Fig. 3: some data)
One of the points of contention in the sabermetric era was the inclusion of advanced statistics in popular baseball culture: getting broadcasters and talking heads to employ, or even correctly describe when debunking, “complicated” stats like OPS and WPA. It seems (and seemed) like a rather silly political battle, but it was political. Often the argument was that the statistics hadn’t been internalized, that explaining these new-fangled numbers would alienate viewers, disrupt the flow of the game. The vicious circle: the stats required explanation because the average fan didn’t know them; they didn’t know them because they never got taught.
A decade later, pockets of resistance still hold their ground, but the landscape has mostly transformed. Statcast’s in-house sponsorship will allow it to avoid that awkward phase and beam straight into homes at full power. The tools now available to describe and analyze are almost unfathomably expansive and specific. But they also arrive at a time when caution is equally necessary.
It won’t be an easy task, and it’s not necessarily a problem of Statcast’s making. As with the advancement of science, current baseball knowledge has reached a level of granularity that goes beyond our level of intuition. It’s a tricky balance: we do want information that challenges preconceived notions, that pushes the audience closer to a concept of truth. As Sam Miller noted recently, a game without statistics, one in which intuition is given free rein, is an uncomfortable place. Both halves of the game feed off of each other.
So much of modern analytics dwells in the realm of probability, a concept that can come off as unintuitive to the point of meaninglessness at best, and insulting at worst. The value of probability study is to provide context on exactly what isn’t intuitive, that the reality we experience isn’t predestined, but one possible result. In a society that values results, has bronzed its own national bootstraps and glamorized the concept of glamour, it’s a difficult sell. It’s also a vital one. Probability gives us an opportunity to move past the golden-era flawless hero to an imperfect, fortunate, and human one. It gives us an opportunity to find nobility and honor not just in triumph, but also in failure.
Unlike the Project Scoresheet and PITCHf/x revolutions of the past, which began as underground movements, Statcast is a top-down enterprise; while it is to be commended for how much data it has freely provided, its authority puts it at an immediate disadvantage. The advantage of not having to fight for its place is actually a disadvantage, because denying that conversation and debate is also denying a learning curve. It’s not enough to just give the people statistics, quote barrels and launch angles without context and without meaning. It’s not even enough to try to give them the context.
The masses will have to build that value themselves, take apart the pieces and look at them and argue about them and slowly understand them. It’ll be messy, and it may not be a pleasure to watch in real time on Twitter. The turmoil over the usefulness of WAR, and its less creditable components, is a fine example. But it’s vital if these numbers are to become more than incidental fun facts, or arbitrary judgments. The average fan will need to be able to see those statistics and not just know them but feel them, take them in as just as much a part of the game as the hue of the grass and the sound of the crowd and the arc of the left-hander’s swing. Otherwise, baseball runs the risk of watching its fans reject them entirely, and go back to what they’ve always done: make up their own stories.