Can we improve on PECOTA's forecast for a hitter just by looking at which pitches the opposing catcher called?
The fastball is the meat and potatoes of the batter-pitcher contest. Variations in fastball velocity and movement explain a lot of the differences between pitchers, and a good heater can set up a whole arsenal of other pitches to boot. Fastballs are the most commonly thrown pitch by a wide margin, and so they determine to a great extent the results of any given matchup.
It’s no surprise then that pitchers tend to vary how much they use their fastballs on a hitter-by-hitter basis. Some hitters see fastballs rarely, others overwhelmingly, and the difference between hitters tells us something about their power (as well as their proficiency against fastballs). Because they are the main offering of most pitchers, fastballs are the easiest pitches to tee off against, and so they are thrown more rarely to powerful hitters.
Can the uncertainty in a player's projections be projected?
There are two important aspects of prediction. The first is accuracy: how close a prediction is to the actual, observed result. The second is uncertainty: how sure a forecaster is about his or her projection. These are fundamental forecasting concepts, and they apply equally to predictions of the weather, the stock market, or the outcome of tomorrow’s ballgame. At present, only one of these facets gets much attention in the world of baseball projections, and that is accuracy. Accuracy is typically measured by the mean absolute error, which describes how close, on average, a forecast is to the actual, observed result. Projectionists strive primarily to minimize this number.
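As a small illustration of the accuracy measure described above, the sketch below computes a mean absolute error for a handful of OPS projections. The function name and the player values are invented for the example and are not drawn from any projection system.

```python
def mean_absolute_error(projected, observed):
    """Average absolute gap between projected and observed values."""
    if len(projected) != len(observed):
        raise ValueError("projected and observed must have the same length")
    if not projected:
        raise ValueError("need at least one projection")
    total_gap = sum(abs(p - o) for p, o in zip(projected, observed))
    return total_gap / len(projected)


# Hypothetical OPS projections and outcomes for three hitters.
projected_ops = [0.750, 0.800, 0.700]
observed_ops = [0.700, 0.850, 0.720]

mae = mean_absolute_error(projected_ops, observed_ops)
print(f"Mean absolute error: {mae:.3f} OPS")
```

A single number like this summarizes how far off the forecasts were on average, but it says nothing about which individual forecasts were the shaky ones.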
The under-examined facet of prediction that we will address in this article is uncertainty. Whereas we know that predictions tend to be accurate to within a hundred or so points of OPS, we would also like to know whether we are more or less likely to be wrong on certain players. Uncertainty is often treated as a second-order concern because it is usually more difficult to estimate. However, as we show, it is possible to predict ahead of time which players’ forecasts are more uncertain than others. This matters because teams may differ in their preference for high-risk versus low-risk players: a team with high win expectations (90+ wins) might prefer to reduce risk, whereas a middle-of-the-road team (80-85 wins) would presumably seek risk in order to “get lucky” and reach the postseason.
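The risk-preference argument can be made concrete with a rough sketch. It assumes that a team’s final win total is normally distributed, that 90 wins is the postseason cutoff, and that a riskier roster simply widens the spread of outcomes; all three are simplifying assumptions made for illustration, and the standard deviations are invented.

```python
from statistics import NormalDist

PLAYOFF_WINS = 90    # assumed postseason cutoff
LOW_RISK_SD = 6.0    # hypothetical spread of wins with a low-risk roster
HIGH_RISK_SD = 8.0   # hypothetical spread of wins with a high-risk roster


def playoff_chance(expected_wins, win_sd, threshold=PLAYOFF_WINS):
    """Probability that a normally distributed win total reaches the threshold."""
    return 1.0 - NormalDist(mu=expected_wins, sigma=win_sd).cdf(threshold)


# A middle-of-the-road team (83 expected wins) and a contender (93 expected wins).
middling_low = playoff_chance(83, LOW_RISK_SD)
middling_high = playoff_chance(83, HIGH_RISK_SD)
contender_low = playoff_chance(93, LOW_RISK_SD)
contender_high = playoff_chance(93, HIGH_RISK_SD)

print(f"83-win team: low risk {middling_low:.3f}, high risk {middling_high:.3f}")
print(f"93-win team: low risk {contender_low:.3f}, high risk {contender_high:.3f}")
```

Under these assumptions, extra variance raises the middling team’s chance of clearing the cutoff and lowers the contender’s, which is the asymmetry the paragraph describes.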
In what direction are voting totals trending for marginal candidates, and are steroids actually to blame?
Every year, the Hall of Fame vote brings a great deal of vitriol to baseball. With each year’s ballot, we are confronted by the specter of the steroid era, always a sore subject. But even neglecting the steroid era candidates, the BBWAA voters manage to produce a handful of idiotic ballots, defended with harebrained rationales, sometimes leading to obvious omissions.
It would be easy to pin the Hall’s recent mismanagement solely on the steroid issue, but the problems do not stop there. There’s a clear backlog of players that’s been developing for more than ten years, leaving deserving stars (with no steroid evidence against them) like Tim Raines and Curt Schilling on the outside. The situation is especially dire for pitchers, where the voters seem to rely upon outdated benchmarks like 300 wins, which even the best modern pitchers simply cannot hope to reach. This failure is through no fault of their own—pitcher usage patterns and injuries have changed the game. Clinging to the old milestones has the effect of artificially increasing the standards for induction, so that only the most inner-circle, obvious Hall players can make it.
The “fastball hitter” is one of the oft-repeated archetypes in baseball. This is the notion of a hitter who can strike fastballs just fine, but struggles to deal with the unpredictability of a breaking ball. A simple search of Baseball Prospectus’ archives reveals 64 results for “fastball hitter”; Google, surveying the entirety of the internet, pulls down more than 10,000. Like many of baseball’s finest tropes, the fastball hitter has even been enshrined in cinema lore.
Beyond movie characters and conventional wisdom, it seems plausible that some batters might have more difficulty recognizing or mentally adjusting for the break of a curve, for instance. Curves and sliders, in particular, not only possess the capacity to slice through the air horizontally but are also often said to create visual illusions in the minds of hitters.
The error spectrum of projections shows the limitations of analysis, or the progress we can still make.
It’s around this time of year that projection engines are being tweaked, updated, and improved in anticipation of the release of new predictions for the coming season. At Baseball Prospectus, Rob McQuown is hard at work ironing out the kinks for this year’s release of PECOTA. Given the present focus on predictions, the time is ripe for a retrospective look at how the projections fared last year.
There’s no better source for a large-scale comparison of projection algorithms than Will Larson’s Baseball Projection Project, which I will use for this article. Larson’s page houses the old predictions of as many different sources as he can get his hands on, including methods as diverse as Steamer, the Fan Projections at FanGraphs, and venerable old Marcel. It’s a rich storehouse of information concerning the ways in which we can fail to foresee baseball.
Constructing a leaderboard that passes the smell test.
I’ve recently written about the role and value of plate discipline in hitting. I concluded the last article with the takeaway that plate discipline, while undoubtedly important in hitting, was not fully separable from the other attributes of a hitter. In searching for a complete per-pitch way of evaluating hitters, we have to account for the entire package of skills, because all of the skills interact with each other. So we have to go back to basics.
Despite being athletically impossible, hitting is theoretically simple. Every hitter is confronted on every pitch with a choice: to swing, or to take. A take is valuable when the pitch is likely to be a ball; depending on the count, you can get a walk, or at least advance the count in a favorable direction. If you swing here, you both lose the benefit of the called ball, and also risk whiffing on the pitch or making weak contact. On the other hand, when the pitch is thrown over the middle of the plate and is thus likely to be called a strike, the better choice is to do your best to make contact. If you take, you get a strike, and lose the opportunity of a hittable pitch.
Turning a smarter, better plate discipline measure into a leaderboard--and examining what it means.
Last week I wrote about how thinking about the zone in a probabilistic way could inform a better approach to plate discipline. In brief, I wrote that the zone as a discrete box above the plate does not exist. In its place, we can judge each pitch according to the probability that it will be called a strike, building into our estimate some of the factors which we know change the geometry of the zone. Examining plate discipline in this fashion proved illuminating, not to mention predictive of walk rates.
There’s a further step we can take with this probabilistic zone, which is to bring in linear weight information in order to judge the actions of each batter. In so doing, we can get an idea of the value of a hitter’s decision-making translated into the fundamental currency of baseball, runs.
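One way to picture the combination of a probabilistic zone with run values is to weight the outcomes of a take by the chance the pitch is called a strike. The sketch below does only that; the run values are placeholder numbers rather than real linear weights, the function name is invented, and the article’s actual method is not shown in this excerpt.

```python
# Hypothetical run values for a taken pitch (placeholders, not real linear weights).
RUN_VALUE_BALL = 0.06
RUN_VALUE_CALLED_STRIKE = -0.07


def expected_take_value(strike_probability):
    """Expected run value of taking a pitch, given its called-strike probability."""
    if not 0.0 <= strike_probability <= 1.0:
        raise ValueError("strike_probability must be between 0 and 1")
    return (strike_probability * RUN_VALUE_CALLED_STRIKE
            + (1.0 - strike_probability) * RUN_VALUE_BALL)


# Taking a pitch well off the plate versus one over the middle.
for probability in (0.05, 0.50, 0.95):
    value = expected_take_value(probability)
    print(f"P(strike)={probability:.2f} -> expected take value {value:+.4f} runs")
```

A fuller treatment would also need an expected value for swinging and would let the run values depend on the count; both are left out here to keep the sketch short.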
In the sortables section of Baseball Prospectus, there is a report called Batter Plate Discipline. If you’re trying to get a handle on how good hitters are at reacting to balls and strikes, this section contains measurements on such things as swing and contact rates. A natural way to divide such rates is based on the strike zone: A swing at a pitch inside the zone is a different event than at one outside the zone. A whiff on a pitch middle-middle is a disparate event from a whiff on a pitch way outside, so it makes sense to tabulate them in different columns. There is a problem with this dichotomy, however: There is no strike zone.
In the words of Michael Lopez (who borrowed in turn from Bobby Ojeda), the strike zone is a unicorn. By this I mean not that the strike zone does not exist at all, but rather that it does not exist in the way that Major League Baseball defines it. The rulebook definition is a rectangular solid hanging in space, with infinitesimally thin boundaries which, once touched, trigger strike calls.
You've seen it in a hundred chyrons: The Giants do well against fastballs that come in faster than 95 mph. But the stat is nonsense. Is the idea behind it nonsense, too?
One of the statistics bandied about with great frequency in the World Series coverage has been the Giants' collective proficiency against the fastest of fastballs (typically defined as more than 95 mph). On several occasions, broadcasters have mentioned that the Giants hitters do well against these pitches, both as a team and with reference to particular individual players. The tenuous conclusion to be drawn from these statistics is that the Giants will continue to do well against the blistering heat, including those fastballs wielded by such prominent Royals as Yordano Ventura and the Reliever Triumvirate.
As many have noted, the stats as presented on the broadcast are terrible, for a bevy of reasons. We can start with batting average, which I probably don’t have to tell you is not a very good index of a hitter’s skill or outcomes. We’d like a better metric, ideally something that included the value of plate discipline (walks are valuable, too!).
After months of moving downward, the October strike zone is suddenly rising.
Everybody’s been writing about the strike zone recently, and that’s for good reason. The strike zone is evolving, and for the first time in the history of baseball, we have the technology to directly record that evolution. Mostly, the bottom of the strike zone is dropping, and that plays some role in shaping the current pitching-dominated era (although exactly how much of a role is a matter of some debate).
What’s most astonishing about the strike zone’s changing definition is the rapidity with which we are witnessing the results. Year after year, the strike zone falls, and this year has been no exception. In this recent article, Jon Roegele chronicles the most dramatic drop in the bottom of the strike zone yet: In the last year, the zone’s real estate has increased by 16 square inches. But even without a rigorous statistical analysis of the zone, you could feel the impact of the strike zone’s accelerating fall in the numerous strikeout records which have been broken, and in the historic seasons of Clayton Kershaw and other pitchers.
When umpires don't call balls and strikes the way we expect them to, who suffers?
One of the emerging storylines of the postseason so far has been inconsistency in the strike zone. That’s not unique to this postseason, of course; every year sees its share of poor calls, and the effect of those calls is magnified when so much is on the line. Whereas a missed strike may be objectionable in the regular season, it can (at worst) alter the outcome of one game out of 162. Missed calls in the postseason, on the other hand, can end seasons.
As a result, every bad call an umpire makes is scrutinized to a much greater degree. When an umpire’s zone is off, whether poorly defined or merely inconsistent, whole legions of fans can flood the internet with vitriol. Generally, an umpire who’s doing a bad job of calling balls and strikes won’t favor the fortunes of one team or the other. But it is frustrating, as a fan, to see a beleaguered slugger’s bat taken out of the game on a borderline call, as happened to Matt Kemp recently.
Last week, we found something odd about the way the league started pitching a slumping Oakland. This week, we might have found the reason.
In last week’s column, I shared an observation I’d made about the pitch selection used against the Oakland Athletics. I found that sometime midway through the season, opposing pitchers had subtly but noticeably altered the frequency at which they threw fastballs. Suddenly, the Oakland A’s were being approached with a reduced number of heaters*: