April 9, 2010
Checking the Numbers
Few times of the year are as anticipated as the start of spring training. Fans have been without Major League Baseball for five months, and while exhibition games matter little in the grand scheme of a season, they offer a reminder that meaningful action is right around the corner. Unfortunately, spring training games also serve as the impetus for a litany of articles that use murky data to make points, or that focus on how generally meaningless the numbers are before throwing out a "having said that…" reversal. We might read that spring numbers don’t mean much, especially for pitchers, but that Fausto Carmona’s K/BB ratio in 12 innings is legit. Or that power numbers are suspect in the spring, but Abraham Nunez’s (the other one) nine blasts and .981 slugging percentage were a clear portent of solid production around the corner for the 2004 Marlins.
Additionally, spring numbers are often used for the sake of convenience in confirming a preformed opinion. If Ryan Howard hits .220/.300/.380, he’ll be fine because he’s Ryan Howard. If Cole Hamels posts a 4.88 ERA in 24 innings, however, there is still "something wrong with him." Parsing utility out of spring training stats is incredibly difficult and largely futile, but that does not mean spring training itself is useless. Decisions can still be influenced by activity on the field, even if slavishly quoting the numbers is a waste of time. Today we’ll explore the psychology of synthesizing spring training data and look at what front offices and organizations actually seek in these Grapefruit League and Cactus League games.
What is Spring Training?
Spring training is that magical time of the year when players congregate in warm-weather cities to fine-tune their muscle memory, work on various aspects of their game, and spew clichés in droves that are brushed aside given the level of excitement stemming from baseball merely being back. Most of the players have not played competitively in months, instead focusing on getting into the "best shape of their life," and use the practices and exhibition games to get back into the swing of things (pun completely intended). While games are certainly played and nobody likes to lose, players tend to spend the majority of their time honing specific aspects of their game; a pitcher might look to add a new pitch, while a scrappy hitter might try to master the art of sacrifice bunting. In other words, winning is not the chief objective.
If baseball analysis is meaningful when the games themselves are, it stands to reason that non-meaningful games produce non-meaningful statistics. Did anyone really think Nunez was going to bop 50 blasts in 2004 or conjure up a Ruthian campaign? Sure, this is an extreme example of a performance drawing skepticism, but in many non-extreme circumstances, fans contradict themselves: they will agree that the numbers don’t matter, then form opinions based on those same numbers. Why does this occur?
The Mind of the Matter
Numbers amassed in late February and March are littered with contextual flaws. One day, Johan Santana might face the Michigan Wolverines while working on his slider, and the next day Yohan Flande has to deal with a full-fledged Yankees lineup. The swings in quality of competition are far wider than anything seen over the course of a major-league season, when everyone does whatever they can to win, and as bad as certain teams might be, they can still school collegiate rosters. In spite of the problems with context, what occurs in spring training represents the first new information to become available and the only data from the current year, which makes it extremely difficult to simply brush aside or normalize in your head.
But even if the contextual issues were all resolved, we would still have to deal with sampling problems. If Kyle Kendrick and Jamie Moyer enter the spring vying for the same spot in the Phillies' rotation and produce numbers at opposite extremes (and we assume the competition they faced and the quality of defense behind them were equal), the lack of certainty inherent in 15 or so frames renders most of the numbers moot.
Even if Kendrick finished the spring with a 0.50 ERA to the 14.62 mark of Moyer, we cannot be sure that the former is better than the latter. Streaks like this take place all the time over the course of a six-month regular season, but this is conveniently forgotten when a tangible anchoring point can be traced. This is why many people begin workout regimens or diets on a Monday, or the first of the month as opposed to a random Wednesday the 12th, and why stats produced in April often influence a fan’s opinion of a player for that entire season.
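To put a rough number on how little 15 innings can say, here is a minimal sketch in Python. It assumes, purely for illustration, that the earned runs a pitcher allows follow a Poisson distribution whose mean is set by his true ERA. That model, the 4.50 true ERA, and the function names are all assumptions of this sketch, not anything supplied by a team or drawn from real data.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events when `mean` events are expected."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_runs_at_most(max_runs: int, true_era: float, innings: float) -> float:
    """Chance of allowing max_runs or fewer earned runs over `innings`.

    Assumption: earned runs are Poisson with mean true_era * innings / 9.
    """
    mean = true_era * innings / 9.0
    return sum(poisson_pmf(k, mean) for k in range(max_runs + 1))


def observed_era(runs: int, innings: float) -> float:
    """Earned run average implied by `runs` earned runs in `innings`."""
    return 9.0 * runs / innings


if __name__ == "__main__":
    true_era, innings = 4.50, 15  # hypothetical pitcher and spring workload
    lucky = prob_runs_at_most(5, true_era, innings)          # 5 runs or fewer
    unlucky = 1.0 - prob_runs_at_most(9, true_era, innings)  # 10 runs or more
    print(f"ERA <= {observed_era(5, innings):.2f}: {lucky:.1%}")
    print(f"ERA >= {observed_era(10, innings):.2f}: {unlucky:.1%}")
```

Under these assumptions, the same 4.50-ERA pitcher finishes 15 innings at 3.00 or better roughly a quarter of the time, and at 6.00 or worse a bit more than a fifth of the time. Real run scoring is likely clumpier than a Poisson model allows, so the true spread is probably wider still.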
But it is so incredibly tough to block the numbers out, especially when there is seemingly nothing else to go on. If Kendrick and Moyer posted the numbers above and the Phillies went with Moyer in their rotation while demoting Kendrick to the minor leagues or relegating him to the bullpen, many would be confused, and understandably so. And the brass would likely have to answer questions from fans and the press as to the basis of its decision. How can you choose someone with an ERA nearly thirty times worse? Again, this is an extreme example, but the waters would be far muddier in comparisons between pitchers with ERAs of 3.75 and 4.80, where it might not be clear which pitcher performed better.
We know the numbers don’t mean much, but they are all we have to go on, and from a front office perspective, decisions still need to be made. How can a team make a decision based on what happens in the spring—uncertainty—when such decisions normally require certainty?
Teams and Spring Training
I spoke to a few different front offices to get a sense of their views on spring numbers as well as what they look for in the exhibition games and workouts. The consensus seemed to be that what happens in the spring (fight… urge… to say… stays in the spring) can be important, but only when viewed through context-tinted lenses, and that the qualitative aspects irrefutably matter more. A discussion with one of these sources helped put things in perspective:
"We go into the spring with a baseline evaluation of each individual player and his role within the club and organization. During the spring, we try to measure each player against this baseline. The lower that initial valuation or our certainty in regards to that evaluation, the more sensitive we are going to be to that individual’s performance in the spring. The statistics only matter in that they are a reflection of the performance that those performing on field evaluations are seeing."
The last sentence sticks out and is the key to unlocking spring stats: the numbers are going to exist whether they are used or not, but generally, managers, coaches and scouts are going to be able to identify a poor swing or mechanical flaw. Perhaps the field staff feels Moyer has a big problem with his release point that, when fixed, will produce much better results. The teams know that having Jimmy Rollins at shortstop is different than Mike Morse, and that facing Rollins in the batter’s box is much different than, well, Morse in the batter’s box, but it is the qualitative aspects of performance that drive the decisions, numbers or not. The source continued:
"There are four factors we take into consideration in evaluating spring performance: our initial evaluation of the player, the comparison of his spring performance to that baseline, the context in which the performance is derived, and the performances of others competing for the same spots. The other important part is contextualizing what we see: pitchers, for instance, have a great deal of variance in the quality of lineups faced and, by extension, the quality of defenses behind them. This is of utmost importance when dealing with small samples."
I’ve often found that the issue of defense is not discussed as much as the variance in offense, even though it matters a great deal. If Moyer posted his higher hypothetical ERA but played with Adam Dunn clones at every position, the results would have been expected to be worse than if it had been Adam Everett clones roaming the diamond. Perhaps the oversight is due to the skepticism thrown in the direction of defensive metrics to begin with, but it is an interesting tidbit nonetheless.
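One way to picture the source's point about being more sensitive to spring performance when the baseline is less certain is as a weighted average of the baseline and the spring line. The Python sketch below uses a simple shrinkage formula for that purpose. It is an illustration only and not any team's actual method; the function name, the innings-equivalent weights, and the ERA figures are all hypothetical.

```python
def blended_estimate(baseline: float, observed: float,
                     sample_size: float, baseline_weight: float) -> float:
    """Blend a prior evaluation with a new observation.

    baseline_weight expresses confidence in the baseline as an equivalent
    number of innings (hypothetical units): the larger it is, the less the
    spring sample can move the estimate.
    """
    if sample_size < 0 or baseline_weight <= 0:
        raise ValueError("sample_size must be >= 0 and baseline_weight > 0")
    spring_share = sample_size / (sample_size + baseline_weight)
    return baseline + spring_share * (observed - baseline)


if __name__ == "__main__":
    # Established veteran: confident 4.50 baseline, dreadful 15-inning spring.
    veteran = blended_estimate(4.50, 14.62, sample_size=15, baseline_weight=600)
    # Unproven arm: same 4.50 baseline held loosely, sparkling 15-inning spring.
    newcomer = blended_estimate(4.50, 0.50, sample_size=15, baseline_weight=60)
    print(f"veteran estimate:  {veteran:.2f}")
    print(f"newcomer estimate: {newcomer:.2f}")
```

With these made-up weights, the veteran's ugly spring barely moves his estimate, while the same number of innings shifts the loosely held evaluation of the newcomer much more. The choice of weights is the whole game, and it is exactly the part that depends on the qualitative judgment the source describes.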
The treatment of spring training performance described above might not embody the modus operandi of every major-league team, but it likely isn’t too far off. Spring training does matter in the sense that decisions have to be made regarding certain spots in the rotation, bullpen, lineup, or bench, and for certain players it marks the first time their employer sees them up close and personal. But the decisions are rarely, if ever, based on the stats. If a player goes 42-for-47 at the plate, it isn’t the .894 batting average that will earn him a job, but the fluidity of his swing and the potential the coaches see when he comes to the plate.
Numbers are useful when drawn from useful environments. When the environment isn’t beneficial to statistical analysis, it’s best to focus on the qualitative information, be it spring training or individual platoon splits. It’s the beginning of the season and a highlight of the year, for sure, but don’t be shocked if Fausto Carmona posts a 5.4 BB/9 after just two free passes in 26 spring innings, or if John Bowker doesn’t hit 30 home runs after mashing six in March. Their numbers were not fake, but illusory, and focusing on what happens when the games matter is a much better way to go.