In my previous article, I modeled the choice between a stable veteran and a promising but unproven rookie as a “multi-armed bandit” problem. To solve it, I devised an algorithm that computes the expected value of each choice and discovered that provisionally starting the rookie usually yields a greater expectation. The intuition behind this result is that it’s usually worth investing a few plate appearances in the hopes that a rookie might be highly productive, even if that is improbable according to his PECOTA projection; after all, the veteran can always replace him if the experiment fails. But what constitutes failure? In other words, how long should the rookie be allowed to struggle?
To answer this question, we’ll need to go under the hood of the model I described the first time out. In the model, the rookie’s productivity is described by a probability distribution on his OBP based on PECOTA’s projections. The probability distribution in question is a beta; in order to understand the methodology I am about to describe, we’ll need to grasp the basics of this distribution. The beta ranges over the interval [0, 1] and is defined by two positive parameters (α, β). Its mean is α/(α+β); thus, if α goes up, the mean goes up, while if β goes up, the mean goes down. Also, the greater α and β are, the more “peaked” the distribution will be around this mean. Let’s see how this looks:
The red curve is a beta(10, 30), whose mean is 10/(10+30) = .25, while the yellow is a beta(30, 10), whose mean is 30/(30+10) = .75. The green curve is a beta(20, 20), while the blue curve is a beta(100, 100); both have a mean of .5, but the latter is much narrower because it has larger parameters.
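For readers who prefer numbers to curves, here is a minimal Python sketch that computes the mean and standard deviation of the four betas just described. The helper names `beta_mean` and `beta_sd` are my own, chosen for illustration; the formulas are the standard ones for a beta(α, β).

```python
from math import sqrt

def beta_mean(a, b):
    """Mean of a beta(a, b): a / (a + b)."""
    return a / (a + b)

def beta_sd(a, b):
    """Standard deviation of a beta(a, b); shrinks as a + b grows."""
    n = a + b
    return sqrt(a * b / (n * n * (n + 1)))

# The four curves from the figure, keyed by color.
curves = {
    "red": (10, 30),
    "yellow": (30, 10),
    "green": (20, 20),
    "blue": (100, 100),
}

for color, (a, b) in curves.items():
    print(f"{color:>6}: beta({a}, {b})  mean={beta_mean(a, b):.3f}  sd={beta_sd(a, b):.3f}")
```

The green and blue curves share a mean of .500, but the blue curve's standard deviation is less than half the green's, which is the "narrower" shape the figure shows.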
The purpose of this digression has been to introduce the concept of updating, which refers to adjusting a probability distribution based on new data. The beta is very convenient for this purpose, as it is easily updated by incrementing its parameters. Specifically, after a “success” we increment α, raising the mean, while after a “failure” we increment β, lowering the mean. In either case, the distribution becomes narrower, as the new data give us a better idea of where the true value must lie.
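The updating rule is simple enough to sketch in a few lines of Python. The function names and the beta(20, 20) starting point are illustrative assumptions, not part of the model itself.

```python
from math import sqrt

def update(alpha, beta, successes, failures):
    """Update a beta(alpha, beta) by adding observed successes and failures."""
    return alpha + successes, beta + failures

def mean(alpha, beta):
    return alpha / (alpha + beta)

def sd(alpha, beta):
    n = alpha + beta
    return sqrt(alpha * beta / (n * n * (n + 1)))

prior = (20, 20)                        # hypothetical starting distribution
after_success = update(*prior, 1, 0)    # one success: alpha goes up
after_failure = update(*prior, 0, 1)    # one failure: beta goes up

for label, params in [("prior", prior),
                      ("after success", after_success),
                      ("after failure", after_failure)]:
    print(f"{label:>14}: beta{params}  mean={mean(*params):.3f}  sd={sd(*params):.4f}")
```

The success nudges the mean above .500 and the failure nudges it below, while the standard deviation shrinks in both cases.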
We can use updating to determine how long a rookie should be allowed to struggle. Let’s return to the cases of Matt Wieters and Cameron Maybin. As of this writing, Wieters has racked up 92 successes (H + BB + HBP) and 198 failures (outs). His original OBP distribution based on PECOTA’s preseason projections was a beta(107, 166); we can update this in light of his major league performance to a beta(199, 354). The mean of this distribution is .359, which is lower than his original mean of .392, but still higher than Gregg Zaun’s 2006-2008 aggregate OBP of .348. It therefore seems that the Orioles are justified in keeping Wieters in the lineup based on expected productivity alone. Plugging Wieters’ new distribution into the algorithm I described in the first part produces a “break-even” OBP of .365 (assuming 100 remaining PA in the season), only .006 higher than his new mean. Recall that the original distribution produced a break-even OBP of .415, which was considerably higher than the mean of .392; the updating has caused the gap between these two numbers to shrink, as Wieters’ MLB performance discounts the extreme upside encoded into his original projection.
As for Maybin, we already know that the Marlins sent him down to the minors on May 11, in no small part due to a lackluster .280 OBP in 95 PA, which breaks down to 26 successes and 69 failures. His original distribution was a beta(88, 166); we update this to a beta(114, 235), which yields a mean of .327 and a break-even OBP of .334. The latter value is higher than the 2006-2008 aggregate OBP of either Alfredo Amezaga or Cody Ross, the two players who took Maybin’s spot in center. Based on this analysis, the decision to demote Maybin may have been hasty. Fortunately, the Marlins did not give up on him for the season; he was promoted at the beginning of September and has excelled (.250/.389/.536).
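One way to see why 95 poor plate appearances move Maybin's mean so little is to write the updated mean as a weighted average of the preseason mean and the observed success rate. The sketch below uses the Maybin figures quoted above; the variable names are my own, and the weighted-average framing is a standard identity for the beta rather than something taken from the algorithm in the first article.

```python
def posterior_mean(alpha, beta, successes, failures):
    """Mean of the beta after adding the observed successes and failures."""
    return (alpha + successes) / (alpha + beta + successes + failures)

alpha, beta = 88, 166    # Maybin's preseason beta, as given above
s, f = 26, 69            # successes and failures in his 95 PA

n = s + f
weight_on_data = n / (alpha + beta + n)    # share of the estimate driven by the 95 PA
prior_mean = alpha / (alpha + beta)

# Weighted average of the preseason mean and the observed rate.
blended = (1 - weight_on_data) * prior_mean + weight_on_data * (s / n)

print(f"weight on new data: {weight_on_data:.3f}")
print(f"updated mean:       {posterior_mean(alpha, beta, s, f):.3f}")
```

The preseason distribution acts like 254 plate appearances of prior evidence (88 + 166), so the 95 new ones carry only a bit more than a quarter of the weight.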
The actual performances of Wieters and Maybin, though poor, are not bad enough to justify removing them from the lineup. What, then, would it take? Let’s consider two cases: first, that the rookie posts a .000 OBP (a la Willie Mays), and second, that he posts a .250 OBP. Supposing four PA per game, we can represent these performances as going 0-for-4 every game and 1-for-4 every game, respectively. After each game, we can update the rookie’s probability distribution: in the first case, we increment β by 4 (in light of his four failures), while in the second we increment α by 1 and β by 3 (one success against three failures). After each update, we can plug the new distributions into the algorithm described the first time around; this will generate new break-even values. When these values fall below the veteran’s OBP, it’s time to make the switch.
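The game-by-game updating loop can be sketched as follows. One important caveat: the break-even algorithm from the first article is not reproduced here, so this sketch substitutes a simpler stopping rule, namely the point at which the rookie's updated mean drops below the veteran's OBP. Because the break-even OBP sits above the mean, this stand-in pulls the rookie sooner than the full analysis would. The function name and its parameters are hypothetical; the inputs are Wieters' preseason beta(107, 166) and Zaun's .348 OBP.

```python
from fractions import Fraction

def games_until_mean_drops(alpha, beta, veteran_obp, successes_per_game,
                           pa_per_game=4, max_games=200):
    """Count games of an identical nightly line until the updated mean
    falls below the veteran's OBP. Returns None if it never does within
    max_games. NOTE: compares the mean, not the article's break-even OBP."""
    veteran = Fraction(veteran_obp)    # exact value from a string like "0.348"
    for game in range(1, max_games + 1):
        alpha += successes_per_game                  # successes raise alpha
        beta += pa_per_game - successes_per_game     # failures raise beta
        if Fraction(alpha, alpha + beta) < veteran:
            return game
    return None

# Wieters against Zaun, for the two scenarios described above.
for label, successes in [("0-for-4 every game", 0), ("1-for-4 every game", 1)]:
    games = games_until_mean_drops(107, 166, "0.348", successes)
    print(f"{label}: mean drops below .348 after {games} games ({games * 4} PA)")
```

Under this simplified rule the switch comes after 9 games of .000 OBP or 31 games of .250 OBP. Swapping the mean comparison for the break-even calculation would extend both figures.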
I simulated both cases above for Wieters and Maybin. The results are best depicted graphically:
The break-even values fall steadily as the rookie performs poorly; the worse he performs, the faster they fall. The critical points are the intersections of the curves and the horizontal lines; this is where the break-even values cross the OBP of the veteran. For Wieters, this happens after 52 PA (13 games) of .000 OBP and 136 PA (34 games) with a .250 OBP. For Maybin, it’s 37 PA (9 games) and 119 PA (30 games).
When interpreting these results, it’s important to note that they apply “in a vacuum,” ignoring other important considerations. For example, if Wieters goes 1-for-4 every night but each of those hits is a home run, Dave Trembley would be crazy to bench him. Conversely, if Maybin were 0-for-25 and scouts observed that he looked completely lost at the plate, the Marlins might be wise to send him to the minors even though this analysis suggests he be given at least 12 more plate appearances. Furthermore, a player’s defense alone may be sufficient grounds to keep him in the lineup or remove him.
These limitations notwithstanding, the results suggest that teams not hastily give up on a promising rookie who gets off to a slow start, especially if the organization and its scouts are confident in his eventual success. Recall the numbers for Maybin, a fairly typical prospect. According to my analysis, it would still have been profitable to leave him in the lineup if he had started the season 30-for-119. In practice, teams regularly send rookies with better numbers in fewer plate appearances down to the minors. Such a knee-jerk reaction to a small sample hurts a team by reducing its expectation over the course of the season. Generally speaking, teams faced with an under-performing rookie should be patient unless they have very good reason to believe that he will continue to struggle.
By the way, if you want to consider offensive characteristics beyond OBP, use EqA instead. It's technically incorrect because it's not based on binary outcomes but in practice it makes very little difference: calculate the number of successes as EqA * PA and add it to alpha and add (1-EqA)*PA to beta.
This analysis could be done using EqA; OBP, however, is a better fit for the model and a good metric for offensive performance in its own right.
It's an interesting question that you've addressed -- I'm curious to read others' comments and your responses. Thanks.