September 22, 2009
Solving the Rookie Dilemma, Part 2

In my previous article, I modeled the choice between a stable veteran and a promising but unproven rookie as a "multi-armed bandit" problem. To solve it, I devised an algorithm that computes the expected value of each choice and discovered that provisionally starting the rookie usually yields a greater expectation. The intuition behind this result is that it's usually worth investing a few plate appearances in the hopes that a rookie might be highly productive, even if that is improbable according to his PECOTA projection; after all, the veteran can always replace him if the experiment fails.

But what constitutes failure? In other words, how long should the rookie be allowed to struggle? To answer this question, we'll need to go under the hood of the model I described the first time out. In the model, the rookie's productivity is described by a probability distribution on his OBP based on PECOTA's projections. The probability distribution in question is a beta; in order to understand the methodology I am about to describe, we'll need to grasp the basics of this distribution.

The beta ranges over the interval [0, 1] and is defined by two positive parameters (α, β). Its mean is α/(α+β); thus, if α goes up, the mean goes up, while if β goes up, the mean goes down. Also, the greater α and β are, the more "peaked" the distribution will be around this mean. Let's see how this looks:
The red curve is a beta(10, 30), whose mean is 10/(10+30) = .25, while the yellow is a beta(30, 10), whose mean is 30/(30+10) = .75. The green curve is a beta(20, 20), while the blue curve is a beta(100, 100); both have a mean of .5, but the latter is much narrower because it has larger parameters.

The purpose of this digression has been to introduce the concept of updating, which refers to adjusting a probability distribution based on new data. The beta is very convenient for this purpose, as it is easily updated by incrementing its parameters. Specifically, after a "success" we increment α, raising the mean, while after a "failure" we increment β, lowering the mean. In either case, the distribution becomes narrower, as the new data give us a better idea of where the true value must lie.

We can use updating to determine how long a rookie should be allowed to struggle. Let's return to the cases of Matt Wieters and Cameron Maybin. As of this writing, Wieters has racked up 92 successes (H + BB + HBP) and 198 failures (outs). His original OBP distribution based on PECOTA's preseason projections was a beta(107, 166); we can update this in light of his major league performance to a beta(199, 354). The mean of this distribution is .359, which is lower than his original mean of .392, but still higher than Gregg Zaun's 2006-2008 aggregate OBP of .348. It therefore seems that the Orioles are justified in keeping Wieters in the lineup based on expected productivity alone. Plugging Wieters' new distribution into the algorithm I described in the first part produces a "breakeven" OBP of .365 (assuming 100 remaining PA in the season), only .006 higher than his new mean. Recall that the original distribution produced a breakeven OBP of .415, which was considerably higher than the mean of .392; the updating has caused the gap between these two numbers to shrink, as Wieters' MLB performance discounts the extreme upside encoded into his original projection.
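The beta arithmetic described above is easy to check numerically. What follows is a minimal sketch in Python using only the standard library; the function names (`beta_mean`, `beta_sd`, `update`) are illustrative choices rather than anything from the original model, and the 1-for-4 game applied to the green curve is a hypothetical example. The standard deviation uses the textbook beta variance, αβ / ((α+β)²(α+β+1)).

```python
from math import sqrt

def beta_mean(alpha, beta):
    """Mean of a beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

def beta_sd(alpha, beta):
    """Standard deviation of a beta(alpha, beta) distribution."""
    total = alpha + beta
    return sqrt(alpha * beta / (total ** 2 * (total + 1)))

def update(alpha, beta, successes, failures):
    """Update by adding successes to alpha and failures to beta."""
    return alpha + successes, beta + failures

# The four curves described in the text.
curves = {"red": (10, 30), "yellow": (30, 10),
          "green": (20, 20), "blue": (100, 100)}
for name, (a, b) in curves.items():
    print(f"{name:>6}: beta({a}, {b})  "
          f"mean={beta_mean(a, b):.3f}  sd={beta_sd(a, b):.3f}")

# Hypothetical example: one 1-for-4 game applied to the green curve.
a, b = update(20, 20, successes=1, failures=3)
print(f"after 1-for-4: beta({a}, {b})  "
      f"mean={beta_mean(a, b):.3f}  sd={beta_sd(a, b):.3f}")
```

The green and blue curves share a mean of .5, but the blue curve's standard deviation is less than half the green's, which is the "peakedness" at work. After the hypothetical 1-for-4 game, the green curve's mean drops below .5 and its spread shrinks slightly.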
As for Maybin, we already know that the Marlins sent him down to the minors on May 11, in no small part due to a lackluster .280 OBP in 95 PA, which breaks down to 26 successes and 69 failures. His original distribution was a beta(88, 166); we update this to a beta(114, 235), which yields a mean of .327 and a breakeven OBP of .334. The latter value is higher than the 2006-2008 aggregate OBP of either Alfredo Amezaga or Cody Ross, the two players who took Maybin's spot in center. Based on this analysis, the decision to demote Maybin may have been hasty. Fortunately, the Marlins did not give up on him for the season; he was promoted at the beginning of September and has excelled (.250/.389/.536).

The actual performances of Wieters and Maybin, though poor, are not bad enough to justify removing them from the lineup. What, then, would it take? Let's consider two cases: first, that the rookie posts a .000 OBP (a la Willie Mays), and second, that he posts a .250 OBP. Supposing four PA per game, we can represent these performances as going 0-for-4 every game and 1-for-4 every game, respectively. After each game, we can update the rookie's probability distribution: in the first case, we increment β by 4 (in light of his four failures), while in the second we increment α by 1 and β by 3 (one success against three failures). After each update, we can plug the new distribution into the algorithm described the first time around; this will generate a new breakeven value. When that value falls below the veteran's OBP, it's time to make the switch. I simulated both cases above for Wieters and Maybin. The results are best depicted graphically:
The breakeven values fall steadily as the rookie performs poorly; the worse he performs, the faster they fall. The critical points are the intersections of the curves and the horizontal lines; this is where the breakeven values cross the OBP of the veteran. For Wieters, this happens after 52 PA (13 games) of .000 OBP and 136 PA (34 games) with a .250 OBP. For Maybin, it's 37 PA (9 games) and 119 PA (30 games).

When interpreting these results, it's important to note that they apply "in a vacuum," ignoring other important considerations. For example, if Wieters goes 1-for-4 every night but each of those hits is a home run, Dave Trembley would be crazy to bench him. Conversely, if Maybin were 0-for-25 and scouts observed that he looked completely lost at the plate, the Marlins might be wise to send him to the minors even though this analysis suggests he be given at least 12 more plate appearances. Furthermore, a player's defense alone may be sufficient grounds to keep him in the lineup or remove him.

These limitations notwithstanding, the results suggest that teams not hastily give up on a promising rookie who gets off to a slow start, especially if the organization and its scouts are confident in his eventual success. Recall the numbers for Maybin, a fairly typical prospect: according to my analysis, it would still have been profitable to leave him in the lineup if he had started the season 30-for-119. In practice, teams regularly send rookies with better numbers in fewer plate appearances down to the minors. Such a knee-jerk reaction to a small sample hurts a team by reducing its expectation over the course of the season. Generally speaking, teams faced with an underperforming rookie should be patient unless they have very good reason to believe that he will continue to struggle.
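The game-by-game updating behind these results can be sketched in a few lines of Python. This is a simplified illustration, not the simulation itself: the breakeven algorithm from the first part is not reproduced here, so the sketch tracks the updated mean α/(α+β) as a stand-in and stops when that mean falls below the veteran's OBP. Because the breakeven value sits above the mean, the switch points this produces come earlier than the 52 PA and 136 PA reported for Wieters. The function name `games_until_mean_drops`, the `max_games` cap, and the choice to start from Wieters' original beta(107, 166) are assumptions made for the example.

```python
def games_until_mean_drops(alpha, beta, successes_per_game,
                           failures_per_game, veteran_obp, max_games=500):
    """Count games until the updated beta mean falls below veteran_obp.

    Stand-in criterion: compares the posterior MEAN with the veteran's
    OBP, not the breakeven value computed by the article's algorithm.
    Returns None if the mean never drops within max_games.
    """
    games = 0
    while alpha / (alpha + beta) >= veteran_obp:
        if games >= max_games:
            return None
        alpha += successes_per_game   # successes raise alpha
        beta += failures_per_game     # failures raise beta
        games += 1
    return games

# Assumed starting point: Wieters' original beta(107, 166),
# measured against Zaun's .348 OBP, at four PA per game.
zero_obp = games_until_mean_drops(107, 166, 0, 4, 0.348)     # 0-for-4 nightly
quarter_obp = games_until_mean_drops(107, 166, 1, 3, 0.348)  # 1-for-4 nightly

print(f".000 OBP: mean drops below .348 after {zero_obp} games "
      f"({4 * zero_obp} PA)")
print(f".250 OBP: mean drops below .348 after {quarter_obp} games "
      f"({4 * quarter_obp} PA)")
```

Under these assumptions the mean crosses .348 after 9 games (36 PA) of .000 OBP and 31 games (124 PA) of .250 OBP. Swapping the mean for the breakeven value is what pushes those points out to the later ones quoted above.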
Dan Malkiel is an author of Baseball Prospectus. 9 comments have been left for this article.

One problem with your analysis is that the distribution given by PECOTA is the distribution of possible outcomes over a season (it is based on the full season performance of comparables, including good or bad luck), not the distribution of uncertainty as to true ability. To update a projection of a prospect's performance going forward, you first estimate the variance about true ability by subtracting the "luck" component contained in the PECOTA (the binomial variance based on projected OBP and plate appearances) from the total variance in the projection. You then find the beta distribution that fits the remaining variance (it will have a larger alpha and beta) and use that as your prior distribution to be updated by actual performance.
By the way, if you want to consider offensive characteristics beyond OBP, use EqA instead. It's technically incorrect because it's not based on binary outcomes but in practice it makes very little difference: calculate the number of successes as EqA * PA and add it to alpha and add (1 - EqA) * PA to beta.
Or, use wOBA. (As heretical a suggestion as that may be.)
There's no PECOTA for wOBA
You make a good point about the distinction between the projection of a player's "true ability" and that of his performance over the course of a season. It's the latter, however, in which we're interested; the possibility that a rookie will get lucky and deliver an OBP much higher than his "true ability" is the reason that sticking with him often leads to a greater expectation for the season.
This analysis could be done using EqA; OBP, however, is a better fit for the model and a good metric for offensive performance in its own right.