PECOTA Pitching Forecast Glossary and Reference

2004 Forecast is a representation of the pitcher's expected performance in the upcoming season at various levels of probability. For example, if a pitcher's 75th percentile ERA forecast is 3.52, this indicates that he has a 75% chance of posting an ERA of 3.52 or higher, and a 25% chance of posting an ERA less than 3.52. Higher percentiles indicate more favorable outcomes. PECOTA runs a series of regressions within the set of comparable data in order to estimate how changes in peripheral statistics are related to changes in ERA. For example, if it first estimates that Woody Williams will produce an ERA of 4.29 next year, it then tries to determine what strikeout rate, walk rate, and so on are most likely to be associated with a 4.29 ERA season. A player's 2004 numbers are adjusted to the park and league context for the team listed at the top of the forecast page. Park factors are based on a three-year average over the period 2001-2003, except for teams that have changed ballparks. In addition, the pitcher forecasts include an adjustment for team defense, which affects the pitcher's H/BIP. PECOTA forecasts playing time (games and innings pitched) in addition to a player's rate statistics. These forecasts are based on a player's previous record of performance, and do not incorporate any additional information about managerial decisions.

Attrition Rate is the percent chance that a pitcher's batters faced will decrease by at least 50% relative to his Baseline. Although it is generally a good indicator of the risk of injury, Attrition Rate will also capture seasons in which his playing time decreases due to poor performance or managerial decisions.

The Baseline forecast, although it does not appear here, is a crucial intermediate step in creating a pitcher's forecast. The Baseline is developed from the player's previous three seasons of performance. Both major league and (translated) minor league performances are considered.
The Baseline forecast is also significant in that it attempts to remove luck from a forecast line. For example, a pitcher with an unusually low EqHR9 rate, but poor peripherals otherwise, is likely to have achieved the low EqHR9 partly as a result of luck; the Baseline attempts to correct for this. In addition, the Baseline corrects for large disparities between a pitcher's ERA and his PERA, and for an unusually high or low hit rate on balls in play, both of which are highly subject to luck.

Breakout Rate is the percent chance that a pitcher's PERA will improve by at least 20% relative to his Baseline. High breakout rates are indicative of upside risk.

Collapse Rate is the percent chance that a pitcher's PERA will increase by at least 25% relative to his Baseline. High collapse rates are indicative of downside risk.

Comparable Pitchers are the backbone of a pitcher's PECOTA. Only the twenty best comparables are listed here, but as many as 100 players may be used in the generation of his forecast if they are sufficiently comparable. PECOTA compares each pitcher against a database of roughly 10,000 pitcher seasons since World War II. Pitchers are compared only against others of the same age. PECOTA considers four broad categories of attributes in determining comparability:
In most cases, the database is large enough to provide a meaningfully large set of appropriate comparables. When it isn't, the program is designed to 'cheat' by expanding its tolerance for dissimilar players until a reasonable sample size is reached. In the case of very old or very young pitchers, there may not be a significant number of pitchers who appeared in the major leagues at all at that age, and so the results of their forecast may be unreliable.

Comparable Year represents the season analogous to 2004 for a comparable pitcher. For example, if Luis Tiant is listed as a comparable, and the year listed next to his name is 1974, Tiant's 1974 is used as a component of the pitcher's forecast. It also indicates that Tiant's Baseline performance entering the 1974 season was similar to the Baseline performance of the player in question. PECOTA constructs a 182-day interval on either side of a player's birthdate in order to match ages; this method is more precise than the Bill James similarity scores, which use a player's age as of July 1.

Diagnostics are a series of metrics designed to estimate the probability of certain types of changes in production and playing time; see the individual entries for additional detail.

Drop Rate is the percent chance that a pitcher will not face any major league batters in a given season, based on comparables who disappear from the dataset entirely. Because of the conventions PECOTA uses in selecting comparables, the Drop Rate is always assumed to be zero for 2004, but it is an important consideration in a pitcher's Five-Year Forecast.

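
The 182-day age window described under Comparable Year can be illustrated with a short sketch. The glossary does not say how the window is applied, so the version below assumes that ages are measured in days at a fixed reference date (July 1, chosen here only for illustration) and that a comparable season qualifies when the two ages differ by at most 182 days. The function names are hypothetical and are not part of PECOTA.

```python
from datetime import date

WINDOW_DAYS = 182  # half-width of the age window stated in the glossary


def age_in_days(birthdate, season_year, ref_month=7, ref_day=1):
    """Age in days at a reference date in the given season (assumed: July 1)."""
    return (date(season_year, ref_month, ref_day) - birthdate).days


def is_age_match(target_birth, target_season, comp_birth, comp_season,
                 window=WINDOW_DAYS):
    """True if the comparable's age in comp_season is within `window` days
    of the target pitcher's age in target_season."""
    gap = (age_in_days(target_birth, target_season)
           - age_in_days(comp_birth, comp_season))
    return abs(gap) <= window
```

With hypothetical birthdates of March 15, 1975 for the pitcher being forecast and April 1, 1945 for a comparable, `is_age_match(date(1975, 3, 15), 2004, date(1945, 4, 1), 1974)` is true, while the comparable's 1973 and 1975 seasons fall outside the window. Because the window is narrower than a full year, at most one season per comparable can qualify under this reading.
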
EqERA, EqH9, EqHR9, EqBB9 and EqK9 are calibrated to an ideal major league with the following characteristics:

EqERA = 4.50
EqH9 = 9.0
EqHR9 = 1.0
EqBB9 = 3.0
EqK9 = 6.0

While a major league pitcher's equivalent stats should not differ substantially from his actual numbers, a minor league pitcher's equivalent stats undergo translation and may differ significantly. Equivalent stats also adjust for park effects.

The ERA Distribution chart displays a pitcher's ERA forecast at various levels of probability (see discussion). It progresses in sequential intervals of five percentage points, ranging from a pitcher's 95th percentile forecast on the left, to his 5th percentile forecast on the right. In addition to the probability distribution for a given pitcher, which appears in blue, the chart also includes a normal distribution of ERA for all pitchers in the league, as adjusted to the player's current park and league context ("Norm"), and a dashed line representing the performance of a replacement level pitcher ("Replace").

The Five-Year Forecast presents a series of high-level measurements designed to analyze a pitcher's value over the forthcoming five seasons. It is derived from the same set of comparables used to generate his 2004 forecast, and assumes that the pitcher remains with the same team and in the same league over the entire five-year period. The Five-Year Forecast consists of three parts:

H/BIP is an estimate of the percentage of balls in play that result in non-HR hits. This ratio is mostly the result of team defense and luck, and has a strong tendency to regress to the mean in forthcoming seasons. In 2002, the league-wide H/BIP was 28.9% in the American League, and 28.8% in the National League. However, the pitcher's H/BIP is allowed to diverge from league norms based on his performance history and his team defense.

Historical Stats are the player's previous three seasons of performance as they appear in Baseball Prospectus 2004. ERA and all columns to the left of it are raw statistics, while EqH9 and all columns to the right of it are translated statistics.

Improvement Rate is the percent chance that a pitcher's PERA will improve at all relative to his Baseline. A pitcher who is expected to perform just the same as he has in the past will have an Improvement Rate of 50%.

PERA is a pitcher's ERA as estimated from his peripheral statistics (EqH9, EqHR9, EqBB9, EqK9). Because it is not sensitive to the timing of batting events, PERA is less subject to luck than ERA, and is a better predictor of ERA going forward than ERA itself. Like the rest of a pitcher's equivalent stats, his PERA is calibrated to an ideal league with an average PERA of 4.50.

Percentile. See 2004 Forecast.

The Player Profile is a chart that evaluates a pitcher's performance in four categories: strikeout rate, walk rate, home run allowed rate, and hit rate on balls in play. The rates are presented as a percentile compared to all major league pitchers; for example, a player with a strikeout rating of 75% is superior in this category to three-quarters of all major leaguers. The Player Profile is based on the player's three previous seasons of performance, rather than his projection. Note also that the denominator for strikeout rate, walk rate, and home run rate as presented in the Player Profile is not innings pitched, but batters faced.
This calculation is somewhat more accurate, as pitchers differ in the number of batters they face per inning based on their on-base average allowed.

Similarity Index is a composite of the similarity scores of all of a pitcher's comparables. Similarity Index is a gauge of the pitcher's historical uniqueness; a pitcher with a score of 50 or higher has a very common typology, while a pitcher with a score of 20 or lower is historically unusual. For pitchers with a very low Similarity Index, PECOTA expands its tolerance for dissimilar comparables until a meaningful sample size is established (see discussion).

Similarity Score is a relative measure of a pitcher's comparability. Its scale is very different from the Bill James similarity scores; a score of 100 is assigned to a perfect comparable, while a score of 0 represents a pitcher who is not meaningfully similar. Pitchers can and frequently do receive negative similarity scores, and they are dropped from the analysis. A score above 50 indicates that a pitcher is substantially comparable, and scores in excess of 70 are very unusual. The comparable pitcher observations are weighted based on their similarity score in constructing a forecast.

Stuff is a metric created by Clay Davenport which is intended to measure those elements of a pitcher's line that are best correlated with his success going forward. The formula for Stuff is as follows:

Stuff = EqK9 * 6 - 1.333 * (EqERA + PERA) - 3 * EqBB9 - 5 * EqHR9 - 3 * MAX{(6 - IP/G), 0}

Note that the last term is an adjustment for innings pitched per game, which reflects that a pitcher will generally post a higher Stuff score in appearances of shorter duration. In addition, the Stuff score includes an adjustment for age that is not reflected in the formula above.

Trend identifies comparable pitchers who demonstrate dramatic changes from their Baseline during their comparable year. Trend is designed to correspond to a pitcher's Breakout and Collapse Rates.
Comparables who improve their PERA by at least 20% are identified by a green, upward-pointing arrow and contribute to a pitcher's Breakout Rate; comparables whose PERA increases by at least 25% are identified by a red, downward-pointing arrow and contribute to a pitcher's Collapse Rate.

The Value Distribution chart plots a pitcher's wins above replacement at various levels of probability (see discussion). It accounts for both the quantity and the quality of his expected performance.

VORP, created by Keith Woolner, is an estimate of a pitcher's value over and above a replacement player, as measured in runs. Because it accounts for both quantity and quality of a pitcher's performance, it is the single best measure for assessing his value. Runs allowed, rather than earned runs allowed, are used in the calculation of VORP.

The Weighted Mean forecast incorporates all of the player's potential outcomes into a single average (see also 2004 Forecast), with an additional adjustment for playing time. In almost all cases, poor performances are associated with a reduced number of innings pitched. For that reason, they don't hurt a player's team quite as much as good performances help it; the weighting is designed to compensate for this effect.

Wins is a conversion of a pitcher's VORP to wins added over a 162-game season, based on a version of the Pythagorean formula.
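
The Stuff formula and the Trend thresholds given above are concrete enough to sketch in code. The version below is an illustration only: it omits the age adjustment to Stuff (which the glossary does not specify), it reads "improve by at least 20%" as a PERA at least 20% lower than Baseline, and it assumes that similarity scores are used directly as weights when turning Trend labels into a rate. All function names are hypothetical.

```python
def stuff_score(eq_k9, eq_bb9, eq_hr9, eq_era, pera, ip_per_game):
    """Stuff per the glossary formula, without the unspecified age adjustment."""
    # Penalty for outings shorter than six innings; zero for six or more.
    short_outing_penalty = 3 * max(6 - ip_per_game, 0)
    return (eq_k9 * 6
            - 1.333 * (eq_era + pera)
            - 3 * eq_bb9
            - 5 * eq_hr9
            - short_outing_penalty)


def trend_label(baseline_pera, comparable_year_pera):
    """Classify a comparable's season relative to his Baseline PERA."""
    change = (comparable_year_pera - baseline_pera) / baseline_pera
    if change <= -0.20:   # PERA at least 20% lower: green arrow
        return "breakout"
    if change >= 0.25:    # PERA at least 25% higher: red arrow
        return "collapse"
    return "none"


def weighted_rate(labels, weights, target):
    """Share of total weight carried by comparables with the target label.

    Assumption: weights are the comparables' similarity scores, used as-is.
    """
    total = sum(weights)
    hits = sum(w for label, w in zip(labels, weights) if label == target)
    return hits / total
```

As a sanity check, a pitcher who exactly matches the ideal league (EqK9 of 6.0, EqBB9 of 3.0, EqHR9 of 1.0, EqERA and PERA of 4.50) and averages six innings per game scores roughly 10 under this formula, and the same line at one inning per game scores 15 points lower because of the last term.
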

Copyright © 1996-2004 Prospectus Entertainment Ventures, LLC.