Glossary: Sabermetric
The number of equivalent runs scored by a team, adjusted for the quality of their opponent's pitching and defense.
The number of equivalent runs allowed by a team, adjusted for the quality of their opponent's offense.
Adjusted Pitcher Wins. Thorn and Palmer's method for calculating a starter's value in wins. Included for comparison with SNVA. APW values here calculated using runs instead of earned runs.
Average Pitcher Abuse Points per game started.
The estimated number of real, nine-inning games played at this position.
Statistics that have been adjusted for all-time have all of the adjustments for a single season, plus two more.
One adjustment normalizes the average fielding numbers over time. Historically, the fielding share of total defense has been diminishing with time - more walks, more strikeouts, and more home runs means less work for fielders. In the single-season adjustments, fielders from before WWII have a lot more value than fielders today; the all-time adjustments have attempted to remove that temporal trend.
The second adjustment is for league difficulty. League quality has generally increased with time. Each league has been rated for difficulty and compared to a trend line defined by the post-integration National League.
In addition to the adjustments for season, an adjustment is made for league difficulty.
Statistics that have been adjusted for a single season are the best stats to use when you are only interested in that one season. In these, adjustments have been made to account for the home park and for the offensive level of the league as a whole. Hitters have an adjustment for not having to face their own team's pitchers; pitchers have a similar adjustment for not having to face their own hitters. Hitters in the AL since 1973 have a disadvantage in these statistics, since the league average is artificially inflated by the use of the DH and no adjustment is made for that.
Attrition Rate is the percent chance that a hitter's plate appearances or a pitcher's opposing batters faced will decrease by at least 50% relative to his Baseline playing time forecast. Although it is generally a good indicator of the risk of injury, Attrition Rate will also capture seasons in which his playing time decreases due to poor performance or managerial decisions.
Batting Average on balls put into play. A pitcher's average on batted balls ending a plate appearance, excluding home runs. Based on the research of Voros McCracken and others, BABIP is mostly a function of a pitcher's defense and luck, rather than persistent skill. Thus, pitchers with abnormally high or low BABIPs are good bets to see their performances regress to the mean. A typical BABIP is about .290.
Batters faced pitching.
Batting runs above average (BRAA), adjusted for league difficulty.
Batting runs above replacement (BRAR), adjusted for league difficulty.
Batting runs above a replacement at the same position. A replacement position player is one with an EqA equal to (230/260) times the average EqA for that position.
The Baseline forecast, although it does not appear here, is a crucial intermediate step in creating a player's forecast. The Baseline is developed based on the player's previous three seasons of performance. Both major league and (translated) minor league performances are considered. The Baseline forecast is also significant in that it attempts to remove luck from a forecast line. For example, a player who hit .310, but with a poor batting eye and unimpressive speed indicators, is probably not really a .310 hitter. It's more likely that he's a .290 hitter who had a few balls bounce his way, and the Baseline attempts to correct for this.
Similarly, a pitcher with an unusually low EqHR9 rate, but a high flyball rate, is likely to have achieved the low EqHR9 partly as a result of luck. In addition, the Baseline corrects for large disparities between a pitcher's ERA and his PERA, and an unusually high or low hit rate on balls in play, which are highly subject to luck.
Approximate number of batting outs made while playing this position.
Bequeathed runs prevented from scoring. Measures how many more or fewer of the bequeathed baserunners subsequent relievers allowed to score than would be expected from league average performance in those situations. I.e., a positive figure means the following relievers kept more of the bequeathed runners from scoring than expected, negative means more of the runners scored than expected.
Breakout Rate is the percent chance that a hitter's EqR/27 or a pitcher's EqERA will improve by at least 20% relative to the weighted average of his EqR/27 or EqERA in his three previous seasons of performance. High breakout rates are indicative of upside risk. Breakout rates measure change relative to a player's previously-established level of performance. For this reason, a high Breakout score can create a falsely optimistic picture for a player who has a very poor performance record. It is far easier for a player with a baseline of 40 EqR per season to improve upon that figure by 20% than it is for a player with a baseline of 100 EqR per season; as a result, the former's Breakout score is likely to be higher (see also Ugueto Effect).
A category 1 start is a start in which the pitcher throws 100 pitches or less.
A category 2 start is a start in which the pitcher throws 101-109 pitches.
A category 3 start is a start in which the pitcher throws 110-121 pitches.
A category 4 start is a start in which the pitcher throws 122-132 pitches.
A category 5 start is a start in which the pitcher throws 133 or more pitches.
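The five pitch-count ranges above can be expressed as a small lookup. This is only an illustrative sketch; the function name `start_category` is an assumption and does not come from the glossary.

```python
def start_category(pitch_count: int) -> int:
    """Map a starter's pitch count to the start category (1-5) defined above."""
    if pitch_count <= 100:
        return 1
    if pitch_count <= 109:
        return 2
    if pitch_count <= 121:
        return 3
    if pitch_count <= 132:
        return 4
    return 5


# Example: a 115-pitch outing falls in the 110-121 range.
category = start_category(115)
```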
For hitters, Collapse Rate is the percent chance that the player's EqR/27 will decrease by at least 20% relative to the weighted average of his EqR/27 in his three previous seasons of performance. For pitchers, Collapse Rate is the percent chance that a pitcher's EqERA will increase by at least 25% relative to his baseline EqERA over his past three seasons. High Collapse Rates are indicative of downside risk.
Comparable Players are the backbone of a player's PECOTA. Only the twenty best comparables are listed here, but as many as 100 players may be used in the generation of his forecast if they are sufficiently comparable.
PECOTA compares each player against a database of roughly 20,000 major league batter seasons since World War II. In addition, it also draws upon a database of roughly 15,000 translated minor league seasons (1997-2006) for players that spent most of their previous season in the minor leagues. (When minor league comparables are used, they appear in ALL CAPS).
PECOTA considers four broad categories of attributes in determining a player's comparability:
1. Production metrics--such as batting average, isolated power, and unintentional walk rate for hitters, or strikeout rate and groundball rate for pitchers.
2. Usage metrics, including career length and plate appearances or innings pitched.
3. Phenotypic attributes, including handedness, height, weight, career length (for major leaguers), and minor league level (for prospects).
4. Fielding Position (for hitters) or starting/relief role (for pitchers).
PECOTA doesn't require that a comparable hitter play the same defensive position; it is a factor that is evaluated along with many others, and assigned a relatively substantial weight. Consideration is also given to the 'similarity' between two positions; for example, a shortstop will be compared to a second baseman before he is compared to a left fielder.
In most cases, the database is large enough to provide a meaningfully large set of appropriate comparables. When it isn't, the program is designed to 'cheat' by expanding its tolerance for dissimilar players until a reasonable sample size is reached.
Comparable Year represents the season analogous to the current projected year for a comparable player. For example, if Dick Allen is listed as a comparable, and the year listed next to his name is 1974, Allen's 1974 is used as a component of the player's forecast. It also indicates that Allen's Baseline performance entering into the 1974 season was similar to the Baseline performance of the player in question. PECOTA constructs a 182-day interval on either side of a player's birthdate in order to match ages; this method is more precise than the Bill James similarity scores, which use a player's age as of July 1.
Delta between actual wins and W1. Positive number means the team has won more games than expected from their statistics.
Delta between actual wins and W2. Positive number means the team has won more games than expected from their statistics.
Delta between actual wins and W3. Positive number means the team has won more games than expected from their statistics.
Defense-adjusted ERA. Not to be confused with Voros McCracken's Defense-Neutral ERA. Based on the PRAA, DERA is intended to be a defense-independent version of the NRA. As with that statistic, 4.50 is average. Note that if DERA is higher than NRA, you can safely assume he pitched in front of an above-average defense.
Def Eff, or Defensive Efficiency, is the rate at which balls put into play are converted into outs by a team's defense. Def Eff can be approximated with (1 - BABIP), if all you have is BABIP, but a team's actual Def Eff is computed with
1 - ((H + ROE - HR) / (PA - BB - SO - HBP - HR))
The Team Audit Standings use the latter formula.
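As a rough sketch, the full Def Eff formula translates directly into code. The function and argument names below are assumptions chosen to mirror the abbreviations in the formula (ROE is reached on error).

```python
def defensive_efficiency(h, roe, hr, pa, bb, so, hbp):
    """Def Eff = 1 - ((H + ROE - HR) / (PA - BB - SO - HBP - HR))."""
    balls_in_play = pa - bb - so - hbp - hr   # denominator: balls put into play
    not_converted = h + roe - hr              # balls in play that did not become outs
    return 1 - not_converted / balls_in_play


# Made-up team totals, for illustration only.
def_eff = defensive_efficiency(h=1400, roe=60, hr=160, pa=6200, bb=500, so=1100, hbp=50)
```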
The number of hits above or below average for this pitcher, based on his own number of balls in play and his team's rate of hits (minus home runs) per ball in play; (H-HR) - BIP * (team (H-HR)/BIP). Essentially, the Voros McCracken number. For a team, Delta-H should be zero. Positive numbers signify more hits allowed than expected ("bad luck," if you believe pitchers have nothing to do with the outcome of a BIP), negative numbers mean fewer hits than expected ("good luck").
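The Delta-H expression, (H-HR) - BIP * (team (H-HR)/BIP), can be sketched as follows. The argument names are assumptions, and the sample totals are invented for illustration.

```python
def delta_h(h, hr, bip, team_h, team_hr, team_bip):
    """Hits allowed above (+) or below (-) what the team's rate on balls in play implies."""
    team_rate = (team_h - team_hr) / team_bip   # team non-HR hits per ball in play
    return (h - hr) - bip * team_rate


# Applying the team's own totals to itself gives zero, as the entry states.
team_check = delta_h(1400, 160, 4400, 1400, 160, 4400)
```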
The number of runs, more or less, that a pitcher allowed, compared to his statistics. The pitcher's statistics (such as hits, walks, home runs) are run through a modified version of the equivalent runs formula to get estimated runs. Again, positive is "bad luck," negative is "good luck."
The number of wins, more or less, that a pitcher won, compared to estimated wins. Estimated wins are derived from the pitcher's actual runs allowed and team average run scoring. Here, a positive number is "good luck," negative is "bad luck."
Diagnostics are a series of metrics designed to estimate the probability of certain types of changes in production and playing time; see the individual entries for additional detail.
Each league has been given a difficulty level, based on the performance of players in that league compared to the same players' performance in other seasons. The reference difficulty level was defined by the trend line of the National League from 1947 to 2002, and extended backwards to 1871. The difficulty adjustment is the ratio between the actual difficulty level and the reference level.
Drop Rate is the percent chance that a player will not receive any major league plate appearances in a given season, based on comparables who disappear from the dataset entirely. Because of the conventions PECOTA uses in selecting comparables, the Drop Rate is always assumed to be zero for the current year, but it is an important consideration in a hitter's Five-Year Forecast.
Expected loss record for the pitcher, based on how often pitchers with the same innings pitched and runs allowed earned a win or loss historically (this differs from how it was computed, which was a more complicated, theoretical calculation).
Expected win record for the pitcher, based on how often pitchers with the same innings pitched and runs allowed earned a win or loss historically (this differs from how it was computed, which was a more complicated, theoretical calculation).
Also known as a BSP chart, an acronym for bloodstain spatter pattern, to which these graphs seem to bear an eerie resemblance. The BSP charts plot a rate performance statistic (EqA or EqERA) on one axis and playing time (PA or IP) on the other. Each of the diamonds you see represents the performance implied by one of a player’s comparables; the higher the similarity score for that comparable, the larger the size of the diamond. There is also an area of the chart shaded in a yellow color; this is the ‘golden zone’ of performance in which a player both performs well (an EqA of .300 or higher) and remains in the lineup frequently (at least 500 plate appearances). Pitchers actually have two golden zones, one each for roles as starting pitchers and relievers.
In PECOTA projections, the ERA Distribution chart displays a pitcher's ERA forecast at various levels of probability. It progresses in sequential intervals of five percentage points, ranging from a pitcher's 95th percentile forecast on the left, to his 5th percentile forecast on the right. In addition to the probability distribution for a given pitcher, which appears in blue, the chart also includes a normal distribution on ERA for all pitchers in the league, as adjusted to the player's current park and league context ("Norm"), and a dashed line representing the performance of a replacement level pitcher ("Replace").
Equivalent Average. A measure of total offensive value per out, with corrections for league offensive level, home park, and team pitching. EQA considers batting as well as baserunning, but not the value of a position player's defense. The EqA adjusted for all-time also has a correction for league difficulty. The scale is deliberately set to approximate that of batting average. League average EqA is always equal to .260.
EqA is derived from Raw EqA, which is
RawEqA =(H+TB+1.5*(BB+HBP+SB)+SH+SF-IBB/2)/(AB+BB+HBP+SH+SF+CS+SB)
Any variables which are either missing or which you don't want to use can simply be ignored (be sure you ignore them for both the individual and the league, though). You'll also need to calculate the RawEqA for the entire league (LgEqA).
Convert RawEqA into EqR, taking into account the league EqA LgEqA, league runs per plate appearance, the park factor PF, an adjustment pitadj for not having to face your own team's pitchers, and the difficulty rating diffic. Again, you can ignore some of these as the situation requires. xmul can simply be set to "2", while the PF, diffic, and pitadj can be set to "1".
xmul=2*(.125/PF/Lg(R/PA)/pitadj)
EQAADJ=xmul*(RawEqa/LgEqa)* ((1+1/diffic)/2) + (1-xmul)
UEQR=EQAADJ*PA*Lg(R/PA)
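The steps from RawEqA to UEQR can be sketched in code. This is a direct transcription of the three formulas above rather than a reference implementation; the function and parameter names are assumptions, and PF, pitadj, and diffic default to 1 as the text permits.

```python
def raw_eqa(h, tb, bb, hbp, sb, sh, sf, ibb, ab, cs):
    """RawEqA = (H+TB+1.5*(BB+HBP+SB)+SH+SF-IBB/2) / (AB+BB+HBP+SH+SF+CS+SB)."""
    numerator = h + tb + 1.5 * (bb + hbp + sb) + sh + sf - ibb / 2
    denominator = ab + bb + hbp + sh + sf + cs + sb
    return numerator / denominator


def unadjusted_eqr(raw, lg_raw, pa, lg_r_per_pa, pf=1.0, pitadj=1.0, diffic=1.0):
    """UEQR from a player's RawEqA, the league RawEqA, and league runs per PA."""
    xmul = 2 * (0.125 / pf / lg_r_per_pa / pitadj)
    eqaadj = xmul * (raw / lg_raw) * ((1 + 1 / diffic) / 2) + (1 - xmul)
    return eqaadj * pa * lg_r_per_pa


# A hitter whose RawEqA equals the league's gets EQAADJ = 1, so his UEQR is
# simply PA * Lg(R/PA).
league_average_ueqr = unadjusted_eqr(raw=0.70, lg_raw=0.70, pa=600, lg_r_per_pa=0.12)
```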
To get the final, fully adjusted EqA, we need to place this into a team environment.
This is an average team:
AVGTM=Lg(R/Out)*Lg(Outs/game)*PF*Games*(DH adjustment)
The DH adjustment is for playing in a league with a DH. "Games" is the number of games played by this player.
Replacing one player on the average team with our test subject:
TMPLUS=AVGTM+UEQR-OUT*Lg(R/Out)*DH*PF
Get pythagorean exponent
pyexp=((TMPLUS+AVGTM)/Games)**.285
Calculate win percentage
WINPCT=((TMPLUS/AVGTM)**pyexp)/(1+(TMPLUS/AVGTM)**pyexp)
Convert into adjusted space, where the Pythagorean exponent is set to 2.
NEWTM=(WINPCT/(1-WINPCT))**(1/2)
Fully adjusted EqR:
EQR=.17235*((NEWTM-1)*27.*Games + Outs)
Fully adjusted EqA
EQA= (EQR/5/Outs)** 0.4
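The team-environment steps above, from AVGTM through the final EQA, can be chained together as a sketch. Two assumptions are made here: the "DH" term in the TMPLUS line is read as the same DH adjustment factor used in AVGTM, and the parameter names are invented for illustration.

```python
def fully_adjusted_eqa(ueqr, outs, games, lg_r_per_out, lg_outs_per_game,
                       pf=1.0, dh=1.0):
    """Return (EQR, EQA) after placing the player on an otherwise average team."""
    avgtm = lg_r_per_out * lg_outs_per_game * pf * games * dh
    # Swap the player's runs in for the runs an average hitter would
    # produce in the same number of outs.
    tmplus = avgtm + ueqr - outs * lg_r_per_out * dh * pf
    pyexp = ((tmplus + avgtm) / games) ** 0.285
    ratio = (tmplus / avgtm) ** pyexp
    winpct = ratio / (1 + ratio)
    # Re-express the win percentage as a run ratio with the exponent fixed at 2.
    newtm = (winpct / (1 - winpct)) ** 0.5
    eqr = 0.17235 * ((newtm - 1) * 27.0 * games + outs)
    eqa = (eqr / 5 / outs) ** 0.4
    return eqr, eqa


# Sanity check: a player producing exactly league-average runs per out
# should come out at (approximately) a .260 EqA.
avg_eqr, avg_eqa = fully_adjusted_eqa(ueqr=400 * 0.17, outs=400, games=150,
                                      lg_r_per_out=0.17, lg_outs_per_game=27.0)
```

The constant .17235 is 5 * .260^2.5, the runs per out of a .260 EqA hitter, which is why the average case lands on .260. The sketch does not guard against a player so poor that EQR goes negative.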
Equivalent Air Advancement Runs. The number of theoretical runs contributed by a baserunner or baserunners above what would be expected given the number and quality of their baserunning opportunities. EqAAR is based on a multi-year Run Expectancy matrix, is park adjusted, and considers the following scenarios:
Runner on first with second and third unoccupied, less than two outs, a line drive, pop-up, or fly ball is caught by an outfielder
Runner on second but not third, less than two outs, a line drive, pop-up, or fly ball is caught by an outfielder
Runner on third with other bases optionally occupied, less than two outs, a line drive, pop-up, or fly ball is caught by an outfielder
Equivalent Batting Average, sometimes also referred to as Translated or Normalized Batting Average. This is a player's batting average, adjusted for ballpark, league difficulty, and era, and calibrated to an ideal major league where the overall EqBA is .260. While a major league hitter's equivalent stats should not differ substantially from his actual numbers, a minor league hitter's equivalent stats undergo translation and may differ significantly.
EqBB9 is calibrated to an ideal major league where EqBB9 = 3.0.
While a major league pitcher's equivalent stats should not differ substantially from his actual numbers, a minor league pitcher's equivalent stats undergo translation and may differ significantly. Equivalent stats also adjust for park effects.
Equivalent Base Running Runs. Measures the number of runs contributed by a player's advancement on the bases, above what would be expected based on the number and quality of the baserunning opportunities with which the player is presented, park-adjusted and based on a multi-year run expectancy table. EqBRR is calculated as the sum of various baserunning components: Equivalent Ground Advancement Runs (EqGAR), Equivalent Stolen Base Runs (EqSBR), Equivalent Air Advancement Runs (EqAAR), Equivalent Hit Advancement Runs (EqHAR) and Equivalent Other Advancement Runs (EqOAR).
EqERA is calibrated to an ideal major league where EqERA = 4.50.
While a major league pitcher's equivalent stats should not differ substantially from his actual numbers, a minor league pitcher's equivalent stats undergo translation and may differ significantly. Equivalent stats also adjust for park effects, and the quality of a pitcher's defense. EqERA is conceptually identical to NRA, as used in the DT cards.
Equivalent Ground Advancement Runs. The number of theoretical runs contributed by a baserunner or baserunners above what would be expected given the number and quality of baserunning opportunities. EqGAR is based on a multi-year Run Expectancy matrix and considers the following scenarios:
Runner on first only with less than two outs, ground ball or bunt is hit to an infielder where a hit or an error is not credited
Runner on second only with less than two outs, ground ball or bunt is hit to an infielder where a hit or an error is not credited
Runner on third only with less than two outs, ground ball or bunt is hit to an infielder where a hit or an error is not credited
EqH9 is calibrated to an ideal major league where EqH9 = 9.0.
While a major league pitcher's equivalent stats should not differ substantially from his actual numbers, a minor league pitcher's equivalent stats undergo translation and may differ significantly. Equivalent stats also adjust for park effects.
Equivalent Hit Advancement Runs. The number of theoretical runs contributed by a baserunner or baserunners above what would have been expected given the number and quality of opportunities. EqHAR considers advancement from first on singles, second on singles, and first on doubles and is adjusted for park and based on a multi-year Run Expectancy Matrix.
EqHR9 is calibrated to an ideal major league where EqHR9 = 1.0.
While a major league pitcher's equivalent stats should not differ substantially from his actual numbers, a minor league pitcher's equivalent stats undergo translation and may differ significantly. Equivalent stats also adjust for park effects.
EqK9 is calibrated to an ideal major league where EqK9 = 6.0.
While a major league pitcher's equivalent stats should not differ substantially from his actual numbers, a minor league pitcher's equivalent stats undergo translation and may differ significantly. Equivalent stats also adjust for park effects.
EqMLVr, or Equivalent rate-based Marginal Lineup Value, is calibrated to an ideal major league with an overall EqMLVr of .000.
While a major league hitter's equivalent stats should not differ substantially from his actual numbers, a minor league hitter's equivalent stats undergo translation and may differ significantly. Equivalent stats also account for park effects.
Equivalent Other Advancement Runs. Measures the number of runs contributed by a player's advancement on the bases, above what would be expected based on the number and quality of the baserunning opportunities with which the player is presented. Other Advancement takes into consideration a player's opportunities and advancement on the basepaths due to wild pitches, passed balls, and balks. The run value of this advancement is based on a multi-year run expectancy matrix and park-adjusted.
Equivalent Runs; EQR = 5 * OUT * EQA^2.5. In the fielding charts, the estimated number of EqR he had at the plate while playing this position in the field. In Adjusted Standings, EqR refers to the total number of equivalent runs scored by the team.
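A minimal sketch of the EqR relation and its inverse; the function names are assumptions.

```python
def eqr_from_eqa(eqa, outs):
    """EQR = 5 * OUT * EQA^2.5."""
    return 5 * outs * eqa ** 2.5


def eqa_from_eqr(eqr, outs):
    """Inverse of the relation above: EQA = (EQR / 5 / OUT)^0.4."""
    return (eqr / 5 / outs) ** 0.4


# A league-average .260 EqA over 400 outs is worth roughly 69 equivalent runs.
runs = eqr_from_eqa(0.260, 400)
```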
Equivalent Runs allowed by a team.
Equivalent Stolen Base Runs. The number of theoretical runs contributed by a baserunner or baserunners above what would be expected given the number and quality of their baserunning opportunities. EqSBR is based on a multi-year Run Expectancy matrix and considers both stolen base attempts and pick-offs.
"Fair" runs against average. RA with inherited/bequeathed runners included.
Fielding Runs Above Average.
Fielding runs above average (FRAA), adjusted for league difficulty.
Fielding Runs Above Replacement. The difference between an average player and a replacement player is determined by the number of plays that position is called on to make. That makes the value at each position variable over time. In the all-time adjustments, an average catcher is set to 39 runs above replacement per 162 games, first base to 10, second to 29, third to 22, short to 33, center field to 24, left and right to 14.
Fielding runs above replacement. A fielding statistic, where a replacement player is meant to be approximately equal to the lowest-ranking player at that position, fielding wise, in the majors. Average players at different positions have different FRAR values, which depend on the defensive value of the position; an average shortstop has more FRAR than an average left fielder.
See FRAR, FRAR2. FRAR2 incorporates adjustments for league difficulty and normalizes defensive statistics over time.
The Five-Year Forecast is a player's weighted mean PECOTA forecast, taken over his next five seasons.
The process for generating a player's weighted mean line for a season some number of years into the future (e.g. 2008) is fundamentally identical to generating his forecast for the season immediately upcoming (e.g. 2006). The exception is that some players may have dropped out of the comparables database, in which case their performance cannot be considered. (See also Jeremy Giambi Effect).
If a player's Drop Rate exceeds 50% (that is, more than half of his comparables are no longer playing professional baseball), then PECOTA does not list his weighted mean line for that season. Instead the season is designated with the tagline 'Out of Baseball'.
Note that the Five-Year Forecast assumes that a player's team context remains the same for all years of the forecast.
Historical Stats are the player's previous three seasons of performance as they appear in the BP book (with the addition of a player's WARP scores).
Inherited Runs. The number of runners inherited by the reliever who scored while the reliever was in the game.
Isolated Power (ISO) is a measure of a hitter's raw power, in terms of extra bases per AB. Its formula is ISO = (2B + (3B*2) + (HR*3)) / AB
In PECOTA, ISO is one of five primary production metrics used in identifying a hitter or pitcher's comparables. PECOTA uses a slightly modified version of Isolated Power that assigns the same value to triples as to doubles (extending a double into a triple is generally an indicator of speed, rather than additional power). Thus, the formula for PECOTA isolated power is as follows: ISO = (2B + 3B + (HR*3)) / AB
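The two Isolated Power variants differ only in the weight given to triples, which a short sketch makes explicit. The function names are assumptions.

```python
def iso(doubles, triples, hr, ab):
    """Standard ISO = (2B + 3B*2 + HR*3) / AB."""
    return (doubles + 2 * triples + 3 * hr) / ab


def pecota_iso(doubles, triples, hr, ab):
    """PECOTA variant: triples count the same as doubles."""
    return (doubles + triples + 3 * hr) / ab


# 30 doubles, 5 triples, 20 home runs in 500 at-bats.
standard = iso(30, 5, 20, 500)        # (30 + 10 + 60) / 500
modified = pecota_iso(30, 5, 20, 500)  # (30 + 5 + 60) / 500
```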
Improvement Rate is the percent chance that a hitter's EqR/27 or a pitcher's EqERA will improve *at all* relative to the weighted average of his EqR/27 or EqERA in his three previous seasons of performance. A player who is expected to perform just the same as he has in the past will have an Improvement Rate of 50%.
Inherited runs prevented from scoring. The expected number of inherited runners that would score in the reliever's appearances based upon league average performance, minus the actual number the reliever allowed to score.
The very self-consciously named Jaffe WARP Score system, which is designed to determine how a Hall of Famer or Hall of Fame candidate measures up to his enshrined peers at his position with regards to his regular season pitching, hitting, and fielding contributions. The goal of JAWS is to identify players who are above-average candidates for Hall of Fame enshrinement in these respects.
A player's JAWS score is the average of his career WARP3 total and his peak total [(Career WARP + Peak WARP) / 2], where peak is a player's best seven seasons (early versions of the system used best five consecutive, but this method was abandoned starting with the 2006 BBWAA ballot). This JAWS score is then compared to a modified average of the enshrined Hall of Famers at each position, with the lowest score--invariably an unqualified Veterans Committee selection--dropped (four pitchers are dropped).
Because the WARP data tends to undergo minor tweaks from time to time, JAWS standards at each position occasionally need to be re-computed. The standards for the 2007 BBWAA and VC ballots as computed from January 2007 data (including 2007 inductees Cal Ripken and Tony Gwynn) are:
POS  WARP3  Peak  JAWS
C     95.7  59.0  77.3
1B   106.1  62.8  84.5
2B   122.8  71.5  97.1
3B   117.4  67.3  92.4
SS   115.2  68.2  91.7
LF   111.1  62.6  86.8
CF   109.1  63.7  86.4
RF   119.8  65.5  92.7
SP    99.0  62.7  80.9
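The JAWS calculation, (Career WARP + Peak WARP) / 2 with peak taken as the best seven seasons, can be sketched as below. The function name and the list-of-seasons input are assumptions, and the sample season values are invented.

```python
def jaws(season_warp3):
    """Average of career WARP3 and peak WARP3 (sum of the best seven seasons)."""
    career = sum(season_warp3)
    peak = sum(sorted(season_warp3, reverse=True)[:7])
    return (career + peak) / 2


# Hypothetical ten-season career: career total 55, best seven seasons total 49.
example_score = jaws([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
```

The same average reproduces the standards in the table; for catchers, (95.7 + 59.0) / 2 is 77.35, listed as 77.3.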
"First order losses." Pythagenport expected losses, based on RS and RA.
"Second order losses." Pythagenport losses, based on EQR and EQRA.
"Third order losses." Pythagenport losses, based on AEQR and AEQRA.
Leverage measures how important the situations a reliever has been used in are. A leverage of 1.00 is the same importance as the start of a game. Leverage values below one represent situations that are less important than the start of a game (such as mopup innings in a blowout). Leverage values above one represent situations with more importance (such as a closer protecting a one-run lead with bases loaded in the 9th inning).
Mathematically, leverage is based on the win expectancy work done by Keith Woolner in BP 2005, and is defined as the change in the probability of winning the game from scoring (or allowing) one additional run in the current game situation divided by the change in probability from scoring (or allowing) one run at the start of the game.
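The ratio in this definition can be sketched as follows. The win probabilities used here are invented placeholders, not values from the win expectancy tables, and the function name is an assumption.

```python
def leverage(wp_now, wp_now_plus_run, wp_start, wp_start_plus_run):
    """Change in win probability from one run now, relative to one run at game start."""
    return (wp_now_plus_run - wp_now) / (wp_start_plus_run - wp_start)


# Hypothetical numbers: one run moves win probability by .15 in the current
# situation versus .10 at the start of the game, so leverage is about 1.5.
late_and_close = leverage(0.80, 0.95, 0.50, 0.60)
```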
The maximum amount of Pitcher Abuse Points a pitcher has accumulated in a single start.
Marginal Lineup Value, a measure of offensive production created by David Tate and further developed by Keith Woolner. MLV is an estimate of the additional number of runs a given player will contribute to a lineup that otherwise consists of average offensive performers. Additional information on MLV can be found here.
MLVr is a rate-based version of Marginal Lineup Value (MLV), a measure of offensive production created by David Tate and further developed by Keith Woolner. MLV is an estimate of the additional number of runs a given player will contribute to a lineup that otherwise consists of average offensive performers. MLVr is approximately equal to MLV per game. The league average MLVr is zero (0.000). Additional information on MLV and MLVr can be found here.
Marginal Value Above Replacement Player, as introduced in this article. MORP is modelled based on the actual behavior of recent free agent markets, and accounts for non-linearity in the market price of baseball talent (e.g. teams are willing to pay more for one 6-win player than two 3-win players).
As listed in a player's PECOTA card, a player's MORP includes the major league minimum salary of $380,000 for 2007. Further, in a player's Five-Year Forecast, we assume salary inflation of 8% per year through 2010 (EXCEPTION: a player's Peak MORP does *not* include the minimum salary or the inflation adjustment.)
For 2007, a player's MORP is estimated as follows:
1200000*(WARP^1.5) + 380000
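As a quick sketch, the 2007 formula can be evaluated directly. The function name is an assumption, and the sketch assumes a non-negative WARP.

```python
def morp_2007(warp):
    """2007 MORP in dollars: 1200000 * WARP^1.5 + 380000 (league minimum)."""
    return 1_200_000 * warp ** 1.5 + 380_000


four_win_player = morp_2007(4)   # 1,200,000 * 8 + 380,000 = 9,980,000
```

The 1.5 exponent produces the non-linearity described above: one 6-win player is valued above two 3-win players combined.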
Normalized Runs Allowed. "Normalized runs" have the same win value, against a league average of 4.5 and a pythagorean exponent of 2, as the player's actual runs allowed did when measured against his league average.
Known outs made by the player, defined by AB-H+CS+SH+SF.
Offensive Winning Percentage. A Bill James stat, usually derived from runs created. In EqA terms, it could be calculated as (EQA/refEQA)^5, where refEQA is some reference EQA, such as league average (always .260) or the position-averaged EQA.
The percentage of the team's total plate appearances that this player had.
Pitcher Abuse Points. When used in the Pitcher Abuse Point report, PAP refers to PAP^3, which assigns 0 PAP to a start in which the pitcher throws 100 or fewer pitches and (PC-100)^3 PAP for all other starts.
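A short sketch of PAP^3 for a single start, plus the per-start average and the single-start maximum. The function names and sample pitch counts are assumptions.

```python
def pap(pitch_count):
    """PAP^3: 0 for 100 or fewer pitches, otherwise (PC - 100)^3."""
    return max(0, pitch_count - 100) ** 3


def average_pap(pitch_counts):
    """Average PAP per game started."""
    return sum(pap(pc) for pc in pitch_counts) / len(pitch_counts)


def max_pap(pitch_counts):
    """Largest PAP total accumulated in any single start."""
    return max(pap(pc) for pc in pitch_counts)


starts = [95, 110, 120]          # hypothetical pitch counts for three starts
per_start = average_pap(starts)  # (0 + 1000 + 8000) / 3
worst = max_pap(starts)          # 8000
```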
PERA is a pitcher's ERA as estimated from his peripheral statistics (EqH9, EqHR9, EqBB9, EqK9). Because it is not sensitive to the timing of batting events, PERA is less subject to luck than ERA, and is a better predictor of ERA going-forward than ERA itself. Like the rest of a pitcher's equivalent stats, his PERA is calibrated to an ideal league with an average PERA of 4.50.
A pitcher's park-adjusted RA, expressed on a scale like ERA or RA. RA+ -- Park and league normalized Run Average. Similar to ERA+ found in Total Baseball, but based on RA rather than ERA.
Positional MLV. Runs contributed by a batter beyond what an average player at the same position would produce in a team of otherwise league-average hitters.
Positional MLV rate. Runs/game contributed by a batter beyond what an average player at the same position would hit in a team of otherwise league-average hitters. Like MLVr, it is a rate stat. The comparable season total is PMLV.
Pitcher-only runs above average. The difference between this and RAA is that RAA is really a total defense statistic, and PRAA tries to isolate the pitching component from the fielding portion. It relies on the pitching/fielding breakdown being run for the team, league, and individual. The individual pitching + defense total is compared to a league average pitcher + team average defense, and the difference is win-adjusted.
Pitcher-only runs above replacement. Similar to PRAA, except that the comparison is made to a replacement level player instead of average. The nominal RA for a replacement pitcher is 6.11 (the same ratio, compared to a 4.50 average, as a .230 EQA is to .260). This assumes that there is a 50/50 split between pitching and fielding. If the pitch/field split is less than that, as it was in the 1800s, the replacement ERA is reduced.
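The 6.11 figure can be reproduced under one reading of "the same ratio": since equivalent runs scale with EqA^2.5 (EQR = 5 * OUT * EQA^2.5), the .260-to-.230 EqA ratio is converted to a run ratio before being applied to 4.50. That reading is an assumption, as is the function name.

```python
def replacement_ra(avg_ra=4.50, replacement_eqa=0.230, avg_eqa=0.260):
    """Nominal replacement-level RA, scaling the average by the EqA run ratio."""
    run_ratio = (avg_eqa / replacement_eqa) ** 2.5   # runs scale as EqA^2.5
    return avg_ra * run_ratio


nominal = replacement_ra()   # about 6.11 with the defaults
```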
Pitching runs above replacement (PRAR), adjusted for league difficulty.
An adjustment made to account for the fact that some parks are easier to hit in than average, giving an advantage (in raw statistical terms) to hitters who play for that team. Park factors are always made relative to a league average of 1.00. The park adjustments in the BP are made only on the park factor for runs, averaged over five years; they can be found here. The first column is a one-year park factor, the second column is the five-year average centered on that year (assuming the team did not change or massively renovate their park).
The number of additional runs charged to the starting pitcher that his bullpen allowed to score after he left the game, compared to an average bullpen. Negative Pen Support means the bullpen prevented more runs from scoring than an average pen (i.e. the pitcher's ERA looks better than it should because of good bullpen support).
A player's Percentile Forecast is a representation of the player's expected performance in the upcoming season at various levels of probability. For example, if a pitcher's 75th percentile EqERA forecast is 3.52, this indicates that he has a 75% chance to post an EqERA of 3.52 or higher, and a 25% chance to post an EqERA less than 3.52. Higher percentiles indicate more favorable outcomes.
The Percentile Forecast is calibrated off of two Key Statistics: EqERA for pitchers, and EqA for batters. The Key Statistics are chosen because they provide the best representation of a player's overall value.
PECOTA runs a series of regressions within the set of comparable data in order to estimate how changes in peripheral statistics are related to changes in the Key Statistic. For example, if it first estimates that Carl Crawford will produce a .290 EqA next year, it then tries to determine what home run total, walk total, and so on are most likely to be associated with a .290 EqA season. PECOTA then iterates this result until the peripherals match ("add up to") the Key Statistic.
Important Note: The Percentile Forecasts are designed to work for the Key Statistic (EqA and EqERA) only. If a player's 90th percentile forecast for home runs is 42, this should not be read to mean that he has a 10% chance of hitting 42 home runs (or more). Rather, it means that he has a 10% chance of having a performance as valuable as the line represented by the 90th percentile forecast, whether this comes from the particular combination of peripheral statistics listed in the percentile line, or an equally valuable (but different) combination of statistics. In particular, the percentile forecasts should not be read literally for counting statistics (HR, W, etc.) because of the complicated interactions between performance and playing time.
Described more completely in the 2002 Prospectus, the breakdown is a sequence of calculations designed to separate the pitching and fielding components of defense from each other. Certain events (walks, strikeouts, home runs) are considered to be entirely the responsibility of the pitcher. Errors and double plays are assumed to be entirely the domain of the fielders. Other hits and outs are assumed to be 75% fielding, 25% pitching.
For Hitters: The Player Profile is a chart that evaluates a given hitter's primary production metrics (batting average, isolated power, unintentional walk rate, strikeout rate, and speed score) as a percentile compared to all major league hitters. For example, a player with an isolated power rating of 75% is superior in this category to three-quarters of all major leaguers. The player profile is based on the player's three previous seasons of performance, rather than his projection. For Pitchers: The Player Profile is a chart that evaluates a pitcher's performance in five categories: strikeout rate, walk rate, opponents' isolated power (e.g. home run rate), hit rate on balls in play, and groundball-to-flyball ratio. The rates are presented as a percentile compared to all major league pitchers; for example, a player with a strikeout rating of 75% is superior in this category to three-quarters of all major leaguers. The player profile is based on the player's three previous seasons of performance, rather than his projection. Note that the denominator for strikeout rate and walk rate as presented in the Player Profile is not innings pitched, but batters faced. This calculation is somewhat more accurate as pitchers differ in the number of batters they face per inning based on their on base average allowed. Note also that, for pitchers, the percentiles take into account whether the pitcher threw in a starting or relief role, as most pitchers post substantially better numbers in relief.
For PECOTA, a player's Position is a consideration in identifying his comparables, as well as in calculating his VORP. The player's primary position as used by PECOTA is listed at the top of his forecast page; however, secondary and tertiary positions are also considered based on the relative number of appearances that a player makes there. The position determination is made primarily based on the position(s) at which a player appeared in his most recent season, with lesser consideration given to the position(s) at which he appeared in other recent seasons. Both major league and minor league defensive appearances are considered in the determination of a player's position, but major league appearances are weighted more heavily. PECOTA considers LF, CF and RF to be separate positions.
When listed numerically on our statistical reports, positions are: 1, pitcher; 2, catcher; 3, first base; 4, second base; 5, third base; 6, shortstop; 7, left field; 8, center field; 9, right field; 10, designated hitter; 11, pinch hitter; 12, pinch runner.
A modified form of Bill James' pythagorean formula. Instead of using a fixed exponent (2, 1.83), the "pythagenport" formula derives the exponent from the run environment - the more runs per game, the higher the exponent. The formula for the exponent was X = .45 + 1.5 * log10 ((rs+ra)/g), and then winning percentage is calculated as (rs^x)/(rs^x + ra^x). The formula has been tested for run environments between 4 and 40 runs per game, but breaks down below 4 rpg. The original article is here.
After further review, I (Clay) have come to the conclusion that the so-called Smyth/Patriot method, aka Pythagenpat, is a better fit. In that, X=((rs+ra)/g)^.285, although there is some wiggle room for disagreement in the exponent. Anyway, that equation is simpler, more elegant, and gets the better answer over a wider range of runs scored than Pythagenport, including the mandatory value of 1 at 1 rpg. Go here for more.
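As a rough illustration of the two exponent formulas above, the following Python sketch computes the Pythagenport and Pythagenpat exponents and the resulting expected winning percentage. The function names are hypothetical and chosen for this example only; the constants (.45, 1.5, .285) are the ones quoted in the text.

```python
import math


def pythagenport_exponent(rs, ra, g):
    """Exponent from the run environment: X = .45 + 1.5 * log10((rs + ra) / g)."""
    return 0.45 + 1.5 * math.log10((rs + ra) / g)


def pythagenpat_exponent(rs, ra, g):
    """Smyth/Patriot exponent: X = ((rs + ra) / g) ** .285."""
    return ((rs + ra) / g) ** 0.285


def expected_win_pct(rs, ra, x):
    """Pythagorean winning percentage for a given exponent x."""
    return rs ** x / (rs ** x + ra ** x)


# Hypothetical team: 800 runs scored, 700 allowed, 162 games.
x_port = pythagenport_exponent(800, 700, 162)
x_pat = pythagenpat_exponent(800, 700, 162)
win_pct_port = expected_win_pct(800, 700, x_port)
win_pct_pat = expected_win_pct(800, 700, x_pat)
```

At exactly 1 run per game, `pythagenpat_exponent` returns 1, which is the "mandatory value" mentioned above.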
QERA, or QuikERA, was described most verbosely by Nate Silver in this article:
QuikERA (QERA), which estimates what a pitcher's ERA should be based solely on his strikeout rate, walk rate, and GB/FB ratio. These three components--K rate, BB rate, GB/FB--stabilize very quickly, and they have the strongest predictive relationship with a pitcher’s ERA going forward. What’s more, they are not very dependent on park effects, allowing us to make reasonable comparisons of pitchers across different teams.
The formula for QERA is as follows:
QERA = (2.69 - 3.4 * K% + 3.88 * BB% - 0.66 * GB%)^2
Note that everything ends up expressed in terms of percentages: strikeouts per opponent plate appearance, walks per opponent plate appearance, and groundballs as a percentage of all balls hit into play. Andy Pettitte, for example, has a 19.6% K rate, a 7.9% BB rate, and a 62.7% GB rate, giving him a QERA of 3.68. Note further that QERA is exponential, which is appropriate since run scoring is not linear.
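The QERA formula can be sketched in a few lines of Python. The function name `qera` is hypothetical, and the sketch assumes the three rates are passed as fractions (0.196 for 19.6%), which is what makes the Andy Pettitte example land near the quoted value.

```python
def qera(k_rate, bb_rate, gb_rate):
    """QuikERA from strikeout, walk, and groundball rates given as fractions.

    k_rate and bb_rate are per opponent plate appearance; gb_rate is
    groundballs as a share of balls hit into play.
    """
    base = 2.69 - 3.4 * k_rate + 3.88 * bb_rate - 0.66 * gb_rate
    return base ** 2


# Rates quoted in the text for Andy Pettitte.
pettitte_qera = qera(0.196, 0.079, 0.627)
```

With the rounded rates quoted above, this evaluates to roughly 3.67, within a hundredth of the 3.68 given in the text; the small gap presumably comes from rounding in the published rates.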
Park and league normalized Run Average. Similar to ERA+ found in Total Baseball, but based on RA rather than ERA.
For Pitchers: Runs above average. At its simplest, this would be the league runs per inning, times individual innings, minus individual runs allowed. However, we have gone one step beyond that, because being 50 runs above average in 1930, in the Baker Bowl, doesn't have the same win impact as being +50 in the 1968 Astrodome. The league runs per inning need to be adjusted for park and team hitting (and difficulty, for the all-time RAA), and then you can multiply by individual innings and subtract individual runs. Finally, that quantity needs to be win-adjusted. See win-adjustment. For Fielders: Runs above average at this position, similar to Palmer's Fielding Runs as far as interpretation is concerned.
Runs Above Position: The number of Equivalent Runs this player produced, above what an average player at the same position would have produced in the same number of outs.
Runs Above Replacement.
For a fielder, it is simply Runs Above Replacement for the position, where a replacement-level fielder is determined to be about 20 runs below average for the position; the number varies slightly depending on the number of balls in play.
Runs Above Replacement, Position-adjusted. A statistic that compares a hitter's Equivalent Run total to that of a replacement-level player who makes the same number of outs and plays the same position. A "replacement level" player is one who has 22.1 fewer EqR per 486 outs than the average for that position. For the overall league average (.260), that corresponds to a .230 EqA and a .351 winning percentage.
Essentially, this is the Equivalent Average analog of VORP.
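The replacement-level figures quoted in this glossary (.230 EqA, a 6.11 RA against a 4.50 average, and a .351 winning percentage) can be checked against each other with a short Python sketch. It rests on one assumption that is not stated in this section: that run production scales with EqA raised to the 2.5 power, as in the definition of Equivalent Average. The pythagorean exponent of 2 is the standard-league value. All variable names are illustrative.

```python
LEAGUE_EQA = 0.260
REPLACEMENT_EQA = 0.230
LEAGUE_RA = 4.50
EQA_RUN_EXPONENT = 2.5  # assumption: runs scale with EqA ** 2.5

# Runs produced by a replacement-level hitter relative to an average one.
run_ratio = (REPLACEMENT_EQA / LEAGUE_EQA) ** EQA_RUN_EXPONENT

# A replacement pitcher is worse than average by the same ratio.
replacement_ra = LEAGUE_RA / run_ratio

# Winning percentage of a replacement offense with average defense,
# using the standard-league pythagorean exponent of 2.
replacement_win_pct = run_ratio ** 2 / (run_ratio ** 2 + 1)
```

Under that assumption, `replacement_ra` rounds to 6.11 and `replacement_win_pct` rounds to .351, matching the values in the text.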
Raw equivalent average, the first step towards building the EqA. In its fullest form, REQA = (H + TB + 1.5*(BB + HBP + SB) + SH + SF) divided by (AB + BB + HBP + SH + SF + CS + SB). REQA gets converted into unadjusted equivalent runs, UEQR.
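A minimal Python sketch of the REQA formula follows. The function name and argument names are hypothetical, and the stat line used in the example is invented purely for illustration.

```python
def raw_eqa(h, tb, bb, hbp, sb, cs, sh, sf, ab):
    """Raw equivalent average (REQA) from a hitter's counting stats."""
    numerator = h + tb + 1.5 * (bb + hbp + sb) + sh + sf
    denominator = ab + bb + hbp + sh + sf + cs + sb
    return numerator / denominator


# Invented stat line, for illustration only.
example_reqa = raw_eqa(h=150, tb=250, bb=60, hbp=5, sb=10, cs=5, sh=2, sf=5, ab=550)
```

Note that stolen bases appear in both the numerator and the denominator, while caught stealing appears only in the denominator, so an unsuccessful attempt lowers REQA.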
Runs Prevented. The extra number of runs an average pitcher would have allowed in the same number of innings pitched (adjusted for park and league). RP greater than zero indicates that the pitcher allowed fewer runs than an average pitcher (i.e. he's better than average). Negative RP indicates the pitcher allowed more runs than an average pitcher (i.e. he's worse than average).
Replacement level MLV rate. Runs/game contributed by a batter beyond what a replacement level player at the same position would hit in a team of otherwise league-average hitters. The comparable season total is RPMLV. It differs from VORPr and VORP only in that it is solely based on batting performance whereas VORP includes basestealing.
A way to look at the fielder's rate of production, equal to 100 plus the number of runs above or below average this fielder is per 100 games. A player with a rate of 110 is 10 runs above average per 100 games, a player with an 87 is 13 runs below average per 100 games, etc.
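The Rate scale described above amounts to a one-line calculation, sketched here in Python. The function name is hypothetical, and the sketch assumes the games figure is whatever games total the fielding runs were accumulated over.

```python
def fielding_rate(runs_above_average, games):
    """Rate: 100 plus runs above (or below) average per 100 games."""
    return 100 + 100 * runs_above_average / games


good_fielder = fielding_rate(10, 100)   # 10 runs above average in 100 games
poor_fielder = fielding_rate(-13, 100)  # 13 runs below average in 100 games
```

These two calls reproduce the 110 and 87 examples given in the definition.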
See Rate. Rate2 incorporates adjustments for league difficulty and normalizes defensive statistics over time.
Support-Neutral Losses. The pitcher's expected number of losses assuming he had league-average support.
SNW / (SNW+SNL)
Support Neutral Value Added - wins above average added by the pitcher's performance.
Support-Neutral Wins. The pitcher's expected number of wins assuming he had league-average support.
Support-Neutral Wins Above Replacement-level. The number of SNWs a pitcher has above what a .425 pitcher would get in the same number of (Support-Neutral) decisions.
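The two derived Support-Neutral figures, SNW / (SNW+SNL) and wins above a .425 pitcher, can be sketched in Python as follows. The function names are hypothetical, and the SNW and SNL inputs in the example are invented; computing SNW and SNL themselves is not covered here.

```python
REPLACEMENT_WIN_PCT = 0.425  # replacement level quoted in the text


def sn_pct(snw, snl):
    """Support-Neutral winning percentage: SNW / (SNW + SNL)."""
    return snw / (snw + snl)


def snwar(snw, snl, replacement_pct=REPLACEMENT_WIN_PCT):
    """SNW above what a replacement pitcher earns in the same decisions."""
    decisions = snw + snl
    return snw - replacement_pct * decisions


# Invented example: 12 Support-Neutral wins and 8 Support-Neutral losses.
example_pct = sn_pct(12, 8)
example_snwar = snwar(12, 8)
```

In this example a replacement pitcher would be credited with 8.5 wins in 20 decisions, so the pitcher is 3.5 wins above replacement.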
Pitcher abuse points divided by number of pitches thrown, or PAP/NP.
Similarity Index is a composite of the similarity scores of all of a player's comparables. Similarity index is a gauge of the player's historical uniqueness; a player with a score of 50 or higher has a very common typology, while a player with a score of 20 or lower is historically unusual. For players with a very low similarity index, PECOTA expands its tolerance for dissimilar comparables until a meaningful sample size is established (see Comparable Players).
Similarity Score is a relative measure of a player's comparability. Its scale is very different from the Bill James similarity scores; a score of 100 is assigned to a perfect comparable, while a score of 0 marks the threshold of meaningful similarity. Players can and frequently do receive negative similarity scores; those players are dropped from the analysis. A score above 50 indicates that a player is substantially comparable, and scores in excess of 70 are very unusual. The comparable player observations are weighted based on their similarity score in constructing a forecast.
Speed Score (SPD) is one of five primary production metrics used by PECOTA in identifying a hitter's comparables. It is based in principle on the Bill James speed score and includes five components: Stolen base percentage, stolen base attempts as a percentage of opportunities, triples, double plays grounded into as a percentage of opportunities, and runs scored as a percentage of times on base.
Beginning in 2006, BP has developed a proprietary version of Speed Score that takes better advantage of play-by-play data and ensures that equal weight is given to the five components. In the BP formulation of Speed Score, an average rating is exactly 5.0. The highest and lowest possible scores are 10.0 and 0.0, respectively, but in practice most players fall within the boundary between 7.0 (very fast) and 3.0 (very slow).
The "standard league" is a mythical construction, in which all statistics have been adjusted for easy comparison. Its primary features are that runs scored is 4.5 runs per game; equivalent average is .260; and the pythagorean exponent is exactly 2.00.
In PECOTA, stolen base attempts as a percentage of times on first base.
A rough indicator of the pitcher's overall dominance, based on normalized strikeout rates, walk rates, home run rates, runs allowed, and innings per game. "10" is league average, while "0" is roughly replacement level. The formula is as follows: Stuff = 6 * EqK9 - 1.333 * (EqERA + PERA) - 3 * EqBB9 - 5 * EqHR9 - 3 * MAX{(6 - IP/G), 0}
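A small Python sketch of the Stuff formula is given below. It reads the final term as 3 times the larger of (6 - IP/G) and 0, so that pitchers averaging fewer than six innings per game are penalized; that reading of the delimiters is an assumption. The function name and the input values are hypothetical.

```python
def stuff(eq_k9, eq_bb9, eq_hr9, eq_era, pera, ip, g):
    """Stuff score from normalized rate stats, innings, and games."""
    # Penalty applies only when the pitcher averages under 6 innings per game.
    short_outing_penalty = 3 * max(6 - ip / g, 0)
    return (6 * eq_k9
            - 1.333 * (eq_era + pera)
            - 3 * eq_bb9
            - 5 * eq_hr9
            - short_outing_penalty)


# Invented starter line: 200 innings over 30 games, so no innings penalty.
starter = stuff(eq_k9=6.0, eq_bb9=3.0, eq_hr9=1.0, eq_era=4.5, pera=4.5, ip=200, g=30)

# Same rates as a one-inning reliever: the penalty term is 3 * (6 - 1) = 15.
reliever = stuff(eq_k9=6.0, eq_bb9=3.0, eq_hr9=1.0, eq_era=4.5, pera=4.5, ip=60, g=60)
```

With identical rate stats, the reliever scores 15 points lower than the starter, which shows how heavily the innings-per-game term weighs on short outings.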
An adjustment made for hitters, to account for not having to face their own pitchers. Using pitching stats, (league R * pf - team R), divided by (league IP - team IP), divided by park-adjusted league runs per inning.
An adjustment made for pitchers, to account for not having to face their own team's batters. Using batting stats, (league runs * pf - team runs), divided by (league PA - team PA), divided by league runs per plate appearance * pf.
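The two team adjustments can be sketched in Python as ratios of "everyone else's" run rate to the park-adjusted league run rate. This is one reading of the verbal formulas, not a confirmed implementation: in particular, it assumes "park-adjusted league runs per inning" means league runs times the park factor, divided by league innings. Function and argument names are hypothetical.

```python
def hitter_team_adjustment(lg_runs, lg_ip, team_runs_allowed, team_ip, pf):
    """Adjustment for hitters not facing their own pitchers (pitching stats)."""
    others_rate = (lg_runs * pf - team_runs_allowed) / (lg_ip - team_ip)
    league_rate = lg_runs * pf / lg_ip  # assumed park-adjusted league R/IP
    return others_rate / league_rate


def pitcher_team_adjustment(lg_runs, lg_pa, team_runs, team_pa, pf):
    """Adjustment for pitchers not facing their own batters (batting stats)."""
    others_rate = (lg_runs * pf - team_runs) / (lg_pa - team_pa)
    league_rate = lg_runs / lg_pa * pf
    return others_rate / league_rate


# Invented league: 10,000 runs in 20,000 innings, neutral park.
neutral = hitter_team_adjustment(10000, 20000, 1000, 2000, 1.0)
good_staff = hitter_team_adjustment(10000, 20000, 800, 2000, 1.0)
```

Under this reading, a team with an exactly average staff gets a factor of 1, while hitters on a team with a strong staff get a factor above 1, since the pitchers they actually face allow runs at a higher rate than the league as a whole.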
Trend identifies players who demonstrate dramatic changes from their Baseline during their comparable year. For Hitters: Hitters who improve their EqR/PA by at least 20% are identified by a green, upward-pointing arrow and contribute to a hitter's Breakout score; hitters whose EqR/PA decreases by at least 20% are identified by a red, downward-pointing arrow and contribute to a hitter's Collapse score. For Pitchers: Pitchers who improve their EqERA by at least 20% are identified by a green, upward-pointing arrow and contribute to a pitcher's Breakout score; pitchers whose EqERA increases by at least 25% are identified by a red, downward-pointing arrow and contribute to a pitcher's Collapse score.
Unadjusted Equivalent Runs; (2 * REQA/LgREQA - 1) * PA * LgR/LgPA. Analogous to runs created.
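The UEQR formula can be written out as a short Python function. The function name and the league totals in the example are hypothetical and used only for illustration.

```python
def unadjusted_eqr(reqa, lg_reqa, pa, lg_r, lg_pa):
    """UEQR = (2 * REQA / LgREQA - 1) * PA * LgR / LgPA."""
    return (2 * reqa / lg_reqa - 1) * pa * lg_r / lg_pa


# Invented league context: 20,000 runs in 180,000 plate appearances.
average_hitter = unadjusted_eqr(reqa=0.700, lg_reqa=0.700, pa=600, lg_r=20000, lg_pa=180000)
better_hitter = unadjusted_eqr(reqa=0.770, lg_reqa=0.700, pa=600, lg_r=20000, lg_pa=180000)
```

A hitter whose REQA equals the league REQA is credited with exactly the league's runs per plate appearance times his own plate appearances; a hitter 10% above the league REQA is credited with 20% more runs than that.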
The Ugueto Effect is the name given to the phenomenon in which very poor players are associated with very high PECOTA Breakout scores. It is far easier for a player like Luis Ugueto, who would produce about 40 EQR over a full season, to improve upon that figure by 20% than it is for Alex Rodriguez; as a result, his Breakout score is likely to be higher. This does not mean that Ugueto is a player you'd want anywhere near your roster.
Unintentional Walk Rate (BB) is one of five primary production metrics used by PECOTA in identifying a player's comparables. It is defined as (BB-IBB)/PA.
Value Over Replacement Player. The number of runs contributed beyond what a replacement-level player at the same position would contribute if given the same percentage of team plate appearances. VORP scores do not consider the quality of a player's defense.
See also RARP.
VORP rate. Runs/game contributed beyond what a replacement level player would produce. Also a rate stat.
"First order wins." Pythagenport expected wins, based on RS and RA.
"Second order wins." Pythagenport wins, based on EQR and EQRA.
"Third order wins." Pythagenport wins, based on AEQR and AEQRA.
Wins Above Replacement Player, level 1. The number of wins this player contributed, above what a replacement level hitter, fielder, and pitcher would have done, with adjustments only for within the season. It should be noted that a team which is at replacement level in all three of batting, pitching, and fielding will be an extraordinarily bad team, on the order of 20-25 wins in a 162-game season.
WARP is also listed on a player's PECOTA card. The PECOTA WARP listing is designed to correspond to WARP-1, not WARP-2 or WARP-3.
Wins Above Replacement Player, with difficulty added into the mix. One of the factors that goes into league difficulty is whether or not the league uses a DH, which is why recent AL players tend to get a larger boost than their NL counterparts.
WARP2, expanded to 162 games to compensate for shortened seasons. Initially, I was just going to use (162/season length) as the multiplier, but this seemed to overexpand the very short seasons of the 19th century. I settled on using (162/scheduled games) ** (2/3). So Ross Barnes' 6.2 wins in 1873, a 55 game season, only gets extended to 12.8 WARP, instead of a straight-line adjustment of 18.3.
For most hitters, at least, it is just that simple. Pitchers are treated differently, as we not only look at season length, but the typical number of innings thrown by a top starting pitcher that year (defined by the average IP of the top five in IP). We find it hard to argue that pitchers throwing 300 or more innings a year are suffering some sort of discrimination in the standings due to having shortened seasons. This is why Walter Johnson has almost no adjustment between WARP2 and WARP3, while his contemporaries Cobb, Speaker, and Collins all gain around 7 or 8 wins.
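The hitter-side season-length expansion, (162/scheduled games) ** (2/3), can be sketched in Python as below. The function name is hypothetical, and the sketch covers only the hitter case; the pitcher adjustment based on top-five innings totals is not specified in enough detail here to reproduce.

```python
def warp3_season_factor(scheduled_games, full_season=162):
    """Multiplier applied to WARP2 to expand a short season: (162 / G) ** (2/3)."""
    return (full_season / scheduled_games) ** (2 / 3)


# Ross Barnes, 1873: 6.2 wins in a 55-game season.
barnes_warp3 = 6.2 * warp3_season_factor(55)
barnes_straight_line = 6.2 * 162 / 55
```

Starting from the rounded 6.2 figure, the expanded value comes out at roughly 12.7 and the straight-line value at roughly 18.3; the 12.8 quoted in the text presumably reflects an unrounded WARP2 input.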
The probability of winning the current game, given some information about how many runs each team has scored to a certain point in the game, how many outs there are, whether there are runners on base, and the strength of each team. Keith Woolner outlined a method for computing Win Expectancy given all of these parameters in BP 2005.
Expected wins added over an average pitcher, adjusted for level of opposing hitters faced. WXL factors in the MLVr of the actual batters faced by the relievers. Then, like WX, WXL uses win expectancy calculations to assess how relievers have changed the outcome of games.
Expected wins added over a replacement level pitcher. WXR uses win expectancy calculations to assess how relievers have changed the outcome of games, similar to WX. However, instead of comparing the pitcher's performance to an average pitcher, he is compared to a replacement level pitcher to determine WXR.
Expected wins added over a replacement level pitcher, adjusted for level of opposing hitters. WXRL combines the individual adjustments for replacement level (WXR) and quality of the opposing lineup (WXL) to the basic WX calculation.
The Weighted Mean forecast incorporates all of the player's potential outcomes into a single average, weighted based on projected playing time. In almost all cases, poor performances are associated with a reduced number of plate appearances. For that reason, they don't hurt a player's team quite as much as good performances help it; the weighting is designed to compensate for this effect (see also Jeremy Giambi Effect).
EXCEPTION: a player's projected PLAYING TIME (and therefore, his counting statistics that depend on his playing time) is taken based on the median of his comparables' performance, rather than the weighted mean. This is designed to mitigate the influence of catastrophic injuries, which are better represented by Attrition Rate.
This exception does NOT affect a player's WARP and VORP forecast, which are calculated per the weighted mean method, treating players who dropped out of the database as having zero WARP/VORP.
A correction made to raw runs when converting them to a standard league to preserve their win value. Define an average team from season games played, league runs per game (9 innings or 27 outs, depending on whether you are using pitcher or batter data), and appropriate adjustments (park, team hitting/pitching, difficulty). "Team" is the effect of replacing one player on the average team with the player we are analyzing. Calculate the pythagorean exponent from (average + team) / games as your RPG entry; calculate winning percentage using the modified pythagorean formula. Now, go backwards, solve for "team" runs, given the winning percentage, an average team that scores 4.5 per game, and a pythagorean exponent of 2.00.
See WARP-1.
Adjusted Innings Pitched; used for the PRAA and PRAR statistics. There are two separate adjustments: 1) Decisions. Innings are redistributed among the members of the team to favor those who took part in more decisions (wins, losses, and saves) than their innings alone would lead you to expect. The main incentive was to do a better job recognizing the value of closers than a simple runs above average approach would permit. XIPA for the team, after this adjustment, will equal team innings. First, adjust the wins and saves; let X = (team wins) / (team wins + saves). Multiply that by individual (wins + saves) to get an adjusted win total. Add losses. Multiply by team innings divided by team wins and losses. 2) Pitcher/fielder share. When I do the pitch/field breakdown for individuals, one of the stats that gets separated is innings. If an individual pitcher has more pitcher-specific innings than an average pitcher with the same total innings would have, then the difference is added to his XIPA. If a pitcher has fewer than average, the difference is subtracted. This creates a deliberate bias in favor of pitchers who are more independent of their fielders (the strikeout pitchers, basically), and against those who are highly dependent on their defenses (the Tommy John types).
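The first XIPA adjustment (decisions) can be sketched in Python by following the stated steps literally. This is only a sketch: the function name, the dictionary keys, and the three-pitcher staff are invented, and the source may combine this step with actual innings in ways not spelled out here. The second adjustment (pitcher/fielder share) depends on the pitching/fielding breakdown and is not attempted.

```python
def decision_adjusted_innings(pitchers, team_ip):
    """Redistribute team innings by decisions, per the steps in the text.

    pitchers: list of dicts with keys 'w', 'l', 'sv' (hypothetical layout).
    Returns one adjusted innings figure per pitcher, in the same order.
    """
    team_w = sum(p["w"] for p in pitchers)
    team_l = sum(p["l"] for p in pitchers)
    team_sv = sum(p["sv"] for p in pitchers)

    x = team_w / (team_w + team_sv)                # X = team wins / (team wins + saves)
    innings_per_decision = team_ip / (team_w + team_l)

    adjusted = []
    for p in pitchers:
        adjusted_wins = x * (p["w"] + p["sv"])     # adjusted win total
        adjusted.append((adjusted_wins + p["l"]) * innings_per_decision)
    return adjusted


# Invented staff: a starter, a closer, and everyone else lumped together.
staff = [
    {"w": 15, "l": 10, "sv": 0},
    {"w": 5, "l": 5, "sv": 40},
    {"w": 60, "l": 57, "sv": 0},
]
xipa_decisions = decision_adjusted_innings(staff, team_ip=1450)
```

Because the adjusted wins sum to team wins, the adjusted innings sum to team innings, which matches the statement that team XIPA equals team innings after this step.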