Glossary: Davenport Translations
Doubles or doubles allowed.
Triples or triples allowed.
Assists.
At-bats; official plate appearances where the batter doesn't walk or get hit by the pitch, doesn't hit some kind of recognized sacrifice, and isn't interfered with by the catcher.
Batting average; hits divided by at-bats.
The estimated number of real, nine-inning games played at this position.
Statistics that have been adjusted for all-time have all of the adjustments for a single season, plus two more.
One adjustment normalizes the average fielding numbers over time. Historically, the fielding share of total defense has been diminishing with time - more walks, more strikeouts, and more home runs means less work for fielders. In the single-season adjustments, fielders from before WWII have a lot more value than fielders today; the all-time adjustments have attempted to remove that temporal trend.
The second adjustment is for league difficulty. League quality has generally increased with time. Each league has been rated for difficulty and compared to a trend line defined by the post-integration National League.
In addition to the adjustments for season, an adjustment is made for league difficulty.
Statistics that have been adjusted for a single season are the best stats to use when you are only interested in that one season. In these, adjustments have been made to account for the home park and for the offensive level of the league as a whole. Hitters have an adjustment for not having to face their own team's pitchers; pitchers have a similar adjustment for not having to face their own hitters. Hitters in the AL since 1973 have a disadvantage in these statistics, since the league average is artificially inflated by the use of the DH and no adjustment is made for that.
In Davenport translations, player age as of July 1. Players for whom birth information is unknown have an age of "0."
Bases on Balls (also known as walks), or bases on balls allowed.
Balks. Not recorded 1876-1880.
Batting runs above a replacement at the same position. A replacement position player is one with an EqA equal to (230/260) times the average EqA for that position.
Approximate number of batting outs made while playing this position.
Complete Games.
Caught stealing. CS are not available for the NL from 1876-1950 (except for 1915, 1920-25, and some players for 1916), in the AL from 1901-19 (except 1914-15 and some 1916 players), and are not available at all for the AA, UA, PL, or FL. Surprisingly, they are available for the NA. In catcher's fielding, not available prior to 1978.
Defense-adjusted ERA. Not to be confused with Voros McCracken's Defense-Neutral ERA. Based on the PRAA, DERA is intended to be a defense-independent version of the NRA. As with that statistic, 4.50 is average. Note that if a pitcher's DERA is higher than his NRA, you can safely assume he pitched in front of an above-average defense.
Double plays, turned or hit into.
The number of hits above or below average for this pitcher, based on his own number of balls in play and his team's rate of hits (minus home runs) per ball in play; (H-HR) - BIP * (team (H-HR)/BIP). Essentially, the Voros McCracken number. For a team, Delta-H should be zero. Positive numbers signify more hits allowed than expected ("bad luck," if you believe pitchers have nothing to do with the outcome of a BIP), negative numbers mean fewer hits than expected ("good luck").
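The Delta-H formula above can be sketched in a few lines of Python. This is only an illustration of the stated arithmetic; the function and argument names are assumptions made for the example, not names used by the BP database.

```python
def delta_h(h, hr, bip, team_h, team_hr, team_bip):
    """Hits allowed above (+) or below (-) what the team rate predicts.

    Implements (H-HR) - BIP * (team (H-HR)/BIP) as written in the entry.
    All argument names are hypothetical.
    """
    team_rate = (team_h - team_hr) / team_bip   # team non-HR hits per ball in play
    expected = bip * team_rate                  # expected non-HR hits for this pitcher
    return (h - hr) - expected


# A pitcher who allowed 180 non-HR hits on 600 balls in play,
# on a team that allowed 1250 non-HR hits on 4300 balls in play.
example = delta_h(200, 20, 600, 1400, 150, 4300)
```

Feeding the team totals in as the "pitcher" returns zero, which matches the statement that Delta-H for a team should be zero.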
The number of runs, more or less, that a pitcher allowed, compared to his statistics. The pitcher's statistics (such as hits, walks, home runs) are run through a modified version of the equivalent runs formula to get estimated runs. Again, positive is "bad luck," negative is "good luck."
The number of wins, more or less, that a pitcher won, compared to estimated wins. Estimated wins are derived from the pitcher's actual runs allowed and team average run scoring. Here, a positive number is "good luck," negative is "bad luck."
Each league has been given a difficulty level, based on the performance of players in that league compared to the same players' performance in other seasons. The reference difficulty level was defined by the trend line of the National League from 1947 to 2002, and extended backwards to 1871. The difficulty adjustment is the ratio between the actual difficulty level and the reference level.
Errors.
Earned Runs.
Equivalent Average. A measure of total offensive value per out, with corrections for league offensive level, home park, and team pitching. EQA considers batting as well as baserunning, but not the value of a position player's defense. The EqA adjusted for all-time also has a correction for league difficulty. The scale is deliberately set to approximate that of batting average. League average EqA is always equal to .260.
EqA is derived from Raw EqA, which is
RawEqA =(H+TB+1.5*(BB+HBP+SB)+SH+SF-IBB/2)/(AB+BB+HBP+SH+SF+CS+SB)
Any variables which are either missing or which you don't want to use can simply be ignored (be sure you ignore them for both the individual and the league, though). You'll also need to calculate the RawEqA for the entire league (LgEqA).
Convert RawEqA into EqR, taking into account the league EqA LgEqA, league runs per plate appearance, the park factor PF, an adjustment pitadj for not having to face your own team's pitchers, and the difficulty rating. Again, you can ignore some of these as the situation requires. xmul can simply be called "2", while the PF, diffic, and pitadj can be set to "1".
xmul=2*(.125/PF/Lg(R/PA)/pitadj)
EQAADJ=xmul*(RawEqa/LgEqa)* ((1+1/diffic)/2) + (1-xmul)
UEQR=EQAADJ*PA*Lg(R/PA)
To get the final, fully adjusted EqA, we need to place this into a team environment.
This is an average team:
AVGTM=Lg(R/Out)*Lg(Outs/game)*PF*Games*(DH adjustment)
The DH adjustment is for playing in a league with a DH. "Games" is the number of games played by this player.
Replacing one player on the average team with our test subject:
TMPLUS=AVGTM+UEQR-OUT*Lg(R/Out)*DH*PF
Get pythagorean exponent
pyexp=((TMPLUS+AVGTM)/Games)**.285
Calculate win percentage
WINPCT=((TMPLUS/AVGTM)**pyexp)/(1+(TMPLUS/AVGTM)**pyexp)
Convert into adjusted space, where the Pythagorean exponent is set to 2.
NEWTM=(WINPCT/(1-WINPCT))**(1/2)
Fully adjusted EqR:
EQR=.17235*((NEWTM-1)*27.*Games + Outs)
Fully adjusted EqA
EQA= (EQR/5/Outs)** 0.4
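The chain of steps from RawEqA to the fully adjusted EqA can be collected into one Python sketch. It follows the formulas above line by line. The function names, argument names, and the simple_xmul switch are assumptions added for illustration, and the defaults of 1 for PF, pitadj, diffic, and the DH adjustment follow the suggestion that these can be set to "1" when not needed.

```python
def raw_eqa(h, tb, bb, hbp, sb, sh, sf, ibb, ab, cs):
    """RawEqA as written in the glossary formula."""
    numerator = h + tb + 1.5 * (bb + hbp + sb) + sh + sf - ibb / 2
    denominator = ab + bb + hbp + sh + sf + cs + sb
    return numerator / denominator


def adjusted_eqa(raw, lg_raw, pa, outs, games, lg_r_per_pa, lg_r_per_out,
                 lg_outs_per_game=27.0, pf=1.0, pitadj=1.0, diffic=1.0,
                 dh=1.0, simple_xmul=True):
    """Return (EQR, EQA) following the steps in the text.

    simple_xmul=True uses xmul = 2, as the text allows; otherwise the
    full xmul expression is used. Parameter names are hypothetical.
    """
    xmul = 2.0 if simple_xmul else 2 * (0.125 / pf / lg_r_per_pa / pitadj)
    eqaadj = xmul * (raw / lg_raw) * ((1 + 1 / diffic) / 2) + (1 - xmul)
    ueqr = eqaadj * pa * lg_r_per_pa                     # unadjusted EqR

    # Average team, then the same team with this player swapped in.
    avgtm = lg_r_per_out * lg_outs_per_game * pf * games * dh
    tmplus = avgtm + ueqr - outs * lg_r_per_out * dh * pf

    # Win percentage with a run-environment-dependent exponent.
    pyexp = ((tmplus + avgtm) / games) ** 0.285
    ratio = (tmplus / avgtm) ** pyexp
    winpct = ratio / (1 + ratio)

    # Back into the standard space, where the exponent is 2.
    newtm = (winpct / (1 - winpct)) ** 0.5
    eqr = 0.17235 * ((newtm - 1) * 27.0 * games + outs)
    eqa = (eqr / 5 / outs) ** 0.4
    return eqr, eqa
```

As a sanity check on the sketch, a hitter whose RawEqA equals the league's, and whose outs cost exactly as many runs as he produces, leaves the team unchanged and comes out at an EqA of about .260.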
Equivalent Average, as taken from the Davenport Translation (DT) Player Cards. EqA1 is EqA adjusted for the season in which the performance occurred, as opposed to EqA2, which is adjusted for comparisons across multiple seasons or eras. For example, if you wanted to compare Albert Pujols's 2008 performance against those of other players in the 2008 season, you would reference his EqA1; if you wanted to compare Pujols's 2008 against Lou Gehrig's seasons in the 1920s and 30s, you would reference Pujols's and Gehrig's respective EqA2s.
Equivalent Batting Average, sometimes also referred to as Translated or Normalized Batting Average. This is a player's batting average, adjusted for ballpark, league difficulty, and era, and calibrated to an ideal major league where the overall EqBA is .260. While a major league hitter's equivalent stats should not differ substantially from his actual numbers, a minor league hitter's equivalent stats undergo translation and may differ significantly.
Equivalent Runs; EQR = 5 * OUT * EQA^2.5. In the fielding charts, the estimated number of EqR he had at the plate while playing this position in the field. In Adjusted Standings, EqR refers to the total number of equivalent runs scored by the team.
Fielding Runs Above Average.
Fielding Runs Above Replacement. The difference between an average player and a replacement player is determined by the number of plays that position is called on to make. That makes the value at each position variable over time. In the all-time adjustments, an average catcher is set to 39 runs above replacement per 162 games, first base to 10, second to 29, third to 22, short to 33, center field to 24, left and right to 14.
Fielding runs above replacement. A fielding statistic, where a replacement player is meant to be approximately equal to the lowest-ranking player at that position, fielding wise, in the majors. Average players at different positions have different FRAR values, which depend on the defensive value of the position; an average shortstop has more FRAR than an average left fielder.
See FRAR, FRAR2. FRAR2 incorporates adjustments for league difficulty and normalizes defensive statistics over time.
Games played (pitched, fielded, officiated). Properly speaking, a pitcher should only be credited with a game played on his batting line when he actually appears in the lineup (i.e., not when a DH hits for him). The BP database is currently inconsistent in this respect.
Grounded into double play. Not recorded prior to 1933 in the NL, or 1939 in the AL, and not at all for the other leagues. Unfortunately, without opportunity information, I don't find it very useful for inclusion in EqA. There is also evidence, from Tom Ruane, that players who hit into more DP also tend to advance more runners with outs, enough to offset the DPs.
Hit by pitch. Not recorded for the NL 1876-1886, the AA in 1882-83, the 1884 UA, and the 1871-75 NA, for either hitters or pitchers.
Home runs, or home runs allowed.
Intentional walks. Not recorded for any league prior to 1955.
Refers to a pitcher's losses. In the context of a team rather than an individual pitcher, refers to team losses. In VORP and PAP reports, refers to league.
Normalized Runs Allowed. "Normalized runs" have the same win value, against a league average of 4.5 and a pythagorean exponent of 2, as the player's actual runs allowed did when measured against his league average.
On-base percentage. (H + BB + HBP) divided by (AB + BB + HBP + SF). For pitchers, OBP is on base percentage allowed.
Known outs made by the player, defined by AB-H+CS+SH+SF.
Offensive Winning Percentage. A Bill James stat, usually derived from runs created. In EqA terms, it could be calculated as (EQA/refEQA)^5, where refEQA is some reference EQA, such as league average (always .260) or the position-averaged EQA.
Plate appearances; AB + BB + HBP + SH + SF.
The percentage of the team's total plate appearances that this player had.
Passed balls; not available for the NA.
Putouts.
Power Percentage, a statistic created by Julien Headley, is described here. POW describes extra-base power on contact, in terms similar to Isolated Power. The formulation used in the Minor League Statistics and Translations page is POW=(DB+(2*TP)+(3*HR))/(AB-SO).
Pitcher-only runs above average. The difference between this and RAA is that RAA is really a total defense statistic, and PRAA tries to isolate the pitching component from the fielding portion. It relies on the pitching/fielding breakdown being run for the team, league, and individual. The individual pitching + defense total is compared to a league average pitcher + team average defense, and the difference is win-adjusted.
Pitcher-only runs above replacement. Similar to PRAA, except that the comparison is made to a replacement level player instead of average. The nominal RA for a replacement pitcher is 6.11 (the same ratio, compared to a 4.50 average, as a .230 EQA is to .260). This assumes that there is a 50/50 split between pitching and fielding. If the pitch/field split is less than that, as it was in the 1800s, the replacement ERA is reduced.
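The 6.11 figure can be reproduced with a short calculation. The sketch below assumes that "the same ratio" is meant in run terms, using the EqR relationship EQR = 5 * OUT * EQA^2.5 from elsewhere in this glossary; that reading is an interpretation, and the function name is made up for the example.

```python
def replacement_ra(avg_ra=4.50, rep_eqa=0.230, avg_eqa=0.260):
    """Nominal replacement-level RA under a 50/50 pitching/fielding split.

    Assumes runs scale with EqA to the 2.5 power, so a replacement hitter
    produces (rep_eqa/avg_eqa)**2.5 of average runs, and a replacement
    pitcher allows the reciprocal of that ratio times the average RA.
    """
    run_ratio = (rep_eqa / avg_eqa) ** 2.5
    return avg_ra / run_ratio


nominal = replacement_ra()   # roughly 6.11
```

The reduction applied when the pitching share is below 50% is not specified in the entry, so it is not modeled here.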
An adjustment made to account for the fact that some parks are easier to hit in than average, giving an advantage (in raw statistical terms) to hitters who play for that team. Park factors are always made relative to a league average of 1.00. The park adjustments in the BP are made only on the park factor for runs, averaged over five years; they can be found here. The first column is a one-year park factor, the second column is the five-year average centered on that year (assuming the team did not change or massively renovate their park).
Described more completely in the 2002 Prospectus, the breakdown is a sequence of calculations designed to separate the pitching and fielding components of defense from each other. Certain events (walks, strikeouts, home runs) are considered to be entirely the responsibility of the pitcher. Errors and double plays are assumed to be entirely the domain of the fielders. Other hits and outs are assumed to be 75% fielding, 25% pitching.
A modified form of Bill James' pythagorean formula. Instead of using a fixed exponent (2, 1.83), the "pythagenport" formula derives the exponent from the run environment - the more runs per game, the higher the exponent. The formula for the exponent was X = .45 + 1.5 * log10 ((rs+ra)/g), and then winning percentage is calculated as (rs^x)/(rs^x + ra^x). The formula has been tested for run environments between 4 and 40 runs per game, but breaks down below 4 rpg. The original article is here.
After further review, I (Clay) have come to the conclusion that the so-called Smyth/Patriot method, aka Pythagenpat, is a better fit. In that, X=((rs+ra)/g)^.285, although there is some wiggle room for disagreement in the exponent. Anyway, that equation is simpler, more elegant, and gets the better answer over a wider range of runs scored than Pythagenport, including the mandatory value of 1 at 1 rpg. Go here for more.
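The two exponent formulas are easy to compare side by side. The following Python sketch implements both as stated in the text; the function names are chosen for the example only.

```python
import math


def pythagenport_exponent(rs, ra, g):
    """X = .45 + 1.5 * log10((rs + ra) / g)."""
    return 0.45 + 1.5 * math.log10((rs + ra) / g)


def pythagenpat_exponent(rs, ra, g):
    """X = ((rs + ra) / g) ** .285 (the Smyth/Patriot form)."""
    return ((rs + ra) / g) ** 0.285


def win_pct(rs, ra, x):
    """Winning percentage rs^x / (rs^x + ra^x) for a given exponent x."""
    return rs ** x / (rs ** x + ra ** x)


# Compare the two exponents for a team scoring 800 and allowing 700 in 162 games.
port = pythagenport_exponent(800, 700, 162)
pat = pythagenpat_exponent(800, 700, 162)
estimates = (win_pct(800, 700, port), win_pct(800, 700, pat))
```

At exactly one run per game the Pythagenpat exponent is 1, which is the "mandatory value" mentioned above, while the Pythagenport formula gives .45 there.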
Runs scored (for hitters) or allowed (pitchers).
For Pitchers: Runs above average. At its simplest, this would be the league runs per inning, times individual innings, minus individual runs allowed. However, we have gone one step beyond that, because being 50 runs above average in 1930, in the Baker Bowl, doesn't have the same win impact as being +50 in the 1968 Astrodome. The league runs per inning need to be adjusted for park and team hitting (and difficulty, for the all-time RAA), and then you can multiply by individual innings and subtract individual runs. Finally, that quantity needs to be win-adjusted. See win-adjustment. For Fielders: Runs above average at this position, similar to Palmer's Fielding Runs as far as interpretation is concerned.
Runs Above Position: The number of Equivalent Runs this player produced, above what an average player at the same position would have produced in the same number of outs.
Runs Above Replacement.
For a fielder, it is simply Runs Above Replacement for the position, where a replacement-level fielder is determined to be about 20 runs below average for the position; the number varies slightly depending on the number of balls in play.
Runs Above Replacement, Position-adjusted. A statistic that compares a hitter's Equivalent Run total to that of a replacement-level player who makes the same number of outs and plays the same position. A "replacement level" player is one who has 22.1 fewer EqR per 486 outs than the average for that position. For the overall league average (.260), that corresponds to a .230 EqA and a .351 winning percentage.
Essentially, this is the Equivalent Average analog of VORP.
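The replacement-level numbers quoted in this entry follow from the EqR relationship EQR = 5 * OUT * EQA^2.5 given elsewhere in this glossary, together with a Pythagorean exponent of 2. A minimal Python check, with helper names invented for the example:

```python
def eqr_from_eqa(eqa, outs):
    """EQR = 5 * OUT * EQA^2.5."""
    return 5 * outs * eqa ** 2.5


def eqa_from_eqr(eqr, outs):
    """Inverse of the above: EQA = (EQR / 5 / OUT)^0.4."""
    return (eqr / (5 * outs)) ** 0.4


OUTS = 486                                  # 18 games' worth of outs at 27 per game
average_eqr = eqr_from_eqa(0.260, OUTS)
replacement_eqr = eqr_from_eqa(0.230, OUTS)
gap = average_eqr - replacement_eqr         # runs between average and replacement

# Winning percentage of a replacement-level offense, exponent fixed at 2.
run_ratio = replacement_eqr / average_eqr
replacement_win_pct = run_ratio ** 2 / (1 + run_ratio ** 2)
```

Here gap comes out near 22.1 and replacement_win_pct near .351, in line with the figures in the text.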
Runs Batted In.
Raw equivalent average, the first step towards building the EqA. In its fullest form, REQA = (H + TB + 1.5*(BB + HBP + SB) + SH + SF) divided by (AB + BB + HBP + SH + SF + CS + SB). REQA gets converted into unadjusted equivalent runs, UEQR.
A way to look at the fielder's rate of production, equal to 100 plus the number of runs above or below average this fielder is per 100 games. A player with a rate of 110 is 10 runs above average per 100 games, a player with an 87 is 13 runs below average per 100 games, etc.
See Rate. Rate2 incorporates adjustments for league difficulty and normalizes defensive statistics over time.
Stolen bases. Not recorded for any league between 1876 and 1885. On the catcher's fielding charts, not available prior to 1978.
Sacrifice flies. The statistical category of "sacrifice flies" did not exist prior to 1954; the concept had been around, on and off, since 1908, but had always been part of the "SH" category. See SH.
Sacrifice hits. Not recorded prior to 1894. From 1894-1907, they were essentially the same as the modern rule - a bunt which advanced a baserunner. From 1908-25, they included what we would now call a sacrifice fly (sacrifices increase 25% between 1907 and 1908 as a result). From 1926-30, they included any fly ball on which a runner advanced, not just ones where the runner scored (another 25% increase in 1926). From 1931-38, sacrifice flies were eliminated completely (causing a 45% drop in sacrifices, and a 4-point decline in batting averages); that brought us back to the modern definition of sacrifice hit. In 1939 they re-introduced the run-scoring sac fly (returning to the 1908-25 rules), but eliminated it again in 1940. When sacrifice flies appeared again in 1954, they had their own category, so the rule for what we would call a sacrifice hit has not changed since 1940.
Shutouts.
Slugging percentage (hitters) or slugging percentage allowed (pitchers). Total bases divided by at-bats.
Strikeouts. For pitchers, batters struck out; for batters, times struck out.
Saves.
The "standard league" is a mythical construction, in which all statistics have been adjusted for easy comparison. Its primary features are that runs scored is 4.5 runs per game; equivalent average is .260; and the pythagorean exponent is exactly 2.00.
A rough indicator of the pitcher's overall dominance, based on normalized strikeout rates, walk rates, home run rates, runs allowed, and innings per game. "10" is league average, while "0" is roughly replacement level. The formula is as follows: Stuff = EqK9 * 6 - 1.333 * (EqERA + PERA) - 3 * EqBB9 - 5 * EqHR9 - 3 * MAX(6 - IP/G, 0)
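The Stuff formula translates directly into code. The sketch below reads the final term as max(6 - IP/G, 0), a penalty that applies only to pitchers averaging fewer than six innings per game; the argument names and the sample inputs are assumptions for illustration, not real player data.

```python
def stuff(eq_k9, eq_bb9, eq_hr9, eq_era, pera, ip, g):
    """Stuff = EqK9*6 - 1.333*(EqERA + PERA) - 3*EqBB9 - 5*EqHR9 - 3*max(6 - IP/G, 0)."""
    short_outing_penalty = 3 * max(6 - ip / g, 0)
    return (6 * eq_k9
            - 1.333 * (eq_era + pera)
            - 3 * eq_bb9
            - 5 * eq_hr9
            - short_outing_penalty)


# Hypothetical starter: 6 K/9, 3 BB/9, 1 HR/9, 4.50 EqERA and PERA, 200 IP in 32 games.
starter = stuff(6, 3, 1, 4.5, 4.5, 200, 32)
# Same rates for a hypothetical reliever averaging one inning per appearance.
reliever = stuff(6, 3, 1, 4.5, 4.5, 60, 60)
```

With identical rate stats, the reliever scores 15 points lower than the starter, entirely because of the innings-per-game term.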
Total batters faced. Not recorded for the NL 1876-1886, the AA of 1882-83, the 1884 UA, or the NA of 1871-75.
As used in most places (including the PECOTA cards), Team is the three letter abbreviation for a major league, minor league, or foreign team. This page contains the list of teams and their abbreviations. The Davenport Translations Player Cards have slightly different abbreviations, with a three-character team signifier, followed by a league signifier. The leagues are as follows: N signifies the National Association of 1871-1875 and the National League of 1876-present. A is for both the American Association (1882-1891, a major league, separate from the later minor league of the same name) and the 1901-present American League. U is the Union Association of 1884, P the Players League of 1890, and F the Federal League of 1914-15. For example, the Boston Red Sox are BOS-A, where the "A" signifies an American League team, while BOS-N refers to the Boston Braves National League franchise. At this time, for players who played for more than one team in a season, the order in which the various team stints are shown is not necessarily chronological.
Translated at-bats: number of at-bats adjusted for park and season.
Translated batting average: batting average adjusted for park and season. Equal to T_H / T_AB.
Translated doubles: number of doubles adjusted for park and season.
Translated triples: number of triples adjusted for park and season.
Translated walks: number of walks adjusted for park and season.
Translated caught stealing: number of times caught stealing adjusted for park and season.
Translated hits: number of hits adjusted for park and season.
Translated hit by pitch: number of times hit by pitch adjusted for park and season.
Translated home runs: number of home runs adjusted for park and season.
Translated OBP: on-base percentage adjusted for park and season.
Translated outs: number of outs made (AB-H+CS+SH+SF) adjusted for park and season.
Translated runs: number of runs scored adjusted for park and season.
Translated RBI: number of runs batted in adjusted for park and season.
Translated stolen bases: number of stolen bases adjusted for park and season.
Translated SLG: slugging percentage adjusted for park and season.
Translated strikeouts: number of strikeouts adjusted for park and season.
An adjustment made for hitters, to account for not having to face their own pitchers. Using pitching stats, (league R * pf - team R), divided by (league IP - team IP), divided by park-adjusted league runs per inning.
An adjustment made for pitchers, to account for not having to face their own team's batters. Using batting stats, (league runs * pf - team runs), divided by (league PA - team PA), divided by league runs per plate appearance * pf.
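The two own-team adjustments can be sketched as follows. The wording of the formulas leaves some room for interpretation, so this is one reading rather than the definitive calculation: the league total is park-adjusted, the team's own total is removed, and the resulting rate is divided by the park-adjusted league rate. The function names are hypothetical.

```python
def pitadj_for_hitters(lg_r, lg_ip, team_r, team_ip, pf=1.0):
    """Adjustment for hitters, built from pitching stats (runs allowed, IP)."""
    rate_without_team = (lg_r * pf - team_r) / (lg_ip - team_ip)
    park_adjusted_lg_rate = (lg_r / lg_ip) * pf
    return rate_without_team / park_adjusted_lg_rate


def hitadj_for_pitchers(lg_r, lg_pa, team_r, team_pa, pf=1.0):
    """Adjustment for pitchers, built from batting stats (runs scored, PA)."""
    rate_without_team = (lg_r * pf - team_r) / (lg_pa - team_pa)
    park_adjusted_lg_rate = (lg_r / lg_pa) * pf
    return rate_without_team / park_adjusted_lg_rate


# A team whose pitchers allowed fewer runs than a league-average share:
# its hitters miss out on facing a good staff, so the factor exceeds 1.
good_staff = pitadj_for_hitters(lg_r=10000, lg_ip=13000, team_r=800, team_ip=1300)
```

Under this reading a perfectly average team in a neutral park gets a factor of 1, so the adjustment only moves for teams that differ from the league.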
Hits plus doubles plus two times triples plus three times home runs.
Converts the player's batting statistics into a context that is the same for everybody. The major characteristics of the translation are: 1) that the translated EQA should equal the original, all-time adjusted EQA (within some margin for error); 2) that all seasons are expanded to a 162-game schedule; 3) that the statistics are adjusted to a season where an average hitter would have, per 650 PA: 589 AB, 153 H, 31 DB, 3 TP, 19 HR, 56 BB, 5 HBP, 113 SO, 10 SB, 5 CS, 79 R and 75 RBI. His rates would be a .260 batting average, .330 on-base average, .420 slugging average, and a .260 EQA with 76 EQR.
Unadjusted Equivalent Runs; (2 * REQA/LgREQA - 1) * PA * LgR/LgPA. Analogous to runs created.
Refers to a pitcher's wins. In the context of a team rather than an individual pitcher, refers to team wins.
Wins Above Replacement Player, level 1. The number of wins this player contributed, above what a replacement level hitter, fielder, and pitcher would have done, with adjustments only for within the season. It should be noted that a team which is at replacement level in all three of batting, pitching, and fielding will be an extraordinarily bad team, on the order of 20-25 wins in a 162-game season.
WARP is also listed on a player's PECOTA card. The PECOTA WARP listing is designed to correspond to WARP-1, not WARP-2 or WARP-3.
Wins Above Replacement Player, with difficulty added into the mix. One of the factors that goes into league difficulty is whether or not the league uses a DH, which is why recent AL players tend to get a larger boost than their NL counterparts.
WARP2, expanded to 162 games to compensate for shortened seasons. Initially, I was just going to use (162/season length) as the multiplier, but this seemed to overexpand the very short seasons of the 19th century. I settled on using (162/scheduled games) ** (2/3). So Ross Barnes' 6.2 wins in 1873, a 55 game season, only gets extended to 12.8 WARP, instead of a straight-line adjustment of 18.3.
For most hitters, at least, it is just that simple. Pitchers are treated differently, as we look not only at season length, but also at the typical number of innings thrown by a top starting pitcher that year (defined by the average IP of the top five in IP). We find it hard to argue that pitchers throwing 300 or more innings a year are suffering some sort of discrimination in the standings due to having shortened seasons. This is why Walter Johnson has almost no adjustment between WARP2 and WARP3, while his contemporaries Cobb, Speaker, and Collins all gain around 7 or 8 wins.
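The season-length expansion for hitters is a one-line calculation. The Python sketch below applies the (162/scheduled games) ** (2/3) multiplier to the Ross Barnes example; the function names are invented here, and the pitcher-specific innings adjustment is not modeled because the text does not give its formula.

```python
def warp3_hitter(warp2, scheduled_games):
    """Expand WARP2 to a 162-game schedule using the 2/3-power multiplier."""
    return warp2 * (162 / scheduled_games) ** (2 / 3)


def straight_line(warp2, scheduled_games):
    """The rejected alternative: a plain 162/season-length multiplier."""
    return warp2 * 162 / scheduled_games


barnes_1873 = warp3_hitter(6.2, 55)
barnes_linear = straight_line(6.2, 55)
```

Starting from the rounded 6.2, the 2/3-power version gives about 12.7 and the straight-line version about 18.3. The small gap from the quoted 12.8 presumably comes from using an unrounded WARP2 in the original calculation.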
Wild pitches.
A correction made to raw runs when converting them to a standard league to preserve their win value. Define an average team from season games played, league runs per game (9 innings or 27 outs, depending on whether you are using pitcher or batter data), and appropriate adjustments (park, team hitting/pitching, difficulty). "Team" is the effect of replacing one player on the average team with the player we are analyzing. Calculate the pythagorean exponent from (average + team) / games as your RPG entry; calculate winning percentage using the modified pythagorean formula. Now, go backwards, solve for "team" runs, given the winning percentage, an average team that scores 4.5 per game, and a pythagorean exponent of 2.00.
Adjusted Innings Pitched; used for the PRAA and PRAR statistics. There are two separate adjustments: 1) Decisions. Innings are redistributed among the members of the team to favor those who took part in more decisions (wins, losses, and saves) than their innings alone would lead you to expect. The main incentive was to do a better job recognizing the value of closers than a simple runs above average approach would permit. XIPA for the team, after this adjustment, will equal team innings. First, adjust the wins and saves; let X = (team wins) / (team wins + saves). Multiply that by individual (wins + saves) to get an adjusted win total. Add losses. Multiply by team innings divided by team wins and losses. 2) Pitcher/fielder share. When I do the pitch/field breakdown for individuals, one of the stats that gets separated is innings. If an individual pitcher has more pitcher-specific innings than an average pitcher with the same total innings would have, then the difference is added to his XIPA. If a pitcher has fewer than average, the difference is subtracted. This creates a deliberate bias in favor of pitchers who are more independent of their fielders (the strikeout pitchers, basically), and against those who are highly dependent on their defenses (the Tommy John types).
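The decisions adjustment (the first of the two) can be sketched in Python as follows. The second, pitcher/fielder-share adjustment depends on the pitch/field breakdown and is not attempted here. Function and argument names, and the sample staff, are made up for the example.

```python
def xipa_decisions(w, l, sv, team_w, team_l, team_sv, team_ip):
    """Innings redistributed by decisions, per the first XIPA adjustment."""
    x = team_w / (team_w + team_sv)            # scales wins + saves back to team wins
    adjusted_decisions = x * (w + sv) + l      # adjusted wins, plus losses
    return adjusted_decisions * team_ip / (team_w + team_l)


# Hypothetical team: 90-72 with 45 saves in 1450 innings,
# split into a closer and "everyone else".
team = dict(team_w=90, team_l=72, team_sv=45, team_ip=1450)
closer = xipa_decisions(w=4, l=3, sv=40, **team)
rest_of_staff = xipa_decisions(w=86, l=69, sv=5, **team)
```

Summing the results over the whole staff returns the team innings total, consistent with the statement that team XIPA equals team innings after this step, and the closer is credited with far more innings than a typical closer actually pitches.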