Personally hoping to see David Green, Keith Mitchell, and Kevin Young.
On the question of whether umpires are unwilling to end an at bat on a called 3rd strike, that shouldn't be that hard to test. The prediction is that the same pitch will be less likely to be called a strike when there are already 2 strikes, versus when there are not. (Might want to leave out 3-2 counts, to avoid conflating with the similar-but-different effect of feeling it is unmanly to not swing at a close pitch on a full count.)
You've already described the methodology and used it: cluster pitches by absolute location and handedness of batter (and any other features of the PA that you think might be important, such as batter height). Compare called strike rates within each cluster with and without 2 strikes. If the hypothesis is true, you should see a significant decrease in called strike rate when there are 2 strikes, independent of batter handedness or pitch location cluster.
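For what it's worth, the within-cluster comparison above could be sketched roughly like this in Python. Everything here is an assumption for illustration: the tuple layout, the cluster label, the made-up counts, and the choice of a pooled two-proportion z statistic as the "significance" check. It is not the method used in the article.

```python
from collections import defaultdict
from math import sqrt


def strike_rate_gaps(pitches):
    """Compare called-strike rates with and without two strikes, per cluster.

    `pitches` is an iterable of (cluster, two_strikes, called_strike) tuples,
    one per taken pitch. Returns {cluster: (rate_without, rate_with, z)}.
    """
    # counts[cluster][0] = [taken, called strikes] with fewer than two strikes
    # counts[cluster][1] = [taken, called strikes] with two strikes
    counts = defaultdict(lambda: [[0, 0], [0, 0]])
    for cluster, two_strikes, called_strike in pitches:
        cell = counts[cluster][int(bool(two_strikes))]
        cell[0] += 1
        cell[1] += int(bool(called_strike))

    gaps = {}
    for cluster, ((n0, k0), (n1, k1)) in counts.items():
        if n0 == 0 or n1 == 0:
            continue  # cluster was not observed in both count states
        p0, p1 = k0 / n0, k1 / n1
        pooled = (k0 + k1) / (n0 + n1)
        se = sqrt(pooled * (1 - pooled) * (1 / n0 + 1 / n1))
        z = (p1 - p0) / se if se > 0 else 0.0
        gaps[cluster] = (p0, p1, z)
    return gaps


# Made-up data: one hypothetical cluster, 100 taken pitches in each count state.
sample = (
    [("low-away/RHB", False, True)] * 60 + [("low-away/RHB", False, False)] * 40
    + [("low-away/RHB", True, True)] * 40 + [("low-away/RHB", True, False)] * 60
)
gaps = strike_rate_gaps(sample)
```

A negative z in most clusters, regardless of handedness or location, would be the pattern the hypothesis predicts. Dropping 3-2 counts, as suggested above, would just be a filter applied before calling the function.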
greg26 wrote: "Not only do we have overpaid players, but we have fewer players who stay with teams long enough for fans to develop strong attachments."
There isn't any more player mobility now than there was before the union. It's just that today the players have a lot of say in where (and whether) they move.
Take a look at the history of the Kansas City Athletics in the late '50s and early '60s. The Yankees basically used them as a farm club, letting them develop talent (Roger Maris, Ralph Terry, Ryne Duren...) and then buying that talent. That was the A's business model -- on the field they were a joke.
Marvin Miller is a direct cause of why that doesn't happen any more. The worst recent offenders (the Pohlad Twins, the fire-sale Marlins) are much more competitive and much more fun to watch than the bad teams of the pre-union era.
Yes, superstars leave their original teams as free agents. Then again, if you check out the all-time WAR leaders at BaseballReference, you'll see that pre-union Babe Ruth, Tris Speaker, Rogers Hornsby, Eddie Collins, Nap Lajoie, Jimmy Foxx, Cy Young, Pete Alexander, Lefty Grove, ... all of these guys were traded mid-career. Some of them more than once. Some of them for cash, which was not re-invested in the team.
I'm not crazy about player mobility myself, but the union didn't invent it. The union only made it benefit players, rather than owners.
Grr. Trying to say "the / tag" (let's see if I escaped that correctly).
"The following graph of the average net BABIP in each group by ground-ball rate is even clearer:"
I'm sure they would if there weren't a typo in the HTML for most of them. Looks like the "" tag is getting closed prematurely, right in the middle of the "...GB(1).jpg" part of the file name.
I think you're being a little too hard on 'average' as a benchmark.
The goal of baseball teams is to win. You don't win by being better than replacement level; you win by being better than your opponent. In the aggregate, that means being better than average. A team that is not better than average is not going to win. A team that has a player at a given position who is better than replacement level, but worse than average, is being _hurt_ by that player in their pursuit of winning it all.
Replacement level is an interesting and useful concept for modeling the effects of talent scarcity. But it doesn't follow that being above replacement level has positive value to teams that want to win.
To put it another way: replacement level as a baseline tells you what you are getting from a player that the worst team isn't getting. Average as a baseline tells you what you are getting that your real competition aren't getting. It seems pretty clear to me that the latter is a more relevant way to measure contribution toward winning.
I certainly agree that everything needs to be compared against how umpires really perform in practice. We're on the same page there. Can't wait to see your research.
I do still see a difference between "how much better than X is Y for our purposes?" and "how much closer to perfection is X than Y?". The ranking of X and Y is the same for both questions, but the importance of the absolute difference (or the perception of it) is not. People tend to rate the importance of a difference as a percent of the scale. If X falls 82% shy of perfection and Y falls 79% shy, that looks like a pretty small difference -- until you realize that you're talking about slugging average, and perfection is a 4.000 ...
(Direct reply not working, so bear with me here.)
"Perfection is not an attainable goal with any ball-strike detection system. We're going to fall short whether we use human umpires or technology."
To see how counter-productive this statement is, consider the following: "Perfection is not an attainable goal with any forecasting system. We're going to fall short whether we use statistical methods or astrology."
Perfectly true statements, but they add (at best) nothing to a discussion of the relative methods of statistical forecasting and astrology.
If there is an argument in favor of human umpires, it does not begin this way.
"The question is how close we can get to a perfect ball-strike detection system, at reasonable cost, without too much disruption to the game."
No. The question is how much better than human umpires can we get, at reasonable cost, without too much disruption to the game. We measure success by how much better off we are, not by how close to perfection we can get.
@Fantasyking (and reply): The question of strike zone adjustment to batter height and stance is usually raised as a barrier to automated ball and strike calls, but I never hear anyone mentioning the fact that there's very little evidence about how well (or poorly) human umpires do this. The fuzziness in the top boundary of the strike zone is well known, and I see no evidence (with the possible exception of Rickey Henderson) that umpires ever take into account a hitter's normal batting stance in their calls.
Finally, a question: I'm clearly not understanding why calibration is considered such a problem. Any three cameras can triangulate a position. Home plate doesn't move. How hard is it to sight on reference objects placed at the corners of the plate, at known heights, just prior to game time? That would correct for any changes in camera orientation since the last calibration. If the positional assessment is accurate to an inch or two, but the orientation is a concern, wouldn't that fix the problem? What am I missing here?
The fact that you could write this article without mentioning Ben Zobrist or Jason Bartlett at all, or Matt Garza as other than trade bait, may say more about the Rays' season than what you did talk about.
Two low-grade stars for the back half of the MVP vote, three competent starting position players, 3 well-exploited role players, and way too much replacement level from players who have been stars in the past. No, it was not reasonable to expect Pena or Bartlett or Zobrist to reproduce their fluke years -- but it was certainly reasonable to expect them to hold _half_ of their value. Which none of them did.
@ dianagram (since "Post Reply" isn't working for me):
"As an aside, I think A-Rod would have been a dead duck on that single to right. The throw was strong and true."
Yes, that throw would have had A-Rod by a mile. But that's not the question -- the question is what were the odds that the throw and tag would be good enough? With 2 outs, you have to believe that the runner on third has less than (about) 1 chance in 3 of scoring on the play, if you're going to hold him at third.
Strong, accurate throws from the outfield are pretty rare these days. I'm not sure that sending him wasn't the correct call, even though (as it turns out) he'd have been dead meat.
It's true for any distribution with a single central peak near the mean/median. By definition, the quantiles are closer together where the density (or pdf) is highest, and farther apart where it is lower.
Incidentally, if you're talking about using a Poisson to model the number of runs allowed (or scored), it's better than a Gaussian but still not right. 0 occurs too frequently (IIRC) relative to a Poisson distribution.
Probability theory makes it very clear that the (true) width of the percentile bands MUST get wider as you move away from the mean. Any set of percentile forecasts that doesn't obey this is (forgive the term) nonsense.
"How much wider should the 70-80 band be, compared to the 60-70 band?" is an interesting and difficult analytical question. Whether or not it should be wider is not.
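To make the band-width point concrete, here is a small Python sketch. The standard normal is purely an illustrative assumption (any distribution with a single central peak would show the same pattern, and a Gaussian is not claimed to be the right model for anything in the forecasts themselves).

```python
from statistics import NormalDist


def band_widths(dist, edges=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Width of each percentile band between consecutive quantile edges."""
    cuts = [dist.inv_cdf(p) for p in edges]
    return [hi - lo for lo, hi in zip(cuts, cuts[1:])]


# Assumed distribution: standard normal, chosen only for illustration.
widths = band_widths(NormalDist(mu=0.0, sigma=1.0))
```

For this assumed distribution the 50-60, 60-70, 70-80, and 80-90 bands come out at roughly 0.25, 0.27, 0.32, and 0.44 standard deviations, so each band is wider than the one before it.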
I play in an 8-team Diamond Mind league, AL-only, 84-game season. For the 2008 season, I had Jack Cust on my team, and he shattered all previous league TTO records, winning the TTO triple crown with 24 HR (tie), 126 K, and 67 BB+HBP in 314 PA.
Final TTO%: 217/314 = 69.1%
Final slash stats: a very Deeresque .222/.385/.547, for an ISO of .325 ... Oof.
"I'm just curious about what it might take for a skeptical sabermetrician to believe in an improbable performance over a fairly small sample."
Statistical Process Control (SPC) theory tells you when to conclude that the recent observed behavior of a time series is from a different generating distribution (e.g. a new "true" batting average) than the historical/expected distribution.
Bayes' Theorem (and associated inference methods) tell you what the most likely new distribution is, given the evidence. There are also hypothesis tests based on Bayes' Theorem that can be used in lieu of SPC if you aren't dealing with time series per se.
"If batting against same-side hitting is a skill, and your annual OBP or SLG vs. same-side hitting varies around a mean, and the same thing is true of opposite-side hitting, isn’t the variance between those two numbers going to be even greater?"
Yep. If OBP or SLG vs same-side is independent of OBP or SLG vs opposite-side, then the variance of the difference is the sum of the variances of the individual splits. If there is correlation between same-side and opposite-side (which there probably is), then you have to adjust for the covariance effect, but it's probably still significantly larger than either variance in isolation.
The correlation question is interesting. How much does how well a RH batter hits RHP tell you about how well he will hit LHP? How about for a LH batter? I'm betting the correlation is higher for RH batters, but that's just a guess...
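The variance arithmetic above is the identity Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y). A quick Python check, using made-up season-by-season OBP figures (the numbers and variable names are hypothetical, not anyone's real splits):

```python
from statistics import mean, variance


def covariance(xs, ys):
    """Sample covariance with the same n-1 denominator as statistics.variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Hypothetical yearly OBP vs same-side and opposite-side pitching.
same_side = [0.310, 0.335, 0.298, 0.342, 0.321]
opposite_side = [0.355, 0.361, 0.340, 0.380, 0.349]

split = [o - s for o, s in zip(opposite_side, same_side)]

var_of_split = variance(split)
var_from_parts = (
    variance(same_side) + variance(opposite_side)
    - 2 * covariance(same_side, opposite_side)
)
```

The two quantities agree up to floating-point error. With zero covariance the last term vanishes and the variance of the split is just the sum of the two variances; positive correlation shrinks it.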
Two words: Gene Mauch. 'Nuff said.
Oh, and thanks for the extra note on John Tudor. That season was something very special, and only Jim Palmer (tied) had as many shutouts in a season since Bob Gibson's legendary '68 in The Real Year of the Pitcher.
It was also (if I'm counting right) in the 20 best full-season ERAs since WW2, lost in the glare of Gooden's golden sophomore season.
The '77 version of Brett is also a candidate. And I could wish you had a spot somewhere for The Mad Hungarian, Al Hrabosky, who played for Herzog in KC and StL and epitomized his relief pitching.
Not candidates by your criteria, but the Fun Factor was certainly enhanced by players like Vince Coleman (not very good, but fun to watch) and Terry Pendleton (who finally broke out for the Braves after leaving St. Louis).
I remember these teams with extreme nostalgia -- I was a huge Cardinals fan from ~1980 to 1990 (when I moved to Pittsburgh and Did the Burgh Thing). As you say, this style of team is almost inconceivable today. It's worth noting that it usually didn't work -- these teams were awful in the years when the parts didn't quite come together. But in '77, '78, '82, '85, and '87 they were awesome.
Curse you, Don Denkinger.
No words for the Rays, with their blazing start and subsequent semi-fold? It's hard to think of them as "gainers" at the moment, relative to expectations. It all comes down to which part of the first half was the fluke.
Good stuff, Ken, but I too am curious about the "irrefutable evidence" standard. Why adopt it? I hate that standard in football, for a variety of reasons (some of which have to do with the interaction with "limited challenges available").
Once you've decided to review, why not let the guy with the best view (the review official) use his best judgment to make the call? This becomes especially important once the field umpires get used to the system, and start making the overturnable call, instead of the one they think really happened, any time it's close. If you impose an asymmetric standard of evidence, you end up biasing the eventual calls.
Example: ground ball sliced past the first base bag into the corner, with runners on base. If I'm the first base umpire, and I know review is possible, I call that ball fair EVERY TIME, and let the play finish. If the review guy eventually comes back and says "foul ball", there's probably no harm done; everyone goes back to where they were. But if I call it foul, the play is dead and we have a mess figuring out who would be where.
If you have an "irrefutable visual evidence" standard, you have now biased the eventual decision in favor of a call that the on-field official may not have thought was correct even at the time. That's a Bad Thing.
I suppose that, yes, I disagree with that. Griffey has a reputation for Mays-like skills and style, but he never had them. Really. Just compare:
Mays stole > 200 bases by age 30, leading the league 4 times.
Mays hit 94 triples by age 30.
Mays was one of the greatest defensive CF of all time, as best we can tell from the available stats.
Reggie stole 171 bases by age 30, with a high of 28.
Reggie hit 28 triples by age 30.
Reggie in his youth was a good-fielding RF with a great arm. He also played nearly 200 games in CF before his 30th birthday. I can't quote his fielding runs because BP seems to have removed the useful stats from the player cards, but I remember them as slightly positive in his youth, slightly negative overall.
The Kid stole 173 bases by age 30, with a high of 24.
The Kid hit 33 triples by age 30.
The Kid had an outstanding defensive reputation in CF, but the objective measures always placed him somewhere between average and awful. He made a lot of highlight reels, but didn't cover much ground. His fielding runs (same caveat as above) were about like Reggie's, even comparing only their youths.
So, which of these things is not like the others? Griffey and Jackson are almost identical in "style profile" when standing next to Mays, even if you only look at their youths.
Agreed. And, given the shape of the media today, it is becoming somewhat easier for consumers of news to stay source-loyal rather than brand-loyal, which is a good thing. Reputation should matter.
But when the byline doesn't have a name on it, or it's a name you don't know...
The problem with anonymous sources is that there's no way for a reader to distinguish between "authoritative insider", "clueless insider", and "fictitious insider". Or, indeed, "someone grinding an axe without fear of reprisal".
If all journalists were perfectly ethical, scrupulous, and infallible, this wouldn't be an issue. They aren't, and it is.
No, Will had it right. Griffey had Mays's reputation, but nothing like Mays's skills. Nor Frank Robinson's, while we're at it. Reggie is a very close comp.
Reggie wasn't a CF, but he was a decent RF for much of his career, and a much better player than is remembered today. (Robinson was a VASTLY better player than is remembered today.) And, of course, Griffey is probably the most overrated fielder of our time (now that Jeter has improved to reasonable levels).
There's no shame in being about as valuable as Reggie, but let's not confuse Griffey with the Inner Circle players like Mays.
"I got the chance to see Carlos Santana on Monday night. He's good."
It took me a minute to realize that this wasn't a musical aside. Would have been true that way, too...
That should be "Nolan Reimold" in the rumors and rumblings...
On A.J. Pollock: Is this the same thing as a Salter-Harris fracture? Do you know which type?
More importantly, which growth plate? Humerus, radius, ulna? Distal? Throwing arm or glove arm? (I'm guessing throwing arm and distal humerus, given your comment about moving to 1st...?)
Has the trend toward managers calling the game from the bench reverted to tradition? How many catchers are on their own to call a game?
Outstanding info, Will. Thanks again.
His regular restaurants (Jaleo, Oyamel, Cafe Atlantico, and Zaytinya in DC), as opposed to MiniBar and The Bazaar, are much less weird in the science experiment sense, but excellent food. Jaleo is the most traditional. Oyamel is Mexican, Cafe Atlantico is South American seafood, Zaytinya is Middle Eastern.
Someday I'll manage to score a reservation at MiniBar. It's essentially a lottery at this point.
Have you tried Oyamel, where Andres invents a tapas version of traditional Mexican cooking? Fabulous, and not all that pricey.
Fascinating topic, guys. I've been worrying about this since I first read Lindsey's work from 1960, back in the 80s.
1. Clay nailed the fundamental problem -- the collapse of uncertainty when the game is over.
2. How you address the problem requires you to declare in advance whether or not game-level clutch hitting (as opposed to inning-level) is something you want to give credit for. (If you don't want to give credit for clutch hitting at all, stick to VORP/WAR/etc.)
3. I think Rob is on the wrong track, regarding player differences. You don't want a method that discounts Chase Utley's performance because he's Chase Utley and we expect more from him -- that leads down the rabbit hole.
If you want to discount his performance based on Victorino being on and Howard coming up next, that's better -- it starts to correct the problem that hitters on bad offensive teams have less total run leverage to work with. But the plain truth about value-added methods is that the only accurate correction for that is to switch back to context-neutral methods entirely. As above, it becomes a question of which contextual inequalities you wish to give personal credit for, and which you don't. That's philosophy, not math.
In the specific Sweeney/Ichiro example, Sweeney is penalized for batting in a nearly hopeless position. How do the numbers work out if you give him credit not for the delta win probability, but for the factor by which he increased the win prob? What was the win prob when Sweeney came to the plate? I'm guessing that he increased it by, if not a factor of 9, at least something a lot closer to that...
D'oh, strike "at a given position" from that last paragraph. And spell "Wobegon" correctly while you're at it. I need to proofread more.
Actually, your first example is misleading. If the league hit .267, that's almost certainly not the average of the batting averages of the players in the league, and thus not a measure of central tendency at the *player* level.
That's part of why average production is more valuable than we would tend to think -- it's because professional sports is the anti-Wobegone, where most players are below-average. A disproportionate number of PA/BF go to the best players, with the rest competing for playing time.
This is important for the concept of "replacement level", because that's defined in terms of individual players -- roughly the productivity of the 60th-best catcher or 100th-best shortstop or 150th-best starting pitcher in the world at a given position. That's a quantile of the distribution of individual player performance levels, not of the league performance distribution 'averaged' over all PA/BF.
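A toy Python example of the gap between the league average and the average of the players' averages. The seven stat lines are invented, with the better hitters given more at-bats to mimic how playing time is handed out.

```python
from statistics import mean

# Hypothetical (hits, at_bats) lines: regulars first, bench players last.
players = [
    (180, 600), (160, 580), (150, 560),
    (40, 180), (30, 150), (20, 110), (10, 60),
]

# League average weights every at-bat equally.
league_avg = sum(h for h, _ in players) / sum(ab for _, ab in players)

# Average of player averages weights every player equally.
player_avgs = [h / ab for h, ab in players]
mean_of_player_avgs = mean(player_avgs)

below_league = sum(1 for avg in player_avgs if avg < league_avg)
```

In this made-up league the league average is about .263, the mean of the individual averages is about .231, and four of the seven players sit below the league mark. That is the "most players are below average" effect in miniature.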
The key point here is to treat any replay reversal as a dead ball from the point of the reversed call, with umpire discretion to place runners subsequent to that.
As for Mountainhawk's idea that the NFL deals with this correctly, I couldn't disagree more. The asymmetry between which calls are reviewable and which are not is pernicious. The NFL has created a system where the officials are encouraged to make the reversible call (rather than the correct one), but the teams are penalized for challenging those calls.
Every call should be reviewable, and there should be no penalty for a successful challenge.
Good stuff, Matt, but I still have a minor concern about the estimated difference between re-signed players and new-team players. I went back to the original articles, and I don't think you've quite addressed the concerns about age bias.
Ideally, you would do your comparison within age cohorts, comparing players who got 3-year deals only if they were the same (or similar) age at the time of the deal. I'm guessing that the problem is that this would knock your sample size down into the "don't bother" range?
You might try 3-year age cohorts -- 29-31 at the time of new contract, 32-34, and 35+. I'd be curious to see how much of the durability advantage for re-signed players persists when you do that.
Fair enough -- I agree that the error bars here are wide enough that precision isn't necessary.
Your 1.08^-6 turns out to be .63, which corresponds to a 3 or 4 year lag followed by 6 equal-production seasons. Close enough.
I'm confused by your discounting, Matt. You can't calculate a discounted next-6-years WARP without specifying the year by year WARP breakout. If you divide by (1.08)^6 (that negative sign was a typo, right?) you are essentially assuming that all of the WARP accrues 7 years from now, with no WARP in any other season. That doesn't seem reasonable.
If you assume the WARP lays out as WARP/6 in each of the next 6 years, with 8% discount rate, you would get a discounted WARP that is ~ 2/3 of the original WARP. If you start pushing that profile into the future (since draft picks won't generally start contributing WARP until they get out of the minors a few years from now), the fraction will go down from there.
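The discounting arithmetic in this exchange is easy to check with a few lines of Python. This is only a sketch under one stated assumption: each season's WARP is discounted as if it arrives at the end of that year. A different timing convention shifts the fractions somewhat. The function name and the `lag` parameter are mine.

```python
def discounted_fraction(rate, seasons, lag=0):
    """Fraction of total WARP kept after discounting, when the total is
    spread evenly over `seasons` consecutive years starting `lag + 1`
    years from now (end-of-year discounting assumed)."""
    factors = [(1 + rate) ** -(lag + t) for t in range(1, seasons + 1)]
    return sum(factors) / seasons


lump_sum = 1.08 ** -6                            # single 1.08^-6 factor
flat_now = discounted_fraction(0.08, 6)          # six equal seasons, no lag
flat_lag2 = discounted_fraction(0.08, 6, lag=2)  # same profile, two-year lag
flat_lag3 = discounted_fraction(0.08, 6, lag=3)  # same profile, three-year lag
```

Under this convention the single factor is about 0.63, the flat six-year profile keeps about 0.77 with no lag, about 0.66 with a two-year lag, and about 0.61 with a three-year lag. So the fraction does fall as the production is pushed further into the future, and the single factor lands in the same neighborhood as a flat profile with a lag of a few years.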
Of course he's been untouchable -- I own him in an AL-only Diamond Mind league. Sigh.
My point was that to talk about a hitter's 'strategy' you have to assume he's deciding courses of action before seeing the pitch, or at least looking for a particular pitch in a particular place. I'm not sure that's true for all hitters -- I think some of them really do react to the pitch.
In such a case, the most you could say is that the hitter's chance of successfully hitting a particular pitch might depend on the pitch sequence leading up to it. That makes it a Markov decision model for the pitcher -- but the state might be the entire pitch sequence, not just the most recent previous pitch. More likely, you could closely approximate the state space by looking only at (say) the last 3 pitches.
As you sort of implied when invoking Maddux and Williams, the equilibrium point is different for every batter/pitcher matchup. That's because different pitchers have a different ability to miss bats in the strike zone (which is different for each pitch they throw), and different hitters have a different ability to avoid being fooled (which is different for the different pitches they might face).
In theory, you could look at a composite effectiveness of each pitch a pitcher throws, and a composite outcome matrix for the batter against each type of pitch, and figure out the pitcher's randomized optimal strategy.
(I'm not convinced, though, that the hitter really has a "strategy" here -- hitters who guess don't last. It might be more like the hitter has a different success matrix depending on what the last few pitches have been like -- a Markov decision model for the pitcher, rather than a Nash game.)
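For the "in theory" version above, the simplest possible case is a two-pitch pitcher against a hitter who sits on one pitch or the other. Below is a minimal Python sketch of the pitcher's randomized optimal mix in that 2x2 zero-sum game. The outcome matrix is invented, and treating the hitter as a guesser is exactly the assumption questioned above, so read it as a toy and not a model of any real matchup.

```python
def pitcher_mix(payoff):
    """Solve a 2x2 zero-sum game where the pitcher picks the row and wants
    to minimize the hitter's expected outcome.

    payoff[i][j] = hitter's expected outcome when pitch i is thrown and the
    hitter is looking for pitch j. Returns (prob_of_row_0, game_value), or
    None when a pure strategy is already optimal (a saddle point).
    """
    (a, b), (c, d) = payoff
    pitcher_guarantee = min(max(a, b), max(c, d))  # best pure pitch choice
    hitter_guarantee = max(min(a, c), min(b, d))   # best pure guess
    if pitcher_guarantee == hitter_guarantee:
        return None
    denom = a - b - c + d
    p = (d - c) / denom            # makes the hitter indifferent between guesses
    value = (a * d - b * c) / denom
    return p, value


# Hypothetical matrix: rows = (fastball, curve) thrown,
# columns = hitter looking (fastball, curve).
matchup = [[0.400, 0.200],
           [0.250, 0.350]]
mix = pitcher_mix(matchup)
```

With these invented numbers the pitcher throws the fastball one time in three and holds the hitter to .300 no matter which pitch he looks for. A different pitcher or hitter means a different matrix and a different mix, which is the point about the equilibrium being matchup-specific.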
"Has anybody here seen my old friend Corky?" just doesn't have the same staying power.
Mmmm, calvados. You can keep the fish eggs, though.
This is an interesting point. Baseball is entertainment.
A lot of people (it seems) would find it less entertaining if they were prevented from interpreting the timing of events as illustrating moral courage, fortitude, iron will, mental dominance, focus, choking, distraction, lack of commitment, flakiness, etc. If it were just strategy, tactics, odds, and luck, it would no longer be entertaining.
I think this may be where people are coming from when they accuse statheads of not actually watching games. They can't imagine that there would be any POINT in watching games if you really don't think that the outcomes reflect microcauses and the triumph of the superior will to win.
One reason that "hot hand" models (and clutch/choke models) are so intuitively plausible is that they may very well be true in ordinary life -- but not in Major League Baseball.
I remember reading some studies in the late 80's or early 90's about the influence of higher arousal (in the technical affect sense) on sports performance. They found that it tended to have a stronger influence on performance in activities where the athlete initiates action at will (e.g. golf, bowling, darts, tennis serve, penalty kicks) and less influence in reactive situations (batting, return of serve, goalkeeping).
It might be worthwhile doing the converse study, and looking for signs of streakiness in pitching. If the arousal/affect theory is correct, pitching should be more prone to clutch effects than hitting, and might be more prone to hot hand effects as well.
It's probably worth pointing out that 'random', in modern technical discussion, just means 'unpredictable'. It doesn't say anything about *why* it's unpredictable, and it doesn't necessarily imply God playing dice with reality.
Laplace, who first codified the mathematical foundations of probability, was himself a Mechanist -- he believed that the universe was a giant machine running according to Newton's laws, and that there was no such thing as 'randomness' in the sense of "uncaused events". He thought of probability statements as assertions of lack of information -- the first "hidden variables" theory, if you will.
The key takeaway is that Russell has shown that any hot hand effect that might really exist is either so small as to be irrelevant or so infrequent as to be irrelevant, or both.
The phrase "replacement level adult" has justified my subscription renewal all by itself.
For Bob1475 who noted (correctly) the shift from emotions to reasoning, there are also emotional tests that can be given. The Minnesota Multiphasic Personality Inventory is the most (in)famous, but there are others.
Even something like a Myers-Briggs temperament categorization should be mandatory, if only so the coaches have an informed clue what kind of instruction/interaction is liable to be effective for a given player. I somehow doubt that a lot of teams give Myers-Briggs, or that most coaches would modify their approach based on its findings.
How much should we discount that final 25-and-under list for the fact that most of them are currently under the knife?
Since you asked...
H's position on the list is due almost entirely to the frequency of 'the' and various pronouns (he, she, this, that, who, which, what, where, they...) in English prose. Since those words are never the solution to the puzzle, you need to demote H.
C is less common than D and L, but it is disproportionately the initial letter of words. Since knowing the first letter makes the puzzle vastly easier to solve, C should be selected more often than its raw frequency would suggest.
While applauding the correction, I will make one recommendation for "a way to improve in [how we introduced the metric]":
Changing your claim from "we beat everyone" to "we're as good as xFIP, possibly very slightly better" deserves more than an Unfiltered (and thus unarchived) post under the heading "SIERA Update" (as opposed to "SIERA Retraction" or "Reduced claims for SIERA" or similar).
"Statistics, Operations Research, Mathematics, Computer Science, or a related quantitative field" covers a lot of territory. 10 years ago, I'd have jumped at that second one.
One of the things academia struggles with is that there are no incentives to publish inconclusive or negative findings, and strong incentives to hide them. Kudos for saying "we looked at this, and found nothing we can use". That opens the door for others to either look in a different way (and find something), or look more carefully and confirm that there's nothing there to see.
"80's pop masterpiece" is an oxymoron of McCarveresque proportions.
It's not that hard to check for direct correlation among your predictor variables in the model. What does the variance/covariance matrix of the independent variables look like? Many stats packages will provide that as an optional output. It doesn't spot variables that are linear combinations of more than one other variable, but it spots direct correlation of 2 independent variables.
A variable that really ought to be significant, but isn't, is a possible warning sign of multicollinearity. There's some pretty good discussion and advice at
I understand the intuition, but it's not like your p-value was .15 or something; it's huge. The regression is yelling "even with this little data, I can tell this term is totally irrelevant". It's not impossible that this is just bad luck in the sample, but it's really unlikely.
The other possibility in cases like this is some kind of multicollinearity -- that there's another term that is sufficiently correlated with GB*BB that you can't interpret their coefficients independently. Did you check for that?
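If your stats package doesn't hand you the matrix directly, the pairwise check is only a few lines of Python. The column names and values below are hypothetical stand-ins for the regression's predictors, and the interaction column is just the product of the other two.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(columns):
    """Map each pair of predictor names to its correlation coefficient."""
    return {
        (name_a, name_b): pearson(a, b)
        for (name_a, a), (name_b, b) in combinations(columns.items(), 2)
    }


# Made-up predictor values for five pitchers.
gb = [0.40, 0.45, 0.50, 0.55, 0.60]
bb = [0.10, 0.06, 0.09, 0.07, 0.08]
predictors = {"GB": gb, "BB": bb, "GB*BB": [g * w for g, w in zip(gb, bb)]}

correlations = pairwise_correlations(predictors)
```

Any pair with a coefficient near +1 or -1 is a candidate for the problem. As noted, this only catches two-variable correlation; a term that is a linear combination of several others needs a different check.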
Somewhere, Ron Hunt and Craig Biggio are systematically shredding a copy of Strat-o-Matic between them...
Question: are you folding HBP into BB, or ignoring them? With some starters in the 15+ HBP per season category, it could make a clinical difference.
Eric, you raise a very important distinction when you mention skill versus approach. We really don't have any idea at all what the effect on ERA would be for a particular individual pitcher to nibble more (or less), given his skill set.
I don't see any way for this new metric to help with that question, but then I haven't really thought hard about it yet. If you've already seen it, please share.
Original cast SNL reference for the win...
Might just be a vocabulary shift, Will. Laurila asked "best player" -- whatcha wanna bet that, in Face's day, 'play' is what you did in the field, and 'hit' is what you did at the plate?
So, the real weirdness isn't Maz over Clemente, but Aaron over Maz.
I think Kenny (and John) have forgotten that the link between steroids and sprinting is MUCH better established than the link between steroids and hitting baseballs... Nothing would surprise me less than to learn that (say) Vince Coleman was a Ben Johnson Fan Club member.
I'm a little concerned by the recent trend toward using LOESS Regression at BP. LOESS wants nothing more than to overfit the noise in your data; it's best-suited to extremely large data sets for that reason. 34 starts by CC Sabathia is not an "extremely large data set".
If you use a classical test for serial correlation on Sabathia's FIGS data, what kind of p-value do you get? If you can't reject the hypothesis of "no serial correlation", at a high confidence level, then using LOESS (which is an explicit attempt to model that correlation) is inappropriate. LOESS is really just "fuzzy splines"; good at capturing the pattern of what happened, but bad at telling you whether it was an accident or a real effect.
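One simple way to run that kind of check, sketched in Python: compute the lag-1 autocorrelation of the start-by-start series and compare it to its approximate null distribution. The large-sample normal approximation (r1 roughly N(0, 1/n) under no serial correlation) is my assumption here; a Durbin-Watson or Ljung-Box test would be the more standard choice. The two series below are synthetic, not Sabathia's actual numbers.

```python
from math import sqrt
from statistics import NormalDist, mean


def lag1_autocorrelation_test(series):
    """Return (r1, two-sided p-value) for the lag-1 autocorrelation,
    using the large-sample approximation r1 ~ N(0, 1/n) under the null."""
    n = len(series)
    m = mean(series)
    dev = [x - m for x in series]
    r1 = sum(a * b for a, b in zip(dev, dev[1:])) / sum(d * d for d in dev)
    z = r1 * sqrt(n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return r1, p_value


# Synthetic 34-start series: a steady trend and a strict alternation.
trend = [float(i) for i in range(34)]
alternating = [1.0, -1.0] * 17

trend_r1, trend_p = lag1_autocorrelation_test(trend)
alt_r1, alt_p = lag1_autocorrelation_test(alternating)
```

Both synthetic series reject "no serial correlation" easily, which is what you'd want to see before fitting a smoother. With only 34 points the approximation is rough, so a borderline p-value shouldn't be over-read.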
Interesting; it will be fun to see how this plays out.
For presentation purposes, though, please choose a different word than "Chances" to describe opportunities to field a ball. "Chances" already has an official definition, and it ain't that. The opportunities for confusion are legion. If "opportunities" is too long, how about just "opps"?
Tversky-style question for Russell:
How different is the answer you get from those front-office types if, instead of talking about a robo-pitcher who always gives up 4 over 7, you instead talk about a rule change that would allow you (but not your opponent) to skip the defensive half of the first 7 innings, instead spotting your opponent 4 runs...
Would you be willing to reduce your roster size by 1, in order to have this option? Would you be willing to pay money, in addition to giving up a roster spot?
Clay probably remembers that we had this conversation back in the Dark Ages, on rec.sport.baseball, when the "Flake" stat was first invented. That came out of my pestering Michael Wolverton for some stats on the variability of support-neutral game performances. The consensus back then was exactly what Clay said above -- that in general you want your above-average starting pitchers to be consistent, and your below-average starting pitchers to be flaky. The farther below average the mean performance is, the more valuable flakiness becomes. You would much rather have Joe Twoface, who gives up exactly 6 over 7 half the time, and 2 over 7 the other half of the time.
(Of course, if you pull Joe Twoface after he's given up 3, the universe implodes.)
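The consistent-versus-flaky argument can be illustrated with a toy calculation. This is only a sketch under loud assumptions: your own offense's runs are modeled as Poisson (real run scoring is lumpier than that), the opponent scores exactly what the starter allows (the bullpen is ignored), and a tie counts as half a win. The offense means of 3.5 and 4.5 are made-up inputs, and all function names are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k runs under a Poisson model (an assumption)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def win_prob(runs_allowed, offense_mean, max_runs=60):
    """P(own offense outscores runs_allowed), counting a tie as half a win."""
    more = sum(poisson_pmf(k, offense_mean)
               for k in range(runs_allowed + 1, max_runs))
    tie = poisson_pmf(runs_allowed, offense_mean)
    return more + 0.5 * tie

def expected_win_prob(outcomes, offense_mean):
    """outcomes: list of (runs_allowed, probability) pairs for one pitcher."""
    return sum(p * win_prob(r, offense_mean) for r, p in outcomes)

STEADY = [(4, 1.0)]              # always gives up 4
TWOFACE = [(6, 0.5), (2, 0.5)]   # gives up 6 half the time, 2 the other half

if __name__ == "__main__":
    for offense_mean in (3.5, 4.5):  # hypothetical run-support levels
        print(offense_mean,
              "steady:", round(expected_win_prob(STEADY, offense_mean), 3),
              "twoface:", round(expected_win_prob(TWOFACE, offense_mean), 3))
```

In this toy setup, Twoface comes out ahead when 4 runs allowed is worse than what the offense typically scores, and the steady pitcher comes out slightly ahead when it is better, which is the same pattern as the old consensus described above.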
It seems to me that relievers aren't being shortchanged by the process; they're being shortchanged by the voters. When the current voters die off and are replaced by people who understand pitcher value metrics, then the Mariano Riveras of the world will start winning major awards, and the mediocre save vultures will receive the apathy they deserve.
But, as others have pointed out, it's rare for the most valuable pitcher in the league to be a reliever. I'd sooner see a special award for catchers than for relievers; catchers have a better excuse for not putting up monster numbers, and (unlike relievers) the rules require you to have one on the field all the time.
"McGwire says he cannot wait to officially get started, even if the steroids controversy continues, because he disingenuously claimed that using performance-enhancing drugs in 1998 had no effect on him setting what was then the major-league record of 70 home runs."
From the wording, I can't tell if 'disingenuously' was McGwire's word (or a paraphrase of it), or an editorial comment. Could you clarify which?
Yeah, it's totally easy to determine the effect of steroids on performance, if you just make crap up that matches your preconceptions.
If you really think there is a "normal career progression" that can be applied, through all the noise of changes in ballparks, baseballs, strike zones, training methods, etc. -- with tight enough error bars to distinguish Barry Bonds from Barry Larkin, much less steroid Barry from clean Barry -- you're delusional.
The scary part is that juries think like you, and attorneys know this.
Remember that BR's similarity scores are based on raw offensive numbers, not adjusted for era. It ain't worth as much when Beltre does it today as it was when Santo did it then.
And when will we have searchable/sortable JAWS on the website...?
Raines, Belle, and Rose are a perfect storm of outfielders -- all clearly above the standard, but all overlooked (deliberately or otherwise) for various reasons.
I'd be curious to know how deep the pool of near-miss outfielders is. Where does Ken Singleton (my choice for "player clearly better than Jim Rice in every aspect of the game, yet clearly not a Hall of Famer") stand in the JAWS rankings?
I seem to recall Will Carroll posting something last year that showed that, no, playing 1B doesn't reduce your chance of injury, and that moving to a new position significantly increases it, at least in the short term.
When I think of players getting injured in the field, I think of middle infielders (collision with other players), outfielders (collision with wall, separated shoulder or wrist injury on diving catch), catchers (duh), and first basemen (collisions with batter-runner, getting stepped on, tangled feet, turned ankles on the bag, etc.). Third base seems relatively safe, other than diving into the stands/dugout for foul balls (which 1B do also).
Well, the ones I can see are Fair Witnesses...
I think you're confusing the shape of the curve with the level of the curve. The application of the shape to a specific player does not assume that that player will still be around when he's 35.
It's perfectly plausible that on average (say) 35-yr-olds preserve 80% of their EqA from when they were 29, and that this is essentially independent of what that age-29 value was. If you're looking at a 30-yr-old Mickey Mantle, that gives you an idea of how good he will still be at 35. If you're looking at a 30-yr-old Kevin Young, it gives you an idea how long he will have been out of baseball when he's 35.
(I gave that example in terms of percentages, but it could just as easily be how much VORP a player loses per year after his peak, on an absolute scale.)
I don't agree with this criticism. The study was based on a sample of players who, in essence, had careers that were uninterrupted by external factors. Two of the possible external factors were "injury" and "not being good enough to keep playing". That eliminates the survivorship bias, at the risk of introducing a new bias if good, uninjured players age differently from less good or more injured players.
The former is unlikely to be an issue; these are all elite players on any absolute scale, and the differences among them are unlikely to affect the shape of the career arcs due solely to aging. Independent research confirms this.
I'm much more worried about the latter potential bias. I understand that the point of the study was to identify when players peak when aging is the only issue, but in real life it isn't the only issue, as the Wang example makes clear. Part of why "players peak at 27" may be because they get hurt, and are never quite as good afterward even if they aren't washed out. Players who missed significant time mid-career are eliminated from your sample -- unless I misunderstood?
I would even assert that, if you want to understand contract value, there's no point in distinguishing loss of value due to aging from loss of value due to non-career-ending injury. Unless, that is, you also have a model that is good at predicting differential injury-related risk for different players.
At any rate, an excellent writeup of some VERY interesting research. Now to dig into the technical details of your longitudinal regression methodology...
Er, Lincecum. Consider me to have spelling-flamed myself.
That's an interesting point about Blyleven and Greinke/Lincicum. I'd be fascinated to know who, among the voters who vote for both, voted for Greinke or Lincicum this past season, but didn't vote for Blyleven...
Don't forget Blyleven's nearly 5000 IP in those 22 years. He wasn't just very good; he was very good for an average of 225 innings a year, for 22 years! Compare him to someone like Jim Palmer who, great as he was, pitched 1000 fewer innings and averaged 20 fewer IP per year in a shorter career.
It's one thing to say that the judges in the Salem witch trials (or the Inquisition) didn't know any better, and that we can therefore somewhat understand why they reached the conclusions they did.
It's another thing entirely to continue "...and therefore, the accused really were witches", which is what you seem to be saying about Rice. Just because people at the time thought he was really really good doesn't mean he was, or that he deserves to be in the Hall of Fame.
You are confusing "the player as he was viewed by his contemporaries" and "the player in the context of his era" -- two totally different things.
Mr. Van Winkle, it's time for your 11:00...
"Should the Yanks be legislated to forfeit any advantage gained from playing in the largest/wealthiest market on the planet?"
You act like having monopoly rights to half of metro NYC (and 3/4 of NYC baseball history) is somehow an accident. It isn't; it's a conscious decision by the owners of MLB to greatly restrict competition for the NYC market. The Yankees have the market they have because MLB replaced two storied franchises with one expansion franchise. If they had chosen to replace the departed Dodgers and Giants with the Mets, the Athletics, and the Braves, we would all now be discussing the Red Sox' enormous revenue sharing payments.
Ideally, that's what revenue sharing should be: compensation for unequal market monopoly rights.
1. You clearly have never seen an actual rant
2. The disdain is not for Rice, but for people who think Rice belongs in the Hall of Fame, even after seeing all of the evidence
3. Given that there are still a fair number of people out there who are not yet resigned to the DH, your standards for 'obsessional' seem too lax to me
4. The election of Rice is, by far, the craziest thing the BBWAA electors have done in decades. It's big news, and ongoing news in that the same people are still voting.
"A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it." Max Planck (1949)
50 years from now, Jim Rice's plaque in Cooperstown will be a fossil in the history of science.
I have to disagree somewhat. These End of Science proclamations get made periodically, and they're always wrong. As one level of knowledge matures, new frontiers open.
An obvious one, already mentioned several times in this comment thread, is the use of PitchF/X data to revolutionize our understanding of the dynamics of pitcher/batter interactions (and umpiring). When is it a good idea to challenge a hitter, and when to nibble? When should you just flip a coin? How much difference does it make? These questions were always out there, but now we might actually be able to start to answer them. BP could be in the forefront of that, if they concentrate on it.
Business strategy is another. Being able to forecast production/value isn't the end of the question for a GM; it's a necessary prerequisite to being able to formulate a rational business strategy. How to do that goes way beyond MORP, or "where in the rebuilding cycle am I?". Doug Pappas (rest in peace) is not with us any more, but this is another area where analytical rigor would be useful. I'm delighted to see that BP now has a qualified psychologist on the team, since this sort of Decision Analysis depends as much on psychology as it does on optimization.
No, there's no end of fruit still to be picked, even if it isn't quite as low-hanging as the fruit that fed Bill James. The question is, can BP both pick the fruit, and tell the tale of it in an entertaining way? Reading any random Gary Huckabay article from days gone by makes it all too obvious how far we've come from the halcyon irreverent days of BP past.
Eric, we won't know how it fares vs. other metrics until we see how it fares in the future. Backcasting, it's very hard to tell a better predictor from an overfit. (For more than you ever wanted to hear on this subject, see the recent vitriolic mess concerning predicting runs from offensive components at insidethebook.com ...)
Just to follow up on this: I think a lot of the negative reaction you're getting to the new stats is a perceived hubris. You've been behind the times in various analytical sectors (defense, pitching). You also haven't been participating in the public debate about how best to measure these things. Now, suddenly, you announce that you have not only caught up, but you will be publishing new metrics that are better than the "open source" ones already out there.
That would be great, but you can understand a certain skepticism on the part of the audience at this point.
If I want pithy, I'll read wire service releases. One or two sentences per paragraph, no thought required.
Usually, though, I'd rather read Faulkner than Hemingway.
Jay, the 'peak' standard for 1B is different between the table at the top of the article and the section later on. Is that a typo due to copying a pre-revision table, or is there something subtle going on that I'm missing?
I love technobabble in Slavonic! Hospodin pomiluj...
I have to say, though, that the people noting that explaining 1% of observed variation as a random effect pretty much makes any conclusions drawn the science equivalent of tea leaves and goat entrails have a point.
I remember when Storm Davis came up with the Orioles. 37-year-old former Cy Young winner Jim Palmer was still with the club, and his nickname was "Cy Old". Davis looked so much like a (mirror-image) young Jim Palmer, in features and delivery, that the club immediately christened him "Cy Clone". Great nickname, but not exactly prophetic.
I'd kill for a 6-pack of Squirt. Is it still available where you live?
There's nothing irrational about rejecting an unequal offer in The Ultimatum Game, because it's not the only social interaction you or your partner will ever have. Everyone benefits in the long run if unfairness is punished. (Which is why we have evolved a separate area of our brains that processes questions of 'fairness' and 'cheating', distinct from where we do the cool logical calculation.)
Pretending that the one-time monetary outcomes of the game are the only outcomes is a mistake that probably not even Kenneth Arrow would make.
This may not be what Joe was thinking, but I assumed he was looking at the distribution of batting outcomes. If you have two great baserunners with .390 OBPs, you want the one who walks all the time to bat leadoff, and the one who hits a ton of singles and doubles to bat second, so that those hits can lead to a lot of first-and-third and scored-from-first.
A long time ago, I did some lineup analysis that suggested that your #3 slot should prefer hits to walks the most (other things being equal). The #3 hitter leads off the fewest innings, and bats with runners on (and with RISP and 2 outs) more than nearly all other slots. Ichiro would be just fine as a #3 hitter, too, if you could find two high OBP guys to bat ahead of him.
Just out of curiosity, is a "trigger mechanism" in a swing always considered a bad thing? There have certainly been some extremely successful major league hitters who used them -- Al Kaline and Harold Baines spring to mind, but there are many more.
What happened to Billy Butler?
You missed my point. Joe was saying that not signing Abreu was shown to be a good decision *because* Abreu signed for so little in the end. I don't see the logic there.
He's not sprite? Is he Dr. Pepper?
(I think you meant 'spry'...)
Thanks for the links, Brian.
How was it to the Yankees' benefit that Bobby Abreu signed with the Angels for $5 million? I somehow can't see them patting themselves on the back for not paying more than his current market value, given that they had to do without his services in order to get that outcome.
If they simply didn't want him at any price, that's different. But if he would have been a better option than what you end up doing instead, and you have deep pockets, it's surely better to pay too much than to see your competition get a bargain.
I did some work a few years back using linear programming to identify, for a fixed OPS, what batting line would lead to the most/least RBI for a player who ended up batting in a league-average mix of base-out situations, and got league-average runner advancement on his singles and doubles. I don't have those results with me, but there was a surprising amount of 'play' in RBI rate for a fixed OPS, even if you limited yourself to batting lines that might occur in MLB.
This is the key point. Modern pitchers don't pitch as many innings, but it's not clear that they pitch fewer pitches per start than the Gibsons and Loliches did. Nibbling has evolved as a survival mechanism, in these days when shortstops routinely go deep and #8 hitters slug .450.
"Atahualpa Severino" is my new favorite player name. I really hope he makes it.
Even some actual evidence linking steroid use with "putting up huge power numbers" would be a start.
DG aren't really related to actual games played; they are determined by number of opportunities. I'm guessing that one DG is an average number of (weighted) opportunities for a game at that position league-wide, but I don't know that. There's nothing intrinsically impossible about Boston LF getting 50% more DG per actual game than Houston LF.
Bottom line is, they didn't guess -- they looked at every batted ball, and counted. I suppose a typo is always possible, but it seems unlikely.
Unplayable balls should not count against fielders, in Fenway or anywhere else. I would be stunned if the UZR calculation somehow fails to remember this, but I don't have a direct quote I can point to that makes it clear.
I can't explain why the Astros *should* see far fewer balls hit to LF (again, weighted -- difficult balls count less than easy ones). But it's a matter of record; as Yogi said, "You could look it up."
I'm not trying to be snarky here; partly I don't know exactly how they weight the zones for various parks, and partly I don't have the raw data -- but I know The Truth Is Out There. If they're doing something dumb in accounting for balls off the Green Monster, that should be easy to check. Given how much scrutiny and criticism that work generates at FanGraphs, it seems extremely unlikely that they've missed something so obvious.
There are links there to a primer on UZR, its history and development, and blog discussions of its strengths and weaknesses.
I'd like to think that nobody would cite VORP without knowing what goes into it, but that's just not true. (And even less true if the purpose is to dismiss it...)
The key is context. You cite the number of innings played by Bay -- that's an attempt to establish the context of his performance, so you can compare it to others. The problem is that not all innings are created equal, when it comes to fielding. A guy who plays behind a staff of left-handed ground ball pitchers will see a wildly different mix of batted balls in play than a guy who plays behind a right-handed fly ball staff.
For UZR, the context they use is "Defensive Games" (DG):
The number of outs made by an average fielder at his position given the exact distribution of balls in play for that player divided by the number of outs an average player at that position makes per game.
The key phrase there is "exact distribution of balls in play for that player". They look at where every ball hit against the Red Sox went, broken into fairly small zones, and at what fraction of balls in such a zone an average player would turn into outs. That's Bay's context of opportunity -- his "defensive plate appearances", if you like.
That's where the huge difference shows up. Bay had 174 DG, while Carlos Lee only had 115, in almost as many innings standing out there. That's a huge, huge difference in opportunities to accrue chances, and Lee did much better with the rare balls hit to left against the Astros than Bay did with the zillion balls hit to left against the Red Sox.
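To make the DG definition quoted above concrete, here is a minimal sketch of the arithmetic as that definition describes it. It is not the actual UZR code: the zone labels, ball counts, out rates, and outs-per-game figure are all invented for illustration, and the function name is hypothetical.

```python
def defensive_games(balls_by_zone, avg_out_rate_by_zone, avg_outs_per_game):
    """Defensive Games, per the quoted definition.

    Numerator: outs an average fielder at the position would make on this
    player's exact mix of balls in play. Denominator: outs an average
    player at the position makes per game.
    """
    expected_outs = sum(
        count * avg_out_rate_by_zone[zone]
        for zone, count in balls_by_zone.items()
    )
    return expected_outs / avg_outs_per_game

# Invented numbers: two left fielders with the same innings but very
# different batted-ball traffic.
AVG_OUT_RATE = {"shallow": 0.30, "medium": 0.85, "deep": 0.55}  # hypothetical
BUSY_LF = {"shallow": 150, "medium": 260, "deep": 120}          # hypothetical
QUIET_LF = {"shallow": 100, "medium": 170, "deep": 80}          # hypothetical
AVG_OUTS_PER_GAME = 2.0                                         # hypothetical

if __name__ == "__main__":
    print("busy LF DG:", defensive_games(BUSY_LF, AVG_OUT_RATE, AVG_OUTS_PER_GAME))
    print("quiet LF DG:", defensive_games(QUIET_LF, AVG_OUT_RATE, AVG_OUTS_PER_GAME))
```

The point of the sketch is only that DG scales with how many playable balls came a fielder's way, not with how many innings he stood out there.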
Part of what makes playing the outfield in Denver so hard is that balls simply fall faster. At lower altitudes, the backspin on fly balls gives them more lift, giving fielders more time to get under them.
That said, it's just not reasonable to think you can tell how well a typical fielder is doing by watching him -- any more than you can tell how well a typical hitter is doing just by watching him without writing any numbers down. You can tell how sure-handed he seems to be, and whether his footwork is ugly, and whether he throws like a girl -- but that's not the same as "how many plays is he making, relative to what the rest of the league is doing?".
Scouts, who care a lot more about sure hands and footwork and throwing arm than they do about current level of performance, can do their jobs on pure observation. And, they do the same thing on offense -- how long is the swing, how good is the judgment, how solid is the contact, how many more inches of height should we expect, will the kid get fat, how much uppercut in the swing, how skinny are the wrists, etc. None of that is about performance, really, and that's fine -- but you'd get upset if someone based an MVP ballot on those things, ignoring outcomes.
When looking at defense (just like offense), I think it's important to make a clear distinction between _ability_ and _performance_. None of our advanced offensive metrics measure ability; they all measure performance -- what did you actually _do_? Similarly, UZR and FRAA and such all measure outcomes, not ability. None of them can say "you suck"; some of them can say "your performance sucked last year".
This is an important first step in getting past the mental barrier that says there's too much luck in defensive performance. Maybe this guy's pitchers gave up harder-hit balls; maybe he got a few bad hops; maybe his first baseman didn't pick low throws like he should; maybe the weather was bad more often. All of this is true -- but the best reply is to point out that it is true on offense, too, and ask whether the best offensive metrics are therefore suspect because they don't account for some hitters seeing better pitches than others, playing in different weather, getting different umpires, etc.
"There are different ways to measure things, but I submit that aside from the team statistic of wins, more than half will favor Lincecum, close to half will favor Carpenter and not many will favor Wainwright."
So? Not all metrics are equally relevant. If we throw in shoe size, IQ, number of wisdom teeth, and hair length, do they favor Carpenter or Wainwright? Who cares? The point of advanced metrics is to get us beyond having to decide by eyeball whether more WHIP here is more important than fewer innings there.
"When one looks at all the metrics, it is apparent that aside from won/loss stats, Lincecum leads, with Carpenter not too far off in second, and Wainwright a distant third."
As I already pointed out to you above, Wainwright was the SNLVAR leader. Until you address why that isn't a good argument for saying Wainwright deserves the Cy Young, you're doing the logical equivalent of putting your fingers in your ears and saying "La la la I can't hear you."
WPA is WXRL for batters; the idea dates back to Gary Skoog and Mark Pankin in the late '80s.
The problem is that a decent hitter who plays for the 2009 Phillies will have a much higher WPA than an identical hitter who plays for the 2009 Royals. The question is, do you give that hitter credit for the difference when voting for MVP? I can see the argument for saying 'yes' -- he did what he did with the opportunities he was given -- but I can clearly see the argument for saying 'no', if some other player would almost certainly have done even more with those same opportunities. After all, it was the teammates (not this player) who created those extra runs.
MVP voting is the only context in which I think WPA can be a useful metric. I'd probably use it as a tie-breaker, myself, but whether to make it primary or secondary is a question of philosophy.
Think about trying to evaluate hitters this way -- watching the film of their swings, seeing who is having good swings and who is having weak ones, who swings and misses more, who makes solid contact more. Talk with pitchers about that. Now, who had the more valuable year at the plate, Derek Jeter or Ben Zobrist? If you think you can even make an informed guess without actually looking at the outcomes, you're dreaming.
I'm willing to believe your player source can probably tell who was making better pitches, based on film and conversations. There's no way he has a clue who was getting better results (much less defense-adjusted results), not without looking at the numbers. And there's absolutely no way he can account for the different number of starts and innings.
"A vote for Lincecum was great IMO. A vote for Carpenter was good. A vote for Wainwright came either from misinformation or an over-reliance on wins."
Wainwright was the SNLVAR leader. You may choose to ignore that, or give it less weight than other measures, but it's simply not true that the only case for Wainwright is a stupid one based on W's.
"Do we really know anything or just making the best guesses off of imperfect models?"
Welcome to science; that's all it ever is. There isn't any absolute knowledge that isn't trivial.
And yes, using the best-available (i.e. least-falsified, most consistent with the data, most predictive so far) models is qualitatively different from the way the mainstream evaluate things. The biggest difference is that the models don't care who was the most heartwarming story, who's a total jackass, or how excited the fans got at any particular point in time.
In Cy Young voting, it seems reasonable to say that the guy who pitched 28 great starts should not get any credit for the team's chance of winning in the 6 starts he didn't make. That makes it different from a calculation versus replacement level.
Basically, Carpenter increased his team's chances of winning by quite a bit in the 28 games he started. Wainwright increased his team's chances, by not quite as much per game, in 34 games. Given how close they were in performance, that means Wainwright added more expected wins -- which is pretty much what the SNLVAR column is looking at.
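The arithmetic behind that argument is simple enough to sketch. The start counts (28 and 34) come from the comment above; the per-start values are invented placeholders, not either pitcher's real numbers, and the function names are hypothetical.

```python
def total_wins_added(per_start_value, starts):
    """Expected wins added over a season: per-start value times starts."""
    return per_start_value * starts

def break_even_ratio(starts_a, starts_b):
    """Fraction of A's per-start value that B needs to match A's season total."""
    return starts_a / starts_b

if __name__ == "__main__":
    # Hypothetical per-start win-probability gains (NOT real figures).
    carpenter_per_start = 0.20
    wainwright_per_start = 0.18
    print("Carpenter total:", total_wins_added(carpenter_per_start, 28))
    print("Wainwright total:", total_wins_added(wainwright_per_start, 34))
    print("break-even ratio:", break_even_ratio(28, 34))
```

With 28 starts against 34, the pitcher with more starts needs only about 82% of the other's per-start value to come out even on the season, so a small per-game edge does not make up for six missing starts.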
I'm dying to see the analysis of how Haren could be so far ahead in WARP (and PRAR), but trail in SNLVAR. That seems to be the crux of the case for Haren.
As you guessed, I misread that -- and it does make a difference. Thanks for the correction.
To counter-weight the previous criticism, Will, kudos for actually watching all of the performances that you were supposed to be judging. I'm guessing you were the only voter to do that. We may disagree on what is best evaluated by observation and what by data analysis, but I have only respect for the way you take the job seriously.
No, it wouldn't. It would be the same as voting for a reliever based on WXRL, or a hitter on BA/OBP/SLG, or a fielder based on UZR -- all purely outcome-based, with no regard for how repeatable that might be in the future.
The problem with W and L for pitchers is not that they are outcome-based, but that more than half of what they measure isn't related to the pitcher you're trying to evaluate. (Specifically, run support and bullpen support. Defense too, but less so.)
If you're going to down-vote a hitter for a .366 BABIP, you have to also be willing to down-vote a pitcher with a .200 BABIP because you don't think the fact that nobody got a hit off him is repeatable.
I think it's safe to say that both Pat Listach and Angel Berroa got a lot more second (and third and fourth) chances than they would have had without the hardware on the mantel.
No quibbles with your ballot, Joe, but I think you might be selling Zobrist a bit short. FanGraphs has him as the top WAR in the majors this year, half a win ahead of Pujols. That's because he was 20 runs above average (UZR) as a part-time outfielder, in addition to being 22 runs above average as a second baseman. Overall, Gutierrez beat him out as most valuable defender in baseball, but Zobrist was the only player even close. Throw in the "opportunity value" of his ability to play everywhere and make either half of a platoon while doing it, and it was an outrageous year.
Will Zobrist ever come close to a season like this again? Almost certainly not; but there's a lot of evidence that he did actually do it this past year.
So, HR are up in the AL because the richer players there can afford designer juice, but down in the NL because of steroid testing, right?
(I just finished reading "Fooled by Randomness", which probably needs to be on the BP recommended reading list. The book drips with contempt for people who make a living offering explanations for random noise...)
Exactly -- and here is where we might have a chance to finally do some _objective_ commentary on who "makes the most of his talent", as opposed to "having natural tools". Historically, that's been a lazy way of making moral (and racist) judgments about players, but this would really help separate identification/recognition skills from twitch skills.
Thanks for the clarification, Will.
I assume that the Royals have to come out near the bottom of your rankings, even though they're in the middle of the days/dollars chart. Where do their particular kind of failings fit into your evaluation process?
I believe that option is only available for BP Staff. For example, if I go to
I can pick any single year, but there is no "All" option.
We've been lobbying for cross-year reports -- both totals and multi-year seasonal reports -- for a while, but to no avail.
With a dunce cap? How about a simple choker (moonstones?) and those horrible feather earrings that (I hope) went away in the '80s. Owl feathers, preferably.
Or, there's always the Martha Stewart collection...
Careful; this is only true if the correlation between velocity and outcome, at the individual level, is always positive. If some player has an *ability* to get bloop hits or infield singles, then it simply is not true (for that player) that harder = better. This isn't ski jumping; there are no style points.
For most players, yes, velocity off the bat will be a good surrogate for luck-independent batting success. But not, I think, for all of them -- and Luis Castillo (like Ichiro) is a likely candidate for one of the exceptions.
D'oh, that'll teach me to skim when I think I know what you're going to say. Thanks for the polite response.
Speaking as someone who usually doesn't advocate outcome-based measures, in this case I think you really could just look at whether the non-foul contact was a hit or not. That's what Luis Castillo is trying to do when he swings -- not hit the ball hard, not hit a HR, but simply get a hit. If his approach produces a bunch of bloop doubles, that's *success*, where a speed-off-the-bat measure would call it failure.
Of course, for outcome-based measures you need a large enough sample to iron out the luck. But an analysis using outcomes (available data) according to in/out of zone would already be much better info than we have, and not obviously worse than what you were hoping to do.
At least for Luis Castillo.
Thank you so much for the splits tables. I can't help asking for one small addition, though...
For batter splits by pitcher handedness, could you show batter handedness as well? Diamond Mind and Strat-o-Matic players will want to know whether that great platoon split is normal or reverse.
(The holy grail would be PMLV or VORP splits, but I imagine that's vastly more complicated to calculate...)
Mandatory reading for those folks who *still* think that you can't draw a lot of walks unless you're a credible power threat.
This argument only works if you think that being in the All-Star game is an award, in the same sense that a Gold Glove or Cy Young is. I'd say it isn't -- the rules about having to have a player from each team are enough to make that pretty clear.
The purpose of giving awards is to recognize achievement. If you make those awards based on reputation, you're playing the "famous for being famous" tail-chasing game.
I was trying to get at the difference between better outcomes and better inputs. A batter who hits nothing but line drives, but all of them at people, has performed 'better' at the plate than someone who hits nothing but broken bat doubles -- but not better in the box score, or on the scoreboard.
We don't tend to cut batters slack for that kind of bad luck, especially not in MVP voting. For pitchers, luck is an even bigger factor, but we still tend to look at outcomes when it comes to MVP voting. There's a debate currently running about whether Cole Hamels actually pitched 'better' last year than this year, or whether the difference in outcomes was entirely crazy luck (in both directions) on balls in play. Either way, nobody would argue that he should get the same Cy Young consideration this year as last year.
So, I was saying that it's quite possible that Anderson threw better pitches in better locations at better times, but that Porcello's *results* were better -- possibly just due to dumb luck.
It's more likely that the difference was Porcello's RA+ of 114, compared to Anderson's 100. Rookie of the Year (like MVP) isn't traditionally about how good you are likely to be in the future, but rather about how good your outcomes were. People don't take line drive rates into account for MVP votes; they don't consider xFIP over RA+ for the same reasons.
Would I rather have Anderson next year? Probably. Did he pitch 'better' than Porcello this year? Maybe. Did Porcello have a better rookie season? Yes.
Among AL players with at least 150 PA, A-Rod and Teixeira were 3rd and 4th in EQA, behind only Mauer and Zobrist. Depending on what you think of their defensive contributions (generally thought to be above average, especially for Teixeira) and the value of high rate over many games played (which would favor A-Rod), and where pitchers should figure in MVP voting (some people don't like to vote for pitchers), putting either in the top 10 is totally unsurprising.
Personally, I don't understand people who don't vote for pitchers -- but there are lots of them, and the ones who don't trust defensive stats could reasonably vote high for A-Rod or Teixeira or both, regardless of who they root for.
As with championships, the goal here is not to identify the best, but to name a winner. A 1 vote difference suffices for that.
...but you raise an interesting point. What's the margin of error in the electronic submission and tabulation of votes? Sample size isn't the key -- you're confusing polling with voting -- but measurement error could certainly be an issue. As noted above, in theory one lost or misrecorded vote could change the outcome here. What is the probability that Hansen actually received more vote points than Happ?
Salary caps are bad for everyone except owners.
There's nothing wrong with the Yankees' situation that two more franchises in the NY metro area wouldn't fix.
Aw, no "Tempest in a Teahen" line?
I think you're mistaking mean for variance. Joe's not saying Burnett is bad; he's saying he's highly variable. That's not necessarily a bad thing -- but the point is valid that looking for proximal 'causes' of Burnett's bad outing is a fool's errand. The flakiness is the cause, not the other way around.
Thanks, Mike -- a very useful tip.
Both of us.
Unfortunately, link-masking sites... mask links. I won't follow a link at work if I can't see where it goes in advance. Is it that hard to make a web page wrap long words properly?
"We know, empirically, that Ryan Howard didn't catch that ball."
Actually, I don't think I can say that. I watched all of the replays, and I did not see any that offered what the NFL would call "incontrovertible evidence" that the ball bounced on the ground, rather than bouncing off the webbing of the glove. In real time, it looked like a catch. On first review, the bounce became obvious -- but I never did get a view that clearly showed ball hitting dirt.
My point is not that the call was right or wrong, but that even having instant replay might not have reversed that play, depending on the exact standard involved. As we've seen in football, even instant replay doesn't always get the call right. You almost need a de novo review by an official who hasn't been watching the game and doesn't know what the original call on the field was.
Oh, and while we're at it:
10/29/2009: Philadelphia Phillies @ New York Yankees
Again, the Yankees got more called strikes off the outside corner, and fewer balls on pitches inside the zone. I don't think that's a bias, except perhaps in the sense that the Yankee pitchers have a style that is better suited to fool umpires. They simply throw more pitches in that zone that the umps consistently get wrong.
Fair enough. My subjective impression, watching the game, had been that Saunders got frustrated by not getting some calls early, while his opponent was getting very generous calls, and that this messed up his approach for the rest of the game. It's hard to see grounds for that in the data, though, as you note.
I'm not sure which graph you're looking at. If I look at the all-game chart (so not just Saunders and Pettitte) for that game, I see:
8 NYA called strikes outside the zone, all on the side away from a RHB.
3 LAA called strikes outside the zone, 2 of them very close. (And one called ball, in exactly the same spot as 2 called NYA strikes.)
4 NYA called balls inside the zone (2 up, 2 down)
5 LAA called balls inside the zone, 1 of them not even close to the edge.
5-to-2 errors against counts as getting squeezed for me, though again this is not just Saunders. 8 gift strikes outside the outside corner is Glavinesque. And the five worst calls, in terms of distance of error, all went against Anaheim.
Should Saunders have thrown more pitches in the area where Pettitte was getting his bogus calls? If that's not his game, why should the umpire take away his strengths?
"It didn't hurt that Pettitte pulled out a start from his dynasty days, pounding the strike zone and spending the whole night ahead of the Angels hitters. [...]
Contrast Pettitte's work with that of Joe Saunders, a similar pitcher by type who on this night threw 83 pitches, 42 of them out of the strike zone. Saunders walked five and got strike three on exactly no one, the latter turning into a real problem for him during the game. "
Exactly. Go back to the Pitch/FX on this one; the difference wasn't in what they were throwing, but in what those throws were getting called -- especially in the first three innings. I'd love to see the alternate history in which those pitches were called based on where they were, rather than on where the ump thought they were. For at least this one game, the bad calls were NOT evening out -- Pettitte was getting the Tom Glavine Memorial Strike Zone, and Saunders was getting squeezed like a python's next meal.
That's not to say that the Yanks might not have won anyway; it's just noting that the playing field for this game was badly tilted, and that blaming Saunders for his "inability to throw strikes" is misplacing the blame.
Parity doesn't mean that everyone gets a turn as a winner. Parity means that it is not terribly hard to build a winning team without spending more money than your opponents. MLB passes that test; the Florida Marlins are a proof all by themselves.
Confirmation bias? We've had PitchF/X data for several years now, and the confirmation of how often umpires get it wrong -- and how those errors are not uniformly scattered -- is clear. These playoffs simply focus extra attention on what happens in every game.
You say "Camera angles are notoriously unreliable", as if they were somehow intrinsically less reliable than the weird (single) angle umpires get. You say "PitchFX machinery has margins of error", as if human umpires don't. You're complaining that automobiles only go 20 mph when all you currently have is a broken-down old nag.
As for "MLB could and should judge umpires on how well they call balls and strikes over periods of months or years", I couldn't agree more. And the answer is "they ALL fail, compared to the machine". I can't imagine what you think would be a "radical change to the way the game works", when the only difference would be that the ump announces what the machine said, rather than his own guess.
I don't typically defend McCarver, who has become a fingernail on the blackboard of broadcasting, but I have to say that he was a much better (and more enjoyable) broadcaster in the dim past, working Mets games. Even then, the difference was clear between his enjoyable local broadcast approach and his national broadcast approach. How can you be pompous and ingratiating at the same time? He somehow manages it...
Actually, the most common phantom DP comes when the pivot man crosses the bag (and leaves it) before he receives the ball. This is essentially always called an out, despite the obvious advantage it gives the defense. (The pivot man doesn't have to worry about foot and ball simultaneously, he can get well out of the way of the oncoming runner, and he's closer to the player throwing him the ball, allowing a faster pivot.)
The sad part is that if Aybar had touched second base when he first got there, then received and fielded the throw exactly as he did, the out would have been called.
You're conflating 2 separate issues:
1. What rules should the umpires enforce?
2. What rules should the players expect the umpires to enforce?
In an ideal world, the answer to both questions is "the rules in the rulebook". This isn't that world, and MLB clearly has no intention of trying to make it that world. Given that the umpires are allowed to have their own rulebook, the best we can hope for is that they (a) apply it consistently, and (b) make it known to the players so that they can act accordingly. The egregious problems come when either the book rule is selectively enforced (making it a tool for bias), or the de facto rule is sufficiently obscure or inconsistent that players can't know what they're supposed to do. That's the current state of the strike zone, and it sure looked last night like Erick Aybar hadn't gotten the memo about which categories of phantom double play are acceptable and which are not.
You are Carl Pohlad, and I Claim My Five Pounds.
In other words, Aybar misread the unwritten rule, and was penalized for it? That seems to reinforce the idea that it's most important that the players know what rules are going to be enforced. It's not intuitively obvious, even to someone who's been watching MLB for a long time, that the rule is "you have to make a sham attempt".
Other sites (e.g. mlbtraderumors.com) are reporting that Mateo's people deny the degenerative eye condition. (Man, that's an ugly sentence.) I'm sure a lot of us would be interested to hear anything you hear about what's really going on there.
Best quote on the subject: Ryan Witt of examiner.com, who says that Mateo is just the latest victim of the "pre-existing condition" clause...
So in one article we get Ted Simmons thinking that Larry Bowa and Ozzie Smith were similar defensively, and that strikeouts are especially bad because they are "wasted at-bats". Still clearly a long way to go...
50 points of OBP is not "a little better". It's a bigger difference than 50 points of batting average.
True enough; I forgot about him.
As far as I can tell, the only real happy note for the Tribe this year was Shin-Soo Choo establishing that his great half-season in 2008 wasn't a fluke. He ranks a shocking 6th in the AL (21st in MLB) in WARP.
How many more years of Cheap Choo (tm) do the Indians get? If it's less than 3, they probably need to trade him immediately to a team that appreciates OBP and defense and has some high-risk high-return arms they could swap.
Speaking as a former Pittsburgher and Pirate fan, I have to say I have no patience with fans who were loyal through the first 15 bad years, and now choose to say they can't take it any more. Any idiot could see that the first 15 years were a series of self-inflicted wounds, like a one-man-show version of Misery. Now that the team is finally doing something that might, some day, end the pain, the fans should be behind them completely.
At least for a couple of years.
Agreed here. As best I can remember, the Orioles did everything right this year, accepting the 'disappointing' record in order to build a future.
The last step, of course, will be to jettison Trembley as soon as the focus shifts from primarily player development to actual winning. He seems to be good at the former, but disastrous at the latter.
For a good laugh, look at Mauer's "Most comparable players" list on his PECOTA card. With a similarity score of 6, he is admittedly unique, but that doesn't detract from the savor of alleged comps like Ken Oberkfell and Ben Grieve.
Baseball-reference.com produces "Shanty Hogan", "Robinson Cano", and "Jason Kendall" as the most comparable through age 25. Hogan was at least a catcher, and his 1928 was pretty good. That's about all you can say. Kendall, of course, is a cautionary tale.
Morneau is 13th out of the 60 MLB first basemen in VORP, essentially tied with Kendry "MVP Candidate!?" Morales. Calling that 'average' isn't even hyperbole for rhetorical effect; it's just wrong.
As you note, though, defense counts too. The WARP leaderboards don't filter by position, so it's more work to see how the BP measures rank them, but it looks like Morneau is currently 11th in WARP1, at 4.3. He has enough of a lead over his closest competition (Joey Votto, Paul Konerko) that this isn't likely to change by the end of the season. There's also a knee in the curve here -- four places ahead of Morneau is only a 4.6 WARP1 (Morales), but 4 places behind him drops you to 3.0 (Carlos Pena).
Is it reasonable to call the 11th-most-valuable first baseman in a 30 team league 'average'? Probably not, but it's not as outrageous as it sounded at first. On the other hand, he's clearly an above-average AL 1B this season -- only Teixeira, Youkilis, Cabrera, and Kendry Morales have clearly been better, and he's well ahead of the rest of the pack. If you suspect that WARP1 doesn't do a perfect job of normalizing across the leagues, that might help Morneau's case.
(FWIW, WARP1 has Prince stomping Morneau in WARP, 5.8 to 4.3, mostly because FRAA2 only has Fielder at -12. But of the other 1Bs ahead of Morneau on the WARP list, Ryan Howard is the only other one who might drop a bit relative to Morneau if you use UZR instead of FRAA.)
Except the Red Sox problem was not that they didn't anoint a closer; it's that they didn't have any good relief pitchers. Naming one of them 'closer' wouldn't have fixed that; it would have made it worse.
"Also at San Jose, 2008 third-round pick Roger Kieschnick hit .296/.345/.532 "
Brooks's son? There can't be that many Kieschnicks (Kieschnicki?) around...
I think we agree here.
Ellsbury has a career SLG of .412 -- in Fenway. His translated SLG is .407, and the last superstar with a .407 translated SLG was Richie Ashburn. The result of Ellsbury stepping up his OBP and his defense another notch would be the second coming of Dwayne Murphy -- which would delight Gary Huckabay, but it isn't any sort of superstar.
Scouts don't care about UZR, and when they say "plate discipline" they don't mean BB%, but rather selectivity in getting a pitch to swing at. (I don't know whether Ellsbury has really improved that or not, but that's what I read the scout as saying.)
Part of what makes the scouts' comments so fascinating is the contrast between how they evaluate players, and how outsider performance analysts do it. One of the great yet-to-be-answered questions of baseball analysis is "Where are the scouts consistently able to see things that aren't (yet) showing up in the numbers, and where are they just seeing things?"
So there's slightly more consistency in ability to produce line drives than in ability to hit 'em (line drives) where they ain't? But not much persistent ability to do either? That makes sense.
Jay, this is an important point regarding who is doing better with RISP, and how it changes from year to year. It's not that individual Angels are (predictably) clutch hitters; it's that the guy who is at the plate in clutch situations is doing better than average, on average.
We've known since the peak years of Darryl Strawberry that an unusual mix of situational matchups can look like clutch (or choke) behavior. Is it possible that Mike Scioscia's great anti-sabermetric skill is really the most sabermetric skill of all: Strat-o-matic matchup management? Statheads have been saying since the days of Earl Weaver and Sparky Anderson that good managers put players in situations where they are more likely to succeed, and keep them out of the ones where they are more likely to fail.
I'd be curious to see whether (say) the Angels have faced same-handed pitching (or hitting) less often as a team, over the past few years, than have other teams. Or whether they have smaller platoon splits on average. Etc.
Other than the last paragraph ("just don't give a damn"), I pretty much agree, and it's nice to see this pointed out. I do think you underestimate the resilience of the "Pitching is 107% of baseball" crowd, though -- they can point to that Braves/Astros series as an indication of how the Astros won precisely because their pitching stepped up.
As an aside, can we please get some copy-editing, at least for the first few paragraphs? I can understand mangled sentences in paragraph 19 slipping through, but not the lead sentence.
I'll second Brian Kopec's comment on finding some way to control for confounding factors, and also ask if you've looked at whether the delta rankings are stable from season to season.
GB% is essentially a binomial random variable, and in 200 PA (best case split for someone who just qualifies for the study), +/- 2 standard deviations is +/- 6.5% of ground ball rate. That's a large swing relative to the PRIDE deltas you're drawing conclusions from; you might be measuring pitchers' serendipity more than their situational ability.
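For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes ground balls are independent trials and that about 30% of PA end in a ground ball; that rate is my assumption (it is the value that reproduces the 6.5% figure), not a number from the article.

```python
import math

def binomial_rate_sd(p: float, n: int) -> float:
    """Standard deviation of an observed rate over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)

# Assumption: roughly 30% of plate appearances end in a ground ball.
ASSUMED_GB_RATE = 0.30
# Best-case split size for a pitcher who just qualifies for the study.
PLATE_APPEARANCES = 200

two_sd = 2 * binomial_rate_sd(ASSUMED_GB_RATE, PLATE_APPEARANCES)
print(f"+/- 2 SD on ground ball rate: {two_sd:.1%}")
```

Smaller splits only make the band wider, since the spread scales with one over the square root of the PA count.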
Nothing against the content here, but the copy editing (if there is any) is really slipping. I see 4 errors in the first 5 non-quote paragraphs...
Please tell me that Shelby Ford's nickname is either "Cobra" (already taken in Pittsburgh) or "Mustang"...
Or, conversely, he'll be out until his jones is chipper again -- probably 2 or 3 days.
Eric, why the 40 IP cutoff? I could understand it if you were comparing on a rate stat, but WXRL is cumulative. Better still, it's automatically capped in how much negative you can accrue in one game, since you can only lose a given game once.
As much as I loved the analysis here (good job), I still long for the day when I will never see the words "[his slider] breaks sharply just before crossing the plate" or "late-breaking curve" or other similar impossibilities. And at BP, of all places.
Excellent point -- it's starters that matter.
This year, NL DH's are hitting .277/.357/.457 in interleague games, while AL DH's are hitting .254/.338/.448 overall, mostly against AL pitching. In other words, NL DH's hit AL pitching better than AL DH's do. That doesn't look like the answer to me.
As far as I can tell, it's not particularly true.
Among all major leaguers with 1000 career AB or more, the average walks per extra base is 0.754. Among recent players, the lowest values are Rob Picciolo (0.197), Shawon Dunston (0.234), Shea Hillenbrand (0.252), Tony Armas Sr. (0.254), and Alfonso Soriano (0.256). Picciolo is remarkable -- he had no power at all, but far less patience, managing only 25 walks to go with his paltry 127 extra bases in 1628 AB.
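For clarity on the ratio itself, a small sketch. I am assuming "extra bases" means total bases minus hits, i.e. one for each double, two for each triple, three for each home run; the helper names are just for illustration.

```python
def extra_bases(doubles: int, triples: int, home_runs: int) -> int:
    """Bases beyond first on hits (assumed definition: total bases minus hits)."""
    return doubles + 2 * triples + 3 * home_runs

def walks_per_extra_base(walks: int, xb: int) -> float:
    """BB/XB: walks drawn for every extra base produced."""
    return walks / xb

# Picciolo's career totals as quoted above: 25 walks, 127 extra bases.
picciolo = walks_per_extra_base(25, 127)
print(f"Picciolo BB/XB: {picciolo:.3f}")
```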
The question at hand, though, is what is possible at the high end of the scale -- lots of walks without corresponding power.
The all-time champions in BB/XB turn out to include a number of pitchers:
4.97 Red Faber
4.04 Whitey Ford
3.53 Don Sutton
3.16 Tom Glavine
I think it's pretty clear that if Tom Glavine can draw walks in 7% of his PA, then drawing walks is a skill that has nothing to do with pitchers fearing your power.
Among position players, the list is dominated (as expected) by dead-ball era players. Turn-of-the-20th Phillies' CF Roy Thomas holds the all-time mark, with 4.57 BB/XB over 5330 career AB. Miller Huggins, Donie Bush, and John McGraw are also high on the list.
Of post-1920 players, Eddie Stanky is the king, with 2.91 BB/XB over 4301 career AB. Stanky was a post-WW2 utility infielder who managed to draw 996 walks (and post a .410 OBP) while slugging .348 for his career. More recent players at or near the 2.50 mark are:
Al Newman, 2.78
Rodney Scott, 2.70
Billy North, 2.59
John Cangelosi, 2.58
Bud Harrelson, 2.56
Otis Nixon, 2.55
Lance Blankenship, 2.47 (good call, Joe)
Steve Jeltz, 2.46
Jose Oquendo, 2.31
Otis Nixon serves as something of a refutation of Joe's hypothesis all by himself: 11 career HR in 5800 PA, a lifetime slugging average of .314 -- and he was one of the most prolific base-stealers of all time, swiping 620 bags in 806 attempts (77%). If there was ever a player you wanted to make swing at the ball, it was Nixon. Instead, he walked in almost exactly 10% of his PA.
If I raise the threshold to 5000 career AB, the top of the "recent" list (plus a few other famous post-WW2 players) looks like:
2.55 Otis Nixon
2.04 Willie Randolph
1.93 Richie Ashburn
1.91 Mark Belanger
1.89 Luis Castillo (good call, FalcoT)
1.86 Eddie "The Walking Man" Yost
1.72 Mark McLemore
1.72 Ozzie Smith
1.72 Mike "The Human Rain Delay" Hargrove
1.61 Brett Butler (my guess, not as good as Joe's or FalcoT)
I think it's pretty clear that there's nothing impossible about drawing a lot of walks without having a power bat to intimidate pitchers with. Keep in mind that a BB/XB ratio of 1.5 can mean slash stats of .252/.314/.312 at the low end (Don Kessinger), or .291/.375/.378 at the high end (Pete Runnels). There are nearly 500 historical players with a BB/XB of 1.3 or better in 1000 AB or more, and 444 of those had fewer than 50 career HR.
So I think it's fair to say that Eddie Yost and Lance Blankenship aren't all that rare.
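If you want to eyeball BB/XB from a slash line, here is a rough sketch. It assumes OBP is just (H + BB) / (AB + BB), ignoring HBP and sacrifice flies, so treat the results as approximations rather than the exact career ratios.

```python
def approx_bb_per_xb(avg: float, obp: float, slg: float) -> float:
    """Approximate BB/XB from AVG/OBP/SLG.

    Assumes OBP = (H + BB) / (AB + BB), i.e. no HBP or sacrifice flies.
    Solving that for walks per AB gives (OBP - AVG) / (1 - OBP);
    extra bases per AB is simply SLG - AVG.
    """
    walks_per_ab = (obp - avg) / (1.0 - obp)
    extra_bases_per_ab = slg - avg
    return walks_per_ab / extra_bases_per_ab

# The two slash lines quoted above.
kessinger = approx_bb_per_xb(0.252, 0.314, 0.312)
runnels = approx_bb_per_xb(0.291, 0.375, 0.378)
print(f"Kessinger: {kessinger:.2f}, Runnels: {runnels:.2f}")
```

Both come out close to 1.5, which is the point: the same ratio covers a weak bat and a quite useful one.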
Thome had to waive a no-trade clause. The Dodgers are 99% likely to make the playoffs, and have the best chance of any NL team to win it all. For that, he was willing to waive. For the Rockies, I doubt it.
Interesting stuff -- but this article desperately needs some graphics.
Question: are there any traditional habits about which game of a series (home or road) a team will generally choose to rest their starting catcher, or other starters? We all know the "day game after a night game" thing -- do those fall more frequently on the penultimate day of a series?
"He has a reputation among umpires for being a whiner and complainer"
With Pitch/fx, we now have the ability to look at umpiring reasonably objectively. We've seen (some, but not enough) looks at who the most and least accurate umpires are. I haven't yet seen any analysis of which players get screwed the most. I'm betting there's even more variability on the batter side than on the umpire side -- the guys with the well-trained good batting eye will suffer the most, a la Frank Thomas in his first few years.
Maybe Gordon's just a whiner; maybe he really is getting jobbed. It's worth a look.
Gaah. That will teach me not to take time to think about how to phrase things...
"an objective formula which averages a team's actual, first-, second-, and third-order winning percentages"
That's W0, W1, W2, and W3. I suspect you were interpreting *actual* as a stray adjective, rather than a category.
Anyway, you're assuming your conclusion by calling wins the "biggest indicator of performance". Wins are the biggest indicator of wins, and the perfect stat for figuring out where you are in the standings. But if you want to predict future wins, you're better off looking at performance -- i.e. what players have done in individual plate appearances or batters faced. Better still, adjust those for park and league and quality of opposition.
After reading the whole article I know what you meant, but I can certainly see why some people read this as blaming Wagner (and thinking he doesn't deserve a welcome back). "Wagner forced the Mets to use lesser relievers" is simply not a useful way of phrasing it -- it sounds like a blackmail-and-kidnaping plot, not an injury.
The people cheering for Wagner were (shockingly) cheering for Wagner, not for the Mets or their playoff hopes. If that fell flat for you, well, I don't think that's Wagner's fault or the fans' problem.
"Supposed to"? Says who?
Agreed. These are fantastic improvements, Eric, and I salute your (collective) efforts, but we've been promised cross-year queries from the Stats page for (literally) years now, and they still aren't available for subscribers -- though they are available to authors.
Clearly, when Albert Pujols makes an out, it doesn't hurt the team as much as an out made by a fast guy with no power. Um, right? Isn't that what McCarver just said?
School me on this if I'm wrong, but the change in momentum is proportional to force delivered to the head/helmet system, not the head. If the magnitude of Vafter is similar to that of V0, that implies a very elastic collision and suggests the helmet took the brunt of it. If the magnitude of Vafter is small or zero, that suggests a very inelastic collision, which suggests a fractured skull (or at least a lot more force transmitted to the skull through the failed helmet).
It's pretty clear that a ball lodging in your skull is worse news than a ball ricocheting away, regardless of how far it bounces. What am I missing?
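Here is a rough sketch of the distinction I'm asking about. It treats the head/helmet as effectively immovable and uses a hypothetical 40 m/s pitch; both are simplifying assumptions of mine, and the only hard number is the ball's mass of about 0.145 kg.

```python
BALL_MASS_KG = 0.145  # a regulation baseball weighs roughly 5 ounces

def impact_summary(speed_in: float, restitution: float) -> tuple[float, float]:
    """Return (impulse in N*s, energy dissipated in J) for a ball rebounding
    off a target assumed to be immovable.

    restitution is rebound speed over incoming speed: near 1 is a lively
    bounce, 0 is a ball that stops dead.
    """
    speed_out = restitution * speed_in
    # The ball reverses direction, so the momentum change adds the two speeds.
    impulse = BALL_MASS_KG * (speed_in + speed_out)
    # Kinetic energy that did not come back out with the ball went into
    # deforming the ball, the helmet, or the head.
    energy_dissipated = 0.5 * BALL_MASS_KG * (speed_in**2 - speed_out**2)
    return impulse, energy_dissipated

PITCH_SPEED = 40.0  # m/s, hypothetical (about 90 mph)
bouncy = impact_summary(PITCH_SPEED, restitution=0.9)
dead = impact_summary(PITCH_SPEED, restitution=0.1)
print("lively bounce:", bouncy)
print("dead stop:   ", dead)
```

Under these assumptions the lively bounce has the larger momentum change but the dead stop dumps far more energy into whatever it hit, which is why the size of the rebound alone doesn't look like a measure of damage to me.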
I agree on one level -- but he's seriously let his team down by exercising that freedom. As with Ron Gant's broken leg, this sounds like one where the player should not get paid during the downtime.
Highlander 2? Highlander 3? What is this, April 1? The emperor's new sequel?
Next you'll be trying to convince me there really was a "Citizen Kane 2: Rosebud Boogaloo" direct-to-cable...
The difference in Denver (I personally wouldn't call it a "problem") is not only how far the ball travels -- it's how fast the ball falls to earth. Backspin doesn't have the same effect in thin air, so line drives and fly balls that would hang up long enough to be caught at lower altitudes often drop for hits in Denver. The dimensions and the humidor do not alter this in any way -- it's a purely aerodynamic effect. For the same reason, pitches are straighter at altitude, and thus less deceptive and easier to hit hard.
The 25% boost is based on comparing how much better all players (including visiting teams) do at Coors, versus how well they do elsewhere. Those pitching staffs weren't as terrible as you thought -- indeed, one of the recurring problems of the Bichette/Castilla years was thinking they needed more pitching when they really needed better hitters.
Correct from a health/sanity point of view, but not in the culture he lives in now. We can eradicate smallpox, but not machismo.
Bless you, my son.
"Jim Rice -- almost as good as Ken Singleton, but in the Hall of Fame"
B-b-but... he's The Captain! _The_Captain_. You can't move The Captain, just to make room for the most valuable shortstop since Honus Wagner...
The scary thing about the Nationals is that they might not be wrong. It all depends on how soon they expect their incoming cohort of pitchers to be MLB ready, how good they actually are, and how long they last before some inevitably succumb to injury.
(Incidentally, the word is /linchpin/, 2 i's, no y.)
"You could say the same about ERA and RA. In the end, the goal is to allow less runs, so those stats are really the end-all-be-all of "descriptive" stats, but we would never use it to evaluate a pitcher's inherent value."
That's an awfully ironic thing to say as a BP writer, given that BP does not publish any pitcher evaluation stats that are not outcome-dependent in the way ERA and RA are. VORP, SNWP, etc. are all about how many runs actually scored.
Where is the stat that is to QERA as batter's VORP is to OPS?
Key missing stat from this analysis:
How many pitchers would we expect PECOTA to underestimate 3 times running just by chance, given the standard error of the projections? I have a hunch the answer is very close to 8...
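A back-of-the-envelope sketch of what I mean, assuming each projection is equally likely to miss high or low and that the misses are independent from year to year. The pool size below is hypothetical; I don't know how many pitchers were actually in the sample.

```python
def expected_streaks(num_pitchers: int, years: int = 3, p_under: float = 0.5) -> float:
    """Expected number of pitchers underestimated in every one of `years`
    consecutive seasons by chance alone, assuming independent misses."""
    return num_pitchers * p_under**years

# Hypothetical pool size, not taken from the article.
HYPOTHETICAL_POOL = 64
print(expected_streaks(HYPOTHETICAL_POOL))
```

In other words, one pitcher in eight would do it by luck, so a pool in the neighborhood of 64 produces about 8 of them with no skill involved.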
I think the people complaining about your way of breaking the problem down are missing the point. You can't say "only swing at strikes" because batters don't have that skill. They can only decide in advance to not swing at ANYTHING, or to be willing to swing. Once they are willing to swing, they can be fooled, or get overanxious, or just plain miss.
As follow-on work, it might be interesting to see if it's possible to characterize the type of hitter who gets the biggest wOBA boost from swinging at 3-0 pitches. It looks like it's the already-selective power hitters, but quantifying that difference might be fun.
Not quite. Jim Thome will swing 3-0 if the pitch is exactly the one he's looking for. Alfonso Soriano will swing 3-0 if the pitch is one he would have swung at on a 0-0 count. That's why Soriano needs a red light, and Thome doesn't.
If you are an NL #8 hitter, with the pitcher on deck, a walk is a FANTASTIC outcome. In the early innings, it means you avoid having the pitcher lead off (and kill) an inning. (With 2 outs and runners on 2nd and 3rd, you won't get a hittable pitch anyway.) In the late innings, either you're getting a (semi-)intentional walk anyway, or you're setting up for a pinch hitter to have a high-leverage PA. Both of those are much more valuable than what a #8 hitter will generally do at the plate, and should be praised.
"players should not try to work a walk when more is called for"
I don't think you can say that. A walk is always a positive, except in situations where a double play is essential (in which case the other team isn't going to give you anything to hit anyway).
One of the big cultural barriers in baseball is the feeling that a player who draws a walk is passing the buck, where a player who takes his cuts is shouldering the load, manning up, etc. Unless you're Barry Bonds on the otherwise impotent Giants, leaving it to the next guy with no more outs and another man on is never the worst alternative.
MLVr is based on a player batting 5th in an average lineup. That's better than the old RC assumption of 9 copies of the player, but it's still not either lineup-specific or sequence-specific.
Because Bonifacio usually batted in the top of the order, his MLVr should be adjusted (downward, in his case) to account for getting more than 1/9 of the team PA. The same holds for Nick the Stick. So, even without The Hanley Effect, the raw team scoring difference should be more than the .35 runs per game that is the difference of their unadjusted MLVrs.
You could go the next step by using the Marlins' remaining lineup as the context for computing MLVr, instead of a league average lineup. I don't have the numbers to compute a non-Bonifacio OBP/SLG for the Fish so far, but it looks to be roughly league average, so no real effect there. However, as you note their offense is skewed toward low-OBP power, so Nick is a bigger multiplier than a lower-OBP higher-SLG player with the same OPS would be.
If I had to guess, I'd put the 'true' runs per game difference between the Emiliofish and the Nickfish at somewhere between .4 and .5 runs per game. Which is huge.
OK, other sources say "upper back". The good news is that means no chance of the really nasty eye or ear involvements. The bad news is that it's a fairly large area of sensitivity, and it's nearly impossible to sleep without touching your back to *something*.
It would be nice if the glossary actually said that SNWP = SNW/GS.
This is sort of a general philosophical quibble, but I thought the point of the glossary was to make it so that readers didn't have to search the archive or own the annuals in order to understand a stat, at least roughly.
None of the glossary entries on the SNW family of stats say anything about what they are based on, and only one or two explain relationships among them. The 'details' link for each entry gives the same text as the main entry. And some of those entries can be misinterpreted -- for example, SNVA says "wins above average added by the pitcher's performance", when it's actually based on the runs allowed (which BP correctly tells us is not just pitching performance, but also defense and luck of the timing).
As I said, just a quibble -- but it's why pointing to the glossary is not an answer to berkm3 in this case.
I will believe your indignation is not "mock" when you are clamoring for Gaylord Perry to be kicked out of the Hall of Fame.
The site glossary doesn't make it clear, but I believe SNWP (like all BP pitching stats) is run-based. It doesn't adjust for situational luck or quality of defense.
It's not a useless stat, but I would appreciate a stat specifically designed to predict how effective a pitcher would be going forward with a new team, or an average team.
"LaTroy Hawkins is dealing with an outbreak of shingles. That's said to be very painful..."
O dear Jesu yes. Or, rather, not so much *painful* as intensely tortuously uncomfortable. I went about a week without being able to sleep for more than an hour at a time, and another week before I could wear a shirt without whimpering.
There are mild cases and bad cases. I sincerely hope he has a mild case. Do you know whether it's torso, neck, or head?
Excellent point on the non-drug unacceptable behavior. I care a lot more about the Wil Corderos and Brett Myerses and Elijah Dukeses than I do about the alleged effects of alleged steroid use. One wonders whether Michael Vick would be back on the field today, had he been an MLB star rather than an NFL star.
Please read the whole sentence. Nothing special AT THIRD BASE. Which is true -- a player who might be a perennial All-Star at shortstop or second could be a completely mediocre third baseman. The offensive standard is higher.
NFLers don't juice so the team can win. They juice so that they can earn/keep jobs in a league with few guaranteed long-term contracts and more competition for every roster spot than MLB has(*). Shorter careers, and you only get paid when you're still good enough -- that's a powerful motivation.
(*)This is easy to see from both the caliber of players released and the smaller dropoff in quality when "replacement players" show up during strikes.
Also, at least in his start against the O's, he wasn't facing the first team. As a RHP, he nevertheless faced a lineup that included Felix Pie, Ty Wigginton, Robert Andino, and Nolan Reimold -- but not Luke Scott, Aubrey Huff, or Cesar Izturis, even as pinch-hitters.
Um, subtracting the before from the after is exactly what you want -- it already factors in your opponents' change in odds.
Think of it as having a board-game-type spinner, with a dial divided into "playoffs" and "go home". Adding Roy Halladay replaces a portion of "go home" with a new section labeled "playoffs because of Roy". Subtracting the before odds from the after odds tells you how big that new section is.
As others have noted, using the ratio is particularly uninformative when dealing with small probabilities. Knowing that a certain behavior triples your risk of cancer sounds bad -- until you learn that it takes the odds from 1:30000 to 1:10000. The question you should care about is "what are the chances that this behavior will cause me to get cancer?", and that's what subtraction answers.
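A minimal sketch of the cancer example above, to make the ratio-vs-difference point concrete. It assumes "1:30000" and "1:10000" can be read as probabilities of 1/30000 and 1/10000; the function name is made up for illustration.

```python
from fractions import Fraction

def risk_change(before, after):
    """Return (added risk, ratio) for a probability that moves from before to after."""
    return after - before, after / before

# Assumption: the quoted odds are treated as plain probabilities.
before = Fraction(1, 30000)
after = Fraction(1, 10000)

added, ratio = risk_change(before, after)
print(f"ratio of after to before: {ratio}")
print(f"added risk: {added} (about {float(added):.5%})")
```

The ratio says "tripled"; the difference says the behavior adds a 1-in-15000 chance, which is the number that answers the question you actually care about. The same subtraction applies to playoff odds before and after a trade.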
"What you’re missing as fans are very athletic people with a tremendous amount of ability," Torre said.
I hope someone shut him up before he started talking about how well they swim. I can smell the subtext in that comment from here.
Oh, and Ken -- way to justify your victory in BP Idol. Great article, both form and content. Love the image of the opening graphics as the lemon sorbet palate cleanser...
Love the "bullpen box" idea. Who is available, when did they last pitch (and how many pitches did they throw), and what is their opponents' OPS (or other rate stat). Ground ball rates might be relevant for those "need a DP" situations.
Sound familiar, you Strat and DMB players?
"If you’re a student of Cold War politics, or perhaps just a fan of early R.E.M. [...]"
Best article opening line of the year.
"'All the things that have been public are… not exactly as they were reported.' Not only is that an aspersion on Rubin’s reporting [...]"
Let's not be disingenuous; the standards for accuracy in reporting are very, very minimal. There have only been about half-a-dozen times in my life when I was directly involved in events that were reported in a major newspaper or wire service, but every single one of those reports contained significant errors of fact, sometimes even changing the character of the story. Is there a reason to suspect that the standards in sports reporting are higher than in the rest of the paper, or higher in the Daily News than in the Washington Post? I would say it's nearly certain that things were not exactly as they were reported. That's not to say that the report wasn't close enough for management work.
From the OED:
metastrophe (noun, rare)
1. Radical or fundamental change or transformation
1908 A. UPWARD (New Word 200) That is to say Metastrophe... I mean not growth and decay, but growth turning into decay, and decay turning into growth.
Yep, sounds about right. Nice wordplay.
#define pedant ON
"While a sample of this size may not be large enough off of which to base irrefutable claims[...]"
Off of which to base? Ouch. I gather Miss Pricknel used to whack your knuckles with a ruler for ending a sentence with a preposition? Shame on her.
If the terminal preposition (which should be "on", by the way, not "off of") doesn't work for you, how about "...may not be large enough to support irrefutable claims"?
#define pedant OFF
MWDoEU cannot even bring itself to agree that saying 'infer' when you mean 'imply' is something to be avoided. It may be a pretty good guide to how people actually speak (including persistent and widespread errors), but apparently not to how they should write.
...Clemente, Patek/Rojas, Jeter into the stands, ...
For me, the best may still be one of the plays that Jim Edmonds made, diving while running flat out away from home plate and toward RF, to catch a line drive just off the ground. Diving over-the-shoulder catches are the rarest of all.
D'oh, should have guessed Slidin' Billy Hamilton...
So, is Tim Raines the all-time Rickeyest player? Or is it some Tris Speaker type from the days of yore? (Cobb and Wagner probably gain too much power when you translate them forward...)
I would love to see the scatterplot of PECOTA absolute error vs similarity score...
Yeah, Brett Myers' demeanor off the field is so much more exemplary than Abreu's...
JAEPFCA = "just another ex post facto chemistry argument"
"Hans Lobert was hired for the 1942 season by owner Gerald P. Nugent, who was only running the team because William Baker left a controlling interest in the club to his secretary, Nugent’s wife."
I'd love to see the full-length article explaining exactly why *that* happened...
What he said.
I love it when you get all binomial.
Halladay has additional value beyond his VORP in how deep he goes into games. This makes the team more likely to win games in which he is not the starting pitcher, because it gives the bullpen more rest. It also directly reduces the number of innings thrown by pitchers who are not as good as Halladay, especially pitchers who would pitch the 6th and 7th innings of games with less durable starters.
"If a reliever comes in with the bases loaded and no outs, and then strikes out the side, shouldn't he get credit for that effectiveness?"
It depends on why you're looking at him. If you want to know, in hindsight, how much he contributed to team wins and losses, then yes. If you want to predict how useful he's going to be in the future, then not much. If you want to know whether to draft him for your Diamond Mind or Strat-o-Matic team, then not at all.
I'll echo the original poster's plaint, which I have aired here myself in the past: BP doesn't publish any pitcher metrics that aren't either context-dependent (WXRL) or outcome-based (VORP, ERA, SNWL, etc.). There's nothing that is based solely on the outcomes of individual plate appearances.
For the "it was the defense" story line, add in the fact that Justin Upton looked hopelessly lost on Granderson's triple. I thought the ball was going to be caught. When I saw where it landed, and the route Upton had taken, I still thought it should have been caught. I wonder if the fact that Upton plays exclusively RF for the Snakes had anything to do with it.
Was this sarcasm? If so, I approve. There *was* a skills competition at the ASG, back in the day, and it was great. But nobody cared, and the players didn't much like it either, so they killed it.
Um. Did you also filter out times when the batter didn't swing because the pitch was in the dirt or out of reach? Those would tend to boost the success rate on non-swings, relative to swings, because they're also harder for the catcher to field and throw.
Of course, anecdotally, there are quite a few starting pitchers who are shakiest in the first inning, then settle in for a while once they get through that. SOMA wouldn't work with such pitchers, and working around them would make the rotation a logistical nightmare.
In the late '70s, very few games were televised nationally, and those that were typically used only 3 cameras. And if you've ever tried to single-frame through a VCR tape, you wouldn't suggest that it's comparable to what we have now.
...Not to mention the fact that there's no way to tell whether a pitch was a strike or not from a center field offset camera angle. There's a huge difference between suspecting that the umpire is blowing a lot of calls, and having pitch tracks of every pitch to compare against the call that was made.
"closers won’t accept the uncertainty of not knowing when they’ll pitch"
What a sad indictment of their professionalism.
Then again, what could "won't accept" mean? What, exactly, are their options? Throw games? Retire? Sulk? The last one can be lived with.
The difference is that now every fan at home has the technology to see exactly how bad the umpiring is, on every pitch or play, in slow motion. The status quo can't last in that environment; the umpires would do well to embrace technology openly, because they can't win an us-or-it battle.
The bad news is that I'm in an AL-only DMB keeper league, and was expecting to get zilch from my shortstops this season.
The good news is that I was expecting zilch because the only shortstops on my roster were Bartlett and Zobrist, plus Erick "Replacement Level" Aybar. Zobrist is well on his way to being my starting LF for 2009. Now, if I just had a first baseman...
I've heard this comment before, that not having a human being call balls and strikes would somehow be counter to the spirit of baseball, but I have to admit I really don't understand it at all. I especially don't understand the ones who go on to say that it's somehow similar to replacing the players themselves with automata. Huh?
That's like saying that using photographs to check which horse actually won the race is tantamount to replacing the horses with robots. Or that no *real* fencing has happened since they started using electronics (instead of human observers and assumed gentlemanly sportsmanship) to determine touches in fencing.
Baseball is about the players, using their skills to accomplish things within the rules of the game. They determine what happens on the field. Umpires are about establishing the facts of what happened, and applying the relevant rules. What the players do is a sport. What the umpires do is accounting.
Yes, there are calibration questions associated with Questec or other automated systems -- but (and this is the critical point) those issues are small compared to the calibration, bias, consistency, etc. questions associated with any human umpire you care to name.
According to the last PitchF/X study of this I saw, the umpires we have are only getting about 85% of the *obvious* (not borderline) ball and strike calls right. Worse yet; those errors are not uniformly distributed across players and player types. A system that is poorly calibrated, slightly arbitrary, and consistent would be an order of magnitude better than what we have today -- and would only improve over time.
While I agree with everyone who is saying that Ricciardi could not do this deal and still have a career, it makes for fun speculation.
Here's the version I favor: Halladay to the Mets, in exchange for Wells to the Mets. Period.
Why should the Mets have to give up anything beyond the $20M per year? It's essentially a cash deal, where the Mets pay the Jays $100M for Halladay and Wells. When you phrase it that way, it doesn't sound as stupid for the Jays, and it's obvious that giving up any prospects on top of the cash would be way too much to ask.
It would be fascinating to watch the Commissioner's Office respond to such a proposed deal. On the one hand, they'd hate it for being a pure cash transaction, and terrible PR. On the other, as a cautionary tale to teach owners the evils of big contracts, it's matchless.
Which is it to be -- the "best players in the game", or the leading vote-getters? You can't have it both ways.
I'm afraid it's too late. It already *is* the Pro Bowl of baseball, except that the Pro Bowl players were mostly good for an entire season before being selected. Pity, but there we are.
We hear a lot at BP about the consequences of short benches and bloated bullpens. Let's see some actual quantitative analysis -- analyze the difference between carrying a 12th (or 13th) pitcher vs another bat/glove.
I'm surprised that Nick Green's Chipper-induced toe tap got so many words, but not a peep about all of the work that Ben Zobrist put in with swing mechanics consultant Jaime Cevallos over the offseason. My understanding was that Cevallos pretty much rebuilt Zorilla's swing.
(How is Drew Sutton doing in the Reds' organization this year? He was also a Cevallos student, IIRC...)
"Coors also has a higher BABIP than other parks due to the ball traveling farther."
I realize that ESPN is not the place to get all technical on people, but this is both wrong and misleading. How far the ball travels doesn't affect BABIP. The key is how long the ball stays in the air (and to a lesser extent how long the ball takes to get through the infield, which can be offset with long grass).
The big Mile High effect for balls that don't leave the park is that backspin doesn't hold the ball in the air the way it does at lower altitudes, so fly balls and line drives hit the deck faster, giving fielders less time to get under them. (Think of the difference between a 'rising' fastball and a sinker.) This effect is independent of distance.
Outstanding as usual, Steven. I vote we make "Lest We Forget" a regular column, with exactly this sort of stat-inspired look back through the unjustly forgotten as its recurring theme.
(Aside: I wonder what Firpo Marberry's WXRL was in 1926 -- he went 12-7 with 22 saves in 138 innings, with only 5 starts out of 64 appearances... )
15cc of fluid... let's see. An ounce is about 30ml, same as 30cc, so this is half an ounce of fluid, or 1/16 cup, which is 1 TBSP.
OK, that's not as bad as it sounded at first, given how big a hip joint is, but it's still an unfortunate amount. (I assume there was no sign of infection, since you would certainly have reported that.) Am I right in thinking that Synvisc will help (temporarily) if the problem is friction against a smooth surface, but not if there's a spur or sharp edge somewhere?
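For anyone who wants to check the kitchen math on the 15cc, here is a rough sketch using US customary conversions. The constants are standard; the function name is hypothetical, and the comment above rounds an ounce to 30ml where this uses 29.5735.

```python
ML_PER_FL_OZ = 29.5735   # one US fluid ounce in millilitres (1 cc == 1 ml)
FL_OZ_PER_CUP = 8        # US cup
TBSP_PER_FL_OZ = 2       # US tablespoon is half a fluid ounce

def describe_volume(ml):
    """Convert millilitres to US fluid ounces, cups, and tablespoons."""
    fl_oz = ml / ML_PER_FL_OZ
    return {
        "fl_oz": fl_oz,
        "cups": fl_oz / FL_OZ_PER_CUP,
        "tbsp": fl_oz * TBSP_PER_FL_OZ,
    }

volume = describe_volume(15)
print(f"{volume['fl_oz']:.2f} fl oz, {volume['cups']:.3f} cup, {volume['tbsp']:.2f} tbsp")
```

Close enough to half an ounce, 1/16 cup, and one tablespoon for conversational purposes.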
Yep. Thumb up on momentum, given that it was borderline for me otherwise.
I'm confused about why you think umpires should get a free pass when they fail to make the routine plays. TIVO and Pitch f/x and such are going to bring umpiring more and more into the spotlight in the next few years; how MLB reacts to that is going to be prime fodder for BP and other outsider sites.
This entry needs Ritalin. There's probably a pretty good column scattered in there somewhere.
Loved the pinch-hitting stat, and kudos for getting it on such a tight deadline.
Fine, except that you didn't mention that in the article. It seems an odd thing to expect a random reader to know.
Is there some text missing at the start of this article? The first sentence I can see is "Trey Hillman knows that his team's record does not show it.", but I have no idea what 'it' is.
(I ask partly because there was a recent BP Idol entry where a paragraph was garbled for me, but some other readers could see it correctly.)
I thought the big takeaway was that La Russa has 'built' a bench with absolutely nobody who has any prayer of hitting (say) Eric O'Flaherty, should the other team have such a player. But I guess that's outside the scope of this one game, and the given topic.
Decent hook, and (unlike some entries) not a tedious play-by-play rehash. I would have hammered more on the "if you're going to carry 13 pitchers, shouldn't you be willing to use them?" point, but maybe that's just me.
Solid, and (unlike most entries this week) coherent. I only stumbled at "eventually fulfilling his prophesy". Huh?
I don't think there's any evidence that Gutierrez can ever hit right-handed pitching, which is a pretty bad problem to have as an ostensible regular.
I had the same reaction that locke623 and lyricalkiller had, namely that at his peak
Barry Bonds : MLB :: Manny Ramirez : Hi-A
(I also can't decide whether to hope that "Man-Ram" was an inadvertently unfortunate choice, or a deliberate one...)
Definitely _A Study in Scarlet_. ("A study in Scarlett" would be shown on Cinemax late at night.)
For a stathead, the big takeaway here is that the two most important (high leverage, pivotal, whatever) plays of the series were double plays -- It's All About the Outs. That's worth hammering on, and generalizing.
(Of course, the other thing it shows is that nothing that happens in Game 1 is going to affect CE much. That's mostly because CE is partly a measure of how complete our information is -- Game 1 counts as much as any game, but we don't have enough info yet to know how important it will turn out to be.)
Overall, yeah, a bit dry, but good concept and reasonable execution. I've been stingy this round, but I think this gets one of my 2 thumbs up.
Thanks for posting the missing paragraph, Richard. When I load this page, I get the sentence:
"As the game historically switched Lou Gehrig and Jimmie Foxx)."
That aside, I think Steven Goldman nailed the criticisms. Great idea, but wandered off into a swamp and sank. I really missed the paragraph discussing what the distribution of contribution among the positions *should* be, and why. "All equal" isn't obvious. And, as Richard notes, there are some "relative contribution vs. absolute contribution" issues, both among positions and between offense and defense, that were not handled cleanly.
Great choice of topic -- and that's an important skill -- but I think you bobbled the execution.
Paragraphs 2 through 9 dragged on and on; there has to be a more concise way to say that. Especially since, as you nearly said explicitly, you simply can't translate Ruth's K/9 or HR/9 to a modern equivalent and make comparisons to modern pitchers with those rates.
Modern pitchers are effective by minimizing baserunners and HR, because crooked numbers lurk in every PA. Dead-ball pitchers had different primary worries, and were successful in different ways. Was Ruth declining in his ability to do the things that made pitchers successful in 1917? It would have been nice to learn the answer to that.
I also liked this a lot better than the BP staff did. Some of that can be blamed on my ancient reverence for Jimmy Wynn, but not all of it. I'm not sure what the staff thought Ken was supposed to be doing with this piece.
I liked the breakout by decades. I liked the style. I followed the links I wanted to follow, and ignored the rest, as always. I thought "Little Big Man" was an inspired name. I'm even ok with the metric -- Jimmy Wynn and Mel Ott come out on top **in spite** of their high walk totals, which is just fine.
And it didn't feel long at all, because I was enjoying reading it all the way through.
I disagree, jepson. One side effect of the "coast until you really need to get outs" approach that was possible in the early days is that pitchers very probably really did "pitch to the score", being willing to allow a few meaningless runs in a game they were winning handily, and bearing down extra-hard in close games. That made ERA less useful as a measure of ability (or even performance) in those days, and W/L at least partially a reflection of how well pitchers adapted to the situation.
I wish Matthew had made this point in his article. Explaining why W/L wasn't a totally moronic idea from the start is an important part of the story.
Actually, I've heard several ex-player color commentary guys make *exactly* that argument: that because speedy players always have their speed, they will still be reaching base on infield hits and induced errors, and taking extra bases, when "all or nothing" power hitters would be having slumps and just striking out.
Speed is fleeting; outs are forever.
Great article, Steven.
** He'll be pitching against the Nationals, which can be safely called a "soft landing." **
Well, sort of. The Gnats can't *win*, but they can hit some, especially against RHP.
Eric Walker does something very similar to this in his look at the "steroid era" effect. He used total bases per hit as his 'normalized' measure of relative power, then looked at the history of that metric, to see if there were an identifiable "steroid era" effect. If you haven't read his work, you should: http://steroids-and-baseball.com/.
Well, sort of. It won't tell you how to pronounce Chone Figgins, though they do weigh in on Marquis Grissom. It will also tell you that Duchscherer is pronounced "DUKE-shur" (really?), but it won't tell you how to pronounce Koji Uehara (which I have yet to hear any announcer get right).
Agreed about trying too hard. The jokes were fine; there were just about 3 times too many of them. Not everyone can write like Gary "Fritters of Delight" Huckabay and get away with it.
(Keep the Batman sounds, though.)
Overall, solid and entertaining.
PS -- Again, a spelling error in the lead paragraph is A Bad Thing.
Points for taking on a controversial subject that BP has already weighed in on. Points for finding a new (and relevant) way to look at the data. Overall, very strong, especially the comparisons to Howard's PECOTA comparables.
Now, if you had only used those comparables for the platoon discussion, instead of current starting first basemen. Nobody cares that Howard is clearly better than Ishikawa, Overbay, Kotchman, Butler, et al. Still, great job.
1. Nats game, RFK, upper deck on 3rd base side. My friend is explaining to me how he once caught a foul ball off the bat of Jose Vidro in almost exactly that spot. Vidro steps to the plate, hits a pop foul into the seats, and my friend catches it.
2. Al Hrabosky. Manny Sanguillen. Jose Oquendo.
I'm having a hard time understanding the economics of signing bonuses. It seems like once you get past the first few picks of the first round, teams should either
(a) always be willing to pony up an amount that is small compared to a free agent contract, in order to get long-term rights to a potentially useful property, or
(b) never be willing to pay a significant signing bonus, because there simply isn't anyone left who has a high enough expected return on the investment to justify it.
I don't get the middle ground.
Did Rany's work look at the ROI for signing bonuses?
"Seven catchers had come off of the board before Bench was selected, including three in the first round (the best of these was Ray Fosse, the seventh player taken overall)."
My mother taught Ray Fosse in high school, in his home town of Marion, IL. I always thought it was poetic justice that Pete Rose, the man who trashed Fosse's shoulder in an exhibition game, ended up incarcerated in Marion.
Poor choice of word on my part -- I meant well-suited to the challenge given. 'Appropriate' might have been better.
Of course not. But the further discussion should be *aware* of the earlier discussion, and build on it or refine it or refute it.
MLB.com would love this article. That's not a plus in this context.
Loved it until the weak last 2 paragraphs. Not sure what the ideal wrap-up would have been, but that wasn't it.
Still, my favorite so far of your pieces, and the choice of nonstandard topic puts it over the edge into "thumb up" territory.
No mention of Babe Didrikson? Pity.
A worthwhile article, if not great. It needed more emphasis on "how much of an outlier would an MLB-able woman be?" and "how much missing infrastructure in HS and the minors would there need to be for us to ever really find out?".
On a minor note, the sloppy style (in the MLA sense) still bothers me. "A League of Their Own" needs quotation marks or italics. Don't write "try and" when you mean "try to", even if that's how you talk at the water cooler. "[O]ne and they can't play it the way they'd like to" needs a comma. Etc. Making the editor work hard is not a way to win employment.
My immediate reaction was "hasn't even read the recent BP work on this topic". Whether that's true or not, giving that _impression_ is a major failure. No cookie.
I don't understand the staff criticisms at all -- this one's an A+. A topic I knew _nothing_ about, made immediately interesting and relevant, discussed in just enough detail to not wear out its welcome, with real insight from the experts who do it for a living and some convincing humility from the author, who makes me believe he actually learned something unexpected.
The format was fine, or even a net positive. Way to go.
I give big points just for picking a topic I didn't expect. I agree that JAWS (or career WARP, or average annual WARP, or something) would have made the article better -- show the distribution of future WARP among AAA all-stars, and you get 3 thumbs up from me.
Nevertheless, well-written and intriguing. I'm amused that some of the BP staff consider "left me eager to come back and see part 2" as a negative... I'll bet Dave Pease is perfectly happy with that effect. And I respectfully disagree with Kevin -- an article on how minor league *performance* predicts major league success would have a totally different point, and a different feel. This wasn't about improving on DTs; this was about what the AAA all-star game roster means.
Jason Bartlett... TRIP?
One Primanti's creation is worth 3 late-round picks, or one (heh) sandwich pick. Make mine sweet Italian sausage, fried egg, cole slaw, and french fries on French bread.
"[W]hat is MLB's primary fiduciary responsibility? To maximize total revenue and profit? [...] To create competition for the fans? That sounds nice, but it's the owners that have taken on all of the financial risk."
Not quite. MLB is publicly subsidized in at least two important ways: stadium deals, and the anti-trust exemption. (I may be forgetting others.) Without those, the owners would face significantly more financial risk than they do today, with higher costs and lower revenues in general. That makes the public -- including non-fans -- a shareholder.
Looks like about the first half of paragraph 2 got sent to the bit bucket...
How could you leave Darren Daulton off that list?
My buddies and I used to play the game by these rules: "What's the best team you could put together using players who aren't in the Hall of Fame, and shouldn't be?"
Jimmy Wynn, Reggie Smith, Jack Clark, Harlond Clift, Jim Rice... oops.
Will, I love your stuff, but you're making me crazy.
Maybe I'm the only one, but I don't see a lot of other baseball journalism in-season, and especially not mainstream media. I just don't have the time. As a result, it's very frustrating for me when you allude to recent off-the-field situations, conditions, or events that you assume all of us are already familiar with -- which is happening more and more frequently.
I have no idea what you're talking about regarding Khalil Greene, or how to interpret your remarks. Could you at least throw in a link to a story, if you're not going to name the condition or describe the prognosis?
I play in a Diamond Mind keeper league, so I was excited by the prospect of an article that might give me some new and useful tips that would help me there. As I read, I kept waiting for the "why sim baseball is neat" to end, and the "...and here's how a BP reader might approach it" part to begin. It never did, leaving me wondering what the point was.
"Home run" is still two words. Before this contest, I hadn't realized that was esoteric knowledge.
The title caused me to read this one first. I'll be stunned if anything else beats it this week. This is EXACTLY the kind of thing a BP article should be doing:
1. Explain the state of the art
2. Explain how newly-available data extends the state of the art
3. Explain what you give up when you need to be able to look at a big chunk of baseball history on a uniform basis
4. Give people something they can *use*, now, to understand better or school their co-workers (or both).
I'd give this one 7 thumbs up if they'd let me.
Um, to clarify -- I was also going to ask where all the hip labrum diagnoses were coming from. Not the other part.
Don't forget Mike Lowell.
I was going to ask the same thing. Normally, I'd suspect a new and more specific diagnosis for an old problem (e.g. "UCL tear" replacing "dead arm"). But I don't remember any important hip injuries at all in the past, other than Bo Jackson's rather unusual one.
I think maybe you missed both of my points here.
1. The other pitchers on a team do NOT face the same hitters, especially under the anti-balanced schedule with interleague play. The differences are significant, and BP reports them in a canned report on the Statistics page. This deserved at least an aside.
2. The problem is not whether it is true that HR/9 is pitcher independent and BABIP is not. The problem is that the argument/explanation you gave for BABIP would apply equally to HR/9, and so can't be right -- or at least not sufficient. That's a flaw of logic and exposition, not of fact.
Sorry, no. Didn't like it at all. It seems unlikely that anyone who didn't already know OBP is important would be convinced by appeal to WHIP. Worse, the oversimplification of the probability model was sufficient to make it useless for intuition, and the implication that Monte Carlo simulation is the only alternative is both misleading and annoying to people who know some math.
I approve of the topic; I think there *are* things to say about OBP that most people don't get right away. Things like "every out you make robs your teammates of about 1.5 PA's they could be using to score runs". Things about the tradeoff between long-sequence offense and short-sequence offense. Things about how OBP is (perversely) more important in a #5 hitter than in a #3 hitter. But I didn't get any of that.
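One way to see where a figure like "about 1.5 PA's" per out can come from is a rough sketch under a deliberately simple model: every PA is an independent trial that ends in an out with probability (1 - OBP), so the expected number of PA per out is 1 / (1 - OBP). The model ignores double plays, outs on the bases, and lineup effects, and the OBP values below are illustrative assumptions rather than anything from the article.

```python
def pa_per_out(obp):
    """Expected plate appearances per out when each PA independently
    ends in an out with probability (1 - obp)."""
    if not 0 <= obp < 1:
        raise ValueError("obp must be in [0, 1)")
    return 1.0 / (1.0 - obp)

# Illustrative OBP values (assumed, not measured league figures).
for obp in (0.300, 0.333, 0.400):
    print(f"OBP {obp:.3f}: {pa_per_out(obp):.2f} PA per out")
```

At an OBP around one third, each out is worth about 1.5 PA, so a team with 27 outs to spend gets roughly 40 PA in a game. Raise the team OBP and every out buys more PA for everyone else in the lineup.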
Started out great, then kind of petered out for me. I'll agree with Christina that the external references and historical story (including smarts pre-Moneyball) were well done. The ending was muddled and abrupt, though, and the importance (and complexity) of park adjustment glossed over too quickly.
I would have liked to see the correlation between DEff and wins compared with other, more familiar stats, for shock value.
I'm more positive about this one than most of the comments here. Yes, the prose could have been better (and snappier). Yes, the organization was less than perfect. But the article did its job well, introducing a concept and making a point.
I, for one, strongly encourage the links to external analysis, especially defining analysis that created a field of study. Nobody's forcing the reader to follow them, but they're there if they want to. And the recurring criticism of BP as being too insular is not without merit; this kind of article would help fix that.
ObQuibble: If HR per fly ball is roughly the same for everyone, then shouldn't fly ball rate be the core defense-independent stat, not HR rate? I understand that HR rate is much more widely available, but it would be nice to note that using it is a convenience, not a necessity.
"When David Ortiz connected for his first home run of the season, in the middle of a six-run fifth inning that powered the Red Sox to Wednesday’s win over Toronto, he told reporters in Boston that for the time in nearly 40 games he looked "like a real hitter." But if Big Papi is to keep poppin’ homers for the Sox, Ortiz may needs more than confidence: the solution could be to hit more ground balls."
I count at least two typos in that opening paragraph. Not professional.
This is also not a "Basics" article.
The sentence introducing G/F is extremely awkward; I think you meant 'ratio' and not 'rate'.
I also wonder how much of this is just a surrogate for line drive rate? It would have been useful to show something that GO/AO is showing you that line drive rate does not.
Overall, I was ready to love the topic... and didn't. The writing style suffered, but not to great effect.
"For pitchers, the best way to have a low BABIP is apparently just to face Chipper Jones less."
...and yet you didn't really address quality of opposition faced, or how much it might affect year-to-year correlation of BABIP.
Pretty good overall. I'll agree with the people who have called the first sentence "a disaster", and also those who found it disingenuous to present your discussion of randomized strategies as an *explanation* of the low year-to-year correlation in BABIP. Hint: why doesn't the same explanation lead to low HR/9 correlation?
I agree that leaving out any discussion of the league averages in the two years being compared was a mistake -- a new reader won't 'get' why Santo was better than Rodriguez without seeing that much context. That's my only complaint.
I disagree that "equal difficulty" is an issue. Evan said it well up above -- baseball is a zero sum game, and the only meaningful comparison is against your competition in the season at hand. The question of how Jack Cust would fare as a baseball player if Jack Finney taught him how to go back to 1882 is interesting after 2 beers, but not relevant to this article.
Very engaging, fun without working too obviously hard at it. Good job.
"Home run" is two words; getting that wrong in your first sentence is not a great start. Getting it wrong again later in the article is worse; you can't possibly sound like an expert at that point. One hopes that these would get caught by the editor, but BP is not the gold standard in that regard.
Overall... odd mix of detail and generality. Nothing grabbed me in the prose, much as I love making fun of Buster Olney. C+
The key point has already been made above: do specific individual pitchers do better when they throw more first-pitch strikes, or is it something that depends on what kind of stuff they have? Even if you can't answer that question in this article, you could at least pose it, and talk briefly about how it _could_ be answered.
Hmm. Not a "BP Basics" article, as I see it, since it didn't really go beyond defining the stat. I agree with others that the flip from team defense to individual defense was abrupt and potentially confusing, and I didn't really learn anything new except that the Reds have a guy who knows what DER is and values it.
I also thought some of the critical comments either needed a brief justification or a red pencil. If you're going to say the A's went too far, especially in a Basics article, then you need to back it up.
...But maybe I'm just grumpy about the gratuitous use of impact-as-a-verb in paragraphs 6 and 7.
Before I quibble, I'll first say that I thought this was well done, and at the right level of sophistication. The title was borderline; the gnome joke was fine.
Pure nerd quibble: if you're going to talk about using Run Expectancy or Run Frequency to evaluate whether or not to bunt, you need to mention the possibility that the bunt attempt does not result in a successful sacrifice. Strikeout, pop up, lead runner forced on FC, both runners safe on error... all of these things happen more often than most people think.
DeLongo, speaking as someone who doesn't believe steroids affect ability much, I still agree with you completely about the cheating part. There's no sport without rules, and they're ALL arbitrary so rationalizations about which ones matter are inane. Corked bats, scuffed balls, greenies... I don't care. You cheat, you're not eligible for the All Star game for N years, value of N to be determined.
That said, it's just not true that you can't draw any inferences about whether steroids are affecting people's numbers without a controlled experiment. That's like saying we can't learn anything about weather, because we can't do controlled weather experiments.
"Molina [...]experienced walk droughts several times last year:
Start        End           PA
September 3  September 23  70
April 28     May 18        59
August 1     August 21     59
June 25      July 11       52
May 29       June 13       42
Interestingly enough, none of his longest droughts overlapped [...]"
Um, yes they did. Unless he had 11 PA on both Sept 3 and Sept 23, there's a drought of more than 59 (but less than 70) that starts on Sept 4 and ends on Sept 23, and another that starts on Sept 3 and ends on Sept 22. You just didn't include those on your list.
Obviously, if you don't include any droughts that are subsets of a longer drought, then (as Mountainhawk notes) it's impossible to have overlapping droughts.
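A minimal sketch of that last point, assuming game-level data (a simplification, since real droughts start and end mid-game) and hypothetical names throughout: if you only record maximal walkless runs, the runs cannot overlap, because each one is closed off by a game with a walk.

```python
from typing import List, Tuple


def maximal_droughts(
    walks_per_game: List[int], pa_per_game: List[int]
) -> List[Tuple[int, int, int]]:
    """Return (first_game, last_game, total_PA) for each maximal run of
    consecutive games with zero walks. Indices are 0-based game numbers."""
    droughts = []
    start = None
    total = 0
    for i, (bb, pa) in enumerate(zip(walks_per_game, pa_per_game)):
        if bb == 0:
            if start is None:
                start, total = i, 0  # a new drought begins here
            total += pa
        elif start is not None:
            droughts.append((start, i - 1, total))  # a walk ends the drought
            start = None
    if start is not None:
        droughts.append((start, len(walks_per_game) - 1, total))
    return droughts


if __name__ == "__main__":
    # Made-up game log, for illustration only.
    print(maximal_droughts([0, 0, 1, 0, 0, 0, 1], [4, 4, 4, 4, 3, 5, 4]))
```

Any shorter stretch inside one of those runs is also a drought, which is exactly the kind the quoted list leaves out.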
The problem with this is that there isn't any "current year" to base things on; there's a "current month".
...unless you're talking about basing your vote on what players have done since *last* May, which is a defensible concept but really awkward to evaluate.
I used to worry about who was most deserving, and what the most defensible criteria for selection were. I gave up some years ago, and now I vote for the players I most want to watch play. In a given year, that's a mix of all-time greats (A-Rod, Pujols), intriguing young players (Kinsler, Adam Jones), entertaining players to watch (Jose Reyes, Ozzie Smith back in the day), and sheer competence (Chase Utley, Kevin Youkilis). In the end, it's an exhibition game, and I know what kind of exhibition I'm looking for.
"Of course, that makes my job really hard -- those guys get to injury news quickly."
Others have said it in other contexts, Will, but I'll say it again: your job has nothing to do with how fast other reporters are. For them, working in the paleo-media, it really is a race: they get scored on whether or not they are *first* to break a story. We don't care whether you're first, or second, or 23rd -- so long as what you tell us is open, reliable, and informative, and not TOO long in coming.
So far, you're doing a fantastic job; please don't risk screwing that up by racing against people who have little incentive to be accurate or nuanced, and every incentive to be first.
"the two Alexes (Alexii?)"
Unfortunately, if you went for the Latinate plural you would get 'Alices'.
That's Jim Palmer, all right. Fascinating stuff, but man is he an incoherent speaker. How many dozen times in that interview does he start a fairly straightforward sentence, and then... you have to understand, in the big leagues... to bring in Tippy Martinez, I mean, I was proud of my ERA.
You beat me to it, jdavlin.
For the record: Jones injured the hamstring early in the game, and Dave Trembley (during his stupid from-the-dugout interview for the local TV guys) said mid-game that Jones had said he was fine, but that his job as manager was to not take that at face value.
...at which point he let Jones finish out a plate appearance in which Jones winced visibly and had to take 30 seconds to recover after a swing-and-miss. Jones, swinging very late, eventually hit a fluke ball into the right field corner, but could only make it as far as first base. THEN Trembley took him out, after any additional damage to the hammy had already been done and re-done.
Corncobs? Who knew. If I recall correctly, Hank Aaron tells a very similar story about learning to hit bottle caps with a broom handle.
"Anytime you are attempting to evaluate the significance of a correlation, a reasonable vehicle or pathway becomes a relevant part of the discussion."
Perhaps. In my day job, it's all too common for the accepted/obvious explanation for the correlation to turn out to be totally bogus.
But what correlation were you referring to? The important point here is that there is, as yet, no demonstrated correlation requiring explanation. Nobody has found a correlation between "PED" use and performance enhancement in major league baseball players. That's partly because we have only very spotty data on who has used what, and partly because a lot of the players we DO have data on suck, and continued to suck after using PEDs.
Is it really that hard to understand the difference between "PEDs are not performance enhancing" and "We don't know which drugs, if any, enhance which kinds of MLB performance, or by how much"?
As for PECOTA litmus tests... try it this way. Many of the top offensive (and pitching) performers since the '80s are known to have prayed to God, or to Allah, or to Jobu. Other, less successful players are also implicated in prayer, especially fringy latino players trying to break into the major leagues. How much of the success of the successful ones should we attribute to prayer? Can we ignore the high incidence of prayer use among the most valuable players of the last few decades? Have we passed the point where we can expect PECOTA to predict the career of an atheist?
That's a pretty sobering list of potential side-effects -- but no worse than the list that goes with (say) most antidepressant drugs. Is the use of antidepressants a performance enhancement that should be banned? If not, why not?
The fuzzy line most people seem to be trying to draw is the line between "fixing a problem" and "enhancing beyond what's normal". But which conditions we call "problems" and which we call "normal" are social conventions, not objective facts.
Just to be clear -- mine was not a "how can we hold back the ungodly tide of medical advances" post. I'm trying to point out that the line between "fix what's broken", "new training technique", and "Illegal Performance Enhancement STOP STOP STOP" is already much less clear than the media and the fans tend to want to make it. Especially when we really have no idea how much (if at all) most of the illegal techniques really help elite baseball players perform better on the field.
I'm reminded of the line the Olympics tried to draw between amateur and professional. Today, we all pretty much dismiss that attempt as rampant classist elitism, intended to keep the riff-raff out of the Gentlemen's Club of international amateur athletics. And so it was, in large part -- but not entirely. There really was a component of trying to preserve a purity that was easy to want but hard to define. I wonder if future generations will look back on the struggles against PEDs and medical advances the same way.
Before someone brings it up, I do have some sympathy with making known-to-be-destructive PEDs and techniques illegal, especially when kids are choosing (myopically) between success now and a pleasant life sometime after. But let's be honest -- HGH abuse probably isn't any more long-term detrimental to health than, say, pitching for the University of Texas for 4 years, or playing at all in the NFL.
It's pretty easy to see how the lines between "performance enhancing" and "corrective" and "recuperation enhancement" are going to get impossibly fuzzy.
Remember when some people thought that Tommy John surgery made you better than you had been before? Any day now, a procedure will come along where it's true -- and then what do you do?
"[T]he Jays play just nine of their first 78 games against the big three, then get them 42 times in their next 71 contests."
Prediction: the resulting drop in the standings will be called a "fold".
But what everyone really wants to hear is a compare-and-contrast of Josh Hamilton's rib injury with Melissa ("Dancing with the Stars") Rycroft's...
Let me rephrase your question, then answer it.
Q: Have we learned anything yet that would justify modifying our pre-season predictions about how good various players and teams will be this season?
Some of the apparent surprises will pan out; some will go away. We don't have any way (yet) of telling which are which, and more data is the only cure for that.
Which isn't to say that I can't *hope*, as a fan, that Josh Johnson and Nelson Cruz are for real, while Erick Aybar's cold start is just random noise...
Just to expand on that -- the idea of "two standard deviations" away from a priori expectations only applies in the usual way if you pick a player at random. If you chose those players to look at precisely because of their season-opening ofers, you need to do a "Bonferroni adjustment" to account for the size of the population you picked them from.
If everyone were a .300 hitter and always faced average pitching, you'd still expect 3 hitters each year to start 0-for-13. Throw in the fact that most players aren't .300 hitters and that you can open the season against 3 hot starting pitchers, and even worse starts are wholly unsurprising.
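The arithmetic behind that can be sketched as follows. The pool of 300 hitters is my assumption (it is roughly what makes the count come out near 3), and the function names are illustrative.

```python
def expected_ofer_starts(avg: float, at_bats: int, n_hitters: int) -> float:
    """Expected number of hitters who open the season hitless in
    `at_bats` at-bats, treating each at-bat as an independent trial."""
    p_hitless = (1.0 - avg) ** at_bats
    return n_hitters * p_hitless


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-player significance level after a Bonferroni adjustment."""
    return alpha / n_comparisons


if __name__ == "__main__":
    n = 300  # assumed number of hitters with a season-opening sample
    print(f"expected 0-for-13 starts: {expected_ofer_starts(0.300, 13, n):.2f}")
    print(f"per-player alpha for overall 0.05: {bonferroni_alpha(0.05, n):.5f}")
```

The point of the adjustment is that a start which looks like a 1-in-100 event for one pre-chosen player is close to a sure thing somewhere in a pool that size.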
More Info on Bill Veeck
By Nick Acocella
Special to ESPN.com
While Commissioner Happy Chandler admitted he was amused by the ploy, American League President Will Harridge, more dour, struck Gaedel's name from the record book and banned further appearances by midgets. Veeck, never one to let a good gimmick die - and intending to use Gaedel again, preferably with the bases loaded - demanded Harridge declare a minimum height limit, wondering whether, if it were placed high enough, the league could get rid of Yankees shortstop Phil Rizzuto.
I'll add my geezer vote to yours. I couldn't care less who broke a story first, or that someone else was 47 whole minutes behind them. I choose my news sources by reliability and depth of analysis, not by promptness. For the stories where minutes really count, I'm dead already.
Would grip fatigue be an issue here? Grip fatigue would affect fastballs as well as breaking balls, without any wrist action required.
Gotta disagree, Jay. The apparent 'higher' value of late-season games is an illusion caused by lack of information. The teams who lost those April and May games aren't contenders, so you ignore their August and September games when noting the higher leverage of late-season games. But they aren't contenders precisely because they lost those April and May games -- just ask Detroit.
I think he meant to contrast with signing free agents who are *already* well into their 30s at signing time.
Many dictionaries these days will also tell you that 'infer' can mean 'imply'. They're wrong; what they meant to say was that lots of people get that wrong regularly, which is not quite the same thing.
[Padding my resume for pedantprospectus.com...]
PedantProspectus.com? I'm there.
Has "stress test" become a synonym for "sensitivity analysis" when I wasn't looking? Or were you saying that this is a stress test on PECOTA, to see if it breaks?
70 IP for Mock -- the convention for multi-role players is that each line shows the playing time in that role, but every line's stats are totals for the season.
Is the missing OF time caused by a lack of Zobrist? He's shown at SS only (unlikely), and only 10% of the time (also unlikely). Giving him 10% of the LF time and 15% of the RF time might be closer to reality.
D'oh! I looked 3 times, and still missed it. Thanks.
Can we get the list of the lowest similarity scores in the current PECOTA? I checked the weighted means spreadsheet, but it doesn't include the similarity score -- just the top comparables.
Moyer is a 5, Ichiro a 17... Is anyone else close to that?
Re: Chipper Jones
Do the current DT Cards and/or PECOTA cards reflect the new defensive metric? Jones is listed at -12 FR for 2006 on his PECOTA card, -9 on his DT card, and still a whopping minus 180+ for his career. Is that the kinder, gentler score, or the same old?
So, where are all the 27-year-olds in the AL?
BTW, was that 0.03 correlation between Flake and season total SNVA, or between Flake and SNVA per start? If it was the latter, that says that standard deviation is a better measure of flakiness than coefficient of variation. If it was the former, part of the reason for the small correlation might be that SNVA is a cumulative season stat, and so depends a lot on health as well as performance. There's also a selection bias -- flakier pitchers might be less likely to get starts, due to managerial aversion to inconsistency. Since opportunities ARE correlated with performance (for sure) and flakiness (possibly), you probably want to normalize for that when testing for an effect.
One of my favorite stats here is Flake
Thanks, Eric -- I'm glad we talked Michael Wolverton into including it in his "Support Neutral W/L" reports, back in the prehistory of BP.
One of Michael's very early discoveries was that, for almost all pitchers, flaky is more valuable than consistent. The exceptions are the future Hall of Famers (Maddux, Seaver, etc.), where a small variation around their outstanding average performance means that they always have the opponent at a severe disadvantage.
For everyone else -- where 1 standard deviation better than their personal average would rank high in the league, but 1 standard deviation below wouldn't drop their rank much -- flakiness means a better chance of pitching a winnable game, at the expense of a few more bad games that you would have lost anyway.
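A toy illustration of that asymmetry, and not a reconstruction of the Support Neutral method: assume runs allowed per start are roughly normal and call a start "winnable" if runs allowed stay at or under some cutoff. The cutoff, means, and standard deviations below are all made up.

```python
import math


def prob_winnable(mean_ra: float, sd_ra: float, threshold: float) -> float:
    """P(runs allowed <= threshold) under a normal model of runs allowed
    per start. Both the normal model and the cutoff are assumptions."""
    z = (threshold - mean_ra) / sd_ra
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    threshold = 3.0  # hypothetical "winnable game" cutoff, in runs allowed
    for label, mean_ra in (("ordinary starter", 4.5), ("ace", 2.0)):
        steady = prob_winnable(mean_ra, 1.0, threshold)
        flaky = prob_winnable(mean_ra, 2.5, threshold)
        print(f"{label}: steady {steady:.3f}, flaky {flaky:.3f}")
```

When the pitcher's average sits on the wrong side of the cutoff, more spread raises the chance of a winnable start; when it sits on the right side, more spread lowers it.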
Franklin Gutierrez misses the age cutoff by less than 6 weeks; where would he have fit on the combined list?
So, who are the greatest players ever who were selected with compensation picks for FA signings? Enquiring minds want to know...
While I agree with the overall content of your comments (including the Hornsby comp), I have to quibble with:
There are other among A-Rod's comps who were declining slow-moving sluggers. Albert Belle fits into this group
I think your opinion of Albert is coloring your memory. He was 17-3 in stolen base attempts in 1999, his last full season. That doesn't look like a "declining, slow-moving slugger" to me.
Thanks for the reply, Kevin.
Drawing walks is a tool, not a skill
I suspect you meant that the other way around. I'm willing to be convinced I'm wrong about this, but it seems to me that developing unexpected power is at least as common as developing unexpected plate discipline as a pro. Sammy Sosa is pretty much the only example that springs to mind of the latter.
If you presented me with a .400 obp guy who can't run and a .365 obp guy who can fly, the second player is a better leadoff man.
That's precisely where we disagree. The difference between those 2 guys, over the course of a season, is that your #2-#5 hitters bat an extra 35-40 times behind the .400 guy. You can count on one hand the number of guys in the history of MLB who were good enough on the bases to make up that kind of deficit year after year in a modern run-scoring environment.
(Besides, if you lead the speed guy off, you waste part of the value of his speed, which is avoiding more outs by staying out of GIDP.)
I'm writing English, and trying to say what I mean without lying. I tend to do that whether it's a dissertation or a text message. YMMV.
Saying that he'd be a good leadoff hitter if he improves his plate discipline is exactly like saying he'd be a good cleanup hitter if he started hitting home runs. It's not quite as bad as saying I'd be a good goalie if I were 11 inches taller, but that's the flavor of it. It's misleading because it looks, at first glance, a lot like "he has a starting pitcher's tool set, but needs to perfect some kind of changeup", which is a much more plausible development.
[Bourjos's] tools are that of a leadoff man, but he's an impatient hitter
Is this really a BP article? Avoiding outs is the only leadoff hitter tool that matters, to a first approximation. We've known this for decades now. Basestealing is a leadoff tool the way a powerful arm is a first baseman's tool -- great as a bonus, but irrelevant if he can't do the important things. Eddie Yost (who went 1-for-12 in SB attempts one year) was a much better leadoff hitter than Vince Coleman.
Require that proceeds from any sale of the team go first to pay back the public investment!?!? Now, why didn't I think of that...?
And most gained a benefit.
Prove it. No, that's too hard -- provide some reasonably compelling evidence for it. And no, "he hit a lot of HR" is not evidence, unless you're claiming that Babe Ruth and Roger Maris were juiced.
Where are all of those David Segui HR? We *know* he was a user...
I disagree with your underlying sentiment that PED's don't necessarily improve your performance.
It's called a "null hypothesis" -- you assume no effect, until you have actual compelling evidence of an effect. There isn't any such evidence. It's the "clutch hitting" debate all over again. Everyone "just knows" that using anabolic steroids to bulk up will cause you to hit lots more HR.
Even without data on who used, we can make predictions about what patterns we would expect to see in the data if a large subset of players were using a cheat that boosted HR ability significantly. So far, the people who have actually looked at the data using methods that wouldn't get you laughed out of a peer-reviewed journal have failed to find any such patterns.
You're assuming your conclusion -- namely, that there is a "statistical anomaly" to be explained, and that steroids explain it adequately. That's not science.
(My favorite version of this is the circular argument -- we know Sammy Sosa was on steroids because of how many HR he hit, and we know steroids increase HR because look what they did for Sammy...)
Both of you are wrong to use total HR, rather than rate, as a measure of HR ability.
If you plot these numbers:
Age HR freq
...you get a less-noisy-than-most curve that is reasonably typical, and shows no identifiable "steroid effect". 2003 is the year labelled "27".
...and this STILL isn't right, because you would need to normalize it for league HR rate, and then (if you think steroids are ubiquitous) you'd need to find a control group that you could be pretty sure was 'clean', and then...
I'll listen to the media on the subject of PEDs when they have performed (or at least understood) such a study.
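The first two steps (rate instead of total, then league normalization) could be sketched like this. I'm assuming "HR freq" means HR per at-bat; it could as easily be per PA. The names and numbers are hypothetical, and the control-group step is not attempted.

```python
def hr_rate(hr: int, ab: int) -> float:
    """Home runs per at-bat (assumed definition of 'HR freq')."""
    if ab <= 0:
        raise ValueError("ab must be positive")
    return hr / ab


def normalized_hr_rate(hr: int, ab: int, league_hr: int, league_ab: int) -> float:
    """Player HR rate relative to the league HR rate that season
    (1.0 means exactly league average)."""
    return hr_rate(hr, ab) / hr_rate(league_hr, league_ab)


if __name__ == "__main__":
    # Same HR total, different playing time: the rates are not the same.
    print(hr_rate(30, 400), hr_rate(30, 600))
    # Made-up league totals, for illustration only.
    print(normalized_hr_rate(30, 600, 5000, 165000))
```

Plotting the normalized rate against age is then the curve to look at, rather than raw season totals.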
My personal fantasy is that Cal Ripken gets outed at some point. After all, there's a lot more evidence that anabolic steroids could help you sustain a consecutive games played streak than that they could help you hit more home runs...
Eric Walker's site
The cheaters had a better chance to win.
This is the part there's still no actual evidence for, with the possible exception of being able to keep good players in the lineup more. There's still no credible data showing that anabolic steroids or HGH help you hit or field (or pitch) better. "B-b-b-but look at his head!" is not analysis.
If you want to crusade against cheaters who really did get an edge from their cheating, go after Gaylord Perry. If you want to crusade against players who threatened the integrity of the game, go after Gary Sheffield.
Keith, that's about the best layman's summary of a set of complicated procedural and legal issues I've ever seen. Simply outstanding.
[I was a vicarious law student (lived with 3 law students and tutored them on statistical standards of evidence) for 2 years, so I've seen a lot of really impenetrable attempts...]
Every time a batter is out on a ball in play (or hits into a FC), it counts the same in DE and in FP -- plus one to both numerator and denominator. Every time a fielder boots a batted ball, it counts the same in both -- plus one to denominator, no change in numerator. The only outcomes they differ on are
(1) when a batter gets a non-HR hit, which counts in the DE denominator but not at all in FP.
(2) when a fielder gets an assisted putout or an assist/putout on the bases, which counts in the FP numerator and denominator, but nowhere in DE
(3) errors that lead to extra bases, but don't turn an out into a safety
But both (1) and (2), as rates, are pretty consistent from team to team. Non-HR hits per ball in play is what BABIP measures, and it doesn't vary much at the team level. Assisted putouts per BIP is closely related to the pitching staff's strikeout rate, which I suspect is generally similar among the top college teams. (3) is relatively rare, among errors.
So it shouldn't be a surprise that DE and FP are strongly correlated; DE is essentially FP corrected for irrelevant or 'easy' chances and for BABIP above the norm.
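The bookkeeping above can be sketched as a tally. This follows the one-count-per-play simplification used in the comment rather than official scoring (where a groundout is an assist plus a putout), and it reads case (3) as "FP denominator only", which is my interpretation. Event names and sample counts are hypothetical.

```python
from typing import Mapping, Tuple

# Per event: (DE numerator, DE denominator, FP numerator, FP denominator)
WEIGHTS = {
    "out_on_ball_in_play": (1, 1, 1, 1),
    "error_on_batted_ball": (0, 1, 0, 1),
    "non_hr_hit": (0, 1, 0, 0),                      # case (1)
    "assisted_or_baserunning_putout": (0, 0, 1, 1),  # case (2)
    "extra_base_error": (0, 0, 0, 1),                # case (3), my reading
}


def de_and_fp(counts: Mapping[str, int]) -> Tuple[float, float]:
    """Return (defensive efficiency, fielding percentage) from event counts."""
    totals = [0, 0, 0, 0]
    for event, n in counts.items():
        for i, weight in enumerate(WEIGHTS[event]):
            totals[i] += weight * n
    return totals[0] / totals[1], totals[2] / totals[3]


if __name__ == "__main__":
    # Made-up team totals, for illustration only.
    sample = {
        "out_on_ball_in_play": 70,
        "error_on_batted_ball": 2,
        "non_hr_hit": 28,
        "assisted_or_baserunning_putout": 30,
        "extra_base_error": 1,
    }
    de, fp = de_and_fp(sample)
    print(f"DE = {de:.3f}, FP = {fp:.3f}")
```

The first two event types move both stats identically, which is where the strong correlation comes from; only the last three pull them apart.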
I love the scouts-vs-PECOTA throwdown that Fausto Carmona represents.
PECOTA is very bearish: it projects him as about the 100th-best starting pitcher in MLB, with an EqERA of 4.76, a 25% "Collapse" chance, and an 18% attrition chance. Favorable comps Kevin Brown and Scott Erickson are offset by Zach Day and Andy Hassler. Upside is a very ordinary 55. In fact, PECOTA would rather have Todd Wellemeyer, Dana Eveland, Jesse Litsch, Oliver Perez, Andy Sonnanstine, John Maine, Paul Maholm, or Kyle Lohse going forward.
It would be great to see Carmona prove PECOTA wrong... but I don't think I'm ready to bet that way.
So, does T-Bone Shelby's kid have a derivative nickname? "Brisket"? "Porterhouse"? "Flat-iron"?
Jack Cust can play first base? My DMB team wishes that were true, but he's never done it in 340+ major league games.
One piece of advice: what's valuable in a fantasy league isn't what's valuable in real life. Look at what earns points in your league, and use the specific predictions of those stats to guide your draft. VORP is useless in most fantasy leagues, because it doesn't match up well with where the points are.
So, Gil Meche has more upside than either Matt Garza or Jon Lester?
Some of the people posting to this thread need to remember something about these projections:
1. They're based on historical patterns among similar players. Historically, players with a career start (and body type) like Pedroia's were more often than not playing over their heads. Look at the cloud of comparables -- "A lucky two first full seasons in the majors" is a lot less rare than "star 2B on the rise". This projection gives him 1 chance in 6 of further breakout, and a 50/50 chance of being the player we've seen (overall) so far in his career. Does that sound so wrong?
2. The variance in these predictions is huge, partly because the variance in real player performance is huge. What fraction of players end up performing between their 40% and 60% levels? Not as many as you'd hope. Over a whole team, this mostly averages out over a season.
People throw the word "breakout" around a lot, and tend to mean that a "breakout" season reflects a new level of ability. But the vast majority of big improvement years are had by players who combined being actually improved with being lucky. Part of PECOTA's job is to separate the part that was new ability from the part that was luck; it does that by looking back in history to see how much of the new level "sticks", on average, for players of a similar type and established performance.
The problem with this argument is that, year after year, they *don't* find them. DH has been an underperforming roster slot for most AL teams since it was invented. There's just no excuse for that.
I mean, Jose Vidro got more than 280 PA at DH last year. On the Blue Jays, their banjo-hitting infield of Eckstein, Scutaro, Bautista, and Inglett combined for about 60 PA at DH. If you did that in a Strat league, they'd accuse you of tanking in order to get a good draft pick next year.
Seconded. Or one of those bizarre eyelash appliances that my wife doesn't own.
The two topics that consumed the most electrons on rec.sport.baseball in the early 1990's were Ken Griffey Jr.'s defense and Roberto Alomar's defense. Those were the two players with the most striking contrast between reputation and what the numbers seemed to be saying. In those days, there were no UZRs or probabilistic range distributions, but there was "defensive average" (DA) and its relative, the original Zone Rating.
After a lot of digging and observation, what everyone decided (if I recall correctly) was that Alomar was very weak at making plays to his right. He played shaded toward first, and made any number of spectacular plays behind him and to his left, but anything up the middle was a hit. The data saw that; the live observers simply noted "ball out of reach, not his fault".
I'd be very curious to see what the retroactive play-by-play defensive evaluation of Alomar looks like.
I can just see a new line of colognes from Tommy Hilfiger:
As you note, the 2B standard is skewed by the existence of 4 Immortals who were so far ahead of the rest of the pack. We don't want to penalize the Charlie Gehringers of the world for only being Charlie Gehringer and not Rogers Hornsby, but we also don't want to just ignore what's possible at the high end when trying to identify greatness.
Have you played around with eliminating outliers at the high end, as well as the low end, when computing your adjusted average for the position? That would also help in LF, where Williams and Henderson and Bonds would stop distorting the standard for "worthy HOF inductee".
More concretely: suppose we say that the center 50% of HOF JAWS scores defines what it means to be a worthy HOF inductee at each position. That corrects not only for horrible bad past choices, but also for Ruthian distortions. How do the positional standards change, and who becomes more (or less) worthy by the new metric?
"One other tenet of sabermetrics holds that when a team does well in close games one year, it will tend to go in the opposite direction the next."
Not quite; that would be the gambler's fallacy.
What is true is that when a team does extremely well (or poorly) in close games in one year, it will tend to regress to the mean the next year. As with overperforming Pythag, there is a combination of randomness and actual ability here. Good managers can consistently overperform in both, by aligning the ability of the pitcher with the leverage of the situation.
Scioscia has been better than most managers at using his best pitchers in the highest-leverage situations. There's no reason to think he will suddenly forget how to do that, so you would still expect the Angels to outperform Pythag by a bit, and be over .500 in close games. But not by 12 wins.
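The distinction can be sketched with a simple shrinkage estimate. The retention factor (how much of the gap from average carries over) is a made-up number here, not a measured one, and the function name is hypothetical.

```python
def regress_to_mean(observed: float, league_mean: float, retention: float) -> float:
    """Next-year expectation that keeps only `retention` (0 to 1) of the
    observed gap from the league mean."""
    if not 0.0 <= retention <= 1.0:
        raise ValueError("retention must be between 0 and 1")
    return league_mean + retention * (observed - league_mean)


if __name__ == "__main__":
    # A team that went .700 in close games, with an assumed 20% retention.
    print(f"{regress_to_mean(0.700, 0.500, 0.2):.3f}")
```

The estimate lands between .500 and the observed mark, never on the far side of .500; "going in the opposite direction" would need a negative retention factor, which is the gambler's fallacy. A team with a real bullpen-usage edge would simply have a higher retention.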
Jay, can we get a list of the LFs who rank ahead of Rice in Peak, Total, or both? I'd also love to see a list of the non-HOFers who rate higher than Rice in all of EqA, BRAA, BRAR, baserunning runs, FRAA, FRAR, and Rate2.
A) Yaz & O'Leary. I saw R.J. Reynolds make some spectacular catches, too. That doesn't make him a great fielder. Ken Griffey Jr. proved years ago that being spectacular doesn't make you good. We know O'Leary wasn't a great fielder; we know his numbers look a lot like Yaz's numbers, and a lot better than Rice's, playing the same position in the same park.
B) No, you clearly don't accept the negative aspects of Rice's record, because if you did you wouldn't think he was an obviously better player than the long list of guys (Smith, Singleton, Wynn, etc.) who didn't hit for as much power, but did those other things better. To assert that Rice is a "borderline HOFer", but that these other players are not, makes that point.
Fundamentally, you're disagreeing about how important things like OBP and defense and not hitting into double plays and hitting on the road were in his era. Those are questions of fact, amenable to analysis -- which WARP and Win Shares and such provide. You have rejected the conclusions of those analyses. If you have grounds for rejecting them, other than that they don't agree with your prior opinions, I haven't seen it. I don't see how it's disrespectful or snarky to point that out.
If Jay were the one who builds the PECOTAs or the depth charts, and you were the only reader of the site, these might have been valid criticisms. Next time, try just not reading the stuff you don't want to read...
Personally, I think it's fascinating that (say) Ken Singleton was a significantly more valuable player than HOFers like Rice, Billy Williams, and Lou Brock. Unlearning the false is even better than learning the truth.
If the Green Monster is the problem, why don't Yaz and Troy O'Leary also look bad in FRAA? Neither of them was much like Roberto Clemente out there.
It's not BP's fault that Rice had a low OBP, hit into a ton of double plays, disappeared on the road, struggled defensively, and was finished at an early age. In reality (as opposed to mythology), those things count too. Of course, if you choose not to see that, you won't be alone -- everyone who voted for Ryan Howard as 2008 MVP is with you.
There are still some of us around who think of Olbermann as a SportsCenter anchor who moved on and is now doing something else, somewhere. From that point of view, it could have been a LOT worse -- it could have been Berman.
I'm astonished that nobody has made the obvious jokes here about what they're getting for their money, given what "Kaka" (or caca) means in most of Europe and the romance-language-speaking world...
Wrestlers take diuretics to be able to weigh in at a lower weight than their healthy, normally-hydrated weight. It's very much not good for them.
I've been making my own Italian Beef sandwiches from scratch for 20 years now. I tend to prefer pepperoncini to giardiniera, too. But yeah, what a great sandwich that nobody in the rest of the country has ever heard of.
Dave "from *Real* Illinois" Tate
Which makes him different from Gaylord Perry... how?
\"He threatened the game\'s integrity. We know this.\"
Amazing. And here I thought there were no actual positive tests, no credible witnesses, no tangible evidence, and (most importantly) no reason to think that doing the things he\'s accused of would \"threaten the game\'s integrity\".
Gary Sheffield threatened the game\'s integrity; he\'s the one who deliberately played badly in order to get traded. Hal Chase, Joe Jackson, Pete Rose... they threatened the game\'s integrity. Gaylord Perry, too, while we\'re at it -- where\'s the angst over *his* election to the HoF?
The problem with voting for Rice is that, to be consistent, you now have to vote for another 50 players who aren\'t in, and will never get a vote from anyone else. Ken Singleton, Roy White, Reggie Smith, Chet Lemon, Jimmy Wynn, Cesar Cedeno, Jack Clark, and I\'m still on mostly-contemporary outfielders. That\'s how far below the established standard Rice falls -- accepting his level of play as \"deserving\" would double the number of outfielders enshrined. We\'re not talking borderline cases here. We\'re talking about a player whose weaknesses aren\'t understood and strengths are drastically overrated by the voters. Joe Carter on (if you\'ll pardon the expression) steroids.
\"Rice\'s \"non-roid\" numbers now look excellent.\"
It never ceases to amaze me that people say these things, and apparently believe them.
Let\'s grant for a moment that everyone since Don Mattingly is a proven cheater. Let\'s grant that steroids somehow help you hit the curve ball and lay off tough pitches. Let\'s ignore the ton of evidence Eric Walker has assembled about what really caused the offensive explosion.
EVEN AMONG HIS CONTEMPORARIES AND PEERS Jim Rice\'s record is nothing special. A vote for Jim Rice\'s numbers demands a vote for Reggie Smith and a vote for Jimmy Wynn and a vote for Chet Lemon and a vote for Roy White and so forth. If Jim Rice now looks \"excellent\", so do they.
As a *real* legend and feared hitter once said, \"You could look it up\".
...and postgame analysis by Dr. Dave, who was born the same day (and year) as Mark Portugal.
You said: UZR attempts to strip the luck out of the equation to measure "how many runs did fielder save over the average fielder?"
If that's true, then it isn't answering the question "how many runs..." at all -- it's trying to say "how good a fielder is X?". That's an interesting and important question -- but not at all the same thing.
It's like the difference between APR and WXRL for relief pitchers. One of them takes into account leverage; the other doesn't. APR tells how good your induced outcomes were; WXRL tells how many wins that was worth. Very different questions.
You could make a good argument that MVP (for example) awards should be based on context-dependent (including luck) outcomes, rather than ability or raw performance. Value-added metrics that include getting lucky and taking advantage of increased opportunities aren't predictive, but they *are* descriptive.
More analysis based on data mining the Pitch/FX data. That\'s the future arriving.
Cross-year queries in the sortable stats.
Not just VORP -- translated platoon splits. I know there\'s a problem with getting the data for old-time seasons, but Retrosheet is getting there...
To tie in with the Fantasy League request above: how about Strat- or DMB-style player cards, with normalized/translated rates of various outcomes?
Yes, please. I\'ve been asking for this forever.
\"It seems that Joe Carter was doing what was valued by contemporary sportswriters and GMs, even if those who read the Baseball Abstracts knew that Wade Boggs was the vastly better player.\"
Exactly; couldn\'t have said it better.
Of course, that still leaves the question of whether the HoF is supposed to be for the best players, or for the most popular/admired/lionized players. The answer to that isn\'t obvious, and a case can be made both ways. It would be nice, though, if people would be consistent about which they favor.
Oops, small typo -- Rice\'s best season was 9.7, not 9.2.
Turns out Rice's offensive career numbers are almost identical to another guy with a food name. Check out the translated career totals for Chili Davis, and compare them against Rice... Pretty eerie. The Ellis Burks comparison is unfair -- 1100 fewer AB, for one thing -- but Chili is an excellent comp.
\"[...]Jim Rice was among the top five in his league in MVP balloting six times. Where that objectively ranks is overlooked, though.\"
It doesn't objectively rank anywhere. It's not an objective thing; it's an averaged subjective opinion of a group of writers who (as others have mentioned) have a terrible track record at recognizing value or the lack thereof.
Let me turn your argument around. Jim Rice has way more award shares than Carl Yastrzemski or Rickey Henderson or Wade Boggs. All of those players were obviously vastly more valuable than Rice, 2 at the same position and 2 in the same city and park. Boggs was Rice's teammate, and the voters couldn't even tell which of them was doing better. Indeed, Joe Carter got more career MVP shares than Wade Boggs. That's pathetic.
Conclusion: MVP shares aren't particularly well correlated with value.
There was nothing about Rice\'s era that made it hard for players to stand out offensively. Rod Carew put up WARP3\'s over 10 in 4 years out of 5 from \'73 to \'77, and 9.2 in the remaining year. George Brett did the same from \'76 to \'80. Reggie Jackson *averaged* 9.6 WARP3 from \'71 through \'76 -- that\'s higher than Rice\'s best season ever, for six years, and it doesn\'t even include Jackson\'s best year. Don Mattingly averaged 11 WARP over \'84-\'86. Boggs averaged 11 (!) WARP for a 7 year stretch, playing in the same park.
Even Ken Singleton, not much of anybody\'s idea of a hall of famer, averaged nearly 9 WARP over his first 7 years in the AL, \'75-\'81. Remember, 9.2 was Rice\'s best year. \"A peak almost as good as Ken Singleton\'s, then nothing\" is not a hall of fame career. Or, to put it more positively: if you think Jim Rice is a deserving Hall of Famer, then you also think that Ken Singleton is a deserving Hall of Famer. And Albert Belle, and Jimmy Wynn, and Reggie Smith, and Paul O\'Neill, and ...
There are three main problems with the current system, as I see it:
1. Many of the voters don\'t know enough about performance evaluation to judge how good players were.
2. Popularity thresholds for staying on the ballot give voters an incentive to vote for undeserving players, in order to keep them on the ballot.
3. The psychology of voting for players who have lingered on the ballot for a long time encourages sympathy votes.
The first problem will only be solved by the gradual attrition of the old voters. They aren\'t going to change their minds about the importance of RBI or the defensive prowess of Larry Bowa; the best we can hope for is that they go away. The correct policy change to encourage this is (no, not euthanasia) term limits for HoF voters. You get a decade as a voter; that\'s enough.
The second and third problems can be fixed by Joe\'s approach -- get rid of the threshold, and drop everyone from the ballot after X years to keep the ballot manageable. But you need to compensate for problem #1, at least for a while. For that, you need some equivalent of the Veterans Committee, as much as I loathe their past work. Maybe the new VC would not elect HoF members directly, but would instead collectively nominate ineligible players, who would go back onto the eligible list for 3 more years from that date. It would be amusing to see a couple of decades of the voters ignoring Ron Santo or Bobby Grich, the VC making him eligible again, lather, rinse, repeat. (It would be better to see them both elected, of course.)
McGwire is suspected of something that it is suspected (or, rather, assumed) would have affected his performance. There\'s no evidence for that, either -- but nobody in the press or public wants to believe that. I mean, it\'s just \'obvious\' that muscles = HR, right?
Still need that link to Eric Walker\'s steroids research site on the BP homepage.
I\'ll second the nod to Gary Huckabay -- without his dream and drive, BP wouldn\'t have happened, or wouldn\'t have happened effectively. Take a bow, Gary -- this BBWAA\'s for you.
And Doug, I hope you\'re watching. We haven\'t forgotten.
\"but getting the hit actually gets the run in\"
Yes, and making the out actually sends your pitcher back out to the mound minus that run. You seem to think that "getting the hit" is the alternative to "taking the walk"; it isn't. "Making the out" is the far more likely outcome, for every player, especially when the pitches he's getting make "taking the walk" an option.
To a first approximation, run scoring and run prevention are equally valuable. In the range of runs a MLB team is liable to score or allow, the difference between runs scored and runs allowed is a pretty good predictor of winning percentage. (Things like Pythagenport are better, but not by a huge amount.)
If you think you can quantify how many runs above average a guy is on defense, and how many on offense, adding the two is a good measure of overall contribution to winning. Historically, the trick has been getting at that estimate of defensive value.
So, if you have one shortstop (call him 'Derek') who is typically +28 on offense and -23 on defense, he's worth about half a win versus an average SS. If you have another guy (call him 'Davey') who is typically -5 on offense and +10 on defense, he's helping you win just as much as Derek is. He's also probably a lot cheaper.
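The Derek/Davey arithmetic is easy to sketch. The one thing assumed here that the comment doesn't spell out is the conversion rate: the usual rule of thumb of roughly 10 runs per win, which is what turns a net +5 runs into "about half a win". The function name is just for illustration.

```python
# Assumption: ~10 runs per win, a common rule of thumb (not an exact constant).
RUNS_PER_WIN = 10.0


def wins_above_average(offense_runs, defense_runs, runs_per_win=RUNS_PER_WIN):
    """Add offensive and defensive runs above average, then convert to wins."""
    return (offense_runs + defense_runs) / runs_per_win


derek = wins_above_average(28, -23)   # big bat, poor glove
davey = wins_above_average(-5, 10)    # light bat, good glove

if __name__ == "__main__":
    print("Derek:", derek, "wins above an average SS")
    print("Davey:", davey, "wins above an average SS")
```

Both come out to the same total, which is the whole point: once offense and defense are on the same runs scale, it doesn't matter which side of the ball the runs come from.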
OK, I found a decent comp. Compare Mike Cameron through age 25 with Gutierrez. I\'m not a huge Cameron fan, but he hasn\'t been a bust.
Agreed on Gutierrez\'s defense. He also has stretches of totally destroying left-handed pitching, which is more useful than being mediocre against everyone. And he\'s still young enough to bloom a bit. Of course, this is also wishcasting -- I own him in a DMB keeper league.
Is there a reason why his PECOTA card doesn\'t have a \"comparable players\" section? Baseball Reference gives his closest comps (at this age) as people like Matt Lawton and Jeffrey Hammonds, but those are (a) based only on offensive totals (using BA but not OBP), and (b) not adjusted for park, era, or anything else.
Fan-damn-tastic. Christina, for you especially, I\'m extremely pleased.
So, how many games per year are you required to attend in person? :-)
Good for Kubek. My grumpy curmudgeonly stathead self no longer agrees with him on much of anything, but those broadcasts with Joe Garagiola did more to make me a fan than anything else, except maybe Grandpa the Cub Fan.
Chris Britton was the sleeper star of my Diamond Mind bullpen for the 2006 season. I couldn\'t believe how the Yankees failed to use him; it\'s not like he was blocked by quality.
For me, I look for K rate, low opponent OPS, and a small platoon split. As roycewebb noted, that last one is hard to find lately, with lots of extreme platoon splits even in 2007. (And no, I can\'t see how PEDs could possibly be related to an increase in platoon splits...)
Of course, based on my favorite indicator stats, my crystal ball would have predicted a breakout last year for Santiago Casilla. Ooops.
All three of you beat me to it.
Maybe the BP home page should have a prominent link to Eric Walker\'s research: http://steroids-and-baseball.com/
If a player doesn\'t get in during the 15 years of eligibility because the voters during those 15 years were the same senile idiots, then yes -- they might really be a HOFer. Ron Santo, Bert Blyleven, Bobby Grich, and I fear someday Tim Raines, are in no way diminished because Bobby Doerr thinks RBIs are the key metric and that nobody has played the game right since 1961.
OK, you\'re saying they have access to DATA that outsiders (or even other teams) don\'t have. That makes perfect sense -- but that\'s not \'work\', to an analyst. Work is what turns data into information into knowledge. Knowing how to mine and refine actionable knowledge out of masses of raw data is what makes a good analyst. Other things being equal, it\'s certainly true that mo\' better data will support mo\' better models and inferences -- but other things generally aren\'t equal.
Now that Dan Fox won\'t be calculating Simple Fielding Runs for the public any more, will someone else at BP take up his methods and run with them?
Which part of that was unclear? The MVP voters have, traditionally, not known their asses from their elbows. They look at silly teammate-dependent things like RBI totals and W\'s, and ignore actual valuable abilities like not making outs and getting the other guys out. When they get it right, it\'s by accident.
But of course, it\'s all irrelevant. When you\'re not even *allowed* to vote for Bobby Grich, but you can vote for Ron Gant, the system is too broken to ever hope to fix it.
Maybe it\'s just me, but it seems like the Dodgers\' best bet with Jones might simply be to put him out there in CF every day and hope he returns to something vaguely resembling his past. They need a real CF, it\'s a sunk cost, and it\'s just as likely to be useful as any of the free agent options. Last year\'s performance is no more relevant than last year\'s salary. What does PECOTA expect from him, I wonder?
Good call, Will. Stick to your guns.
Rumors are the opposite of information -- the hearer knows less after hearing the rumor than she did before.
Yeah, Joe P\'s mystery comp makes a much better case:
Davenport Translated Pitching Stats:
Pitcher A: 3256.3 IP, 3.90 ERA, 1.12 WHIP, 2162 K, 574 BB
Mussina: 4218.7 IP, 3.66 ERA, 1.13 WHIP, 3215 K, 686 BB
Pitcher A is Juan Marichal.
Oh, and FWIW -- WARP3:
This is a no-brainer; Marichal's only advantage was that he had two tremendous seasons that were a couple of wins (each) better than Mussina's best single year.
Goldman beat me to it. \"YMCA\" and \"Disco Duck\" were \'campy\'; \"Short People\" was not.
This is why, when talking about pitching stats, I always say \"Ws\" (dubyas). I reserve the word \'wins\' for what teams do.
The Reds won; Cueto was credited with the dubya.
This is a great example of how WARP informs MVP decisions, but isn\'t the whole story. Any of the top 5 in WARP is a legitimate choice for MVP, depending on what you think of the relative importance of starting pitching vs relieving, rate vs totals, pitching vs hitting, positional value (beyond adjustments for league average), replacement level vs league average as the standard for comparison, etc.
Question: Rivera\'s WARP1 is 10.3, but his WXRL is only 6.2. Is WXRL relative to league average, rather than to some estimated replacement level? Is a 6.2 WXRL worth more than 10 wins above replacement level? Given Rivera\'s high leverage score, I\'d expect his context-dependent stats to be better than context-neutral stats like WARP.
No, I don\'t think you could say that. The AL is prone to all sorts of weirdness, like making it a \"Most RBIs\" award.
If you\'re serious that you want the award to be for \"greatest actual contribution toward winning, even though that depends on how well his teammates played\", then you should look at the measures like WXRL that calculate exactly how much that individual player\'s contributions increased the team\'s expected win total, given the situations the player hit and pitched in. That will reward players for getting lots of PAs with men on base, pitching well in close games, etc. Baseball Prospectus doesn\'t calculate any such \"value added\" stat for offense, but other people do. But beware: it might still tell you that the biggest contribution to winning came from a guy on a last-place team.
Once you go beyond that, to say that contributions only matter if the team succeeded in the end, you\'ve stopped handing out an individual award. We already have an award for teams; it\'s called \"making the playoffs\".
Great interview, Will. Costas has always been a class act, and a voice of reason and moderation.
The problem with Huff is that he played fewer than 60 games in the field, and not so very well when he played. It's extremely difficult for a DH to get MVP support from me, because I have to account for the opportunity cost of putting someone who might be a real liability on the field or in the lineup. David Ortiz was a great hitter for a couple of years, but to get that you also had to live with Manny in left (argh), where normally Manny (or someone like him) would have been your DH.
Similarly with Huff -- if he\'s not playing the field, you\'ve got a Kevin Millar or (Lord help us) Brandon Fahey or somebody like that in your lineup, dragging the offense down, and that hurts the team just as if you\'d made those extra outs yourself.
Which electorate? The real question is how Aramis Ramirez, with his .297 EqA, won the Hank Aaron award over Pujols, Utley, Berkman, Ramirez, etc. Anyone who seriously thinks he was the "top hitter in the NL" in 2008 knows nothing about baseball.
Oh, wait, I see -- 30% of the vote is unmetered online fan balloting. A ballot-box-stuffing contest, in other words. All the relevance of goldfish eating, without the belching.
Gotta agree with buddaley here. No current Oriole starter will be a factor on the next good Orioles team. Adding cheap veterans to fill out the roster is one thing, but wasting money on expensive low-future free agents like Furcal or Cabrera would be insane.
Huff must go; his year screams \"fluke!\". Roberts, much as I love him, is too old and too valuable on the trade market to be kept. (They really really needed to trade him last year when he was even more valuable.) If they can acquire some upside for Melvin Mora or George Sherrill, they need to do it.
They might also try stockpiling failed uberprospects who are still young enough to pan out. Andy Marte could certainly be had for cheap. I wonder if KC has given up on Alex Gordon or Mark Teahen yet...
I\'m afraid my hockey fandom started with Bobby Hull and ended with Mario Lemieux.
This was the solution I decided I would use in Maddon\'s place -- have David Price treat the remaining 4 innings of Game 5 as a starting pitching assignment, and go all the way with him if he\'s on.
Delaying? All of the games so far have been scheduled to start at 8:37 Eastern Daylight Time. If this one actually DOES start at 8:37, it will be the earliest start yet.
If you\'re not going to pick Alexander, with his 376 innings of 1.22 ERA (league avg. 2.75) in the Baker Bowl, why include the 1915 squad in this exercise at all? It was a peak season from one of the best ever. Carlton was great in \'80, but not THAT great.
\"This is what we have to emphasize next year in spring training, scoring runs with outs\".
Both games have been strongly affected by the umpiring. In game 1, Utley\'s HR came after he had already struck out on 4 pitches -- check out the Pitch F/X sequence. The balk, the pickoff, other outrageous called balls/strikes, the inconsistent checked swing rulings... Most of those broke the Phillies\' way in Game 1, and wildly the other direction in Game 2.
BTW, there was nothing debatable about the missed balk call. Hamels stepped almost directly toward home plate -- not a full stride, and without turning his toe toward home, but all that matters is the direction of the step from where he picked the foot up to where he put it down.
I almost commented on this as well. Certainly if you're going to count PA's that end in a K, you should also count the ones that end in a walk or HBP -- that's clearly relevant to "nibbling", and those PA should count against the pitchers who so thoroughly waste an 0-2 count.
Whether other PA's should be included is less clear, and depends on exactly what you're trying to measure. Guys who give up a lot of HR after 0-2 counts are clearly not nibbling, but is that a good thing? For the rest, I'm willing to believe that BABIP is less pitcher-independent than normal after an 0-2 count, but someone would have to actually come up with the numbers to prove it.
I think your "nibble number" can't distinguish between pitchers who don't have an out pitch (e.g. Dan Wheeler) and pitchers who nibble (e.g. Chad Gaudin). You might want to look at the number of balls after 0-2, rather than the total number of pitches -- the guys who get fouled off repeatedly aren't nibbling.
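To make the "balls after 0-2" suggestion concrete, here is a rough sketch of how the two counts could be separated. The pitch-sequence encoding is purely hypothetical (one character per pitch: B = ball, C = called strike, S = swinging strike, F = foul, X = ball in play); real pitch-by-pitch data would need its own parsing.

```python
def after_0_2(pitches):
    """Return (balls, total_pitches) thrown once the count reaches 0-2.

    Returns None if the plate appearance never got to 0-2.
    Hypothetical codes: B ball, C called strike, S swinging strike,
    F foul, X ball in play.
    """
    balls = strikes = 0
    for i, pitch in enumerate(pitches):
        if balls == 0 and strikes == 2:
            rest = pitches[i:]          # everything thrown from 0-2 onward
            return rest.count("B"), len(rest)
        if pitch == "B":
            balls += 1
        elif pitch in ("C", "S"):
            strikes += 1
        elif pitch == "F" and strikes < 2:
            strikes += 1                # a foul can't be strike three
    return None


def balls_per_0_2_pa(plate_appearances):
    """Average number of balls thrown after 0-2, over PAs that reached 0-2."""
    results = [r for r in map(after_0_2, plate_appearances) if r is not None]
    if not results:
        return 0.0
    return sum(balls for balls, _ in results) / len(results)


if __name__ == "__main__":
    nibbler = "CSBBFX"      # gets to 0-2, then throws two balls
    fouled_off = "CFFFFS"   # gets to 0-2, then four pitches, no balls
    print(after_0_2(nibbler), after_0_2(fouled_off))
```

Both example PAs take four pitches after 0-2, so a pitch-count "nibble number" treats them the same; counting only the balls separates the nibbler from the guy who just can't put hitters away.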
I\'m basing my opinions on Pitch F/X at MLB Gameday, not the TBS broadcast graphic.
That was my reaction at the time; I said \"makeup call\" to my wife when Bay was rung up.
Gotta disagree with kabaret and agree with Joe -- the ball and strike calls are just embarrassing when everyone has PitchF/X at home. Let the machine decide whether the ball passed through the strike zone; humans can\'t do that reliably.
Almost as embarrassing was the commentary on pitch location from Buck Martinez and Ron Darling. Invariably, 10\" off the outside corner was \"on the black\", right at the navel was \"missed way high\", and on the inside corner was \"right down the middle\".
(Or, in the case of Varitek\'s HR, low and on the *outside* corner was \"right down the middle\". It was a good pitch; you might expect Vlad Guerrero to have a decent shot at launching it, but not Jason Varitek.)
Yeah, arguably. It was an atrocious call, but there have been a lot of really, really putrid calls in the history of sports. Even restricting ourselves to MLB, the Merkle call was probably more egregious, and just as high-leverage in the end.
\"Youkilis is a below-average third baseman now.\"
Joe, you\'ve said this a couple of times now -- what are you basing it on? Youkilis had an absurdly high Rate2 of 117 at 3B this season, and his RZR of .729 is essentially identical to Evan Longoria\'s. All of the evidence I can find says that he\'s not only an excellent 3B this year, but farther above average at 3B than at 1B at the moment.
I\'d have guessed Tycho from \"Penny Arcade\", myself.
As far as gibbering despair goes, this particular prose style isn\'t so much due to Poe as to the later trio of H.P. Lovecraft, E.R. Eddison, and Clark Ashton Smith...
Chase Utley suffers from Bobby Grich Syndrome -- does everything very well, but no one thing well enough to make people notice, and not in New York. He\'s been peak Bobby Doerr or Joe Gordon for the past 4 years, but nobody in the press has noticed.
\"Joe would have you believe that you\'re just looking at random rolls off a Strato card but in reality you\'re looking at the performance of human beings.\"
Joe would (as I interpret him) have you remember that you\'re looking at the performance of that tiny fraction of human beings who had so much ability, physical and mental and emotional, that they made it to the major leagues. This is NOT a random sample of the population at large, and there\'s no reason to think that they will show the same distribution of \'character\' traits as the population at large, any more than they show a typical distribution of physical traits. You might as well argue that lots of major league players are probably women -- after all, you see women around you every day, it\'s perfectly natural for someone to be female.
Treating players as if they were stratomatic cards does a remarkably good job of predicting how they\'re going to perform in the future. That doesn\'t mean they\'re robots; it just means that they\'re selected in part for NOT being the kind of people who respond to that kind of pressure the way you or I might.
You can get sortable VORP, SNLVAR, WXRL, etc. through the \"Statistics\" link at the top of the page. You can even roll-your-own \"WXRL for rookies\" report, if you want. None of the defensive stats are in the sortable database, though, nor is any flavor of WARP.
(Nor can you do cross-year queries unless you\'re a BP staffer, but that\'s a different complaint...)
It really would be nice if you would post WARP1 (or FRAA or SFR) leaderboards for the recent season *before* the voting closes for the Internet Baseball Awards. Looking up the individual DT cards of every player or rookie with a decent VORP is awfully tedious.
He\'s a GG-caliber DH. :-)
On Youkilis\'s DT card, his Rate2 stats are:
LF: 125(!) in 36 games
3B: 114 in 154 games
1B: 105 in 396 games
That\'s a very small sample in LF, but certainly an outstanding performance. His 3B totals are also outstanding -- Brooks Robinson only had 2 seasons better than Youkilis\'s season-equivalent so far. The FRAA numbers don\'t show Youk as such a godly 1B, but I\'m not sure to what extent they account for receiving skills.
You hit the key right there -- Twins can\'t hit lefties. Haven\'t all year, didn\'t tonight. No idea why that was never mentioned in the chat.
Sorry; I figured that if he didn\'t recognize Joe\'s comment as sarcasm, he might pick up on my more blatant version. Of course, OTINOCHYI (On the Internet No One Can Hear Your Irony). My bad.
Yes, certainly, the fact that the top WARP1 on the Angels this year was Torii Hunter at 7.9 means Joe hates the Angels. Good catch.
I understand what you\'re saying, Joe, but Neyer\'s comment is explicitly a context argument -- closing out games in save situations. WXRL just refines that a little bit, by factoring in how difficult the save opportunity was.
I have more sympathy with context-dependent value-added metrics for MVP voting purposes than for just about any other purpose. You don\'t have to believe that they\'re predictive or have any moral value -- they just tally up how much a player was able to do in the situations he was fortunate enough to be in. That\'s a defensible approach -- as is going purely context-neutral and refusing to give players credit (or blame) for how well their teammates set them up.
\"The Lidge pick seems out of place, but I\'m swayed by Rob Neyer\'s notion that Lidge\'s perfect season in save opportunities means he was some number of wins—five, six?—better than even a good closer.\"
Actually, BP has a stat designed to answer exactly that question -- WXRL. Lidge's WXRL of 7.6 (!) as a closer means that the Phils won 7.6 more games that he (mostly) closed than they should have expected to win with a replacement-level closer. That's an upper bound on how much better than an average or good closer he was. 5 or 6 vs average, 3 or 4 versus a good closer -- that sounds about right.
You can pretty much compare that directly against WARP1 or VORP/10 as a baseline, then let the narrative nudge things around from there. I\'d throw in a little extra value for wear and tear saved on the rest of the bullpen, stability of roles, etc.
I hope you guys are going to post the final leaderboards for WARP1... That\'s still the biggest hole in the stats available at the site.
Arguing that Howard deserves anything based on his performance in "high leverage AB" is ludicrous. His batting line in 100 "late and close" situations was (I am not making this up) .160/.309/.340. Howard did NOT "produce in those AB". The memorable hits in September don't change what he did (or didn't do) in April - August.
This is why we keep records: because memory is simply not reliable for this sort of thing.
\"The advanced metrics do not support Howard\'s status as an MVP candidate—he is 31st in the NL in VORP (33.0)—but it is hard to deny that he has played a huge role in the Phillies being able to rally past the Mets in September to win the NL East for a second straight year.\"
It\'s also hard to deny that he played a huge role in the Phillies _needing_ to rally past the Mets. Why doesn\'t that count, too?
Chase Utley is the Rodney Dangerfield of baseball. Every year, he\'s the most valuable player (by a lot) on his team. Every year, some lesser teammate ends up the writers\' favorite irrational candidate for MVP, with Utley left completely off the ballot (as above). Sheesh. Can you seriously argue that Howard has been the 2nd-most-valuable player in the league, but the guy next to him who has been 3 wins better offensively (and another 3 wins defensively) at a tougher position, who doesn\'t vanish against left-handed pitching the way Howard does, isn\'t even in the top 10?
Some non-flabby photos of Prime Boog can be seen at:
Gotta disagree with the characterization of Boog Powell and Mo Vaughn as \"blubbery\", at least if we\'re talking about them at the same age that Howard is now. They ended up blubbery, yes -- but that\'s part of the point about Howard.
I'm trying to think of a player who was as big as Howard at a comparable age, and kept his value well into his 30s. I can't. Ryan's no Sam Horn or Cecil Fielder, but he's still heavier than (say) Frank Howard or Willie Stargell was at that age.
On a slight tangent, both the \'73 Cardinals and the \'77 White Sox had 3 outfielders named \"Cruz\":
Hector, Tommy, and Jose Cruz (\'73 Cards)
Tommy, Jose, and Henry Cruz (\'77 Chisox)
I don\'t know whether all 3 of them ever played at the same time in either case. Probably not, given how few games some of them played.
Disparity between WXRL and FRA is a good thing to look at. Another way to get at the same thing would be to look at the correlation between FRA and Leverage for the pitchers on each team. Good reliever usage means very negative correlation between FRA and Leverage.
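As a sketch of what that correlation check might look like for one team: the bullpen below is invented for illustration (names, FRA, and average leverage are all made up), and this is a plain unweighted Pearson correlation. A real version would probably want to weight by innings so the mop-up guy with 12 IP doesn't count as much as the closer.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain (unweighted) Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical bullpen: (role, FRA, average leverage). Not real data.
bullpen = [
    ("closer", 2.50, 1.9),
    ("setup", 3.40, 1.5),
    ("middle", 4.20, 1.0),
    ("mop-up", 5.60, 0.6),
]

fra = [row[1] for row in bullpen]
leverage = [row[2] for row in bullpen]
usage_score = pearson(fra, leverage)

if __name__ == "__main__":
    # Strongly negative means the best pitchers are getting the biggest spots.
    print(round(usage_score, 2))
```

A manager who hands high-leverage innings to his worst relievers would show up with a correlation near zero or positive.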
I think you're right that the way the Save definition (I can't bring myself to call official scoring definitions 'rules') has corrupted reliever usage patterns is a huge part of why it grates on analysts. It's even worse than W's and RBI's, with the added shame of having been invented and adopted at a time when we should have known better.
That said, I think you're confused about what SABR is. This is probably Bill James's fault; the term 'sabermetrics' was his own pernicious contribution to future misunderstanding. SABR is an organization of mostly older guys who love to talk about baseball, and especially baseball history. If you were to kidnap a random SABR member, you'd be much more likely to find someone who is fascinated by the Negro Leagues or baseball in the 19th century, and who thinks the guy with the most RBI should be the MVP, than someone who knows what VORP is and ignores pitcher W's.
But, yes, Mariano Rivera is the epitome of pwnership.