Yeah, this article felt like preaching to the choir some more. Then again, I didn't see the Tweets.
From looking at the graph, ages 34 plus could be .292, a three-point difference. Agree about the survivor bias. Just saying that the graph didn't automatically make your argument for you, IMO.
I believe you, but here's how I did my math. Let's say a full-season pitcher allows about 600 balls in play in a year. 5% of 600 is 30 outs that become hits, true? Apply half a run (the difference between a hit and an out) to each ball, and that's 15 runs. 15/200 innings times nine innings is 0.675 runs--I conservatively said half a run.
What's wrong with my math?
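For anyone checking me, here's my arithmetic in code form. It's just a sketch--every input is one of my round numbers, not a measured value:

```python
# Sketch of the back-of-envelope BABIP-to-ERA arithmetic above.
# All inputs are round numbers for illustration, not measured values.
balls_in_play = 600     # roughly a full season for a starting pitcher
babip_gap = 0.05        # the 5% difference in question
run_value = 0.5         # rough run difference between a hit and an out
innings = 200

extra_hits = balls_in_play * babip_gap      # outs that become hits
extra_runs = extra_hits * run_value         # runs added
era_impact = extra_runs / innings * 9       # runs per nine innings

print(round(extra_hits), round(extra_runs), round(era_impact, 3))
```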
Okay, thanks. But five points of BABIP is something--could be half a run in ERA over a full season, right? And I understand variation, but this is a big sample.
I don't understand. Doesn't the graph show a downward trend? I know survivor bias and all that, but doesn't the basic graph exhibit Ryan's trend?
What's your charge?
The only part I didn't really follow was Proof #2. I think you created an xBABIP based on where the ball was hit and compared that to actual BABIP, or something like that. But did you try to account for a skill in which a pitcher induces *where* the ball was hit. I guess I just don't understand how that analysis was pulled together.
Also, what's a logged odds ratio? I understand logs and I understand odds ratios, but what are you doing when you put them together? Perhaps there's an article somewhere where you explain it?
BTW, fantastic job. That said, whenever I read one of these studies that debunk DIPS theory, I'm still always struck that pitcher BABIP is dang hard to predict. Which, to me, is all that DIPS theory is.
Looks like a really nice job, Russell, but I'll need several days to digest it to be sure. ;)
Just re-read the entire THT Annual piece. A couple of guys who stood out across multiple approaches were Frank Crosetti and Bobby Tolan. Mort Cooper deserves a mention, as does Gene Moore. Best teammates ever!
In the 2010 THT Annual, I did something similar (not in terms of the math!) in which I looked at players whose teams had most outperformed expectations. I did this several ways:
- Their teams' Pythagorean variances over their careers (Ruben Sierra is a great clubhouse presence by that measure)
- Their teams improved the most compared to the year before the player joined the team (All sorts of biases, but Dennis Cook), and
- Their teammates most outperformed their projected WSAB totals (previous three years regressed). Smokey Joe Wood and Mort Cooper were the men.
I called it Luck, but who knows? Maybe Smokey Joe Wood was a great clubhouse presence.
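For anyone who wants to try the first bullet at home, the Pythagorean variance is simple to compute. A minimal sketch, using the classic exponent of 2 (real studies often use something closer to 1.83, or Pythagenpat); the team line is invented:

```python
# Minimal sketch of Pythagorean variance: actual wins minus expected wins.
# Uses the classic exponent of 2; the team totals below are invented.
def pythagorean_wins(runs_scored: float, runs_allowed: float, games: int) -> float:
    expected_pct = runs_scored ** 2 / (runs_scored ** 2 + runs_allowed ** 2)
    return expected_pct * games

def pythagorean_variance(wins: int, rs: float, ra: float, games: int) -> float:
    """Positive values mean the team beat its run-differential expectation."""
    return wins - pythagorean_wins(rs, ra, games)

# Hypothetical team: 85 wins on a dead-even run differential.
print(round(pythagorean_variance(85, 700, 700, 162), 1))
```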
Nice job, Ben. You definitely paid better attention than I did.
Yes, allowing more than ten votes makes sense. Given the current logjam, I also like loosening the 15-year and minimum voting rules.
Thanks for adding this, Joe. In case folks missed it, here is the link for the petition:
Never mind, I see your answer below. I like the distinction between multiplicative vs. additive park factors for runs vs. components, respectively. Makes intuitive sense, though part of me still wonders if additive is best for runs as well.
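To make that multiplicative/additive distinction concrete, here's a toy sketch. The park factor, run totals, and league baseline are all invented for illustration:

```python
# Toy illustration of multiplicative vs. additive park adjustment.
# All numbers are invented for illustration only.
park_factor = 1.10          # park inflates run scoring by 10%
raw_runs = 90.0             # runs created in that park

# Multiplicative: scale the total. Natural for runs, which the park inflates
# proportionally -- though it behaves oddly with negative values like RAA.
mult_adjusted = raw_runs / park_factor

# Additive: subtract the park's estimated contribution, figured off an
# average baseline. Natural for components, and it keeps its meaning when
# the value being adjusted goes negative.
league_avg_runs = 80.0
park_bonus = league_avg_runs * (park_factor - 1)   # runs the park "added"
add_adjusted = raw_runs - park_bonus

print(round(mult_adjusted, 1), round(add_adjusted, 1))
```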
thanks, Colin. Perfect. And great point about multiplying park factors to negative RAA. Have you thought much about additive park factors?
Great article, Colin. If you don't mind, a couple of math questions.
Why introduce lgRPA the way you do? Why not just divide by parkadjust? Mathematically, isn't it the same thing?
By absolute linear weight runs, do you mean runs based on zero instead of average? If you were to use linear weights centered on average, would you approach the math differently?
Man, Russell, you're into purging your old ghosts, aren't you? Don't be too hard on yourself! You also might be interested in Bill James' recent column about the old bullpen by committee controversy with the Red Sox (it's on his subscription site).
In the meantime, I have a simple test: will managers use their closers in the ninth inning of tied games? I don't see any reason for them not to--it's in keeping with the one-inning closer role. And yet a tied game in the ninth is higher leverage than a two- or three-run lead. Last time I looked (a couple of years ago), few of them did.
To me, that one change would go a long way toward finding a balance between the sabermetric bliss and the real world.
Phew. Nicely done, Colin. But who, exactly, are your people???
BIS addressed the bias claim in the latest Fielding Bible. They don't believe it's been an issue for several years.
I agree the batted ball data isn't perfect, and we all should continue to point that out as appropriate. We can even throw in some snark if that's our writing style.
But persistent trends can be spotted and interpreted in the data. We can have a reasonable level of confidence saying that so-and-so tends to hit more line drives. We can't be 100% confident, but so what? As long as we interpret the data correctly, we've gained something.
30 years ago, we had nothing. I can't believe how spoiled people are! ;)
Oh yeah. Just had to say something, ya know. Someone's got to speak up for the poor, oppressed batted ball data. There's some good stuff in there!
We still don’t have good batted ball data, of course...
IMO, what we have is pretty good and worthwhile for a lot of analytic purposes. It's just not clear at what level the usefulness of the data breaks down.
Nice one, Russell. I must say that you rarely came across all that cocky when you were writing. And, IMO, the people who were the most one-sided about DIPS were the ones who hadn't studied it closely.
Pat, the first 100 pages are a description of the system and the last two-thirds are basically a Win Shares encyclopedia. In between those pages are some excellent essays, in which Bill talks about things he discovered while creating the system.
As someone who spent a lot of time understanding Win Shares, I have to say that I learned a lot by doing so. And give Bill credit here. He systematically described his system in detail, so that others could read it, analyze it, critique it, whatever. In fact, I still understand Win shares better than I do WAR or WARP, because Bill is so good at explaining things.
When Colin says "One of the most instructive failures we can look at is the madness of King James the Bill" he is way out of line, IMO. Bill started changing his system almost immediately after he published it. Bill's thoughts and work are always evolving, and he's to be complimented for it, not denigrated.
Right, bhalpern. That series blows away every other LCS, even though it lasted only five games.
The 2011 ALCS is 17th among all ALCS's, out of 42. 2004 is 12th. 2003 is first.
My recollection is that Guy doesn't buy the fundamental premise of WPA--assessing impact in linear "real time." I'm pretty sure you and he see eye-to-eye in that regard.
You're welcome. I appreciate the lengthy quote, particularly coming from you. :)
Just to detract from the warm feelings a little, while I don't disagree at all that WPA overrates relievers vs. starting pitchers, I do think there's value in using WPA/LI to assess reliever usage patterns.
Hey Colin, this is nicely put:
"What the win expectancy model is truly capturing is not how much a play contributes to team wins, but how well an event predicts the outcome of the game itself."
A fine distinction, but one that's worth chewing on.
Ah, okay. Speed is better than angle. I missed that. Why is that, though? From the graphs, it would appear that angle is as important as speed. Or is it that speed is more predictable than angle?
I guess I love all the data and analysis, but I'm not sure what the take-away is.
So, to summarize what I think is your main point (if I may), we can better predict a pitcher's future (or "true") BABIP by looking at the underlying batted ball characteristics (speed and angle off bat) of his previous batted balls (as opposed to looking at previous BABIP).
This is exactly what people who have played with batted ball data (such as ground balls and line drives) have found, but you're looking at the underlying physical data instead of "observed" batted ball data.
Have you shown that your approach is better than the batted ball approach? I'm not doubting it, just wondering.
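If I have the concept right, it could be caricatured like this. The hit-probability function here is completely made up--it just shows the shape of the approach (average a P(hit | speed, angle) surface over a pitcher's batted balls instead of using his observed BABIP):

```python
# Cartoon of the approach as I understand it: estimate a pitcher's "true"
# BABIP from the speed/angle of his batted balls rather than from his
# observed BABIP. The hit-probability function below is invented.
def hit_prob(speed_mph: float, angle_deg: float) -> float:
    """Made-up stand-in for an empirical P(hit | speed, angle) surface.
    Peaks on hard contact launched about 12 degrees above horizontal."""
    raw = 0.1 + 0.008 * (speed_mph - 70) - 0.01 * abs(angle_deg - 12)
    return max(0.0, min(1.0, raw))

def expected_babip(batted_balls: list[tuple[float, float]]) -> float:
    """Average hit probability over a pitcher's batted balls."""
    return sum(hit_prob(s, a) for s, a in batted_balls) / len(batted_balls)

# Hypothetical sample: (speed in mph, vertical angle in degrees)
sample = [(95, 12), (80, 30), (102, 5), (88, -5), (75, 12)]
print(round(expected_babip(sample), 3))
```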
Thanks, Mike. I think I understand that. :) It was basically a way of "normalizing" speed off bat, unless I miss your point.
By the way, thanks for referring to Brian's article from the THT Annual. This general finding (that GB pitchers allow more line drives) is in agreement with David Gassko's analysis from 2006:
Interesting stuff again, Mike. I need to digest this, but there's something I just don't get. I think you say you're looking at a subset of batted balls:
"It is also interesting to look at the batted ball speed in the plane inclined by 12 degrees above the horizontal, the launch plane for which the ball is most likely to become a hit."
In the data, you show an .800 BA on those balls. But in the hSOB graph for 12-degree balls, BA only reaches .800 over 100 mph. So I don't get what we're looking at here.
Interesting, Mike. From a physics perspective, does it make intuitive sense that the pitcher would have more influence on the speed of a batted ball, since he initiates the pitch and the batter reacts? I wonder what the correlation between pitch speed and hSOB is?
Also, the next question that would occur to me is whether the batter/pitcher interaction has an impact on Batting Average. That is, batting average may not follow the line graph published above on a batter/pitcher specific basis. I assume that's something you'll touch on in the next piece?
By the way, I should have said that I agree with your idea of just using hues of a single color in things like heat maps.
Colors are tough. Personally, I stick to ROYGBIV as much as possible and avoid lighter/darker hues. In the boxes above, the blue boxes stood out most to me, so my first inclination was to think that they were the highest/best zones for batters. Turns out they were worst.
I'm partially color blind, so I may be a bad interpreter of this sort of thing.
Awesome, Mike. Sidenote: I was just saying yesterday that I don't trust heat maps, and you've nicely articulated why.
BTW, now that I've wasted your afternoon with many comments, I should say nice job, Derek. Regressing against pitchers with similar flyball rates is an elegant solution.
I need an editor. This...
FB/BIP "stabilizes" in about 100 BIP, while IF/BIP "stabilizes" in 288 BIP. I assume this is almost entirely due to the fact that there are many less FB than IF in a group of BIP.
should say this...
FB/BIP "stabilizes" in about 100 BIP, while IF/BIP "stabilizes" in 288 BIP. I assume this is almost entirely due to the fact that there are many less IF than FB in a group of BIP.
I'm going to attempt to recap what I think I found here. Sorry for taking up the space, but I'd be interested if you think I got it right.
FB/BIP "stabilizes" in about 100 BIP, while IF/BIP "stabilizes" in 288 BIP. I assume this is almost entirely due to the fact that there are many less FB than IF in a group of BIP.
0.07*FB/BIP "stabilizes" against IF/BIP more quickly than IF/BIP itself does (216 BIP). I assume it has something to do with the relative frequency of FB and the tight relationship between FB and IF. If you take anything that has a strong relationship with another thing--and that second thing happens a lot more often than the first--you will naturally have an equation that "stabilizes" more quickly. This is probably natural mathematics.
IF/FB takes longer to stabilize (414 flyballs). I assume this is due to two things. One is again the low rate of infield flies, but exacerbated by the fact that we don't know the pitcher's flyball rate. The rate at which flyballs are actually infield flies will partially depend on how many flyballs that pitcher gives up per ball in play. I wonder how quickly IF/FB would stabilize if you included the pitcher's flyball tendencies?
So, if you want to predict future flyballs, you need to base your calculations off something that includes "information" about his contact rate and his flyball rate. Due to the correlation between infield flies and all flies, previous IF rate does that, but previous FB rate stabilizes more quickly because there are a lot more of them.
Does this make sense to people?
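The "stabilization" idea itself can be simulated. Here's a rough sketch--invented true-talent spread, not real data--of the split-half correlation that defines the threshold (a stat "stabilizes" at the sample size where split-half r reaches 0.5, i.e., half signal):

```python
# Sketch of "stabilization": simulate two halves of a season for a pool of
# pitchers and correlate the observed rates between halves. The spread of
# "true" fly ball rates below is invented.
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def split_half_r(true_rates, n_bip, rng):
    """Simulate two halves of n_bip balls in play per pitcher and correlate
    the observed rates between halves."""
    half = n_bip // 2
    h1 = [sum(rng.random() < p for _ in range(half)) / half for p in true_rates]
    h2 = [sum(rng.random() < p for _ in range(half)) / half for p in true_rates]
    return pearson_r(h1, h2)

rng = random.Random(42)
true_fb = [rng.gauss(0.35, 0.05) for _ in range(200)]  # 200 invented pitchers
r = split_half_r(true_fb, 200, rng)
print(round(r, 2))  # near 0.5 => ~200 BIP is about where this stat stabilizes
```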
You know, I think THT started this whole infield fly thing when we requested infield fly data from BIS back in 2005. We thought it was a new angle that would be interesting. Guess that's why I have such a strong interest in it.
They mark infield flies based on distance from home plate. The latest I heard is 140 to 150 feet.
Okay, I think I see. By using a linear formula that isn't based on zero, you're approximating what might be a curvilinear relationship.
So, how quickly does this formula stabilize? Is it an improvement over the straight 7%?
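Here's a toy illustration of that point (the curve and the numbers are invented): over a narrow range, a line with a nonzero intercept can track a curved relationship better than a through-the-origin proportion like the flat 7% can:

```python
# Toy picture: over a narrow range, a line with a nonzero intercept tracks a
# curved relationship better than a through-the-origin proportion.
# The "true" curve and the x-range below are invented.
xs = [0.25, 0.30, 0.35, 0.40, 0.45]          # e.g., a spread of fly ball rates
true_y = [x ** 1.5 for x in xs]              # an invented curved relationship

def sse(predict):
    """Sum of squared errors for a candidate prediction rule."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, true_y))

# Through-the-origin fit (like a flat percent-of-flyballs rule): y = b*x
b = sum(x * y for x, y in zip(xs, true_y)) / sum(x * x for x in xs)

# Ordinary least squares with an intercept: y = a + c*x
n = len(xs)
mx, my = sum(xs) / n, sum(true_y) / n
c = sum((x - mx) * (y - my) for x, y in zip(xs, true_y)) / sum((x - mx) ** 2 for x in xs)
a = my - c * mx

# The intercept fit hugs the curve more closely over this range.
print(round(a, 3), round(c, 3), round(b, 3))
```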
That is, they don't use "outfield grass," which obviously can't be used in some ballparks anyway.
I'm confused. Derek said that any fly caught by an infielder is an infield fly. You're saying that MLBAM uses distance instead? That's exactly what BIS does too (and they don't just use the infield parameters).
Got it. Thanks. It makes sense to me that IF% and FB% would be correlated, since IFs are included in FBs. At least, I think it makes sense. :)
So, could you improve your model by varying the 7%, according to the pitcher's FB%? Or would that not be worth it?
I get confused when people start flipping off r-squareds as to what is being correlated with what, and how that relates to other correlations.
For instance, you say that your r-squared of .68 is "much more significant." But than what? The .21? But those were two different regressions--one regressing IF/FB rate vs FB/Contact rate and the other comparing the IF/Contact rate in two halves. Why compare them?
OK, I think we weren't quite addressing each other. That helps clarify. Thanks.
Oh, okay. Thanks for the clarification. I thought you were referring to the HR/OF vs. HR/Contact analysis Colin had a couple of months ago. I left several confused comments in that one, too.
Last comment, I promise. When you say this...
"Yes, it does go up. The graph shows that relationship."
...are you saying that you didn't use a standard 7% of flyballs in your first table? You increased it as the pitcher's flyball rate increased?
By the way, I am just totally confused about these points. What does HR/FB have to do with things? Are you referring to Colin's previous analysis, which I also didn't get?
Should I just give up on this stuff?
I guess when you throw off r-squareds like that, I don't know what you're saying. I often have that problem with BPro articles (and some of my own too!). What is the point of that correlation?
Also, is correlating something the only point? How about having a better handle on what makes pitchers unique and interesting?
Why would you say that, Mike? Cataloging flies according to the person who caught them depends on the range of the infielders in question. Why would that make more "baseball sense" than where the ball actually lands?
Wow. That is significant. I'm having problems, then, reconciling that fact with the notion of just using straight flyball rates as a proxy for infield fly rates. Wouldn't the rate (the 7% you used in your first table) go up as the overall flyball rate goes up?
I have to admit that I'm still a fan of IF/OF, though I can't adequately explain why. It feels like it captures a nuance that IF/Contact misses.
Thanks, Derek. That's what I thought. One observation is that BIS is more rigorous about identifying infield flies. It doesn't depend on who caught the ball--it's based on where the ball lands. I know Colin has issues with how well they measure it, but I'd have more faith in the BIS classifications being meaningful than MLBAM's.
Don't know if that would affect your analysis at all--just an observation.
Quick question #2: other studies have found that IF/FB goes up as FB% goes up. It appears that your graph might support that theory too. Any thoughts?
Apologies for not knowing the answer to this, but how is infield fly defined? I believe you use the MLBAM version, which is the same as Pitchf/x? Is that right? How do they define it?
Wow, Mike. I'll have to spend some time on this one.
Just wondering, do you read the old "Plunk Biggio" blog? I think it's now called Plunk Everyone. I thought he might have inspired you.
I'm looking forward to the movie, too, but dreading the old arguments. Oy.
Never thought of Stephen Collins as a character actor. Hasn't he almost always been a leading man type? Good choice for Art Howe.
I can see that. It's just the change in language that threw me for a loop, I guess.
Thanks Jeremy. A stolen base breakeven table for all inning/base/outs/score cells would be way cool. I'll work on it! (though it's probably been done somewhere already).
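The breakeven rate for each cell falls straight out of the win expectancies, something like this (the WE numbers below are placeholders I made up, not values from the article):

```python
# Breakeven stolen-base success rate from win expectancies.
# An attempt breaks even when:
#   b * WE_success + (1 - b) * WE_caught = WE_no_attempt
# Solving for b gives the formula below. The example WE values are
# placeholders, not real table entries.
def breakeven(we_stay: float, we_success: float, we_caught: float) -> float:
    return (we_stay - we_caught) / (we_success - we_caught)

# Placeholder cell: runner on first, tie game.
print(round(breakeven(we_stay=0.55, we_success=0.60, we_caught=0.47), 3))
```

Run that over every inning/base/outs/score cell of a WE table and you'd have the whole breakeven table.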
I don't mean to be nitpicky, but I had interpreted 1% to be WPA/Pre-attempt WE. I'm not sure putting a percentage to .01 WPA should be done.
Sorry. Leverage Index.
Nice job, Jeremy. Can you talk a bit more about what drives the breakeven rate? Is it related to LI? Score? Other things? Also, what do you mean when you say Gardner has averaged 1% WPA for each stolen base attempt?
Also, are the BPro win expectancy tables still based on actual data? Probably doesn't matter for your specific comparisons, but I was just wondering.
Yes to this:
"...we have now conditioned large swaths of people to have a Pavlovian response to predictions based on a limited number of observations - 'small sample size.'"
Well, you may not like the tone of Colin's article. It may not match that found in academic articles. But I still have no idea why this means that sabermetricians are hypocritical for criticizing the Murray Chasses of the world.
BTW, Colin is definitely saying that SIERA is "bad" in some ways. Look again at his statements about multicollinearity or why using HR/FB is bad (which I still don't get). Mike is saying that an approach that relies too much on regression is, indeed, "bad."
But that's a different point. Now you're saying that people are slinging mud; before you were saying that people should work together to find a "unified alternative."
I have no problem with people openly disagreeing with each other. In fact, I think it's healthy. But I agree with you that it's to no one's credit if people start getting personal and slinging mud.
I also agree that there is a little bit of mud slinging here. Colin can be harsh in his assessment at times. But it's to his credit that he doesn't hold back in his assessments and, as far as I know, doesn't let things get personal.
Okay, but why does that reflect poorly on sabermetrics? BPro is just one site--there are plenty of other sabermetric places on the web to talk about SIERA if you'd like.
I have just the opposite reaction--in fact, I'm glad that people disagree about these types of issues. Being willing to discuss what's right and/or wrong about the current "baseball way of thinking" (whether it's in the mainstream media of just among us baseball nerds) is exactly what sabermetrics is supposed to do.
I see no hypocrisy at all.
Sorry to keep asking questions, but this also confuses me:
"...there is a reason that this sort of analysis can be unsatisfactory: it assumes that a fly ball that wasn’t a home run is equally predictive of future home runs as a fly ball that is a home run."
But isn't this just as true as HR/CON? In fact, couldn't you turn this argument around and say that this is why HR/FB is preferable to HR/CON? Because fly balls are more predictive of future home runs than, say, ground balls?
Third question: By using first half and second half results, you're typically looking at pitchers who played in the same environment. Won't that skew the results toward the HR/CON figure? Part of the reason to look at flyball rate is to take the park (mostly) out of the equation.
A good test might be to ask whether HR/FB is more predictive of second half home runs than HR/CON?
Also, you say this:
"Fly balls on balls in play are a much poorer predictor of future home runs than home runs on contact, with an r-squared of only .014."
But your table shows an R-squared of .023 for FB/CON. What's the diff between .023 and .014?
And aren't those abysmally low r-squared figures? I'm used to getting r-squareds in the .20 to .30 range.
Colin, I don't understand how your graph of HR rates supports what you're saying. Aren't you graphing actual HR rates vs. predicted HR rates based on flyball rates? Of course the actual HR rate is going to be wider--that's the nature of real life vs. projection, right?
Are you implying that the distribution of expected home runs based on contact rate is wider than that for fly ball rate?
I thought you said "pinochles"
Mike, thanks so much. I always enjoyed working with you too.
I agree 100% that sometimes there is a need to have projection systems, even if we don't totally understand them. There are a few things you're willing to take on faith. Not many, just a few.
Hey Colin, nice job (and nice job in the comments, Mike). I won't comment on the "personal" side of this issue cause I don't really understand it.
But you've certainly given me something to think about, particularly the contention that HR/Con is better than HR/OF (or HR/FB). That is counterintuitive to me, and I'm going to have to read your explanation several more times to understand it.
To me, xFIP is a useful stat that tells you something about a pitcher, but (IIRC) I resisted putting it on the THT stats page because I didn't see it as a "reference" stat. That was silly of me, I guess. Many readers asked to see it, so we eventually added it.
Similarly, we used to run "ERA-FIP" at THT, which was also something requested by readers. I was kind of uncomfortable with that too, and when we got a request to run ERA-xFIP as well, I refused. I thought it put too much emphasis on a single number and calculation. Time has shown that I'm in the minority in that position.
I guess I am someone who is uncomfortable with the quest for a complex stat that explains everything. I am leery of issues like multicollinearity and other things I can't pronounce or understand. I intuitively won't trust a stat I can't understand. I make only two exceptions: projection systems and total-win based systems.
So I wish you guys success on your own statistical quests but please: don't try to do too much. Keep it simple.
Okay, I thought you were interpreting that graph.
I admit that I don't understand your methodology. I mean, I understand your description as far as it goes, but I don't understand how that translates into specific numbers on your graphs.
Probably just me. ;)
Okay, so I guess I don't quite understand this:
Basically, I compared how players who faced each other in consecutive years fared in those confrontations and in their matchups against everyone else. If they did better against the same players, that means their environment was different.* If they did worse against everyone else than they did against the same players, that means everyone else got better.
How does this paragraph translate into the numbers on the graph above it?
That said, I'm still digesting the methodology.
Jeremy, did you regress the stats of the first year? If not, I believe you have a selection bias.
Oy. I wouldn't foist the Cook books on my worst enemy. They're unreadable and do a wretched job of "laying out the basics of baseball analysis." Cook was a terrible writer, his methodologies were overwrought and many of his findings flawed. Just because he published before James, Palmer, et al., doesn't make his books noteworthy in other ways.
Good stuff, Mike. The graph breaking out different years is pretty interesting. Worth staring at.
So, is the correlation between temperature and fastball speed 100% real (occurs on the pitcher level) or might it also impact the PITCHf/x recordings?
BTW, I hope that's good news for Kerry Wood.
Just want to add my own three thumbs up, Mike. I keep the third thumb in reserve for articles like this.
Thanks, Colin. So, picking on the 100-200 bin, when you use SIERA to predict future ERA, 66% of the results will be within 2.1 runs of ERA and 95% will be within 4.2 (that's taking the RMSE on both sides).
When you use ERA to predict ERA, 66% of the results will be within 2.5 runs of ERA and 95% will be within 5.0 runs.
Is that right? If so, it's obviously an improvement, but I think somewhere we should acknowledge that ERA is just plain tough to predict, regardless of what measure you use.
Looked at that way, I don't think there's much to choose between SIERA, xFIP and FIP. Just use whatever you're comfortable using.
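For what it's worth, the "ERA is just plain tough to predict" reading can be sanity-checked by simulation, assuming roughly normal prediction errors (which is an assumption on my part):

```python
# Sanity check on reading RMSE like a standard deviation: for normally
# distributed prediction errors, about 68% fall within 1 RMSE of the
# prediction and about 95% within 2 RMSE. Assumes normal errors.
import random

rng = random.Random(7)
rmse = 2.1                                   # SIERA's RMSE in the 100-200 IP bin
errors = [rng.gauss(0.0, rmse) for _ in range(100_000)]
within_1 = sum(abs(e) <= rmse for e in errors) / len(errors)
within_2 = sum(abs(e) <= 2 * rmse for e in errors) / len(errors)
print(round(within_1, 2), round(within_2, 2))
```

So even the best estimator in that bin leaves you with a plus-or-minus of a couple of runs of ERA most of the time, which is the point.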
Congrats, Sky, and welcome to BPro.
I'm also someone who feels that, once you reach a certain threshold, ERA is just as good a predictor as anything. Happy to see you peg the threshold. I'm pretty comfortable saying that, after a couple of years as a starting pitcher, actual performance is as good an indicator as these other metrics.
Technical question: how good are any of them anyway? At 100 innings, it appears that the average RMSE is 1.2 points of ERA. Is RMSE similar to standard deviation? Can you say that 66% of all estimates fall within one RMSE, or something similar to that? So the best any of these estimates might get is 66% within plus or minus 2.4 ERA?
If so, it makes the difference between the RMSE of these measures kind of trivial.
Nice dig, Rob.
Here it is. I forgot that it was inspired by a BPro article.
Yes, that is my reaction, too. The point would be that closers have more value in low-scoring environments.
Excellent article, Colin. I think that's the best job of articulating the issue I've seen--not the issue of "after the game, everything is the same," but the issue of "diminishing leverage for those who come later in the game." Different things, I think.
Anyway, I'm someone who is okay with the way WE works. Guess I'm an "in the moment" kind of guy. But I wonder if there is some way to develop another system that accounts for "game impact" in the moment but also incorporates the idea of "creating leverage."
I've done a few things along those lines, such as writing about giving weights to batting and pitching events based on how close each game turned out to be, but that's probably just the tip of the iceberg.
Okay, I understand that perspective. Thanks.
Well, if I'm understanding your point (always a questionable issue due to my lack of powers of comprehension), you're saying that the "disagreement" between UZR and DRS--that left by the partial correlation--is a bad thing. I don't know that that's true, because the two systems have different components and also measure them differently.
If your point is that those other components and measurements make the systems more different than people might realize on the surface, then you've raised a good point. I think a lot of folks are already aware of how different the systems are, but it's good to reinforce the point.
But perhaps the bigger reason for my reaction is that this sentence raises my hackles...
"For a time we stopped doing science when it comes to fielding analysis, and instead have been doing baseball alchemy..."
...because it's condescending and I think it diminishes the terrific work that MGL and BIS continue to put into their systems, to better understand the dynamics of fielding and improve the work.
I did mention runs, forgot that. But it is an issue to the extent that the two systems interpret runs differently. DRS includes "home runs saved," for instance, which Dewan gives a lot of weight to (1.4 runs). I don't believe UZR has that.
I don't believe that the differences in the two systems are as straightforward as you and Colin imply. Things like pickoffs, bunts, "home-run saving catches" etc. are all handled differently in the two systems, and those are only the things I know about. Park adjustments, yadda yadda.
"2) realistically, expected DER likely accounts for the lion's share of any overall difference there is."
...is a hypothesis, as far as I know, and I'm skeptical of it.
I'm not worried about the units. I guess I don't understand your point. The two fruit salads don't have the same fruits.
You seem to want to pin your findings on bad data, or inconsistent misinterpretation of the data, but I think the systems are too different to conclusively say that.
Isn't DRS Defensive Runs Saved? If so, that is very different from DER, isn't it? For example, I think it includes catcher ERA, runners picked off and other things not related to DER. Plus, it's denominated in runs, which would make it different from DER.
I don't know about UZR, but this feels like apples and oranges to me.
You made me go back and look at Win Shares fielding stats, because James' system is not unlike your new FRAA system. Win Shares agrees with you in 2010--it calculates that Jeter was more than 30 assists less than average.
I know Win Shares are anathema to some (and for some good reasons), but it's interesting to me that his fielding metrics may still be relevant.
Burr, that's what I did/do with Win Shares at THT. Here's an example:
"WSP" refers to Win Shares Percentage, where .500 is average. It's a Win Shares rate stat. Yes, sometimes players are better than 1.000.
Yup, that's my point. If we want the replacement model to reflect economics and if we use your parameters, then we can only apply it to players with more than six years of experience, right?
Colin, sorry. I should have posted the link.
The approach is still the same, but I have changed the percentages. In particular, I found that the old criticism of Win Shares is true--it undervalues starting pitchers--so their replacement level is lower. I think I wrote about the updated values in the Annuals and not on the web.
But the basic idea is the same: starters against next level of bench players. I didn't feel beholden to the "freely available talent" approach.
By the way, the average major league player paid the major league minimum actually performs at least as well, if not better than, bench level. We've discussed this before and I don't mean to shout out Colin here.
Tango, you're again insisting that replacement player is only an economic concept. I'm saying it's not. It's an important adjustment to any metric that compares players to average, when those players have significantly different amounts of playing time.
I agree that chaining is an important factor, and one that ought to be discussed. That's why I'm not convinced that the "26th man" approach is best.
However, if you want to compare players with different amounts of playing time with just one number, then you will be limited by using an average baseline. Yes, you can make the comparison without it, if you throw in other stats and do the math. But why make people do that? It's like saying this:
"Hey, here's a stat, but you can't use it the way it is. Here's some other numbers that will help you. Do the math yourself."
Why not make it usable in the first place?
Replacement level is not just an economic concept, and I think that limiting it that way misses the bigger point. If you want to compare players with significantly different amounts of playing time, then you DEFINITELY need a baseline that is different from average.
But replacement level tells you as much as average level does, and more. Targeting 2 WAR over a full season may be roughly equal to average--just adjust from there.
On the other hand, you lose something with the average baseline, as Colin explains. It gives no credit to a player who plays an entire season at an average level WHEN COMPARED TO a player who was above average but had only one plate appearance. That player actually helped his team reach the postseason more than the first one did.
If players all played the same amount of time, then I'd agree with you. But they don't.
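To make the playing-time point concrete, here's a minimal sketch with made-up per-PA run rates (all numbers are illustrative, not real league values):

```python
def runs_above_baseline(runs_per_pa, pa, baseline_runs_per_pa):
    """Value accrued relative to a per-PA baseline."""
    return (runs_per_pa - baseline_runs_per_pa) * pa

AVG = 0.120   # league-average runs per PA (assumed)
REPL = 0.100  # replacement level, set a bit below average (assumed)

# Against an average baseline, a full season of average play is worth
# exactly zero, while one great plate appearance looks "better":
full_timer_vs_avg = runs_above_baseline(0.120, 600, AVG)   # 0.0
one_pa_vs_avg = runs_above_baseline(0.200, 1, AVG)         # small positive

# Against a replacement baseline, the full-timer's playing time gets credit:
full_timer_vs_repl = runs_above_baseline(0.120, 600, REPL)  # about 12 runs
one_pa_vs_repl = runs_above_baseline(0.200, 1, REPL)        # about 0.1 runs
```

The exact replacement rate is debatable (as discussed above), but any baseline below average restores credit for playing time.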
I agree 100% that we need replacement level due to the playing time issue. As far as I'm concerned, the exact level can be set a number of places. I used "bench" as my level for Win Shares Above Bench (and posted my research at Baseball Graphs), and I think Keith Woolner did the same thing in one of the BPro books.
Is there a particular reason to believe that a replacement level equal to the "26th man" is better than another level?
Exactly right, Colin. Well said.
So what *can't* Fieldf/x do?
It's sort of irrelevant now, but here is the study I was looking for:
Very cool, Matt. The other variable to consider is number of outs. It would be important with a runner on first.
Good question about controlling for batter effectiveness. I'm not sure. But I also was thinking along the lines you suggested, that perhaps the 2/3 situation balances out the first base situation in some way. It would be interesting to dive into.
Good stuff, Matt.
Actually, I'm not sure that these articles definitively answer the question. FWIW, Bill James and John Dewan have argued about this topic for many years.
By the way, John Walsh doesn't study this specific issue in the following article, but it's a nice overview of left-handed batters and why they seem to perform better than righties:
This is the one I remembered off the top of my head: Page 323 of The Book by Tango et al. With a runner on first and less than two out, left-handed batters have a wOBA that's 20 points higher; for righties, it's ten points higher.
That doesn't address BABIP specifically, though. I'll keep looking.
Logically, lefties should have a higher BABIP with a runner on first and no out, because the first baseman will be playing close to first and all batters naturally pull groundballs.
Matt, it seems to me that your results don't jibe with other studies I've seen that point to a significant increase in BABIP for lefthanded batters (relative to righties) with a runner on first. Perhaps it's not apples to apples?
Yes, exactly. That has been documented several times in the past. I remember it from Tango's Book, as an example. It's a good reason to put left-handed batters in the #3 spot of the lineup.
But that's the nice thing about Sam's proposal. It doesn't force a team into the FA market, it only makes it cheaper for them. If it doesn't make sense to sign a free agent, even at a cheaper rate, then the team won't do it. Theoretically...
Excellent perspective, Matt. I agree with you and Sam Mauser that a much better system would be one that tied the tax and its distribution to salary/payroll. Revenue is too squishy.
On the surface, Sam's proposal makes a lot of sense. It might be worth getting more specific with it.
Well, never mind. Stupid question. If it works for 162 games, it should work for one. I'm just slow, is all.
By the way (and I understand this is off the point), but does the Pythag formula work well for determining the probability of winning a specific game? I don't believe I've seen it used that way before. Just curious...
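For what it's worth, the per-game application people usually mean is just the Pythagorean expectation fed with per-game scoring rates. A minimal sketch, using the classic exponent of 2 (Pythagenpat-style exponents vary with the run environment, so treat the exponent as an assumption):

```python
def pythag_win_prob(runs_scored_pg, runs_allowed_pg, exponent=2.0):
    """Pythagorean expectation applied as a single-game win probability.

    Inputs are per-game scoring rates; the classic exponent of 2 is
    assumed here (variable exponents fit the data better).
    """
    return runs_scored_pg ** exponent / (
        runs_scored_pg ** exponent + runs_allowed_pg ** exponent
    )

# A team scoring 5.0 and allowing 4.0 runs per game:
p = pythag_win_prob(5.0, 4.0)  # 25 / 41, roughly 0.61
```

This ignores the opponent; head-to-head versions combine both teams' rates, which is a separate (log5-style) adjustment.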
Got it. Thanks.
Great stuff, Colin, though I agree with evo34. Looking at this data by team would get rid of the multicollinearity, or whatever you call it.
BTW, I think there is an error in your formula. You use RPG for offense, but RA for defense. I assume that's Run Average and not Runs Allowed, but you multiply the RA by the total number of innings pitched for both starters and relievers.
That would give you total runs allowed, not runs allowed per game. If I'm reading it correctly.
Congrats, Colin! Excellent move for BPro.
Yah, good point. I have no idea how to apply standard deviations to non-normal distributions. Think it substantially changes the 68%?
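One way to answer the "does it substantially change the 68%?" question empirically is to simulate a skewed distribution and count the coverage directly. A quick sketch (the exponential distribution is just an assumed stand-in for skewed fielding data):

```python
# Empirical coverage of mean +/- 1 standard deviation for a skewed
# distribution. The exponential draw is an assumed example, not real data.
import random

random.seed(0)
draws = [random.expovariate(1.0) for _ in range(100_000)]  # heavily skewed

mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
sd = var ** 0.5

within_one_sd = sum(mean - sd <= x <= mean + sd for x in draws) / len(draws)
# For an exponential distribution this comes out around 0.86, not 0.68,
# so yes, skew can substantially change the coverage.
```

Chebyshev's inequality gives a distribution-free floor (at least 75% within two standard deviations), but simulation gives a much tighter answer for a specific shape.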
Yes, that's 68% (34% on either side). Two standard deviations cover about 95% and three about 99.7%.
FWIW, I would advocate using two standard deviations. 68% is okay, but not compelling, and using two standard deviations, or 95%, is more understandable and intuitive to the average reader.
Obviously, down the line, you can make statements of how likely it is that Ozzie *isn't* the best fielding shortstop in your dataset.
Big huzzahs for introducing margins of error, by the way. Love it. Another stupid question, though: what range does the margin of error represent? 99% of potential outcomes?
By the way, I do see that you addressed that in your article. I guess I'm just pulling it out a bit more.
I have a stupid question and/or comment. With the "ground ball adjustment," we're hurting above-average infielders who are paired with below-average outfielders, right? And vice versa?
I think the reply is that the system is worse without that adjustment, and I would intuitively agree. But it seems that it ought to be noted that this is a possible problem with the system. It would also seem to set up the next question: can we make the system better by using simple batted ball types, or do you feel there is too much systematic bias in even that data? What's the tradeoff?
Good point, Eric. I am not a fan of K/BB ratio at all. I think it tells us very little and can be downright misleading.
Another approach, which we use in the THT Annual, is to assign linear weights to strikeouts and walks and total them up.
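A minimal sketch of that linear-weights idea, from the pitcher's perspective. The run values below are rough illustrative figures, not the exact weights used in the THT Annual; the point is that two pitchers with the same K/BB ratio can have very different run impact:

```python
LW_STRIKEOUT = -0.28  # assumed run value of a strikeout (runs allowed)
LW_WALK = 0.32        # assumed run value of a walk

def kbb_runs(strikeouts, walks):
    """Total run impact of a pitcher's strikeouts and walks."""
    return strikeouts * LW_STRIKEOUT + walks * LW_WALK

# Same 4.0 K/BB ratio, very different totals:
high_volume = kbb_runs(200, 50)  # about -40 runs allowed
low_volume = kbb_runs(100, 25)   # about -20 runs allowed
```

The ratio throws away volume, which is exactly why it can mislead.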
Neat interview. I find it ironic that a guy who is famous for what he did in Vegas is surprised that business is all about money.
I'm somewhere in between Colin's viewpoint and many of the posts. The strikeout rate doesn't bother me, but it does seem that Wright is trying to find a new approach after his power outage of last year and still hasn't settled on it. He may be in the process of morphing into more of a TTO guy, which would be sad in many ways but not unprecedented.
The really sad thing is that Wright has turned into a New York media whipping boy even though he's clearly been the Mets' best hitter so far. Heavy hangs the head...
I think you're misinterpreting the 26-31 year olds who signed with new teams. Their WARP went from -.34 to -.18, but you have them providing more value in their first year in your percentage table in the article.
BTW, I also have a problem with this methodology, using the percentages to argue that players that re-signed with their teams "aged" better. Unfortunately, I can't really articulate why--I'll see if I can come up with something constructive.
OTOH, I find your general conclusion (teams that re-sign their own players get better results because they know their players best) intuitively correct. It would be great to really "prove" it.
Seems to me that WPA is what it is. That's its strength and weakness. As a story stat, I think WPA is superb. When it comes to giving "credit" to players, you're really quantifying their role in the game story. As Matt says, you're not passing some sort of moral judgment on them, and I don't know many people who are using WPA to pass moral judgment.
Regarding Colin's specific example, Sweeney did get credit for extending the inning to Ichiro, but you're right that he didn't get credit for what Ichiro subsequently did. Should he?
Well, if you give him that credit, that might be "fairer," but it also disrupts the game story. It becomes a "post hoc" story instead of an in-game story. To me, that's not worth it.
Isn't leverage index the way to solve Clay's boundary issue?
New BPro contest: find the Nash equilibrium in the MLB draft!
I would think there is one flaw in the draft slot game theory. IIRC, draft salary slotting only started a few years ago--less than five years or so. Yet the AL's superiority in developing minor league talent dates back to the early 1990's.
Hey Matt, still chewing on this, but are you aware of the work Steve Treder has done in this area? He's presented at SABR, and also published his work in the most recent THT Annual. Bottom line, he published a long series of work at THT similar to your "No Turnover" Standings, but throughout baseball history. He called his the "Value Production Standings." You should be able to find it easily at our site.
Bottom line, the balance in developing "original" talent shifted from the NL to the AL in the early 1990's and has stayed there ever since.
Matt, responding to your previous post, I did read your article in detail and didn't "show up to hate." In fact, I posted a review of it at THT. I'm sorry you find my comments hurtful, but I've certainly received a lot worse and not minded.
"For Phillies phans, this is actually a great sign."
...seems pretty close to "best contract ever."
You remind me of hot-shot consultants I used to hire who thought they could model how my business worked but wound up adding no value. Total waste of money. The bottom line is that teams still have a budget based on what they think their revenue will be in the next year.
If you pay someone a certain amount of money from that budget, it represents less money for someone else. And budgets are typically set a year at a time, not for a three- or five-year period. The nature of baseball--in which wins can be hard to predict--makes long-term budgeting an intellectual exercise only.
The bottom line is that, if the Phils win fewer games next year, they will have less money to pay ballplayers. No amount of theorizing can refute that. Howard's contract is more than just a "sunk cost"--it's an actual payment due from a future budget.
One other comment. I don't understand this statement: "The myth here is that teams spend money to be nice to their fans, but only up until they reach some arbitrary budget, and then they stop. This is untrue: teams spend money to make money."
My experience is that teams don't follow an economist's ideal of matching marginal cost to marginal revenue. They don't consider player expenses to be investments, at least not financially. They have annual budgets, just like every other business. Do you have evidence to the contrary?
Matt, putting the analytic talk aside, the crux of your article seems to be that the Phillies know Howard better than analysts. That may be true, but there have been many instances in which teams overvalue players on hand. Groupthink can cut both ways, and I don't think an analytic approach that rests on the assumption that the Phillies have avoided groupthink is going to put the rest of us outraged analysts at ease.
George Lindsey probably deserves primary credit for creating the run expectancy tables in the early '60's. The Mills brothers took his work a step further and turned them into win expectancy tables and calculated WPA (not their term) for all players in 1969. However, the Mills brothers didn't create a "Runs Created" sort of stat.
Skoog built on Lindsey's ideas (and Palmer's) to create his RC stat. I'm not aware that anyone calculated a specific RC stat before Skoog (though many, like Lindsey and Palmer, had the idea), but that's because Skoog had the Project Scoresheet data.
At least, that's my understanding.
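For readers curious what a Lindsey-style run expectancy table actually is, here's a minimal sketch under an assumed play-by-play data shape: for each base/out state, average the runs scored from that point through the end of the inning.

```python
# Build a run expectancy table from (bases, outs, runs_rest_of_inning)
# records. The data shape here is an assumption for illustration.
from collections import defaultdict

def run_expectancy(plays):
    """plays: iterable of (bases, outs, runs_rest_of_inning) tuples,
    e.g. (('1B',), 1, 2) means runner on first, one out, and 2 runs
    scored from that point to the end of the inning."""
    totals = defaultdict(lambda: [0.0, 0])
    for bases, outs, runs_rest in plays:
        key = (frozenset(bases), outs)
        totals[key][0] += runs_rest
        totals[key][1] += 1
    return {k: s / n for k, (s, n) in totals.items()}

# Tiny fabricated sample, just to show the mechanics:
sample = [((), 0, 1), ((), 0, 0), (('1B',), 1, 2), (('1B',), 1, 0)]
table = run_expectancy(sample)
# bases-empty/0-out averages to 0.5; runner-on-first/1-out to 1.0
```

Win expectancy tables (the Mills brothers' step) layer score and inning on top of the same state-averaging idea.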
I'm enjoying the discussion. Just one small point: FIP doesn't talk about fly ball percentage except to the extent it's implied in the home run rate.
Never mind. Eric just reminded me of our discussion about it last September.
Additionally, as it pertains to xFIP, we spoke with Dave Studeman of The Hardball Times in order to determine that the expected number of home runs to be substituted into the FIP formula is to be calculated through home runs per outfield flies, not the sum of those and popups.
I don't remember any conversation about this at all.
One correction: xFIP is not based on HR/FB, but HR/OF (home runs per outfield fly). Sounds minor, but it's an important distinction.
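A hedged sketch of that distinction: xFIP swaps a pitcher's actual home runs for the league rate applied to his outfield flies (HR/OF, not HR per all fly balls including popups). The weights follow the standard FIP form; the constant and the league HR/OF rate below are assumed placeholders, and HBP is folded into walks for simplicity.

```python
def fip(hr, bb, k, ip, constant=3.10):
    """Standard FIP form; constant is an assumed placeholder."""
    return (13 * hr + 3 * bb - 2 * k) / ip + constant

def xfip(outfield_flies, bb, k, ip, lg_hr_per_of=0.11, constant=3.10):
    """xFIP: replace actual HR with league HR/OF times outfield flies.
    The 0.11 league rate is an assumed placeholder, not a real figure."""
    expected_hr = lg_hr_per_of * outfield_flies
    return (13 * expected_hr + 3 * bb - 2 * k) / ip + constant
```

The only difference from FIP is the substitution of expected for actual home runs, which is why the choice of denominator (OF vs. OF + popups) matters.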
Of course, if the Mets had actually created some offense at the time, Torre might not have changed his opinion after all.
I don't know about VORP, but that's been done with a number of other value stats. I did it with Win Shares for several years. In every case, free agent pitchers were much more expensive than free agent hitters. And I used WSAB, which attempts to adjust for the imbalance between starting pitching and everything else in Win Shares.
Fun stuff. Thanks.
I may be missing the point, but it seems to me there's nothing wrong with not spending a lot on pitching if you have a lot of good young pitching. A heavy payroll in a certain category usually means the team hasn't done a good job of developing that talent internally, right?
This is particularly true of pitching, where the aging curve is mostly flat before eventually trending down.
Plus, pitchers are generally not as good an investment on the free agent market as hitters. So this may be a consequence of smart money management by the Nats, not bad money management.
A couple of critical comments:
1. That first paragraph was incredibly mind-numbing. Talk about not grabbing your reader at the beginning.
2. What the heck is the definition of all those stats in the table? Z-Swing? I think you assume too much of the reader, particularly since those aren't BPro's stats.
We try to avoid things like that at lesser sites like THT.
What most baseball analysts really object to is the reference to chemistry and pressure when those things may or may not have had an effect. How do you know the Rays "choked" or lost confidence? You don't, any more than I do.
I think the tone of Joe's article was right on. Those things may have had an impact, and it's fun to think about them. But those psychological pressures may not have been the real story of the game. Sometimes players do and don't perform just because that's how the ball bounces.
Nice article, Joe. I don't have a problem with a baseball analyst being willing to consider the role of confidence or anxiety in a game or situation. Just don't make it a habit!