Shouldn't you be telling us the % of innings at SS in MLB?
As a general note, there are two kinds of projections, especially for marginal players. We don't talk about this often enough. David Gassko (the smart baseball analyst who went into a more lucrative industry I think) turned me on to this several years ago.
One is "projections projections." These are exactly what we expect a player to do (mean, median, whatever) IF he plays in the major leagues (assuming they are MLB projections) or if we forced him to play in MLB.
The second kind are projections of players who end up playing in the major leagues (whether they played in the majors before or not).
There's actually a third kind: projections of players who end up playing AND amass a minimum level of playing time (any minimum).
All three of these require different projections (with increasing optimism). All of them but the first one are a fudge. And while the first one is the most "accurate," it will also fare the worst in any kind of testing.
I'll let the readers figure out what this means and why it's all true!
Good job, Rob. I really would have liked to see a control sample of relievers who had similar <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=ERA" onmouseover="doTooltip(event, jpfl_getStat('ERA'))" onmouseout="hideTip()">ERA</a></span>s (the means should be about the same) but who did not qualify for your group.
Otherwise we don't know that Pecota isn't under-projecting ALL relievers.
Trust me, it's easy to have too high (bad) of a projection for bad pitchers. The reason is that teams are more likely to bring back the pitchers whom THEY know (but WE don't) are better than the ones whom they don't bring back.
So, if 30% drop out of any sample after a bad year, it is likely that those 30% would have performed worse than the 70% who did not drop out even if they had the same projection and even if they had the same overall stats in year 1.
Again, that is because the teams "know" things that the projection models don't. That's especially true if the projection models don't use minor league stats. The pitchers that live to see another year likely have better minor league stats than the pitchers who don't even if they have the same MLB stats.
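A quick simulation can illustrate this survivorship effect. All the numbers below (the talent spread, the observation noise, the 70% retention rate) are invented for illustration, not real baseball figures:

```python
import random

random.seed(1)

# Each pitcher: a true RA9 talent plus a noisy observed year-1 RA9.
pitchers = []
for _ in range(10_000):
    talent = random.gauss(4.50, 0.50)
    year1 = talent + random.gauss(0, 1.00)
    pitchers.append((talent, year1))

# Pitchers whose year-1 stat lines look equally bad:
bad_year = sorted((p for p in pitchers if p[1] > 5.00), key=lambda p: p[0])

# Teams "know" things the projection model can't see (minor league
# track record, scouting) and retain the best 70% of the bad-year group:
kept = bad_year[: int(0.7 * len(bad_year))]

mean_talent_all = sum(t for t, _ in bad_year) / len(bad_year)
mean_talent_kept = sum(t for t, _ in kept) / len(kept)
print(round(mean_talent_all, 2), round(mean_talent_kept, 2))
# The retained 70% are truly better (lower RA9) than a projection
# built from the whole bad-year group's stats alone would imply.
```

The pitchers who drop out would have dragged the group's next-season average down, so judging the projection only against survivors makes it look pessimistic.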
To be honest, <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=PECOTA" onmouseover="doTooltip(event, jpfl_getStat('PECOTA'))" onmouseout="hideTip()">PECOTA</span></a> historically has not had the best projections, so I would not be terribly surprised if they under-projected all their relievers who had bad seasons.
In my piece, I probably should NOT have said that I had no Bayesian prior, as I often do have one when doing these kinds of analyses.
It might make sense that when we have a group of pitchers who have had a few disastrous starts, SOME of those pitchers had something really wrong with them, to SOME extent, during those starts, as compared to a group of pitchers who have not had any disastrous starts. If that were the case, we would expect the first group to outperform their projection, assuming that they are not somehow constitutionally inclined to keep having a few outings where something is really wrong with them.
Nah, you can do it (channeling the right-wing jerk Rob Snyder)!
Here is what I said at the end of my piece:
"Our certainty of this conclusion, especially with regard to the size of the effect – if it exists at all – is pretty weak, given the magnitude of the differences we found and the sample sizes we had to work with. However, as I said before, it would be a mistake to ignore any inference – even a weak one – that is not contradicted by some Bayesian prior (or common sense)."
Basically I found a .28 run difference between the projected and actual RA9 NEXT SEASON for starters with exactly 2 or 3 terrible outings in one season. For all other similar starters, I found only a .03 difference in the same direction (actual was BETTER than projected).
.25 or .28 runs is a little less than 2 SD for the number of <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=IP" onmouseover="doTooltip(event, jpfl_getStat('IP'))" onmouseout="hideTip()">IP</span></a> (3,500) in the experimental group (the ones with the terrible outings).
Given that I don't really have any Bayesian priors (which is rare in analyses like this), a 2 SD difference is, well, you decide. And remember, when we find a 1 or 2 (or 3 or whatever) SD effect, it's not a binary thing, like we accept or reject that difference or the hypothesis that there IS a true difference. We're trying to find out what the magnitude of the true difference or size of the effect IS, if there is any at all (and what does NO effect even mean - if the true effect is .01 runs per 9, is that NO effect? What about .02? .05?).
If we find a 2 SD effect or difference (from a control group or the null hypothesis), that means an effect of that size or more is unlikely to occur by chance. But what if the true effect did exist but were small? Then that 2 SD difference is more likely to have occurred partly by chance. If the true effect were half that size, then we are left with a 1 SD random component, which occurs quite frequently (16% in one tail).
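As a rough sanity check of the "a little less than 2 SD" figure above: the per-inning run variance of ~1.0 used here is my assumption (runs per inning are overdispersed relative to a Poisson with mean ~0.5), not a number from the piece:

```python
import math

ip = 3500            # innings in the experimental group
diff = 0.28          # projected-minus-actual RA9 gap from the piece

# ASSUMPTION: variance of runs allowed per inning is roughly 1.0
var_per_inning = 1.0

sd_ra9 = 9 * math.sqrt(var_per_inning / ip)  # SD of RA9 over ip innings
z = diff / sd_ra9
print(round(sd_ra9, 3), round(z, 2))  # ≈ 0.152 and ≈ 1.84
```

With that assumption, a .28-run gap over 3,500 IP comes out to about 1.8 standard deviations, consistent with the comment.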
So be careful when drawing conclusions from empirical tests like these. That's all I'm saying.
First of all, you can't bet "against" any of these Sportsbook pennant odds (unlike win totals, where you can bet over OR under), so the books add in a lot of built-in juice, which protects them from being wrong.
Because of this, they'll also tend to be more wrong below the line than above it because a bettor cannot exploit anything below the line.
10 or 20 years ago, Sportsbooks could and would mostly concern themselves with anticipated betting patterns. That is no longer the case. The reason is that there are legions of very smart, well-financed bettors out there (not so much with these season-long bets, though), so the books have to put out good lines or they will get exploited.
For example, say they anticipate equal action on both sides of a Yankee game where they post a line of Yankees 2-1 odds (-210 on Yankees), but they know that the Yankees should only be, say, 5-3 favorites (around -175). In the old days they could get away with it. Can't anymore. Tons of smart bettors will pound the other side at +190 until the line moves to around where it should have been in the first place. The public becomes irrelevant because the proportion of their money to the total market, including the smart bettors is so small, compared to the old days.
So, yes, the market will move based on Pecota numbers, as it already moved when the ZIPS and Steamer numbers came out. It's early, so it might not move much until we get closer to the season. But by the time we get close to the season, the opening numbers will have moved tremendously to reflect the projections we see online. That is especially true of the win totals, because bettors can bet both sides and the spread is "only" 20 or 30 cents (around 5-7% "juice"). I don't know how much juice or vigorish is built into these pennant lines, but I imagine it's a lot (maybe 10% or more). You can figure it out approximately by seeing how much return you get on your money if you bet every team, using either the book's implied odds of each team winning or someone else's (like Pecota's) - it doesn't matter that much.
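The juice calculation sketched above amounts to converting each price to an implied probability and summing. Here's a minimal sketch, using the -210/+190 game line from the earlier example:

```python
def implied_prob(american):
    """Convert American odds to the implied (vig-inclusive) probability."""
    if american < 0:
        return -american / (-american + 100)
    return 100 / (american + 100)

# The -210 / +190 two-way line from the example above:
overround = implied_prob(-210) + implied_prob(190) - 1.0
print(round(overround * 100, 1))  # built-in juice, in percent

# For a pennant market, sum the implied probabilities of all teams;
# betting every team returns 1 / that sum, so the vig is the sum minus 1.
```

A two-way line like this carries only a couple points of vig; a one-sided pennant market can hide far more, which is the point of the comment.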
Very good, thanks for the reply.
Russell, nice article, and I am glad you critiqued this paper. I was not very impressed with it, for exactly the reasons you articulate.
I surely would like to see your results converted to expected runs, i.e., how the change in the various events would affect runs per game, using linear weights.
I'd also like to see simply W/L records and <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=RS" onmouseover="doTooltip(event, jpfl_getStat('RS'))" onmouseout="hideTip()">RS</span></a> and <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=RA" onmouseover="doTooltip(event, jpfl_getStat('RA'))" onmouseout="hideTip()">RA</span></a> numbers for jet lagged and non-jet lagged teams overall and for west to east and east to west, adjusted for H/R, team strength, schedule, and era of course.
Just listing a bunch of significant variables with no numbers is not very helpful or interesting to be honest. I mean we don't really care about <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=GB" onmouseover="doTooltip(event, jpfl_getStat('GB'))" onmouseout="hideTip()">GB</span></a> or FB rates other than how that might affect run scoring.
Also, you talk about them making Type I errors. However, they also used Benjamini–Hochberg false discovery rates. Isn't that supposed to account for that?
Off the top of my head, if I had a choice between command f/x, with all of its weaknesses, and CSAA (which is much "cleaner") for identifying and quantifying "command," I think I would much prefer the former. "Cleaner and more precise" doesn't necessarily mean you're capturing something more accurately than some messy or brute-force type of metric like command f/x.
Then again I don't think they are exactly capturing the same thing, so the results of one or the other might be more or less useful for one inquiry or another.
I am not at all convinced how much of your definition of command is captured in pitcher CSAA. Certainly some, but it is not clear, to me at least, how much and to what extent what it does capture is useful. I do like the analysis and methodology though.
Thanks for the clarification (or more accurately, correcting my misunderstanding - your definition of <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=CS" onmouseover="doTooltip(event, jpfl_getStat('CS'))" onmouseout="hideTip()">CS</span></a> Prob was clear) and your other comments!
Does CSAA for pitchers control for umpire and catcher?
Throughout the article you keep saying <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=CS" onmouseover="doTooltip(event, jpfl_getStat('CS'))" onmouseout="hideTip()">CS</span></a> prob is "called strikes per pitch." As in:
"That is, any given pitch thrown by Colon has more than a 50 percent chance of being called a strike."
"That means Colon is getting two extra strikes PER 100 PITCHES simply because the strike zone that is called isn’t the one the rulebook lays out."
CS Prob is percentage of strikes CALLED per CALLED pitch right?
That being said, one of the difficult issues with this model and analysis is that the quality of a pitcher's stuff affects whether a pitch is swung at or not, and thus the pitches not swung at are a selected sample of pitches that are inherently related to stuff. I'm sure you guys are aware of this. For example, for a pitcher who has nasty stuff, the pitches that are not swung at may tend to be well away from the K zone, giving the impression that the pitcher has poor control even if he doesn't. If a pitcher does not have good stuff, batters may be able to lay off pitches just outside the zone, giving the impression that the pitcher has better control than he does have, and perhaps better command (IDK about that).
You might want to consider ALL pitches for a control metric. In fact, I'm not sure why you wouldn't. Why wouldn't you?
As far as the definitions of, and differences between, command and control, YOU can define them any way you want. There are no canonical definitions for those words. The general way the authors have chosen to define them is useful and reasonable and exactly the way I like to define them. Command is simply how close a pitcher can come to his intended location (which tends to vary a lot with pitch type, even within pitchers). Control is simply how many pitches end up in the zone. Control is generally a function of intention and command ability. If I have poor command and great stuff, I am going to just fire balls near the center of the zone and have good control. If I have poor command and poor stuff, I am going to be forced to stay away from the middle of the zone, and because of my bad command my control will be awful (and I'm probably an awful pitcher). You can go through all the possible iterations of "stuff" and "command" and you will arrive at a pretty good projection for control.
But again, if you're measuring control (and perhaps command) using only called pitches you are going to run into some trouble.
Fair enough on both counts.
Although to the first point, if you merely ran a KR-21 or Cronbach's alpha on large samples of batter data alone (against all pitchers), even if that data spans 7-8 years, surely you are going to get a pretty high reliability, no? Not only does talent not change THAT much over time, but if it changes systematically that won't even affect the correlation, no? In fact, I'm guessing that if you controlled for underlying sample sizes, you would get around the same reliability whether you used batters alone or batter/pitcher matchup data. The batter/pitcher matchup data is really just a proxy for batter-versus-any-pitcher data, with smaller sample sizes (assuming the null hypothesis that batter/pitcher matchups are nothing more than log5).
Surely you should have split the batter/pitcher matchups by handedness platoons though. Couldn't that alone account for most or all of the "1-part" batter/pitcher matchup in your regression?
Again my guess is that between handedness and <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=G%2FF" onmouseover="doTooltip(event, jpfl_getStat('G/F'))" onmouseout="hideTip()">G/F</span></a> platoon, that will account for virtually all of your "1-part batter/pitcher matchup."
Technically you COULD call the G/F platoon stuff part of a "batter/pitcher" matchup even though I don't think most people look at it that way.
The ultimate question is how much would you regress a specific batter/pitcher result in X number of <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=PA" onmouseover="doTooltip(event, jpfl_getStat('PA'))" onmouseout="hideTip()">PA</span></a> toward the log5 expectation? And my guess would be about what we have always thought which is close to 100% for any reasonably small sample and maybe 95% for any really large sample.
Which pretty much means what we thought it meant, which is, as with clutch, "You can ignore it other than as a tie breaker."
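The "regress nearly 100% toward log5" rule described above might be sketched like this. The .95-1.00 regression figure is the comment's guess, not an established constant, and the example rates are made up:

```python
def log5(p_batter, p_pitcher, p_league):
    """Odds-ratio (log5) expectation for a batter/pitcher matchup."""
    num = (p_batter * p_pitcher) / p_league
    den = num + ((1 - p_batter) * (1 - p_pitcher)) / (1 - p_league)
    return num / den

def matchup_estimate(observed, p_batter, p_pitcher, p_league, regress=0.98):
    """Regress an observed head-to-head rate almost all the way to log5.
    In principle the regression amount would shrink (slightly) as the
    matchup sample grows; 0.98 here is purely illustrative."""
    expected = log5(p_batter, p_pitcher, p_league)
    return regress * expected + (1 - regress) * observed

# A .300 hitter vs. a pitcher who allows .280, in a .260 league:
exp = log5(0.300, 0.280, 0.260)
print(round(exp, 3))  # ≈ .322
# Even a .400 observed rate in their head-to-head PA barely moves it:
est = matchup_estimate(0.400, 0.300, 0.280, 0.260)
```

That is the "tie breaker" point in code: the matchup history shifts the estimate by only a point or two.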
No one ever said it means NOTHING. Nor should anyone say that. We don't say that about clutch, protection, chemistry, etc. We simply follow the evidence. So far I don't think we've found any significant evidence of a "batter/pitcher effect," and I'm not sure that you have changed that, to be honest, for the reasons we have discussed. You have merely opened up the inquiry again.
Two comments, the second one similar to some of the ones already above.
One, your first inquiry, looking at reliability of pitcher/batter matchups - what is that telling us? Of course they will be just as reliable as regular ole pitcher and batter stats. You're just capturing the talent in the batters and the pitchers (the log5 result) but not any "extra" information from "specific batter/pitcher" matchups.
As far as your second inquiry, again, of course batter/pitcher specific matchups will give you more information than log5 alone if you did not break it down by platoon handedness! How could you not do that?
Even if you did, it would still give you extra information. And that is because log5 is only an approximation of the result of a batter/pitcher matchup using limited variables.
The actual result of a batter/pitcher matchup, for example K rates, requires more variables, namely <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=G%2FF" onmouseover="doTooltip(event, jpfl_getStat('G/F'))" onmouseout="hideTip()">G/F</span></a> ratios of batters and pitchers.
As well, log5 doesn't work that well at some of the extremes because the outcome actually reflects more on either the batter or pitcher whereas log5 always assumes equal contribution. Log5 also assumes independence between the batter and pitcher rates, which isn't true in most cases.
So basically the reason you find that batter/pitcher results add to your regression is that log5 simply doesn't use all the relevant information and doesn't treat what it does use properly. The batter/pitcher matchup results pick up some information that log5 does not.
The real question is whether batter A versus the universe of pitcher B's where pitcher B's were all the same in terms of handedness and G/F ratio (and maybe a few other things) is always the same regardless of the different results of each batter/pitcher matchup in that universe. I don't think you addressed that question at all.
I wouldn't get too hung up on the outliers in your graph above. It looks like a linear relationship, but I would plot all players and then look at my regression, especially the R^2. I don't know how many players are in each of your deciles, but probably not many, which means that the differences among those deciles are not going to be very reliable.
Even with 20 players in each decile (around 8000 <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=PA" onmouseover="doTooltip(event, jpfl_getStat('PA'))" onmouseout="hideTip()">PA</span></a>), one binomial standard deviation of the difference from one year to the next is around 7 or 8 points! Expect to see lots of dots all over a chart like that, by chance alone.
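The "7 or 8 points" figure is easy to check with the binomial formula. The .260 rate and the 20-players-times-400-PA decile size are the illustrative numbers from the comment:

```python
import math

p = 0.260   # a TAv-like rate (illustrative)
pa = 8000   # ~20 players x ~400 PA in one decile

sd_one_year = math.sqrt(p * (1 - p) / pa)   # SD of one season's rate
sd_diff = sd_one_year * math.sqrt(2)        # SD of a year-to-year change
print(round(sd_diff * 1000, 1))             # in "points" → 6.9
```

So roughly 7 points of year-over-year scatter is expected from chance alone at that sample size.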
Also in 2016 with the ball going significantly further than in prior seasons for some reason, it is probably better to put the ball in the air compared to if the ball were the same. For whatever that is worth as far as conclusions are concerned (IDK).
Tango had an interesting thread on his blog regarding your approach and the problems associated with selective sampling. Did you read it?
He's right in that, to sum up his argument in a few words, a group of players who changed their approach from one season to the next (it has to be a deliberate change, not the result of something like an injury or age) - and you have identified this change - will tend to have gotten lucky, whether or not the change had any positive or negative effect on their true talent.
He explains the reason. After, say, a month, if the results are bad (which will always be a combination of luck and skill, but mostly luck in only a month), players in general will have a tendency to switch back to the old approach.
The ones who continue will have had a tendency to have gotten lucky in the first month.
This pattern will continue throughout the season such that players will "drop out of the new approach" all along the way if their prior performance was bad (mostly unlucky).
The net result is that all players who continued with this new approach for most or all of the season will have had a somewhat lucky season, by definition. This is unavoidable and incontrovertible.
It is exactly the same dynamic as looking at players with the most playing time. The more the playing time (mostly for non-established players - established ones are not as sensitive to the whims and decisions of management in terms of playing time), the luckier the season (and the greater the talent, of course).
Any time someone makes a decision about what to do based on X (playing time, change of approach, eating chicken, taking steroids, etc.) if you find out that they did or took X for most or all of a season, they will have had a lucky season and will regress in the next season whether they continue to do or take X or not.
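Here is a minimal simulation of that drop-out dynamic. Every simulated player has identical true talent, so any gap between those who stuck with the "new approach" and those who abandoned it is pure selection; all rates and thresholds are invented:

```python
import random

random.seed(7)

def season(true_talent=0.260, months=6, threshold=0.250):
    """One player's season under a 'new approach'. After each month except
    the last, he abandons the approach if his running average looks bad.
    Returns (stuck_with_it, running_average_when_he_stopped_or_finished)."""
    results = []
    for m in range(months):
        # ~100 PA per month at a true "success rate" of true_talent
        hits = sum(random.random() < true_talent for _ in range(100))
        results.append(hits / 100)
        avg = sum(results) / len(results)
        if m < months - 1 and avg < threshold:
            return False, avg  # quits: results (mostly luck) looked bad
    return True, avg

survivors, quitters = [], []
for _ in range(5000):
    stuck, avg = season()
    (survivors if stuck else quitters).append(avg)

print(round(sum(survivors) / len(survivors), 3))
# Survivors run above their .260 true talent: the approach did nothing,
# but sticking with it all season selected for early good luck.
```

Any study that identifies "players who maintained a change all year" is implicitly conditioning on this kind of luck.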
So you sampled players for large changes. Presumably players with large changes kept up those changes for most or all of the season. So you have selectively sampled these players such that their season 2 would be lucky seasons. So you should see an increase in <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=TAv" onmouseover="doTooltip(event, jpfl_getStat('TAv'))" onmouseout="hideTip()">TAv</a></span> in the second season, not the decrease you found (although we don't know if it's a true "decrease" or not, which is why we really need projections or a control group).
Not only that but there is another factor at work in your selective sampling. (I don't mean to imply that you made a mistake in sampling these players - you didn't. You just need to recognize it and adjust for it if you can.)
Players who deliberately change their approach from one year to the next will tend to have had a bad first year! Otherwise they would be less likely to want to change their approach. So both year 1 and year 2 are biased in terms of results. Year 1 will be unlucky and year 2, lucky. More reason for there to be an even larger increase from year 1 to year 2.
I suspect that the reason you found a decrease was a combination of aging (there will always be a decrease from any year 1 to year 2 unless your sample is very young players - that's especially true when you include survivorship bias) and injury.
However, to see to what extent selective sampling is biasing the results, in this case TAv in years 1 and 2, try this:
Look at same players (make sure you always weight by each player's <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=PA" onmouseover="doTooltip(event, jpfl_getStat('PA'))" onmouseout="hideTip()">PA</span></a>) in year 0 (2014 in this case), and year 3 (you can't do that here, but if you switch your study to changes from 2014 to 2015, you can then look at 2016).
Year 0 will be an unbiased estimate of these players' talent before the change, and year 3 - 2017 (or 2016 if you study 2014-to-2015 changes) - will be an unbiased estimate of their talent after the change (with the assumption that the change was permanent, to some degree at least). You just have to make sure that you include aging effects OR that you compare to a control group (such that the control group will have the same aging effects). For example, ALL players, unless you select (or happen to select) very young ones, will have better stats in year 0 than in year 1, and worse stats in year 2 than in year 1 and in year 3 than in year 2.
I just want to reiterate that, my comments and suggestions notwithstanding, I really like this study and your basic approach.
First question: no, I don't think so. How well or poorly these players performed compared to their projections has nothing to do with the issue you are addressing. Again, you can look at individual players to decide if there's a compelling reason not to include them in your overall sample, but I strongly advise against doing that.
Second question: the overall average is the only thing that matters in most cases. The only reason you are using multiple players in most of these analyses is to increase sample size. For all practical purposes they might as well be one player. You are including these players because you have determined that they're part of the population you are studying. Period. They are now simply N1, N2, N3, etc.
Making any inferences based on individual player results is almost (not quite, but close) like doing a study where you have a homogeneous sample of size N and then you arbitrarily and randomly break it up into multiple small samples and then make inferences about one or more of those small sub-samples. You can't do that.
You're investigating a certain effect within a population of players, period. The results of the individual players within that population mean nothing. You have to get out of the habit of thinking that the individual results mean anything at all or even reporting them. It's nice and it lends a "familiar air" to your research but it's not necessary. I almost never report on the individual players in a study or even look at them myself and I have done hundreds of these.
I mean, if you look at individual players and you determine there is something about one or more of them such that they shouldn't be included in your sample, then remove them, although that's a dangerous path to go down, as I indicated in my last comment.
Basically, think of your players as just random sub-samples of your overall sample. The problem with inferring anything from individual results, besides the fact that it's just not mathematically or methodologically correct to do so, is that you're simply going to get all kinds of random anomalies among those sub-groups. That is exactly what you are trying to avoid by including as many players as you can that "fit the bill." Your entire point is to have all those random fluctuations "even out" by combining them. Bringing individual results up is exactly what you don't want to do.
In most cases, as in your second example, the distribution of individual results will have nothing to do with the effect you are trying to analyze. It will have everything to do with the number of individuals in your sample and each of their underlying sample sizes (like <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=PA" onmouseover="doTooltip(event, jpfl_getStat('PA'))" onmouseout="hideTip()">PA</a></span> in the season in question). In your example, the only reason you have 8 players at +5% and 2 at -25% is that you only have 10 players, and you expect lots of random fluctuation within 10 players. Had you had 1,000 players, it would be pretty likely that half would be on one side of your mean and half on the other.

Why would you want to report some random fluctuation within your overall sample? If I were testing a coin and I had 9 people helping me, and each of us flipped the coin 100 times, would it help my inquiry to report, "8 of those people favored heads and 2 favored tails"? Of course not. You would simply report the results of 1,000 flips. Nothing about your analysis changes based on how you split up the 1,000 flips, and it's presumably the same with the multiple players in most research like this. Each player is merely 100 flips of a coin. There is no difference among the players. You might as well split your overall sample up by age, or months of the year, or first half/second half.

Presumably every player in your sample belongs to the same population you are studying AND is equally likely to be affected by whatever it is you are looking for. If not, then the methodology of your study is poor to begin with and you shouldn't be combining these players in the first place!
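The coin example is easy to verify directly:

```python
import random

random.seed(3)

# One fair coin, 1000 flips total:
flips = [random.random() < 0.5 for _ in range(1000)]
overall = sum(flips) / 1000

# Arbitrarily hand 100 flips to each of 10 "helpers":
helpers = [flips[i * 100:(i + 1) * 100] for i in range(10)]
heads_leaning = sum(1 for h in helpers if sum(h) > 50)

print(round(overall, 3), heads_leaning)
# The overall rate is the evidence; how many arbitrary sub-samples
# "favored heads" is just noise in how the same data was partitioned.
```

Re-partition the same 1,000 flips any other way and the sub-sample tallies change while the evidence does not.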
Anyway I probably beat a dead horse!
Sounds good! Looking forward to seeing more work in this area. Yes, the means for full-time and part-time players will be different; however, one has to be careful about putting players in one group or the other, and also about establishing the means of each group. To some extent players get into the full-time group because they have over-performed (relative to their talent). Always remember that when establishing means for purposes of regressing, the means MUST be the actual true talent of that group. There are many ways to assign means to groups that are NOT representative of true talent, because the groups themselves were determined by random over- or under-performance.
One more thing. Try not to ever get caught up in "Well, if I eliminated this or that person from the sample." It is rarely productive to even think that way, let alone express it in a research piece, and it can lead to bad research. I highly recommend never going down that path. Consciously or not, it can and will lead to skewing results to lead to a desired end.
I would also recommend not going down this path either: "6 out of 10 of my sub-samples did this even though..." Unless there is a compelling methodological reason to treat them differently, or as separate, meaningful entities, there is no statistical justification for even mentioning separate samples within a group. It is actually a red flag for amateur or bad research. In most cases, if a sample of players (or any entity) points in one direction, it makes absolutely no difference if 3 out of 10 or 7 out of 10 of the individual players point in one direction or the other, and pointing that out can only serve to mislead or influence the reader to go in a direction that he shouldn't be going. Again, unless there is a methodological or mathematical reason to do so - in which case you must articulate what that is.
Of my last 2 points, you wrote:
"Now, granted, six of the 10 improved, and the overall average swings to positive .003 if you exclude <span class="playerdef"><a href="http://www.baseballprospectus.com/card/card.php?id=66018">Bryce Harper</a></span>’s fall from greatness to really goodness."
You should not be writing that, again, without articulating a reason (which there probably isn't) why it might make a difference or be germane to the analysis.
I like this study a lot, Rob. One thing you have to be careful about is comparing changes from one year to the next without taking regression toward the mean into consideration. This will especially wreak havoc when one or more of your groups happen to be well-above or below-average players and you don't have any reason to expect that bias to continue. That may or may not be the case with your study.
The best ways around that are to do one of two things. One, use a control group of around the same performance level. Two, use projections as your baseline. The latter is often best.
For example, in your first chart you get a decline in TAv of 5 points. But is that really a decline? You have to compare it to a projection for that group of players. A group of players who hit .292 for one season would actually be expected to hit maybe .280 the next season, with aging, if the mean of their population were .260 (I don't know if it is or isn't - it's probably higher, for several reasons). So hitting .287 the next season is actually BETTER than expected.
The second group probably hit around what was expected the next year so there really wasn't a "decline" there either. Again I don't know what their projection would be so take those statements as merely illustrative of my overall point.
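The back-of-the-envelope projection described above can be sketched like this. The .260 population mean, the 0.7 one-season reliability, and the aging adjustment are all assumptions for illustration, as the comment itself stresses:

```python
def project(observed, population_mean, reliability, aging=0.0):
    """Regress an observed rate toward its population mean, then apply
    an aging adjustment. Reliability and aging values are assumed."""
    return population_mean + reliability * (observed - population_mean) + aging

# The .292 group, regressed toward an assumed .260 mean with a
# few points of aging decline:
proj = project(0.292, 0.260, reliability=0.7, aging=-0.004)
print(round(proj, 3))  # ≈ .278, so a .287 follow-up beats expectations
```

The point is structural, not the exact numbers: the baseline for "decline" must be the regressed-and-aged projection, not the raw year-1 figure.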
Anyway nice work!
While we have little idea how much noise is present in that 58%, I think it is perfectly reasonable for the average superior team to be 16% better than its opponent for various reasons. I think we can easily simulate seasons to test that. The only thing we really need is an assumption about the spread in talent among teams. Interesting exercise for an aspiring saberist!
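A sketch of that simulation, using log5 for head-to-head win probability; the .060 SD of true team talent is an assumption, exactly the input the comment says you'd need:

```python
import random

random.seed(11)

# ASSUMPTION: true team talent (win%) is normal with SD ~.060
talents = [random.gauss(0.500, 0.060) for _ in range(30)]

def win_prob(a, b):
    """Log5 probability that a team with talent a beats one with talent b."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))

games = superior_wins = 0
for _ in range(100_000):
    a, b = random.sample(talents, 2)
    fav, dog = max(a, b), min(a, b)
    games += 1
    superior_wins += random.random() < win_prob(fav, dog)

rate = superior_wins / games
print(round(rate, 3))  # fraction of games won by the truly better team
```

With a talent spread in this neighborhood, the superior team wins somewhere in the mid-to-high 50s percent of the time, which is at least in the same ballpark as the 58% figure being discussed.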
"The highest salaries in the game are just barely over $30 million, but the best players routinely crack the 8-WAR barrier."
No. Players are paid based on their projected WAR. You know that. The top 10 players in projected WAR for 2017 average about 6 WAR, so the top 10 players would average $48 million on the FA market if we believed in $8mm/WAR. That the best players get around 8 WAR after the fact is irrelevant. Actually, only two players cracked the 8-WAR mark in 2016, three in 2015, and none in 2014, so "the best players" is limited to around two a season.
Let's simplify the gory math for closers. Take expected average LI pitched by closers (2.5?) or any pitcher for that matter and multiply that by <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=IP" onmouseover="doTooltip(event, jpfl_getStat('IP'))" onmouseout="hideTip()">IP</span></a>. That's all you need to do.
Doesn't WAR already do that for relievers/pitchers? I honestly don't know.
Great stuff. Agree 100%. This is a no-brainer. Not that I don't like <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=DRA" onmouseover="doTooltip(event, jpfl_getStat('DRA'))" onmouseout="hideTip()">DRA</span></a> (I do, other than the fact that it's a black box to anyone other than a high level statistician), but the exact same conclusion could be reached by merely looking at team <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=BABIP" onmouseover="doTooltip(event, jpfl_getStat('BABIP'))" onmouseout="hideTip()">BABIP</span></a>, as you did, or team UZR (or DRS). Team UZR (actually a regressed version of it) directly comes off a pitcher's RA9 or <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=ERA" onmouseover="doTooltip(event, jpfl_getStat('ERA'))" onmouseout="hideTip()">ERA</span></a> to establish his role in preventing runs.
As you said, the "prediction" versus "what actually happened" is not at all relevant to this discussion. ERA minus defense, or DRA, tells us EXACTLY what happened. That it also happens to be an excellent predictor is only by accident (any stat that captures "responsibility" (talent) will tend to be a good predictor, especially when context changes, like when a pitcher moves to another team or park or the defense or catcher changes).
Anyway, good and correct reasoning and presentation. By all rights, Scherzer was a better, probably much better, pitcher this season than Hendricks or Lester because of the reasons you correctly articulate.
Also, the notion that a pitcher "takes advantage of good defense" is 98% nonsense. To the extent that a pitcher might deliberately allow more BIP with a good defense behind him, or fewer with a bad defense, it is de minimis. In addition, ALL pitchers would (and should) do that. If one were to do a study where a pitcher moves from a team with bad to good (or vice versa) defense, I'm pretty certain that the % of BIP he allows would not change by very much. It's simply not possible (or prudent) for it to change much. Yes, the pitching approach should change a little (based on quality of defense), but not by much.
It took the voters over 50 years to discount wins in the CYA. I'm guessing it will take another 50 years to "factor out" defense and pitch framing. And they still don't fully understand park factors (or know which ones to use).
BTW, in citing raw ERA for these players, shouldn't you at least have mentioned or adjusted for park effects? I know that's included in DRA, but you still should have noted that even absent a significant difference in team defense between WAS and CHC, the Cubs' home stadium is a large hitters' park and WAS is a moderate pitchers' park. Using my park factors, that creates a difference of .13 runs per 9 after park adjusting (in favor of the Cubs pitchers).
For those commenters who simply can't wrap their heads around why team defense has nothing to do with pitching, consider the worst pitcher in baseball pitching in a park where any BIP is considered an out. Should he win the CYA with an ERA of nearly zero? What if he's smart and doesn't try to strike anyone out or walk anyone? (As I said, all pitchers would do that - you don't have to be "smart".)
"There is a somewhat persuasive counter-argument: you have to get to Game 7. Even if Cleveland’s chances to come back and win Game 6 with more conventional bullpen management were low as of Chapman’s entry--Fangraphs’ live win probability estimated them at 3.1 percent--that low chance definitively ends the Cubs' season if it hits."
It's not a persuasive argument at all.
First, the only way for the Cubs to win the WS is to win 2 games. So the correct strategy is always to maximize the chances of winning both games. If you increase the chances of winning game 6 by X percent (even if it were to take you to 100%) and at the same time decrease your chances in game 7 by X plus something, that's an incorrect strategy.
Second, the entire point of leverage is that it tells you the impact of one player over another. While the Cubs at one point had a generic 97% chance of winning, using Chapman rather than another reliever does not make it 100% or even 98%. Using him barely moves the needle at all. It changes that 97% to something like 97.1%.
That's why it was 100% (to keep with the percentages theme) incorrect to use him. One can argue whether using him will have much of an effect on tonight's game (and I think one would have to conclude that it will have SOME effect - either in effectiveness, longevity, or both), but the important point is that there was virtually NO gain from using him in game 6. So even if your "counter-argument" (that you MUST win game 6 to get to game 7, therefore...) were correct, which it is not, you're still not gaining anything, since using him in game 6 barely increases your chances of winning said game as opposed to another relief pitcher.
That was especially true as Maddon brought Chapman in to face Napoli and Ramirez, both RHB!
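The two-game arithmetic behind the first point can be sketched with made-up numbers (none of these probabilities are estimates):

```python
def series_win_prob(p_game6, p_game7):
    """You must win BOTH remaining games, so the series probability is
    the product. Any game-6 gain that costs more in game 7 is a loss."""
    return p_game6 * p_game7

# Illustrative numbers only: burning the closer buys a tiny bump in
# game 6 but a bigger cost in game 7.
rest_closer = series_win_prob(0.970, 0.500)
burn_closer = series_win_prob(0.971, 0.480)
print(rest_closer > burn_closer)  # True
```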
Russell, I doubt there's a "second inning penalty" for relievers. Your results are likely a result of selective sampling.
A sample of relievers who pitched 2 innings will be populated by pitchers who pitched exceedingly well in the first inning (hence the ridiculously low Ra9).
Try this: look at starters who pitched complete games and tell me their <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=RA" onmouseover="doTooltip(event, jpfl_getStat('RA'))" onmouseout="hideTip()">RA</span></a> in the 8th and 9th. Starters who pitched 8 innings, look at their 7th versus 8th. Etc. You choose any time period where a manager has a choice of letting the player continue or not and you're always going to find the post-decision performance to be substantially worse than the pre-decision performance.
I'm sure I'm not telling you anything you didn't already know. You talked about it in terms of completing the second inning. It is true that there's also a selective sampling effect in the second inning (only those who pitch well in the second inning are allowed to complete it) but I'm guessing it's not nearly as large as the first to the second.
It is almost impossible to compare back to back innings for relievers because of this selective sampling phenomenon. Just looking at the performance in those 2 innings tells us nothing.
Why don't you look at all performance in 1 inning including relievers who pitched in 1 inning only and those who pitched in more than one? That's an unbiased sample. Then look at all second inning performance. That's also an unbiased sample. Then compare the two adjusting for the quality of the pitchers in each sample. That's what you're looking for. Your way is too problematic. We don't know the extent of the inning 1 to inning 2 selective sampling effect versus the start inning 2 to finish inning 2 effect.
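The proposed comparison could be sketched like this, with toy data; the point is the shape of the calculation, not the numbers. (The second-inning sample is selected on first-inning results, but per the argument above, that selection shouldn't bias second-inning performance once pitcher quality is adjusted for.)

```python
from statistics import mean

def quality_adjusted_inning_means(appearances, pitcher_avg):
    """appearances: (pitcher_id, inning1_runs, inning2_runs or None if
    the pitcher was not sent back out). Compare ALL first innings to
    ALL second innings, subtracting each pitcher's seasonal runs per
    inning so the differing quality mix of the two samples is
    controlled for."""
    inn1 = [r1 - pitcher_avg[p] for p, r1, _ in appearances]
    inn2 = [r2 - pitcher_avg[p] for p, _, r2 in appearances if r2 is not None]
    return mean(inn1), mean(inn2)

# Hypothetical data: pitcher A (0.40 R/inn) is better than pitcher B (0.60)
avgs = {"A": 0.40, "B": 0.60}
apps = [("A", 0, 1), ("A", 2, None), ("B", 0, 0), ("B", 3, None)]
print(quality_adjusted_inning_means(apps, avgs))
```

A real version would use runs per inning over many appearances, but the structure is the same: two unbiased samples, each centered on the pitchers who populate it.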
No. Myers, by changing his stance/approach, is in essence telling the pitcher, "I know you are going to throw 55% sliders (rather than, say, 25%)."
He can't change that to 45% because then he would be exploitable since presumably 55% is the optimal frequency for him.
Your argument would be the same whether Myers "tipped" his approach or not. A pitcher could say, "Well, I normally throw 55% sliders in this situation, and I know the batter knows that (even if he is not "tipping") so I'll throw 0% sliders now. Ha ha! I fooled the batter."
Of course he can't do that. If he did he would no longer be throwing 55% sliders and quickly everyone would know that and he would be way suboptimal.
So again, as long as the batter is acting/thinking optimally and his tell reflects that optimal strategy, there is nothing the pitcher can do to exploit it.
In poker, for example, on the river (last card) one player can say, "I am going to call your potential bluff 75% of the time and throw my hand away 25% of the time." Assuming that those are the correct GTO percentages, there is nothing her opponent can do to exploit that.
The key is that the pitcher chooses a pitch randomly from a fixed distribution of selections. The batter merely takes an approach that reflects what he knows about the selection distribution. That is something that presumably everyone knows. He doesn't have to keep that approach a secret. Why would he? The bench coach might as well yell out, "Hey, my batter is going to have an approach that suggests you are more likely to throw a FB in this situation." The pitchers just says, "Hey tell me something that everyone doesn't know. You still don't know exactly what pitch I'm going to throw."
Assuming that he is thinking fastball when he should be thinking fastball and thinking off speed when he should be thinking off-speed, then yes, that is correct. He is not telling the pitcher anything that everyone doesn't already know.
If he is just guessing FB or not randomly or incorrectly, then yes, of course he can be exploited. But you specifically mentioned that he was thinking FB in fastball counts and off-speed in off-speed counts, at least against the pitcher you have in the video.
Even if he were incorrect in his thinking and he was exploited how long do you think that would last?
As some of the other readers commented, there is NOTHING wrong with "tipping" the pitcher as to what a batter is looking for, AS LONG AS THE BATTER IS LOOKING FOR WHAT HE SHOULD BE LOOKING FOR, which appears to be the case here.
If the count is 0-2 and a pitcher throws a slider 70% of the time in that count and that situation (and to a batter like me) then I can bend my knees or shout out to the pitcher, "Hey I'm looking slider," and there's nothing the pitcher can do about it because EVERYONE knows that the pitcher is going to throw a slider 70% of the time.
Letting the pitcher know something that HE ALREADY KNOWS YOU KNOW cannot be exploited.
This may be a little hard to wrap your head around because you might be tempted to think, "Well, as soon as he bends his knees, expecting me to throw a slider (70% at least), I'll just throw him a FB!"
That can't work because THAT would be extremely exploitable. If the pitcher did that, all the batter would have to do is bend his knees and then know he was getting a FB!
This is all about game theory which is THE essence of the pitcher/batter combo. And game theory, as I explain above, tells us that this can NOT be a tell by the batter that can be exploited.
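A toy illustration of the game-theory point, with invented run values chosen so that a 70% slider mix is the pitcher's equilibrium frequency:

```python
# Hypothetical batter run values for (guess, pitch) pairs, chosen so
# that a 70% slider mix makes the batter indifferent between guesses.
value = {("slider", "slider"): 0.05, ("slider", "fb"): -0.10,
         ("fb", "slider"): -0.01, ("fb", "fb"): 0.04}

def batter_ev(guess, p_slider):
    """Batter's expected run value for a guess against a pitcher who
    throws sliders with probability p_slider."""
    return (p_slider * value[(guess, "slider")]
            + (1 - p_slider) * value[(guess, "fb")])

# At the 70% mix the batter gains nothing by announcing his guess
# (or bending his knees): both guesses have the same expected value.
print(round(batter_ev("slider", 0.7), 3), round(batter_ev("fb", 0.7), 3))
# But a pitcher who abandons the mix ("I'll just throw him a FB!") is
# immediately exploitable by a batter who always guesses fastball:
print(round(batter_ev("fb", 0.0), 3))
```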
I was wondering if there was any research to suggest that hitting balls into the gaps was a skill (as opposed to just a pull/opp profile and then the scatter of hits within that is random)?
"Kinsler is a model of consistency, and that is by far his most valuable asset."
Why is that a "valuable asset?"
Say Player A had 40 WAR in his 20-year career with 2 WAR per year, and Player B has 40 WAR in 20 years with some large fluctuations from year to year.
Has Player A been more valuable?
Certainly for the HOF we look at peak seasons and not just career wins, right? For that reason alone Kinsler is NOT a HOF player.
Not getting this at all. According to your data:
Your top 10% in line drive rate lose 42 points in <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=OBP" onmouseover="doTooltip(event, jpfl_getStat('OBP'))" onmouseout="hideTip()">OBP</span></a> versus "elite relievers" (as opposed to all pitchers).
All other hitters lose 43 points. Virtually identical results.
Your top 10% in line drive rate lose 143 points in <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=SLG" onmouseover="doTooltip(event, jpfl_getStat('SLG'))" onmouseout="hideTip()">SLG</span></a> versus "elite relievers" (as opposed to all pitchers). All other hitters lose 131 points. Virtually identical results, although yours do a little worse.
Your top 10% PC guys lose 52 points in OBP versus elite relievers and "the rest" only lose 41 points.
Your top 10% in PC lose 148 points in SLG and "the rest" lose 130.
Doesn't look like your theory holds any water to me whatsoever.
Great interview by the way! A lot of emotion and honesty from Sam and Ben.
Right, the intuition/chemistry difference is an important one. For the most part, making decisions based on "intuition" rather than logic, science, or fact is a euphemism for, "I don't know the answer, so I'll do whatever the hell I feel like" (and it's more often wrong than right). Making seemingly "suboptimal" decisions because of chemistry or long-run concerns is a legitimate way to make correct decisions, and once they are factored into the equation and you re-define suboptimal, they often become optimal decisions (hopefully).
Here is a perfect example of what I am talking about. Shifts, I assume, are much more likely with <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=GB" onmouseover="doTooltip(event, jpfl_getStat('GB'))" onmouseout="hideTip()">GB</span></a> pitchers on the mound. Don't GB pitchers allow a higher <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=BABIP" onmouseover="doTooltip(event, jpfl_getStat('BABIP'))" onmouseout="hideTip()">BABIP</span></a>? So if the "shift" bucket is populated by more GB pitchers than the non-shift bucket, won't it appear that the shift is "causing" BABIP to go up instead of down? So, again, surely you want to tell us the <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=G%2FF" onmouseover="doTooltip(event, jpfl_getStat('G/F'))" onmouseout="hideTip()">G/F</span></a> ratio of the pitchers in the shift versus non-shift buckets.
"There are plenty of ways that I could be proven wrong, and if someone can prove me wrong, then I'm wrong."
Pizza, that's the whole point. You should NOT be drawing any conclusions about whether the shift "works" or not. So there should be nothing to prove you right or wrong about. You can tell us what happens to <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=BABIP" onmouseover="doTooltip(event, jpfl_getStat('BABIP'))" onmouseout="hideTip()">BABIP</span></a> or SlgCon or whatever, but please don't draw any conclusions, even tentative or uncertain ones, about the effectiveness of shifting. Without looking at total offense there is simply NO way to draw any conclusions about the "effectiveness" of shifts. None whatsoever. Zero.
The subtitle of your article is "slightly more convincing arguments against the shift." That's complete <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=BS" onmouseover="doTooltip(event, jpfl_getStat('BS'))" onmouseout="hideTip()">BS</span></a>. I don't know how to articulate it any better than I have. We might even EXPECT BABIP to stay the same or go up with shifts, with the assumption that batters are forced to try and go the other way more often (and thus lose power) in order to reduce the effectiveness of the shift but not eliminate it.
To be clear I have no idea or opinion on the efficacy of shifts overall or whether teams shift on too many players (e.g. they should only shift on most extreme pull hitters), but I don't think any of this BABIP data tells us anything about the overall effectiveness of shifting.
And for the record, comparing shifts and no shifts without controlling for team, pitcher, game situation, base runners, outs, and parks renders the comparison worthless in my opinion. It's not like teams are randomly shifting or not shifting so that you have a natural RCT study. There is a reason that teams shift or don't shift, and those reasons likely render the two buckets unequal in terms of expected BABIP. I would never think of comparing shift and no-shift data with the assumption that it will give me the EFFECT of the shift.
You say there is no way to control for pitchers, etc. Why did you say that? You have the pitchers, parks, fielders, etc. for shifts and no shifts, don't you? So tell us the collective UZR of all the fielders in both buckets, the <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=FIP" onmouseover="doTooltip(event, jpfl_getStat('FIP'))" onmouseout="hideTip()">FIP</span></a> of all the pitchers, the infield park factors for all the parks in both buckets, the average base runners and outs for both buckets, etc. The platoon matchups for both buckets. That's the way to control for these things, right? How can a strict comparison of the data WITHOUT controlling for all these things, possibly be reliable when teams usually have a reason for shifting or not shifting?
No one (at least me) is missing the point. That IS my point. That pitchers and batters will clearly change their approach. Which is why we need to look at a comprehensive stat.
When you compare shifts to no shifts for batters, how are you controlling for pitcher, park, game situation, etc.?
It is likely that when batters are shifted or not shifted, they face a different mix of pitchers, parks, defenses, and game situations in each bucket, and there is no reason to think that those differences don't bias the comparison.
For example, some teams heavily shift and others do not and the pitching, park, and fielding of those teams that shift a lot are probably not equal to that of the teams that don't shift a lot.
And why do you keep focusing on <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=BABIP" onmouseover="doTooltip(event, jpfl_getStat('BABIP'))" onmouseout="hideTip()">BABIP</span></a> (and slugging) and then trying to conclude whether shifting is good or bad?
Shifting clearly will impact <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=HR" onmouseover="doTooltip(event, jpfl_getStat('HR'))" onmouseout="hideTip()">HR</span></a>, K, and <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=BB" onmouseover="doTooltip(event, jpfl_getStat('BB'))" onmouseout="hideTip()">BB</span></a> (regardless of what Mr. McKay found) because clearly batters and pitchers will change their approach with and without a shift (they SHOULD change their approach).
We can't conclude whether shifting is effective or not (or roughly a wash as you keep saying) without looking at a comprehensive stat like wOBA, right? In terms of "effectiveness" we don't really care how it impacts BABIP or some other subset of offense. We only care about how it impacts overall offense, right? If it increased BABIP but it decreased wOBA because batters were going the other way more but losing power, then it would be effective despite not lowering BABIP, right?
And if you include other aspects of offense, which I think we can all agree that you must, what data are you using that indicates whether a shift occurred? Is there data that records shifts regardless of the outcome of the play? In other words, are shifts or no shifts recorded when a batter hits a HR or walks or strikes out?
If not, then you can NOT do a comprehensive analysis, and you would have no idea whether shifts are effective or not, right, since you would have no idea how they affected K, BB, HR, etc.?
Just wondering as I am underwhelmed by all of the shift analyses I have been reading...
"And in case you were in need of more evidence that intentionally walking guys is bad on the aggregate, teams that have had one player draw three free passes in a game since 1913 are 118-49, with 40 of those 49 losses coming by only one run."
That record is NOT because of the intentional walks. It is because the team that issues the <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=IBB" onmouseover="doTooltip(event, jpfl_getStat('IBB'))" onmouseout="hideTip()">IBB</span></a> is usually losing or the game is tied (or pitching team is up by a run) and there are runners in scoring position.
In other words, in those exact same games if we were to rewrite history and have the manager pitch to the batter rather than IBB him, the w/l record of those games would still be around the same.
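A quick Monte Carlo sketch of that selection effect, with invented distributions: the intentional walk does nothing in this model, yet the conditional record still comes out lopsided.

```python
import random

def ibb_record_demo(n_games=200_000, seed=7):
    """Selection-effect sketch (all distributions are assumptions):
    give each game a latent win probability for the team receiving
    the walks, and make multi-IBB games more likely when that team
    is already in a strong position. The IBB itself has NO effect on
    the outcome here, yet the conditional record looks impressive."""
    rng = random.Random(seed)
    wins = losses = 0
    for _ in range(n_games):
        p_win = rng.random()           # latent win prob of the receiving team
        if rng.random() < p_win ** 2:  # IBBs cluster in games they're likely to win
            if rng.random() < p_win:
                wins += 1
            else:
                losses += 1
    return wins / (wins + losses)

print(round(ibb_record_demo(), 2))  # roughly .75, despite the IBB doing nothing
```

That winning percentage is in the same neighborhood as the 118-49 (.707) record cited, purely from conditioning on the game state.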
I'm not getting why you are singling out the error with your example and your logic. The same can be said for the same ground ball that is turned into an out or is a hit - that it could/should be assigned a value of .42 hits (or whatever the average value of that batted ball is).
And, as you allude to, using batted ball values for hitters would be a mistake, as hitters face different fielding configurations. You would have to have a lot of custom batted ball values and use one to fit the profile of the hitter. For example a hard hit ground ball to the hole between 1st and second might have an average hit value of .75, but if a LH hitter is shifted on all the time, then it would be unfair to give him a .75 hit value every time he hits a ball there at that speed.
Yes, of course. That would explain NO difference after being successful or not. But Russell found a large difference in favor of changing location after being successful. That would imply a non-random process which SHOULD be able to be exploited by the batter.
Russell, can you look at the result when the pitcher changes location and when he doesn't? In game theory equilibrium, the results should be exactly equal. If batters are exploiting the fact that pitchers tend to go to the same location after a certain result in the last <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=PA" onmouseover="doTooltip(event, jpfl_getStat('PA'))" onmouseout="hideTip()">PA</span></a>, then we should see a slightly better result in "after failure" pitches to the same location.
Let me try and give a little primer on projections in general.
A good projection must do 2 things: One, it must project a range of talent that is a significant percentage of the actual range of talent. A range in talent in a projection can never equal the actual range of talent in the population.
So, let's say that the true range of defensive talent in CF is +- 25 runs (per 150), say, at 3 SD.
A good projection might go as high as 15 or 17 runs.
The lower the range of projections, relative to the actual range of talent, the poorer the projection. Taken to the extreme, a bad projection system could just project everyone at league average. A bad one that uses a little info could simply project all the players they somehow know or suspect to be good at +1, and all the players they know or suspect to be bad at -1.
Now, just because a projection system projects a wide range of talent, relative to the actual talent range, still doesn't mean it is a good projection. Those ranges have to be accurate. IOW, if you look at every player projected at +10 to +15, their combined future performance needs to be around +12.
If, as is suggested here, all +20 players perform at a +10 level, all +10 players perform at a +5 level, then those projected ranges are bogus.
It could be that the projection algorithm is simply not regressing enough, and/or there could be other problems. Either way, it is a problematic projection system and it needs to be adjusted in one way or another in order to "scale" those projections correctly before we can even begin to evaluate the system.
Now, if the projected players at +20 perform at +6 and the projected players at +10 perform at +3, and those at +5 perform at +1 (and the same for players projected in the negative), then what looks like a good projection system, because of the nice wide ranges, is bogus. It still may be basically a decent projection system, but something needs to be tweaked (as I said, maybe just regressed much more aggressively), and the system needs to be evaluated based in part on the "real" spread (in this example like plus or minus 6) and not on the bogus spread of +-20, again, as compared to the spread in true talent (as best as we can estimate it).
Finally, even if a projection system has a nice spread and that spread holds up to scrutiny (it equals the spread in observed values group to group - i.e., all +20 players combined perform at +20, all -15 players combined perform at -15, etc.), we also need to verify that it is accurately measuring what it purports to measure, in this case defense or fielding. I can come up with a completely random, bogus metric which purports to measure defense but doesn't, and I can construct it in such a way that it presents a nice range of projections AND those projections match up with future performance. Of course that future performance is not really measuring defense. This is the whole reliability/validity dichotomy you learn about in STATS 101 in college.
So as far as testing whether Pecota's extreme fielding projections are "good" ones, the first thing that MUST be done is to take their range of projections and test them against future performance, using aggregate players and aggregate opportunities to cut down on the noise.
Dustin did this to some extent, and the results were not good. Absent any other information, that tells you that a +24 projection is bogus.
If BP is claiming that they have changed their methodology such that their projections are more accurate such that they can accurately project a wider range in talent (remember that a performance projection is essentially - not exactly - a talent projection), then they simply need to use this new methodology to back test and see if their projected range matches up to the actual performance range.
It is one of the first tests that you do when you do projections! If it fails then you need to go back to the drawing board and figure out why. If your methodology is good it is usually a simple matter of not regressing enough given the sample size of the historical data going into the projections.
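The back-test described above can be sketched on simulated data (the talent spread and single-season noise values, in runs, are assumptions):

```python
import random
from statistics import mean

def calibration_test(n=20_000, talent_sd=8.0, noise_sd=10.0, seed=3):
    """Back-test sketch with simulated defenders. Observe one noisy
    season; treat the raw observed value as a naive projection and a
    shrunken value as a regressed projection; then check both against
    a fresh season for the 'projected +15 or better' bucket."""
    rng = random.Random(seed)
    shrink = talent_sd ** 2 / (talent_sd ** 2 + noise_sd ** 2)
    naive, regressed, next_season = [], [], []
    for _ in range(n):
        talent = rng.gauss(0, talent_sd)
        observed = talent + rng.gauss(0, noise_sd)
        if observed >= 15:  # the naive "+15 or better" bucket
            naive.append(observed)
            regressed.append(shrink * observed)
            next_season.append(talent + rng.gauss(0, noise_sd))
    return mean(naive), mean(regressed), mean(next_season)

naive, regressed, actual = calibration_test()
# The naive "+20ish" bucket performs far below its label; the regressed
# projection is roughly calibrated against next season's performance.
print(round(naive, 1), round(regressed, 1), round(actual, 1))
```

This is exactly the failure mode described: a wide, impressive-looking spread of projections whose top bucket performs at roughly half its label, cured by more aggressive regression.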
Dustin, I understand. Right, your own numbers showed evidence that Pecota (and perhaps other projection systems - I don't know) overestimates what they consider the best defenders. (BTW, to say, "Well we did this in the past, but our projections are better now," with zero evidence that that is true and that you fixed the over-estimate problem, is a poor argument).
Even without that evidence, anyone who does projections on a regular basis or for a living, should realize that you simply cannot project a CF'er at that level. If nothing else, add more regression to your def. projections. I don't know, really, as I don't know the nuts and bolts.
But, to say, "Yeah, I believe this projection is wrong, too," and still proffer that projection, well, that seems to me to be wrong in some way.
The article is fine.
Is anyone at BP willing to stand by that projection? If no, why would they put it out?
You cannot project a CF'er with limited MLB playing time (really, any amount of playing time, because once you get lots of time, aging becomes a factor) at +24. Not possible. I've been doing defensive analysis and projections for almost 30 years.
Your own projections versus actual, no matter how much you want to qualify them (e.g., our projections are much better now than they were then), are evidence that that is true.
A +24 CF'er is equivalent to a +34 RF or LFer, just to give you some context for how good of an outfielder a true +24 player has to be. Not even sure there is such a thing. Even if there is, you can never project someone to be that good. Certainly not someone with 44 games in MLB!
I can't emphasize it enough that there is simply no way to project a CF'er at +24 runs. None. What is the cap? I don't know. Probably somewhere around 15-20 if you think that a player is the absolute best outfielder you have ever seen AND he has the historical numbers to back it up over a relatively large sample of performance.
My contention is even stronger when you include the fact that a projection must include the normal chances of minor and major injuries. Even if you project a player +20 IF they stay perfectly healthy, that is not a proper projection. The projection has to include chances of injuries affecting the performance.
If anyone disagrees I will happily go with an MGL wager on the under and I'll give you 2 runs! I'll take under 22. Can't use <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=FRAA" onmouseover="doTooltip(event, jpfl_getStat('FRAA'))" onmouseout="hideTip()">FRAA</span></a> at the end of the season though. I have no idea whether it inflates good fielders or not. Have to use DRS or something like that.
I am also confused as to how an umpire can be more accurate on 3-0 and 0-2 counts with vastly different zones.
I somewhat agree with that. I always thought that the notion of the "compassionate umpire" was a metaphor.
In any case, I am not seeing much of a functional difference. Some, but not much.
" I think that the point of the article is that if the umpire can improve the % of correct borderline calls at the expense of correct calls on pitches that are not close, then the overall percentage of correct calls will go up, which is a desirable result."
I don't think that is what Guy is suggesting or what umpires are doing. They are not sacrificing accuracy of calls on non-close pitches. That is never the case. It is always assumed that those pitches are called accurately.
He is suggesting that they are merely guessing on one direction or another on close calls, based on where they typically are thrown. I am not doubting that. I am doubting the motivation.
Not really buying this. If umpires are being Bayesian and actually improving their accuracy as the count progresses (or at least maintaining accuracy), then the size of the zone should not change much, if at all. Changes in the size of the zone are a proxy for poor umpiring (assuming that good umpiring means calling a consistent zone).
"There is no TTOP - it's all about the pitch count?" I am extremely skeptical, especially since I found evidence of the complete opposite (by controlling for pitch count) but I'll keep an open mind until I can do some more research. Shouldn't be too hard to find evidence for or against.
I am not crazy about the #9/#1 hitter thing for reasons that Russell mentioned (there could be other things going on, like selective sampling, for example pitchers allowed to face the top of the order tend to be better pitchers).
You are missing my point. I am not commenting at ALL about the "accuracy" of the Pecota projections.
Exactly. Or at the very least figure out what Pecota is missing since clearly most of the staff think that Pecota is missing something big, at least with respect to the Royals.
I say "staff failing" because the staff at BP is NOT supposed to fall prey to that kind of irrational thinking.
Now, this whole controversy is specifically based upon the Royals stellar record the last 2 years compared to Pecota's projection for those years.
So BP has 2 and only 2 choices: either include past seasons' w/l records in the algorithm, or ignore them and stick to your projections.
You can't have it both ways.
I'm glad that someone recognized my point. I knew that I would be ridiculed for my post, which is fine. I am very confident in my statement and my logic that goes with it.
It's a little bit (maybe a lot) like a poker player who has been on a tremendous rush and picks up a bad hand that he should not play (even when considering how other players may change their play against her based on the rush). 95% of the typical players will play the hand fully expecting it to have a positive expectancy. A professional player will not (or should not).
Or a blackjack player who has busted his hand 10 times in a row picks up a 16 versus a 7 and refuses to hit.
I could go on. That is the way it seems to me wrt to the Royals.
Either believe in your projection as the best over/under you can put out there, change it to reflect whatever drives your belief that the Royals projection is wrong, or trash Pecota. I don't see any other rational response.
If it is an otherwise smart and rational person who knows better, the offer to make a substantial wager usually cures whatever ails them at the moment (succumbing to irrational and/or suboptimal human thinking usually because of poor pattern recognition). In this case, I think Kevin is just being...well I can't think of the right word.
But everyone else - most of them know that their 88 or 90 wins is foolish, I believe. If they didn't that simply wouldn't be congruent with their mission at BP.
As soon as they have to put their money where their mouth is (obviously they don't "have to" in this case), that's the moment when logic and rational thought (and everything they work for on this web site) suddenly kicks in. That's the MGL rule.
Yes, and I don't know what an "internet bet-threat" is. What did I "threaten" to do (if what didn't occur)?
Lol. $100 to a charity of the winner's choice?
"Our staff didn’t want it. It’s not just that when we polled our writers for their own Royals predictions—before <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=PECOTA" onmouseover="doTooltip(event, jpfl_getStat('PECOTA'))" onmouseout="hideTip()">PECOTA</span></a> had been run—not one of them went as low as 76. It’s not even that not one of them went lower than 80, or that only one of 27 responses was lower than 85, or that the plurality response was 90, or that the average was 88. It’s this: When I asked a follow-up question a few days later—“If I told you PECOTA projects them to win 76 games this year, does your answer change?”—the response was overwhelmingly “nah.” In fact… counting them out… 73 percent of staff said it didn’t change their answer at all. As one put it: “No, because the projections just seem to not like the Royals.” This seems awfully close to a crisis of confidence. We’ll get into that."
To be honest, that says more about your staff's failings than the failings of Pecota.
I'll invoke one of MGL's rules here with a wager offer:
If ANY of your staff wants to make a wager for any amount of $ (charity or otherwise) where they will take over 85 1/2 and I'll take under 85 1/2, just let me know!
Given that the plurality was 90 (whatever that means) and the average was 88, surely lots of that staff would jump at that wager.
I suspect that none of them will, which means, according to my rule, that they really are full of sh**! ;)
I have to say that I don't see much value in any comparison that does not use some kind of "delta method," whereby you are comparing the same pool of players. Clearly these pools are not the same. You warn about that "possibility," but it is a certainty, and I believe fatal to your analysis.
Tango (and I) have done plenty of work on this (you should have cited at least SOME of that work!) subject using the delta method. That is how we come up with the positional adjustments in the first place. Well, at least one of the methods.
I would bet for example, that when you compared players to themselves, you would find that regular SS did indeed have better numbers at 2B than regular 2B. Likely true at third also, although third and SS have greater differences in skill-sets than 2B and SS.
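For readers unfamiliar with the delta method, here is a minimal sketch in Python. The data, field names, and weighting scheme are hypothetical illustrations, not Tango's or my actual implementation:

```python
# Minimal sketch of the "delta method" for positional adjustments:
# compare only players who played BOTH positions, weighting each
# paired difference by the smaller of the two playing-time totals.
# All field names and numbers here are hypothetical.

def delta_method(paired_seasons):
    """paired_seasons: dicts with a player's defensive runs per 150
    games at two positions, plus innings played at each."""
    num = den = 0.0
    for p in paired_seasons:
        w = min(p["inn_ss"], p["inn_2b"])  # conservative weight
        num += w * (p["runs150_2b"] - p["runs150_ss"])
        den += w
    return num / den if den else 0.0

sample = [
    {"inn_ss": 600, "inn_2b": 300, "runs150_ss": -2.0, "runs150_2b": 4.0},
    {"inn_ss": 200, "inn_2b": 800, "runs150_ss": 0.0, "runs150_2b": 5.0},
]
# A positive result means the SAME players rate better at 2B than at
# SS, evidence that SS is the harder position.
print(round(delta_method(sample), 2))  # 5.6
```

Weighting by the smaller of the two playing-time totals keeps one lopsided season from dominating the comparison.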
At what level are these metrics regressed toward the mean? For example, UZR not only does not represent true talent, but it also does not represent "what actually happened." If a player has a +20 UZR in 150 games, it is likely that "what actually happened" was maybe +15 runs. As well, his true talent and what we would expect going forward, is likely +10 runs or something like that.
What about these metrics? Do they represent "what actually happened" regardless of how extreme they are? I assume they don't represent true talent (which would require some kind of further regression toward the mean)...
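A rough sketch of the two-stage shrinkage described above. The shrinkage fractions are made-up placeholders chosen only to reproduce the +20 / +15 / +10 example, not vetted constants:

```python
# Two-stage regression toward the mean for a defensive metric: once
# for measurement noise ("what actually happened") and once for luck
# (true talent going forward). Fractions are illustrative guesses.

def shrink(observed, mean, fraction_kept):
    return mean + fraction_kept * (observed - mean)

observed_uzr = 20.0  # +20 runs in 150 games
league_mean = 0.0

happened = shrink(observed_uzr, league_mean, 0.75)  # roughly +15
talent = shrink(happened, league_mean, 2 / 3)       # roughly +10
print(happened, talent)
```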
Extra inning games are colder, hence fewer home runs. Players are tired - that could factor into it as well. If <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=HR" onmouseover="doTooltip(event, jpfl_getStat('HR'))" onmouseout="hideTip()">HR</span></a> are more valuable (I don't know that they are - one run strategies ARE and HR are not one-run strategies unless bases are empty and there are 2 outs), pitchers would know that and pitch more in the bottom of the zone.
In my article in THT Annual, I hypothesized that when batters are "hot" they tend to swing for the fences, hitting more HR but doing poorer in other aspects of offense. The same might be true when teams are "hot" (and the reverse when they are "cold").
Ah, example #8,413 of a major league player giving us insight into the game (analysts don't know anything because they haven't played the game) and being completely wrong. By the way, this is a typical example of how, in all fields, human beings' propensity for bullshitting, having a poor memory, and inability to make sense of the world, including their own experiences, trumps their "expertise" and experience.
You see, if that's his real reason for hitting him leadoff, God knows how many other decisions he makes based on ignorant superstition.
That is actually amazing. Not the record. That <span class="playerdef"><a href="http://www.baseballprospectus.com/player_search.php?search_name=Ned+Yost">Ned Yost</a></span> actually has a job managing a major league team.
Certainty of the estimates increases with sample size AND with other non-empirical data like arm angle and pitch repertoire. For example, if a pitcher throws like <span class="playerdef"><a href="http://www.baseballprospectus.com/card/card.php?id=49218">Brad Ziegler</a></span>, even without any empirical data, it is almost guaranteed that he will have a large split. Similarly, if a pitcher throws tons of sliders like Romo, it is almost guaranteed that he will have a large split.
Here's where people go wrong though. The certainty of estimates has virtually no bearing on strategy. If what you are estimating is relevant to a strategy decision, then you must use the estimate whether there is much certainty in it or not.
For example, say a left-handed pitcher comes in and you know nothing about him other than he has had a huge reverse split in around 50 <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=IP" onmouseover="doTooltip(event, jpfl_getStat('IP'))" onmouseout="hideTip()">IP</span></a> or the entire season so far. Another lefty comes in and he has an average split in around 1000 IP. A third lefty comes in and you also have 50 IP data on him and he has a huge PLUS split.
In the first instance you pretty much estimate his true splits at around league average for a lefty or PLUS 29 points despite his huge negative split for the entire season (you know nothing about him prior to that). Maybe your estimate is only plus 27 or 28 because the huge negative in 50 IP does count for SOMETHING.
In the second case, you of course estimate his true splits to be the same as he has been for the last 15 years - league average.
In the first case, you are not very confident of your plus 27 estimate. In the second case, you are VERY confident of your plus 29 estimate. In the third case, like the first one, you are not very confident of your +30 estimate.
So, all 3 pitchers have around the same expected splits - around PLUS 29 points. But, pitcher one has had HUGE NEGATIVE splits (he actually pitched better v. RHB) all season and that's all you know. Pitcher two has league average splits his whole, long career. Pitcher three has had HUGE positive splits all season long and that's all you know. He has been deadly on LHB and/or RHB have murdered him.
Do you respond to any of these pitchers any differently? No! You treat them exactly the same. You MUST assume that they all have around the same true platoon split. That is, when you send up your batter to the plate, a RHB will do much better than an equally talented LHB. Against all 3 pitchers. Yes, you are MUCH more certain that that is the case with pitcher #2, but certainty is irrelevant in your decision of whom to bat against him.
This is counter-intuitive but true.
Now if you had any more information on these players such as arm angle (which you obviously do, but for this exercise we assumed no other information or data), that information would have MUCH more of an effect on your estimate for pitchers 1 and 3 than for pitcher 2. That is where the certainty comes into play. Or if you were deciding which pitcher to acquire and you wanted a <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=LOOGY" onmouseover="doTooltip(event, jpfl_getStat('LOOGY'))" onmouseout="hideTip()">LOOGY</a></span> (or not) then the certainty might come into play in your decision. But in terms of how to respond to this pitcher, certainty is irrelevant.
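As a hedged illustration of the arithmetic behind the three-pitcher example, here is the standard regression-toward-the-mean blend. The 1000 PA ballast is a placeholder guess for this sketch, not a vetted constant:

```python
# Regress an observed platoon split toward the lefty league average
# (+29 points of wOBA) by blending it with `ballast_pa` plate
# appearances of league-average splits. Ballast value is a guess.

LEAGUE_LHP_SPLIT = 0.029  # +29 points of wOBA for an average LHP

def true_split_estimate(observed_split, sample_pa, ballast_pa=1000):
    w = sample_pa / (sample_pa + ballast_pa)
    return w * observed_split + (1 - w) * LEAGUE_LHP_SPLIT

# Pitcher 1: huge reverse split, but only ~60 PA vs LHB -> the
# estimate is regressed almost all the way back to league average.
print(round(true_split_estimate(-0.060, 60), 3))
# Pitcher 2: league-average split over a huge sample -> stays at +29.
print(round(true_split_estimate(0.029, 4000), 3))
```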
My models have nothing to do with explaining the decrease in splits for RHP.
Could be pitcher repertoire (maybe more changeups or 2-seamers). Could be average <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=TBF" onmouseover="doTooltip(event, jpfl_getStat('TBF'))" onmouseout="hideTip()">TBF</span></a> per pitcher. If the number of batters faced per pitcher decreases (i.e., more specialists or starters pitching to fewer batters) then the number of outliers by chance will increase.
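A quick simulation of the outlier point, purely illustrative: with a true split of exactly zero, the spread of observed splits widens as the per-pitcher sample shrinks, so smaller workloads mechanically produce more extreme-looking splits.

```python
# Simulate observed platoon splits under a TRUE split of zero: each
# PA is a Bernoulli trial with p=0.32 for the batter. The SD of the
# observed split scales roughly as 1/sqrt(n).

import math
import random

def simulated_split_sd(n_pa, trials=2000, rng=random.Random(1)):
    """SD of observed (vs-LHB minus vs-RHB) success rate."""
    p = 0.32
    splits = []
    for _ in range(trials):
        vs_l = sum(rng.random() < p for _ in range(n_pa)) / n_pa
        vs_r = sum(rng.random() < p for _ in range(n_pa)) / n_pa
        splits.append(vs_l - vs_r)
    mean = sum(splits) / trials
    return math.sqrt(sum((s - mean) ** 2 for s in splits) / trials)

# Quartering the sample roughly doubles the chance spread in splits.
print(simulated_split_sd(100) > simulated_split_sd(400))  # True
```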
There is no such thing as "a certain sample size to stabilize the metric."
Yes, it is possible to tell if that is the case. I have not done that, but I suspect that if it has changed, it has changed very little. The two factors for magnitude of splits are arm angle and pitch repertoire. I doubt that arm angles have changed recently and I don't know whether pitch frequencies have changed much.
If anything with more and more reliever "specialists" you would think the average platoon split would have increased and not decreased.
I don't know that shifting would have anything to do with platoon splits, but you never know.
FWIW, I have McHugh with an estimated true platoon split of 14 points in wOBA, which is slightly smaller than the average RHP.
I use historical stats, adjusted for the RH and LH talent of the batters faced, as well as THEIR platoon splits, and I use arm angle and pitch repertoire in a regression formula to establish the baseline toward which to regress.
So I think my platoon estimates are pretty good. Very few of those pitchers who exhibited reverse splits for one season have true actual reverse splits. An easy way to see that is to simply look at that group in another year.
My guess is that of the 40% of the population that shows reverse splits, 80% of them will show positive splits in any other year.
A RH SP typically faces around 300 to 400 lefty batters in one season. You have to add around 2000 <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=PA" onmouseover="doTooltip(event, jpfl_getStat('PA'))" onmouseout="hideTip()">PA</span></a> of league average platoon splits to estimate a RH pitcher's true platoon split. That means you are regressing a one year split for a full-time RH starter around 83% toward league average!
As you can see, it is almost impossible to estimate a reverse split for a RH pitcher from one year of stats. It also means that almost all of the actual one-year split you see for a pitcher is noise.
In fact, you can safely IGNORE one year splits and just assume that a RH pitcher has a true league-average platoon split. That's A LOT better than using the actual split to assume his true split. When you regress something 80 or 85% of the way toward league average, you might as well just assume league-average.
I get so tired of hearing about a pitcher's one year splits on TV. And then, they quote batting average against. For a pitcher! Take a worthless stat to begin with, BA against for a pitcher, and make it even more worthless (if that is possible) by quoting a one-year split. Might as well quote a split for pitchers based on the phases of the moon and assume that that has meaning in the present or future too.
For a left-handed pitcher, the regression is about half as much. Of course, they typically face many fewer LHB so your sample size in one season is a lot smaller. So for a full-time lefty, you would regress maybe around 50%, so one part actual splits and one part league average. For a full time lefty starter.
Don't even talk about one-year reliever splits...
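The regression percentages quoted above reduce to simple arithmetic. The 2000 PA ballast for a RHP is the figure given in the comment; the lefty numbers are back-of-envelope guesses consistent with "around 50%":

```python
# How much of a one-year platoon split gets regressed to league
# average: ballast / (sample + ballast).

def regression_pct(sample_pa, ballast_pa):
    """Fraction of the observed split regressed toward league average."""
    return ballast_pa / (sample_pa + ballast_pa)

# RH starter: ~400 PA vs LHB, 2000 PA of ballast -> ~83% regression.
print(round(100 * regression_pct(400, 2000)))  # 83
# LH starter: smaller sample AND much smaller ballast -> ~50%.
print(round(100 * regression_pct(250, 250)))   # 50
```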
"The odds are that if he keeps using this questionable tactic this freely, it's going to cost the Angels their season..."
Why would you say that? Each <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=IBB" onmouseover="doTooltip(event, jpfl_getStat('IBB'))" onmouseout="hideTip()">IBB</span></a> probably cost on the order of 1/100 of a win. So while they are likely incorrect, you're not going to "see" the negative effect in 10 games (or even 100 games). They are nearly as likely to result in a good outcome as a bad one.
Shirtless boy looks like he is wearing a muscle shirt, no?
We're all guessing of course, but I don't think any manager makes 2 bad decisions a game. Then again, we really would have to establish the baseline and what constitutes a "bad decision." For example, there would have to be a threshold, say, a decision that cost .5%.
In my opinion, the average manager can add 1-2 wins a season with optimal decisions. The bad manager, relative to the average one, maybe 2-3. That is a very wild-assed, but educated, estimate.
Also, if you take my word for it that a bad decision costs .5% to 2%, you cannot say that 320 of them cost 1.6 to 6.4 losses. That .5 to 2% is a range from the least to the most egregious decisions. Actually, that's not quite right. Obviously the limit for the least egregious is 0, but we can define any bad decision as one that cost at least .5%. I believe the cap is around 2%, and that happens very infrequently.
So, over 320 bad decisions, the average decision across that .5 to 2% probably cost around 1% which would mean a total of 3.2 wins for a manager who makes 320 of them, which, as I said, is not a realistic number, I don't think.
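The arithmetic in the last two paragraphs, spelled out. All inputs are the illustrative numbers above, not measured values:

```python
# Season cost of bad managerial decisions, using the commenter's own
# hypothetical numbers: 320 decisions at .5% to 2% of a win each,
# averaging ~1%.

bad_decisions = 320
low, high, mid = 0.005, 0.02, 0.01  # win cost per decision

print(round(bad_decisions * low, 1), round(bad_decisions * high, 1))  # 1.6 6.4
wins_lost = bad_decisions * mid
print(round(wins_lost, 1))  # 3.2
```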
"And - after accounting for every other event that has occurred in said game - a bad managerial decision in the last inning may push a game from a win to a loss."
There is absolutely no way to know that, that's why the only useful heuristic is the 19/20% or the "it cost 1% in WE."
It's not like, "fielder makes an error that cost the game." If a manager decides to yank the starter for a reliever in the 9th (or alternatively, leaves him in) and the pitcher loses (or wins) the game, there is NO WAY TO KNOW the culpability of the manager based on the outcome of one game. THAT IS WHY we must use the, "the wrong decision cost 1% of a win" framework.
And if you use your heuristic, the times when the manager made the correct decision will result in a loss sometimes as well. And when he makes the bad decision, the result will sometimes be a win. There is NO way to know from the outcome whether the decision is good or bad. Again, that is why the ONLY framework that has any practical value whatsoever is the one where you estimate the "cost" or "gain" of a decision in marginal wins per decision, which is on the order of around 1%.
You and Jeff are arguing about angels dancing on the head of a pin. Actually I have no idea what you are arguing. None whatsoever.
And for the record, Jeff uses the term "regressed wins" completely speciously.
"In terms of win probability, Lichtman is 100 percent correct in saying that any one decision has a small mathematical impact on the outcome of a game. In terms of game outcomes, though, a manager's decision when sitting on the razor's edge between winning and losing can have a large impact on whether that game is a win or a loss."
Jeff, you are completely 100% wrong on this (and you should know it).
The whole point of one decision being .01 or .02 better in terms of WE than another is that it does NOT make much difference in the course of a game. That completely contradicts your last sentence, which makes no sense whatsoever. It can NOT and does NOT have a large impact, unless by "can have an impact" you mean that the team might lose the game, say, 20% of the time after said decision. But the answer to that is, "Had the manager made the correct decision, the team would lose the game 19 or 18% of the time." That is the entire point of understanding the one-game impact of a decision.
Yes, those mistakes add up and I am by no means suggesting that we shouldn't care about manager mistakes if we are a fan of their team because they have little impact on one game.
That would be like me saying that you shouldn't care about your child not wearing his helmet while bicycling because it only increases his chances of getting seriously injured by 1 in 1,000 each time he rides.
Anyone who thinks that my comments mean that you shouldn't care about manager decisions has apparently not listened to a thing I have said over the last 20 years, or they just want to write a story (I am not suggesting that you are doing that).
I think the main issue is that you are looking at only one small and probably unlikely potential influence of veteran players. I can think of 4 or 5 things I'd want to look at but "mitigating the grind" was not one of them.
That's the funniest and smartest quote I've heard in a long time.
There is not that much difference between the value of a closer and a starter. If a closer pitches 70 innings at an average leverage of 2.3, that is the equivalent of 160 starter <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=IP" onmouseover="doTooltip(event, jpfl_getStat('IP'))" onmouseout="hideTip()">IP</span></a>.
I don't know how you dismiss it. I think it is a fatal error, not accounting for that selection bias. And I'm not really sure that you CAN account for it, especially when you are looking for some small differences.
You can almost never use playing time either in-game or season-long, and expect it to be independent of performance, especially in-game (for pitchers of course).
We can assume that the pitcher will gain or lose a mph when converting and we know how much that is worth (1/3 run per mph?).
We also know what the times through the order is worth (1/3 of a run per <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=TTO" onmouseover="doTooltip(event, jpfl_getStat('TTO'))" onmouseout="hideTip()">TTO</span></a>). And we know that pitchers with fewer pitches in their repertoire get hurt a little more in the TTOP (and vice versa).
So I think we can make a pretty good assumption about the increase or decrease in rpg when converting.
We don't know about the stamina factor, especially for reliever to starter conversion. I think it is a safe bet that some relievers simply don't have the stamina for that. Of course we don't know which ones.
Finally, I don't think you can use the sample you are using for this kind of analysis. Too much selective sampling (actually I am surprised at the results given that). When a reliever pitches a few innings, he is likely to be pitching well, and when he doesn't, he is slightly likely to have pitched poorly.
Good article and good insight on McHugh and spin rates in general.
"First his average and max spin readings are fairly close, which indicates that most of his curveballs are spinning at roughly the same rate. Additionally, a high percentage of this spin is causing movement on the pitch, which again would seem to promote consistency."
I certainly agree that having consistent useful spin rate likely leads to better than average command. I am having a hard time, though, understanding why more useful spin in general leads to greater command. While more useful spin is probably harder to hit simply because it produces more movement, I would think that more movement (more useful spin) would be harder to control not easier. That is one reason why curveballs and sliders are harder to command than fastballs.
But, that is a minor quibble with an excellent article.
The reason that you throw mostly fastballs when you are confident that the batter, be it pitcher or position player, is going to bunt is simple - you want to make sure that you don't walk the batter (or allow him to hit away if you fall behind). That is the only reason.
You are using park factor and temperature in the model, right? Since PF includes average temperature, you would need to use a delta temperature (temperature in the pitcher's outings minus the average temperature at that park) in the model, right? Do you do that? It sounds like you do not, as you say that Schmidt benefited from the low temperatures in <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=SF" onmouseover="doTooltip(event, jpfl_getStat('SF'))" onmouseout="hideTip()">SF</span></a>. Those low temperatures are already included in the PF so there would be no need to include temperature unless his average temperature while pitching was different than the average temp in SF over the span of the PF being used. That would be double counting otherwise.
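To make the double-counting point concrete, here is a sketch of a "delta temperature" term. The function, field names, and runs-per-degree coefficient are all hypothetical:

```python
# Since a park factor already bakes in that park's AVERAGE game-time
# temperature, only the pitcher's deviation from it should enter the
# model. The coefficient below is an invented effect size.

RUNS_PER_DEGREE = 0.005  # hypothetical run effect per degree F

def delta_temp_adjustment(outing_temps, park_avg_temp):
    """Temperature effect NOT already captured by the park factor:
    the pitcher's average outing temp minus the park's average."""
    avg_outing_temp = sum(outing_temps) / len(outing_temps)
    return (avg_outing_temp - park_avg_temp) * RUNS_PER_DEGREE

# Starts averaging 58 degrees in a park that averages 65 degrees:
print(round(delta_temp_adjustment([55, 58, 61], 65), 3))  # -0.035
```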
Can you explain the last 3 columns of the first chart? And do you know how pitchers typically grip (and throw), or the different versions of grips, of the "cutter?"
We have no idea as to the answer to that question, but my educated guess is no.
I was just using that as an example. It could be any number, as long as you held pitch count relatively constant.
I think you're making this more complicated than it is. One, batters perform better as they face the same pitcher more and more. Any baseball player will tell you that that is the case. Two, when you are unfamiliar with a pitcher, even if you have seen him before in previous games, you take many more pitches simply because you are more tentative. So of course you will walk and K more early in the game. And as you become more familiar with the pitcher, you are more aggressive, you get fooled less, and you hit the ball harder. It's really not complicated or mysterious.
I am not disputing that pitcher and batter approaches change as the pitcher cycles through the order. I am simply highly doubtful that the increasing advantage by the batters is due to a suboptimal approach by batters early in the game or a suboptimal approach by pitchers later in the game. There is just no way that we would see such a drastic penalty each time through the order.
As far as fatigue versus familiarity, that should not be too hard to figure out without doing regressions (of which I am not a fan unless they are absolutely necessary). I mean, if fatigue were the driving factor and familiarity were not, we would see at least 2 things. One, the bottom of the order would show a larger advantage than the top. And two, if we simply look at 2nd and 3rd time deltas while holding the pitch count constant, we should see very little difference if it were all, or even mostly, about fatigue.
For example if we look at second time penalties at 80 pitches versus 3rd time penalties at 80 pitches that should give us some idea as to whether the overall effect is mostly fatigue, familiarity or some combination of nearly equal parts.
Have you looked at those 2 things?
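The second test can be sketched in a few lines. The records and wOBA values below are hypothetical; the point is only the mechanics of holding pitch count roughly constant while varying times through the order:

```python
# Compare 2nd- vs 3rd-time-through-the-order performance among plate
# appearances at (roughly) the same pitch count. All data invented.

def tto_woba_at_pitch_count(pas, pitch_count, tol=5):
    """pas: dicts with 'tto' (times through order), 'pitches_before'
    (pitcher's count entering the PA), and 'woba'. Returns the mean
    wOBA for TTO 2 and TTO 3 among PAs near the given pitch count."""
    def mean_woba(tto):
        vals = [p["woba"] for p in pas
                if p["tto"] == tto
                and abs(p["pitches_before"] - pitch_count) <= tol]
        return sum(vals) / len(vals) if vals else None
    return mean_woba(2), mean_woba(3)

pas = [
    {"tto": 2, "pitches_before": 78, "woba": 0.320},
    {"tto": 2, "pitches_before": 82, "woba": 0.330},
    {"tto": 3, "pitches_before": 80, "woba": 0.360},
    {"tto": 3, "pitches_before": 84, "woba": 0.350},
]
second, third = tto_woba_at_pitch_count(pas, 80)
# A positive gap at equal fatigue points toward familiarity.
print(round(third - second, 3))
```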
I disagree with your thesis and conclusions. My research made it pretty clear, at least to me, that the times through the order penalty is about familiarity and not fatigue. In other words there IS magic to not facing the 19th batter. That is easy to test though and I'm surprised you did not.
Simply look at the results of the 18th and 19th batters in all games against the starter. If there is no magic then we should see a blurred line between the performance of the two. If we see a bright line, which I think we will, then that strongly suggests it's all about familiarity.
I also don't know why you ascribe some magic to the third time through the order. There isn't. The penalty from one time to the next is pretty constant other than the huge first time penalty for the road team.
As far as the penalty having to do with pitcher and batter approach and not so much familiarity per se, that is possible. However, I doubt that. If it were, as you imply, that would mean that batters or pitchers are using a suboptimal approach either at the beginning of the game or as the game goes on. Plus, I have shown evidence that the more pitches a batter sees in his first <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=AB" onmouseover="doTooltip(event, jpfl_getStat('AB'))" onmouseout="hideTip()">AB</span></a>, the better he performs, by a lot, in the second AB. Again, that suggests that the effect is one of familiarity.
When looking at a player who is 31 (or any age for that matter) over several years, you really need to compare him to the average player over that time period. Batters typically get a lot worse in their early 30's from normal aging. I would assume that is a big part of Braun's decline.
I have to disagree with you 100% on one level. I don't think for a second that the rationale for the system as it is set up - there must be incontrovertible evidence in order to overturn a call - is because umpires have some magic skill, intuition or vantage point and we must respect their decision because usually they are right even though the high speed slow motion video evidence suggests otherwise on close plays.
They merely don't want to have that many challenges and that many overturned calls, for whatever reasons. And although the system does not have to be that way, I think that it is quite defensible. For one thing it takes away some of the grey area. If you simply had a "de novo" review (i.e., the umpire's call is irrelevant to the decision), and the standard of review was "by the preponderance of the evidence," that would make some very close decisions difficult, but more importantly, controversial. What if one video umpire sees it as 51% and another one at 49%? What if the fans see it as 70% and the video ump sees it at 49%?
It is simple, reasonable, and more importantly, practical for them to use the system they use (must be clear and convincing evidence to overturn). We use that system typically in the legal/justice system for similar reasons.
As I said, you can argue that you prefer a different system, but you have to give your reasons. You have to discuss the pros and cons for both sides, which you didn't. And more importantly, you have to discuss both sides accurately, which you most assuredly did not. You misrepresented the status quo. As I said, the reason for the present system is NOT that they think that the umpire has better judgment than the high speed video, as you claim in no less than two sentences. That is a strawman. The reason, as I mentioned, is reducing the number of challenges and overturned calls and reducing controversy on close calls. The price for that is a few wrong calls on the close ones. Even this imperfect system, however, is far better than pre-replay (assuming that we want to get as many calls correct as possible).
Good analysis, Russell!
"But then again, if he then started swinging away every time after that, the shift would come back. There’s a cat-and-mouse game to be played (game theory!)."
That isn't true and there is not really any game theory involved because one side gets to see the other side's strategy before deciding upon theirs.
Once someone like Big Papi indicates his intention to bunt when the shift is on, and his overall WE is greater than no shift, then the defense will stop shifting of course. Then if Papi swings away every time once the shift is removed, which he will, there is nothing that the defense can do about that. They can't start shifting again. Every time they do, he will simply bunt.
Now, if each side had to make their decision without knowing their opponents' decision, and stick to it, then game theory would be implicated and both sides would choose a strategy which resulted in a Nash equilibrium.
Papi would bunt some percentage of the time, randomly, and the defense would employ some kind of hybrid partial shift, presumably, or they would shift or not shift some percentage of the time.
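For the simultaneous-move version, the mixed equilibrium of a 2x2 zero-sum game has a closed form. The payoff numbers below are invented wOBA values for illustration only:

```python
# Bunt-vs-shift as a simultaneous 2x2 zero-sum game. Payoffs are the
# batter's expected wOBA (made-up numbers); the defense minimizes
# them. Each side randomizes so the other is indifferent.

payoff = {
    "bunt":  {"shift": 0.400, "straight": 0.150},
    "swing": {"shift": 0.280, "straight": 0.340},
}

def mixed_equilibrium(p):
    """Closed-form mixed Nash equilibrium of a 2x2 zero-sum game."""
    a, b = p["bunt"]["shift"], p["bunt"]["straight"]
    c, d = p["swing"]["shift"], p["swing"]["straight"]
    q_shift = (d - b) / ((a - b) - (c - d))  # defense's shift rate
    p_bunt = (d - c) / ((a - c) + (d - b))   # batter's bunt rate
    return p_bunt, q_shift

p_bunt, q_shift = mixed_equilibrium(payoff)
# With these payoffs, Papi bunts ~19% of the time and the defense
# shifts ~61% of the time.
print(round(p_bunt, 3), round(q_shift, 3))
```

At equilibrium neither side can gain by deviating, which is exactly what disappears when one side gets to see the other's choice first.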
Come on Russell!
Or you could just check The Book, where it says that in most cases don't walk the 8th hitter unless there are runners on second and third and 2 outs. ;)
Uh, virtually all of them.
Possible but very unlikely. I don't think any managers or teams engage in this kind of "meta game" thinking or action.
1) I am pretty sure that he thinks he is gaining an edge. Most managers do whether their strategies are correct or not. My comment was hyperbole to illustrate what I think is his personality style (egocentric). Any evidence I have is that he makes many unconventional decisions that I believe are incorrect or marginal at best, this being among them. I have written about some of them if you want to do a Google search.
2) IF there is any negative effect at all, and I don't know that there is, then it makes a break-even situation into a negative one, so I think it is worth mentioning. By itself, the effect is probably de minimis.
Glad you came up with a coin toss, because that is what I have been saying for years: It makes absolutely no difference whether the pitcher bats 8th or 9th, as far as we know, and probably entirely depends on the exact composition of the lineup.
Maddon does this to show, with no evidence, that he is smarter than other managers. Whether he is or is not, I don't know (probably is smarter than most), but I don't think he is very smart at all compared to the "perfect strategist."
BTW, when you came up with .5 runs per year better, THAT is a coin toss as .5 runs a year is exactly the same as zero given the noise in your methodology and the sample size.
I also agree with the person above that, "Is .5 runs a year (and we have no idea what it is really, other than around zero) worth it, considering the shaming of the #9 hitter?"
Surely that could hurt his confidence, motivation, and psyche to the tune of more than .5 runs a year, no?
The battle between pitcher's first strike % (pitches in the zone) and batters swinging at the first pitch is classic game theory. There should be an equilibrium and likely is.
That equilibrium changes all the time because it is based on so many factors which also change over the years - batter strike zone recognition skills, pitcher command, batter power, pitcher power, etc.
Let me give an extreme example. Let's say that either pitchers were really good or batters were really bad. Batters would be better off trying to get a walk, because swinging at even a strike is relatively futile (because batters are so bad or pitchers are so good - like a pitcher at bat facing Kershaw). So batters would be taking a lot of first pitches, hoping for a ball. At the same time, pitchers would be trying to throw lots of strikes, both because batters are likely taking and because even when they swing, they don't do much damage.
That is actually where we are at now, more or less, relative to the past.
No. They are the same. The value of the CS is not measured as compared to a SB. It is measured as compared to not doing anything at all.
Why is a pickoff worth half of a caught stealing?
And the magnitude of the effect for pitchers/managers strains credulity.
Right, it is more likely to be a player-specific effect, which would tend to make teams/managers consistent from year to year. It could also be a park effect, which really would persist from year to year. Day games at Wrigley. Hot summers in Texas, KC, and St. Louis. Etc.
But I am still hung up on how you could just look at the data and find that both batters AND pitchers get worse as the season wears on. That makes no sense from a data point of view. The batter/pitcher matchups have to go up for the pitchers to get worse and down for the batters to get worse. Which is it?
Even if they didn't, you would still have to figure out how to separate them out. I'm not assuming anything other than that the effects are inseparable.
It's like noticing that offense is down from one time period to another. Did pitchers get better or batters get worse? You have to do some mathematical gyrations to try and figure that out, and even then, it might not work.
For one thing, if batters and pitchers were both to get worse as the season wears on, you would have a really hard time, perhaps an impossible one, separating the two. They would cancel one another out and virtually any pitcher or batter would have the SAME results both early and late in the season.
I hate to give an opinion with zero evidence, but I am not even coming close to buying this. As always, I could be wrong...
I don't know why you don't simply include all players in year 2 and forget about the 275 PA threshold. It eliminates the survivorship bias issue.
Great work guys. Will you be presenting this data during the season, including minor league players? Updated how often? Will past data be available, including minor league players?
(And in a reasonably efficient talent market, isn't this the reason that the very best players cost more on a per-WARP basis? Getting 8 WARP from one lineup spot is more valuable than getting 4 WARP from two lineup spots.)
I don't think either one of those is true.
That is not an "advantage". If your scrubs are plus defenders, then they are no longer scrubs. You have to pay for defense, presumably just as much as offense.
If you put your better hitters at the top and your worst at the bottom of course it will score more runs than a balanced lineup. There is nothing "structural" about that. Your better hitters get more PA.
And what makes you think that it costs more for a scrubs and stars team than an all-average team, assuming the same overall talent?
What do you mean by "2 different lots of MLB balls?" Were they merely 2 boxes of balls or were they specifically from two different manufacturing "lots?"
I assume that these were not rubbed up with mud. Do you think that makes a difference - how much mud and perhaps the configuration of mud on the ball?
Tango, in my research I have found that players with more variable histories have more variable future outcomes. Whether that means that they are "harder" to project, I don't know.
Using RMSE or something like that will yield worse results for these players, but, again, I am not sure that means they are "harder" to project. I am agnostic as far the different methods for evaluating players.
Here is an example of what I mean. Say we have 3 groups of players with variable histories. One group is all over the map historically, but their projection is an OPS of .650. IOW, their weighted career average is somewhere around .620 or so (once we regress toward the mean, we get the .650).
The second group is all over the map, and their projection is .750. The third group, .850.
If we test our projections for each group, we indeed get a collective actual of .650, .750, and .850 for each of the 3 groups, so we pretty much nailed it.
But, because they were quite variable historically, as I said, the variance in actual WITHIN each group will be large. Some of the .650 group will be .550 and some will be .750, etc. Same with the other 2 groups.
So using an RMSE or even average error will yield a high number, which will make it look like these were bad projections. But were they? I don't know, you tell me.
The exact same thing will occur with players who have limited histories. We can probably nail the .700, .750, and .800 players, but within each group there will be a lot of variability and the average RMSE errors will be large, just like the players with a lot of inconsistency in the past but with more data in that history.
Compare that to players who are more consistent historically. We will have our .650, .750, and .850 groups (etc.) but within those groups there will be less variability. Does that mean that these projections were "good"? Again, it depends on how you define "good" and "bad" and what method you use to evaluate the projections.
I can say one thing about evaluations. If you are using any system which does not incorporate the number of opportunities in the "actual" it is a bad evaluation system. And that is because the fewer the opps, the more random variability there will be. It is not enough to just say, "I'll use a 200 PA cutoff," or something like that.
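To make the opportunity-weighting point concrete, here is a minimal sketch (the players, OPS figures, and PA totals are all hypothetical) of a PA-weighted RMSE, which downweights small-sample "actuals" instead of imposing an arbitrary PA cutoff:

```python
# Sketch of a PA-weighted projection evaluation. All numbers are hypothetical.
import math

def weighted_rmse(projections, actuals, pas):
    """RMSE with each player's squared error weighted by his PA, so that
    low-opportunity (high-noise) actuals count for less."""
    total = sum(pa * (p - a) ** 2 for p, a, pa in zip(projections, actuals, pas))
    return math.sqrt(total / sum(pas))

# (projected OPS, actual OPS, PA) -- hypothetical players
players = [(0.650, 0.550, 80), (0.750, 0.760, 600), (0.850, 0.840, 550)]
proj, act, pa = zip(*players)
result = weighted_rmse(proj, act, pa)
print(round(result, 4))
```

The 80-PA player's big miss barely moves the number; under an unweighted RMSE it would dominate it.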
I'm not really seeing the value of a metric like this. The "value" of a take is certainly not the run average value of the result. It is the difference between that and the run value of a swing (at that pitch on that count). The latter is hard to compute.
As you can see in the chart, the average value of a take goes up the more you swing at everything and take only pitches that are clearly balls. As for cumulative value, you can also see from the chart that those with the highest total value are the best power hitters, simply because they see so many pitches on the border of or outside the zone.
So I'm not seeing how either the rate or the cumulative value tells us anything at all about a batter.
OK, no problem. It was confusing the way you labeled the axis. Probably should have used the names of the months or something like that. No big deal.
Just curious, are those months on the X axis, as in 8 months per season, March through October? I don't think many of those seasons had March games, so why are there 8 dots per year?
I didn't word that very well. What I meant is that every single situation requires a different approach, and RISP is actually a conglomeration of dozens of different situations each requiring its own approach.
This is a very complex topic so I'll only throw out a few thoughts.
I think you are capturing a lot of randomness. There is no particular reason why a pitcher, any pitcher, should alter their repertoire with runners in scoring position, given the same batters of course. Which brings up another issue. You are probably capturing differences between approaches to different batters since the middle of the order tends to come up with runners in scoring position. And a lot of whether a pitcher changes his approach depending on the batter has to do with his unique repertoire.
Anyway, as I said, there is no particular reason to change an approach with runners in scoring position. You cannot make the argument that a pitcher should pitch tougher or better in those important situations (and they are important), because that argument gets defeated by the argument that he should pitch as tough as he can in ALL situations. On the other hand, since a pitcher may get reduced benefit from exposing all of his pitches too often, perhaps he can hold back certain pitches to some extent and then unleash them in important situations. Again, that assumes that if he uses those pitches in all situations, he gets reduced benefit from them in general, which is a questionable assumption.
You can also make the argument that a pitcher can and should exert a little more effort with runners in scoring position assuming that he has a limited amount of effort while he is in the game and can and should budget that effort in different situations. In fact research does show that pitchers throw slightly harder in certain more important situations I think. But that has little to do with repertoire.
The ONLY reason for a pitcher to change approach is because the value of certain events change. For example with a base or bases open (and runner in scoring position obviously), where a walk is MUCH less valuable than a hit, the pitcher will throw more breaking pitches. Breaking pitches are harder to hit, lead to more K, but also to more walks.
With a runner on first and less than 2 outs, he may tailor his pitches to get a ground ball. With a runner on third and less than 2 outs, he is trying for the K so he also throws more breaking pitches and more pitches in the corner in general.
I think the RISP or no RISP is a bad dichotomy if you want to look at how pitcher approach changes. For example, bases loaded (where you don't want a walk) is completely different from, say, runner on second only, where you don't mind a walk.
Multiple runners on is different from 1 runner on, because the HR is terrible with 2 or 3 runners on.
And of course outs (and score) is also critical to approach.
I realize that you were tying this in to the Shields RISP issue, but any research into how pitchers change their approach must use more nuanced (and actually relevant to approach) subsets of base runners and outs.
I'm not sure that this is even possible, but my guess is that when an umpire is "not accurate" it is much more likely that he is calling a larger zone than he is calling a smaller zone, which is why you are getting more K (and less offense) for "inaccurate" umpiring.
I am not buying that if an umpire's zone is tight (small) but inaccurate, that it would cause an increase in K (and decrease in offense in general).
I track umpires quite a bit, and the ones who have a tight zone are the ones who call an accurate zone. The ones with a bad zone are almost invariably those with a large zone (larger than the actual zone).
I think the problem is in not "zeroing out" accurate and inaccurate. If you are using the actual zone and not the de facto zone to determine accuracy, you will find that the average umpire is "inaccurate" (i.e., calls a larger zone).
If you define "accurate" as the de facto zone, such that the average umpire is accurate by definition, then you should find that accuracy, or at least a deviation in accuracy from the de facto zone, has no effect on K rate or any other offensive component, on the average. Deviation on the large side obviously creates more K and deviation on the small side obviously creates less K.
But to say that "any deviation from accuracy, on the average creates more K and less offense in general," is very misleading.
Very good article. Everything right on the money.
BTW, whenever you read/hear someone criticize strategies that don't work out but rarely if ever criticize strategies that DO work out, your BS detector should go off loudly.
You see, depending on the move, roughly half of all bad moves should work out and roughly half of all good moves should work out. Some people have a hard time wrapping their heads around that concept. Those people you should NOT be listening to.
Good job man!
It is extremely refreshing for you to be open to that kind of criticism.
If we (analysts, critical thinkers, etc.) in any way, shape, or form distort or misrepresent the truth in an attempt to be politically correct or kowtow to the mainstream crowd, we are left with nothing. Because really, all we have is the truth on our side - and that is no small thing. We are not always right, by a long shot, but we always strive to be (intellectually) honest.
Mainstream blown saves?
Sam you are an excellent writer and analyst, but you need to resist the temptation of mainstream BS.
"Further, the results don’t necessarily validate the criticism:"
Yes, in your next sentence you downplay that statement, but why include it at all? You lose all credibility in my book when you write things like that. The results have NOTHING to do with the decision in a sample size of one (nothing = .000001). If a pitcher is .3 runs worse than he was before, or even .7, there is zero chance that you can tell that from the results of one inning. Zero (.000001). I am 99.9999% sure that you know that, so...
I agree with you that the break toward home on the looper was probably a bad play. I don't think anyone could tell whether that was going to be caught or not. On 1 out, it may be justified, but with 0 outs, I don't think so.
I realize that this article and research were mostly tongue-in-cheek, but when you are dealing with a sample size of 100-odd games and looking at win/loss records, it is impossible to even come close to ruling out anything other than the most unlikely of hypotheses.
Let's say that I hypothesize that the effect I am looking for is 5%. So, in 100 games, rather than winning, say 50 games, a group of teams is supposed to win 55 games, according to my hypothesis. Unless I find something like a winning percentage of 65%, there is nothing I can really conclude from a statistical perspective. If I get a 50% result, that could easily be a Type II error. Easily. If I get a 55% result - easily a Type I error.
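The arithmetic in that example can be checked with exact binomial math (a sketch; the 5% effect and 100-game sample are the hypothetical figures from the comment, and the 59-win cutoff is the rough one-sided 5% significance threshold):

```python
# Exact binomial math for the 5%-effect-in-100-games example, stdlib only.
from math import comb

def p_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# One-sided p-value for 55 wins in 100 games under the null (p = 0.5):
p_value = p_at_least(55, 100, 0.5)
print(round(p_value, 3))  # ~0.18: nowhere near significance

# Power: if the true win rate really is .55, how often do we even reach
# the rough one-sided 5% cutoff of 59+ wins?
power = p_at_least(59, 100, 0.55)
print(round(power, 3))  # only about a 1-in-4 chance of "detecting" a real effect
```

So a 55-win result is easily a Type I error, and a 50-win result easily a Type II error, exactly as claimed.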
Basically, when looking for small effects in small samples, don't even bother. Seriously. You are more likely to mislead yourself and your readers than you are to find anything reliably interesting.
For example, your results in 100 and 200 games tell us nothing of interest. Before you even get your results, you know that a result of "expected = actual" is not going to tell us anything interesting other than, "if there is an effect, it is likely not really large." Well, did we really think it was large going in? Again, looking for small effects in small samples. Exercise in futility.
I learned that from doing hundreds of these types of analyses. I've done so many of these that I get results all over the place (as expected) and I long ago realized that I could never make heads or tails of them for a lack of statistical significance one way or another (either accepting or ruling out the null hypothesis).
Yes, I realize that he has been more effective, otherwise you would not have written the article, but you are attributing that success to a change in approach. If all pitchers use a similar change in approach with RISP, then it is unlikely that that change is a causative factor for the success (i.e., he probably just got lucky).
That's like saying, "Pitcher A has had amazing success pitching with runners on first base and 1 out, as compared to the average pitcher in that situation. Let's see why. Wow, it turns out that he is somehow able to induce 20% more GB than he normally does, hence the success."
Well, if ALL pitchers induce 20% more GB with a runner on first and 1 out, then that can't be the reason for his success!
You completely lost me.
That is 100% correct. Game theory for pitching only applies within the AB anyway.
2 questions I would have:
1) How do all pitchers change their approach with runners in scoring position? For all we know, Shields' changes are similar to all pitchers' changes.
2) If he is more effective with a different mix of pitches with RISP, why not throw those same mixes all the time?
If you use projections rather than what they have done already, which you should be doing, it looks even worse, as those four pitchers' projections (FIP) are each around .1 worse than how they have pitched this year.
Good job! I love the last section.
As an aside, whenever you hear the windbags on TV or on the internet give you narratives that support both mutually exclusive sides of an issue, there is a pretty good chance that both sides are B.S.
A contending team is playing a non-contending team, and the announcers tell us that it is "dangerous" for the contending team because the non-contending (i.e., bad) team has nothing to lose and loves to be a spoiler. As if a bad team can magically become good because they have "nothing to lose." If they can do that at will, perhaps they should have tried that at the beginning of the season.
Then they flip the coin over and someone else tells you that so-and-so contending team has the advantage down the stretch because they are playing non-contending teams that are going through the motions, giving playing time to September call-ups, yada, yada yada.
This is one (of many) examples of how people's recollection or characterization of what they say or do is simply not reliable.
I mean if what pinch hitters say they do is simply not true (which it is not), why would we believe anything that a ballplayer or ex-ballplayer says about what they did or thought?
That is one reason why you hear so many stupid things coming out of the mouths of these ex-player commentators on TV and the radio (and internet), even when it specifically relates to their profession (thus you would think that they were reliable sources of information).
People who actually do, say and think the things that are contained in narratives are just as likely to believe those narratives as are people who have nothing to do with them.
There are SO many things that ballplayers will tell you they do or think that simply aren't true, that we have to remember that people's recollections, even from the so-called experts, are not substitutes for actual data. In fact, they are often the least reliable sources of information.
"In God we trust. All others must bring data." - W. Edwards Deming.
"It's possible that in terms of run production, it all washes out."
I am confused. You found somewhat of a "talent" for swinging more or less at the first pitch depending on leverage, and you also found that swinging more or less at the first pitch affects walks, K, outs, and extra base hits, right? But you don't know if this actually makes the batter better or worse - it could be that just their approach changes, but not necessarily the win impact, right?
So, why, in your first sentence, do you say:
"By clutch hitting, I mean that certain players have some sort of ability to perform better in higher leverage situations."
Did you find that some players do perform "better" by virtue of this swing change, or not? "Better" has to be that their win impact goes up. It can't just be a change in behavior without knowing how that affects their win impact, right?
Robert, of course swing rates by count will vary more than overall swing rate simply because of the sample sizes. Right?
The assumption here is that the faster the player, the more extra bases he takes, the more he advances on outs, and the fewer double plays he runs into on the bases, including getting thrown out on the bases.
So the model has to reflect that. It is fair to use that model to see how speed (fast or slow) is leveraged in the various slots in the order. It should not be possible for a slow player to gain runs in any slot in the order, as in the last chart, since the model is supposed to assume that faster = more value. So there must be something wrong with the model.
As well, as I already noted, if all the players in the order have the same batting profile, there should be little to no difference in the value of speed regardless of the batting slot since the order is a loop and not a line with endpoints. So, again, something is wrong with the model.
It entirely depends on what Russell put in the MC simulation. Obviously one can program a fast runner being too risky, but that is not what happens in reality (fast runners have +5 to +10 baserunning linear weights), so that would be a programming mistake.
So remember, we are not finding out anything about actual fast and slow base runners with these simulations, only what Russell's simulation does with them.
Now, he should be making sure that his fast and slow runners do what they do in reality, which is easy enough to find out. Simply look at all players with low and high speed scores and see how often they take the extra base, get thrown out, etc.
I can tell you from the work I have done with base running linear weights that fast runners do NOT get thrown out on the bases very often. In fact, no one gets thrown out very often. The fast runners get their edge from taking the extra base and don't give back much if anything from getting thrown out.
Yes, that last chart is completely implausible.
Russell, I think that you are suffering from severe sample size problems. 10,000 games is not nearly enough, I don't think, to smooth out the random fluctuations.
For example, if you put 9 league average players in a lineup, why would speed affect any spot more than another?
And slow players adding runs? Come on! You may even have a bug or bugs in your model.
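To put a rough number on the sample-size concern, here is a back-of-the-envelope sketch; the per-game run SD of about 3 is my assumption (a typical MLB figure), not something from the article:

```python
# Back-of-the-envelope noise estimate for a 10,000-game lineup simulation.
# Assumption: per-game runs scored have an SD of about 3 (typical for MLB).
import math

sd_per_game = 3.0
n_games = 10_000

se_per_game = sd_per_game / math.sqrt(n_games)   # SE of mean runs per game
se_per_season = se_per_game * 162                # scaled to a 162-game season
print(round(se_per_season, 2))  # ~4.86 runs/season of pure noise per lineup
```

And comparing two simulated lineups roughly multiplies that noise by sqrt(2), to about 7 runs per season, which is on the same scale as the speed effects being measured.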
"And yet, everything we’ve seen in the past 27 months suggests an every half-century type of player; regressing him toward a league-typical standard implies that he belongs in the same league as the Jason Kubels and the Michael Cuddyers and the Matt Kemps of the world. He doesn’t. He’s playing here only because a better game hasn’t started that can challenge him, just like when he was facing Cherry Hill East in the state playoffs."
That is wrong. The only way we know, or think we know, that he is an other-worldly player is from his stats. When we regress toward the mean of some inclusive population, that population is derived without knowledge of the stats. That is the essence of this kind of regression in projections. Now, we can incorporate other things, like how good a prospect he was, in formulating the population, but the one thing we cannot include is his stats (at least the stats that go into the projection algorithm).
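A minimal sketch of the kind of regression being described: the population mean is fixed by the player's group (age, position, prospect pedigree), not by his own stats, and the ballast constant k = 300 PA is hypothetical:

```python
# Regression toward the mean: the population mean comes from the player's
# group, not from his own stats. The ballast constant k is hypothetical.

def regress(observed, pa, pop_mean, k=300):
    """Shrink an observed rate toward the population mean; with more PA,
    the observed performance gets more weight."""
    w = pa / (pa + k)
    return w * observed + (1 - w) * pop_mean

# A .950 observed OPS over 600 PA, from a population whose mean
# (set without looking at his stats) is .780:
projection = regress(0.950, 600, 0.780)
print(round(projection, 3))  # 0.893
```

The point of the comment is that pop_mean must be chosen before looking at the very stats being regressed; otherwise the regression double-counts them.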
Very nice work!
"reach back for a couple of extra MPH when, say, there's a runner on second base in a close game."
Funny how we almost always overvalue and overestimate things when it comes to human behavior.
Yes, pitchers do throw harder in tougher situations - about .1 or .2 mph harder. So maybe on the order of 20 times less than you might think.
Obviously pitchers pitch everyone differently depending on the situation. For example, if you have a 5-run lead late in the game, you throw many more strikes and fastballs to all hitters. If a HR ties or wins the game in the late innings, you pitch mainly down and away. If in a critical situation, a walk doesn't hurt you, but a hit does, you throw to the corners. Etc.
I would like to see the difference in OPS for light-hitting position players. There should be no reason that a pitcher would pitch much differently to a decent-hitting pitcher than to a light-hitting position player. A pitcher at bat is just a poor hitter - nothing more or less than that. And some are better hitters than the worst-hitting position players.
There also may be something about the game situation when pitchers bat in high leverage situations that causes a lower OPS. Maybe. Certainly those situations are not necessarily the same as situations where position players hit in high leverage situations. For example, with position players, most of those high leverage situations occur late in the game. For pitchers, I would think that few high leverage pitcher AB are late in the game.
As far as the decision to IBB or not, it was very clearly the wrong decision because of the lost WE in the next inning by not having a pinch hitting pitcher lead off and instead leading off with the #1 batter. So even if the WE in the 9th inning was a tossup between walking Reynolds or not, you are going to lose at least 1.5% in WE when the game is extended to the next inning (which happens around 2/3 of the time).
As well, as in all marginal IBB situations, or in situations where you should not issue the IBB but you do, it is always correct to wait until the count is in the batter's favor. Always.
Thank you for the explanation about why signing bonus is a better predictor than actual pick number, at least before "slotting." It makes sense.
" The message there is that the correlation ain't .80 even when we have MLB data to work from and we're not projecting into the future."
It can't be that high even with perfect information, because there is too much random variance in one season of performance. Remember that even with perfect information (if we knew every player's exact true talent and paid them fair market value for it), the smaller the time frame, the lower the correlation, because correlation includes random variance. That was MY point. Comparing correlations tells you nothing about how much information you have or how efficient you are at evaluating talent when one correlation is based on one year of performance and the other is based on a career (or at least several years).
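A quick simulation makes the attenuation point concrete (all distributions here are hypothetical, and salary is deliberately set to "perfect" pricing of true talent):

```python
# Why a one-year correlation must come in lower than a multi-year one, even
# with perfect pricing of talent. All distributions are hypothetical.
import random

random.seed(1)

def pearson(xs, ys):
    """Plain Pearson correlation, stdlib only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

talent = [random.gauss(2.0, 2.0) for _ in range(2000)]         # true WAR talent
salary = talent                                                 # perfect pricing
one_year = [t + random.gauss(0, 2.0) for t in talent]           # one noisy season
career = [t + random.gauss(0, 2.0 / 8 ** 0.5) for t in talent]  # ~8-year average

r_one = pearson(salary, one_year)
r_career = pearson(salary, career)
print(round(r_one, 2), round(r_career, 2))  # one-year correlation is much lower
```

Even though this "market" prices talent perfectly, the one-year correlation comes in far below the multi-year one, purely because a single season carries more random variance.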
When I worked for the Cardinals in 2004, I studied the draft. I found that after the first round, using nothing but college MLE's would do better than the actual draft. In other words, after the first round, scouting is very inefficient, as your analysis suggests. Of course, drafting is probably better now than it was 10 years ago, as many teams are using sophisticated analytics to project ML performance from high school and college performance. There is also more emphasis on defense and positional adjustments now than there used to be. In the old days, teams would much rather draft the slugging outfielder or first baseman than the mediocre-hitting, excellent-fielding SS. Not so anymore, or at least it shouldn't be, although I suspect that most teams still overvalue hitting at the amateur level at the expense of defense and positional adjustments (and probably base running too). Everyone wants the next great hitter and not the next great fielder.
"In 2013, the correlation between salary and WAR among those making more than $1,000,000 was .23. I played around with various filters, but the result was always similar. So, from that point of view, teams are doing a better job of understanding the market for wins that prospects will eventually put up (years in the future) than they are of understanding market for actual MLB free agents."
In one, you are using career WAR and in the other (2013 salary and WAR), you are using one-year WAR, so of course the correlation for the one-year WAR is going to be less. That does not mean that teams are doing a better job in the draft than in valuing FA.
Back to the article in general:
When you look at individual rounds and find virtually no correlation after the first round, all that tells us is that teams cannot distinguish among the talent in each of those rounds, not that they are necessarily doing a bad job in drafting in those rounds.
For example, let's say that all players in the first round accumulated an average of 20 career WAR and all players in the second round accumulated an average of 15 career WAR. That might be a good result, I don't know. Let's say that it is. Well, the correlation of zero (or negative) in the second round simply means that teams cannot distinguish between one 15 WAR player and another, which doesn't mean that they are doing a bad job at drafting in the second round. Maybe all the talent is bunched together after the first round?
In fact, if the talent is normally distributed, then there are more and more players bunched together in talent after each round so that it becomes harder and harder for teams to distinguish among all that talent PLUS the actual spread of talent between the 30 players in each subsequent round is getting smaller and smaller so we expect that the correlation between WAR and signing bonus will also get smaller and smaller.
Even in the first round, the relatively small correlation doesn't really tell us much. What if teams are very good at identifying the best 30 or 40 players, but they are not so good at distinguishing among them? And again, what if the talent even in the first round does not have such a large spread?
All that the correlations in each round tell us, from a team perspective, is that it doesn't really matter that much what position in each round you select from, and you are probably better off with a lower number, especially after the first round, since you can save yourself some money in signing bonuses.
Also, isn't there a lot of noise in the signing bonuses depending on the player's agent, his family's financial situation, his personal preferences, etc.? Why wouldn't draft pick number be a better proxy for evaluating talent? With a few exceptions, aren't teams trying to pick the best available talent with each pick, regardless of what the eventual signing bonus is?
"I can name countless examples of this type of talent progression at the expense of performance."
Anything at the major league level?
"Nobody suggested scouting was science; in fact, its an art, one that is a learned skill and normally applied using a historical perspective."
Art, science, whatever. If there is no evidence that what you say has merit, then, I am afraid it is just blather.
I am NOT (of course, you will ignore this sentence and just go on assuming that I am saying that you don't know what you are talking about) saying that you or any of the other "scouts" are right or wrong about anything in particular. I am simply asking for evidence, that's all. Maybe there is some. I admittedly don't read much of these types of articles. For example, if you say that you can analyze someone's swing or pitching mechanics and predict something that the numbers can't, then there must be evidence, right, or why should anyone listen to what you have to say?
I mean, if a respected doctor tells us that he has a cure for X, and he writes a beautiful article about why and how it works, we don't really care unless and until it is tested, right?
"The stats may not have the power to detect a change within a small sample, but one who has the eyes to see what is happening on the field can glean these subtle details."
I'll believe that when just once - once is all I ask - someone uses observation and scouting to tell us how a player's true talent actually got better even though his performance got worse! Or vice versa. Surely that has to happen.
Everything you said, Doug, is opinion without any evidence to back it up. Why should I believe that any more than I should believe Kruk on ESPN when he tells us how fielders make more errors with slow pitchers on the mound (they don't).
When you scouting guys can tell me something that is going to happen in the future and not merely narratives about the past, and we can test that, then I might believe what you have to say. I'm not saying you guys are wrong, but I have zero evidence that what you guys are saying has any merit whatsoever. At least I don't think there is. If there is, please let me know where to find it. Otherwise it is just opinion without evidence, which might be interesting, but it's not science.
I am not necessarily buying any of this. I could be wrong, but I think anyone could support any narrative with pictures of swings, because there are hundreds if not thousands of swings that a batter has during a season, depending on the pitch location, type, the count, etc.
"As Rob Neyer and Jay Jaffe have pointed out, the data backs up the notion that Trout is not merely a victim of bad luck. If it’s not bad luck or a small sample, then what the heck has gone (relatively) wrong with baseball’s wunderkind? When in doubt look at the swing that is putting up the numbers."
That is just plain wrong.
We have no idea how much of a player's past or present performance is luck and how much is true talent. We can only make inferences based on sample performance. Sure, using "scouting" and observation, we can make those statistical inferences stronger, but we can never be nearly certain what is luck and what is true talent.
The idea that we "know" that Trout's slightly worse performance so far this year is a change in talent from last year is just ridiculous. We know no such thing. For all we know, Trout was lucky the last 2 years and this season's performance represents his true talent all along. Or he is indeed unlucky this year and prior years' performance is representative of his true talent. Or anything in between (more likely of course).
This is very nice work! I realize that it just scratches the surface, but I was disappointed that you did not at least consider what the call was (ball or strike) on the first edge fastball.
That's a very good point above! I doubt they are lying.
It seems to me that a manager's opinion is worthless, as with most of these things. Someone has to do some research and establish rules of thumb for various hitter/pitcher combinations and game situations.
As usual, in each instance, there is a correct answer (or it is a toss-up) and an incorrect one, regardless of what the manager's "opinion" is.
I cannot think of any reason why you cannot have both a great leader and at least a decent tactician. Strategy can easily be taught, given a willing student of course. Leadership probably cannot, at least at this stage in a person's career.
So, I put it on upper management to choose someone with great leadership and motivational skills, and simply exclude those who are unwilling to learn and utilize new (for them) strategies. I cannot imagine that that is not possible, especially as time goes on and more and more of the great leaders will have grown up in the internet age and be at least somewhat familiar with sabermetric principles.
If you grow up with them, you are by definition on board with them. The principal reason that the old codgers eschew sabermetrics is that they did not exist, for the most part, when they learned the game. When the "real book" replaces "the book," it will no longer be necessary to teach the dogs new tricks.
One of the problems is that players and managers have an "either/or" mentality when it comes to things like this, and have very little understanding of what a "break even" point is.
For example, with no one on base, and especially with one out, they think that a bunt is an excellent idea, which it is. You probably cannot convince a batter that a bunt can also be an excellent idea with, for example, a runner on second and two outs, or two outs and no one on base in a close game with a power hitter at bat.
Of course, the correct way to look at it is through the lens of "break-even" points. If you have a decent or good bunter and the defense is shifting, a bunt in virtually any situation is probably a good idea.
So front office people need to teach managers and coaches the concept of break-even points, and they in turn need to teach the players.
There is also the issue of, perhaps it is correct to bunt every single time even in situations that do not call for the bunt, in order to force the defense to stop shifting entirely.
Actually, whether to bunt or not is simple. You bunt at a certain percentage such that it would not matter whether the defense shifts or not. That is the game theory optimal (GTO) strategy for the offense. For the defense, if they thought or knew that the offense was acting optimally, they should play a defense such that the WE (win expectancy) is the same whether the batter attempts a bunt or not. That is their GTO strategy.
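A minimal sketch of that indifference calculation, treating bunt/shift as a toy 2x2 zero-sum game. All four win expectancies here are invented for illustration; nothing in the comment specifies actual numbers.

```python
# Hypothetical win expectancies (WE) for the offense.
# Keys: (offense action, defense alignment). All numbers invented.
WE = {
    ("bunt", "shift"): 0.48,
    ("bunt", "straight"): 0.44,
    ("swing", "shift"): 0.43,
    ("swing", "straight"): 0.47,
}

def gto_mix(we):
    """Mixed-strategy equilibrium of a 2x2 zero-sum game."""
    a = we[("bunt", "shift")]
    b = we[("bunt", "straight")]
    c = we[("swing", "shift")]
    d = we[("swing", "straight")]
    # Offense bunts with probability p such that the defense's WE is
    # identical whether it shifts or plays straight up.
    p = (d - c) / ((a - c) + (d - b))
    # Defense shifts with probability q such that the offense's WE is
    # identical whether the batter bunts or swings away.
    q = (d - b) / ((a - b) + (d - c))
    return p, q

p, q = gto_mix(WE)
print(f"offense bunts {p:.0%} of the time, defense shifts {q:.0%}")
```

With these made-up payoffs, each side mixes so the opponent gains nothing by deviating, which is exactly the "it would not matter whether the defense shifts or not" condition described above.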
Nothing is really hard to measure. It is sample size that usually constrains us (from having much certainty in any conclusions drawn from the results).
I can buy both premises - that there is a hangover effect AND that relievers have off days, but there is NO way that you could identify an off day by observing base runners in ONE inning and then have such a significant effect in the next inning. Do you have any idea how large the variance in "good days/bad days" would have to be for two base runners in one inning to produce that kind of decreased performance in the next? Well, it is impossible.
So, something is wrong.
What is the "xy correction?"
I think a good compromise for the "count" thing would be to do it the way you are doing it, just in case there is some "skill" at being more or less of a good framer at the various counts, and then normalize the results to an average count distribution. That is pretty much what I do with UZR, to some extent.
Also have to be careful with the regression thing which translates the observed into "skill." Most of the components of WAR and WARP do NOT do that, so if you are adding this or comparing it to other WARP components, you are adding or comparing apples and oranges.
So, are the other catching components on the player card and stats pages regressed also? Such as the catcher blocking?
Are you planning on updating these on a regular basis as the season goes on?
Again, great work!
I have a particular aversion - and I'm not sure why - to suggestions that may or may not be reasonable (or even excellent), but will never be implemented. No penalty that alters the game in any way, shape, or form will be considered by MLB, whether that is right or wrong.
I think this discussion is much ado about nothing. The math and the strategy are interesting, yet very few managers are going to make marginal challenges, for various reasons. And if some of them do - so what? It will take 30 seconds or a minute to resolve (especially if it is obvious that the manager is wrong). A month or two into the season, no one will be paying much attention to the challenges, especially the gratuitous ones (if there are any). It will give people an excuse to go get a beer, like a mound visit.
And if it becomes a problem, MLB will change the rules. I assume that the present system is somewhat experimental anyway, no?
WPA is an interesting stat. If you want to use it, or some derivation of it for a retrospective award, it is fine. For anything prospective, like a projection, it is terrible. It is not biased though, other than defense (which makes it biased I guess!), although, like Eric says, you can try and adjust it for defense.
Right, even Pizza mentions this. It is like using RA9 rather than FIP in WAR. RA9 and WPA includes things the pitcher has little to no control over, like defense, luck, and sequencing.
OK, makes sense. A team could conceivably have all their games go 0-0 into extra innings.
Here is an interesting thought:
Say that a pitcher like Felix (or any good starter) always goes into the 9th inning and if the game is close, he pitches and if the game is not, he leaves and a reliever comes in.
That starter would obviously have an average leverage of quite a bit above 1.0. Maybe 1.1 or 1.2 for the season. Maybe higher. And since he is an above average pitcher, he should have a lot more value (WAR) than regular WAR (with no leverage adjustment) would suggest.
But, let's say that his team's regular closer is someone like Kimbrel or Mariano, and he is pitching in the 9th instead of them. He is costing his team wins, so to give him more value because of his average seasonal leverage is a bit of a contradiction. Of course on the team level, it will all balance out, since the closer will get less value (WAR) since he is giving up some high leverage situations to that starter. And that is not to mention the fact that the starter's high leverage situations in the 9th are occurring the 3rd and 4th (and more) times through the order, when he is not nearly as good as he is overall!
So it is NOT really correct to multiply any starter's overall WAR by his average leverage for the season since the manager is "de-leveraging" his starter by allowing him to pitch in higher leverage situations (later in the game) when he is least effective (because of the TTOP and fatigue).
"Although this would likely be balanced by lower leverages due to larger margins of defeat."
You just answered your own question, no? A team's average leverage for the season has to be 1.0, by definition. I think.
Also, for projection purposes, which is what this article is all about, you should not be using any kind of WAR which already includes leverage, unless you have to.
Obviously no credible forecaster uses past WAR for WAR projections other than as a quick and dirty method, but we'll assume that we are (using WAR to project future WAR).
Once you get your projection, you then estimate how that reliever is going to be used and THEN apply the leverage adjustment. You don't use the leverage adjusted WAR from the past in order to project leverage adjusted WAR in the future, unless, again, that's all you have or you don't know how he is going to be used in the future, so you just assume that he'll be used exactly like in the past.
The reason is similar to why you don't use WPA to project future WPA. Let's say that last year a closer had an average LI of 1.7. But let's say that the average closer has a LI of 2.0. Let's also say that this player is going to a new team, so we can take the team and manager out of the equation. Don't use his leverage adjusted WAR to project his leverage adjusted WAR in the future! Use his non-adjusted WAR and then adjust it using 2.0 and not 1.7!
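The workflow described here can be sketched in a few lines. The projection figure, the LI values, and the 10-runs-per-win conversion below are all illustrative assumptions, not numbers from the comment:

```python
# Sketch: project context-neutral value first, then apply the leverage
# you EXPECT from the pitcher's future role -- not his observed past LI.
RUNS_PER_WIN = 10.0  # rough rule-of-thumb conversion (assumption)

def leveraged_war(runs_above_repl, expected_li):
    """Context-neutral projection scaled by expected future leverage."""
    return runs_above_repl / RUNS_PER_WIN * expected_li

runs_proj = 18.0   # hypothetical projected runs above replacement
observed_li = 1.7  # last year's usage -- deliberately NOT used
expected_li = 2.0  # average closer usage on the new team
print(leveraged_war(runs_proj, expected_li))  # 3.6 wins, not 18/10 * 1.7
```

The point is simply that `observed_li` never enters the projection; only the expected future role does.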
It is not a question of clutch or even whether a reliever "should" get credit for pitching in high leverage situations, like a closer (and set-up guy) does.
Forget about the word "credit." You pay a player for the value he provides to your team (compared to some other player, in most cases, it is the "replacement player"). It doesn't matter how that value comes to fruition. If a manager decides that he is going to put a certain pitcher into high leverage situations (like he would a closer), it just so happens that the pitcher's value per inning gets multiplied. That is THE definition of leverage, whether you understand the concept or not.
Here is a perfect example, which should completely answer this question or solve this "problem" if any of you are having a problem with this concept.
Let's say that you can get Barry Bonds in his prime, but you are only allowed (or choose) to play him in games where you are up by at least 7 runs. What is a fair salary?
And what about if you only play him in games where the score is within 1 run or tied. How much is he worth PER GAME?
That's all there is to this concept when it comes to relievers. It actually applies to all players, but most players play in average leverage situations overall. You can actually do the same thing (even though most people don't), and technically you have to, for pinch hitters, pinch runners, and defensive replacements. Whatever their hitting, running, and defensive value is, you have to multiply it by the average LI that they play in. That is their value for salary and trade purposes, even if their "talent in a vacuum" is overvalued.
WAR that uses "half leverage" DOES properly evaluate short relievers for projection purposes. WPA overvalues them because it uses full leverage. It also includes too much noise that has no predictive value (which is why it is better to use a context neutral WAR plus an adjustment - half - for leverage).
BTW, the reason for the "half adjustment" for leverage is "chaining." You'll have to poke around on The Book blog and other places on the web for an explanation of that. Basically, if you choose to leverage (increase his value) a good reliever by using him in high leverage situations, all your other relievers move down the food chain.
Right, exactly. In any case, the "any idiot" narrative is not a very helpful one. Smart teams know the value of a closer (and other relievers) based upon a credible rate (context neutral runs allowed per inning or per whatever) projection.
And it IS correct to use half leverage. The value of a closer is his wins above replacement times around 1.5 or so (assuming that he pitches in an average 2.0 LI).
The (silly) meme that, "Any idiot can save a game with a 3 run lead in the 9th," has nothing to do with the value of a closer based on a credible projection for him.
Kimbrel is probably 2.5 runs per 9 better than a replacement reliever. Valverde is probably .5 runs better. That makes Kimbrel worth 16 runs (72 IP) times 1.5 (half leverage) more than Valverde. That's 10 million dollars per year more value, even though Valverde is one of those "idiots" who can save a game with a 3 run lead.
(Of course, it's not that "any idiot" can save that game. It is that his team wins the game with the "idiot" on the mound 94% of the time and with the elite closer, it is 97%. Of course, the value of a great closer is mostly in games where his team is NOT up by 3 runs, hence the concept of leverage.)
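The Kimbrel/Valverde arithmetic above can be checked directly. The half-leverage multiplier of 1.5 follows from the 2.0 LI mentioned earlier; the dollars-per-win conversion is a rough assumption on my part, not from the comment:

```python
# Rough check of the closer-value arithmetic above.
runs_per_9_diff = 2.5 - 0.5          # Kimbrel vs. Valverde, runs per 9 IP
innings = 72
runs_diff = runs_per_9_diff * innings / 9   # 16 runs over the season
half_leverage = (1 + 2.0) / 2               # "half" of an average 2.0 LI
leveraged_runs = runs_diff * half_leverage  # 24 runs, about 2.4 wins
print(runs_diff, leveraged_runs)
# At a (assumed) market rate of roughly $4-5M per win, ~2.4 wins is
# in the neighborhood of the $10M per year figure cited.
```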
"Let's leave aside your point about the selection bias in the sample (it's a valid critique, but for a moment, let's assume that it doesn't make a difference)"
I'm sorry, but you can't leave out that point. That is THE whole point (that I am making).
"In the 60's, we see that starters in their final innings were a quarter run worse than relievers in the same inning."
The runs in that "final inning" have no reflection on the talent of the pitchers in that inning. I can create any number of runs I want in the "final inning" simply by deciding when to take out a pitcher.
As I said, the primary reason that runs allowed in that last inning were much higher in the old days lies in the reason for taking the starter out. We have NO IDEA of the talent of the pitchers in that last inning. None whatsoever. You have to have an unbiased inning for that.
In fact, the ONLY thing that looking at runs allowed in the last inning tells us is why managers took them out. The higher the runs allowed in that inning are, the more managers took them out because they were bad and not because of any other reason (like a high pitch count). The other reason that the "last inning" has fewer runs allowed in modern days is because managers will leave in a pitcher if he allows lots of runs in that inning but his pitch count is low! That is because the bad inning will sometimes not be the last inning. In the old days, when a pitcher gave up a lot of runs, he was often removed even with a low pitch count.
I'm sorry, but unless I am reading your conclusions wrong, I think you made a fatal mistake by looking at the last inning.
"If they were to revert to the old days, they would send pitchers out whom they shouldn't send."
Can you explain how you came to this conclusion (if you have time)? I'm not following you.
As I said, if you are looking at the "last inning" pitched that is not an unbiased look at the runs allowed. The "last inning" runs allowed will be inflated because a pitcher who allows lots of runs in an inning will often be taken out during or right after that inning.
The reason you are seeing more runs allowed in that last inning in "the old days" has nothing to do with how good the starters are. It simply means that in the old days, starters were taken out only during or after a terrible 6th inning. In modern times, starters are taken out after and during a bad 6th inning AND when their pitch counts are high.
So, again, these are very misleading numbers, and I don't know why Russell chose this "last inning pitched" criteria. I don't think it yields anything useful.
"To try to isolate cases in which we can surmise that the manager knew the starter was faltering, but still left him in, I looked for all cases in which the sixth inning (whether the starter completed it or not) was his final act that day."
Wow, that's a selective sample and a half! I don't think you are isolating cases when the manager knows a starter is faltering but leaves him in for at most one more inning. I think you are isolating cases where a pitcher pitched horribly in the 6th and was taken out (either in the middle or after).
By no means is the result of that inning representative of how starters pitch in the 6th inning when their manager thinks they are faltering. If you look at any inning which is the last inning for a starter, you will see a bad inning. The earlier that inning is, the worse that inning will be (because the reason a manager takes out a starter in the early innings is because he pitched badly in that inning - in the later innings it could be just a high pitch count).
I also think the results you are seeing across time are mostly this:
In the early days, the only reason a manager takes out a pitcher in or after the 6th is because he had a terrible inning. In modern days, a manager takes out a starter during or after the 6th if he had a bad inning OR if his pitch count starts to get high.
I'm not really sure what we are to conclude from that...
I don't think this is workable, for a lot of the reasons articulated above. The two principal forces working against this (other than the aesthetic ones, which are not insignificant, IMO) are:
1) I need A LOT more (as in any) evidence that having the better fielder in the opposite field is advantageous (and to the extent that Russell estimates) simply because there are more balls hit there. We really have no idea how fielders of various talent levels fare as a function of the size of the field, and the speed, location, trajectory, and spin of the batted ball.
2) Assuming that there IS an advantage, I would think that there HAS to be SOME disadvantage to constantly switching. When you play 9 straight innings at one position you become more and more familiar with the wind patterns and even the outfield quirks in visiting parks, even if you have played there lots of times before. Regardless of how much you train for this, I just can't see it NOT being a fairly significant disadvantage. Not to mention, seriously, the effect of jogging back and forth during hot days and nights.
So, no, I don't see this as being workable until and unless lots more research and experimenting is done.
"Yes, I know, pitcher ERA is a terrible way to measure pitcher value, and reliever ERA is even worse."
Actually, ERA or RA9 is just fine for evaluating pitchers in the aggregate. And this notion that ERA is OK for a starter and not for a reliever is a complete myth. What we have absolutely no use for, in terms of reflecting a pitcher's talent, is saves!
"Maybe Nathan will become another one of the limited examples that continues to shine into his 40s."
After removing Fryman and Reed from the list, if we look at the simple (non-weighted) average of all these pitchers pre and post, we get a decrease in ERA of around .2 runs. And that is not even accounting for an expected regression toward the mean! Guess how much a random pitcher loses from one year to the next? .2 runs. So there is absolutely nothing remarkable about a good reliever pitching into his 40's, at least according to these pitchers. So your data completely disproves your thesis!
I only include batters who were in the lineup when the game started (I also don't include 9th innings or later - those are usually the blowouts with starters still in the game).
And I am not sure why you are searching for reasons why there is NO fourth time penalty. As you can see from the last chart in the article, there is an 11 point fourth time penalty, almost exactly the same as the second and third.
BTW, as someone pointed out on The Book blog, the second time through the order is only a "penalty" relative to the first time. As I have mentioned and you can see in some of the charts, the second time around is about equal to a pitcher's (and batter's) overall performance for the season. So the second time is not really a "penalty" it is just that the first time is an advantage for the pitcher. But, that is really a glass half full or half empty semantic thing...
That should be "only has maybe 1000 TBF" not "100 TBF."
Sure, the 80%, 10%, 15% cutoffs are pretty arbitrary. I usually try to find a balance between creating "opposing" (like one-pitch pitchers compared to multiple-pitch pitchers) categories that are meaningful yet contain large samples of data. I can't always do that of course.
That is one reason why I also often use multiple groupings. The problem with using too many groupings is that it is possible to keep changing the cutoffs until you finally get the effect you are looking for or not looking for! And that could have occurred by chance (Type I error).
I am afraid we are talking past one another, so we might want to start all over again.
Let's start with this:
"You don't find a "solid 10 point penalty for all starters," unless you hand-wave it away; that's the whole point."
Yes, if you read the last part of the article which was an addendum to the original article, when I use the delta method to determine TTO penalties for all starters from 2002 to 2012, the fourth time minus the third time difference was 10 points in wOBA. The delta method is the correct way to determine average penalties. It is basically the average of all starters' penalties, weighted by the number of batters they face the fourth time through the order. So, if starter A had a wOBA of .350 the fourth time and .340 the third time, he had a 10 point difference. If he faced 100 batters the fourth time, that would be the "weight" for that pitcher when calculating the average of all pitchers. If pitcher B had only a 5 point difference and his fourth time TBF were 10 batters, then the weighted average of "all" pitchers would be ((10 times 100) + (5 times 10)) / (100+10), or 9.55. That would be the weighted average of all starters' (in this case, two starters) fourth minus third penalty.
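The delta method described above fits in a few lines. This sketch just reproduces the two-starter example's numbers:

```python
# The delta method: each pitcher's 4th-minus-3rd wOBA difference,
# weighted by his 4th-time TBF.
def delta_method(pitchers):
    """pitchers: list of (woba_4th, woba_3rd, tbf_4th) tuples."""
    num = sum((w4 - w3) * tbf for w4, w3, tbf in pitchers)
    den = sum(tbf for _, _, tbf in pitchers)
    return num / den

# Starter A: .350 vs .340 (10-point penalty) over 100 fourth-time TBF;
# starter B: a 5-point penalty over 10 fourth-time TBF.
penalty = delta_method([(0.350, 0.340, 100), (0.345, 0.340, 10)])
print(round(penalty * 1000, 2))  # 9.55 points of wOBA
```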
That is what I did for all starters. That should have been part of the original article. It has nothing to do with the second article. OK, we got that out of the way.
In this article, I broke down starters into several different groups. In the first go-around, I had 3 groups, mostly fastballs, not many fastballs, and all the rest. I found that the mostly fastball group had a 13 point ADVANTAGE (not a penalty) the fourth (and later) time through the order. The other two groups had the usual 10-13 point penalty.
I noted that and said:
"Interestingly, the “fastball” group reverts back to better-than-normal levels the fourth time (I don't know why that is, but I'll return to that issue later), but the latter group continues to suffer a penalty as do all the others."
I did come back to that later, and speculated on why that might be the case - why the predominantly "one-pitch" starters actually did a lot better the fourth time through the order than the third time and even second time (but not the first). I said this:
"Against one-pitch pitchers, pitchers gain 61 points (small sample size warning—639 PA). Again, I have no idea why. Maybe fastball pitchers are able to ramp it up in the later innings, or maybe they start throwing more off-speed pitches later in the game. (A PITCHf/x analysis would shed some more light on this issue.)"
I then took the same chart and combined the third and fourth times (and later) data. I didn't "hide" the fourth time data. That was presented in the chart before that. They were exactly the same charts, but one had first time, second time, third time, and fourth time (and later). The other had first, second, and third (and later).
I noted that the fourth time data for some of my groups was a relatively small sample so it might suffer from sample error as opposed to the first, second, and third time data which was much larger and much less prone to sample error.
I certainly do not know how much of the fourth time "reverse penalty" with some of these groups is noise and how much is something about their repertoires that truly enables them to get "better" deep in the game. But I briefly addressed that and offered some speculation as to why that might be the case.
I don't know what else you want me to say or do or NOT say or do.
OK, that is out of the way.
"There seems to be a simple explanation available: guys only make it to the fourth time through the order if they're pitching well, or at least successfully, in which case the wOBA for the whole game is going to be suppressed compared to the overall average wOBA they allow."
I don't know what you mean by that. You will have to explain it again in different terms and be more specific. A "simple explanation" for what?
As I said, it is very true that in games where there is a fourth time through the order, the pitchers would have pitched well and gotten lucky the first three times through the order, but that should not affect the data the way I calculated the "penalties." There is going to be a very slight survivorship bias which I hadn't thought of until now, and I am going to address that in a further comment as soon as I look at the data and figure out how much of a bias there is. That bias should slightly over-state the penalties for all times through the order, but mostly the fourth time penalty. It should be slight though. The reason for the bias is this: All pitchers in a season who got to the fourth time (or had lots of TBF the fourth time) were slightly lucky in that entire season and the ones who did not (or had few TBF against that fourth time) were slightly unlucky.
"Here's an easy test for that one: if you look only at games where pitchers did reach the fourth time through the lineup, how does wOBA vary as a function of times through?"
You will certainly find that the first through third times will be a low wOBA and the fourth time will show a very strong penalty, since the pitchers who were allowed to pitch to the order the fourth time were "selected" to a large degree on the basis of how they pitched prior to that. But - here is the important thing - I did not calculate the penalties like that! For example, I did not calculate the fourth time penalty using only those games where the pitcher made it to the fourth time! Of course I didn't. If I did that, there would be a huge selection bias and it would look like there was a huge fourth time penalty, as I explained above. I used all games for a pitcher.
For example, let's say that a pitcher had 300 first time TBF, 300 second time, 250 third time, and 50 fourth time.
And let's say that the wOBA against was .340, .350, .360, and .370. My fourth time penalty for that starter would simply be .370 - .360, or 10 points. Then, when computing the average for all starters, I would weight that 10 points by 50, as I explained above. So, in computing the "fourth time minus the third time" penalty, I am using all games in which the pitcher faced the order for the third time, not just games where he faced the order for the fourth time. If I did that, the third time wOBA would probably be something like .330 and not .360 (since the manager is letting him continue late in the game). In that case it would look like the penalty was 40 points rather than 10 points. I hope that is clear.
"There were comments on the previous article that hinted at the importance of this question as well, but no followup."
What question? If there was a question or questions that I did not answer or address, I apologize. I do the best I can with my time. I don't even get paid to write these articles. In fact, I have written hundreds of articles, a book, and spent thousands of hours contributing to the sabermetric body of knowledge. I do it because I enjoy it and it is a passion of mine. I make zero money from it.
"That in turn leads to the more interesting question: is a pitcher more likely to make it to the fourth time through if he throws multiple types of pitch?"
I don't know, but I can find out in about 5 minutes. I suspect that the answer is maybe, but either way, yes, or no, it is not by much.
If you paid attention to my article, you will see that the ONE-PITCH pitchers are the ones who should be continuing the fourth time, if anything, since they are the ones who get BETTER the fourth time. The multiple-pitch pitchers continue to have a penalty the fourth time. Look at the first chart. Those are the ones who should be taken out after the second or third times. However, again, for some of the groups that I studied, the number of pitcher seasons is relatively small, and thus the number of TBF the fourth time is small for the entire group. Because of the possibility of noise/sample error, we simply can't trust those numbers very much. Even for a sample of 8,000 batters faced (e.g., Group I in the first chart), one standard deviation of wOBA is around 5 points.
That is possible but it seems far-fetched to me. If that were the case, we would see pitchers purposely throwing less hard as the game goes on, and being more successful in doing so. I don't think we see that.
First of all, I don't think I'm "brushing it off." I briefly discussed possible reasons for the fourth time "rebounds" that some groups of pitchers seem to have. I also suggested that an analysis of PITCHf/x data could shed some more light on it.
I am not sure of what you mean by this:
"There seems to be a simple explanation available: guys only make it to the fourth time through the order if they're pitching well, or at least successfully, in which case the wOBA for the whole game is going to be suppressed compared to the overall average wOBA."
You appear to be suggesting some kind of a selection bias, but I don't think there is one. You are correct that when a pitcher is allowed to face the order for the 4th time, he likely (and on the average) has pitched a very good game, but that will not affect anything unless:
1) He is pitching well because he is "on" that day and the "onness" tends to carry over into that fourth time performance. That "carry over effect" has been mostly debunked by research by me and others. So that should NOT be much of a factor.
2) If he is pitching well and thus allowed to face the lineup for the fourth time, and the "pitching well" is at least partly due to the umpire, weather, and park, then, yes, the 4th time wOBA is likely to be depressed a little and I am not controlling for that.
Remember that I have already shown that for all pitchers combined, the 4th time penalty is 10 points. See the last part of the article. So there does NOT appear to be any selection bias. Then I found that some types of pitchers, mostly the one-pitch pitchers, not only don't see a penalty the fourth time through the order, but they pitch even better. You are claiming that there is a "simple explanation" for that. Not only do I not understand what you are saying, but if there is an explanation for that involving some kind of selection bias, why do we find a solid 10 point penalty for all starters?
Lots of pitchers in each group will show all kinds of different TTOP patterns, just by chance alone. It would be almost impossible for all of the pitchers in the one-pitch group to all have the same types of penalties. We are dealing with relatively small samples for each pitcher. In a small sample anything can happen. So Brown and Colon may or may not show the same types of penalty patterns that the group as a whole shows. They likely do, which is the case for any individual pitcher or pitchers in the group, but they easily could not.
Now, how much of an individual pitcher's (or sub-group of pitchers, like, say, sinker-ballers) TTOP patterns is random chance and how much is due to something about the pitcher himself is another story. I addressed that to some extent in the first article. Basically, most of the variation we see in a pitcher's own patterns as opposed to whatever group he belongs to, is due to chance as far as we can tell. That is because, again, of the small sample sizes we are dealing with. Even in 4 or 5 seasons, a starter only has maybe 1000 TBF in each of the first 3 "times through the order" segments.
I apologize for an error in the article. This sentence:
"That means that it would take around 30,000 TBF or 7,000 innings pitched (roughly 35 years) before we would regress a pitcher’s own TTOP pattern 50 percent toward that of the average starter."
Should be, "around 7,000 TBF or around 1600 IP," which is 8 years and not 35 years. So if we have 3 years for a pitcher, we would regress his own penalties around 73% toward the mean. Of course "the mean" could be different for different kinds of pitchers. For example, pitchers with many different pitches, like a Felix, may have a lower TTOP than, say, a pitcher who throws mostly fastballs - I don't know. Plus, the 95% or 99% confidence interval around that .03 correlation can be as low as no correlation at all (it is highly unlikely to be a true negative correlation) and as high as .1 or .13 or so.
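The shrinkage arithmetic implied here can be sketched as follows, assuming the corrected ~7,000 TBF acts as the usual "regression ballast" (the sample size at which you regress 50%) and a hypothetical workload of about 850 TBF per season:

```python
# Standard shrinkage: regress by ballast / (n + ballast), where the
# ballast is the sample size at which we regress exactly 50%.
BALLAST_TBF = 7000  # from the corrected figure above

def regression_amount(tbf):
    """Fraction of the observed TTOP pattern to regress toward the mean."""
    return BALLAST_TBF / (tbf + BALLAST_TBF)

# Three seasons at an assumed ~850 TBF per year:
print(f"{regression_amount(3 * 850):.0%}")  # roughly 73%
```

At exactly 7,000 TBF the function returns 50%, matching the definition of the ballast; at three seasons' worth it lands near the 73% figure quoted.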
Thanks to Jared Cross for picking up that error (on The Book blog).
I have done some unpublished research looking at the effects of seeing pitchers in prior games. There appears to be a small effect such that the more times you have seen a pitcher in the past, the better you do in an upcoming game. As I said, the effect is very small and I didn't delve into it very deeply (for example, does it matter if you saw him last week or last year). A lot of good comments and questions above and a few horrible ones.
"On a given night, a particular pitcher may be performing so well relative to his 'true talent level', that even with the penalty, he is still better than any available reliever."
And how would managers and pitching coaches be able to recognize that? I submit (quite confidently) that they are so results oriented that they can't and don't. They get almost everything else wrong (sorry, but that's true), why would they get this one right? If you don't believe that they "get almost everything else wrong," please listen to the ex-players and managers who are the color commentators on TV broadcasts. They constantly spew nonsense. These are the same guys that manage teams.
Yeah, I never thought it has much to do with fatigue, but I wouldn't rule it out completely. Clearly there is a point at which each pitcher gets fatigued and pitches worse than normal, even given the applicable TTOP. We found in The Book, that there is probably no magic number for that (like 100 pitches). It is hard to research I think, but there probably needs to be more research on fatigue anyway.
And I don't trust managers and pitching coaches to be able to figure that out in the middle of a game. They are too focused on and biased by results. For example, if we could somehow know when a pitcher was indeed tired and we allowed that pitcher to throw one more inning and he struck out the side (even tired or bad pitchers can pitch well, right?), I would be willing to bet my last dollar that a manager or pitching coach would think that he is just fine! And vice versa. If we could know that a pitcher after 103 pitches was NOT tired, and he were to give up 2 walks and a HR (that happens to non-fatigued and good pitchers, believe it or not), I would also bet my last dollar that managers and coaches would take them out and tell us that they were tired.
Thanks Tango. And thanks to you for discovering this TTOP. I think you did at least.
Good idea! I might try that. Sort of a controlled experiment in real life!
Yes, that is possible. Interesting theory actually. The way to test that would be to see which batters, if any, take more than their share of pitches per PA in general. I'd have to think about that, but that is an interesting thought! Thanks.
Thanks. Your data is interesting. Let me make sure I understand what you mean by number of unique pitches. I think I do. For example, if a batter sees 4 fastballs, that is considered 1 unique pitch? But if he sees one fastball and one curveball, that is 2 unique pitches, right?
I think you want to somehow separate the number of unique pitches from the number of pitches altogether. For all we know, you may simply be picking up the effect of the number of pitches, period, and not unique pitches.
You want to do something like, one group are batters who saw 4+ pitches but at least 3 unique pitches and the other group are batters who saw 4+ pitches but they were all the same. Or something like that where we can separate the effects.
It is also always nice (somewhat mandatory actually) to at least report (if not control for) the quality of the pitchers and batters in each group. Once you start looking at number of unique pitches thrown or seen, you could easily have substantial differences in the quality of the batters and pitchers in each group. For example, when I was looking at base stealing, I established two groups, one was pitchers who allowed a lot of steal attempts and the other was pitchers who did not. To my surprise the latter group were much better pitchers, based on wOBA against (i.e, not even considering SB/CS against). If I had not controlled for pitcher quality in my research, I would have been in trouble.
That is certainly possible. Joe's data below suggests that that might be true. It is on my list of things to do!
I have no idea. I imagine that this is a universal thing and probably largely subconscious, but then again, I am no cognitive psychologist. It would be interesting, as you say, to look at individual teams. Other than random fluctuations due to sample size issues, I am guessing that all teams have roughly the same patterns, both on offense and from the pitching side.
Oh, and one more thing. I don't think that DIPS is like the TTOP. I don't think the pitcher has much control over the TTOP. I don't think it has much to do with him. I think it is almost entirely about batters simply getting used to the pitcher, and I don't think that a pitcher can do much about it. That makes it very different from DIPS.
In any case, your "opinion" doesn't matter. The math speaks for itself. If all you know is a pitcher's past times through the order numbers, the math tells us that we can't use that to predict the future. If you want to argue with the math, be my guest.
Yes, I did address this with the correlations. Like DIPS, there may be pitchers who have their own unique "times penalties" (or not), but we can't tell from their past results, even for many seasons. That is what a very low correlation (.03) tells us, by definition - that a pitcher's past differences (between times through the order) have almost no predictive value. So you can't contradict that when the math is almost irrefutable.
" think, as well, this understates the risks of overtaxing a bullpen and the stacked effect that can build long term."
Well, I'm not advocating anything. I'm simply giving and explaining the data. What a manager wants to do with that is up to him - not me. But I think that it would behoove managers to understand this phenomenon in order to make those decisions, don't you?
"A reasonable person might consider Middlebrooks to have still been in the act of attempting to field the ball. He had not moved on to another action."
I don't believe that is correct. The rule book defines two ways that a fielder can be "in the act of fielding a ball."
One, it is a batted ball. That one does not apply. It was a thrown ball at this point.
Two, it is a thrown ball. OK, this one applies. So let's see what the rule book says about that:
Comment: If a fielder is about to receive a thrown ball and if the ball is in flight directly toward and near enough to the fielder so he must occupy his position to receive the ball he may be considered in the act of fielding a ball. It is entirely up to the judgment of the umpire as to whether a fielder is in the act of fielding a ball. After a fielder has made an attempt to field a ball and missed, he can no longer be in the act of fielding the ball.
So, the umpire may consider receiving a throw to also be "in the act of fielding a ball," BUT the fielder has to be "about to receive a thrown ball." That refers to the times when a fielder is waiting for the ball and blocks the runner. In this case, he was not waiting for the ball when the "block" occurred.
Finally, even if we were to concede that he was "in the act of fielding a ball" by virtue of being "about to receive a thrown ball," the end of that comment applies:
"After a fielder has made at attempt to field the ball and missed, he can no longer be in the act of fielding the ball."
So, no, I don't think it is reasonable to consider Middlebrooks to still be in the act of fielding a ball, at least according to the rules, which is the only thing that matters.
I wonder if some of the changes in strike zone size are not really changes in size, such as with the count. Let me explain.
Even though you are using 1x1 inch grids, there is some wiggle room within each square, plus there are calibration and reliability issues with the pitch f/x system itself.
If in certain counts or other situations a pitcher is throwing more to the edge of the strike zone (like with slower pitches too, and even faster pitches, since they tend to be controlled less), is it possible that the pitches on the edge of the zone are more "outward" within those borderline 1x1 squares, making it appear as if the strike zone were smaller, even though it isn't?
In other words, what if a "perfect umpire" were calling the strike zone, such that it is always the same regardless of the situation (batter, count, outs, etc.). And what if pitcher A was trying to throw pitches right in the middle of the zone, and pitcher B was trying to throw pitches only at the edges. I contend that with pitcher B your method would produce a smaller strike zone because on those 1x1 squares on the edges of the zone, with pitcher B there would be more pitches located on the outer edges of those squares and our perfect umpire would call those balls, whereas with pitcher A, there would be more pitches actually located on the inner part of those squares and more likely to be called strikes.
Your method would assume that all pitches in those borderline squares should be called a ball or strike the same percentage of time for both pitchers.
Basically, I think that the scatter plot of pitches is a big factor in terms of the measured (using your method, which is probably the best you can do) size of the zone.
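A quick Monte Carlo sketch of this objection (every number here - the edge location, the aim points, the pitch spread - is a made-up illustration, not a pitch f/x value): two pitchers land pitches in the same borderline 1x1 square, but a "perfect umpire" calls them strikes at different rates because of where within the square each pitcher's pitches cluster.

```python
import random

random.seed(7)

TRUE_EDGE = 10.0  # true zone boundary, inches from center (assumed)

def border_square_strike_rate(aim, spread=3.0, n=200_000):
    """Among pitches landing in the borderline 1-inch square (9.5 to 10.5
    inches from center), the fraction that a perfect umpire (who calls
    strikes exactly at TRUE_EDGE) would call strikes, for a pitcher aiming
    `aim` inches from center with normally distributed location error."""
    in_square = strikes = 0
    for _ in range(n):
        x = random.gauss(aim, spread)
        if 9.5 <= x <= 10.5:
            in_square += 1
            if x <= TRUE_EDGE:
                strikes += 1
    return strikes / in_square

middle = border_square_strike_rate(aim=5.0)   # works the middle of the zone
edges = border_square_strike_rate(aim=10.5)   # lives on the edges
# the middle-aiming pitcher gets more called strikes in the SAME square
print(round(middle, 2), round(edges, 2))
```

Same square, same umpire, different measured strike rate - which a grid-based method would read as a smaller zone for the edge pitcher.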
Russell, I asked this in your last column about momentum down the stretch, and I'll try again.
When you run these regressions, and you have run lots, especially when you try several independent variables, are you not going to occasionally find significant p values by chance?
It's sort of a rhetorical question because I am pretty sure the answer is yes. How do you reconcile that? You can't assume that they are in fact the result of a true effect, right?
In other words, don't you have to look at your results in the context of your entire body of work and then recalculate your p-values?
For the reader, let's say that in the course of the last 5 years I do 100 studies and unbeknownst to me, none of the effects I am studying really exist. By chance alone, 5 of those will have p values that are significant at the 2 sigma level (or 2.5, depending on whether it is a 1 tail or 2 tail test), and 1 of those at the 2.5 sigma level. Accordingly, I will falsely conclude that several of the effects that I am studying are real.
Russell does a lot of these kinds of studies. He also tests a lot of variables in each study. By chance alone he will find some significant results (where no effect actually exists). He reports them as likely being real effects, because his P-values are based on one test and one test only.
Russell, you and your colleagues have done so many of these types of studies, have you ever thought about the notion of accepting too many erroneous "positives" for two related reasons:
One, you do so many studies that some of them will produce false results even with significant or close to significant P values? Don't you have to look at each study in the whole body of similar work?
Two, surely there is publication bias on top of that (you do a study, find nothing, and are hesitant to write an article about it, for good reason), although the first one is enough to produce lots of false positives.
To those of you who don't know what the first one is, consider this:
Of all the studies on sample data where we are relying on statistical tests to determine whether an effect really exists or the measurement is just a random artifact (in this case estimated by his P values in the regressions) - and there are tens if not hundreds of thousands of such studies - many of them get "positives" by chance alone. That is true by definition.
If I conduct, or many people collectively conduct, a total of 100 studies and publish all the results, AND unbeknownst to us the effects that we are looking for do not exist, what is going to happen? Well, in 4 or 5 of those studies we will likely conclude that an effect exists (reject the null hypothesis) at 2 sigma with a 2-tailed test, and even at 2.5 sigma, we are left with 1 or 2 positives.
And that is not even considering the second thing, publication bias, which is that maybe 1000 tests were actually conducted and many of the negative ones never got reported and all of the positive ones did. And there are going to be a lot of false positives in those 1000 experiments. So of my 100 published tests, I might have maybe 20 or 30 false positives!
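The first point can be illustrated with a minimal simulation, under the usual approximation that the test statistic of a nonexistent effect is a standard normal:

```python
import random

random.seed(42)

def expected_false_positives(n_studies, z_cut, trials=500):
    """Average number of 'significant' findings among n_studies studies of
    effects that DO NOT exist; under the null hypothesis each study's test
    statistic is drawn from N(0, 1), and we count |z| beyond the cutoff."""
    total = 0
    for _ in range(trials):
        total += sum(abs(random.gauss(0.0, 1.0)) > z_cut
                     for _ in range(n_studies))
    return total / trials

print(round(expected_false_positives(100, 1.96), 1))  # ~4-5 per 100 null studies
print(round(expected_false_positives(100, 2.50), 1))  # ~1 per 100 null studies
```

So even before any publication bias, a body of 100 null studies will hand you a handful of "real" effects at conventional significance levels.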
One antidote to this, which I also must ask Russell if he uses, is Bayes. Do you have any a priori expectations (priors) going into these studies, even if they are rough estimates? I suggest that you should. What are they? Well, for example, if the body of research in the past suggests that psychological things have very little effect on the outcome of a baseball game, and I think that is a fair statement, then we should very much go into a study like this with a prior that suggests that our hypothesis is likely not true. If we do that, even in a conservative fashion, I think you will find that your P values will not hold up.
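The Bayesian point can be sketched in one line of arithmetic. The prior, power, and alpha below are illustrative assumptions, not estimates from any actual body of research:

```python
def prob_effect_is_real(prior, power=0.5, alpha=0.05):
    """Bayes' rule: P(effect is real | significant result). A study comes up
    'significant' either because a real effect was detected (prior * power)
    or by chance on a null effect ((1 - prior) * alpha)."""
    true_pos = prior * power
    false_pos = (1.0 - prior) * alpha
    return true_pos / (true_pos + false_pos)

# skeptical prior: assume only 10% of hypothesized effects of this
# kind are real, with 50% power and the usual p < .05 cutoff
print(round(prob_effect_is_real(0.10), 2))  # 0.53
```

Under that skeptical prior, a "significant" result is barely better than a coin flip that the effect exists at all.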
What say you? You are the expert at these things, not I.
I know this probably sounds like sabermetric arrogance, but if I want to know what to expect from a player tomorrow or the next week or down the stretch or the rest of the season, I go to Fangraphs.com and look at their ROS projections from Steamer and ZIPS (and now BP's Pecota has them too). What they have done this season, this week, this month, 2 weeks into the season, a month into the season, or 5 1/2 months into the season interests me about as much as, and is about as useful as, the last episode of the Real Housewives of New Jersey.
Writers, fans, and commentators would do well to do the same. Of course, then there would be nothing to write or talk about, in their world at least.
The funny and ironic thing is this: Sabermetric types get maligned for spending too much time in front of their computers and not enough time watching games and players.
Have you ever watched a player like Gattis, or Francouer, or Vernon Wells in the middle of a torrid streak and you weren't thinking about how "hot" they were? It's easy for me to do, since I don't usually know a player's current season stats, and if I do, they mean nothing to me. Anyway, if you watch Francouer after he's had an OPS of .920 after the first month of the season, it is NOT hard to see that he sucks. You will see nothing different in his approach, and likely his results. He will look like the same old craptastic hitter he has been for the last 5 years. Of course if you buy into that bullshit, when he gets a hit, which he will around 25% of the time, you will say, "See, you can't stop the guy! He's on fire!"
Try a similar experiment with a pitcher. Seriously, watch a crappy pitcher who has pitched 3 or 4 shutout innings. While the commentators will be extolling his virtues and telling you how well he is pitching tonight, and how the manager should "ride him" as long as he can, watch him for the next few innings as if you don't know how he has done so far in the game. It is amazing how he will look like the same crappy pitcher you knew and loved before the game started.
"If it does, and the batter is willing to bunt, AND if the WE for the batting team is not greater than no shift, then the defense should not shift in that situation."
Should be "greater" and not "not greater."
Well, the only question is whether the bunt attempt yields a higher WE than hitting away with the shift on, given the bases/outs/score/inning state (as well as the identity and hand of the pitcher), etc.
If it does, you bunt. If not, you don't.
If it does, and the batter is willing to bunt, AND if the WE for the batting team is not greater than no shift, then the defense should not shift in that situation.
Is it just my overactive "skeptical gland" or does Ortiz' spray chart by no means make it obvious that he is going more the other way this year? Even random issues aside, they look pretty close to me. In fact, it looks like a lot more singles to left field LAST year than this.
Here is a link to another brief discussion we are having on The Book Blog about the shift and game theory:
His delivery both now and earlier in the season looks awful to me. That is somewhat confirmed by the plot of his release points but I don't actually know how that compares to the average pitcher or the average pitcher with a similar velocity.
However, the bottom line is that if his velocity is up 1.5 to 2 mph, and everything else is roughly equal, then we should see a marked increase in performance (true talent-wise) to the tune of around .4-.6 runs per 9, according to this article:
In other words, you can speculate all you want about his mechanics and even the reason for his uptick in velocity, but if he continues to throw at 94-95 rather than 92-93, then we should expect much better performance.
Let's face it, the guy had 2 great years, '09 and '10, and 2 awful ones, including 2012. The rest were good. If he can keep up a velocity of 93-95, I expect him to be good. Not great, but good. With that delivery, in my opinion, the only way to be a great pitcher is to throw 98.
I have done some private research which suggests that teams should almost always bring the infield in, even with runners on second and third early in the game.
Obviously, it depends on the exact inning, score, batter, pitcher, runner, park, etc., but it was not surprising to me that teams play the IF back too often. Anytime there is a risk averse strategy available, it is likely that teams/managers will choose that strategy too often. In fact, you can bank on it.
As far as this article is concerned, I have a few thoughts:
One, I have never heard of a "shift" in the OF. In fact, I am not aware that teams are doing anything different in the OF than they have ever done. So I am not sure what Harrel is talking about and what the Astros may or may not be doing in the OF. It does look like the RF and CF are playing very far apart, which is an unusual positioning. You typically see all the OF'ers playing straight away, or shifted to one side or the other. I would be skeptical that it would ever be correct for one OF'er to position himself in one direction and the other(s) to position themselves differently, regardless of what a spray chart might suggest. In other words, even if a spray chart suggested that a particular hitter hits balls down both lines or in both gaps, I would think that that was just a fluke. I don't think hitters have the ability to do anything with the direction of their batted balls other than to hit them toward one side or the other. I could be wrong about that.
"Farrell told the Saber Seminar audience that Overbay tends not to hit grounders hard the other way, and that his grounders that go for hits are almost always pulled or up the middle."
I mean, that is pretty much true for every hitter. Some a little more or less than others, but that is basically the profile of all MLB batters.
Finally, this: If a shift made fielders uncomfortable such that their range or throwing were adversely affected, it would show up in the numbers. According to Dewan at least, teams that shift get lots more outs. That is all they should care about, right? Now, whether other things are affected that DON'T show up in BABIP is another story altogether, and is still an open question...
Nice work Russell...
We need to do a lot more research in the effects of bullpen workload. We (at least I) have an idea that managers let starters, especially poor and mediocre ones, pitch too long when they are having a good game, given the strong evidence that pitchers fare a lot worse the 3rd and later times through the order, even when they are "cruising."
However, without having some idea as to the advantages of saving your bullpen, it is difficult to know how long to leave in your starter, given that they tend to fare worse and worse as the game goes on, no matter how they are pitching...
Russell, wait a minute. If a pitcher in the high-pitch-count game gives up 1 run less per 9 (which is probably conservative), then in the next game his average rpg, even if there is no effect from the previous game, is going to be around 1/30 rpg off from his seasonal average, right?
You are finding an effect of roughly the same amount! So where is there a residual effect? Is my math wrong?
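For the reader, the back-of-the-envelope arithmetic behind this objection, assuming roughly 30 starts per season (an assumption, not a figure from Russell's data):

```python
# Assumption: a starter makes ~30 starts, and in his one flagged
# high-pitch-count game he pitched ~1 run per 9 better than his talent.
starts_per_season = 30
hot_game_delta = 1.0  # runs per 9 better than talent in the flagged game

# That single hot game pulls his SEASONAL average away from his talent
# level by delta / starts, so comparing his next game to his seasonal
# average is biased by about this much even with zero carryover effect:
selection_bias_rpg = hot_game_delta / starts_per_season
print(round(selection_bias_rpg, 3))  # 0.033 runs per 9
```

If the measured "next game" effect is of roughly this magnitude, selection alone could account for it.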
Good work, but count me in the skeptic's corner as well. Not that I would have expected there to be a benefit to having elite fielders - I would have expected no effect at all. But, a negative effect? It doesn't smell right to me. I suspect it is bias in the data.
I haven't thought about how it might affect your analysis, but we are pretty sure (from work by Colin and Brian Cartwright and others) that there is range bias which shows up as making elite defenders look worse than they are and bad ones better. For example, if a SS with great range gets to a ball in the SS hole (56), the stringer will tend to record that ball as being closer to the "6" zone than it was.
Good stuff Pizza! I wonder how much teams use this sort of stuff. Seems like it would be very worthwhile.
Pitching inside became a dying art for three reasons, though it probably should come back a little: One, when the strike zone was changed such that pitches above the belt were rarely called strikes, pitching inside became less effective. In order to pitch inside effectively, you usually need to throw high and inside, although some sinkerball pitchers, especially those with a lot of arm-side movement, do throw down and in.
Two, as batters became stronger and quicker, the ability to turn on an inside pitch became greater, thus reducing the value/effectiveness of the inside pitch.
Three, with players wearing protective armor, they were more likely to dig in on inside pitches and also to take the HBP on inside pitches.
That being said, there are a few reasons why a pitcher would or would not throw inside a lot. One, if you don't throw hard, you are much less likely to want to throw inside of course. Two, if your ball sinks more than it "rises" you are more likely to keep the ball away from the batters. And three, perhaps most importantly, if you do not have good command, especially of the fastball, you generally CANNOT pitch inside! Why? Because you will hit too many batters and you will leave too many pitches middle/in which is the zone in which pitches go to die.
If Scherzer indeed does not have good command with the fastball, then no matter how hard he throws, he does not want to come inside for the aforementioned reason.
According to the article, he comes inside 19% versus a league average 25%. Let's not get all giddy and pretend that is a huge difference. Anderson saying, "with almost nothing on the inner third is certainly unique...," is more than a bit of hyperbole. To me, 19% is not "almost nothing," especially when the league average is only 25%.
This is also hyperbole by the author: "Despite Scherzer's obsession with the outside corner..." Can he show us the percentage of pitches on outer third as compared to the average pitcher? I am guessing that the difference is not nearly enough to use the words "obsession with the outside corner."
Finally, if you have great stuff, which Scherzer does, you do not have to mix up your location (or hit your location) as much as pitchers without great stuff. Again, 19% and 25% is not a huge difference. Much ado about nothing, IMO. My guess is that any successful pitcher without great command of the fastball does not throw inside a whole lot.
Great stuff Russell! I have a few questions/comments:
You are comparing how they did in each game (one game only) compared to how they were expected to do given the pitchers they are facing that day (including platoon effects)?
To determine the expected outcome, what time frame did you use for the batter's expected stats - his season stats for that year?
The rest effect might be greater than observed in your regressions since I think we have to assume that players who get days off are sprinkled with players who got a day or two off because of some injury they are battling.
It would be nice to re-run your regressions and including only players who had a day or two off because his team did not play. Then do the same thing with players who had days off when their team DID play. Or code into your regression a dummy variable signifying whether a player had a day or two off by manager choice or because the team did not play.
Finally, whether a manager giving their players a day off or two is an advantage or not depends on how much they lose for that day in the substitution and for how long the rest has a positive effect.
Pizza, from your regressions are you able to estimate how long the effect lasts? You said that two days off in the last 7 was worth 3 points in OBP versus no days off in the last 7. But for how long is that advantage? You are referring only to the game immediately after the 7 day period, right? What about the next week? Two weeks? Month? Will we see that 3 point advantage hold up? Do we see that advantage slowly bleed away? If so, what does the decay look like? How many days off per week or month of play would we need to sustain a 3 point OBP advantage compared to a player that never takes a day off?
Since all players get days off due to scheduling and rainouts, it is not clear to me that giving players extra days off is an advantage. On that day off, what if the sub is 30 or even 50 points in OBP worse than the regular? Now if you have to give the regular a day off every 2 weeks just to sustain that 3 point advantage, you are breaking even.
So, really, we need to know the gain in X number of games after a day off in order to see whether giving players days off is an advantage, and then, it depends on how much worse the substitute is. Just telling us that on one day there is a 3 point advantage is not very useful, unless I am misunderstanding what the regression says.
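The break-even trade-off can be sketched directly. All three numbers below - the 3-point rest gain, the 13 games of carryover, and the sub penalty - are illustrative assumptions; the carryover length in particular is exactly the unknown at issue:

```python
def net_obp_benefit(rest_gain=0.003, games_of_carryover=13, sub_penalty=0.030):
    """Net summed per-game OBP delta over a ~2-week stretch from one
    scheduled day off: the rested regular gains `rest_gain` in each of
    `games_of_carryover` games, but a substitute who is `sub_penalty`
    OBP worse plays the one game the regular sits."""
    return rest_gain * games_of_carryover - sub_penalty

print(round(net_obp_benefit(), 3))                   # small net gain under these assumptions
print(round(net_obp_benefit(sub_penalty=0.050), 3))  # net loss if the sub is 50 points worse
```

Change the carryover from 13 games to 1 and the day off is clearly a loser; that is why the decay curve matters more than the single-game coefficient.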
This is one reason why I don't particularly like these regressions by themselves, BTW. You also need to show us a chart of some data that says, "Here is the OBP delta (actual minus expected) of all players who played the previous 7 days. Here is the OBP delta of players who had at least 2 days off in the previous 7."
You should also show us, "Here is the OBP delta for a week, and then 2 weeks, and then 3 weeks, etc., of all players who played 7 straight days prior to the first game in the week, 2 week, 3 week, etc., series, and compare that to the players who got the days off." Again, this enables us to see how long the rest effect is sustained without another day off.
Of course, even then, players who got a day off would likely tend to get more days off, so you would have to control for that as well.
This was a great notion and good research. At the same time, I think it requires a good deal more analysis before we start declaring that managers should or should not do this or that with regard to giving players days off...
Nice job guys!
Ditto what everyone else said and what they were thinking!
The first thing that I find confusing is the idea that any fatigue is caused by sleep deprivation. Sleep deprivation? Where did that even come from? I have never heard of that in relation to fatigue in sports.
I've always assumed that you may get fatigued because you are playing almost every day. I have never heard of players being sleep deprived during the season. Most of your games are at 7:00 and you don't arrive at the park until 3 or 4. Even for day games, you arrive at what, 9 or 10?
Why would a baseball player be sleep deprived as the season goes on? If he stays out all night during the baseball season, I imagine that he would do the same in the off-season.
So, tired from playing every day, fine. But sleep deprived? Huh?
Anyway, as far as the actual study, if you are not, at the very least, controlling for the pool of players, which apparently they did not, then I don't want to even hear about the results! This needs to be done using some kind of "delta method". Of course. Any study that looks at a player effect, which is most studies, needs to control for the pools of players. That is especially true when your effect is over the course of the season when the player pools change considerably. To not control for the players (batters and pitchers) is just ridiculous.
Ditto for the weather. Can't the weather be controlled for using the fixed effects methods that all the statisticians/sabermetricians seem to be using these days?
And finally, batter and pitcher approaches change as the season progresses, for various reasons. Whether they change for the better or worse (or the same), and why, is often impossible to ascertain. Some of these changes may have something to do with fatigue (but not sleep deprivation!) or maybe not.
In order to implicate fatigue as a culprit, first you want to show that the changes are a detriment to the hitters, which, as I said, is not easy, and second, you pretty much have to rule a lot of other things out before you conclude that the effect is a result of fatigue, and even then, it is just a guess.
Finally, if hitters are experiencing fatigue as the season progresses, and pitchers don't, we would see a decline in offensive stats after controlling for weather, right? Do we? I actually think we do, at least for April versus the rest of the season, but I have always thought that it was because "hitters were ahead of pitchers" early in the season. If it were due to fatigue, I think we would see a gradual but continuous degradation from April to September. I don't think we see that.
Russell in your calculations, are you controlling for the pitchers, batters, weather, lighting, etc.? I would think you would have to. Entering the game after it has started implicates all kinds of contextual differences. Without controlling for them, I am not sure how reliable any results would be. What do you think?
Also, the pinch hitting penalty is easily explainable by two things: One, and probably the biggest factor, is that a pinch hitter faces a pitcher one time. Two, they tend to be injured or tired players.
I don't think there is any particular reason why a sub fielder or one who switches positions would do any worse than one who enters the game, although I suppose you could create an argument similar to the batter's situation where the fielder gets used to the park, lighting, batters, as the game goes on. I guess there IS a reason to think that a sub fielder might have some kind of penalty. Again, I am not sure that your test is valid one way or another without controlling for the changing context as the game goes on, unless somehow you did do that with your logit method...
Seriously, Russell? If a team fires a hitting coach, the hitters have been underperforming. If they hire Mickey Mouse as his replacement, they will do better because of regression. You mention this and then essentially dismiss it.
How can you draw any conclusions whatsoever without controlling for regression, which is inevitable?
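The regression argument can be made concrete with a one-line calculation; the 20-point shortfall and the 0.3 reliability below are hypothetical numbers, not estimates from Russell's data:

```python
def expected_next_period_delta(observed_delta, reliability):
    """Regression to the mean: if a team's hitters performed
    `observed_delta` away from their projected talent over some sample,
    the expected deviation in the NEXT sample is reliability *
    observed_delta, where reliability (0 to 1) is the shrinkage factor
    for a sample of that size. The rest of the gap closes on its own."""
    return reliability * observed_delta

# hitters ran 20 points of wOBA below projection; sample reliability ~0.3
expected_future = expected_next_period_delta(-0.020, 0.3)
print(round(expected_future, 3))  # -0.006: 14 of the 20 points come back with ANY coach
```

Any study of new hitting coaches has to subtract that automatic bounce-back before crediting the new guy with anything.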
Very nice article, Ben. I think you are one of the best "statsy" writers on the internet.
That is an excellent quote from your article on managers. I agree, by the way, that this is a pretty large inefficiency in the baseball market. When ex-players who were internet savvy and sabermetric friendly while they played become coaches and managers, that is when we will see sabermetrics really come to fruition at the manager/coach level. We are not there yet. Even young managers like Ventura and Matheny are too old for that.
Oh, one more thing. Those "10 rules for an optimized lineup" are kind of a joke, aren't they? Similar to those rules, here are mine: "Bat your best hitters at the top and your worst hitters near the bottom, unless you don't."
Being a little picky, but it could lead to some confusion:
An "off-speed" pitch in baseball refers to any pitch that is not a fastball. It surely includes curve balls and sliders.
If you want to refer to change ups and split finger fastballs, you pretty much have to say, "change ups" and "splitters."
Right, if a bad faith response or argument helps to cause analysis to move forward, then so be it. Sometimes that happens. It is not always legitimate discussion by intelligent, knowledgeable, and reasonable people that moves science in the right direction.
Yes, you do. Of course you do. If a player is zero for his career, and he has a +15 per 150 in half a season, it is likely that we overestimated the difficulty of his chances.
If that player is Brendan Ryan, and he has a +15 half way through the season, it is more likely that that is what happened.
Moral of the story: Don't combine two numbers when one is regressed or does not need to be regressed to tell you what you think it tells you, and the other needs to be regressed to tell you what you think it tells you or want it to tell you.
And that warning is even stronger when you have a small sample size!
I've said this many times: I don't like the idea of combining UZR and offensive metrics (to get WAR). It is combining apples and oranges. It is a little like OPS (a false combination), but a lot worse.
The simple reason is that UZR needs to be regressed a lot more than the offensive metrics (and probably a lot more than UBR or other baserunning metrics).
An unregressed offensive metric tells you basically what happened, but obviously it does not reflect the best estimate of talent without regression. (A big mistake that lots of folks make is equating the two.)
Unfortunately, the way that UZR and DRS are constructed, unregressed numbers neither tell us what happened, nor do they reflect our best estimate of talent.
If you want to combine UZR (or DRS) and offensive metrics like lwts, you would need to regress the UZR some amount to reflect an estimate of what really happened (less regression than you would if you wanted to estimate true talent).
Unfortunately that is not done, so you end up with a "hybrid" monster in WAR which includes a pretty much exact measure of what happened on offense (translated into theoretical runs/wins - which really mean nothing, BTW) AND a rough estimate of what happened on defense.
The result is two things: One, a number, WAR, which represents what happened on offense, plus a rough estimate of what happened on defense. And any time the defense number differs from our estimate of the player's true talent, we should assume that the defense number (e.g., UZR) is somewhat too high or too low in terms of what happened.
There is just no getting around regressing UZR before combining it with offense. If you don't do that, which no one does, then, yes, you have to take those WAR numbers with a large grain of salt, especially when the UZR component of WAR is far away from our estimate of that player's true talent UZR. Even if UZR is not far away, the error bars around the UZR, in terms of an estimate of what happened, are large - or at the very least, they exist. There are virtually no error bars around the offensive component of WAR, at least with regard to what they represent - exact offensive results translated to theoretical runs.
At the end of the year, we can probably live with the unregressed UZR part of WAR. 1.5 months into the season - nah...
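The shrinkage I keep referring to is just standard regression toward the mean. Here is a minimal sketch, with the UZR total, chance count, and regression constant all invented for illustration (the real constant would have to be estimated from UZR's year-to-year reliability):

```python
def regress_to_mean(observed, n, k, mean=0.0):
    """Shrink an observed total toward the mean.

    n is the sample size (fielding chances); k is the regression
    constant, i.e., the sample size at which the observation and the
    mean get equal weight. Both numbers below are assumptions.
    """
    return (n * observed + k * mean) / (n + k)

# A +15-looking fielder is usually not a +15 fielder: a +12 UZR over
# 600 chances with a (made-up) k of 1200 chances shrinks to +4.
print(regress_to_mean(12.0, 600, 1200))  # -> 4.0
```

The point of the sketch is only the direction and rough magnitude: the smaller the sample and the noisier the metric, the more of the raw number you have to give back before combining it with anything else.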
"Put him in a situation in a tight game where they’ve got to throw him a fastball and he’ll turn it around and do some damage.”
What situation is that, exactly? Where a pitcher HAS to throw a guy who can't hit off-speed pitches a fastball? Bases loaded with a gun to his head that is set to go off if he walks the batter?
Sometimes the crap that comes out of scouts' mouths baffles me. If anything, in a tight game you will see fewer fastballs, right? In any case, in a tight game, and pretty much any time for that matter, a batter is likely to see lots of stuff that he has trouble hitting. If Paul can't hit anything that spins (which I doubt - seriously, who can hit a well-placed off-speed pitch?), then that's pretty much what he is going to see.
And how many times have you heard about a player who can crush a fastball, especially a young player? Pretty much everyone that has ever come up to the big leagues. It's kind of like a catcher being "tough as nails."
One more thing in the category of, "You don't know what you don't know." And one more reason why, if you think that sabermetrics has figured out nearly everything there is to figure out, well, you are probably wrong.
Nice job Ben and company.
Correct. If you are going to throw more pitches out of the zone, you throw more fastballs.
If you are trying to throw more off-speed to a batter, you will necessarily throw more pitches out of the zone, but in this case, it is pitchers trying to throw more pitches out of the zone rather than trying to throw more breaking pitches, although the two are related (when pitching around a hitter, you can do both).
To give an illustration of how this works, if I am definitely trying to pitch around a batter, any batter, say, with a base open and 2 outs in a close game, I might throw all breaking pitches nowhere near the heart of the plate. I don't mind the walk but I also might get the batter to chase one of those hard to hit breaking pitches.
However, when I am facing a batter with no protection in the lineup, I will throw him lots of fastballs on the corners because in general, I do worry about walking him. Just because a good hitter has no protection in the lineup does not mean that I want to walk him a lot - hence lots of fastballs, but not in the middle of the zone. As opposed to when you really don't care if you walk him, you throw lots of breaking pitches out of the zone, and hope for him to chase, but don't mind if he doesn't. And fastballs out of the zone typically do not get chased (other than the good high heater) to the extent that a breaking pitch out of the zone gets chased, especially in pitcher's counts.
A simple, and probably at least partially correct, explanation for him taking more pitches in the zone is that if a batter is seeing more pitches out of the zone, game theory dictates that he take more pitches in general, including those in the zone. As proof, imagine that a pitcher throws 95% of his pitches out of the zone. What would you do as a batter? Take every pitch! That would include the 5% in the zone, and even the 1% or .5% that were right down the middle...
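The game-theory point can be made with a toy expected-value calculation. The per-pitch run values below are invented; only their rough magnitudes matter:

```python
# Toy per-pitch run values from the batter's perspective (invented):
RV_BALL, RV_STRIKE = 0.04, -0.04   # value of a taken ball / taken strike
RV_SWING = -0.01                   # average value of offering at a pitch

def take_ev(p_out_of_zone):
    """Expected value of taking, given the pitcher's out-of-zone rate."""
    return p_out_of_zone * RV_BALL + (1 - p_out_of_zone) * RV_STRIKE

# Against a pitcher who is out of the zone 95% of the time, taking
# every pitch dominates the average swing, even though 5% of the
# takes will be strikes (some right down the middle):
print(take_ev(0.95) > RV_SWING)  # -> True
```

As the out-of-zone rate falls back toward normal, `take_ev` drops and swinging at some pitches becomes correct again, which is exactly the equilibrium logic in the comment.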
I've been meaning to ask this for a long time. Is a "net strike" one extra strike and one fewer ball (a strike replacing a ball, like an out replacing a hit in a defensive metric, and thus worth .27 PLUS .50), or is it simply one extra strike with the balls held constant?
Russell, I don't understand this:
On the third point, if BABIP is far from the league norm over the last 100 BIP (say it's .240), then from a variance explained point of view, the recent personal history of the pitcher is more important than the league average.
In my uneducated (from a statistics perspective) brain, if the recent history is MORE IMPORTANT than the league average, that implies that you would regress a pitcher's last 100 BIP BABIP less than 50% toward the league mean in order to predict the BABIP of his next few BIP. Clearly, that is not the case. Give me a pitcher who is .240 through his last 100 BIP and I will show you a pitcher who is .2997 (or whatever) through his next 10 or 20 or 100 BIP, where league average is .300 (for pitchers with similar profiles, like GB rate), after adjusting for the opposing team, his defense, and the park. So I don't understand what you mean by "more important" or even what those relative percentages in the chart mean.
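To make the point concrete, here is the regression arithmetic. The constant k = 2000 BIP is a made-up ballpark figure (the real regression constant for pitcher BABIP is on the order of thousands of BIP), but any number in that range gives the same qualitative answer:

```python
def project_babip(observed, n, league=0.300, k=2000):
    """Regress n BIP of observed BABIP toward the league mean.

    k = 2000 BIP is an assumption for illustration; the true constant
    for pitcher BABIP is known to be very large relative to n = 100.
    """
    return (n * observed + k * league) / (n + k)

# A .240 pitcher over his last 100 BIP projects at essentially the
# league mean going forward - i.e., nearly 100% regression:
print(round(project_babip(0.240, 100), 3))  # -> 0.297
```

That is, 100 recent BIP move the projection by only about three points of BABIP, which is why "more important than the league average" can't be right in the predictive sense.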
"...the velocity of that change up actually went down (while the fastball velocities went up)"
I did not mean to imply that a decrease in change up velocity is a bad thing. Actually the optimal speed of a pitcher's change up is a very individual thing. He doesn't want it too fast or it becomes too similar to the fastball, and he doesn't want it too slow or it becomes batting practice. And it all depends on the deception of course. The more it looks like a fastball vis a vis the pitcher's motion, the better it likely is. And of course the optimal velocity of the change up depends on the speed of the pitcher's fastballs. A pitcher who can throw 98 might have a 92 mph change up and a pitcher who only throws 88 might have an 81 mph change up.
Typically the difference between the change up and fastball is less than the difference between the fastball and curve (but more than the difference between the fastball and slider), but not always. Again, it depends on lots of things, not the least of which is the slowest a pitcher can throw the change up without altering his delivery to give it away (basically you throw it exactly the same way you throw the fastball but with a grip which slows it down and imparts less spin).
I agree. Very good point. If your fastball velocity increases and consequently you use it more, the effectiveness per fastball may not increase as much as you might expect or it might actually go down. This is true even if you are increasing that percentage by the correct amount. It is all about trying to optimize your mix of pitches, according to game theory.
If that effectiveness of your fastball goes down, and you are not overusing it, then the effectiveness of your other pitches would have to increase to more than offset the decrease in effectiveness of your faster fastball.
For Price, according to the pitch f/x data on Fangraphs, he had a huge jump in his 2-seam velocity from 2010 to 2011, by 3 mph (part of that might be pitch classification errors of course). So you would expect that the effectiveness of that pitch would be much, much better. The value of that pitch, however went down!
Why? Perhaps it is because he doubled the usage of that pitch, from 17 to 34%. If you throw a particular pitch twice as often, you would expect that the value of that pitch would plummet, since the batters can look for it that much more often. The fact that the value (per pitch) only went down from +12 runs to +10 runs is a testament to the fact that it was that much of a tougher pitch to hit, at 3 mph faster.
And if you look at his changeup, the value of that pitch went up from +2 to +10 runs! If you double the frequency with which you throw one of your fastballs, then the change up is going to be that much harder to hit, and it was. Plus, the velocity of that change up actually went down (while the fastball velocities went up) AND he doubled the frequency of that change up as well! So it is amazing that the value went up so much.
Basically what I am saying is that increasing the quality of a certain pitch (in these cases, by increasing velocity) is only one part of the equation. If a pitcher changes the frequencies of all his pitches, then all kinds of interesting things might happen. A pitcher's overall effectiveness is a combination of his individual pitch quality (as determined by velocity, movement, location, and deception) plus the game theory aspect of pitch frequencies given the count, score, runners, inning, fielders, and batter (plus the ability of the pitcher/catcher to "read" a batter on that particular AB).
Good job Russell!
A few comments:
1) I agree that the 50 IP year II threshold is problematic, and I would not be comfortable drawing any conclusions without redoing the study while correcting this.
2) I looked at this as well a few years ago, and "concluded" that the Verducci Effect was merely regression toward the mean. Pitchers who had an increased workload tended to have very good seasons in year I. Very good seasons means they were "lucky" (as a group) and thus were going to appear as if they got worse in year II due to regression. If one uses a control group, one must be sure that the control group was equally good in year I OR one must control for regression, perhaps by comparing actual and predicted (via Marcel or Pecota, etc.) performance in year II.
3) In testing this or any hypothesis, one must be careful to use "out of sample" (to the original hypothesis) data. In other words, if Verducci noticed his effect for players in, say, 2006 and 2007, one must remove that data (those years) in testing his hypothesis. Let's say that I "noticed" that in 2010, the HFA in baseball was much higher than usual, so I hypothesized that HFA was increasing in MLB (perhaps due to the decrease in greenie use). If I want to test this hypothesis, I cannot include the 2010 season, since obviously that will confirm it.
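Point 2 is easy to demonstrate by simulation. Everything below is invented for illustration (ERA-like scale, talent and noise spreads, selection cutoff); the only thing that matters is that talent never changes between the two years:

```python
import random

# Select pitchers on a good OBSERVED year I; watch them "decline" in
# year II with no change in talent at all - pure regression to the mean.
random.seed(1)
TRUE_MEAN, TALENT_SD, NOISE_SD = 4.00, 0.40, 0.60  # ERA-like scale (toy)

talents = [random.gauss(TRUE_MEAN, TALENT_SD) for _ in range(10000)]
year1 = [(t, t + random.gauss(0, NOISE_SD)) for t in talents]

selected = [(t, obs) for t, obs in year1 if obs < 3.20]  # "good" year I
avg_y1 = sum(obs for _, obs in selected) / len(selected)
avg_y2 = sum(t + random.gauss(0, NOISE_SD) for t, _ in selected) / len(selected)

print(avg_y2 > avg_y1)  # -> True: the group got "worse" by construction
```

Any uncorrected Verducci-style study that selects on a good (high-workload, hence usually good) year I will show this "effect" even in a world where workload does nothing.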
You mean pitching using a slide step? All pitchers throw from the stretch with runners on first, second, first and second, or first and third. Not sure what you mean...
This is a great analysis:
"*Even without an anointed closer, Leyland showed little inclination to mix and match. It made sense to leave Coke in to face Raul Ibanez for the first out of the ninth, but not to let him face Russell Martin and Alex Rodriguez, the next two batters. Leyland’s explanation for letting Coke face Martin was that “the numbers said [Martin] has not hit lefties that great,” which certainly isn’t what either his career splits or his 2012 splits suggest—Martin has hit southpaws far better this season. His explanation for letting Coke face A-Rod was that “Granderson was on deck, and you get a lefty for him.” Assuming Drew Smyly was available—he pitched two innings in Game One—it didn’t make much sense not to bring in Joaquin Benoit, Octavio Dotel, or Al Alburquerque for Martin and Rodriguez and use Smyly to get Granderson, especially with an off day on Monday."
When sabermetricians talk about "closer by committee" they mean using the best relievers, given the handedness of the batter or batters (and other characteristics like a pitcher's expected K% or GB%, when particularly needed), given the leverage of the situation. For example, Ben's suggestion for the 9th inning, use Coke for Ibanez and then one of the righties for Martin and A-Rod, is using a "closer by committee" correctly.
(And, BTW, if you bring in Benoit to face Martin and A-Rod, you can also use him for Grandy, since Benoit not only is an excellent pitcher, but has almost no platoon split.)
Leyland's idea of a "closer by committee" is to pick someone else to pitch the 9th and keep him in there regardless of the handedness of the batter, as long as he is "hot." I mean, wasn't that the intention of Leyland - to keep Coke in the game until he let up a run or a couple of base runners? And didn't he primarily start Coke in the 9th because he had successfully retired a few batters in the 8th (going with the "hot hand")?
So, let's not confuse the sabermetrician's concept of a "closer by committee" and that of a manager of Leyland's ilk.
One more thing: The primary disconnect with how a sabermetrician might use a pen and how a manager like Leyland might use one is in the notion of who matches up well against whom. Managers consider pitchers "good" when they are "hot" or rack up lots of saves and sabermetricians go by projections based on 3 or 4 years of performance. Managers consider lefty/righty matchups based on batting average for the current season, with no regard to regression. And finally, managers take great stock in batter/pitcher past results - for example, 4 for 6 against a particular pitcher, or 0 for 8, is deemed very significant for a manager (and will help dictate a decision) whereas the sabermetrician would pay no attention to any sample of batter/pitcher results.
So a manager like Leyland may try and bring in the best possible pitcher in high leverage situations but he really has no idea how to determine that.
From a pitcher's perspective, the SO is reduced in value (as opposed to a batted ball out, not a batted ball), since it is only one out. Also, the value of the walk is greater than in a neutral situation (for the batter) since it advances the runner. So, a pitcher would want to throw more strikes and pitch to contact, thus reducing both walks and K's (they typically go hand in hand from the perspective of a pitching approach).
And of course, like the sac fly, it is likely that pitchers are indeed trying to pitch a little lower in the zone to induce a DP but the batter is trying to keep it in the air to avoid the DP, both approaches cancelling one another out.
Then again, I don't think that the pitcher trying to induce a ground ball and the batter trying to avoid one is as salient an effect as in the sac fly situation, where the batter is REALLY trying to elevate the ball, and the pitcher is REALLY going for a K and sometimes a ground ball, since a runner on third is a lot more important than a runner on first, generally speaking.
Colin, are you assuming that those 22 extra HR are coming from outs, or from singles, doubles, and triples? The reality, I think, is that they come mostly from outs but also from some singles and extra base hits (mostly doubles and triples).
Also, with a larger field, you have the outfielders playing deeper, which means a few extra singles to the short part of the outfield and base runners advancing more on balls hit to the OF, so moving fences in is not going to increase scoring as much as we might think based upon the extra HR.
In any case, if we assume that the fence change will increase run scoring from .11 to .20 runs per game per team, and we split the difference (.15), then we increase the PF for Safeco by around .15/4.40, or .034.
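Spelling out that back-of-envelope arithmetic (all numbers are the ones in the comment, not real measurements):

```python
# Back-of-envelope park-factor bump from moving the fences in.
extra_runs = 0.15        # runs/game/team, splitting the .11-.20 range
league_rpg = 4.40        # baseline runs/game/team (assumed)
pf_bump = extra_runs / league_rpg

old_pf = 0.90            # Safeco run factor going into the season
print(round(pf_bump, 3))            # -> 0.034
print(round(old_pf + pf_bump, 2))   # -> 0.93
```

So even taking the midpoint of the estimated run increase, the park only moves from a .90 to roughly a .93 run factor.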
I have Safeco with a PF (run factor) of .9 going into this season, based on data since its inception. That is the second lowest PF in baseball (Petco is .83).
So .9 to .93 is not that much really. .93 will still be the 4th lowest in baseball, after SD, OAK, and SF, again according to my PF database.
It is not easy to make a park with a large foul territory (around the 6th largest in MLB), at sea level, and with cool temperatures into a hitters' park, or even a neutral one.
And it doesn't really suppress HR all that much. If you look at the parks database that Colin references (Seamheads), you will see an overall HR factor of around .9 or so, which is not so extreme. For the average full-time player for the M's, that is less than 1 HR per year!
This is an interesting way to look at the development curve (correlations from one age to the next). I like it.
At the same time, don't these results mimic the standard aging curve, whereby there is a steep upward curve to around age 26, then a plateau (with a slight downward slope) until age 29 and then a fairly steep downward curve?
If we assume that every player has a similar curve only it shifts right or left, then we would expect the same correlations that you get.
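That claim is easy to sanity-check by simulation. Give every player the SAME aging curve, shifted left or right by a random amount per player, and the age-to-age correlations come out different at different ages anyway. The curve shape and the spread of the shifts below are invented for illustration:

```python
import random

random.seed(3)

def perf(age, shift):
    """A common quadratic aging curve, peaking at age 27 + shift (toy)."""
    return -0.05 * (age - shift - 27) ** 2

shifts = [random.gauss(0, 1.5) for _ in range(5000)]  # per-player shift

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

r_26_27 = corr([perf(26, s) for s in shifts], [perf(27, s) for s in shifts])
r_29_30 = corr([perf(29, s) for s in shifts], [perf(30, s) for s in shifts])

# Same curve for everyone, yet adjacent-age correlations vary by age:
print(r_29_30 > r_26_27)  # -> True
```

In other words, an identical curve plus individual timing shifts is enough to reproduce a pattern of age-dependent correlations, which is the question I'm raising about the article's results.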
Interesting that the mostly positive numbers indicate that the speed on the 0-0 count is significantly less than at the "average" count. I am not sure that I would have used the 0-0 count as the baseline rather than the average pitch speed across all counts.
Miller is one of the, if not the, most extreme pitchers' umpires in baseball, which basically means that he calls all sorts of pitches out of the zone, or at least those that most umpires think are out of the zone, strikes.
Nitpick: FIP went UP also, albeit by a smaller amount than ERA.
Jay, correct me if I am wrong, but I assume that the pool of pitchers in each group are not the same. In other words, some of the pitchers who got 1-5 or 1-7 saves did not go on to have any more opportunities and hence they were not in any other groups. Obviously anyone in a larger group was also in a smaller group.
If my assumption is correct, then your results are simply due to selective sampling, no? Some of the pitchers in group 1 pitched badly (blew a good proportion of their save opportunities) and were not allowed to have a save opportunity anymore, right? So, of course, the lower groups will have a lower save percentage. This tells us nothing about their talent with respect to being able to close out a game. Nothing whatsoever.
So, if you are trying to find some evidence that experience matters, you are not going to, using this methodology. I'd have to say that this statement you made:
"The way many managers, mainstream media types, and even fans discuss the role, it would seem to, and to a limited extent, the data backs that up."
is false. The last section at least.
In fact, if you make sure that you use the same pitchers in each group - in other words, if you worked backwards by only using the pitchers in the last group - you will likely see the opposite result: that these pitchers, as a group, who went on to get at least 22 or 26 save opportunities, were lucky in the beginning. You should find their save percentage in the first 5 or 10 games (group 1) to be higher than later on (the latter groups).
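A quick simulation shows how strong this selection effect is. Every reliever below has IDENTICAL save-conversion talent and loses the job after a fixed number of blown saves (the talent level and the demotion rule are invented for illustration); the survivors still show a higher save percentage, with zero "experience" effect in the model:

```python
import random

random.seed(2)
TALENT, MAX_OPPS, LEASH = 0.85, 25, 2   # all toy parameters

def career():
    """Save opportunities until MAX_OPPS or LEASH blown saves (demotion)."""
    results, blown = [], 0
    while len(results) < MAX_OPPS and blown < LEASH:
        ok = random.random() < TALENT
        results.append(ok)
        blown += (not ok)
    return results

careers = [career() for _ in range(50000)]
survivors = [r for r in careers if len(r) == MAX_OPPS]

def save_pct(group):
    flat = [x for r in group for x in r]
    return sum(flat) / len(flat)

# Survivors convert more saves despite identical talent - pure selection:
print(save_pct(survivors) > save_pct(careers))  # -> True
```

Since everyone's true conversion rate is the same by construction, the entire gap between the survivor group and the full pool is the selective-sampling artifact, not "learning to close."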
Isn't FRAA relative to that position, so shouldn't each position have a cumulative FRAA of zero, by definition?
If not, how do you measure defense NOT relative to a particular position? IOW, how would I know whether LF'ers were better or worse defensively than RF'ers without looking at players who played both positions?
Say we removed all LF'ers in MLB and replaced them with high schoolers who had plastic surgery to look just like the MLB'ers they replaced. How would we know that they were worse fielders than the RF'ers? Explain to me, please, how FRAA would know that they were worse?
Molto grazie Max! Fantastic, stellar stuff! Keep up the great work!
I don't see "Fair Ra" in the glossary anywhere. That appears to be an old glossary with terms that are not even used in your new spreadsheets, like all the "Eq" stuff...
Great job! It is nice to see BP so responsive to the needs/wants of their readership!
Max, are your spreadsheets (with all catchers and their numbers) for each category available? Thanks!
"...BB includes IBB but not HBP."
It does not appear as if it does. For most players, you can't tell from looking at the numbers, since most players have few IBB. But look at Fielder and Pujols. Fielder is projected with 92 BB+IBB in 687 PA. He typically gets IBB'd 25 times per season. That leaves 67 NIBB. That can't be right. Same thing with Pujols.
Also, you did not answer my question about singles and AB.
Downloaded the spreadsheets. For batters, there is no hits or singles column that I can find (I assume B2 and B3 are doubles and triples - is it too much trouble to use the letters D and T or 2B and 3B?). Am I missing it? I can almost infer them from the BA and PA but there is also no AB column and it is not clear if AB is PA-BB (there are no SH and SF and ROE, etc.).
Also does BB include IBB? What about HBP?
And finally, are all the numbers assuming player plays half his games in his home park? For minor league players, do the numbers assume he plays half his games in his minor league park as well?
Mike, great work, but I still think that the hit and run is virtually never correct and I think I found a fatal flaw in your analysis:
You see, Mike, I don’t think you can combine the two types of base runners. If you do, of course you are going to come up with a positive result (for the hit and run) as long as there are enough good base stealers in your sample (even if those were indeed hit and runs with those runners at first and not straight steals, although clearly some of your hit and runs are actually straight steals, especially with good base stealers on first).
Rather than compare the hit and run to no hit and run, you should be comparing the hit and run to:
1) a steal by those base runners and no hit and run.
2) no hit and run with a non-base stealer at first.
The ONLY thing we care about is what happens with a non-base stealer at first!
We already know the answer with a good base stealer at first. The answer is that a hit and run is better than no hit and run (with the base runner not going), BUT a steal with no hit and run is likely better than that!
So if a hit and run with a non-base-stealer at first is wrong, and a straight steal (not every time) with a good base stealer at first is better than a hit and run, then you would still get the results you are getting, yet the hit and run would still never be correct!
Does everyone understand that?
"Would I want to pinch-hit for Hiroki Kuroda in the sixth inning down by a run so that I could get more work or innings for Matt Guerrier and Mike MacDougal? When the guys I'm batting are Trent Oeltjen, Dioner Navarro, Jamey Carroll, Justin Sellers, Eugenio Velez and others who primarily made up LA's bench last year?"
That is a good question. So let's orchestrate an answer. You seem to have made up your mind already - based on what? Your gut instinct? Your gut has nothing to do with it.
You already have the batting line for an above average pitcher. Kuroda is a below average one, so that would have to be adjusted. You would then have to create a batting line for one of those fine batters you list as the Dodger PH'ers. Then you have to do some work to see the difference depending on the bases/outs state that you are contemplating. Then you would take that difference and multiply it by the LI, which is based on the score, inning, bases, and outs. Now you would need to compare that to the number of runs you are gaining or losing by replacing Kuroda with Guerrier or MacDougal for the average number of innings that Kuroda will pitch after he bats (I suspect around 1.25).
Have you done all that? If not, how would you know the answer to your own question?
One of the (many) ways that you have come to an incorrect conclusion regarding your own answer is by assuming that Kuroda is better than those two relief pitchers in true talent. When a manager has to make a decision like that, he obviously doesn't know how these pitchers are going to pitch for the remainder of the season, so he has to actually do a projection in his mind (or on paper I suppose).
Let's quickly look at a projection for Kuroda and Guerrier:
Kuroda 3.57 ERA
Kuroda 3.79 nERC
Guerrier 3.57 mERC
Hmmm. It seems like Guerrier is the better pitcher right off the bat. So, we would surely like to replace Kuroda with Guerrier when Kuroda is facing the lineup for the 3rd time, whether he comes to bat or not. So we don't really need to go through all of those calculations above...
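For anyone who wants to actually run the calculation I described above, here is the skeleton of it. Every number is invented for illustration (none is a real projection for Kuroda, Guerrier, or the Dodger bench):

```python
# Toy pinch-hit decision: runs gained at the plate vs. runs given up
# (or gained) on the mound. All inputs are made-up placeholders.
pa_gain = 0.10          # runs/PA: the pinch hitter over the pitcher batting
li = 2.0                # leverage index of the spot (score/inning/base/out)
batting_gain = pa_gain * li

innings_left = 1.25     # starter's expected remaining innings after batting
ra9_starter, ra9_reliever = 3.90, 4.20   # projected runs allowed per 9
pitching_cost = (ra9_reliever - ra9_starter) / 9 * innings_left

net = batting_gain - pitching_cost
print(net > 0)  # -> True: pinch-hit even with a slightly worse reliever
```

Note that in the Guerrier case above the "cost" term actually flips sign (the reliever projects better), which makes the decision even more lopsided. The point is that this is an arithmetic question, not a gut-feel one.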
Similar to what Tango says above, we also present evidence in the book that relievers in general can pitch more innings without giving up any performance.
Again, managers can leverage this strategy by using it less when their bullpen has been taxed lately and using it more when they need work or are not overly tired. Managers probably use this anyway as one of their criteria for letting the pitcher hit or not.
I suspect that the 3 principal reasons that managers would be resistant to this strategy, from most to least important (to them), are:
1) They think that their .221 thus far pitcher will continue to pitch at near that level.
2) They underestimate the value of the pinch hitter, especially in certain situations, like a 1-out bunt situation (BTW, an above-average-hitting pitcher bunting with 1 out is a terrible strategy - even in many 0-out situations).
3) They are fearful of over-taxing their bullpen.
Bullpen management is a legitimate concern as I state 2 times in the article.
Certainly the manager can add to the requisites for using this or a similar strategy (a quicker hook for the starter when he comes to bat in high leverage situation) that he has a reliever available who is better than the expected performance of the starter.
BTW, there are many examples where the starter who was allowed to hit was an awful starter (since the average starter in that bucket was .258 (TAv against), there are plenty of much worse ones), such that even a replacement reliever would be better. Surely in those cases, Bill's objection doesn't hold much water, again assuming that the bullpen in general is not overly taxed from recent overuse.
You typically don't call that a hit and run as many people have already pointed out, since, by definition, a hit and run means the runner goes and the batter must swing at any pitch other than one in the dirt.
It is simply the runner going in order to stay out of the double play.
The batter does absolutely nothing different. One small negative consequence of sending a runner on a 3-2 count when he is going to get thrown out more often than the break-even point for a steal (in this case, being down 2 runs in the 9th, the BE point is 90%+) is that the batter is forced to swing at more borderline pitches: a K is now worth around 1.5 outs (more or less, depending on how often the runner is safe on a K), so the difference between a K and a BB is larger than usual, and thus the BE point for how borderline a pitch has to be before you swing/take is different.
In a typical situation, you send the runner on 3-2 in order to potentially accomplish 2 things. One, stay out of the GIDP. Two, advance the runner an extra base more frequently on a hit (and occasionally two extra bases on a single).
As well, in a typical situation, the runner being safe some percentage of the time on a K (the ML average is around 52% on a 3-2 count), is also a plus of course (obviously the runner getting thrown out more than cancels that out).
In this case, advancing the extra base or the runner being safe on a K adds very little win expectancy, since his run means nothing.
The advantage of him running, therefore, which is obviously known by LaRussa (as much as I deride his intelligence), is staying out of the GIDP ONLY.
Again, the downside is that you increase the DP on a K of course, you add a few line drive DP, you perhaps distract the batter, and you force the batter to (correctly) swing at some more borderline pitches.
So which is better?
Unfortunately (for all those who posted here, on The Book blog, on FG, BBTF, and everyone else around the country with an "opinion"), you cannot figure it out without "the numbers!"
No amount of explanation, common sense, logic, or baseball experience or acumen will enable you to figure out which is correct - run or not run.
On the other hand, it is simple to do the math and figure it out. Are there some variables that we don't know or we cannot quantify exactly? Yes, as always there are. Does that preclude us from coming up with an answer? As usual, no it does not. Why? Because we can always set some upper and lower limits on the variables we are not sure of, such as the distraction to the batter.
Now here is the important part:
If it looks like the answer is "Run" at the upper and lower boundaries, then the answer is clearly "run." If the answer is "don't run" at both boundaries, then the answer is clearly "don't run." If the answer is "run" at one boundary and "don't run" at the other, then you can flip a coin or argue until you have to go to the bathroom.
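The boundary logic above is simple enough to write down directly. The win-expectancy inputs below are invented placeholders, not real numbers for this game; only the decision structure matters:

```python
def decide(we_run, we_hold, fudge):
    """Compare win expectancy of sending the runner vs. holding him,
    with +/- fudge covering the unquantifiable factors (batter
    distraction, etc.). All WE inputs here are invented placeholders."""
    if we_run - fudge > we_hold:
        return "run"            # "run" wins even at the worst-case boundary
    if we_run + fudge < we_hold:
        return "don't run"      # "hold" wins even at the best-case boundary
    return "toss-up"            # the boundaries disagree - argue away

print(decide(0.062, 0.058, 0.002))  # -> run
print(decide(0.060, 0.058, 0.005))  # -> toss-up
```

That is the whole procedure: estimate the two win expectancies, bound the fuzzy parts, and read off the answer; no rhetoric required.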
From what I have seen of the numbers, the answer is probably "run." I have not done the calculations myself, and I have not seen rigorous ones.
Again, if you want to argue, please argue with numbers and not with rhetoric, opinion, emotion, voodoo, or snake oil. This particular decision, like most, cries out for "numbers." And as it turns out, it is relatively straightforward and easy to do. Again, no amount of logic or baseball experience is going to get you the right answer (other than accidentally).
Colin, great stuff. Minor point: The numbers are apples and oranges, of course, because the pool of pitchers faced, and other "environmental" factors, are going to be different (e.g., when the pitcher bats, he is likely facing the opponent pitcher early in the game, whereas when a pinch hitter bats, he is likely facing a reliever late in the game), but your point is well taken and clearly explained.
The two points are, one, you give up a substantial RE/WE letting your pitcher bat late in (or even in the middle of) a game in a high leverage situation (the LI was 2.51 in this game), and two, you likely don't even save/gain anything by allowing your starter to remain in the game, especially when you have lots of good reliever options (as in the post-season) and you don't have to worry about taxing your bullpen.
My rule has always been this: Always pinch hit for your starter after the 5th inning in a high leverage situation (the high leverage takes care of the fact that the game is likely close and there are likely runners on base) unless your starter is elite (like a Halladay or Lee), and even then, still pinch hit if the inning is past the 6th or 7th, unless a sac bunt is a viable option, such as you are down a run, up a run, or the game is tied, and there is a runner on first or second with 0 out.
We would really like to know these various managers’ SB%, preferably in the different score/bases/outs states. Knowing their SBA rates, while it tells us how often they like to run, tells us nothing about the value that that particular philosophy provides to their team. You would also have to wonder how each manager’s preference for the hit and run affects the SBA rate…
"In the end, it seems that once a player hits the majors, the book is out on him. Certainly there could be cases where batters suddenly stop getting pitches grooved down the middle, but in general it's not happening often enough to show up in the data. As a whole, each player tends to see a similar approach throughout the entire season, rather than a sudden—or even gradual—in season adjustment to the swinging tendencies."
Let's be careful about these generalized conclusions! You are looking at only a coarse and specific "approach" by pitchers - namely pitch location distribution, and even then only as measured from the center of the plate or in and out of the zone. There are all other kinds of "approaches" that may change against rookies as the season goes on. The one you typically hear about is the percentage of fastballs or off-speed pitches thrown. Even then, one would have to be careful about lumping everyone together: if one group of players were thrown more off-speed pitches as the season went on and another group were thrown more fastballs, the overall percentages would not change, as one would offset the other. Same thing, actually, with your study. What if, as the season went on, some rookies were thrown more strikes and others fewer, depending on their perceived strengths and weaknesses?
Good stuff Rob. I love that you included a control group because the first thought that came into my mind was what if Pecota under-projected all such similar players.
In any case, I wouldn't discount such small effects. You seem to trivialize them in your closing remarks. It is so hard to find ANY effect when it comes to these things that when you find a small effect like this, you should celebrate. Of course, the difference between the control and subject groups might not even be statistically significant (I have no idea of the standard error), but even if it isn't, that doesn't mean, of course, that it is not real. It just means that we are less certain that it is.
Of course, there is reason to think that it is real. If a player declares that he is in the BSOHL, that generally means that he is not injured. So all other players (including those in your control group) probably have some injuries, and it is not surprising that their performance is less than their projection. And it is also not surprising that an injury-free subset of players (BSOHL) slightly outperforms their projections, since any projection engine is designed for all players on average, including those with current and past injuries.
As well, for a player to declare that he is in the BSOHL suggests that he may have had a past injury. Unless Pecota accounts for that (it likely does not), you would also expect any player with past injuries but currently healthy to outperform his projection...
I would start by separating day and night games for a start, although, as I said, fatigue in day games might be a confounding factor.
Otherwise, just run some kind of regression on pitch speed and temp controlling for the other factors of course.
As I personally hate regressions, I would simply split each pitcher's games into 2 groups for day games and 2 groups for night games - below a certain temperature and above a certain temperature. You would have to do it for home games only, otherwise the stadiums would be a confounding factor (the warm games would tend to be in different stadiums than the cold games).
If you did that, you would have to control for time of the year as well, otherwise your warm games would mostly be in the middle of the season and the cold games at the beginning and end, which could be a confounding factor as well.
Shouldn't be too hard to find a way to see effect of temp on overall velocity and the trends during a game.
Didn't someone have an article a while ago on pitch speed and temp? Also, someone like Alan Nathan could tell you the physical effect of changes in temp on pitch speed due to the differing air density, but that would not include any effects on the pitcher (like being looser in warmer weather or less fatigued in colder weather, etc.).
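The splitting scheme described in these comments can be sketched in a few lines. Everything here is illustrative: the per-game records, the field names, and the 70-degree cutoff are assumptions, not real data.

```python
from statistics import mean

# Hypothetical per-game records for ONE pitcher's HOME games only
# (home games only, so the park is not a confounder). All values
# are made up for illustration.
games = [
    {"day": True,  "temp_f": 58, "avg_velo": 91.2},
    {"day": True,  "temp_f": 84, "avg_velo": 92.0},
    {"day": True,  "temp_f": 62, "avg_velo": 91.4},
    {"day": False, "temp_f": 55, "avg_velo": 91.0},
    {"day": False, "temp_f": 80, "avg_velo": 91.8},
    {"day": False, "temp_f": 77, "avg_velo": 91.9},
]

def velo_by_split(games, temp_cutoff=70):
    """Mean velocity in each of the four day/night x warm/cold buckets."""
    buckets = {}
    for g in games:
        key = ("day" if g["day"] else "night",
               "warm" if g["temp_f"] >= temp_cutoff else "cold")
        buckets.setdefault(key, []).append(g["avg_velo"])
    return {k: round(mean(v), 2) for k, v in buckets.items()}
```

One would still want to control for time of year, as noted above, since warm and cold games cluster at different points in the season.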
If you are asking me (MGL), I don't know. That is what I am asking!
Jeremy, does the chart which shows a general decline after a short uptick in the first 20 pitches control for the pitcher? If not, then the chart is suspect since there is likely a different pool of pitchers at each pitch count.
Also, what effect do you think temperature has on this? It would be nice to separate out day and night games to see whether you can tease out the temperature effect, although fatigue in day games might come into play also.
Great stuff as always Alan!
I agree with Tompshock and disagree with I75Titans. Allowing a team to store their balls in a climate controlled situation is just asking for them to cheat. It is too easy to do. If you allow all teams to do so, there will be a fair percentage who will cheat from time to time, whether they get caught or not. It shocks me (sort of) that MLB did not oversee the use of the Rockies' humidor from the outset. The other teams should have insisted on it.
Alan, supposedly in 2005, they started storing the balls for a longer period of time and perhaps removing them closer to game time. You will see two reductions in scoring/HR, I think. One, in 2002, and another in 2005. So I think the data should be bifurcated.
I was waiting for you to compare the 5th year performance to what would be expected from a Marcel or other projection. But, alas, it was not there. In similar studies by Tango and myself, we found that the performance after the drop off year was about that expected, a weighted average of years 1-4 with an age adjustment. If that is true, that pretty much ends the discussion, doesn't it? And also makes any of these types of discussions and speculations pretty mundane. No matter what players do in terms of breakout years or extremely down years, or anything in between, our best estimate of their performance in the future (post-season, next year, last week of the reg season, etc.) is simply a Marcel-like projection? If that is the case, which I think it is, we would have nothing to write about, huh?
You have a fatal flaw in your methodology:
"This would likely remove the selection bias, since a couple weeks right before the break are unlikely to affect whether an individual is asked to participate in the Derby (especially because the participants are selected about a week before the All-Star Game)."
While any period of time AFTER the selections are made should be unbiased, ANY period of time whether it be one week or 3 months BEFORE the selections will be a biased sample of players who have gotten lucky.
Well, I just read Chuck Brownson's article at BTB. Here is what he does:
"I then subtracted his actual stolen base attempts from the expected number and multiplied it times the run value of the SB (0.175) times the likelihood the runner would be caught stealing (CS%)."
That is completely wrong of course. For some strange reason he is assuming that all catchers "should" have the same SB attempts (per inning I guess) with their own CS% and then crediting or debiting them the difference between what they "should" have and what they do have. That is ridiculous of course. For example, if a catcher is 1/2 in 900 innings (around 100 games), he is going to assume that they "should have" been 35/70 (or so) rather than 1/2 and he is going to give them 12.25 "rep runs" or so, which makes no sense. None whatsoever. Where did he save 12 runs? Similarly if a catcher is 3/3 (0 CS%), he is going to assume 70/70 and dock him 10.5 runs or so. Again, makes no sense.
Eric, you should not have even mentioned "rep runs" in your article. You devote an entire paragraph to it, clearly implying that it is part of a catcher's value. You could have mentioned that some catchers are so good that no one runs against them and therefore, paradoxically, they derive little or no value from their good arms, but you didn't. In fact, you implied that these catchers have more value than is being captured by the traditional SB/CS numbers, which is not true. If you didn't mean to imply that, why would you have even mentioned it, let alone devoted an entire paragraph to it, based on an obscure and incorrect methodology by someone whom I have never even heard of?
Right, you just referenced the article. I was just wondering what the methodology or rationale was. You seem to be familiar with it.
You say, "it matters little." It matters none, at least as far as quantifying the catchers' value. In your example, the 30/100 catcher probably allows zero net runs or so (assuming an overall 70% break-even rate) and the 15/30 allows maybe +3 runs. The catcher who is 0/0 has zero net runs of course. And, as I said above, those numbers have to be further adjusted by the net runs allowed by the average catcher if we want to compare catchers to the average catcher, although there is no great reason why we have to do that (sum the league to zero).
Colin, very nice job pointing out the process of cherry picking (it is a controversial and complex issue).
Two interesting things about "flukes" are:
When one observes a fluke, often part of the fluke is that the player looks awful. In other words, players (pitchers and batters) go through fluctuations in which they look and perform awfully. We call them "flukes" or "luck" or randomness even though technically they may not be; just as the landing of a coin on heads or tails is not technically a random event (it is a function of how it is thrown) yet is properly treated as such. So, to declare that something is not a "fluke" (in statistical terms) because you saw the player and "things just look different" is not good reasoning or logic. Things may in fact BE different about that player's approach or technique or even health (or his psychology, such as WRT the beaning) for some period of time, but if there is little we can do to predict when it will start or end (and I am not saying that we can or cannot), then for all practical purposes it may be treated as random, luck, or a fluke.
The other thing is that declaring something as either a fluke or not is a false choice, and a large one at that. There are an infinite number of combinations of fluke and non-fluke that can describe or explain a spate of performance.
"Sorry, but I don't think that is necessarily the case."
It IS the case, and I am glad that Tango chimed in since I thought that I was the only mad one in a sane world.
drawbb, outfield arms get valued by the number of advances and the number of assists (the OF throwing out a runner on the bases), as compared to the average OF at that position. If no one runs on you, you get credit for no advances. If players run on you, you get credit when they don't advance a base, you get credit when you throw out a runner (more credit than a non-advance, obviously), and you get demerits when runners advance the extra base.
So everything is accounted for, without any special adjustment or calculation for "deterrence" (runners not running on you).
It is exactly the same with catchers. You get credit for throwing a runner out (and pick-offs), and you get demerits for allowing a stolen base (and throwing errors on a steal). And everything gets compared to the average catcher. There is no need to adjust for "deterrence." If no one runs on a catcher, nothing happens, just as if stolen base attempts were not allowed, like in Little League.
The way the "adjustment" occurs, if you even want to call it that, is by comparing everyone to the average catcher in the final step of the computations. The catcher that no one runs on gets exactly zero net runs, but if the average catcher has -2 net runs per season (IOW, all base runners combined generate net positive runs), then the catcher against whom no one runs gets +2 runs in credit. Interestingly, if base runners generate net negative runs (they ran too much), which they probably did for many years up until the last few years, those catchers with great arms and zero net runs (before the league adjustment) would have to be credited with net NEGATIVE runs, a little bit of a logical anomaly.
The answer to that, by those good catchers that no one runs against, is that if runners want to generate net negative runs by running too often (and/or possibly at the wrong times), then these catchers need to actually "bluff" the runners a little by not showing such a great arm and encouraging them to run a little. If that is the optimal strategy for them, and they do not do that, then they indeed deserve to be charged with net negative runs even if they have great arms and no one runs against them. That is because in baseball, as in most sports, it is not only athletic talent (like a strong arm) which creates value, but good strategy as well.
So, I would like to hear from Eric (or others) and have them explain to Tango and me what this "deterrence adjustment" (in quantifying catcher value) is all about, as it makes no sense to me.
I still don't understand why you need a "deterrence" adjustment factor, or whatever you want to call it. If a catcher allows no SB attempts because he can throw the ball 1000 miles an hour, then you simply compare him to the average catcher. If, in fact, runners run too much such that the average catcher saves his team some RE or WE, then the catcher against whom no one runs is a liability, right? Regardless, I don't see how there is any need to make any adjustments for catchers who deter the running game. We know the value of a SB and we know the value of the CS. So you simply take each catcher's total SB and CS, multiply each by their respective values and that is the value of the catcher's arm. What difference does it make how many SB attempts the catcher allows? That will be included in the calculations. If an average catcher costs his team 1 run per 150 games, then the catcher against whom no one runs is worth 1 run per 150 games, relative to the average catcher. Again, no need to do any separate calculations or adjustments based on whether a catcher "deters" runners or not, as long as you normalize every catcher to the average catcher, which everyone is going to do of course.
In addition, the likelihood of lots of typos, spelling, grammatical, and syntax errors in the above post, is very high. I am sleep deprived...
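The arithmetic in the comment above is simple enough to write down. This is a sketch under assumed linear-weights run values (about +0.175 runs per SB for the offense and roughly -0.41 per CS); the exact values vary by run environment.

```python
# Run values (assumptions; typical linear-weights ballpark figures):
RUN_SB = 0.175   # runs gained by the offense per stolen base
RUN_CS = -0.41   # runs lost by the offense per caught stealing

def arm_runs_saved(sb, cs):
    """Runs the catcher's arm saves (+) or costs (-) his team,
    relative to no running game at all."""
    return -(sb * RUN_SB + cs * RUN_CS)

def arm_vs_average(sb, cs, avg_catcher_runs_saved=-1.0):
    """Value relative to the average catcher. If the average catcher
    COSTS his team 1 run per season (runners run profitably overall),
    a catcher no one runs on (0 SB, 0 CS) comes out at +1."""
    return arm_runs_saved(sb, cs) - avg_catcher_runs_saved
```

With these values, a 30/100 catcher (70 SB, 30 CS) comes out at roughly zero net runs, matching the break-even intuition above, and the catcher no one runs on gets exactly the negative of the league-average figure.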
That is a good analysis, Eric. I think all of your assumptions are reasonable - or at least as reasonable as they can be, given that we really don't know any more than what we know.
You are assuming that if he stopped switching, his platoon ratio would be a little larger than the average LHB's. You are correct in that you HAVE to assume that any switch hitter would have a larger-than-average platoon ratio. If he had a small one, he would be less likely to switch hit. If you decrease the chances of a small one, the best estimate of his ratio has to be larger than average. Plus, certainly if he stopped switching, he would be "rusty" when hitting lefties from the left side. And 1.21 is not that far off from 1.15. Perfectly reasonable, if not TOO conservative.
The other assumption you made, which is not really an assumption but a mathematical requisite, is that his true platoon ratio as a switcher is 1.19 rather than his observed 1.29. You HAVE to make that assumption of course. We have to regress that sample 1.29 the appropriate amount, as we do with any sample stat. After all, your goal is to determine whether Berkman would likely perform better IN THE FUTURE as a lefty only or as a switcher. To do that, you have to compute a projection for his switch-hitting platoon split, which necessarily requires the regression that you did. So, the 1.19 is completely the correct thing to "do" (as much as you can "do" a number!).
So your conclusion that, "in the future," his most likely OPS versus left-handed pitching is .851 as a switcher and .839 if he hit as a lefty all the time, is 100% correct. Of course there is great uncertainty around those numbers, and you can probably compute the standard error of the difference if you really wanted to (which you probably don't!).
Plus, there is enough room between those numbers such that even if some of your assumptions were "wrong" (the one which is really "pie in the sky" is the 1.21, but as I said, that is conservative anyway), it would still be likely that he is better off switching.
Then again, given that he IS switching despite having a large observed platoon ratio, that is VERY strong evidence in and of itself that he is better off switching. Remember that most of these analyses are really Bayesian, and in this case the prior is strongly in favor of the likelihood that he is making the correct decision in switch hitting...
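The regression step discussed above (an observed 1.29 shrunk to an estimated true 1.19) has a standard shrinkage form: weight the observation by its sample size n against a regression constant k. The n, k, and population mean below are illustrative assumptions chosen so the output lands near the article's 1.19, not the actual values used.

```python
def regress_toward_mean(observed, pop_mean, n, k):
    """Shrinkage estimate: a weighted average of the observed value
    (weight n = sample size) and the population mean (weight k)."""
    return (observed * n + pop_mean * k) / (n + k)

# Berkman-like example: observed switch-hitting platoon ratio of 1.29
# over an assumed n = 500 PA, population mean ratio 1.14, k = 1000.
est = regress_toward_mean(1.29, 1.14, 500, 1000)
```

The estimate always lands between the observation and the population mean, closer to the mean when n is small relative to k.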
Eric, there are a lot of nuances here that are tricky to work with.
For example, why not just compare a hitter's poor opposite-hand TAv (or whatever stat you want) to his expected performance given how he hits from the other side, assuming a somewhat worse-than-league-average platoon split? (The assumption is that one of the reasons a hitter switches is that his splits would be large if he didn't, PLUS he is not used to hitting from the same side as the pitcher.)
So, for example, take Berkman's numbers against righties, and then estimate what he would do if he hit lefty against lefties, and then compare that to how he does now (batting righty versus lefties).
Of course, there are several problems with that. These are selectively sampled players (for having large splits as a switcher). Therefore we can assume that they got lucky from one side or the other (their best side), so using the numbers from that side to estimate what they would do from the same side against same-hand pitchers is going to yield too optimistic of a result.
If we use both sides to inform us as to a hitter like Berkman's overall talent, by taking his switch hitter split and regressing it toward the average switch hitter's split, then we can estimate what he would do versus lefties and righties if he stopped switch hitting and then compare that to their bad side now. I think that is what we did in The Book.
If you did it that way, I think you would find that Berkman, for example, was right on the cusp as to whether he should switch hit or not. For most marginal examples (like Berkman) the linchpin is probably how big their platoon split would be if they didn't switch - only they and their coaches know that...
"If a pitcher, however, throws too many balls in the zone, given his stuff, it will turn out to be bad advice."
Oy, that should be "too many STRIKES..."
"The batter and pitcher have "choice" as to the equilibrium point."
That should be "no" choice...
Brian, the equilibrium point does NOT change based on the pitcher's proclivity to throw a certain percentage of first pitch strikes and the batter's tendencies to swing at first pitch strikes and balls. The batter and pitcher have "choice" as to the equilibrium point. That is determined mathematically and is based on the pitcher's overall ability in terms of his stuff and his control and the batter's overall ability to hit, take, and swing at all the different pitches in all the different locations.
The trick is for the batter and pitcher to figure it all out. And of course, if the batter is not acting optimally, then the pitcher's optimal strategy is not equal to the Nash equilibrium point, and the same thing applies to the batter if the pitcher is not acting optimally.
Russell, this is great stuff. The point about the pitching coaches, which is lost among these comments, is a great one. That is, the axiom, "Get the first pitch over," or, "The most important pitch is strike one," like all one-size-fits-all edicts, is dumb and is counterproductive toward guiding pitchers to an optimal strategy.
Now, if a pitcher happens to be throwing too many pitches out of the zone, given his abilities, then obviously this advice will turn out to be good in general. If a pitcher, however, throws too many balls in the zone, given his stuff, it will turn out to be bad advice.
But the worst part is that it ignores the different optimal strategies that different pitchers should have overall, and even worse, it ignores the fact that each pitcher should have vastly different first-pitch strategies against different hitters.
On the first point, that different pitchers should have different first pitch strategies, for example, a pitcher with good in the zone stuff should be much more likely to throw first pitches (at any count actually) in the zone.
On the second point, to a hitter like Vlad, all pitchers should be much more likely to throw a first pitch out of the zone.
Anyway, this is a great example of where "scouting" (in this case tutelage from a pitching coach) PLUS analysis (game theory) equals the best strategy for teaching pitchers how to pitch optimally. BTW, the analysis applies much more to pitchers than to batters. For batters, the big "hitch" in using game theory to suggest optimal approaches is the batter's comfort zone. If you suggest to a batter that the correct strategy for Vlad is to take more first pitches, even though that might be correct on "paper" (given all of his various abilities), he simply might not be comfortable with any other strategy but the one he is using, and his performance might actually suffer even though on paper it should get better. Not so much with pitchers, and that is primarily because they "get to go first" and for batters their strategy is always ultimately reactive rather than pro-active, which it is for pitchers.
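The equilibrium idea in these comments can be illustrated with a toy 2x2 first-pitch game: the pitcher chooses in-zone or out-of-zone, the batter chooses swing or take. The payoff numbers are invented for illustration; the point is that the equilibrium zone rate is whatever makes the batter indifferent, and it is fixed by the payoff matrix (the players' abilities), not by either player's preferences.

```python
# Batter's expected run value (made-up numbers):
# rows = pitcher choice (in zone, out of zone), cols = batter (swing, take)
payoff = [[0.02, -0.04],   # in zone: swinging is OK, taking is bad
          [-0.05, 0.03]]   # out of zone: swinging is bad, taking is good

def pitcher_zone_rate(m):
    """Mixed-strategy equilibrium zone rate in a zero-sum 2x2 game:
    the rate at which the batter is indifferent between swing and take."""
    (a, b), (c, d) = m
    return (d - c) / ((a - b) + (d - c))
```

With these numbers the pitcher should be in the zone about 57% of the time; improve the pitcher's in-zone stuff (lower the batter's value for swinging at strikes) and the equilibrium zone rate rises, which is exactly the "good in-zone stuff" point above.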
Tim, great stuff! We talk about run environments affecting strategy decisions, based on RE and WE matrices all the time, but this is a great attempt at actually quantifying it as well as assigning an actual run environment to a particular class of pitchers and teams.
One thing I think you messed up:
"70 percent successful attempt (batter is out, runners advance one base), 23 percent unsuccessful attempt with either the lead runner thrown out or a strikeout, 3.5 percent throwing error by the defense that allows runners to advance more than one base, 2 percent double play, 1.5 percent fielder’s choice that results in all runners being safe."
Where are the hits??
According to The Book, all sac bunt attempts result in singles 12% of the time.
The actual breakdown (again, overall - it much depends on the batter and the inning, where the inning is a proxy for how much the defense is expecting the bunt) is this:
Batter out, runner advances: 48.4%, not 70%
FC, both runners safe: .6%
An out, no runner advance: 26.2%
A hit: 13.4%
And a few other various and sundry outcomes.
Those numbers include when the batter gets 2 strikes and ends up swinging away. If we just look at actual bunt attempts all the way through the PA, the numbers change slightly.
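As a quick sanity check on the breakdown above, the named categories plus a residual "other" bucket should sum to 100%. The frequencies are the ones quoted in the comment.

```python
# Sac bunt attempt outcomes (percent), from the breakdown above:
bunt_outcomes = {
    "batter out, runner advances": 48.4,
    "fielder's choice, all safe":   0.6,
    "out, no runner advance":      26.2,
    "hit":                         13.4,
}

# Residual bucket: the "few other various and sundry outcomes"
# (errors, double plays, walks, etc.)
other = round(100.0 - sum(bunt_outcomes.values()), 1)
```

That leaves about 11.4% for the remaining outcomes.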
I would guess that Russell's data and analysis would show slightly more predictive value for cold streaks, for the reason mentioned (injuries and the like). But the noise is so great (small samples of 10, 25, and 10 PA) that the signal (a few players who are playing injured) to noise ratio is going to be very, very small.
In fact, the very small predictive value that he did find for hot and cold streaks could easily be explained by parks and weather, as well as injuries...
"You're absolutely correct if the overall goal is to literally specifically investigate just Garland and Pineiro, but this is more of using them as archetypes to discuss what teams (and what situations for teams) should be looking into guys with certain skill-sets. Garland is considered by many to be Mr Consistency while Pineiro is the crazy volatile Tracy Jordan character from 30 Rock."
Guys with certain skill sets? How about if we determine FIRST whether the distribution of expected performance IS a skill before we talk about which kinds of teams would benefit from which kinds of distributions.
Now, I am not saying that a pitcher who has been consistent in the past (e.g. Garland) is NOT more likely to be consistent in the future (i.e., that the distribution of his expected future performance is likely to be narrower), and vice versa, but you sure implied that we know. And contrary to your quote above, you sure implied that Pineiro is likely to have a wider distribution of expected performance than Garland for 2010. Again, do you know that that is true?
And, as I said in my previous comment, you looked at the aggregate performance in Year Next of pitchers who were inconsistent in prior years (actually, those who had an "uptick" in Year Next minus one). Don't you think you should have looked at the distribution of performance among all those pitchers and then compared that distribution to pitchers, like Garland, who were relatively consistent for several years?
If you found no significant differences between the two groups, in terms of that distribution, your article would be moot, wouldn't it? Again, I don't know what you would find (and I actually expect there to be differences), but until I know one way or another, it's kind of a leap to be talking about how pitchers like Garland or Pineiro might benefit some teams more than others even though their weighted mean expected values in 2010 might be equal. Isn't it?
You made an assumption which is not necessarily correct (as far as I can tell from reading the article) and then when you had the opportunity to investigate it within the historical data, you didn't.
The assumption you made is that a pitcher who has been consistent in the past (e.g. Garland) will have a distribution of expected performance in the subsequent year that is much narrower than a pitcher who has been inconsistent in the past (e.g. Pineiro). IOW, what evidence do you have that "we know what to expect from Garland" but that Pineiro's 2010 performance "might be all over the place" even if we expect them both to have around the same weighted mean performance?
To couch the question one more way, how do you know or even suspect that this is true:
"Garland’s expected value might break down as 1.8 wins coming 20 percent of the time, 60 percent delivering 2.1 wins, and 20 percent 2.4 wins. In contrast, Pineiro checks in with a spread of 0.5 wins 20 percent of the time, 20 percent yielding 1.3 wins, 20 percent of the scenarios producing 2.1 wins, another 20 percent with 2.9 wins, and finally 20 percent generating 3.7 wins."
When you looked at the history of pitchers with "inconsistent" performance like Pineiro, you should have looked at the distribution of their performance in the year following the uptick and compared that to pitchers who had similar career profiles as Garland. We need to know if there is any difference in terms of the spread of that distribution, and if yes, to what degree. If the answer is "little or no difference" then your whole thesis falls apart, right?
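The two hypothetical distributions quoted above make the point numerically: identical expected value, very different spread. The scenario values and probabilities are taken straight from the quoted passage.

```python
# (WARP value, probability) pairs from the quoted example:
garland = [(1.8, 0.2), (2.1, 0.6), (2.4, 0.2)]
pineiro = [(0.5, 0.2), (1.3, 0.2), (2.1, 0.2), (2.9, 0.2), (3.7, 0.2)]

def mean_and_sd(dist):
    """Mean and standard deviation of a discrete distribution."""
    mu = sum(v * p for v, p in dist)
    var = sum(p * (v - mu) ** 2 for v, p in dist)
    return mu, var ** 0.5
```

Both means come out at 2.1 wins, but Garland's standard deviation is about 0.19 wins against Pineiro's 1.13, so the consistency question is entirely about the second moment, which is the distribution the article never measured.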
"If I only had five observations per year, then I'd probably get a lot of random variation and so not a lot of consistency within managers over the years."
Do you mean managers with 5 SB opportunities or 5 managers per year? I am talking about the former, of course, when I am talking about sample size. The number of observations will NOT affect the correlations, only the standard error.
You always say, "Think of an ICC as like a y-t-y correlation." But, as I originally said, the magnitude of a y-t-y correlation specifically depends on the number of "opportunities" in each year, and without knowing that number, it means nothing. If I regress OBP on OBP from one year to the next, and I only include players with 100 or fewer PA each year, I might get a correlation of .25. If I only include players with more than 400 PA, I might get .60. So just saying, "My y-t-y 'r' for OBP was .5" means nothing unless I know the number of PA per year in my sample. (It is also nice to know the number of players or "observations," as that will help me figure my standard error around the correlation.)
So if I have bunch of players in a bunch of years, and you tell me the ICC for OBP, again, that means nothing to me unless I know the range or distribution of PA in the sample, right?
Maybe I have it wrong. Maybe the ICC is sort of a combination of "r," as when we do a y-t-y "r," and the underlying sample size. For example, if you have a bunch of players with samples of 400 PA and you do an ICC for OBP, and you have a bunch of players with samples of only 100 PA, will you come up with the same ICC?
Pizza, we may have discussed this before in another venue, but since "r" is always a function of the underlying sample size (not the number of pairs in the regression), in your intra-class correlations, how do we/you know the sample size associated with your "r"? For example, if I were working with the same data you are, and I regressed first half on second half, I might get an "r" of .4; if I regressed one whole year on another year, I might get an "r" of .5 or .6; if I regressed 5 years of manager data on another 5 years, I might get .8, etc. In this instance, you mention that the "r" was .538. Without knowing how many games (or steal opportunities or whatever the "unit" is) that represents, I have no idea whether .538 is "consistent" or not.
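The dependence of "r" on sample size that these comments keep hammering has a standard closed form (in the same family as the Spearman-Brown formula): r(n) = n / (n + k), where k is the stat's regression constant in PA. With an assumed k = 300 PA for OBP (chosen for illustration), the numbers in the comment fall right out.

```python
def expected_r(n_pa, k=300):
    """Expected correlation between two independent samples of n_pa PA
    each, for a stat with regression constant k PA (k = 300 is an
    assumed value for OBP, not a measured one)."""
    return n_pa / (n_pa + k)
```

expected_r(100) gives .25 and expected_r(450) gives .60, which is why quoting an "r" without the underlying PA per observation is meaningless.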
To those people that don't understand the whole issue of survivorship, I want to reiterate the fact that we are not really trying to include players in our resultant trajectories who do not actually play.
We are merely trying to balance out the players who do play a Year II, because they will have tended to be slightly lucky in ANY year (Year I) that is followed by a subsequent year. And when we use the delta method, we are only including player seasons in which there is a Year I and a Year II, so all of the players we are including will tend to show a false decline (or a false not-so-large increase) in ANY pair of years.
In order to account or adjust for that, we include ALL players, even the ones who do not get a Year II (for any given Year I), by creating a phantom Year II and using a Marcel-type projection for their Year II performance (which doesn't really exist). That way, we can simulate a random, controlled experiment, whereby all players are forced to play at least one more year at any age. That would be the only way we could really ascertain true aging curves and peaks - by either forcing all players to play until they are 40 years old or so, or by at least forcing all players to play "one more year" whether they were allowed to or not (and then use the delta method because we have players who have played only 2 years, 3 years, 5 years, 10 years, etc., and we want to include all of these players, unlike JC).
Actually forcing all players to play until they were 40 or so (and starting them in the majors when they were 20 or so) would not give us a very good answer either. That would answer the question, "What does the average aging curve look like for all players who had some time in MLB and were allowed to play until they were 40, regardless of how well they aged or how well they played?" That would be sort of the reverse of JC's sample, but equally biased.
Forcing players to play one more year, which is essentially what I am doing when creating those "phantom Year II's," creates a little bit of bias as well, because there are reasons why these players do not get to play in Year II other than the fact that they got unlucky in Year I (although that is definitely part of it for some of these players), but it is a good method to balance out those slightly lucky players who do get a Year II at any age. And using the "5 runs worse" regression that JC does not like is actually a good way to counteract that bias.
So using all players AND creating phantom Year II's for non-survivors, and then using the delta method to construct an aging curve, I believe is by far and away the best method of answering the question, "How does the typical MLB player age?" where "typical" means all players combined, from the ones that have a cup of coffee to the ones who play for 5 or 6 years, to the ones - as in JC's sample - who have long and illustrious careers.
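A minimal sketch of the phantom-Year-II delta method described above. The player-season records and the toy projection are stand-ins (a real Marcel uses three weighted years, age adjustments, etc.); the point is the mechanics: every Year I gets a Year II, real or projected, before the deltas are averaged by age.

```python
def marcel_lite(year1_runs, mean_runs=-5.0, n=400, k=600):
    """Toy one-year projection: regress Year I performance toward a
    mean 5 runs below average (the regression target discussed above).
    n and k are assumed regression weights, not MGL's actual values."""
    return (year1_runs * n + mean_runs * k) / (n + k)

def age_deltas(seasons):
    """seasons: dicts with 'age', 'year1_runs', 'year2_runs'
    (year2_runs is None for non-survivors). Returns the mean
    Year II minus Year I change at each age, phantoms included."""
    buckets = {}
    for s in seasons:
        y2 = s["year2_runs"]
        if y2 is None:                 # non-survivor: phantom Year II
            y2 = marcel_lite(s["year1_runs"])
        buckets.setdefault(s["age"], []).append(y2 - s["year1_runs"])
    return {age: sum(d) / len(d) for age, d in buckets.items()}
```

Note how the phantom season pulls an unlucky non-survivor back up toward the (pessimistic) mean, which is exactly the counterweight to the survivors' good luck.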
Phil, yes, absolutely. I just went back to my program and changed one number in the code. That number is the mean towards which I regress the non-survivors in their Marcels. I changed that number from -5 to 0 and also to -10. It does not change the peak age or the trajectories very much at all. It is still 28 in the modern era (I arbitrarily define that as including any player season after 1979 in my data).
As I said, JC's criticism of the "5 runs worse than average" turns out to be a red herring, as whatever I use does not significantly affect the results.
Really, the survivorship problem is not as large as I thought it would be. If I do not include these players (who do not have a Year II), and thus my remaining players are a little lucky in all of their Year I's (that is the problem with not including the non-survivors), there is essentially a plateau from 27 to 28 (a .1 run increase from 27 to 28 actually).
Once I include the non-survivors and use the "5 runs less" for the mean that I regress towards, the 27 to 28 interval shows a .4 run increase (rather than .1 run without the non-survivors).
If I do not use "5 runs less" for the mean (I simply use a standard league average), that 27 to 28 interval is now a .5 increase rather than .4.
If I were to use "10 runs less" for the mean rather than "5 runs less", I get a .3 run increase for that same interval.
So, the peak age and overall trajectory are not very sensitive to the mean I use for the regression in the projections for the non-survivors.
Well, JC wanted a response to his criticism on that topic and I have provided a very adequate one I think.
"The projection is their last three years lwts per 500 PA, weighted by year (3/4/5) added to 500 PA of league average lwts for that age minus five. In other words, I am regressing their last three years lwts (weighted) toward five runs lower than a league average player for that age.
While Lichtman believes using five runs below average generates a "conservative" projection, the substitution is just a guess informed by nothing more than a hunch. In this case, the guess imposes the outcome for the exact factor we are trying to measure: the estimated decline is a pure product of the assumption. Thus, it is no surprise that Lichtman's adjusted delta-method estimates yield results that differ little from his raw delta-method estimates."
JC completely mischaracterizes, or does not understand, what I was doing.
He seems to imply that I am assuming a 5 run decrease for all of the "one-year" players (those who do not get a Year II).
I am not. I am assuming a Year II performance equal to a basic Marcel projection. Aging is, or should be, part of a Marcel, so it is true that I have to make some aging-trajectory assumptions in order to construct the projection. But the "5 runs worse than average" is the mean that I am using in the regression that is part of the Marcel (the projection). That is completely different from assuming a Year II which is 5 runs worse than Year I. That would be ridiculous, and that is what JC is implying that I am doing, I think.
Normally a Marcel regresses toward a mean which is the league average performance of similar players (age, size, etc.). The reason I used a mean (to regress towards) that was 5 runs worse than a "generic" mean was that these players who do not see the light of day in Year II tend to be fringe players, and therefore the mean of their population is likely worse than the mean of the population of all players.
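A hedged sketch of the Marcel-style projection being described: the last three seasons of lwts per 500 PA, weighted 3/4/5 oldest to newest, regressed toward a mean 5 runs below league average using 500 phantom PA. The scaling of the weighted PA is my own assumption about how the ballast is applied, not something spelled out in the text.

```python
# Marcel-style projection sketch: weight recent seasons more heavily, then
# shrink the weighted rate toward a below-average mean for fringe players.
def project(rates, pas, mean=-5.0, reg_pa=500, weights=(3, 4, 5)):
    """rates: lwts per 500 PA for the last three seasons, oldest first.
    pas: plate appearances in those seasons."""
    w_pa = sum(w * n for w, n in zip(weights, pas))
    w_rate = sum(w * n * r for w, n, r in zip(weights, pas, rates)) / w_pa
    eff_pa = w_pa / max(weights)  # scale so the newest season counts fully
    return (w_rate * eff_pa + mean * reg_pa) / (eff_pa + reg_pa)

# an improving fringe player: 0, +5, +10 runs per 500 PA over three seasons
print(round(project([0.0, 5.0, 10.0], [500, 500, 500]), 2))  # ~2.65
```

Changing `mean` from -5 to 0 or -10 shifts every non-survivor's projection by a similar amount, which is why the choice barely moves the resulting aging curve.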
In fact, if anything, I think I used a conservative (too optimistic/too high) mean. I contemplated using a mean which was 10 runs worse than a "generic" mean (mean of all MLB players).
Interestingly, as you can see from the charts in my articles, even using a "low" mean when doing the regressing, all of the players' projections in Year II were BETTER than in Year I until age 30. So these players actually showed a "peak" age of 29 or 30 (it is not a "true" peak because Year I is an "unlucky" year), which pushed my overall peak age slightly forward.
The most important thing is that whether I used a typical MLB mean for the regressing, 5 runs less than that (as I did) or even 10 runs less than that (which, as I said, may have been even more correct), it would not have changed my results. So criticizing that aspect of my work cannot indict the conclusions generated from that work.
Guys, in the pitch f/x data, did you control for that quality of the batters and for pitch count? IOW, is the control group matched up, batter-wise and pitch count-wise, with the non-control group, in the pitch f/x data, as it is in the "outcome" data? As you said, pitchers will tend to hit when the bottom of the lineup is coming up the next half inning and when their pitch counts are not high, at least late in the game. Early in the game, pitchers will tend to hit in the same inning as the opposing pitcher will tend to hit. So it is really critical to control for the quality and even identity of the batters, as well as pitch count.
The funny characters occur when I cut and paste my comments from another web site. I don't know why that happens.
It is not an ad hominem attack because it has nothing to do with my discussion of the substantive issue. An ad hominem attack or argument is when an “attack” on a person is represented as or is a diversion from a substantive argument. It is certainly OK to state, “And by the way, here is a comment on something else that the person said, which has nothing to do with the argument at hand.”
Did I have to explicitly say, “Warning! Time out. This next comment I am about to make has nothing to do with the argument at hand. Nothing at all. It is a side-bar.”
I’ve spoken my peace (or is it piece?) on JC’s research and on aging in general, and I don’t really have anything more to say on either issue, otherwise I run the risk of being even more redundant and repetitive than I already have been. And as always, I could be wrong on one or more of the assertions that I have made. Not to mention the fact that there is a lot of muddy water and gray area in this particular topic.
Doesn't this study boil down to the following statement?
"Players who play longer than average have later peaks than average."
And isn't that almost begging the question?
Yes and yes.
Not only do players who play longer have later "true" (underlying talent) peaks, but their observed peaks (which are not necessarily the same as their "true" peaks - for example, a player might have his best season at age 22 or at age 36) are also going to be later than their true peaks. It is sort of survivor bias in reverse. Players in JC’s sample tend to have gotten lucky all along the way, pushing the peak age forward as they do so and flattening out the post-peak part of the curve as they go forward in their careers.
"I especially appreciate the positive feedback."
When I write an article or do research, I especially appreciate the criticism. I have nothing to learn from the "pats on the back." But that's just me...
I should have said that the aging versus injury issue is a "red herring" rather than a straw-man argument, but my point is still the same...
As well, his injury versus aging issue is a straw-man argument. That is a completely separate issue (and a complex one at that) from the flaw in his study, which is generalizing a very small, biased sample of players to that of an average or typical MLB player.
One more thing. I have been trying to find where I made the comment, but I can't. I quickly looked at players who had played at least 5 years (I think) prior to age 27 and accumulated at least 2000 (I think) PA. So basically they are full-time players at the beginning of their careers.
JC's players are a subset of these players. Some of these (5/2000) players will go on to amass 10 years and 5000 PA and some will not. When I looked at all of these 5/2000 players going forward, I found a peak age of 27-28 and the same basic overall trajectory that I found for ALL players using my delta method corrected for survivor bias.
So obviously the subset of these 5/2000 players who do not make it to the 10/5000 level (JC's sample) peak earlier (and probably have a steeper decline after their peak) than the players who do, as you would expect.
Just more evidence that JC's sample is a biased set of players who peak later than the "average" player as well as later than the full-time player with 5 years under his belt.
Again, what purpose does it serve to determine the trajectory of this very small subset of players and why does JC refer to this trajectory as that of the "average MLB player" rather than a very small, biased, subset of players who play MLB?
I really don't understand his point and why this article is even called "How do baseball players age?" as if his very small subset of players represents the average or typical player. And why does JC think that his results are in opposition to that of myself, Tango, Bill James, and others who looked at ALL players and not just those who played for at least 10 years with at least 5000 PA? Obviously the smaller the subset of players we look at after the fact, and the longer and more prosperous the career, the later the peak age we will find and the shallower the curve after that peak, almost by definition. What is the point, I ask for the umpteenth time?
I use my aging research in order to help us with projecting player performance. Most or all of the other analysts that JC criticizes do the same, or at least that is implicit in their work. You can't possibly do that with JC's data.
I think that this is JC against the world on this one. There is no one in his corner that I am aware of, at least no one who actually does any serious baseball work. And there are plenty of brilliant minds who thoroughly understand this issue who have spoken their peace. Either JC is a cockeyed genius and we (Colin, Brian, Tango, me, et al.) are all idiots, or...
As some of you know, I also wrote a two-part article on aging on THT. JC, in this article, references that work. Here is a comment I wrote on that site which I think aptly sums up this issue:
As I reiterate throughout the two parts, there really is no one-size fits all aging curve. And in practice, you are better off addressing each player on a case-by-case basis.
For example, if you have a 31 year-old FA that you are thinking about signing, you would want, at the very least, to look at the aging curves of similar players during the modern era - for example, full-time, 30-32 yo players who have played for X years already.
None of the generic curves I discuss, or JC’s, or anyone else’s, will be much help. You have to look at specific aging patterns for similar-type players, including such things as body-type, speed, injury history, etc.
In addition, a player’s own historical trajectory might give you some idea as to his future trajectory.
Really, the only 3 things you want to take away from this article, including the second part a-coming, are:
One, if you look at players who have already played for 10 seasons and many IP, as JC did, of course you get a very different aging curve than you would expect from any player before the fact, even in the middle of their careers. To extrapolate that to all players, most players, or even the “generic” player, sans the very part-time ones or the ones with very short careers, is ridiculous, as Tango, Phil B., and many others have already stated.
Two, the modern era appears to have a significantly different aging curve, probably for all players. I arbitrarily defined the modern era as post-1979, but it could be anything really.
And three, if you absolutely have to answer the question, “What does the average aging curve look like for MLB players, including those who do not have long and/or illustrious careers (and many of these part-time players DO reach their peaks), and what is their peak age,” the answer probably looks something like my last curve, at least in the modern era, although the one in the next installment after I adjust for survivor bias is probably more appropriate.
And that curve (for the “average” MLB player) is not unlike what we have thought all along - a fairly steep ascent until a peak of 27 or 28, and then a gradual decline which gets a little steeper if and when a player gets into his thirties and beyond. There is simply no way that we can expect a player (not knowing anything else about him, such as body type) who has not already finished a long career (or come close to finishing) to peak at 29 or 30, as JC suggests.
Of course it makes no real practical sense to talk about a player’s peak age and his trajectory after he has already finished his career.
I wrote some comments on The Book Blog. Colin is 100% correct. This trajectory has no useful value. It surely cannot be used for any projection purposes. It simply tells us the average "observed" (which is very different than "true," as I explain below) trajectory for very good players who had long and prosperous careers. Those players are a very small subset of all players at any age, especially at the younger ages (what percentage of young players end up having a career of at least 10 years and 5000 PA with at least 300 PA per year?).
So he comes up with an observed aging curve for a very, very small subset of players who by definition peak late and age gracefully (gradually). If we assume that all players have somewhat different "true" aging curves (if, because of nothing else, their differences in physiological versus chronological age), his subset of players is one that necessarily is going to have an aging curve with a late peak and a gradual decline - otherwise they likely would not have lasted that long and played as much and as regularly as they did.
In addition to that, and to make matters worse, the trajectory he found is not even a trajectory of "true talent." Because of his selection bias, it will necessarily be comprised of players who, by chance alone, had late peaks and gradual declines.
To illustrate that, let's say that all players have the exact same true aging curves. Now, if we let all players play 10 years, obviously by chance alone some players will peak at 26, some at 32, etc. (even though they all have the same true peak). And some players will have steep post-peak (and pre-peak, of course) trajectories and some will have shallow ones (in fact, every possible shape will occur if we have enough players in our sample), again by chance alone, even though they all have the same "true" shape. The players who peak late by chance and have a gradual performance decline by chance alone will tend to dominate JC's sample. Basically, JC's sample (a VERY small subset of MLB players) consists of players who have true trajectories that peak late and decline gradually AND players whose observed trajectories peaked late and declined gradually by chance alone. Is it any surprise that he finds a peak of 29 or 30 and a gradual decline after that? Heck, if we look at players who played 15 years and 7000 PA, we are likely to get a later peak and a more gradual decline still! Does the name Bonds sound familiar?
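The "chance alone" argument is easy to demonstrate with a toy simulation: give every player the SAME true aging curve (peak at 28), add random noise, and then select only the players who still looked good in their thirties. The curve shape, noise level, and survival cutoff below are all arbitrary numbers of mine, chosen only to illustrate the selection effect.

```python
# Toy simulation: identical true aging curves, random noise, and a
# "survivor" filter on late-career observed performance.
import random

random.seed(1)
AGES = list(range(24, 36))

def true_talent(age):
    return -0.3 * (age - 28) ** 2   # same true curve for everyone, peak 28

all_peaks, survivor_peaks = [], []
for _ in range(20000):
    obs = {a: true_talent(a) + random.gauss(0, 5) for a in AGES}
    peak = max(obs, key=obs.get)    # observed (not true) peak age
    all_peaks.append(peak)
    # "survivors": players who still looked good from age 31 on
    if sum(obs[a] for a in AGES if a >= 31) / 5 > -3:
        survivor_peaks.append(peak)

print(sum(all_peaks) / len(all_peaks))            # near the true peak of 28
print(sum(survivor_peaks) / len(survivor_peaks))  # noticeably later
```

Even though every simulated player has a true peak of 28, the subset that "survived" into its thirties shows a distinctly later observed peak, purely from selection on noise.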
I am sorry, but with every fiber in my body, I think that it is ridiculous to characterize JC's resulting trajectories as "of the typical MLB player," or some such thing. It is an "observed" (as opposed to "true" - representing the changes in true talent of a player over time) aging trajectory of a very small subset of players who we already knew had long and prosperous careers. Nothing more and nothing less. Can someone tell me any practical use for this kind of data?
"That's actually the exact reason that I used the odds ratio correction method. I'm measuring outcomes relative to expectancies. So, if the player traded was an overall .400 OBP guy, the model knows that and expects him to be on base 40% of the time."
Pizza, if the players who are traded tend to be good, they also tend to be lucky, so any post-trade performance will regress, whether you use the odds ratio method of matchup expectancy or not.
Good stuff, BTW!
I too am waiting for a pitch f/x analysis of home and away. Mostly I want to know if umpires have a different K zone for home and away teams or make some occasionally biased calls.
Great stuff Eric! I think it is pretty critical that you break the numbers up into counts (and even game situations, like base runners, outs, score, etc.), or adjust for those things, although obviously you are going to run into serious sample size issues then.
Also, a batter's tendencies have a lot to do with his success when making contact and his ability to make contact in the first place. I realize that you are trying to look at these tendencies independent of a player's overall hitting success, but readers need to be careful about concluding anything about whether a batter is optimizing his approach without taking some measure of success into consideration. For example, a player like Vladdy can afford to swing at pitches out of the zone because he is so good at it (making contact and hitting the ball hard when he does). A player like Castillo cannot, because even if he were able to make contact on pitches out of the zone, he would not hit them very hard.
I'd be real wary of reading too much into team SBIP. Too much effect from park and the pitchers. While pitchers do not have that much control over BABIP, they certainly have more control over doubles and triples. In fact, there is a fairly strong correlation between a pitcher's HR rate and his extra base hit rate.
If anyone wants to check, I am going to guess that the best teams in SBIP also had a pitching staff which allowed fewer HR than average, and vice versa for the worst teams in SBIP.
I don't know about "love affair" but if you look on Fangraphs you will see that Chone, Marcel, and ZIPS had Sonnanstine's FIP projection (basically an indicator of the context-neutral expected performance of a pitcher) at around 4.00, which is very good. I am not sure of the scale of FIP (what a league average FIP is - probably around 4.30), but the same forecasters' FIP projection for James Shields was around .2 runs better and for Garza about the same. Pecota was a little worse, giving Sonny an eqERA projection of about .1 runs worse than average.
So basically, some of the "statheads" projected Sonny as anywhere from a little better than league average to slightly worse than league average, despite not being a high GB pitcher and despite having a lower than average K rate and a higher than average HR rate. That is because he was projected to have a fairly significant lower than average BB rate.
Anyway, Sonny has indeed pitched horribly this season and after watching him a few times this year, I wouldn't give a plug nickel for his services, for whatever that's worth.
A pitcher whose fastball averages 87 mph and is not a heavy ground ball pitcher is generally going to live on the edge...
FWIW, using Madson's projected platoon splits and those of Eyre and Taschner, Madson's projected "ERA" against lefties is 3.90 and Eyre's is 3.28, assuming that the lefty batter has average platoon splits himself. If he has larger ones, then there is an even greater difference between Madson and Eyre. Taschner is 4.12 versus a lefty; he basically sucks overall, although he is useful against lefties as compared to a RHP who is not all that great or has a large platoon split himself.
Most lefty pitchers are better versus LHB than all but the best RHP. That is why teams have LOOGYs in the first place. The traditional argument, "Why bring in a crappy LOOGY when he's, well, crappy," usually doesn't hold water since even a crappy LOOGY is pretty good against lefty batters and better than all but the best RH relievers.
But, as someone pointed out, as much as Joe would like to see Dunn be the last batter, that ain't gonna happen 30% of the time or so. So you have to factor into the analysis who is going to pitch to the following batter or batters that 30% of the time that Dunn does not make an out.
So, you have Madson versus Dunn, at a 3.90 "ERA", plus Madson versus the next batter 30% of the time, and the next batter after that, 8% of the time, or whatever it is, or Eyre versus Dunn at an "ERA" of 3.28 (much better than Madson's 3.90), plus someone else versus the next batter 30% of the time, plus the next batter after that 8% of the time.
I don't know the answer. It's probably a toss-up or reasonably close either way, depending on who else they had in the pen, if anyone, to take over for Eyre if Dunn gets on base. If you have to leave Eyre in there to pitch to the following RHB's, he is projected at 4.00 versus RHB, as opposed to Madson at 3.19. So while you gain .62 runs with Dunn at the plate with Eyre rather than Madson, if you have to leave Eyre in after Dunn, you lose .81 runs 30% of the time for the next batter, and another .81 runs 8% of the time for the next batter after that. That is a net gain for bringing in Eyre of about .31 runs per 9 innings, which is around .008 runs per batter. Multiply that by maybe 3 for the leverage and you gain maybe .002 wins - nothing to write home about.
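Here is a rough re-creation of that arithmetic. The "ERA"s are the projections quoted above; the 38 batters per 9 innings and 10 runs per win are my own rule-of-thumb assumptions, so the rounding differs slightly from the figures in the comment.

```python
# Expected-runs comparison for bringing in Eyre vs. leaving Madson in.
GAIN_VS_DUNN = 3.90 - 3.28   # Eyre beats Madson against Dunn (runs per 9)
LOSS_VS_NEXT = 4.00 - 3.19   # Madson beats Eyre against the following RHBs

# Dunn reaches base ~30% of the time, and the inning continues past the
# following batter ~8% of the time.
net_per9 = GAIN_VS_DUNN - LOSS_VS_NEXT * 0.30 - LOSS_VS_NEXT * 0.08

per_batter = net_per9 / 38          # assume ~38 batters per 9 innings
win_gain = per_batter * 3 / 10.0    # leverage ~3, ~10 runs per win
print(round(net_per9, 2), round(per_batter, 3), round(win_gain, 4))
```

The net edge for Eyre comes out around .31 runs per 9, or roughly .008 runs per batter and a couple thousandths of a win after leverage, which is why the call is close to a toss-up.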
Probably more important than those overall numbers are the relative values of the various events when you are deciding which relievers to bring in. Some pitchers are bad or good because they give up a lot of, or few, walks or HRs or batted balls, or what have you. The value of those events can be quite different depending on the inning and score and the player(s) at the plate. For example, with a 3 run lead to start the 9th, you want a pitcher who does not allow a lot of base runners. You don't care if he is a high HR guy. With runners on base, you want either a GB pitcher or a high K pitcher. In a one run game, especially with 2 outs, you want a low HR guy. Etc. Those are important considerations as well. How often have you seen a manager bring in his closer, a high walks and high strikeout guy (low BA against), in the 9th inning with a 3 run lead, and you intuitively cringe because you know that he is going to walk the bases loaded and then have to pitch out of a self-inflicted jam? Not only is a 3-run lead (low leverage) a great time to save your closer for another (more important) game in general, but it is also a good time to bring in a lesser pitcher overall who is a low walks guy, even a low walks, high HR guy. If he gets in trouble, you can always bring in the closer anyway.
As you can see, one of the problems with taking out your best pitcher to bring in a lefty is that even if the lefty/lefty matchup is better, you don't have your best pitcher available anymore for the following batters, if they should bat.
The best use of a platoon matchup in favor of your closer is at the beginning of the inning, of course. How many times have you seen two lefty batters lead off the 9th inning, say Utley and Howard, and the opposing manager brings in his RH closer to start the inning? Assuming a decent LOOGY in the pen, the better move is usually the LOOGY for the first two batters and then the closer. Occasionally, you will see a manager like LaRussa, Scioscia or Piniella do something like that. I love it when I see it. Again, that is assuming that the LOOGY matchup is better than the closer/lefty matchup, which may not be the case if your closer is especially good, has a small platoon split himself, or the LOOGY is especially bad (like Taschner versus Madson).
Matt, what if all of these offensive rates, like OPS, HR/PA, etc. are multiplicative, or at least partially multiplicative and partially additive, rather than just additive, which I think they might be?
Or what if you use an odds ratio method, which I think might be the correct thing to do?
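For readers unfamiliar with it, the odds ratio method mentioned here works by converting each rate to odds, combining them multiplicatively against the league odds, and converting back. The numbers below are made up for illustration.

```python
# Odds-ratio method sketch: combine a batter rate and a pitcher rate,
# relative to the league rate, into a matchup expectation.
def odds(p):
    return p / (1 - p)

def odds_ratio(batter, pitcher, league):
    """Expected rate for this batter against this pitcher in this league."""
    o = odds(batter) * odds(pitcher) / odds(league)
    return o / (1 + o)

# a .400 OBP hitter against a pitcher who allows a .320 OBP,
# in a league with a .330 OBP
print(round(odds_ratio(0.400, 0.320, 0.330), 3))  # ~0.389
```

Unlike a fixed additive or fixed multiplicative adjustment, the odds ratio behaves sensibly at both ends of the scale, which is why it is often the preferred way to combine rates.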
If the home/road difference is a fixed ratio rather than a fixed difference, and you correlate H/R differences from one year to the next, guess what? You are going to get a strong correlation. You will also get a stronger correlation for players with higher numbers, for the same reason.
Plus, to be honest, the park factor thing is really going to throw a monkey wrench into the equation, even if you try to park-adjust, I think. Using one-year park factors is going to create a lot of noise, and if you arbitrarily regress 50%, then if that regression is not enough you still have a lot of noise, and if it is too much you will get correlations from the park factors alone. Plus, again, I would NOT be using Rockies players at all. Including them, whether you use park factors or not, is going to give you a positive correlation overall.
Anyway, we'll start with the assumption that what we consider to be a "fixed" HFA is that all teams have the same difference between their home and road WP (and it is not a given that we can't define a "fixed" HFA as a fixed ratio between home and road WP). We'll call that .08 (.54 -.46). So, for example, a good team with a .600 overall WP will be .640 at home and .560 on the road. The assumption is that that will also be true regardless of the level of offense or defense at home or on the road (I don't know if that is true or not either).
Now, what do we have to do with runs scored and allowed in order for that to be true? Let's say that we have a team that scores 4.5 runs and allows 4.5 runs. In order for them to have a .540 WP at home and .460 on the road, we would have to add .18 runs to their runs scored and subtract .18 from their runs allowed OR we would have to multiply their runs scored by 1.04 and divide their runs allowed by 1.04. Which way works better?
Let's say that we have a team that scores 6 and allows 6 overall. Their overall WP is still .500. If we add .18 runs to their runs scored at home and subtract .18 from their runs allowed, we get a WP of only .530. If we multiply their runs scored by 1.04 and divide their runs allowed by 1.04, we get .539, which is near what we want.
So for level of offense and defense, it looks like we have to multiply runs by a fixed amount. What about the strength of a team? Say a team scores 5 and allows 4 overall. They are a .610 team. We expect them to be .650 at home and .570 on the road. Let's do the same thing: try adding or subtracting a fixed number of runs, and try multiplying home and road runs by a fixed number. Adding and subtracting .18 runs gives us 5.18 RS and 3.82 RA at home, which is a WP of .648, and on the road it is 4.82 and 4.18, or .571, pretty close to what we want. What about multiplying and dividing by 1.04? At home we get .646, and on the road we get .572, so adding and subtracting seems to be better.
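These checks can be reproduced with the Pythagorean expectation (exponent 2), which is what the quoted run and WP figures imply:

```python
# Additive vs. multiplicative home-field run adjustments, evaluated with
# the Pythagorean expectation.
def pyth(rs, ra, exp=2):
    return rs ** exp / (rs ** exp + ra ** exp)

# average team (4.5 RS / 4.5 RA): both adjustments land near .540 at home
print(round(pyth(4.5 + 0.18, 4.5 - 0.18), 3))   # additive: 0.540
print(round(pyth(4.5 * 1.04, 4.5 / 1.04), 3))   # multiplicative: 0.539

# high-scoring .500 team (6 RS / 6 RA): additive falls short of .540
print(round(pyth(6 + 0.18, 6 - 0.18), 3))       # additive: 0.530
print(round(pyth(6 * 1.04, 6 / 1.04), 3))       # multiplicative: 0.539
```

The multiplicative tweak holds up across scoring levels but the additive one holds up better across team strengths, which is exactly the ambiguity described above.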
So, it is not real clear to me which is the correct way to do it.
To be honest, I think you have to do some sort of odds ratio rather than assuming a fixed difference or a fixed ratio. If you do that, you have to re-run your correlations. I am not exactly sure how to do that, but I'm sure someone can help...
I am afraid you're not going to find out anything here without controlling for two things: one, the exact stolen-base talent of the runner, and two, the game situation.
The reason is that you cannot separate cause from effect. Obviously, no throws to first regardless of the number of pitches in the AB suggests that it is not a base stealing situation (maybe the game is a blowout or the batting team is down by more than a run late in the game). It also suggests that the runner is not such a base stealing threat, even if you only looked at runners with at least 15 steals (not all 15 steal players are created equal).
The other thing is this: suppose that throwing over to first had no effect on whether the runner attempts a steal or is safe on a steal. Why would the pitcher still waste his time with throws? Because you can't pick anyone off unless you throw over!
Basically, I don't see the data telling us anything at all for the reasons articulated above.
A few things.
1) I think that the HBP against Rollins was on the jersey and not the sleeve, no? I could be wrong.
2) Maddon should have taken Price out of the game after Howard, and he should not have brought him in until the lefties started hitting (Balfour should have pitched to Ruiz and Rollins). The reason is two-fold. Price had no fastball, at least according to the Fox gun. It was 92-93. It usually is 95-97. With that kind of fastball (not that 92-93 is bad) and his definite command problems in general, I don't think he is a particularly effective reliever versus RHB. Even with one day off, why burn him when you had plenty of relievers left in the pen? What you want to do in the post-season is to rotate your relievers as much as possible so that you have lefties and righties available every game, if possible.
3) Danley did NOT signal out and then ask for the appeal. That would have been ridiculous (OK, I would have thought that Eddings' call was ridiculous also...). He started to call him out and then changed his mind. There was nothing wrong with that. The home plate umpire should almost never call a batter out on a check swing. He cannot possibly see whether he went around or not. That is not to say that Danley did not come as close as possible to calling him out with his hand, but in order to call a batter out, you usually raise and then close your hand and then say, "Strike three, you're out" or something like that. He clearly raised his hand to call him out, thought better of it, did NOT close his hand, let that hand motion carry towards first, and of course did not say "strike three." That is why Manuel did not argue much.
If Danley had actually "called him out," as Joe says in the article, Manuel would have gone through the roof, and justifiably so.