Right, if a bad faith response or argument helps to cause analysis to move forward, then so be it. Sometimes that happens. It is not always legitimate discussion by intelligent, knowledgeable, and reasonable people that moves science in the right direction.
Yes, you do. Of course you do. If a player is zero for his career, and he has a +15 per 150 in half a season, it is likely that we overestimated the difficulty of his chances.
If that player is Brendan Ryan, and he has a +15 halfway through the season, it is more likely that the +15 reflects what actually happened.
Moral of the story: Don't combine two numbers when one of them needs no regression to tell you what you think it tells you, and the other needs to be regressed before it tells you anything of the sort.
And that warning is even stronger when you have a small sample size!
I've said this many times: I don't like the idea of combining UZR and offensive metrics (to get WAR). It is combining apples and oranges. It is a little like OPS (a false combination), but a lot worse.
The simple reason is that UZR needs to be regressed a lot more than the offensive metrics (and probably a lot more than UBR or other baserunning metrics).
An unregressed offensive metric tells you basically what happened, but obviously it does not reflect the best estimate of talent without regression. (A big mistake that lots of folks make is equating the two.)
Unfortunately, the way that UZR and DRS are constructed, the unregressed numbers tell us neither what happened, nor do they reflect our best estimate of talent.
If you want to combine UZR (or DRS) and offensive metrics like lwts, you would need to regress the UZR some amount to reflect an estimate of what really happened (less regression than you would if you wanted to estimate true talent).
Unfortunately that is not done, so you end up with a "hybrid" monster in WAR which includes a pretty much exact measure of what happened on offense (translated into theoretical runs/wins - which really mean nothing, BTW) AND a rough estimate of what happened on defense.
The result is a number, WAR, which represents what happened on offense plus only a rough estimate of what happened on defense. And any time the defensive number is less than or greater than our estimate of the player's true talent, we should assume that the defensive number (e.g. UZR) is too high or too low in terms of what happened.
There is just no getting around regressing UZR before combining it with offense. If you don't do that, which no one does, then, yes, you have to take those WAR numbers with a large grain of salt, especially when the UZR component of WAR is far away from our estimate of that player's true talent UZR. Even if UZR is not far away, the error bars around the UZR, in terms of an estimate of what happened, are large - or at the very least, they exist. There are virtually no error bars around the offensive component of WAR, at least with regard to what they represent - exact offensive results translated to theoretical runs.
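To make that concrete, here is a minimal sketch of what "regress UZR first, then add it to offense" might look like. The regression constant (in fielding chances) is a made-up placeholder, not a real estimate:

```python
def regressed_uzr(uzr_runs_per_150, chances, ballast=300):
    """Shrink an observed UZR rate toward zero (league average).

    `ballast` is a hypothetical regression constant expressed in
    fielding chances; the real amount would have to be estimated
    from the year-to-year reliability of UZR.
    """
    w = chances / (chances + ballast)
    return uzr_runs_per_150 * w

def crude_war_runs(offense_lwts, uzr_runs_per_150, chances):
    # Offense is (essentially) what happened; UZR gets regressed first.
    return offense_lwts + regressed_uzr(uzr_runs_per_150, chances)

# Half a season of +15/150 shrinks considerably with these made-up numbers:
print(regressed_uzr(15.0, chances=200))  # 6.0
```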
At the end of the year, we can probably live with the unregressed UZR part of WAR. 1.5 months into the season - nah...
"Put him in a situation in a tight game where they’ve got to throw him a fastball and he’ll turn it around and do some damage.”
What situation is that, exactly? Where a pitcher HAS to throw a guy who can't hit off-speed pitches a fastball? Bases loaded with a gun to his head that is set to go off if he walks the batter?
Sometimes the crap that comes out of scouts' mouths baffles me. If anything, in a tight game you will see fewer fastballs, right? In any case, in a tight game, and pretty much any time for that matter, a batter is likely to see lots of stuff that he has trouble hitting. If Paul can't hit anything that spins (which I doubt - seriously, who can hit a well-placed off-speed pitch?), then that's pretty much what he is going to see.
And how many times have you heard about a player who can crush a fastball, especially a young player? Pretty much everyone that has ever come up to the big leagues. It's kind of like a catcher being "tough as nails."
One more thing in the category of, "You don't know what you don't know." And one more reason why, if you think that sabermetrics has figured out nearly everything there is to figure out, well, you are probably wrong.
Nice job Ben and company.
Correct. If you are going to throw more pitches out of the zone, you throw more fastballs.
If you are trying to throw more off-speed pitches to a batter, you will necessarily throw more pitches out of the zone, but in this case it is pitchers trying to throw more pitches out of the zone and, as a consequence, throwing more breaking pitches, although the two are related (when pitching around a hitter, you can do both).
To give an illustration of how this works, if I am definitely trying to pitch around a batter, any batter, say, with a base open and 2 outs in a close game, I might throw all breaking pitches nowhere near the heart of the plate. I don't mind the walk but I also might get the batter to chase one of those hard to hit breaking pitches.
However, when I am facing a batter with no protection in the lineup, I will throw him lots of fastballs on the corners because in general, I do worry about walking him. Just because a good hitter has no protection in the lineup does not mean that I want to walk him a lot - hence lots of fastballs, but not in the middle of the zone. As opposed to when you really don't care if you walk him, you throw lots of breaking pitches out of the zone, and hope for him to chase, but don't mind if he doesn't. And fastballs out of the zone typically do not get chased (other than the good high heater) to the extent that a breaking pitch out of the zone gets chased, especially in pitcher's counts.
A simple, and probably at least partially correct, explanation for him taking more pitches in the zone is that if a batter is seeing more pitches out of the zone, game theory dictates that he take more pitches in general, including those in the zone. As proof, imagine that a pitcher throws 95% of his pitches out of the zone. What would you do as a batter? Take every pitch! That would include the 5% in the zone, and even the 1% or .5% that were right down the middle...
I've been meaning to ask this for a long time. Is a "net strike" one extra strike and one fewer ball (a strike replacing a ball, like an out replacing a hit in a defensive metric, and thus worth .27 PLUS .50), or is it simply one extra strike with the balls held constant?
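For what it's worth, the two interpretations differ by the run value of the ball. A toy computation, with made-up count-neutral run values:

```python
# Hypothetical average run values from the batter's perspective
# (real values vary a lot by count):
BALL_VALUE = 0.06      # run value of an extra ball
STRIKE_VALUE = -0.08   # run value of an extra strike

# Interpretation 1: a strike REPLACING a ball (like an out replacing a hit):
strike_for_ball = STRIKE_VALUE - BALL_VALUE

# Interpretation 2: one extra strike, balls held constant:
extra_strike = STRIKE_VALUE

print(round(strike_for_ball, 2), extra_strike)  # -0.14 vs -0.08 here
```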
Russell, I don't understand this:
"On the third point, if BABIP is far from the league norm over the last 100 BIP (say it's .240), then from a variance explained point of view, the recent personal history of the pitcher is more important than the league average."
In my uneducated (from a statistics perspective) brain, if the recent history is MORE IMPORTANT than the league average, that implies that you would regress a pitcher's last 100 BIP BABIP less than 50% toward the league mean in order to predict the BABIP of his next few BIP. Clearly, that is not the case. Give me a pitcher who is .240 through his last 100 BIP and I will show you a pitcher who is .2997 (or whatever) through his next 10 or 20 or 100 BIP, where league average is .300 (for pitchers with similar profiles, like GB rate), after adjusting for the opposing team, his defense, and the park. So I don't understand what you mean by "more important" or even what those relative percentages in the chart mean.
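In code, the regression I am describing looks something like this; the regression constant is a hypothetical placeholder, but for pitcher BABIP it is known to be enormous relative to a 100-BIP sample:

```python
def project_babip(observed, n_bip, league_mean=0.300, ballast=2000):
    """Regress an observed BABIP toward the (profile-adjusted) league
    mean. `ballast` is a hypothetical regression constant in BIP; the
    point is that it dwarfs a 100-BIP sample."""
    w = n_bip / (n_bip + ballast)
    return w * observed + (1 - w) * league_mean

print(round(project_babip(0.240, 100), 4))  # 0.2971 -- nearly league average
```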
"...the velocity of that change up actually went down (while the fastball velocities went up)"
I did not mean to imply that a decrease in change up velocity is a bad thing. Actually the optimal speed of a pitcher's change up is a very individual thing. He doesn't want it too fast or it becomes too similar to the fastball, and he doesn't want it too slow or it becomes batting practice. And it all depends on the deception of course. The more it looks like a fastball vis a vis the pitcher's motion, the better it likely is. And of course the optimal velocity of the change up depends on the speed of the pitcher's fastballs. A pitcher who can throw 98 might have a 92 mph change up and a pitcher who only throws 88 might have an 81 mph change up.
Typically the difference between the change up and fastball is less than the difference between the fastball and curve (but more than the difference between the fastball and slider), but not always. Again, it depends on lots of things, not the least of which is the slowest a pitcher can throw the change up without altering his delivery to give it away (basically you throw it exactly the same way you throw the fastball but with a grip which slows it down and imparts less spin).
I agree. Very good point. If your fastball velocity increases and consequently you use it more, the effectiveness per fastball may not increase as much as you might expect or it might actually go down. This is true even if you are increasing that percentage by the correct amount. It is all about trying to optimize your mix of pitches, according to game theory.
If the effectiveness of your fastball goes down, and you are not overusing it, then the effectiveness of your other pitches would have to increase to more than offset the decrease in effectiveness of your faster fastball.
For Price, according to the pitch f/x data on Fangraphs, he had a huge jump in his 2-seam velocity from 2010 to 2011, by 3 mph (part of that might be pitch classification errors of course). So you would expect that the effectiveness of that pitch would be much, much better. The value of that pitch, however, went down!
Why? Perhaps it is because he doubled the usage of that pitch, from 17 to 34%. If you throw a particular pitch twice as often, you would expect that the value of that pitch would plummet, since the batters can look for it that much more often. The fact that the value (per pitch) only went down from +12 runs to +10 runs is a testament to the fact that it was that much of a tougher pitch to hit, at 3 mph faster.
And if you look at his changeup, the value of that pitch went up from +2 to +10 runs! If you double the frequency with which you throw one of your fastballs, then the change up is going to be that much harder to hit, and it was. Plus, the velocity of that change up actually went down (while the fastball velocities went up) AND he doubled the frequency of that change up as well! So it is amazing that the value went up so much.
Basically what I am saying is that increasing the quality of a certain pitch (in these cases, by increasing velocity) is only one part of the equation. If a pitcher changes the frequencies of all his pitches, then all kinds of interesting things might happen. A pitcher's overall effectiveness is a combination of his individual pitch quality (as determined by velocity, movement, location, and deception) plus the game theory aspect of pitch frequencies given the count, score, runners, inning, fielders, and batter (plus the ability of the pitcher/catcher to "read" a batter on that particular AB).
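A toy model of that trade-off, with every number invented: suppose a pitch's per-100-pitch run value erodes as hitters see it more often. Then a velocity gain and a usage increase pull in opposite directions:

```python
def per_100_value(stuff_value, usage, familiarity_penalty=15.0):
    """Hypothetical model: per-100-pitch run value declines linearly
    with usage rate, because hitters can sit on the pitch more often."""
    return stuff_value - familiarity_penalty * usage

# A 2-seamer whose raw "stuff" value jumps with +3 mph can still show a
# lower per-pitch value once its usage doubles from 17% to 34%:
print(round(per_100_value(14.0, 0.17), 2))  # 11.45
print(round(per_100_value(14.0, 0.34), 2))  # 8.9
```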
Good job Russell!
A few comments:
1) I agree that the 50 IP year II threshold is problematic, and I would not be comfortable drawing any conclusions without redoing the study while correcting this.
2) I looked at this as well a few years ago, and "concluded" that the Verducci Effect was merely regression toward the mean. Pitchers who had an increased workload tended to have very good seasons in year I. Very good seasons means they were "lucky" (as a group) and thus were going to appear as if they got worse in year II due to regression. If one uses a control group, one must be sure that the control group was equally good in year I OR one must control for regression, perhaps by comparing actual and predicted (via Marcel or Pecota, etc.) performance in year II.
3) In testing this or any hypothesis, one must be careful to use "out of sample" (to the original hypothesis) data. In other words, if Verducci noticed his effect for players in, say, 2006 and 2007, one must remove that data (those years) in testing his hypothesis. Let's say that I "noticed" that in 2010, the HFA in baseball was much higher than usual, so I hypothesized that HFA was increasing in MLB (perhaps due to the decrease in greenie use). If I want to test this hypothesis, I cannot include the 2010 season, since obviously that will confirm it.
You mean pitching using a slide step? All pitchers throw from the stretch with runners on first, second, first and second, or first and third. Not sure what you mean...
This is a great analysis:
"*Even without an anointed closer, Leyland showed little inclination to mix and match. It made sense to leave Coke in to face Raul Ibanez for the first out of the ninth, but not to let him face Russell Martin and Alex Rodriguez, the next two batters. Leyland’s explanation for letting Coke face Martin was that “the numbers said [Martin] has not hit lefties that great,” which certainly isn’t what either his career splits or his 2012 splits suggest—Martin has hit southpaws far better this season. His explanation for letting Coke face A-Rod was that “Granderson was on deck, and you get a lefty for him.” Assuming Drew Smyly was available—he pitched two innings in Game One—it didn’t make much sense not to bring in Joaquin Benoit, Octavio Dotel, or Al Alburquerque for Martin and Rodriguez and use Smyly to get Granderson, especially with an off day on Monday."
When sabermetricians talk about "closer by committee" they mean using the best relievers, given the handedness of the batter or batters (and other characteristics like a pitcher's expected K% or GB%, when particularly needed), given the leverage of the situation. For example, Ben's suggestion for the 9th inning, use Coke for Ibanez and then one of the righties for Martin and A-Rod, is using a "closer by committee" correctly.
(And, BTW, if you bring in Benoit to face Martin and A-Rod, you can also use him for Grandy, since Benoit not only is an excellent pitcher, but has almost no platoon split.)
Leyland's idea of a "closer by committee" is to pick someone else to pitch the 9th and keep him in there regardless of the handedness of the batter, as long as he is "hot." I mean, wasn't that the intention of Leyland - to keep Coke in the game until he let up a run or a couple of base runners? And didn't he primarily start Coke in the 9th because he had successfully retired a few batters in the 8th (going with the "hot hand")?
So, let's not confuse the sabermetrician's concept of a "closer by committee" and that of a manager of Leyland's ilk.
One more thing: The primary disconnect with how a sabermetrician might use a pen and how a manager like Leyland might use one is in the notion of who matches up well against whom. Managers consider pitchers "good" when they are "hot" or rack up lots of saves and sabermetricians go by projections based on 3 or 4 years of performance. Managers consider lefty/righty matchups based on batting average for the current season, with no regard to regression. And finally, managers take great stock in batter/pitcher past results - for example, 4 for 6 against a particular pitcher, or 0 for 8, is deemed very significant for a manager (and will help dictate a decision) whereas the sabermetrician would pay no attention to any sample of batter/pitcher results.
So a manager like Leyland may try and bring in the best possible pitcher in high leverage situations but he really has no idea how to determine that.
From a pitcher's perspective, the SO is reduced in value relative to a batted-ball out (not relative to a batted ball in general), since it can only ever be one out. Also, the value of the walk is greater than in a neutral situation (for the batter) since it advances the runner. So a pitcher would want to throw more strikes and pitch to contact, thus reducing both walks and K's (they typically go hand in hand from the perspective of a pitching approach).
And of course, like the sac fly, it is likely that pitchers are indeed trying to pitch a little lower in the zone to induce a DP but the batter is trying to keep it in the air to avoid the DP, both approaches cancelling one another out.
Then again, I don't think that a pitcher trying to induce a ground ball and the batter trying to avoid a ground ball is as salient an effect as in the sac fly situation, where the batter is REALLY trying to elevate the ball, and the pitcher is REALLY going for a K and sometimes a ground ball, since a runner on third is a lot more important than a runner on first, generally speaking.
Colin, are you assuming that those 22 extra HR are coming from outs, or from singles, doubles, and triples? The reality, I think, is that they come mostly from outs but also from some singles and extra base hits (mostly doubles and triples).
Also, with a larger field, you have the outfielders playing deeper, which means a few extra singles to the short part of the outfield and base runners advancing more on balls hit to the OF, so moving fences in is not going to increase scoring as much as we might think based upon the extra HR.
In any case, if we assume that the fence change will increase run scoring from .11 to .20 runs per game per team, and we split the difference (.15), then we increase the PF for Safeco by around .15/4.40, or .034.
I have Safeco with a PF (run factor) of .9 going into this season, based on data since its inception. That is the second lowest PF in baseball (Petco is .83).
So .9 to .93 is not that much really. .93 will still be the 4th lowest in baseball, after SD, OAK, and SF, again according to my PF database.
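The arithmetic, spelled out (all numbers from the comment above, everything approximate):

```python
delta = 0.15          # split the difference between 0.11 and 0.20 runs/game/team
league_rpg = 4.40     # rough league runs per game per team

pf_bump = delta / league_rpg
print(round(pf_bump, 3))          # 0.034
print(round(0.90 + pf_bump, 3))   # 0.934
```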
It is not easy to make a park with a large foul territory (around the 6th largest in MLB), at sea level, and with cool temperatures into a hitters' or even a neutral park.
And it doesn't really suppress HR all that much. If you look at the parks database that Colin references (Seamheads), you will see an overall HR factor of around .9 or so, which is not so extreme. For the average full time player for the M's, that is less than 1 HR per year!
This is an interesting way to look at the development curve (correlations from one age to the next). I like it.
At the same time, don't these results mimic the standard aging curve, whereby there is a steep upward curve to around age 26, then a plateau (with a slight downward slope) until age 29 and then a fairly steep downward curve?
If we assume that every player has a similar curve, only shifted right or left, then we would expect the same correlations that you get.
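That expectation is easy to sanity-check with a simulation: give every player one generic curve, shift each player's peak age randomly, add noise, and look at the age-to-age correlations. All parameters below are invented:

```python
import random

def curve(age, peak):
    # One generic inverted-U aging curve, shifted per player.
    return -0.05 * (age - peak) ** 2

def corr(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

random.seed(1)
peaks = [random.gauss(27, 1.5) for _ in range(2000)]  # per-player peak ages

for age in range(22, 33):
    this_yr = [curve(age, p) + random.gauss(0, 0.3) for p in peaks]
    next_yr = [curve(age + 1, p) + random.gauss(0, 0.3) for p in peaks]
    print(age, round(corr(this_yr, next_yr), 2))
```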
Interesting that the mostly positive numbers indicate that the speed on the 0-0 count is significantly less than at the "average" count. I am not sure that I would have used the 0-0 count as the baseline rather than the average pitch speed across all counts.
Miller is one of the, if not the, most extreme pitchers' umpires in baseball, which basically means that he calls all sorts of pitches out of the zone, or at least those that most umpires think are out of the zone, strikes.
Nitpick: FIP went UP also, albeit by a smaller amount than ERA.
Jay, correct me if I am wrong, but I assume that the pool of pitchers in each group are not the same. In other words, some of the pitchers who got 1-5 or 1-7 saves did not go on to have any more opportunities and hence they were not in any other groups. Obviously anyone in a larger group was also in a smaller group.
If my assumption is correct, then your results are simply due to selective sampling, no? Some of the pitchers in group 1 pitched badly (blew a good proportion of their save opportunities) and were not allowed to have a save opportunity anymore, right? So, of course, the lower groups will have a lower save percentage. This tells us nothing about their talent with respect to being able to close out a game. Nothing whatsoever.
So, if you are trying to find some evidence that experience matters, you are not going to, using this methodology. I'd have to say that this statement you made:
"The way many managers, mainstream media types, and even fans discuss the role, it would seem to, and to a limited extent, the data backs that up."
is false. The last clause, at least.
In fact, if you make sure that you use the same pitchers in each group - in other words, if you worked backwards by only using the pitchers in the last group - you will likely see the opposite result: that these pitchers, as a group, who went on to get at least 22 or 26 save opportunities, were lucky in the beginning. You should find their save percentage in the first 5 or 10 games (group 1) to be higher than later on (the latter groups).
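The selective-sampling effect is easy to demonstrate with a simulation in which every "closer" has identical talent, but anyone who blows too many early saves loses the job. A sketch with invented parameters:

```python
import random

random.seed(0)
TRUE_RATE = 0.85   # every pitcher converts saves at the same true rate
careers = []
for _ in range(5000):
    results = []
    for _ in range(30):
        results.append(random.random() < TRUE_RATE)
        if results.count(False) >= 3:   # arbitrary "loses the job" rule
            break
    careers.append(results)

def cumulative_save_pct(max_opps):
    saves = attempts = 0
    for r in careers:
        chunk = r[:max_opps]
        saves += sum(chunk)
        attempts += len(chunk)
    return saves / attempts

# Identical talent, yet save percentage "improves with experience":
for n in (5, 10, 20, 30):
    print(n, round(cumulative_save_pct(n), 3))
```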
Isn't FRAA relative to that position, so shouldn't each position have a cumulative FRAA of zero, by definition?
If not, how do you measure defense NOT relative to a particular position? IOW, how would I know whether LF'ers were better or worse defensively than RF'ers without looking at players who played both positions?
Say we removed all LF'ers in MLB and replaced them with high schoolers who had plastic surgery to look just like the MLB'ers they replaced. How would we know that they were worse fielders than the RF'ers? Explain to me, please, how FRAA would know that they were worse?
Molto grazie Max! Fantastic, stellar stuff! Keep up the great work!
I don't see "Fair Ra" in the glossary anywhere. That appears to be an old glossary with terms that are not even used in your new spreadsheets, like all the "Eq" stuff...
Great job! It is nice to see BP so responsive to the needs/wants of their readership!
Max, are your spreadsheets (with all catchers and their numbers) for each category available? Thanks!
"...BB includes IBB but not HBP."
It does not appear as if it does. For most players, you can't tell from looking at the numbers, since most players have few IBB. But look at Fielder and Pujols. Fielder is projected with 92 BB+IBB in 687 PA. He typically gets IBB'd 25 times per season. That leaves 67 NIBB. That can't be right. Same thing with Pujols.
Also, you did not answer my question about singles and AB.
Downloaded the spreadsheets. For batters, there is no hits or singles column that I can find (I assume B2 and B3 are doubles and triples - is it too much trouble to use the letters D and T or 2B and 3B?). Am I missing it? I can almost infer them from the BA and PA but there is also no AB column and it is not clear if AB is PA-BB (there are no SH and SF and ROE, etc.).
Also does BB include IBB? What about HBP?
And finally, are all the numbers assuming player plays half his games in his home park? For minor league players, do the numbers assume he plays half his games in his minor league park as well?
Mike, great work, but I still think that the hit and run is virtually never correct and I think I found a fatal flaw in your analysis:
You see, Mike, I don’t think you can combine the two types of base runners. If you do, of course you are going to come up with a positive result (for the hit and run) as long as there are enough good base stealers in your sample (even if those were indeed hit and runs with those runners at first and not straight steals, although clearly some of your hit and runs are actually straight steals, especially with good base stealers on first).
Rather than compare the hit and run to no hit and run, you should be comparing the hit and run to:
1) a steal by those base runners and no hit and run.
2) no hit and run with a non-base stealer at first.
The ONLY thing we care about is what happens with a non-base stealer at first!
We already know the answer with a good base stealer at first. The answer is that a hit and run is better than no hit and run (with the base runner not going), BUT a steal with no hit and run is likely better than that!
So if the answer is that a hit and run with a non-base stealer at first is wrong and a straight steal (not all the time) with a good base stealer at first is better than a hit and run, then you would still get the results you are getting, but a hit and run would still be never correct!
Does everyone understand that?
"Would I want to pinch-hit for Hiroki Kuroda in the sixth inning down by a run so that I could get more work or innings for Matt Guerrier and Mike MacDougal? When the guys I'm batting are Trent Oeltjen, Dioner Navarro, Jamey Carroll, Justin Sellers, Eugenio Velez and others who primarily made up LA's bench last year?"
That is a good question. So let's orchestrate an answer. You seem to have made up your mind already - based on what? Your gut instinct? Your gut has nothing to do with it.
You already have the batting line for an above average pitcher. Kuroda is a below average one, so that would have to be adjusted. You would then have to create a batting line for one of those fine batters you list as the Dodger PH'ers. Then you have to do some work to see the difference depending on the bases/outs state that you are contemplating. Then you would take that difference and multiply it by the LI, which is based on the score, inning, bases, and outs. Now you would need to compare that to the number of runs you are gaining or losing by replacing Kuroda with Guerrier or MacDougal for the average number of innings that Kuroda will pitch after he bats (I suspect around 1.25).
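Here is what that back-of-the-envelope calculation might look like in code; every input below is a made-up placeholder, not a number from the article:

```python
# All hypothetical, in runs:
ph_gain_per_pa = 0.12      # PH minus pitcher-batting, for this base/out state
li = 1.8                   # leverage index of the plate appearance
innings_left = 1.25        # innings the starter would pitch after batting
reliever_gap_per_9 = 0.20  # starter minus reliever, runs allowed per 9 IP
                           # (negative if the reliever actually projects better)

batting_gain = ph_gain_per_pa * li
pitching_cost = reliever_gap_per_9 / 9 * innings_left
print(round(batting_gain - pitching_cost, 3))  # > 0 favors pinch hitting
```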
Have you done all that? If not, how would you know the answer to your own question?
One of the (many) ways that you have come to an incorrect conclusion regarding your own answer is by assuming that Kuroda is better than those two relief pitchers in true talent. When a manager has to make a decision like that, he obviously doesn't know how these pitchers are going to pitch for the remainder of the season, so he has to actually do a projection in his mind (or on paper I suppose).
Let's quickly look at a projection for Kuroda and Guerrier:
Kuroda 3.57 ERA
Kuroda 3.79 nERC
Guerrier 3.57 nERC
Hmmm. It seems like Guerrier is the better pitcher right off the bat. So, we would surely like to replace Kuroda with Guerrier when Kuroda is facing the lineup for the 3rd time, whether he comes to bat or not. So we don't really need to go through all of those calculations above...
Similar to what Tango says above, we also present evidence in the book that relievers in general can pitch more innings without giving up any performance.
Again, managers can leverage this strategy by using it less when their bullpen has been taxed lately and using it more when they need work or are not overly tired. Managers probably use this anyway as one of their criteria for letting the pitcher hit or not.
I suspect that the 3 principal reasons that managers would be resistant to this strategy, from most to least important (to them), are:
1) They think that their .221 thus far pitcher will continue to pitch at near that level.
2) They underestimate the value of the pinch hitter, especially in certain situations, like a 1-out bunt situation (BTW, an above-average hitting pitcher bunting with 1 out is a terrible strategy - the same holds in many 0-out situations).
3) They are fearful of over-taxing their bullpen.
Bullpen management is a legitimate concern as I state 2 times in the article.
Certainly the manager can add to the requisites for using this or a similar strategy (a quicker hook for the starter when he comes to bat in a high leverage situation) that he has a reliever available who is better than the expected performance of the starter.
BTW, there are many examples where the starter who was allowed to hit was an awful starter (since the average starter in that bucket was .258 (TAv against), there are plenty of much worse ones), such that even a replacement reliever would be better. Surely in those cases, Bill's objection doesn't hold much water, again assuming that the bullpen in general is not overly taxed from recent overuse.
You typically don't call that a hit and run as many people have already pointed out, since, by definition, a hit and run means the runner goes and the batter must swing at any pitch other than one in the dirt.
It is simply the runner going in order to stay out of the double play.
The batter does absolutely nothing different. One small negative consequence of a runner going on a 3-2 count who is going to get thrown out more often than the BE point for a steal (in this case, being down 2 runs in the 9th, the BE point is 90%+) is that the batter is forced to swing at more borderline pitches. A K is now worth around 1.5 outs (more or less, depending on how often the runner is safe on a K), so the difference between a K and a BB is larger than usual, and thus the BE point for how borderline a pitch has to be before you swing/take is different.
In a typical situation, you send the runner on 3-2 in order to potentially accomplish 2 things. One, stay out of the GIDP. Two, advance the runner an extra base more frequently on a hit (and occasionally two extra bases on a single).
As well, in a typical situation, the runner being safe some percentage of the time on a K (the ML average is around 52% on a 3-2 count), is also a plus of course (obviously the runner getting thrown out more than cancels that out).
In this case, advancing the extra base or the runner being safe on a K adds very little win expectancy, since his run means nothing.
The advantage of him running, therefore, which is obviously known by LaRussa (as much as I deride his intelligence), is staying out of the GIDP ONLY.
Again, the downside is that you increase the DP on a K of course, you add a few line drive DP, you perhaps distract the batter, and you force the batter to (correctly) swing at some more borderline pitches.
So which is better?
Unfortunately (for all those who posted here, on The Book blog, on FG, BBTF, and everyone else around the country with an "opinion"), you cannot figure it out without "the numbers!"
No amount of explanation, common sense, logic, or baseball experience or acumen will enable you to figure out which is correct - run or not run.
On the other hand, it is simple to do the math and figure it out. Are there some variables that we don't know or we cannot quantify exactly? Yes, as always there are. Does that preclude us from coming up with an answer? As usual, no it does not. Why? Because we can always set some upper and lower limits on the variables we are not sure of, such as the distraction to the batter.
Now here is the important part:
If it looks like the answer is "Run" at the upper and lower boundaries, then the answer is clearly "run." If the answer is "don't run" at both boundaries, then the answer is clearly "don't run." If the answer is "run" at one boundary and "don't run" at the other, then you can flip a coin or argue until you have to go to the bathroom.
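To sketch that bounding procedure for the run/don't-run question (every parameter below is an invented placeholder; the structure, not the numbers, is the point):

```python
def we_gain_if_run(p_safe_on_k, distraction_cost,
                   p_gidp=0.10, gidp_we_cost=0.035,
                   p_k=0.20, k_dp_we_cost=0.030):
    """Toy win-expectancy model: running avoids the GIDP but adds
    strikeout-throw-out double plays and some batter distraction.
    All inputs are made-up placeholders."""
    gidp_avoided = p_gidp * gidp_we_cost
    k_dp_added = p_k * (1 - p_safe_on_k) * k_dp_we_cost
    return gidp_avoided - k_dp_added - distraction_cost

# Evaluate at pessimistic and optimistic bounds for the unknowns:
for p_safe, distraction in [(0.40, 0.002), (0.60, 0.000)]:
    gain = we_gain_if_run(p_safe, distraction)
    print(round(gain, 4), "run" if gain > 0 else "don't run")
```

With these made-up inputs the two boundaries disagree, which, per the rule above, means: flip a coin, or get better inputs.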
From what I have seen of the numbers, the answer is probably "run." I have not done the calculations myself, and I have not seen rigorous ones.
Again, if you want to argue, please argue with numbers and not with rhetoric, opinion, emotion, voodoo, or snake oil. This particular decision, like most, cries out for "numbers." And as it turns out, it is relatively straightforward and easy to do. Again, no amount of logic or baseball experience is going to get you the right answer (other than accidentally).
Colin, great stuff. Minor point: The numbers are apples and oranges, of course, because the pool of pitchers faced, and other "environmental" factors, are going to be different (e.g., when the pitcher bats, he is likely facing the opponent pitcher early in the game, whereas when a pinch hitter bats, he is likely facing a reliever late in the game), but your point is well taken and clearly explained.
The two points are, one, you give up a substantial RE/WE letting your pitcher bat late in (or even in the middle of) a game in a high leverage situation (the LI was 2.51 in this game), and two, you likely don't even save/gain anything by allowing your starter to remain in the game, especially when you have lots of good reliever options (as in the post-season) and you don't have to worry about taxing your bullpen.
My rule has always been this: Always pinch hit for your starter after the 5th inning in a high leverage situation (the high leverage takes care of the fact that the game is likely close and there are likely runners on base) unless your starter is elite (like a Halladay or Lee), and even then, still pinch hit if the inning is past the 6th or 7th, unless a sac bunt is a viable option, such as when you are down a run, up a run, or the game is tied, and there is a runner on first or second with 0 out.
We would really like to know these various managers’ SB%, preferably in the different score/bases/outs states. Knowing their SBA rates, while it tells us how often they like to run, tells us nothing about the value that that particular philosophy provides to their team. You would also have to wonder how each manager’s preference for the hit and run affects the SBA rate…
"In the end, it seems that once a player hits the majors, the book is out on him. Certainly there could be cases where batters suddenly stop getting pitches grooved down the middle, but in general it's not happening often enough to show up in the data. As a whole, each player tends to see a similar approach throughout the entire season, rather than a sudden—or even gradual—in season adjustment to the swinging tendencies."
Let's be careful about these generalized conclusions! You are looking at only a coarse and specific "approach" by pitchers - namely pitch location distribution, and even then only as measured from the center of the plate or in and out of the zone. There are all kinds of other "approaches" that may change against rookies as the season goes on. The one you typically hear about is the percentage of fastballs or off-speed pitches thrown. Even then, one would have to be careful about lumping everyone together: if one group of players were thrown more off-speed pitches as the season went on and another group more fastballs, overall the percentages would not change, as one would offset the other. Same thing actually with your study. What if, as the season went on, some rookies were thrown more strikes and others fewer, depending on their perceived strengths and weaknesses?
Good stuff Rob. I love that you included a control group because the first thought that came into my mind was what if Pecota under-projected all such similar players.
In any case, I wouldn't discount such small effects. You seem to trivialize them in your closing remarks. It is so hard to find ANY effect when it comes to these things that when you find a small effect like this, you should celebrate. Of course, the difference between the control and subject group might not even be statistically significant (I have no idea of the standard error), but even if it isn't, that doesn't mean, of course, that it is not real. It just means that we are less certain that it is.
Of course, there is reason to think that it is real. If a player declares that he is in the BSOHL, that generally means that he is not injured. So all other players (including those in your control group) probably have some injuries, and it is not surprising that their performance is less than their projection. And it is also not surprising that an injury-free subset of players (the BSOHL group) slightly outperforms their projections, since any projection engine is designed for all players on average, including those with current and past injuries.
As well, for a player to declare that he is in the BSOHL suggests that he may have had a past injury. Unless Pecota accounts for that (it likely does not), you would also expect any player with past injuries but currently healthy to outperform his projection...
I would separate day and night games for a start, although, as I said, fatigue in day games might be a confounding factor.
Otherwise, just run some kind of regression on pitch speed and temp controlling for the other factors of course.
As I personally hate regressions, I would simply split each pitcher's games into 2 groups for day games and 2 groups for night games - below a certain temperature and above a certain temperature. You would have to do it for home games only, otherwise the stadiums would be a confounding factor (the warm games would tend to be in different stadiums than the cold games).
If you did that, you would have to control for time of the year as well, otherwise your warm games would mostly be in the middle of the season and the cold games at the beginning and end, which could be a confounding factor as well.
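A sketch of that grouping, assuming a list of pitch records with hypothetical fields (the two sample rows are fabricated purely to make it run):

```python
import statistics
from collections import defaultdict

# (pitcher_id, home_game, day_game, temp_f, month, velo) -- hypothetical schema
pitches = [
    ("p1", True, True, 85, 6, 93.1),   # fabricated sample rows
    ("p1", True, True, 55, 5, 92.4),
]

groups = defaultdict(list)
for pid, home, day, temp, month, velo in pitches:
    if not home:
        continue                      # home games only: park isn't a confounder
    if month not in (5, 6, 7, 8):
        continue                      # mid-season only: date isn't a confounder
    groups[(pid, day, temp >= 70)].append(velo)   # warm/cold split at 70F

for key, velos in sorted(groups.items()):
    print(key, round(statistics.mean(velos), 2))
```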
Shouldn't be too hard to find a way to see the effect of temp on overall velocity and the trends during a game.
Didn't someone have an article a while ago on pitch speed and temp? Also, someone like Alan Nathan could tell you the physical effect of changes in temp on pitch speed due to the differing air density, but that would not include any effects on the pitcher (like being looser in warmer weather or less fatigued in colder weather, etc.).
If you are asking me (MGL), I don't know. That is what I am asking!
Jeremy, does the chart which shows a general decline after a short uptick in the first 20 pitches control for the pitcher? If not, then the chart is suspect since there is likely a different pool of pitchers at each pitch count.
Also, what effect do you think temperature has on this? It would be nice to separate out day and night games to see whether you can tease out the temperature effect, although fatigue in day games might come into play also.
Great stuff as always Alan!
I agree with Tompshock and disagree with I75Titans. Allowing a team to store their balls in a climate controlled situation is just asking for them to cheat. It is too easy to do. If you allow all teams to do so, there will be a fair percentage who will cheat from time to time, whether they get caught or not. It shocks me (sort of) that MLB did not oversee the use of the Rockies' humidor from the outset. The other teams should have insisted on it.
Alan, supposedly in 2005, they started storing the balls for a longer period of time and perhaps removing them closer to game time. You will see two reductions in scoring/HR, I think. One, in 2002, and another in 2005. So I think the data should be bifurcated.
I was waiting for you to compare the 5th year performance to what would be expected from a Marcel or other projection. But, alas, it was not there. In similar studies by Tango and myself, we found that the performance after the drop off year was about that expected, a weighted average of years 1-4 with an age adjustment. If that is true, that pretty much ends the discussion, doesn't it? And also makes any of these types of discussions and speculations pretty mundane. No matter what players do in terms of breakout years or extremely down years, or anything in between, our best estimate of their performance in the future (post-season, next year, last week of the reg season, etc.) is simply a Marcel-like projection? If that is the case, which I think it is, we would have nothing to write about, huh?
You have a fatal flaw in your methodology:
"This would likely remove the selection bias, since a couple weeks right before the break are unlikely to affect whether an individual is asked to participate in the Derby (especially because the participants are selected about a week before the All-Star Game)."
While any period of time AFTER the selections are made should be unbiased, ANY period of time BEFORE the selections, whether it be one week or 3 months, will be a biased sample of players who have gotten lucky.
Well, I just read Chuck Brownson's article at BTB. Here is what he does:
"I then subtracted his actual stolen base attempts from the expected number and multiplied it times the run value of the SB (0.175) times the likelihood the runner would be caught stealing (CS%)."
That is completely wrong of course. For some strange reason he is assuming that all catchers "should" have the same SB attempts (per inning I guess) with their own CS% and then crediting or debiting them the difference between what they "should" have and what they do have. That is ridiculous of course. For example, if a catcher is 1/2 in 900 innings (around 100 games), he is going to assume that they "should have" been 35/70 (or so) rather than 1/2 and he is going to give them 12.25 "rep runs" or so, which makes no sense. None whatsoever. Where did he save 12 runs? Similarly if a catcher is 3/3 (0 CS%), he is going to assume 70/70 and dock him 10.5 runs or so. Again, makes no sense.
Eric, you should not have even mentioned "rep runs" in your article. You devote an entire paragraph to it, clearly implying that it is part of a catcher's value. You could have mentioned that some catchers are so good that no one runs against them and therefore, paradoxically, they derive little or no value from their good arms, but you didn't. In fact, you implied that these catchers have more value than is being captured by the traditional SB/CS numbers, which is not true. If you didn't mean to imply that, why would you have even mentioned it, let alone devote an entire paragraph to it, based on an obscure and incorrect methodology by someone whom I have never even heard of?
Right, you just referenced the article. I was just wondering what the methodology or rationale was. You seem to be familiar with it.
You say, "it matters little." It matters none, at least as far as quantifying the catchers' value. In your example, the 30/100 catcher probably allows zero net runs or so (assuming an overall 70% BE rate) and the 15/30 allows maybe +3 runs. The catcher who is 0/0 has zero net runs of course. And, as I said above, those numbers have to be further adjusted by the net runs allowed by the average catcher if we want to compare catchers to the average catcher although there is no great reason why we have to do that (sum the league to zero).
Colin, very nice job pointing out the process of cherry picking (it is a controversial and complex issue).
Two interesting things about "flukes" are:
When one observes a fluke, often part of the fluke is that the player looks awful. In other words, players (pitchers and batters) go through fluctuations in which they look and perform awfully. We call them "flukes" or "luck" or randomness, even though technically they may not be; just as the landing of a coin on heads or tails is not technically a random event (it is a function of how it is thrown) yet is properly treated as such. So, to declare that something is not a "fluke" (in statistical terms) because you saw the player and "things just look different" is not good reasoning or logic. Things may in fact BE different about that player's approach or technique or even health (or his psychology, such as WRT the beaning) for some period of time, but if there is little we can do to predict when it will start or end (and I am not saying that we can or cannot), then for all practical purposes, it may be treated as random, luck, or a fluke.
The other thing is that declaring something as either a fluke or not is a false choice, and a large one at that. There are an infinite number of combinations of fluke and non-fluke that can describe or explain a spate of performance.
"Sorry, but I don't think that is necessarily the case."
It IS the case, and I am glad that Tango chimed in since I thought that I was the only mad one in a sane world.
drawbb, outfield arms get valued by the number of advances and the number of assists (the OF throwing out a runner on the bases), as compared to the average OF at that position. If no one runs on you, you get credit for no advances. If players run on you, you get credit when they don't advance a base, when you throw out a runner (more credit than a non-advance, obviously), and demerits when runners advance the extra base.
So everything is accounted for, without any special adjustment or calculation for "deterrence" (runners not running on you).
It is exactly the same with catchers. You get credit for throwing a runner out (and pick-offs), and you get demerits for allowing a stolen base (and throwing errors on a steal). And everything gets compared to the average catcher. There is no need to adjust for "deterrence." If no one runs on a catcher, nothing happens, just as if stolen base attempts were not allowed, like in Little League.
The way the "adjustment" occurs, if you even want to call it that, is by comparing everyone to the average catcher in the final step in the computations. The catcher that no one runs on gets exactly zero net runs, but if the average catcher has -2 net runs per season (IOW, all base runners combined generate net positive runs), then the catcher against whom no one runs, gets +2 runs in credit. Interestingly, if base runners generate net negative runs (ran too much), which they probably did for many years up until the last few years, those catchers with great arms and zero net runs (before the league adjustment) would have to be credited with net NEGATIVE runs, a little bit if a logical anomaly.
The answer to that, by those good catchers that no one runs against, is that if runners want to generate net negative runs by running too often (and/or possibly at the wrong times), then these catchers need to actually "bluff" the runners a little by not showing such a great arm and encouraging them to run a little. If that is the optimal strategy for them, and they do not do that, then they indeed deserve to be charged with net negative runs even if they have great arms and no one runs against them. That is because in baseball, as in most sports, it is not only athletic talent (like a strong arm) which creates value, but good strategy as well.
So, I would like to hear from Eric (or others) and have them explain to Tango and me what this "deterrence adjustment" (in quantifying catcher value) is all about, as it makes no sense to me.
I still don't understand why you need a "deterrence" adjustment factor, or whatever you want to call it. If a catcher allows no SB attempts because he can throw the ball 1000 miles an hour, then you simply compare him to the average catcher. If, in fact, runners run too much such that the average catcher saves his team some RE or WE, then the catcher against whom no one runs is a liability, right? Regardless, I don't see how there is any need to make any adjustments for catchers who deter the running game. We know the value of a SB and we know the value of the CS. So you simply take each catcher's total SB and CS, multiply each by their respective values and that is the value of the catcher's arm. What difference does it make how many SB attempts the catcher allows? That will be included in the calculations. If an average catcher costs his team 1 run per 150 games, then the catcher against whom no one runs is worth 1 run per 150 games, relative to the average catcher. Again, no need to do any separate calculations or adjustments based on whether a catcher "deters" runners or not, as long as you normalize every catcher to the average catcher, which everyone is going to do of course.
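In code, the whole computation described above is just a few lines; the run values are placeholders in the ballpark of commonly cited linear weights, not authoritative figures:

```python
SB_RUN_VALUE = -0.175   # cost of a stolen base to the defense (placeholder)
CS_RUN_VALUE = 0.45     # value of a caught stealing to the defense (placeholder)

def arm_runs(sb_allowed, cs):
    # Net runs saved by the catcher's arm, before any league baseline.
    return sb_allowed * SB_RUN_VALUE + cs * CS_RUN_VALUE

def arm_runs_vs_avg(sb_allowed, cs, games, avg_arm_runs_per_150=-1.0):
    # Normalize to the average catcher over the same playing time.
    return arm_runs(sb_allowed, cs) - avg_arm_runs_per_150 * games / 150

# The catcher no one runs on: zero raw runs, but +1 per 150 games relative
# to an average catcher whose arm nets -1 run per 150:
print(arm_runs(0, 0), round(arm_runs_vs_avg(0, 0, 150), 2))
```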
In addition, the likelihood of lots of typos, spelling, grammatical, and syntax errors in the above post, is very high. I am sleep deprived...
That is a good analysis, Eric. I think all of your assumptions are reasonable - or at least as reasonable as they can be, given that we really don't know any more than what we know.
You are assuming that if he stopped switching, his platoon ratio would be a little larger than the average LHB's. You are correct in that you HAVE to assume that any switch hitter would have a larger than average platoon ratio. If they had a small one, they would be less likely to switch hit. If you decrease the chances of a small one, the best estimate of his ratio has to be larger than average. Plus, certainly if he stopped switching, he would be "rusty" when hitting lefties from the left side. And 1.21 is not that far off from 1.15. Perfectly reasonable, if not TOO conservative.
The other assumption you made, which is not really an assumption but a mathematical requisite, is that he really has a true platoon ratio of 1.19 as a switcher, rather than his observed rate of 1.29. You HAVE to make that assumption of course. We have to regress that sample 1.29 the appropriate amount, as we do with any sample stat. After all, your goal is to determine whether Berkman would likely perform better IN THE FUTURE as a lefty only or as a switcher. To do that, you have to compute a projection for his switch hitting platoon split, which necessarily requires the regression that you did. So, the 1.19 is completely the correct thing to "do" (as much as you can "do" a number!).
So your conclusion that, "in the future," his most likely OPS versus left-handed pitching is .851 as a switcher and .839 if he hit as a lefty all the time, is 100% correct. Of course there is great uncertainty around those numbers and you can probably compute the standard error of the difference if you really wanted to (which you probably don't!).
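The regression step looks roughly like this in code; the population mean and the regression constant here are hypothetical stand-ins for the values The Book actually derives:

```python
def regressed_platoon_ratio(observed, pa_vs_same_side,
                            pop_mean=1.14, ballast=1000):
    """Shrink an observed platoon ratio toward the population mean for
    switch hitters. `pop_mean` and `ballast` (in PA) are hypothetical;
    the real values come from the spread of true platoon talent."""
    w = pa_vs_same_side / (pa_vs_same_side + ballast)
    return w * observed + (1 - w) * pop_mean

# An observed 1.29 over a modest sample lands near the 1.19 used above:
print(round(regressed_platoon_ratio(1.29, 500), 2))  # 1.19
```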
Plus, there is enough room between those numbers such that even if some of your assumptions were "wrong" (the one which is really "pie in the sky" is the 1.21, but as I said, that is conservative anyway), it would still be likely that he is better off switching.
Then again, given that he IS switching despite having a large observed platoon ratio, that is VERY strong evidence in and of itself that he is better off switching. Remember that most of these analyses are really Bayesian, and in this case the prior strongly favors the likelihood that he is making the correct decision in switch hitting...
Eric, there are a lot of nuances here that are tricky to work with.
For example, why wouldn't you just compare a hitter's poor opposite-handed TAv (or whatever stat you want) to his expected performance given how he hits from the other side, assuming a somewhat worse than league average platoon split (the assumption is that one of the reasons a hitter switches is that his splits would be large if he didn't, PLUS he is not used to hitting from the same side as the pitcher)?
So, for example, take Berkman's numbers against righties, then estimate what he would do if he hit lefty against lefties, and then compare that to how he does now (batting righty versus lefties).
Of course, there are several problems with that. These are selectively sampled players (for having large splits as a switcher). Therefore we can assume that they got lucky from one side or the other (their best side), so using the numbers from that side to estimate what they would do from the same side against same-hand pitchers is going to yield too optimistic of a result.
If we use both sides to inform us as to a hitter like Berkman's overall talent, by taking his switch hitter split and regressing it toward the average switch hitter's split, then we can estimate what he would do versus lefties and righties if he stopped switch hitting and then compare that to their bad side now. I think that is what we did in The Book.
If you did it that way, I think you would find that Berkman, for example, was right on the cusp as to whether he should switch hit or not. For most marginal examples (like Berkman) the lynchpin is probably how big their platoon split would be if they didn't switch - only they and their coaches know that...
"If a pitcher, however, throws too many balls in the zone, given his stuff, it will turn out to be bad advice."
Oy, that should be "too many STRIKES..."
"The batter and pitcher have "choice" as to the equilibrium point."
That should be "no" choice...
Brian, the equilibrium point does NOT change based on the pitcher's proclivity to throw a certain percentage of first pitch strikes and the batter's tendencies to swing at first pitch strikes and balls. The batter and pitcher have no choice as to the equilibrium point. That is determined mathematically, and is based on the pitcher's overall ability in terms of his stuff and his control and the batter's overall ability to hit, take, and swing at all the different pitches in all the different locations.
The trick is for the batter and pitcher to figure it all out. And of course, if the batter is not acting optimally, then the pitcher's optimal strategy is not equal to the Nash equilibrium point, and the same thing applies to the batter if the pitcher is not acting optimally.
Russell, this is great stuff. The point about the pitching coaches, which is lost among these comments, is a great one. That is, the axiom, "Get the first pitch over," or, "The most important pitch is strike one," like all one-size-fits-all edicts, is dumb and counterproductive in guiding pitchers toward an optimal strategy.
Now, if a pitcher happens to be throwing too many pitches out of the zone, given his abilities, then obviously this advice will turn out to be good in general. If a pitcher, however, throws too many strikes, given his stuff, it will turn out to be bad advice.
But the worst part is that it ignores the different optimal strategies that different pitchers should have overall, and even worse, it ignores the fact that each pitcher should have vastly different first pitch strategies against different hitters.
On the first point, that different pitchers should have different first pitch strategies: for example, a pitcher with good in-the-zone stuff should be much more likely to throw the first pitch (and pitches at any count, actually) in the zone.
On the second point, to a hitter like Vlad, all pitchers should be much more likely to throw a first pitch out of the zone.
Anyway, this is a great example of where "scouting" (in this case tutelage from a pitching coach) PLUS analysis (game theory) equals the best strategy for teaching pitchers how to pitch optimally. BTW, the analysis applies much more to pitchers than to batters. For batters, the big "hitch" in using game theory to suggest optimal approaches is the batter's comfort zone. If you suggest to a batter that the correct strategy for Vlad is to take more first pitches, even though that might be correct on "paper" (given all of his various abilities), he simply might not be comfortable with any other strategy but the one he is using, and his performance might actually suffer even though on paper it should get better. Not so much with pitchers, and that is primarily because they "get to go first" and for batters their strategy is always ultimately reactive rather than pro-active, which it is for pitchers.
Tim, great stuff! We talk about run environments affecting strategy decisions, based on RE and WE matrices all the time, but this is a great attempt at actually quantifying it as well as assigning an actual run environment to a particular class of pitchers and teams.
One thing I think you messed up:
"70 percent successful attempt (batter is out, runners advance one base), 23 percent unsuccessful attempt with either the lead runner thrown out or a strikeout, 3.5 percent throwing error by the defense that allows runners to advance more than one base, 2 percent double play, 1.5 percent fielder’s choice that results in all runners being safe."
Where are the hits??
According to The Book, all sac bunt attempts result in singles 12% of the time.
The actual breakdown (again, overall - it much depends on the batter and the inning, where the inning is a proxy for how much the defense is expecting the bunt) is this:
Batter out, runner advances: 48.4%, not 70%
FC, both runners safe: .6%
An out, no runner advance: 26.2%
A hit: 13.4%
And a few other various and sundry outcomes.
Those numbers include when the batter gets 2 strikes and ends up swinging away. If we just look at actual bunt attempts all the way through the PA, the numbers change slightly.
I would guess that Russell's data and analysis would show slightly more predictive value for cold streaks, for the reason mentioned (injuries and the like). But the noise is so great (small samples of 10 and 25 PA) that the signal (a few players who are playing injured) to noise ratio is going to be very, very small.
In fact, the very small predictive value that he did find for hot and cold streaks could easily be explained by parks and weather, as well as injuries...
"You're absolutely correct if the overall goal is to literally specifically investigate just Garland and Pineiro, but this is more of using them as archetypes to discuss what teams (and what situations for teams) should be looking into guys with certain skill-sets. Garland is considered by many to be Mr Consistency while Pineiro is the crazy volatile Tracy Jordan character from 30 Rock."
Guys with certain skill sets? How about if we determine FIRST whether the distribution of expected performance IS a skill before we talk about which kinds of teams would benefit from which kinds of distributions.
Now, I am not saying that a pitcher who has been consistent in the past (e.g. Garland) is NOT more likely to be consistent in the future (i.e., that the distribution of his expected future performance is likely to be narrower), and vice versa, but you sure implied that we know. And contrary to your quote above, you sure implied that Pineiro is likely to have a wider distribution of expected performance than Garland for 2010. Again, do you know that that is true?
And, as I said in my previous comment, you looked at the aggregate performance in Year Next of pitchers who were inconsistent in prior years (actually, those who had an "uptick" in Year Next minus one). Don't you think you should have looked at the distribution of performance among all those pitchers and then compared that distribution to that of pitchers, like Garland, who were relatively consistent for several years?
If you found no significant differences between the two groups, in terms of that distribution, your article would be moot, wouldn't it? Again, I don't know what you would find (and I actually expect there to be differences), but until I know one way or another, it's kind of a leap to be talking about how pitchers like Garland or Pineiro might benefit some teams more than others even though their weighted mean expected value in 2010 might be equal. Isn't it?
You made an assumption which is not necessarily correct (as far as I can tell from reading the article) and then when you had the opportunity to investigate it within the historical data, you didn't.
The assumption you made is that a pitcher who has been consistent in the past (e.g. Garland) will have a distribution of expected performance in the subsequent year that is much narrower than a pitcher who has been inconsistent in the past (e.g. Pineiro). IOW, what evidence do you have that "we know what to expect from Garland" but that Pineiro's 2010 performance "might be all over the place" even if we expect them both to have around the same weighted mean performance?
To couch the question one more way, how do you know or even suspect that this is true:
"Garland’s expected value might break down as 1.8 wins coming 20 percent of the time, 60 percent delivering 2.1 wins, and 20 percent 2.4 wins. In contrast, Pineiro checks in with a spread of 0.5 wins 20 percent of the time, 20 percent yielding 1.3 wins, 20 percent of the scenarios producing 2.1 wins, another 20 percent with 2.9 wins, and finally 20 percent generating 3.7 wins."
When you looked at the history of pitchers with "inconsistent" performance like Pineiro, you should have looked at the distribution of their performance in the year following the uptick and compared that to pitchers who had similar career profiles as Garland. We need to know if there is any difference in terms of the spread of that distribution, and if yes, to what degree. If the answer is "little or no difference" then your whole thesis falls apart, right?
"If I only had five observations per year, then I'd probably get a lot of random variation and so not a lot of consistency within managers over the years."
Do you mean managers with 5 SB opportunities or 5 managers per year? I am talking about the former, of course, when I am talking about sample size. The number of observations will NOT affect the correlations, only the standard error.
You always say, "Think of an ICC as like a y-t-y correlation." But, as I originally said, the magnitude of a y-t-y correlation specifically depends on the number of "opportunities" in each year, and without knowing that number, it means nothing. If I regress OBP on OBP from one year to the next, and I only include players with 100 or fewer PA each year, I might get a correlation of .25. If I only include players with more than 400 PA, I might get .60. So just saying, "My y-t-y 'r' for OBP was .5" means nothing unless I know the number of PA per year in my sample. (It is also nice to know the number of players or "observations," as that will help me figure the standard error around the correlation.)
So if I have a bunch of players in a bunch of years, and you tell me the ICC for OBP, again, that means nothing to me unless I know the range or distribution of PA in the sample, right?
Maybe I have it wrong. Maybe the ICC is sort of a combination of "r," as when we do a y-t-y "r," and the underlying sample size. For example, if you do an ICC for OBP with a bunch of players with samples of 400 PA, and then do the same with a bunch of players with samples of only 100 PA, will you come up with the same ICC?
Pizza, we may have discussed this before in another venue, but since "r" is always a function of (the underlying) sample size (not the number of pairs in the regression), in your intra-class correlations, how do we/you know the sample size associated with your "r"? For example, if I were working with the same data you are, and I regressed first half on second half, I might get an "r" of .4; if I regressed one whole year on another year, I might get an "r" of .5 or .6; if I regressed 5 years of manager data on another 5 years, I might get .8, etc. In this instance, you mention that the "r" was .538. Without knowing how many games (or steal opportunities or whatever the "unit" is) that represents, I have no idea whether .538 is "consistent" or not.
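To illustrate the point, here is a minimal simulation sketch (Python); the true-talent spread (SD of about .025 around a .330 mean) is an assumption, but it shows that the same population of players produces very different y-t-y correlations depending only on the PA per player-season:

```python
# A minimal simulation: year-to-year "r" is driven by the number of
# opportunities per player-season, not the number of players.
# The true-talent OBP spread is an assumed value.
import numpy as np

rng = np.random.default_rng(0)
n_players = 2000
talent = rng.normal(0.330, 0.025, n_players)  # true OBP talent

for pa in (100, 400):
    # two independent seasons of `pa` plate appearances per player
    obp1 = rng.binomial(pa, talent) / pa
    obp2 = rng.binomial(pa, talent) / pa
    r = np.corrcoef(obp1, obp2)[0, 1]
    print(f"{pa} PA per season: r = {r:.2f}")
```

With these assumed numbers you get an "r" of roughly .2 at 100 PA and roughly .5 at 400 PA from the exact same players, which is the whole point: the correlation is meaningless without the per-season sample size attached.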
To those people who don't understand the whole issue of survivorship, I want to reiterate the fact that we are not really trying to include, in our resultant trajectories, players who do not actually play.
We are merely trying to balance out the players who do play a Year II, because they will have tended to be slightly lucky in ANY year (Year I) that is followed by a subsequent year. And when we use the delta method, we are only including player seasons in which there is a Year I and a Year II, so all of the players we are including will tend to show a false decline (or a false not-so-large increase) in ANY pair of years.
In order to account or adjust for that, we include ALL players, even the ones who do not get a Year II (for any given Year I), by creating a phantom Year II and using a Marcel-type projection for their Year II performance (which doesn't really exist). That way, we can simulate a random, controlled experiment, whereby all players are forced to play at least one more year at any age. That would be the only way we could really ascertain true aging curves and peaks - by either forcing all players to play until they are 40 years old or so, or by at least forcing all players to play "one more year" whether they were allowed to or not (and then use the delta method because we have players who have played only 2 years, 3 years, 5 years, 10 years, etc., and we want to include all of these players, unlike JC).
Actually forcing all players to play until they were 40 or so (and starting them in the majors when they were 20 or so) would not give us a very good answer either. That would answer the question, "What does the average aging curve look like for all players who had some time in MLB and were allowed to play until they were 40 regardless of how well they aged or how well they played?" That would be sort of the reverse of JC's sample, but equally biased.
Forcing players to play one more year, which is essentially what I am doing when creating those "phantom Year II's," creates a little bit of bias as well, because there are reasons why these players do not get to play in Year II other than the fact that they got unlucky in Year I (although that is definitely part of it for some of these players), but it is a good method to balance out those slightly lucky players who do get a Year II at any age. And actually using the "5 runs worse" method of regressing that JC does not like is actually a good way to counteract that bias.
So using all players AND creating phantom Year II's for non-survivors, and then using the delta method to construct an aging curve, I believe is by far and away the best method of answering the question, "How does the typical MLB player age?" where "typical" means all players combined, from the ones that have a cup of coffee to the ones who play for 5 or 6 years, to the ones - as in JC's sample - who have long and illustrious careers.
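For readers who want the mechanics, here is a sketch (Python) of the delta method with phantom Year II's; the data layout and the marcel() helper are hypothetical stand-ins, and the harmonic-mean PA weight is one common weighting choice, not the only one:

```python
# A sketch of the delta method with "phantom" Year II seasons for
# non-survivors, as described above.
from collections import defaultdict

def aging_deltas(seasons, marcel):
    """seasons: {(player, age): (lwts_rate, pa)} -- hypothetical layout,
    with lwts as a rate (e.g., per 500 PA).
    marcel(player, age): Marcel-type projection used for phantom years.
    Returns {age: weighted mean Year II minus Year I lwts change}."""
    num, den = defaultdict(float), defaultdict(float)
    for (player, age), (lwts1, pa1) in seasons.items():
        nxt = seasons.get((player, age + 1))
        if nxt is not None:
            lwts2, pa2 = nxt
        else:
            # non-survivor: create a phantom Year II from a projection
            # instead of dropping him, to balance out the (slightly
            # lucky) players who did get a real Year II
            lwts2, pa2 = marcel(player, age + 1), pa1
        w = pa1 * pa2 / (pa1 + pa2)  # harmonic-mean PA weight
        num[age] += w * (lwts2 - lwts1)
        den[age] += w
    return {age: num[age] / den[age] for age in sorted(num)}
```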
Phil, yes, absolutely. I just went back to my program and changed one number in the code. That number is the mean towards which I regress the non-survivors in their Marcels. I changed that number from -5 to 0 and also to -10. It does not change the peak age or the trajectories very much at all. It is still 28 in the modern era (I arbitrarily define that as including any player season after 1979 in my data).
As I said, JC's criticism of the "5 runs worse than average" turns out to be a red herring, as whatever I use does not significantly affect the results.
Really, the survivorship problem is not as large as I thought it would be. If I do not include these players (who do not have a Year II), and thus my remaining players are a little lucky in all of their Year I's (that is the problem with not including the non-survivors), there is essentially a plateau from 27 to 28 (a .1 run increase from 27 to 28 actually).
Once I include the non-survivors and use the "5 runs less" for the mean that I regress towards, the 27 to 28 interval shows a .4 run increase (rather than .1 run without the non-survivors).
If I do not use "5 runs less" for the mean (I simply use a standard league average), that 27 to 28 interval is now a .5 increase rather than .4.
If I were to use "10 runs less" for the mean rather than "5 runs less", I get a .3 run increase for that same interval.
So, the peak age and overall trajectory is not very sensitive to the mean I use for the regression in the projections for the non-survivors.
Well, JC wanted a response to his criticism on that topic and I have provided a very adequate one I think.
"The projection is their last three years lwts per 500 PA, weighted by year (3/4/5) added to 500 PA of league average lwts for that age minus five. In other words, I am regressing their last three years lwts (weighted) toward five runs lower than a league average player for that age.
While Lichtman believes using five runs below average generates a "conservative" projection, the substitution is just a guess informed by nothing more than a hunch. In this case, the guess imposes the outcome for the exact factor we are trying to measure: the estimated decline is a pure product of the assumption. Thus, it is no surprise that Lichtman's adjusted delta-method estimates yield results that differ little from his raw delta-method estimates."
JC completely mis-characterizes or does not understand what I was doing.
He seems to imply that I am assuming a 5 run decrease for all of the "one-year" players (those who do not get a Year II).
I am not. I am assuming a Year II performance equal to a basic Marcel projection. While aging is or should be a part of a Marcel, so that it is true that I have to make some aging trajectory assumptions in order to construct the projection, the "5 runs worse than average" is the mean that I am using in the regression that is part of the Marcel (the projection). That is completely different than assuming a Year II which is 5 runs worse than Year I. That would be ridiculous. And that is what JC is implying that I am doing, I think.
Normally a Marcel regresses toward a mean which is the league average performance of similar players (age, size, etc.). The reason I used a mean (to regress towards) that was 5 runs worse than a "generic" mean was that these players who do not see the light of day in Year II tend to be fringe players, and therefore the means of their population are likely worse than the means of the population of all players.
In fact, if anything, I think I used a conservative (too optimistic/too high) mean. I contemplated using a mean which was 10 runs worse than a "generic" mean (mean of all MLB players).
Interestingly, as you can see from the charts in my articles, even using a "low" mean when doing the regressing, all of the players' projections in Year II were BETTER than in Year I until age 30. So these players actually showed a "peak" age of 29 or 30 (it is not a "true" peak because Year I is an "unlucky" year), which pushed my overall peak age slightly forward.
The most important thing is that whether I used a typical MLB mean for the regressing, 5 runs less than that (as I did) or even 10 runs less than that (which, as I said, may have been even more correct), it would not have changed my results. So criticizing that aspect of my work cannot indict the conclusions generated from that work.
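To make the mechanics concrete, here is a minimal sketch (Python) of the projection described in the quote above; treating each 3/4/5 year-weight unit as 500 PA of evidence is my simplifying assumption here:

```python
# A sketch of the phantom Year II projection: last three seasons' lwts
# per 500 PA, weighted 3/4/5, plus 500 PA (one weight unit, an assumed
# equivalence) of a mean set `shift` runs below league average.
def phantom_year_ii(lwts_per_500, league_avg_for_age, shift=-5.0):
    """lwts_per_500: last three seasons, oldest first.
    shift is the knob discussed above: -5 as published, 0 or -10 in
    the sensitivity checks, with little effect on the trajectories."""
    weights = (3, 4, 5)  # year weights, oldest to most recent
    num = sum(w * l for w, l in zip(weights, lwts_per_500))
    num += 1 * (league_avg_for_age + shift)  # 500 PA of the shifted mean
    return num / (sum(weights) + 1)

# e.g., a fringe player at -8, -6, -7 lwts per 500 PA in a league where
# the average for his age is 0:
print(round(phantom_year_ii((-8, -6, -7), 0.0), 2))
```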
Guys, in the pitch f/x data, did you control for the quality of the batters and for pitch count? IOW, is the control group matched up, batter-wise and pitch count-wise, with the non-control group, in the pitch f/x data, as it is in the "outcome" data? As you said, pitchers will tend to hit when the bottom of the lineup is coming up the next half inning and when their pitch counts are not high, at least late in the game. Early in the game, pitchers will tend to hit in the same inning as the opposing pitcher will tend to hit. So it is really critical to control for the quality and even identity of the batters, as well as pitch count.
The funny characters occur when I cut and paste my comments from another web site. I don't know why that happens.
It is not an ad hominem attack because it has nothing to do with my discussion of the substantive issue. An ad hominem attack or argument is when an “attack” on a person is represented as or is a diversion from a substantive argument. It is certainly OK to state, “And by the way, here is a comment on something else that the person said, which has nothing to do with the argument at hand.”
Did I have to explicitly say, “Warning! Time out. This next comment I am about to make has nothing to do with the argument at hand. Nothing at all. It is a side-bar”?
I’ve spoken my peace (or is it piece?) on JC’s research and on aging in general, and I don’t really have anything more to say on either issue, otherwise I run the risk of being even more redundant and repetitive than I already have been. And as always, I could be wrong on one or more of the assertions that I have made. Not to mention the fact that there is a lot of muddy water and gray area in this particular topic.
Doesn't this study boil down to the following statement?
"Players who play longer than average have later peaks than average."
And isn't that almost begging the question?
Yes and yes.
Not only do players who play longer have later "true" (underlying talent) peaks, but their observed peaks (which are not necessarily the same as their "true" peaks - for example, a player might have his best season at age 22 or at age 36) are also going to be later than their true peaks. It is sort of survivor bias in reverse. Players in JC’s sample tend to have gotten lucky all along the way, pushing the peak age forward as they do so and flattening out the post-peak part of the curve as they go forward in their careers.
"I especially appreciate the positive feedback."
When I write an article or do research, I especially appreciate the criticism. I have nothing to learn from the "pats on the back." But that's just me...
I should have said that the aging versus injury issue is a "red herring" rather than a straw-man argument, but my point is still the same...
As well, his injury versus aging issue is a straw-man argument. That is a completely separate issue (and a complex one at that) from the flaw in his study, which is generalizing a very small, biased sample of players to that of an average or typical MLB player.
One more thing. I have been trying to find where I made the comment, but I can't. I quickly looked at players who had played at least 5 years (I think) prior to age 27 and accumulated at least 2000 (I think) PA. So basically they are full-time players at the beginning of their careers.
JC's players are a subset of these players. Some of these (5/2000) players will go on to amass 10 years and 5000 PA and some will not. When I looked at all of these 5/2000 players going forward, I found a peak age of 27-28 and the same basic overall trajectory that I found for ALL players using my delta method corrected for survivor bias.
So obviously the subset of these 5/2000 players who do not make it to the 10/5000 level (JC's sample) peak earlier (and probably have a steeper decline after their peak) than the players who do, as you would expect.
Just more evidence that JC's sample is a biased set of players who peak later than the "average" player as well as later than the full-time player with 5 years under his belt.
Again, what purpose does it serve to determine the trajectory of this very small subset of players and why does JC refer to this trajectory as that of the "average MLB player" rather than a very small, biased, subset of players who play MLB?
I really don't understand his point and why this article is even called "How do baseball players age?" as if his very small subset of players represents the average or typical player. And why does JC think that his results are in opposition to that of myself, Tango, Bill James, and others who looked at ALL players and not just those who played for at least 10 years with at least 5000 PA? Obviously the smaller the subset of players we look at after the fact, and the longer and more prosperous the career, the later the peak age we will find and the shallower the curve after that peak, almost by definition. What is the point, I ask for the umpteenth time?
I use my aging research in order to help us with projecting player performance. Most or all of the other analysts that JC criticizes do the same, or at least that is implicit in their work. You can't possibly do that with JC's data.
I think that this is JC against the world on this one. There is no one in his corner that I am aware of, at least no one who actually does any serious baseball work. And there are plenty of brilliant minds who thoroughly understand this issue who have spoken their piece. Either JC is a cockeyed genius and we (Colin, Brian, Tango, me, et al.) are all idiots, or...
As some of you know, I also wrote a two-part article on aging on THT. JC, in this article, references that work. Here is a comment I wrote on that site which I think aptly sums up this issue:
As I reiterate throughout the two parts, there really is no one-size fits all aging curve. And in practice, you are better off addressing each player on a case-by-case basis.
For example, if you have a 31 year-old FA that you are thinking about signing, you would want, at the very least, to look at the aging curves of similar players during the modern era - for example, full-time, 30-32 yo players who have played for X years already.
None of the generic curves I discuss, or JC’s, or anyone else’s, will be much help. You have to look at specific aging patterns for similar-type players, including such things as body-type, speed, injury history, etc.
In addition, a player’s own historical trajectory might give you some idea as to his future trajectory.
Really, the only 3 things you want to take away from this article, including the second part a-coming, are:
One, if you look at players who have already played for 10 seasons and many IP, as JC did, of course you get a very different aging curve than you would expect from any player before the fact, even in the middle of their careers. To extrapolate that to all players, most players, or even the “generic” player, sans the very part-time ones or the ones with very short careers, is ridiculous, as Tango, Phil B., and many others have already stated.
Two, the modern era appears to have a significantly different aging curve, probably for all players. I arbitrarily defined the modern era as post-1979, but it could be anything really.
And three, if you absolutely have to answer the question, “What does the average aging curve look like for MLB players, including those who do not have long and/or illustrious careers (and many of these part-time players DO reach their peaks), and what is their peak age,” the answer probably looks something like my last curve, at least in the modern era, although the one in the next installment after I adjust for survivor bias is probably more appropriate.
And that curve (for the “average” MLB player) is not unlike what we have thought all along - a fairly steep ascent until a peak of 27 or 28, and then a gradual decline which gets a little steeper if and when a player gets into his thirties and beyond. There is simply no way that we can expect a player (not knowing anything else about him, such as body type) who has not already finished a long career (or come close to finishing) to peak at 29 or 30, as JC suggests.
Of course it makes no real practical sense to talk about a player’s peak age and his trajectory after he has already finished his career.
I wrote some comments on The Book Blog. Colin is 100% correct. This trajectory has no useful value. It surely cannot be used for any projection purposes. It simply tells us the average "observed" (which is very different than "true," as I explain below) trajectory for very good players who had long and prosperous careers. Those players are a very small subset of all players at any age, especially at the younger ages (what percentage of young players end up having a career of at least 10 years and 5000 PA with at least 300 PA per year?).
So he comes up with an observed aging curve for a very, very small subset of players who by definition peak late and age gracefully (gradually). If we assume that all players have somewhat different "true" aging curves (if, because of nothing else, their differences in physiological versus chronological age), his subset of players is one that necessarily is going to have an aging curve with a late peak and a gradual decline - otherwise they likely would not have lasted that long and played as much and as regularly as they did.
In addition to that, and to make matters worse, the trajectory he found is not even a trajectory of "true talent." Because of his selection bias, it will necessarily be comprised of players who, by chance alone, had late peaks and gradual declines.
To illustrate that, let's say that all players have the exact same true aging curves. Now, if we let all players play 10 years, obviously by chance alone, some players will peak at 26, some at 32, etc. (even though they all have the same true peak). And some players will have steep post-peak (and pre-peak, of course) trajectories and some will have shallow ones (in fact, every possible shape will occur if we have enough players in our sample), again by chance alone, even though they all have the same "true" shape. The players who peak late by chance and have a gradual performance decline by chance alone will tend to dominate JC's sample. Basically JC's sample (a VERY small subset of MLB players) consists of players whose true trajectories peak late and decline gradually AND players whose observed trajectories peaked late and declined gradually by chance alone. Is it any surprise that he finds a peak of 29 or 30 and a gradual decline after that? Heck, if we look at players who played 15 years and 7000 PA, we are likely to get a later peak and more gradual decline still! Does the name Bonds sound familiar?
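You can see this selection effect in a toy simulation (Python); this is a sketch under assumed numbers - every player gets the identical true curve peaking at 27, and the noise scale and the replacement cutoff are arbitrary:

```python
# Toy simulation: identical true aging curves, noisy observed seasons,
# careers ending when observed performance dips below a cutoff.
import numpy as np

rng = np.random.default_rng(1)
ages = np.arange(22, 38)
true_curve = -0.15 * (ages - 27.0) ** 2  # identical true curve, peak 27

peaks_all, peaks_long = [], []
for _ in range(5000):
    obs = true_curve + rng.normal(0, 2.0, ages.size)  # seasonal luck
    # career ends the first time observed performance falls below a
    # replacement cutoff (an assumed selection rule)
    below = obs < -3.0
    career_len = int(np.argmax(below)) if below.any() else ages.size
    if career_len < 2:
        continue
    peak = ages[obs[:career_len].argmax()]
    peaks_all.append(peak)
    if career_len >= 10:  # the "long, prosperous career" subset
        peaks_long.append(peak)

print("mean observed peak, all careers:", np.mean(peaks_all))
print("mean observed peak, 10+ season careers:", np.mean(peaks_long))
```

Every player here has the same true peak of 27, yet the long-career subset shows a later observed peak, purely from selection on luck.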
I am sorry, but with every fiber in my body, I think that it is ridiculous to characterize JC's resulting trajectories as "of the typical MLB player," or some such thing. It is an "observed" (as opposed to "true" - representing the changes in true talent of a player over time) aging trajectory of a very small subset of players who we already knew had long and prosperous careers. Nothing more and nothing less. Can someone tell me any practical use for this kind of data?
"That's actually the exact reason that I used the odds ratio correction method. I'm measuring outcomes relative to expectancies. So, if the player traded was an overall .400 OBP guy, the model knows that and expects him to be on base 40% of the time."
Pizza, if the players who are traded tend to be good, they also tend to be lucky, so any post-trade performance will regress, whether you use the odds ratio method of matchup expectancy or not.
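For anyone unfamiliar with it, this is roughly what the odds ratio method does (a minimal sketch in Python; the .400/.300/.330 rates are just illustrative):

```python
# A sketch of the odds ratio method: combine a batter rate, a pitcher
# rate, and the league rate into a matchup expectancy.
def odds(p):
    return p / (1.0 - p)

def matchup_rate(batter, pitcher, league):
    """Expected rate for this batter vs. this pitcher."""
    o = odds(batter) * odds(pitcher) / odds(league)
    return o / (1.0 + o)

# e.g., a .400 OBP batter against a pitcher allowing a .300 OBP in a
# .330 OBP league:
print(round(matchup_rate(0.400, 0.300, 0.330), 3))
```

Note that the inputs are the players' observed rates; if those rates themselves contain luck, as they will for recently traded good players, the expectancy inherits it, which is the regression point above.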
Good stuff, BTW!
I too am waiting for a pitch f/x analysis of home and away. Mostly I want to know if umpires have a different K zone for home and away teams or make some occasionally biased calls.
Great stuff Eric! I think it is pretty critical that you break the numbers up into counts (and even game situations, like base runners, outs, score, etc.), or adjust for those things, although obviously you are going to run into serious sample size issues then.
Also, a batter's tendencies have a lot to do with his success when making contact or his ability to make contact. I realize that you are trying to look at these tendencies independent of a player's overall hitting success, but the readers need to be careful about concluding anything about whether a batter is optimizing his approach without taking into consideration some measure of success. For example, a player like Vladdy can afford to swing at pitches out of the zone because he is so good at it (making contact and hitting the ball hard when he does). A player like Castillo cannot, because, for example, even if he were able to make contact on pitches out of the zone, he would not hit them very hard.
I'd be real wary of reading too much into team SBIP. Too much effect from park and the pitchers. While pitchers do not have that much control over BABIP, they certainly have more control over doubles and triples. In fact, there is a fairly strong correlation between a pitcher's HR rate and his extra base hit rate.
If anyone wants to check, I am going to guess that the best teams in SBIP also had a pitching staff which allowed fewer HR than average, and vice versa for the worst teams in SBIP.
I don't know about "love affair" but if you look on Fangraphs you will see that Chone, Marcel, and ZIPS had Sonnanstine's FIP projection (basically an indicator of the context-neutral expected performance of a pitcher) at around 4.00, which is very good. I am not sure of the scale of FIP (what a league average FIP is - probably around 4.30), but the same forecasters' FIP projection for James Shields was around .2 runs better and for Garza about the same. Pecota was a little worse, giving Sonny an eqERA projection of about .1 runs worse than average.
So basically, some of the "statheads" projected Sonny as anywhere from a little better than league average to slightly worse than league average, despite not being a high GB pitcher and despite having a lower than average K rate and a higher than average HR rate. That is because he was projected to have a fairly significant lower than average BB rate.
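For reference, FIP in its simplest published form is just a fixed weighting of the three true outcomes (a sketch; the constant is set each season so that league FIP matches league ERA, and ~3.10 here is only an approximation):

```python
def fip(hr, bb, k, ip, const=3.10):
    """Fielding Independent Pitching, simplest form; `const` is chosen
    so that league FIP matches league ERA (~3.10 is an approximation)."""
    return (13 * hr + 3 * bb - 2 * k) / ip + const
```

You can see from the weights why a low BB rate can carry a projection even with a below-average K rate and an above-average HR rate.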
Anyway, Sonny has indeed pitched horribly this season and after watching him a few times this year, I wouldn't give a plug nickel for his services, for whatever that's worth.
A pitcher whose fastball averages 87 mph and is not a heavy ground ball pitcher is generally going to live on the edge...
FWIW, using Madson's projected platoon splits and those of Eyre and Taschner, Madson's projected "ERA" against lefties is 3.90 and Eyre's is 3.28, assuming that the lefty batter has average platoon splits himself. If he has more, then there is an even greater difference between Madson and Eyre. Taschner is 4.12 versus a lefty - he basically sucks overall, although he is useful against lefties as compared to a RHP who is not all that great or who has a large platoon split himself.
Most lefty pitchers are better versus LHB than all but the best RHP. That is why teams have LOOGYs in the first place. The traditional argument, "Why bring in a crappy LOOGY when he's, well, crappy," usually doesn't hold water since even a crappy LOOGY is pretty good against lefty batters and better than all but the best RH relievers.
But, as someone pointed out, as much as Joe would like to see Dunn be the last batter, that ain't gonna happen 30% of the time or so. So you have to factor into the analysis who is going to pitch to the following batter or batters that 30% of the time that Dunn does not make an out.
So, you have Madson versus Dunn, at a 3.90 "ERA", plus Madson versus the next batter 30% of the time, and the next batter after that, 8% of the time, or whatever it is, or Eyre versus Dunn at an "ERA" of 3.28 (much better than Madson's 3.90), plus someone else versus the next batter 30% of the time, plus the next batter after that 8% of the time.
I don't know the answer. It's probably a toss-up or reasonably close either way, depending on who else they had in the pen, if anyone, to take over for Eyre if Dunn gets on base. If you have to leave Eyre in there to pitch to the following RHB's, he is projected at 4.00 versus RHB, as opposed to Madson at 3.19. So while you gain .62 runs with Dunn at the plate with Eyre rather than Madson, if you have to leave Eyre in after Dunn, you lose .81 runs 30% of the time for the next batter, and another .81 runs 8% of the time for the next batter after that. That is a net gain for bringing in Eyre of .32 runs per 9 innings, which is around .008 runs per batter. Multiply that by maybe 3 for the leverage and you gain maybe .002 wins - nothing to write home about.
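Here is that back-of-the-envelope in code form (Python), with the numbers from the text; the ~38.6 batters per 9 innings and the ~10 runs per win conversion are my assumptions for the last step:

```python
# Back-of-the-envelope reliever tradeoff, all rates in runs per 9 IP.
gain_vs_dunn = 3.90 - 3.28  # Madson minus Eyre, against Dunn
loss_next = 4.00 - 3.19     # Eyre minus Madson, against the next RHBs

# Dunn reaches base ~30% of the time (next batter), and ~8% of the time
# the batter after that also comes up with Eyre still in
net_per_9 = gain_vs_dunn - 0.30 * loss_next - 0.08 * loss_next
net_per_batter = net_per_9 / 38.6  # ~38.6 batters per 9 innings (assumed)
wins = net_per_batter * 3 / 10     # leverage ~3, ~10 runs per win (assumed)
print(f"net: {net_per_9:.2f} runs/9, ~{wins:.3f} wins")
```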
Probably more important than those overall numbers are the relative values of the various events when you are making a decision about which relievers to bring in. Some pitchers are bad or good because they give up many or few walks or HRs or batted balls, or what have you. The value of those events can be quite different depending on the inning and score and the player(s) at the plate. For example, with a 3 run lead to start the 9th, you want a pitcher who does not allow a lot of base runners. You don't care if he is a high HR guy. With runners on base, you want either a GB pitcher or a high K pitcher. In a one run game, especially with 2 outs, you want a low HR guy. Etc. Those are important considerations as well. How often have you seen a manager bring in his closer, who is a high walks and high strikeout guy (low BA against), in the 9th inning with a 3 run lead, and you intuitively cringe because you know that he is going to walk the bases loaded and then have to pitch out of a self-inflicted jam? Not only is a 3-run lead (LOW leverage) a great time to save your closer for another (more important) game in general, but it is also a good time to bring in a lesser pitcher overall who is a low walks guy, even a low walks, high HR guy. If he gets in trouble, you can always bring in the closer anyway.
As you can see, one of the problems with taking out your best pitcher to bring in a lefty is that even if the lefty/lefty matchup is better, you don't have your best pitcher available anymore for the following batters, if they should bat.
The best use of a platoon matchup in favor of your closer is at the beginning of the inning, of course. How many times have you seen two lefty batters lead off the 9th inning, say Utley and Howard, and the opposing manager brings in his RH closer to start the inning? Assuming a decent LOOGY in the pen, the better move is usually the LOOGY for the first two batters and then the closer. Occasionally, you will see a manager like LaRussa, Scioscia or Piniella do something like that. I love it when I see it. Again, that is assuming that the LOOGY matchup is better than the closer/lefty matchup, which may not be the case if your closer is especially good, has a small platoon split himself, or the LOOGY is especially bad (like Taschner versus Madson).
Matt, what if all of these offensive rates, like OPS, HR/PA, etc. are multiplicative, or at least partially multiplicative and partially additive, rather than just additive, which I think they might be?
Or what if you use an odds ratio method, which I think might be the correct thing to do?
If the home/road difference is a fixed ratio rather than a fixed difference, and you correlate H/R differences from one year to the next, guess what? You are going to get a strong correlation. You will also get a stronger correlation for players with higher numbers, for the same reason.
Plus, to be honest, the park factor thing is really going to throw a monkey wrench into the equation, even if you try to park adjust, I think. Using one-year park factors is going to create a lot of noise, and if you arbitrarily regress 50%, then if that regression is not enough, you have a lot of noise, and if it is too much, you will get correlations just from the park factors alone. Plus, again, I would NOT be using Rockies players at all. Including them, whether you use park factors or not, is going to give you a positive correlation overall.
Anyway, we'll start with the assumption that what we consider to be a "fixed" HFA is that all teams have the same difference between their home and road WP (and it is not a given that we can't define a "fixed" HFA as a fixed ratio between home and road WP). We'll call that .08 (.54 - .46). So, for example, a good team with a .600 overall WP will be .640 at home and .560 on the road. The assumption is that that will also be true regardless of the level of offense or defense at home or on the road (I don't know if that is true or not either).
Now, what do we have to do with runs scored and allowed in order for that to be true? Let's say that we have a team that scores 4.5 runs and allows 4.5 runs. In order for them to have a .540 WP at home and .460 on the road, we would have to add .18 runs to their runs scored and subtract .18 from their runs allowed OR we would have to multiply their runs scored by 1.04 and divide their runs allowed by 1.04. Which way works better?
Let's say that we have a team that scores 6 and allows 6 overall. Their overall WP is still .500. If we add .18 runs to their home runs scored and subtract that from their runs allowed, we get a home WP of only .530. If we multiply their runs scored by 1.04 and divide their runs allowed by 1.04, we get .539, which is near what we want.
So for level of offense and defense, it looks like we have to multiply runs by a fixed amount. What about the strength of a team? Say a team scores 5 and allows 4 overall. They are a .610 team. We expect them to be .650 at home and .570 on the road. Let's do the same thing - try adding or subtracting a fixed number of runs, and try multiplying home and road runs by a fixed number. Adding and subtracting .18 runs gives us 5.18 RS and 3.82 RA at home, which is a WP of .648, and on the road it is 4.82 and 4.18, or .571, pretty close to what we want. What about multiplying and dividing by 1.04? At home, we get .646, and on the road, we get .572, so adding and subtracting seems to be better.
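These numbers are easy to verify with the Pythagorean expectation (a quick sketch; the exponent of 2 is the usual simplification):

```python
# Check the additive vs. multiplicative HFA arithmetic above with the
# Pythagorean expectation W% = RS^2 / (RS^2 + RA^2).
def pyth(rs, ra):
    return rs**2 / (rs**2 + ra**2)

for rs, ra in [(4.5, 4.5), (6.0, 6.0), (5.0, 4.0)]:
    add = pyth(rs + 0.18, ra - 0.18)   # fixed-difference HFA
    mult = pyth(rs * 1.04, ra / 1.04)  # fixed-ratio HFA
    print(f"{rs}/{ra}: additive {add:.3f}, multiplicative {mult:.3f}")
```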
So, it is not real clear to me which is the correct way to do it.
To be honest, I think you have to do some sort of odds ratio rather than assuming a fixed difference or a fixed ratio. If you do that, you have to re-run your correlations. I am not exactly sure how to do that, but I'm sure someone can help...
I am afraid you're not going to find out anything here without controlling for two things: one, the exact stolen base talent of the runner, and two, the game situation.
The reason is that you cannot separate cause from effect. Obviously, no throws to first regardless of the number of pitches in the AB suggests that it is not a base stealing situation (maybe the game is a blowout or the batting team is down by more than a run late in the game). It also suggests that the runner is not such a base stealing threat, even if you only looked at runners with at least 15 steals (not all 15 steal players are created equal).
The other thing is, let's say that throwing over to first had no effect on the runner attempting a steal or being safe on a steal. Why would the pitcher waste his time with throws, then? Because you can't pick anyone off unless you throw over!
Basically, I don't see the data telling us anything at all for the reasons articulated above.
A few things.
1) I think that the HBP against Rollins was in the jersey and not the sleeve, no? I could be wrong.
2) Maddon should have taken Price out of the game after Howard, and he should not have brought him in until the lefties started hitting (Balfour should have pitched to Ruiz and Rollins). The reason is two-fold. Price had no fastball, at least according to the Fox gun. It was 92-93. It usually is 95-97. With that kind of fastball (not that 92-93 is bad) and his definite command problems in general, I don't think he is a particularly effective reliever versus RHB. Even with one day off, why burn him when you had plenty of relievers left in the pen? What you want to do in the post-season is to rotate your relievers as much as possible so that you have lefties and righties available every game, if possible.
3) Danley did NOT signal out and then ask for the appeal. That would have been ridiculous (OK, I would have thought that Eddings' call was ridiculous also...). He started to call him out and then changed his mind. There was nothing wrong with that. The home plate umpire should almost never call a batter out on a check swing. He cannot possibly see whether he went around or not. That is not to say that Danley did not come as close as possible to calling him out with his hand, but in order to call a batter out, you usually raise and then close your hand and then say, "Strike three, you're out," or something like that. He clearly raised his hand to call him out, thought better of it, did NOT close his hand, let that hand motion carry towards first, and of course did not say "strike three." That is why Manuel did not argue much.
If Danley had actually "called him out," as Joe says in the article, Manuel would have gone through the roof, and justifiably so.