I had this ready to go this past year with my $9 Reddick-Gomes-Smith combo, and then Reddick just decided to mash everything, so I wasn't willing to take him out of the lineup for most of the year. It still might have been a decent rule for rotating these guys that would have outpaced Reddick alone, though (particularly since my league uses both AVG and OBP as categories).
(8x8, 20 teams, 30-man roster, 3 DL slots, with 3 to 8 minor league slots separate from the 30, so Ryan Braun is valued at about $85 here.)
"But we don’t currently have strike probability calculated on a pitch-by-pitch basis"
I don't have the data past 2010, but I have most of this put together at a general level using the semi-parametric models I have talked about before (pitch type, count, location, right/left-handed batter/pitcher, pitcher fixed effects), if you guys are interested. It would be easy to extend through 2012 if someone has the data easily accessible and is willing to share.
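For anyone who wants to try something along these lines, here is a minimal sketch of a much simpler stand-in, not the semi-parametric model mentioned above: binned called-strike rates per cell, shrunk toward the overall rate so that sparse cells don't swing wildly. The field names (`zone_bin`, `count`, `batter_hand`, `called_strike`) and the `prior_weight` value are hypothetical. Pitch type and pitcher fixed effects are left out for brevity, though a pitch-type field could be added to `key_fields`.

```python
from collections import defaultdict

def cell_strike_rates(pitches, key_fields=("zone_bin", "count", "batter_hand"),
                      prior_weight=20.0):
    """Shrunken called-strike rate for each cell defined by key_fields.

    pitches: iterable of dicts (hypothetical schema) holding the key fields
    plus 'called_strike' (True/False). Pass only taken pitches, since swings
    never get an umpire call.
    prior_weight: pseudo-count pulling each cell toward the overall rate.
    """
    strikes = defaultdict(int)
    totals = defaultdict(int)
    for p in pitches:
        key = tuple(p[f] for f in key_fields)
        totals[key] += 1
        strikes[key] += int(p["called_strike"])
    n_all = sum(totals.values())
    if n_all == 0:
        return {}
    overall = sum(strikes.values()) / n_all
    # Small cells get pulled toward the overall rate more than large ones.
    return {
        key: (strikes[key] + prior_weight * overall) / (totals[key] + prior_weight)
        for key in totals
    }
```

With `prior_weight=0` this reduces to raw cell rates; larger values trade variance for bias, which matters once you slice by count and handedness and the cells get thin.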
It's always good to note how league structure influences player values. There are a couple of things that I believe no valuation system out there (even very nice customized ones like the PFM here) handles well, and I did not see them addressed here:
1) Daily lineup changes
2) Large benches
The first can be very important, because predictable LHB/RHB platoons instantly become more valuable. The second then comes into play, as you won't be able to find those platoon players on the waiver wire.
I find these two issues are exacerbated in H2H competition. The flexibility to stick in Juan Pierre for the last 3 days of a period when you are 1 SB behind (and 5 HR ahead) can be worth an extra category every period or two. Fantasy valuation systems generally have difficulty putting a value on that flexibility.
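To put a rough number on that flexibility, here is a minimal sketch under strong simplifying assumptions: each side's remaining production in a category is an independent Poisson count, ties count as half a win, and the per-day team rates below are made up for illustration, not real projections.

```python
import math

def poisson_pmfs(lam, max_events):
    # P(N = k) for k = 0..max_events, built iteratively to avoid huge powers.
    probs = [math.exp(-lam)]
    for k in range(1, max_events + 1):
        probs.append(probs[-1] * lam / k)
    return probs

def category_win_prob(margin, my_rate, opp_rate, days, max_events=60):
    """P(win) for one counting category over the remaining days.

    margin: my current total minus the opponent's (e.g. -1 when 1 SB behind).
    my_rate, opp_rate: assumed expected events per day for each lineup.
    Remaining events are modeled as independent Poisson counts; a tie
    counts as half a win.
    """
    mine = poisson_pmfs(my_rate * days, max_events)
    theirs = poisson_pmfs(opp_rate * days, max_events)
    win = 0.0
    for a, pa in enumerate(mine):
        for b, pb in enumerate(theirs):
            diff = margin + a - b
            if diff > 0:
                win += pa * pb
            elif diff == 0:
                win += 0.5 * pa * pb
    return win

def expected_categories(lineup, days=3):
    """Sum of win probabilities; lineup maps category -> (margin, my_rate, opp_rate)."""
    return sum(category_win_prob(m, r, o, days) for m, r, o in lineup.values())

# Hypothetical per-day team rates for the final 3 days of a period:
# starting a speedster adds SB but costs a little HR.
with_power_bat = {"SB": (-1, 0.8, 0.8), "HR": (5, 1.2, 1.0)}
with_pierre = {"SB": (-1, 1.3, 0.8), "HR": (5, 1.0, 1.0)}

print(expected_categories(with_power_bat), expected_categories(with_pierre))
```

Under these made-up rates, the gain in the close SB race far outweighs the small risk to a 5-HR cushion, which is exactly the kind of decision a season-long dollar value never sees. A real version would need opponent lineups and actual projections.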
Mike,
Really cool stuff. Thinking a bit more about our earlier conversation, I think the next step here would be to look at within-pitcher variation in hSOB as well. Across-pitcher variability gives more of an idea of the spread in talent, while within-pitcher variation (with significant regression to the mean) might give more insight into a pitcher's ability to control the outcome. I think you do address this with the split-half correlations, but I'd love to see observed and regressed standard deviations for the players in the tables as well, to get an idea of the spread as you do at the aggregate level.
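One common way to get a "regressed" spread is to subtract the binomial sampling variance from the observed variance. A minimal sketch, assuming each sample is a (successes, trials) pair for whatever rate is being studied: the same function can be pointed at one pair per pitcher (across-pitcher spread) or at one pair per game or season for a single pitcher (within-pitcher spread). It uses the pooled rate as the mean for every sample, which is a simplification, and `regressed_rate` is a hypothetical helper showing the matching per-player shrinkage.

```python
import math

def observed_and_true_sd(samples):
    """Split the observed spread of binomial rates into noise and signal.

    samples: list of (successes, trials) pairs.
    Returns (observed_sd, noise_sd, true_sd), where
    true_sd^2 = observed variance - average binomial sampling variance,
    floored at zero.
    """
    rates = [s / n for s, n in samples]
    mean_rate = sum(s for s, _ in samples) / sum(n for _, n in samples)
    observed_var = sum((r - mean_rate) ** 2 for r in rates) / len(rates)
    noise_var = sum(mean_rate * (1 - mean_rate) / n for _, n in samples) / len(samples)
    true_var = max(0.0, observed_var - noise_var)
    return math.sqrt(observed_var), math.sqrt(noise_var), math.sqrt(true_var)

def regressed_rate(successes, trials, mean_rate, true_var):
    """Shrink one sample's rate toward mean_rate by its estimated reliability."""
    noise_var = mean_rate * (1 - mean_rate) / trials
    if true_var + noise_var == 0:
        return mean_rate
    reliability = true_var / (true_var + noise_var)
    return mean_rate + reliability * (successes / trials - mean_rate)
```

Comparing `observed_sd` with `true_sd` for each player's within-pitcher samples shows how much of the apparent game-to-game or year-to-year variation survives once sampling noise is removed.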
Really awesome stuff.
Hey Mike, glad you highlighted this here!
I think the key to understanding the link between the 4 or 9 'zones' and a heatmap is this: a heatmap essentially does the same thing. The main difference is that in a heatmap the 'zones' are continuous throughout the entire strike zone, which requires it to use a weighted average of nearby data. It is essentially lots of tiny blocks like the ones above, but with weighted averaging between blocks.
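The "weighted average of nearby data" idea can be sketched as a Gaussian-kernel (Nadaraya-Watson) smoother. This is one common choice, not necessarily what any particular heatmap tool uses; the (x, z, y) point layout and the `bandwidth` values are assumptions. A tiny bandwidth behaves like lots of tiny blocks, while a large one averages across the whole zone.

```python
import math

def kernel_smooth(points, x0, z0, bandwidth):
    """Gaussian-kernel (Nadaraya-Watson) estimate of the outcome at (x0, z0).

    points: list of (x, z, y), with plate location x, z (feet) and outcome y
    (e.g. 1 for a called strike, 0 for a ball).
    """
    num = 0.0
    den = 0.0
    for x, z, y in points:
        d2 = (x - x0) ** 2 + (z - z0) ** 2
        w = math.exp(-d2 / (2 * bandwidth ** 2))  # nearby pitches weigh more
        num += w * y
        den += w
    return num / den if den > 0 else float("nan")

def heatmap_grid(points, xs, zs, bandwidth):
    """Evaluate the smooth on a grid: one row per z value, one column per x value."""
    return [[kernel_smooth(points, x, z, bandwidth) for x in xs] for z in zs]
```

Far from any data the weights underflow to zero and the function returns NaN, and even before that, estimates near the edges lean entirely on data from one side.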
But things can really, really break down when there is no adjacent data (e.g., at the edge of where the batter swings). This is why I have stuck to using heatmaps exclusively for the density of pitches thrown or for umpire calls: sample sizes are sufficient, and there are generally no artificial boundaries to the smooth. Even then, a heatmap of umpire calls should 1) be cross-validated and 2) use a lot, lot, lot of data. Without CV, comparing umpires with different sample sizes by visual inspection (the entire point of a heatmap) will be difficult. Without lots of data, CV breaks down for certain types of analyses (especially in the binomial and ordinal realm), and in that case you have no idea whether you are smoothing optimally.
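Cross-validating the smooth can be sketched as choosing the bandwidth with the lowest leave-one-out log loss on binary calls (1 = strike, 0 = ball). This is one reasonable scheme, assumed here for illustration; the candidate bandwidths are arbitrary, the loop is O(n^2), and with small samples the chosen bandwidth itself becomes noisy, which is the breakdown described above.

```python
import math

def loo_log_loss(points, bandwidth, eps=1e-6):
    """Leave-one-out log loss of a Gaussian-kernel smooth of binary outcomes.

    points: list of (x, z, y) with y in {0, 1}.
    """
    total = 0.0
    for i, (xi, zi, yi) in enumerate(points):
        num = den = 0.0
        for j, (x, z, y) in enumerate(points):
            if j == i:
                continue  # predict each call without using that call itself
            w = math.exp(-((x - xi) ** 2 + (z - zi) ** 2) / (2 * bandwidth ** 2))
            num += w * y
            den += w
        p = num / den if den > 0 else 0.5  # no usable neighbours: fall back to 0.5
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(points)

def choose_bandwidth(points, candidates):
    """Return the candidate bandwidth with the lowest leave-one-out log loss."""
    return min(candidates, key=lambda h: loo_log_loss(points, h))
```

Running the same selection separately for each umpire puts smooths built from different sample sizes on a comparable footing before anyone eyeballs them.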
In the end, pitch data is just extremely noisy. My hypothesis is that location and pitch type matter less than knowing where the pitch will end up. As a professional hitter, even if you're a crappy hitter on outside pitches, if you know an outside fastball is coming you can probably hit it out of the park. We don't know "when" the batter guesses right on a pitch, and even then it's difficult to hit any pitch, which creates this large amount of noise.
Lastly, thank you for pointing out the color palette issue! While I often use a blue-to-yellow-to-red palette (the most natural interpretation), I have noticed that a single-color palette gives the best representation when trying to visually interpret the smooth at a more granular level. With a single color, there is no "breakpoint" in interpreting the color scheme.
Really enjoyed this. I think there is a lot of work to do here, both from the perspective of the sabermetrician and in applying this to catcher training and practices. As usual, Mike presents stuff that is really pertinent to people both in the front office and on the field. Keep them coming.
It's not often that I don't zoom to the end and just read a paragraph about the results of an analysis. I read this one from beginning to end... very nice.
I agree there is a lot to learn regarding batter approach.
I find this nearly physically impossible:
"However, if, with runners on first and third, the pitcher, while in contact with the rubber, steps toward third and then immediately and in practically the same motion “wheels” and throws to first base, it is obviously an attempt to deceive the runner at first base, and in such a move it is practically impossible to step directly toward first base before the throw to first base, and such a move shall be called a balk."
Who pulled that one off that they had to make a rule about it?
This would also be impressive:
"(f) The pitcher delivers the ball to the batter while he is not facing the batter;"
Thanks for the comments! Definitely think there's added value in looking player by player, and certainly at pitch types. Hoping to check these things out in the near future. Definitely other angles to take at the 'approach to rookies' question.
Hey MGL,
Thanks for the comments. I agree with all of your points. I was hoping to go a bit further later on and look at individual player trajectories. Also, I generally look at location here rather than pitch selection. There are absolutely more things to look at. One of the reasons I have focused on location is that I don't have pitch types clustered throughout the data outside of the Gameday stuff.
I am hoping to look more closely at individual player trajectories and, rather than 'grooving it' vs. 'not grooving it', examine changes in average location (inside-to-outside, up-to-down). Going through each player that way is a pretty big project.
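Tracking changes in average location per player could start as simply as grouping pitches by batter and period. A minimal sketch, assuming a hypothetical (batter, period, px, pz) tuple layout; to read px as inside-to-outside you would first flip its sign for one batter handedness, so that a positive value always means the same side of the plate relative to the hitter.

```python
from collections import defaultdict
from statistics import mean

def location_trajectory(pitches):
    """Average pitch location per (batter, period).

    pitches: iterable of (batter, period, px, pz), where px is horizontal
    location and pz is height, both in feet (hypothetical layout).
    Returns {batter: [(period, mean_px, mean_pz), ...]} sorted by period.
    """
    groups = defaultdict(list)
    for batter, period, px, pz in pitches:
        groups[(batter, period)].append((px, pz))
    out = defaultdict(list)
    for (batter, period), locs in groups.items():
        out[batter].append((period, mean(x for x, _ in locs), mean(z for _, z in locs)))
    for batter in out:
        out[batter].sort()  # order each batter's periods chronologically
    return dict(out)

def location_change(trajectory):
    """Shift in (mean_px, mean_pz) from a batter's first period to his last."""
    first, last = trajectory[0], trajectory[-1]
    return last[1] - first[1], last[2] - first[2]
```

The period could be anything from a month to a time through the league; the per-period means are raw here and would need regressing for small samples.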
Thanks, Tango. Appreciate the comment.
A quick note: while pitchers remain in the table at the bottom, I DO remove pitchers from the original analysis in the article.
I also ran it with them included, but it didn't really change anything. Given the 'non-pitcher' result, this should be expected: I doubt the approach to pitching to opposing pitchers is really going to change (just let them make the out).
Eric,
Just wanted to say this is a fantastic read, and quite fascinating. As someone interested in the economics of sport, I'd love to see more stuff like this.
Here is a question for you:
What about team owners' private income? Given that the firm operates outside its home state roughly half the time, have states looked to collect business taxes on owner income itself? Most owners have other revenue streams, and private LLCs onto whose books they can put the baseball team's operating losses, to get around taxes to begin with. Do states look to tax their general income (i.e., income from being Chairman of Starbucks) and attempt to trace a specific portion of it to owning the sports team and operating in other states?