Thanks, that clarifies things.
I was also unconvinced by your article, Matt. You miss three other crucial points:
1) Teams have different colors and the players are paid salaries to wear these colors. All players must take this into account.
2) Time is fleeting. The future is now and teams want to win now. If you don't win now, you don't win in the future.
3) Much of what you see as a need to make money is really just a survival instinct to pay for food and shelter to fulfill 1 and 2.
Read Little Women and some of this should become more clear to you.
What? Maybe you could explain? If I'm reading this correctly, you're suggesting that the Yankees spend a lot because they feel a need to make a profit in order to fulfill their destiny of being in New York. Expand. And leave out swipes at Matt's education this time.
It seems like we need to be asking why the Yankees are in a position such that A-Rod is worth so much to them. I assume it comes down to the fact that the return to an additional win (or making the playoffs, winning the WS, etc.) is worth more in NY since it's a bigger market.

So, maybe the best way to go about this is the following: figure out some "return to outcomes" (wins, playoffs, WS) for each team or, more likely, for each category of teams - small, middle, large market teams. MLB then makes sure the "return to outcomes" is equal across teams by giving small market teams extra money based on outcomes. In other words, they subsidize wins, playoff appearances, WS wins, etc. or tax the absence of those successes. The amount of the subsidy/tax depends on your market size.

Once the returns are equal across teams, we should see similar budgets across teams since Pittsburgh and New York now both have the same incentive to invest in their team. By credibly promising that this structure will exist in the future and increasing such payouts by interest rates, teams will have no incentive to go after an extra win or two this season at the cost of future success (in other words, this won't force teams to increase payroll for its own sake since investments should be more worthwhile).
You're right that sample sizes become an issue. I wasn't suggesting, however, to create 3-dimensional bins. That would be too much. But maybe a series of 1- and 2-dimensional bins.
Matt, I have a suggestion to revise SIERA while maintaining the same basic idea. Your point seems to be that non-linearities and interactions matter, but you handle them in a very blunt way when you simply multiply different variables together or square them. It also makes the coefficients hard to interpret. I would suggest creating "bins" for different ranges of each variable or interactions of variables. In other words, create a dummy variable for everyone with a K% between X1 and X2. Or, better, one dummy variable for K% between X1 and X2 and BB% between Y1 and Y2. The "trick" will be defining these bins as precisely as possible while still including a useful # of observations in each one. This is a very flexible way of dealing with interactions. Even better, it would make the results very easy to interpret. Anyone could take a pitcher, find his bin or bins, figure out his SIERA, and easily calculate his theoretical SIERA if he had, say, just slightly increased his strikeout rate.
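To make the binning idea concrete, here's a tiny sketch. All the numbers and bin edges are invented for illustration - the point is just that 1-D and 2-D bins expand into dummy variables you can drop straight into a regression:

```python
import numpy as np

# Hypothetical per-pitcher rates (invented numbers).
K_pct = np.array([0.15, 0.22, 0.28, 0.19])
BB_pct = np.array([0.06, 0.09, 0.07, 0.11])

# 1-D bins: index 0 = K% <= .18, 1 = (.18, .24], 2 = > .24
k_bin = np.digitize(K_pct, bins=[0.18, 0.24], right=True)
# Expand into dummy (indicator) variables, one column per bin.
k_dummies = np.eye(3, dtype=int)[k_bin]

# 2-D bins: one dummy per (K% range, BB% range) cell.
bb_bin = np.digitize(BB_pct, bins=[0.08], right=True)  # 0 = low BB, 1 = high
cell = k_bin * 2 + bb_bin            # unique id per (K, BB) cell
cell_dummies = np.eye(6, dtype=int)[cell]
```

A pitcher's theoretical SIERA after a small K% bump is then just the coefficient on his new cell instead of his old one.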
I think everything you just said is right. I guess it depends on how you want to phrase the question. If you want to know how a 26 year old will do next year, you probably want to start with only those that play when they're 26. But if you're thinking about offering 5 year contracts, then your point is right.
I'm not sure I understand why you wouldn't want to make everyone play the next year. I think the ideal experiment would be the following: take everyone who's 25 - group A plays when they're 26, group B drops out. You're going to force group B to play when they're 26 to get their stats. Now you have the right aging curve for 25->26. To get the 26->27 aging curve though, you'd throw out group B because you no longer care about those guys. Is that the point you're making?
I still think that assuming the trajectory for missing players just assumes the final result in some way for the actual aging profile (again, I'm fine with that to answer the peak age question rigidly).
I _hate_ selection models and avoid using them at almost all costs. But this entire topic is dependent on selection so I think any method which does not attack the selection issue head-on is a non-starter. That stinks, but it's just the nature of the topic.
I actually find the robustness to using different assumptions on the dropouts pretty convincing. At the same time, I wonder if it answers the question too rigidly. To me, the peak age question is basically synonymous with determining the true average age profile. This check, however, makes them separate questions. By assuming random values for the dropouts, you're partially guessing the average age profile (for all the reasons described by people above). It just happens because of, say, attrition rates and lots of other factors that the "peak age" is unaffected. You're getting all the age-specific values wrong (or at least, we have no idea if they're right), but it just works out that the peak age is unaffected. Does that make sense? I'm not suggesting this is a criticism for this specific paper, but I think it's an important caveat. Agree?
You're right. But the idea, for now, is to get the methodology right. Get a general answer and then worry about applying the methodology better in the future.
Yeah, this is exactly what I'm suggesting. The Heckman model, though, identifies off of distributional assumptions. I'm suggesting finding an "instrument" which independently shocks someone's Pr(playing next year).
It just seems like everyone's trying to do weird stuff to account for selection when they need to attack the problem head-on. You want an "experiment" that makes sense. Say you could find a group of players that you knew had a 100% chance of playing next year because they had compromising pics of their GM (these pics were handed out randomly) Well, you wouldn't even need anyone else in your sample - you would just use this group to get at the core issue here. Of course, this group doesn't exist. But say you could estimate that some group has a 60% chance of playing next year because their team's farm system isn't great. And another group has a 40% chance. Comparing the results from the 2 groups accounts for "selection" and you can figure out the true effect then. That's the underlying experiment. You just need to find some "shock" to Pr(playing next year) that is unrelated to this year's performance.
I disagree with this. I don't think the point of this piece was to provide a descriptive analysis of what happened to a bunch of players who played for a long time. I believe that a general study should be done, but you need to properly estimate the parameter of interest. This methodology doesn't do that. Neither does MGL's. That's fine - it's a tough question. My main point is really just that this method makes the survivor bias a lot worse so I think it's a step in the wrong direction.
I don't know if JC is going to respond to comments, but I've suggested before using a selection model. Is there a reason, you're not doing this? Estimate separately the probability that someone will be playing next year (using, ideally, some type of "shock" for identification...position quality? Strength of team prospects at that position?) and control for that probability. This should get rid of selection bias if done correctly. I'm not an expert at selection models so maybe someone else could jump in here and help out...
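In case it helps, here's a minimal simulated sketch of the two-step idea. Everything is invented for illustration (the error correlation, the selection index, the sample size), and I cheat by plugging in the true first stage instead of estimating a probit - but it shows how controlling for the inverse Mills ratio undoes the selection bias:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 200_000

# True model: the mean performance change y is zero, but selection into
# "plays next year" depends on a shock u correlated with the outcome
# error e (good luck this year -> more likely to keep playing).
z = rng.normal(size=n)                       # the selection "shock"
e, u = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=n).T
y = 0.0 + e                                  # true mean change is 0
s = (0.5 * z + u) > 0                        # observed only if selected

naive = y[s].mean()                          # survivor-biased estimate

# Heckman-style correction: control for the inverse Mills ratio
# phi(0.5*z)/Phi(0.5*z). (True first stage used for brevity; in
# practice you'd estimate it with a probit.)
idx = 0.5 * z[s]
mills = norm.pdf(idx) / norm.cdf(idx)
X = np.column_stack([np.ones(s.sum()), mills])
beta, *_ = np.linalg.lstsq(X, y[s], rcond=None)
corrected = beta[0]                          # intercept = corrected mean
```

The naive mean of the selected sample is badly biased upward, while the corrected intercept recovers the true mean change of zero (and the coefficient on the ratio recovers the error correlation).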
No, I get that. I'm not making that profound of a point, but it's specific to the shape of the curve (note that my original example started everyone at the exact same level). Say you have two Mickey Mantles, but one is dropping off faster than the other. Both start at the same EqA, but Mantle A is out of baseball in 3 years. Mantle B gets better every single year. We only see Mantle B using JC's methodology so we think everyone - on average - gets better as they age. But, in reality, they don't. My point is really just a basic one - never select on the dependent variable.
So, definitely not. It's a big problem. You're selecting on something that's correlated with the outcome variable - being good enough to play when you're 24, 25,...,35. In the example I gave above, the correct answer is 1 (or, to translate it into terms of the study, players stay the same on average). But you only see the players with B >> 1. So you get the wrong answer.
The motivation of this study is to understand smart contracts. If I have a 24 year old player, I don't know if he'll still be playing when he's 35. If I did, that would be very helpful information.
To give an extreme example...say I have a 21 year old pitcher with an ERA of 3.09. I want to know if locking him up for his age-44 season is worthwhile. In general, the answer is no because pitchers - injuries or not - just aren't good when they're 44. But what if I do an empirical study containing only pitchers who pitched when they were 21-44? Well, now I'm looking at Nolan Ryan and it looks like a great idea. But I don't know that I have Nolan Ryan because I don't know that my current 21 year old pitcher will be pitching when he's 44. Knowing that would tell me something about his ability and, more importantly, the change in his ability over time.
I think you've made the survivor bias problem much worse. Say that I am estimating how much better/worse players are when they are 35 than when they are 24. Everyone has an OBP=.350 when they are 24 and an OBP of (B x .350) when they are 35, where you want to estimate the mean B. Say B is distributed normally around 1. Also, assume that if a player is under .300 when he is age 34, he will not be given the chance to play when he is 35. This means (just assume that players are exactly the same when they are 34 and 35) you only observe players with B x .350 > .300, i.e., B > .300/.350. You've cut off the bottom of the distribution, so the average B among the players you actually observe is bigger than 1. Thus, biased. There are tons of stories you can tell just like this too.
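A quick simulation of exactly that example (the spread of B is invented; the selection rule is the one above - under .300 at 34 means you don't play at 35):

```python
import numpy as np

rng = np.random.default_rng(1)

# Every player has OBP .350 at age 24 and B * .350 at ages 34-35,
# with B distributed normally around 1 (true mean change = none).
B = rng.normal(loc=1.0, scale=0.15, size=100_000)

# Selection rule: players under .300 at age 34 don't play at 35,
# so we only observe players with B * .350 >= .300.
observed = B[B * 0.350 >= 0.300]

true_mean = B.mean()           # ~1.00: players stay the same on average
biased_mean = observed.mean()  # > 1: survivors look like they improved
```

The estimated "aging effect" is positive even though the true average change is zero, purely from who survives into the sample.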
You just can't select the sample based on "long survivorship."
Matt, can you regress HFA_2008 on avgHFA(1997-2007) and report the coeff and p-value? (or you choose the time period for the avg HFA that you think is most "relevant")
This would work the other way. If they're highly-correlated, then this should drive the standard errors for both variables up (towards insignificance).
Right, apologies for causing the confusion. I really meant that (you're implying that) FIP is a bad predictor _conditional_ on ERA for your selected sample, but not for the entire sample. My points are:
1) Intuitively, that makes sense. If FIP is really far away from ERA, then it's probably partially because of "noise" in FIP (non-linearities matter, and FIP ignores them).
2) This isn't necessarily true given the results you've shown (though I'm guessing it is true anyway). Say you care about predicting Y and you have predictors A and B. And you have Corr(Y, A) = Corr(Y, B). Yes, A and B are equally "effective" in predicting Y. But that doesn't mean you don't want to use both. To understand the importance of a predictor - like FIP - you can't just compare its importance to another predictor. You have to see if each predictor is meaningful _conditional_ on the other. The regression I suggested above would get at that better.
Fair enough. It seems like what we want to see is a regression like (where _# refers to the half):
ERA_2 = a + b1 * ERA_1 + b2 * (FIP_1 - ERA_1) + e
If FIP is "better" than ERA, then b2 > 0.
And you're saying that, really, the effect of (FIP_1 - ERA_1) isn't linear and should potentially have a different effect for FIP_1 - ERA_1 > 1. So, you could just alter the regression to let b2 change at certain cutoffs.
Might be interesting...
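To make the cutoff idea concrete, here's a toy version (all numbers invented) where b2 is allowed to shift when FIP_1 - ERA_1 > 1:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Hypothetical first-half stats; second-half ERA is built so the gap
# effect changes once FIP runs a full run above ERA (for illustration).
era1 = rng.normal(4.2, 0.8, n)
gap = rng.normal(0.0, 0.7, n)             # FIP_1 - ERA_1
big = (gap > 1.0).astype(float)           # cutoff dummy
era2 = (1.0 + 0.6 * era1 + 0.3 * gap + 0.4 * big * gap
        + rng.normal(0, 0.5, n))

# ERA_2 = a + b1*ERA_1 + b2*gap + b3*(gap > 1)*gap + e
X = np.column_stack([np.ones(n), era1, gap, big * gap])
coef, *_ = np.linalg.lstsq(X, era2, rcond=None)
```

The regression recovers both the baseline effect of the gap and the extra effect above the cutoff; you could add more cutoffs the same way.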
Your main point seems to be that Corr(FIP for 1st half, ERA for 2nd half) is higher than Corr(ERA for 1st half, ERA for 2nd half) for the overall sample, but that the two are about the same when you select on a group that has a much higher FIP than ERA in the first half. That doesn't really mean FIP is a bad predictor for that group though - just within that group. You've eliminated a _major_ source of information for that group - the fact that FIP is 1+ runs greater than the ERA.
The inputs to FIP are linear so when there are non-linearities, FIP is going to be wrong, most likely in the outliers. You've basically selected on the outliers, meaning the information provided by FIP is relatively noisy within that group. I'm not saying anything 100% contradicting what you said. I just wouldn't say that a variable isn't a useful predictor when you've selected the sample in a way that should make it a poor predictor.
Right, non-linearities are definitely an issue though even if they weren't, I'm not sure this type of analysis would be helpful. I, also, would say teams should use "private information," though I have to admit that feels like a cop-out since even with that information, I'm not sure how I'd (correctly) use it.
I don't think using revenue/game buys you much unless you also want to identify the effect of "record in the last 10 games" (or something similar). There's no variation in "won the World Series last year" game-to-game within a season.
Oh, great article, Matt.
(Apologies - I realize this might be kind of annoying to some)
Another way of saying some of what I just wrote is...maybe write down what you think the correct specification is? What you wrote in your last post assumes the same effect for each team (which is what you're arguing against).
You want this?
Revenue_it = a_i + (X_it)' b_i + e_it
where X includes Win%, playoff success vars, etc. And you're assuming X is full rank for each team (not necessarily the case - eg. Nationals).
1) But you're still doing a time-series regression. You want team fixed effects AND team-specific coefficients. That's equivalent to doing a separate regression for each team (algebraically, assuming you would include the same variables for each team). Your identifying assumption, then, is that changes in playoff appearances or Win% are exogenous and uncorrelated with the year fixed effects. More importantly, how are you going to identify the revenue boost for a playoff appearance for the Nationals? Or, less extreme, for the Pirates? Are you planning to run this regression for the Pirates back to before 1992 and assume that coefficient is relevant to today? I would think the "year fixed effect" is crucially important in this case and that's just not identified (in each separate regression). That's not your fault and there isn't really much of an alternative, but I'm trying to figure out what you're proposing.
2) You would need to instrument both variables (the fact that they are or are not correlated with one another is irrelevant). I'm guessing that good instruments are hard to find here. Definitely for Win%, though possibly less so for playoff success (try to use some randomness in the outcomes?). But I'd still feel pretty uncomfortable with the results in a time-series regression.
Again, none of these problems are your fault, but I'm trying to figure out what assumptions you're willing to make. I actually think estimating separate coefficients would be very useful, but we just need to be clear on the restrictions that must be applied.
My guess is that people would definitely rate the articles if given a chance.
I would also recommend allowing the BP staff comments to be rated. It seems pretty strange that they are currently "exempt."
So, just to clarify, you want Matt (or someone) to run the following regression for each team:
Revenue_t = a + b1 * (Win%_t) + b2 * (Playoffs_t-1) + (other championship variables at time t-1) + e
Is that right? What would your instruments be exactly? Would you really trust a time-series regression to give you accurate estimates? (I suppose you can estimate this for all teams simultaneously and include a year fixed effect which is constant for all teams, but I don't think that really solves the problem).
The sentence implies that the 5th starter is replacement level.
I think your charts don't provide the right information. We want to see the change in pitcher performance holding the pitcher constant. You have "time through order" changing, but the composition is also changing. Instead, tell us the change in performance for each pitcher who made it through twice relative to how he did the first time. Then the change in performance for each pitcher who made it through three times relative to how he did the second time. And so on. Right now, those numbers aren't really that meaningful.
You could and probably should include everyone and just use analytical weights (based on PAs) to account for the fact that you're more confident in some observations than others. Basically, this is just a heteroscedasticity issue.
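A minimal sketch of what I mean (all numbers invented): weight each observation by sqrt(PA), which is exactly WLS when the noise variance is proportional to 1/PA.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# Hypothetical projection check: actual = a + b * projected + noise,
# where the noise variance shrinks with playing time (more PAs -> less noise).
projected = rng.normal(0.340, 0.030, n)        # e.g. projected OBP
pa = rng.integers(50, 650, n)                  # plate appearances
actual = (0.02 + 0.95 * projected
          + rng.normal(0, 1, n) * (0.5 / np.sqrt(pa)))

# Analytical weights: multiply each row by sqrt(PA) and run OLS on the
# transformed data - low-PA players count, they just count for less.
w = np.sqrt(pa)
X = np.column_stack([np.ones(n), projected])
coef, *_ = np.linalg.lstsq(X * w[:, None], actual * w, rcond=None)
```

Nobody gets dropped at an arbitrary cutoff, and the slope on the projection is recovered efficiently.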
That doesn't completely solve the problem of how to check the accuracy of a projection system. The weights (or an arbitrary cutoff) are partially a function of how good the projection is doing (players who are playing way below expectations get fewer PAs). But weights are much better than dropping based on some cutoff.
This actually ended up being my favorite for this week. Matt's biggest "mistakes" were really that he (1) tried to introduce completely new ideas and (2) admitted nuance. You're not supposed to do that on sports radio but that doesn't mean it's wrong. If I had never read any of Matt's work before, this would have been the first time I ever learned something from a sports radio interview/podcast. In other threads, Will has suggested that you're "supposed" to briefly address a subject and then leave it at that unless the host asks for more. But that's why radio interviews suck. I sincerely don't mean this as a slight against the current BP staff (since it's the fault of the medium), but I have never heard any of them say anything interesting on the radio/podcast. It's always, "Player A will eventually start hitting, but maybe not." And that's it. And they make comments which if anyone else had made, they'd write a column detailing why that statement was stupid. Again, that's the fault of the medium - it's difficult not to do that. But Matt actually said new and smart stuff and discussed some more abstract ideas. And I appreciated that.
Also, he did use the word "collusion" correctly and I was happy to hear someone finally do that.
No, jkaplow21 is right. You're saying that the effect Matt finds is biased downward since some of the observations he's considering "treated" are actually "untreated." So, really, the effect is bigger. That's not worth a criticism.
I think this was Brian's best piece. I don't see any "heavy duty stats," Will. He uses percentages. I think we should allow the contestants to divide. At this rate, you're going to have Brian assigning different types of emoticons to players within 2 weeks instead of using any informative numbers.
Brian, just a suggestion for an extension (and not a negative comment on this article at all): maybe quantify the tradeoff between catcher defense (throwing out runners) and offense that's evolved over time. Very interesting stuff.
But notice that it didn't make a difference. Just like in the Idol piece you're referring to, the Law of Large Numbers takes care of this.
Is the draft on TV this year?
Matt, great article. I don't think it's worth obsessing over the correct standard error here (I was fine with what you initially did) but since we are, I'll chime in. I think Tim makes a good point that the draft picks are not independent. If Strasburg doesn't go #1, he likely goes #2 and definitely goes in the first 2 rounds. The fact that a college player is picked #1 isn't that important if he would've been picked in the first 2 rounds both pre- and post-Moneyball. The underlying experiment, then, is...was there a "shock" that caused more college players to be drafted in the first 2 rounds post-Moneyball? This doesn't really affect the top players. Basically, the observation level is really just the year. You could also consider adjusting for "clustering" by team as well. And then getting the right standard error becomes a paper in itself (get the variance when you cluster by year, the variance when you cluster by team, the variance when you cluster by team-year...the right SE is sqrt[(1)+(2)-(3)]). Honestly, I'm not even sure that's right. I'm fine with the original stat you gave us.
(Note to Matt and other contestants: I worry that everyone's going to start shying away from any stats work since the comments seem to jump on small details every time - I'm guilty of this as well - but I'd like to encourage you to just do it anyway.)
The Central Limit Theorem takes care of this issue: the binomial converges to the normal distribution.
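A quick check of that convergence (n and p invented): once n is large, the binomial count behaves like a normal with mean np and variance np(1-p), so the usual +/-1.96 SD interval covers about 95% of draws.

```python
import numpy as np

rng = np.random.default_rng(5)

# n Bernoulli(p) trials, simulated many times.
n, p = 500, 0.3
counts = rng.binomial(n, p, size=200_000)

# Standardize using the binomial mean and standard deviation.
z = (counts - n * p) / np.sqrt(n * p * (1 - p))
# Fraction within 1.96 standard deviations ~ 95%, as the normal predicts.
coverage = np.mean(np.abs(z) <= 1.96)
```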
What the heck? Indicator variables can have a standard deviation. That means they have a t-test and a p-value. Why do you think this is wrong? I've never known anyone who thought this.
I liked this article, but for the purposes of voicing an opinion that I'm sure many other readers share - I couldn't have cared less about the quotes. Honestly, I didn't even read them the first time. After seeing the judges' comments, I went back and re-read them and I'm pretty sure the article is the same with and without them. I could've guessed that Stanton was raw and worked hard. I'm not saying all industry quotes are bad. But let's not start scoring by saying, "Ok, point for random industry quote..."
Great point. I found this mistake crippling as well. Matt, your article isn't relevant to my 1913 fantasy league. Please rectify that.
(Great article, Matt.)
Yeah, this comment is ridiculous. To kind of relate this to American Idol...say this were a general talent show and this was "music week." The next Whitney Houston (or whatever, pick someone who's a talented singer) performs and nails her performance. Will's comment would be, "Great singing. But I like drums." More ridiculous when you consider that the audience watching the show really likes singing. This analogy isn't working so I'm going to stop. (Also, I'm not suggesting Tim will develop a drug addiction and lose his ability to write good articles.)
This was a great article. I had never really considered putting together my own fantasy program until this article and it gave me a useful framework to use. I also probably wouldn't have used Excel since I typically use other statistical programs, but Tim told me the exact commands that I would use in Excel. I'm much more likely to use Excel now. Should he have detailed the VLOOKUP command? Definitely not. I'm happy to research that myself. There really wasn't anything complicated in the article so I don't understand the complaint.
Great article, Tim. Obvious thumbs up. One possibility for future improvement (I realize that your main point here was not to come up with the perfect measure): I think you might be able to get a more useful number than the MPV one you suggested. Fantasy players don't really care that they added 10% (on average) to their categories because 10% means different things in different categories. Do you think using standard deviations for each category (and then averaging) makes more sense?
Thanks for the lengthy response. I misinterpreted how you were doing this and it was 100% my fault.
I'm guessing you're not responsible for this methodology, but I still don't think it's optimal. Admittedly, I'm surprised that - holding team quality constant - the park factors converge to the correct numbers. I'm not entirely convinced this holds generally, but I'll take it as given for now. I'm less surprised that if you allow heterogeneity in team quality but force all parks to be the same, that this method works fine as well. However, I'm pretty sure that this gets the wrong answer once you let teams have different HR rates and stadia have different park factors.
Instead of iterating, there's a pretty straightforward way to get park factors. The problem with unbalanced schedules is that they weight some parks more than others. Just eliminate the implicit weights. Using your example above (A vs. C), I don't need to iterate. The weighted version gives A = [.04*4]/[2*.04 + .04 + .06] = .89 and C = [.04*4]/[.04 + .04 + 2*.06] = .8, and you then iterate to get C to converge to A. Instead, you can calculate the values directly by "unweighting" to a balanced schedule (and including the park's own HR rate in both the numerator and denominator): A = [.04*4]/[.04 + .04 + .04 + .06] = .89 and C = [.04*4]/[.04 + .04 + .04 + .06] = .89.
In other words, you don't want to use a team's aggregate road numbers and aggregate home numbers. Each teamA-teamB matchup is an observation - calculate the ratio and don't weight any matchup more than any other. Just find the park factor for each team-team matchup and aggregate.
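Here's the unweighting from my example as a few lines of code (toy numbers: three identical parks plus one Coors-like park at .06). Parks A and C have identical true rates but, in the weighted version, different schedules; counting every park exactly once gives them identical factors:

```python
# True HR rates per park (invented numbers from the example above).
own = {"A": 0.04, "B": 0.04, "C": 0.04, "Coors": 0.06}
parks = list(own)

def unweighted_pf(park):
    # Count every park (including the team's own) exactly once,
    # regardless of how many games the schedule actually put there.
    return own[park] * len(parks) / sum(own.values())

pf_A = unweighted_pf("A")   # [.04*4]/[.04+.04+.04+.06] ~ .89
pf_C = unweighted_pf("C")   # identical park -> identical factor
```

With real data you'd compute the ratio within each team-team matchup first and then average the matchup-level ratios, which is the same "each matchup counts once" principle.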
I'm afraid I haven't explained my point well. I don't think the advantage of what I'm proposing is just to eliminate iterating. By using one aggregate ratio, you can't separately (correctly) identify the team effect and the stadium effect. Instead, you need to separate things by team and by park - that separately identifies each one, allowing you to identify the park factor.
Again, thanks for responding! This is a helpful discussion.
Thanks for the replies, Matt. It's greatly appreciated.
1) I don't think it's good practice to use the predicted BABIP to get a standard deviation measure of true ability. For any regression, you have explained and unexplained variance. You've only used the explained variance but since you have a poor (noisy) measure of true ability, this doesn't really give you the right number (e.g. regress wages on IQ and get predicted wages -> you wouldn't claim the std dev of the predicted wages is the correct std dev for wages since there's so much unexplained variance). Anyway, this actually isn't that big of a deal to me.
2) At the risk of coming off snarky, your theory is completely wrong (or at least illogical). You're just saying mixed strategies lead to better performance. So does throwing harder than 55MPH. Your point seems to be - all pitchers use mixed strategies (equally well?) and mixed strategies lead to randomness. By your logic, we should see no correlation in HR rates as well. It would be incredibly interesting (though, I'm guessing, difficult) if you did a study on pitch types/location to study patterns, looking at whether predictability affects BABIP. But you just can't prove it with the stats you've presented.
The explanation for this theory really has to center around why HR rates are highly-correlated across years, but BABIP is not. The general answer is noise. Take a HR ball and move it over 1 foot - probably still a HR. Now take a grounder and move it over one foot - the probability that the outcome changes (out->hit) is higher. Or don't move it over at all and the probability of an outcome change is still non-zero (fielder is a step slow in one scenario). This "extra randomness" just means more noise. Noisy measures are less-correlated.
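To see how noise alone produces exactly this pattern, here's a toy simulation (all numbers invented): two stats driven by the same fixed talent, one measured with a little noise (HR rate) and one with a lot (BABIP).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000

# Each pitcher has a fixed true talent. "HR rate" is talent plus small
# noise; "BABIP" is the same talent plus much larger noise (balls in
# play sit closer to the out/hit boundary, per the argument above).
talent = rng.normal(0, 1, n)
hr_y1, hr_y2 = (talent + rng.normal(0, 0.5, n) for _ in range(2))
babip_y1, babip_y2 = (talent + rng.normal(0, 3.0, n) for _ in range(2))

corr_hr = np.corrcoef(hr_y1, hr_y2)[0, 1]          # high year-to-year
corr_babip = np.corrcoef(babip_y1, babip_y2)[0, 1]  # low, from noise alone
```

The underlying "ability" is identical in both cases; the noisier measure is simply less correlated across years, no game theory required.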
Anyway, I really do appreciate the responses and I did vote for you (one of 2).
I think this was clearly the best. But it's traditional for commenters to point out the one thing they found annoying so I'll mention this: "more sophisticated tools from optimization theory like dynamic programming." This seemed a bit much. Would you really ever need anything more than a straightforward simulation?
But, really, great job!
I agree that I think this article ended up overreaching. There are plenty of issues with park factors that could've been explained before (and instead of) starting to calculate "time-invariant park factors."
For example, these two sentences really bother me: "After the initial calculation of each park's factors, use those to normalize each team's road statistics and rerun to generate a new version of factors. A third time is even better, but more than that doesn't add any meaningful accuracy."
I would've loved to hear more about why this is the accepted method since (as I commented above), I would assume this actually gets the wrong answer. Even if it's not a Basics article, it's always good to start at the beginning and explain why the community does something in a certain way. Instead, this article just glosses over a pretty major adjustment when there was plenty of opportunity to discuss it in detail.
General question for anyone: is there an explanation/justification for this method somewhere?
.010: Totally not your fault. Not sure I'd agree with the final number, but I'll side with you since you know the literature better.
Denominator: No, you interpreted correctly (PA vs. BIP). Not the main influence, but it probably has some effect.
Mixed strategies: I just don't see how mixed strategies explain the low correlation in BABIP. A few posts below you talk about what would happen if Eck had always thrown the same pitch on 3-2 counts. Yeah, he'd probably be a bad pitcher. But what does this have to do with BABIP?
Are you trying to explain the low line drive rate overall or the low correlation in BABIP? If all pitchers are using mixed strategies equally well, then it's probably not the reason for the low correlation.
Here's an analogy: What would happen if Eck could only throw 55 MPH? His line drive rate would probably skyrocket. Since his line drive rate was low, I can assume he threw more than 55 MPH. I guess I can assume every pitcher throws more than 55 MPH because of the low line drive rate in the majors. But that doesn't mean I can say anything about it explaining BABIP correlations. I get that mixed strategies are good for pitchers' performance stats. They have to be. But why does this affect BABIP correlation? There's a major link missing here in your argument.
I'm not positive where your .010 number is coming from but that's likely my fault. I do think that a 0.149 correlation is actually pretty high for a measure that noisy. For one, it just seems like a noisy measure on its own. Second, the denominator is smaller than the other measures (fewer observations->higher variance). And you're subtracting out the team average (which is measured with error itself). This is all fine on your part, but it just means we end up with a noisy measure and a low observed correlation.
I'm sure that mixed strategies are important, but your conclusion doesn't really make sense. If a pitcher threw the same pitch in the same location, the correlation for strikeouts and homeruns would also be pretty high. And they are. The correlation for BABIP would probably be smaller because there would be lots of noise. This is exactly what we see.
So, the evidence doesn't really make your point. I do assume that all pitchers use a mixed strategy (though an article about this using pitch type/location data would be really interesting). But your argument seems to be: pitchers want to prevent line drives, there's no correlation in line drive rates for consecutive seasons -> pitchers used a mixed strategy? Not sure how we reach that conclusion. Are you saying all pitchers used a mixed strategy and mixed strategies lead to randomness? If so, why the correlation in homerun rates (which pitchers are also definitely trying to prevent)?
I'm just missing the link. Are all pitchers using a mixed strategy? Equally well? And then mixed strategies lead to randomness? Or they just lead to better outcomes in general?
This might just be a factual question for Brian (or anyone else who's calculated park factors before). To adjust for the unbalanced schedule, it looks like you are doing the following steps: (1) Calculate park factors assuming balanced schedule; (2) Adjust the team's road stats based on park factors found in step 1; (3) Recalculate park factors; (4) Iterate for as long as you want (though it stops mattering very quickly). Why do you do it this way? Is this just the traditional way to do it? (I really have no idea.)
My bigger point is that, unless I'm missing something, this method gets the wrong answer, doesn't it? Say every stadium is exactly the same, except for Coors, which dramatically increases HRs. Team A plays a disproportionate number of its road games at Coors, so its stadium ends up with a park factor of 0.9, while Team B ends up with 0.97 (the numbers are illustrative only - not sure they really make sense). Stadium A and Stadium B _should_ end up with the same park factor, but this would never happen. Because A and B are exactly the same, they (on average) have the same number of HRs hit in them. Yet, after your adjustment, it looks like Stadium A has 0.9/0.97 as many HRs as Stadium B. The park factors should definitely converge quickly, but they're converging to the wrong numbers. Am I off-base?
I can think of some straightforward ways to get the right numbers (which, honestly, I've always just assumed were used to get PFs) so I'm wondering why this method was used. Thanks!
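For anyone who wants to play with the question, here's a stripped-down, noise-free toy version of the iteration as I understand it (three parks, invented schedule shares and HR factors, park factor = home rate over adjusted road rate, normalized each pass). It's a sketch of the question, not an answer - the real formula, other stats, and sampling noise could all behave differently.

```python
import numpy as np

# Toy setup (all numbers invented): three parks with true HR factors
# 1.0, 1.0, 2.0 ("Coors"). Rows of S give the share of each team's
# road games played in each park -- deliberately unbalanced.
true_factor = np.array([1.0, 1.0, 2.0])
S = np.array([[0.0, 0.5, 0.5],   # Team A plays half its road games at Coors
              [0.8, 0.0, 0.2],   # Team B plays there far less
              [0.5, 0.5, 0.0]])  # the Coors team itself

home_rate = true_factor.copy()   # HR rate observed in each home park
pf = np.ones(3)                  # step 1: balanced-schedule starting guess
for _ in range(200):
    # steps 2-3: deflate each road stat by the current estimate for the
    # park it was recorded in, then recompute the factors
    adj_road = S @ (true_factor / pf)
    pf = home_rate / adj_road
    pf /= pf.mean()              # normalize to a league average of 1

print(np.round(pf / pf[0], 3))
```

In this idealized version the two identical parks do get pulled back together by the iteration, so I'd genuinely like to hear where the real implementation differs from this sketch.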
I have to completely disagree with KG - I really liked the description and analysis of the first 80% of the article, but the last 2 paragraphs are a disaster. Obviously, mixed strategies are likely important to pitchers. But do they explain the low year-to-year correlation in BABIP? I have no idea and there's no reason to believe so based on the statistics presented in this article. The last paragraph is just presented as "I outlined the history of a weird finding and here's a random theory that I will claim explains it."
The author says, "It's not that pitchers don't control BABIP—it's that pitchers barely differ in their abilities to control it..." This is completely wrong. Simply because there's a weak correlation between two numbers does not mean that pitchers have little control. The measure in question could just have a lot of noise, so the correlations naturally weaken. There are tons of possible reasons for this noise, and throwing in a game theory explanation isn't really necessary or helpful.
Anyway, I sincerely liked the first part of this article. If it stopped before the last paragraph, I would consider it the top entry.
1) I just meant that I'm sure your chaining method is a better predictor for players who were moved up level-by-level (since the method does something similar). The Direct Method is probably really great for those that were directly promoted to the majors since you're just checking the average against itself.
3) To summarize my position - I don't think your test tells us anything about biases since it's self-reinforcing. Given that, we have to evaluate the methods on their own. I'm sure there will always be some bias at least, but we should try to move towards less bias. Your selection method seems to make it much, much worse. Now, for a projection system, this isn't necessarily bad. You don't want translation ratios for random players - you want ones for promoted players because you want something that's descriptive of what does happen. If you were a GM and you wanted to start promoting people based primarily on stats, then you want the ratios for a randomly-promoted player (your first sentence). I would think this would be an interesting avenue for research - finding random (non-performance) shocks that cause players to get promoted (injuries at higher levels?).
1) Right, I was focusing more on the other two methods. The larger point is that a higher correlation does not mean a method is better. My intuition is that the Direct method would produce higher conversion ratios, but the fact that it correlates better with the entire sample is probably just chance and sample composition. The Direct method, mechanically, predicts better for those who were promoted all the way to the majors; the Chain method, mechanically, predicts better for those who were promoted one class at a time. Basically, the way you're checking the validity of your methods is pretty mechanical, so don't draw any conclusions about which method is better from it.
2) Just to help explain how you practically did this - why would the MLB Chained and MLB Direct methods produce different results for AAA? It seems like those should be exactly the same.
3) Here's the broader point: You start out saying, "Major League Equivalencies (MLEs) are a set of formulas that will translate a player's minor league statistics into those that he would be expected to produce if he was in the major leagues." I agree with this - we want to know the expected production of a randomly promoted AAA hitter given his stats. This is different from what you later say, "If the MLEs are being used to judge how well a player will perform if and when he makes the majors, is it correct to base the factors partly on the records of players who failed to advance?" The promoted players are not random, biasing the results. All MLEs (that I know of) have this problem, but you exacerbate it by selecting the sample based on the best-of-the-best, the ones that eventually make the majors (or, worse, the ones that get > 2.5 PA/G). Yes, the eventual translations are better predictors for that group, but that would be true no matter how you sliced the sample - the average for a group is a better predictor for that group than some other average. But is it useful outside of that group in some way? Improvements in MLEs need to answer that question.
I think I don't understand who's in the sample to get the MLEs and then who's in the sample to test the accuracy of the predictions. It's not surprising that, when you limit the sample to players who make the majors, the predictions for major league players are better. You've eliminated the players who didn't get past Double A and Triple A and only kept the exact same players you're using to test the different methods (or, at least, a sample that is very similar). Do you include the "pinch hit" players in the final sample to test the methods?
Take your method to a different (admittedly, strange) degree. Say I take the Triple A stats and major league stats for everyone who makes an All-Star team and calculate conversion rates. Then I test the accuracy of this prediction by looking at the sample of All-Star players. You'll inevitably get a strong prediction, but that doesn't make it useful. I'm interested in figuring out who, given their Triple A stats, is going to make an All-Star team. Instead, I have the conversion rate of the players who did make All-Star teams. That's probably not representative of the average Triple A player.
In your context, I want to know who should play in the majors given their minor league stats - not the conversion rates of those who did play in the majors. This isn't your cross to bear, obviously, but it should be addressed when altering the method to get MLEs. A better correlation between predicted stats and actual stats doesn't mean the method is more useful if it loses some of its representativeness.
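To put a number on the selection problem, here's a toy simulation (all parameters invented): AAA performance is skill plus luck, players get promoted only when their observed AAA line clears a bar, and MLB performance is a fixed fraction of skill plus fresh luck. The conversion ratio measured on the promoted group is not the one a randomly promoted player would produce.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Invented model: AAA performance = skill + luck; MLB performance is a
# fixed fraction of skill plus fresh luck
skill = rng.normal(100, 10, n)
aaa = skill + rng.normal(0, 10, n)
true_conversion = 0.8
mlb = true_conversion * skill + rng.normal(0, 10, n)

# Promotion is selected on observed AAA stats, not on skill directly
promoted = aaa > 110

ratio_everyone = mlb.mean() / aaa.mean()           # random-promotion benchmark
ratio_promoted = mlb[promoted].mean() / aaa[promoted].mean()

print(round(ratio_everyone, 3), round(ratio_promoted, 3))
```

The promoted group's AAA stats are inflated by the good luck that got them promoted, so the ratio estimated from them systematically differs from the population ratio - which is exactly why a better fit on the selected sample doesn't tell you the MLEs got more useful.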
I'm guessing this is not an original point, but the big problem with MLEs seems to be the interpretation. The desired (in my opinion) interpretation is, "What can we expect a player in Double A to produce if promoted to the majors given his stats?" But, instead, we get the expectation of Double A players _that were promoted_. This is a selected sample composed of players who, on average, are more ready for the next level given their stats. If you promoted a player at random, it's unlikely you'd get the same stats. The difference is basically whether you want MLEs to be descriptive of what does happen or predictive of what would happen in other situations.
I bring this up because the different methods described above are different ways to select the sample. You're not just changing the "elapsed time" or accounting for the "pinch hit penalty." You're also changing the bias due to selection. By selecting based on plate appearances per game, you're dropping players who started in Triple A but were only good enough to pinch hit in the majors, while keeping players who started in Triple A and were good enough to play regularly in the majors. Your sample ends up including only the successful players. Similar points can be made for the other permutations.
This isn't a fatal flaw, but I think the differences in the results need to also be discussed in this light. Thoughts?
I like this topic, but I'm not sure the method gets the right conclusion. My initial guess is that it's biased towards finding a larger effect for defense.
Say that there are teams that are primarily focused on run prevention. These teams are going to, on average, get better pitchers and better defenders. So teams with good defenses are going to look good for two reasons: 1) defense saves runs; 2) they have better pitchers. A simpler way to say this is that you found the correlation, but it doesn't imply causation. (To ramble a bit: we care about what should happen to run prevention when a team improves its defense. You found the run prevention of teams with different quality defenses.)
I can think of other stories too (some even go in the other direction), but the larger point (correlation vs. causation) still holds.
Here's an idea that I think solves this problem and gives you a fantastic topic for the future: Do this by pitcher. Look at the relationship between a pitcher's ERA and the DER he got in that year. Look at how the pitcher's ERA changes when the DER changes year-to-year. That would be pretty cool. This keeps the quality of the pitchers "fixed."
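Here's a toy version of that pitcher-level design (all numbers invented) showing why it helps: when run-prevention teams get both better pitchers and better defenses, the cross-sectional slope of ERA on DER mixes the two effects, while within-pitcher year-to-year changes cancel pitcher quality and recover the defense effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Hypothetical confounding story: run-prevention teams get both better
# pitchers and better defenses, so pitcher quality and DER are correlated
team_focus = rng.normal(0, 1, n)
pitcher_quality = -0.3 * team_focus + rng.normal(0, 0.3, n)  # in ERA points
der_y1 = 0.70 + 0.01 * team_focus + rng.normal(0, 0.005, n)
der_y2 = 0.70 + 0.01 * team_focus + rng.normal(0, 0.005, n)

true_defense_effect = -30.0  # invented: ERA change per unit of DER

def expected_era(der):
    return 4.5 + true_defense_effect * (der - 0.70) + pitcher_quality

era_y1 = expected_era(der_y1) + rng.normal(0, 0.2, n)
era_y2 = expected_era(der_y2) + rng.normal(0, 0.2, n)

# Naive cross-section: slope of ERA on DER mixes defense and pitcher quality
naive_slope = np.polyfit(der_y1, era_y1, 1)[0]
# Within-pitcher changes: pitcher quality cancels out of the differences
within_slope = np.polyfit(der_y2 - der_y1, era_y2 - era_y1, 1)[0]

print(round(naive_slope, 1), round(within_slope, 1))
```

In this made-up setup the cross-sectional slope overstates the defense effect, while the year-to-year-change slope lands near the true value, because differencing holds pitcher quality fixed.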
Needless to say, the fact that I even spent time thinking about this means this was a really interesting article.