
Bio: I’ve been a stats-geek since before Bill James started self-publishing Abstracts, and have been a Prospectus subscriber for about ten years now. Recently, I’ve written for Seamheads, StatSpeak and FanGraphs, and am waiting for the call-up to the big leagues. My personal interests and writing have focused on statistical analysis, and I must admit I’ve been somewhat disappointed in the number of research articles published at BP since Dan Fox left last year. Eric has been a welcome addition, and I am hoping that I can also contribute to BP’s publishing of analysis.

Entry:
Major League Equivalencies

Major League Equivalencies (MLEs) are a set of formulas that translate a player’s minor league statistics into those he would be expected to produce if he were in the major leagues. They form a part of most projection systems, including BP’s PECOTA and Davenport Translations. Modeling the level of competition at each stop in the minors can prove to be much more daunting than dealing strictly with major league data, as there are several potential selection biases which can markedly affect the accuracy of the projections. This study tests four questions raised by those biases:

1. How much elapsed time should be allowed between samples?

2. Does including bench players, who may suffer a pinch hit penalty, bias the factors?

3. Should all players be sampled, or only those who advance all the way to MLB?

4. Should the lower minors be compared directly to MLB, or to the next highest level?

The results show a wide variance in HR and SO rates, and increasingly large overall discrepancies in projections from Double-A and High-A. The best approach might not be the one you would expect.

To test these scenarios, I created matched pairs of batting data with different selection criteria in order to calculate the ratios between major and minor league performance using each method. In the following tables, each factor is the expected ratio of a player’s major league rate to his minor league rate. For example, if a player has a BB% of .100 in High-A and the factor is 0.71, he would be expected to have a BB% of (.100 x .71) = .071 in the majors.
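The article does not spell out how each factor is computed from the matched pairs, so the following is only a sketch of the idea. It assumes a simple pooled ratio in which each pair is weighted by the smaller of its two PA totals; that weighting, the `Line` class, and the function names are all hypothetical. The last line applies a factor to a minor league rate, reproducing the BB% example above.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical container: one player's PA and walks at one level."""
    pa: int
    bb: int


def pair_factor(pairs: list[tuple[Line, Line]]) -> float:
    """Estimate a BB% factor from (minor, major) matched pairs.

    Assumption (not from the article): each pair is weighted by the smaller
    of its two PA totals, so a brief stint at either level counts for less.
    """
    num = 0.0  # weighted sum of major league BB%
    den = 0.0  # weighted sum of minor league BB%
    for minor, major in pairs:
        w = min(minor.pa, major.pa)
        num += w * major.bb / major.pa
        den += w * minor.bb / minor.pa
    return num / den


def translate(minor_rate: float, factor: float) -> float:
    """Expected MLB rate = minor league rate x level factor."""
    return minor_rate * factor


# The High-A example from the text: .100 BB% with a 0.71 factor.
print(round(translate(0.100, 0.71), 3))  # prints 0.071
```

Weighting by the smaller PA total is just one reasonable choice; weighting by a harmonic mean of the two PA totals would serve the same purpose.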

By recording statistics for each player in each season, we are taking a sample, over a given period of time, which estimates the player’s “true talent” in each of the various categories. As a player ages through his twenties, he will on average lose speed, but gain power, strike zone judgement and contact skills. If it takes two years of statistics to get a good measure of a player, at the end of the two years he is likely not exactly the same player he was before. It would then make sense not to let too much time elapse between the two sets of stats being compared.

With such a time constraint, there are not enough players at Double-A or lower who can be compared directly to their major league stats. In my first test, I set a time restriction of one year and collected all batting stats for Class A Advanced (A+), comparing them to what the same players did in Double-A (AA) in the year before, the same year, or the year after. Double-A was compared to Triple-A, and Triple-A to MLB. To calculate the factors for the lower minors, the results must be “chained”: to get the factor from High-A to MLB, take (A+ to AA) times (AA to AAA) times (AAA to MLB). This first test, with all players and chaining, is labeled “All Chained”.
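The chaining step can be sketched as a running product. The article publishes only the chained results, so the one-step factors below are hypothetical numbers chosen for illustration, as are the names `STEP_FACTORS` and `chained_factor`.

```python
from math import prod

# Hypothetical one-step factors for a single category (e.g. HR rate).
# Each entry is the factor from that level to the next level up.
STEP_FACTORS = {
    "A+": 0.97,   # A+  -> AA
    "AA": 0.96,   # AA  -> AAA
    "AAA": 0.78,  # AAA -> MLB
}
LADDER = ["A+", "AA", "AAA"]


def chained_factor(level: str) -> float:
    """Multiply the step factors from `level` up through AAA -> MLB."""
    start = LADDER.index(level)
    return prod(STEP_FACTORS[lvl] for lvl in LADDER[start:])


for lvl in LADDER:
    print(lvl, round(chained_factor(lvl), 4))
```

High-A needs three multiplications, Double-A two, and Triple-A one, which is why any error in a single step is carried into every level below it.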

Each category is expressed as a rate:

- SDT = (H-HR)/(AB-SO-HR) {Singles, doubles, triples}
- DO = DO/(AB-SO-HR) {Doubles}
- TR = TR/(AB-SO-HR) {Triples}
- HR = HR/(AB-SO) {Home runs}
- HP = HP/(AB+HP+BB) {Hit by pitch}
- BB = BB/(AB+HP+BB) {Walks}
- SO = SO/(AB+HP+BB) {Strikeouts}
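A minimal sketch of these rate definitions, reading the SDT numerator as H - HR (singles + doubles + triples). The function name and the sample stat line are made up for illustration. As in the formulas, the PA denominator is approximated as AB + HP + BB, so sacrifices are ignored.

```python
def rates(ab: int, h: int, do: int, tr: int, hr: int,
          hp: int, bb: int, so: int) -> dict[str, float]:
    """Compute the seven rate stats used for the level factors."""
    bip = ab - so - hr   # at-bats that were neither strikeouts nor home runs
    pa = ab + hp + bb    # plate appearances, ignoring sacrifices
    return {
        "SDT": (h - hr) / bip,  # singles + doubles + triples per ball in play
        "DO": do / bip,
        "TR": tr / bip,
        "HR": hr / (ab - so),   # home runs per non-strikeout at-bat
        "HP": hp / pa,
        "BB": bb / pa,
        "SO": so / pa,
    }


# Hypothetical season: 500 AB, 140 H, 30 2B, 3 3B, 20 HR, 5 HBP, 50 BB, 100 SO
line = rates(ab=500, h=140, do=30, tr=3, hr=20, hp=5, bb=50, so=100)
print({k: round(v, 3) for k, v in line.items()})
```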

```
All Chained
Level  SDT    DO      TR     HR     HP    BB    SO
AAA   0.90   0.94    1.02   0.78   0.85  0.82  1.20
AA    0.88   0.91    1.03   0.75   0.74  0.73  1.26
A+    0.84   0.92    1.03   0.73   0.66  0.71  1.33
```

One bias in using all players is the “pinch hit penalty”. It has been shown that most players do not hit as well coming off the bench as they do when starting and playing regularly. Because a disproportionate number of players at the higher level (particularly in the majors) play sparingly, the factors will be depressed. To account for this, I used only players who averaged more than 2.5 plate appearances per game at each level. This test is labeled “Min PA Chained”; the restriction makes the HR and SO factors, and to a lesser extent SDT, more favorable to the batter.
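The playing-time restriction amounts to a filter on each matched pair. In this sketch the `Season` record and the function names are hypothetical; only the "more than 2.5 PA per game at each level" rule comes from the text.

```python
from dataclasses import dataclass

MIN_PA_PER_GAME = 2.5  # threshold from the text


@dataclass
class Season:
    """Hypothetical record: one player's games and PA at one level in one year."""
    player: str
    level: str
    year: int
    g: int
    pa: int


def is_regular(s: Season) -> bool:
    """True if the player averaged more than 2.5 PA per game at this level."""
    return s.g > 0 and s.pa / s.g > MIN_PA_PER_GAME


def keep_pair(lower: Season, upper: Season) -> bool:
    """A matched pair survives only if the player was a regular at both levels."""
    return is_regular(lower) and is_regular(upper)


aaa = Season("example", "AAA", 2007, 120, 520)    # about 4.3 PA per game
bench = Season("example", "MLB", 2008, 60, 90)    # 1.5 PA per game
print(keep_pair(aaa, bench))  # the bench season removes this pair
```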

```
Min PA Chained
Level  SDT    DO    TR    HR    HP    BB    SO
AAA   0.91   0.94  1.02  0.80  0.84  0.82  1.18
AA    0.89   0.91  1.03  0.76  0.73  0.73  1.23
A+    0.85   0.92  1.02  0.74  0.66  0.71  1.30
```

The first two tests included many players who never advanced through every level and never reached the majors. If the MLEs are being used to judge how well a player will perform if and when he makes the majors, is it correct to base the factors partly on the records of players who failed to advance? For the third test, I compiled a list of 368 MLB “rookies” from 2003 to 2008. I defined a rookie season as one in which a player entered with 150 or fewer career major league plate appearances and accumulated more than 150 during that season. Factors were calculated using only the records of these 368 players, in seasons where they had 2.5 or more PA per game at each level. These results, labeled “MLB Chained,” show virtually the same factors at Triple-A as “Min PA Chained” (MLB Chained is limited to players who had more than 150 PA in their rookie season, while Min PA only requires 2.5 PA per game). It’s in the lower minors where larger factors favoring the batter are seen across the board.
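The rookie definition translates directly into a scan over a player's MLB seasons in order. The function name and the dictionary input format are assumptions made for this sketch.

```python
from typing import Optional


def rookie_year(mlb_pa_by_year: dict[int, int], limit: int = 150) -> Optional[int]:
    """Return the first season in which a player entered with <= `limit`
    career MLB PA and then batted more than `limit` times; None if never."""
    career = 0
    for year in sorted(mlb_pa_by_year):
        pa = mlb_pa_by_year[year]
        if career <= limit and pa > limit:
            return year
        career += pa
    return None


# Two short call-ups (130 PA total), then a full season in 2005.
print(rookie_year({2003: 40, 2004: 90, 2005: 500}))
```

A player who piles up more than 150 PA across several short stints before ever getting a full season never qualifies, which matches the two-part definition above.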

```
MLB Chained
Level  SDT    DO     TR     HR     HP     BB     SO
AAA   0.91   0.97   1.01   0.82   0.82   0.82   1.18
AA    0.90   0.94   1.02   0.81   0.71   0.76   1.22
A+    0.87   0.99   1.00   0.88   0.66   0.76   1.24
```

In case any bias or distortion remained in the method of chaining the factors through multiple levels, the fourth and final test compared the minor league records at each level directly to the MLB records compiled no later than one year after the player’s rookie season, with no other limit on elapsed time. This test, labeled “MLB Direct,” uses the same list of players and the same playing time criteria as “MLB Chained.” The differences are the direct comparison instead of chaining and, for the lower minors, a longer elapsed time between the records being compared, which builds more aging effects into the level factors. The Triple-A factors differ from “MLB Chained” because the samples do not need to be within a year of each other. Again, the HR and SO factors at Triple-A improve slightly for the batter, with larger gains in most categories in the lower minors.
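One way to read the two pairing rules side by side is sketched below. The function names are hypothetical, and the chained rule shown covers only the one-year window between adjacent levels described for the first test.

```python
def chained_pair_ok(lower_year: int, upper_year: int) -> bool:
    """Chained tests: adjacent levels, seasons at most one year apart."""
    return abs(upper_year - lower_year) <= 1


def direct_pair_ok(mlb_year: int, rookie_year: int) -> bool:
    """MLB Direct: any minor league season may be paired with MLB seasons
    compiled no later than one year after the rookie season."""
    return mlb_year <= rookie_year + 1


# A 2004 High-A season against 2008 MLB stats for a 2007 rookie:
print(chained_pair_ok(2004, 2008))  # too far apart for a chained pair
print(direct_pair_ok(2008, 2007))   # allowed under MLB Direct
```

The direct rule ignores the minor league year entirely, which is what lets High-A seasons several years old enter the comparison, along with whatever aging happened in between.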

```
MLB Direct
Level  SDT    DO     TR    HR    HP     BB     SO
AAA   0.92   0.98   1.01  0.85  0.85   0.82   1.15
AA    0.94   0.98   1.01  0.94  0.75   0.83   1.16
A+    0.93   1.03   0.99  1.08  0.72   0.83   1.14
```

At every level, “All Chained” has the least favorable factors for batters in SDT, DO, HR, BB and SO (with a few ties), while “MLB Direct” is the most favorable; only the small TR and HP factors break the pattern. Likewise, using only players who reached MLB makes those factors at least as favorable as using all players. There is little difference between methods for Triple-A, except in HR and SO. Going into the lower minors, the differences between chaining and direct comparison become more pronounced, because each level requires another multiplication to generate the final factors, which also multiplies any biases that exist between levels.
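The compounding argument can be made concrete with a toy calculation. The 3% per-step bias is a hypothetical figure, not a measured one, and `compounded_error` is an illustrative name.

```python
def compounded_error(step_bias: float, steps: int) -> float:
    """Relative error in a chained factor when each of `steps` multiplications
    carries the same relative bias (e.g. -0.03 = each step 3% too low)."""
    return (1 + step_bias) ** steps - 1


# Hypothetical 3% bias per step: AAA is one step from MLB, AA two, A+ three.
for level, steps in [("AAA", 1), ("AA", 2), ("A+", 3)]:
    print(level, f"{compounded_error(-0.03, steps):+.1%}")
```

Under that assumption the error grows from 3.0% at Triple-A to about 5.9% at Double-A and 8.7% at High-A, while a direct comparison takes only one ratio regardless of level.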

```
Method          Level  SDT    DO     TR     HR     HP     BB     SO
All Chained     AAA    0.90   0.94   1.02   0.78   0.85   0.82   1.20
Min PA Chained  AAA    0.91   0.94   1.02   0.80   0.84   0.82   1.18
MLB Chained     AAA    0.91   0.97   1.01   0.82   0.82   0.82   1.18
MLB Direct      AAA    0.92   0.98   1.01   0.85   0.85   0.82   1.15

Method          Level  SDT    DO     TR     HR     HP     BB     SO
All Chained     AA     0.88   0.91   1.03   0.75   0.74   0.73   1.26
Min PA Chained  AA     0.89   0.91   1.03   0.76   0.73   0.73   1.23
MLB Chained     AA     0.90   0.94   1.02   0.81   0.71   0.76   1.22
MLB Direct      AA     0.94   0.98   1.01   0.94   0.75   0.83   1.16

Method          Level  SDT    DO     TR     HR     HP     BB     SO
All Chained     A+     0.84   0.92   1.03   0.73   0.66   0.71   1.33
Min PA Chained  A+     0.85   0.92   1.02   0.74   0.66   0.71   1.30
MLB Chained     A+     0.87   0.99   1.00   0.88   0.66   0.76   1.24
MLB Direct      A+     0.93   1.03   0.99   1.08   0.72   0.83   1.14
```

Now that we have seen how the factors compare to one another, how do we judge their relative accuracy? The purpose of the MLEs is to show how well a player in the minors will perform if and when he reaches the majors. I took the list of 368 rookies from 2003-2008 to see how well each of the methods translated their statistics at each level, compared to each player’s MLB records.

Tom Tango’s Marcel system was used to generate the baseline MLB records. Marcel uses three years of data, weighted 5/4/3. I generated the Marcels one year after each player’s rookie season, giving more time for the player to collect a sufficient sample size, while not going too far into the future, when the player’s skills might be somewhat different from when he entered the majors.
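Only the 5/4/3 weighting is sketched here. Marcel as published also regresses each rate toward the league average and applies an age adjustment; both steps are omitted, and the function name and `(events, PA)` input format are assumptions of this sketch.

```python
WEIGHTS = (5, 4, 3)  # most recent season first


def weighted_rate(seasons: list[tuple[int, int]]) -> float:
    """PA-weighted rate from up to three (events, PA) seasons, newest first.

    zip() stops at the shorter input, so a player with only one or two
    MLB seasons simply uses the leading weights, and any seasons beyond
    the third are ignored.
    """
    num = sum(w * events for w, (events, _) in zip(WEIGHTS, seasons))
    den = sum(w * pa for w, (_, pa) in zip(WEIGHTS, seasons))
    return num / den


# Hypothetical HR totals: 20 in 500 PA, 10 in 400 PA, 0 in 100 PA (newest first)
print(round(weighted_rate([(20, 500), (10, 400), (0, 100)]), 4))
```

Weighting both the events and the PA by the same 5/4/3 keeps a short recent season from swamping a long older one.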

I used three methods to test the accuracy:

1. Comparing the weighted means of each player’s projections and Marcel

2. Calculating the root mean square error between each player’s projections and Marcel

3. Calculating a similarity score, where the difference between each player’s projection and Marcel in each category is expressed in units of the standard deviation (t-score) of all players’ stats in that category, and then using the Pythagorean Theorem to determine the “distance” in t-scores, across all categories together, from the projection to the observed values.
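A sketch of the three measures follows. It assumes the weight is each player's PA (the article says "weighted" without naming the weight), and the function names and sample standard deviations are hypothetical. The similarity score scales each category's miss by that category's standard deviation before combining the misses with the Pythagorean formula.

```python
from math import sqrt

Row = tuple[float, float, float]  # (projected, observed, weight)


def weighted_mean_error(rows: list[Row]) -> float:
    """Weighted mean of observed - projected; near zero means no overall bias."""
    total_w = sum(w for _, _, w in rows)
    return sum(w * (obs - proj) for proj, obs, w in rows) / total_w


def rmse(rows: list[Row]) -> float:
    """Weighted root mean square error: misses count regardless of sign."""
    total_w = sum(w for _, _, w in rows)
    return sqrt(sum(w * (obs - proj) ** 2 for proj, obs, w in rows) / total_w)


def similarity(proj: dict[str, float], obs: dict[str, float],
               sd: dict[str, float]) -> float:
    """Pythagorean distance across categories, each miss divided by that
    category's standard deviation first."""
    return sqrt(sum(((obs[k] - proj[k]) / sd[k]) ** 2 for k in proj))


# Hypothetical SDs: a half-SD miss in each of two categories.
print(round(similarity({"HR": 0.030, "SO": 0.200},
                       {"HR": 0.037, "SO": 0.175},
                       {"HR": 0.014, "SO": 0.050}), 3))
```

The mean error can be near zero even when individual projections miss badly in both directions, which is why the RMSE and the similarity score are reported alongside it.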

```
Method        Level pSDT  pXBH  pHR   pBB   pSO   eSDT  eXBH  eHR   eBB   eSO   vSDT  vXBH  vHR   vBB   vSO   Sim
All Chained   AAA   0.301 0.247 0.030 0.073 0.201 0.307 0.255 0.037 0.078 0.175 0.029 0.052 0.016 0.025 0.068 1.212
MinPA Chained AAA   0.305 0.248 0.031 0.073 0.197 0.307 0.255 0.037 0.078 0.175 0.029 0.052 0.016 0.025 0.065 1.154
MLB Chained   AAA   0.304 0.255 0.032 0.073 0.197 0.307 0.255 0.037 0.078 0.175 0.029 0.053 0.017 0.025 0.065 1.158
MLB Direct    AAA   0.309 0.258 0.033 0.073 0.192 0.307 0.255 0.037 0.078 0.175 0.029 0.054 0.017 0.026 0.061 1.096

Method        Level pSDT  pXBH  pHR   pBB   pSO   eSDT  eXBH  eHR   eBB   eSO   vSDT  vXBH  vHR   vBB   vSO   Sim
All Chained   AA    0.295 0.242 0.028 0.067 0.214 0.308 0.255 0.037 0.078 0.177 0.034 0.053 0.017 0.024 0.072 1.287
MinPA Chained AA    0.298 0.243 0.028 0.067 0.209 0.308 0.255 0.037 0.078 0.177 0.033 0.053 0.017 0.024 0.068 1.217
MLB Chained   AA    0.299 0.250 0.030 0.069 0.207 0.308 0.255 0.037 0.078 0.177 0.033 0.053 0.018 0.024 0.066 1.184
MLB Direct    AA    0.313 0.260 0.035 0.075 0.197 0.308 0.255 0.037 0.078 0.177 0.033 0.056 0.020 0.026 0.059 1.047

Method        Level pSDT  pXBH  pHR   pBB   pSO   eSDT  eXBH  eHR   eBB   eSO   vSDT  vXBH  vHR   vBB   vSO   Sim
All Chained   A+    0.286 0.235 0.024 0.065 0.228 0.309 0.255 0.037 0.078 0.177 0.038 0.061 0.019 0.025 0.089 1.580
MinPA Chained A+    0.290 0.236 0.024 0.065 0.222 0.309 0.255 0.037 0.078 0.177 0.036 0.061 0.019 0.025 0.084 1.495
MLB Chained   A+    0.294 0.252 0.028 0.070 0.212 0.309 0.255 0.037 0.078 0.177 0.034 0.063 0.019 0.026 0.075 1.340
MLB Direct    A+    0.315 0.262 0.035 0.076 0.195 0.309 0.255 0.037 0.078 0.177 0.034 0.066 0.021 0.028 0.061 1.102
0.302 0.254 0.040 0.082 0.164
```

At Triple-A, all of the methods underprojected HR and BB and overprojected SO, with “MLB Direct” slightly high on base hits and extra-base hits, while the others were a little low. At Double-A and High-A, “MLB Direct” gives much the same projections for the test group as it did at Triple-A, while the others, which employed chaining, give progressively worse projections for the same players the more steps removed they are from MLB. I believe this is because each multiplication of one level to another required in the chaining process also multiplies any biases found between each level.

Despite the longer passage of time inherent in directly comparing minor to major league statistics, as opposed to chaining comparisons of data in consecutive seasons, the direct comparison method consistently gives the closest estimate of future MLB performance. In addition, direct comparison produces virtually the same MLB projection regardless of which minor league level is used in the calculation, whereas chaining produces projections that are increasingly in error the further down the minors the calculation starts.

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.

BananaHammock
5/18
I like it; however, some more specific examples would have helped nail it through.
SaberTJ
5/18
This...
hotstatrat
5/18
Nicely written, Brian.

“As a player ages through his twenties, he will on average lose speed, but gain power, strike zone judgement… It would then make sense not to let too much time elapse between the two sets of stats being compared.” Would it not make more sense to adjust his statistics based on appropriate aging patterns? Of course, with limited amateur resources that may have been too difficult, but aren’t there some reasonable aging algorithms publicly available?

“The factors will be depressed by a disproportionate number of players at the higher level (particularly in the majors) playing sparingly.” Nice catch. That’s a nuance I never considered before.

“. . . is it correct to base the factors partly on the records of players who failed to advance?” Sure it is. That makes it more useful for assessing if a player could cut it in the majors. If your MLE is accurate, it should reflect that ability.

“The purpose of the MLEs is to show how well a player in the minors will perform, if and when he reaches the majors.” That may be so, but in plain English “Major League equivalents” would be a direct translation of what that player would have done in the Majors, not what he will do when he gets there. We need a better word for this.

“. . . the direct comparison method consistently gives the closest estimate of future MLB performance.” Given that the “MLB direct” method shows the MLE of an A+ level player to have deflated indications of power (doubles and home runs on balls in play), while AA and AAA show inflated indications of power by the same measure, does this pass the smell test?

Whether it does or not, this is an interesting exploration. Perhaps, this study shows that players of future major league ability already demonstrate in A+ whether they are capable or not, while their time in AA and AAA is a waste! Their hit rates on balls in play (SDT or BA/BiP) go down at each level on their way up, because fielding improves at each level. Their walk rates drop and strike out rates rise, because the pitchers they face are better at each higher level. How much of that is due to his improvements in hitting may be very little – as little as his power rates improve going from A+ to MLB over the number of years it takes.
blcartwright
5/18
"Would it not make more sense adjust his statistics based on appropriate aging patterns?"

Yes, a finished projection system should include aging patterns, but these tests were to determine which selection criteria were best for a matched pairs comparison, which is the first of several steps before applying age corrections. In that test I was attempting to minimize other influences such as aging. Several projection systems that I am familiar with deliberately limit their comparisons to adjacent seasons or to the same season only, in order to avoid an aging bias. I wanted to find out if this is a wise approach, or if it creates more problems than it solves.
jtrichey
5/18
A lot of numbers here. I agree this would have been more interesting with at least a few examples of real life players.
hotstatrat
5/18

Wouldn't the final accuracy test with players who actually made the Major Leagues produce a bias against the first two MLE methods which include minor leaguers who did not reach the Majors?

Who were the players you sampled for these data sets? You stated whom their MLEs were compared against in the final test, but I don't see where you stated where the samples come from in the first place other than "matched pairs of batting data". I know you were limited in word count, but the data selection process is rather critical in such an analysis.
Oleoay
5/18
I agree as well about needing examples. I got to the point where I could figure out what you were analyzing if I really wanted to, but then realized I could just find the Marcel system (which has existed for longer) and use that instead...

I also wonder about the level of precision with the calculations. If you multiply a .00002 by a .00003, are you getting .00006, .0001 or .0000 (depending on whether you are calculating or rounding up/down). In this kind of article, where a difference of .0001 is significant, it'd be important to include a statement about what rounding, if any, you are using.
MHaywood1025
5/18
Brian,

I really had to take my time reading your piece, mainly because I wasn't clear on what your actual goal was until the very end--not really a big deal, because you did make it clear eventually.

I was happy to see that you included both types of comparisons, and not just the ones that worked. I believe this kind of reporting/analysis is the most informative.

Lastly, I think it is important to include all of the players at the lower levels, and not just the handful that make it to the MLB. After all, we are trying to forecast prospects, and a team won't know which of their prospects will make the majors. If I understand your piece correctly, the numbers have been adjusted using only those that made the majors. But if we then apply these numbers to all prospects, won't it make some look better than they actually are? If I were a GM, I would rather err on the side of a prospect doing better than I projected (using deflated numbers) than spending money, time, etc. on a prospect who was not as good as I thought (because of inflated numbers).

blcartwright
5/18
The factors were calculated in four different ways. The results of each were tested with the 368 rookies in MLB from 2003 to 2008.

I didn't do any rounding other than telling Access to display 3 decimals.

The purpose of the MLE factors is to translate minor league stats into 'equivalent' major league stats. How can I test the accuracy of the translation on players who did not perform in both situations?

Say I have a set of ten x,y pairs, each in two different coordinate systems. I can compare the two sets of data to derive a matrix to convert from one to another. Then each point is run through this matrix to compare its predicted location to its observed location, the residual error. Once the residuals are sufficiently small, the conversion is deemed reliable, and any point from coordinate system one can be converted to system two. The complication in MLEs is that there are more than two systems, with six levels of minors below MLB.

What is the method of modeling the talent levels of the various minor leagues that will create the smallest residual errors? I listed four different methods, and calculated the error rates for each. We need to avoid picking a method because intuitively it sounds good without first testing it against the alternatives.
hotstatrat
5/18
"How can I test the accuracy of the translation on players who did not perform in both situations?" Not easily, I appreciate that - and I appreciate what you have done. It could not be reasonably done given our limited resources and limited preparation time. It presents a bias, nonetheless. Perhaps, you could have taken it down a chain and tested the accuracy of the ones who didn't have an adequately measurable rookie year in the majors with their performance in AAA using an AAAE derived by the same method.
blcartwright
5/18
The method I used to test the results is a standard technique in industry.

I make my living creating digital maps from aerial photography. Surveyors go into the field and measure a small but sufficient number of points that we can see in the photography, reporting to us each point's east, north and elevation. When I put the cursor on each of these points in the photos on the screen, the software reports back the pixel location, then compares the ground coordinates to the photo coordinates. Once the translation is established, I can go to any point in the photography and the software will give me a calculated ground coordinate. That coordinate translation is the basis of the entire map compilation process, and it's the same concept I used here in the results test.
Oleoay
5/18
For a visual representation, watch "The Englishman Who Went Up a Hill But Came Down a Mountain".
hotstatrat
5/19
Yeah, I'll have to watch the movie. I'm not sure I get what Brian is pairing up with his "x"s and "y"s. Are they the same player at different levels? Which players? Is he using the same ones to devise the MLE relationships as he is using to compare their accuracy? Is there a bias in that?
Oleoay
5/19
Basically, he came up with a modification of a system. To test the system's validity, he compared it to Marcel's system, which has been tested more thoroughly, to see how reliable/accurate his proposed changes are. He also uses some statistical tools (weighted means, root mean square error, etc.) as ways to compare the two systems.

The theory is that if his system compares well to Marcel's system, but is more accurate, then his system is a better system.
JayhawkBill
5/18
Brian, this is an excellent article but a rigorous article. It's tough communicating the results of so much research in so short a piece. Furthermore, while I enjoyed your research and your results the most, I'm not sure that BP is aiming for articles that demand quite so much concentration and study to comprehend. (Well, articles that require so much concentration and study for ME to comprehend...)

In any case, thanks for sharing the research, and best of luck: I know your work, and I want the winner of this contest to be a writer capable of bringing to BP the sort of insight and analysis that one always finds in your articles.
wcarroll
5/18
I'm adding my judging comment to each article:

Cartwright, Brian -- 7. I think this is a BP quality piece. Most of it is way above my head and therein lies my problem with it. I just don't grasp most of it and he makes little effort to bring it down to a level where I get any sort of takeaway or even an explanation. As an example he starts off with a question about "elapsed time" - I'm not sure what it means and I don't see where he answered it in the piece. There's going to be an element of BP readers that loves this and I don't blame them, but there's going to be a lot of people that just don't get it.
blcartwright
5/18
I thought I took a whole paragraph to explain "elapsed time"

How much elapsed time should be allowed between samples?

...By recording statistics for each player in each season, we are taking a sample, over a given period of time, which estimates the player's "true talent" in each of the various categories. As a player ages through his twenties, he will on average lose speed, but gain power, strike zone judgement and contact skills. If it takes two years of statistics to get a good measure of a player, at the end of the two years he is likely not exactly the same player he was before. It would then make sense not to let too much time elapse between the two sets of stats being compared.

With such a time constraint...
hessshaun
5/20
Agreed on that front. Glad you commented like you did. Not that I don't, or could not, find this interesting, but for me, I require more of a breakdown.

First, in order to do so, that would require three more articles of this size to get me where I need to be.

Second, it appears as if Brian knows what he is talking about, but that doesn't work if I, being the moronic reader, do not "get" the numbers.

Third, I would like to be able to understand what he is trying to convey. I think your challenge is going to be breaking down your knowledge through words. You are a professor teaching in middle school at this point. For me at least, I still laugh at farts.
wcarroll
5/18
i still don't get it.

and the time constraint? heck, you've got until friday to get the next piece done. and then the next week ... and the next week ...

'we do this every day' - earl weaver
TheDumbSmartGuy
5/18
FYI, "With such a time constraint..." was the continuation from the article.
blcartwright
5/18
Mitchel Lichtman said recently on his blog that in calculating his unpublished projections, he only compares players in different levels in the same season. The reasoning is that if you compare 2007 to 2008 stats, the player's true talent may have changed during that time. That approach limits you to a chaining process to compare the lower minors to MLB. If I compare two stat samples that are four years apart, the odds that the player's talent has changed (the thing I'm NOT trying to measure at this step) is higher, but it gives me an opportunity to directly compare a player's performance in Class A to his performance in MLB. Which method gives the best results?
Oleoay
5/18
Um. Since not many people who are in Class A are promoted to MLB, am I interested in what his equivalency is?

I'd be more interested in what his projected numbers would be in time... and thus, I'd also be interested in predicted changes in true talent.
blcartwright
5/18
Not directly promoted, but the better ones are eventually. You have a player hitting 300/350/500 in Class A. How does that project on the stars/scrubs spectrum? Take all the players who played in this player's league, and also in MLB. See how their performances changed. Apply the same translations to all the current players to estimate their prospect status. There are other details and nuances, but that's the basics.
Oleoay
5/18
Ok, so why not instead write about how your system can be used to determine stars/scrubs instead of a comparison between your system and another? I guess I'm looking for more of an applied analysis bent.
blcartwright
5/18
It's a matter of comparing the accuracy of different approaches. After the most accurate of several has been determined, the rest of the projection system can be built.
nationalcoholic
5/18
Brian, little advice: Stay out of the comments. Seriously. If you need to comment on your own work then your work wasn't written well enough in the first place.

And Will, I believe the "With such a time constraint..." hanger was the lead of the next paragraph, not a complaint about working conditions. Although I suppose it could be both.
Oleoay
5/18
I disagree with this actually, to an extent. I find it important if writers respond to questions and comments about their work. Again, this is a research/analysis site that should be open to feedback, critique, and reanalysis... not a fiction short story site where what you write has to standalone.

Though that sentiment doesn't mean a writer shouldn't be as clear and concise as possible.
beitvash
5/18
I agree. If a writer is willing to respond to question/comments and make clarifications, that's a plus. That's something I expect when a technical article is published and it's a good thing.
nationalcoholic
5/19
Response, accountability, and clarifications are nice, but at some point during this process it lapses into a defense of the work, and that's not exactly the sort of professionalism I've come to expect from BP writers. It's a fine line, but the quantity, length, and tone of the comments from Brian have crossed it. I don't begrudge him this, as it's something new internet writers must learn on their own. Brian, you've got a great baseball mind for even entertaining this sort of in-depth analysis, but you'd be doing yourself a favor if you let the work speak for itself.
hotstatrat
5/20
Geez, responses from the writers are great bonuses. I can understand not answering all comments - although, many authors do write back to people who write to them. Why the heck are you discouraging these writers from doing this over this web site? Professionalism? Call it stand-offishness. It's not to any of our benefit.
JayhawkBill
5/18
You know, the more hours that pass after my reading this, the more I understand how important this article is. I understood the first time what you'd proven, Brian, but I'm getting a better grasp with reflection of why your findings are important.

Thanks, Brian. I'm realizing that this is not just a great contest entry, but that it's also going to be one of those articles that might influence thought in the community for some months to come. It took me a while to appreciate what you'd written, but it was well worth the time.
trooney
5/19
Well thought out techniques and I liked the process of examining the bias inherent in each of the four methods. I find the debate over the end goal of MLEs to be interesting. Is the idea to find out 1) how any player from minor leagues would have done in the big leagues in that year or 2) to find out how the guys that will eventually get there can be expected to perform. I'm not sure I believe there is any mathematical way to determine #1. Very rough approximations are a possibility but ultimately there are so many factors involved that I just don't think it's possible to achieve much accuracy at all. #2 seems slightly more feasible, and this seems to be what you were getting at with the MLB Direct method. I like the fact that the translation factors were raised by this method b/c many MLEs seem to grossly undervalue the most talented players (at least when looking back in hindsight). Nice work.
tkniker
5/19
I found this to be a great article, though I'm not sure if this is the type of research that some want to see on BP.

I find it similar to what I see going on in my academic field of Operations Research. One of the key building block problems is the Travelling Salesman Problem (TSP). Simply stated if I have n cities, a distance matrix from each city to each other city, what is the shortest distance tour that visits each city only once. It's a simple problem that has numerous applications (semi-conductor fabrication, vehicle routing, etc.), but also becomes incredibly hard to solve optimally once the number of cities starts getting up into the hundreds or thousands.

It seems that a significant portion of the research (and countless academic articles) is spent on slightly improving the solution time/quality to this one single problem and some of its minor variations. It's very important research, but it can get mind-numbing reading yet another approach to improve something by a percentage point or two.

I think it comes down to readers voicing their opinion of seeing research that finds slightly better ways of doing what's been done before or applying techniques, research and analysis to questions that have yet to be addressed.
hotstatrat
5/19
That's a good point, Tim. For the purposes of team building, whether it be real teams or fantasy teams, one paragraph of real intelligence about how dedicated a player is towards improving, or how physically capable he is of improving, or whether an injury is going to prevent him from improving is far far more valuable than a superior projection of a few hundredths of OBA+Slg points.

Other than that, refreshing new topics unrelated to winning, but related to our lives in some way - explainable or not - are most welcome. Although, the sheer adventure of tackling a subject with a mathematical approach can be interesting, too. We need to find the right balance.
georgeforeman03
5/19
I really liked a lot of what's going on here, but I've got a Master's degree in Statistics, and I have no idea what all of the statistics you report in the final table mean.

Also, a quick question:

To test the accuracy of the various projection methods, you use the actual accumulated MLB statistics from a group of players who, by necessity, have reached the bigs. Wouldn't this bias your results in favor of the projection methods that, similarly, only look at players who reached the big leagues?

Similarly, any system for translating minor league statistics to the big leagues will necessarily be optimized for translating minor league statistics for players who reach the big leagues. Since in practice we will inevitably end up using such projections on players who won't reach the majors, won't this introduce unaccounted-for variation into the model? Is there any way to deal with this? Is it even important to?
blcartwright
5/19
I have a BA in Geography with a minor in Math & Computer Science. I have taken several college stats courses, along with Operations Research, but admittedly it's been a while, compared to some of you younger guys.

To check how well the means of the predicted and expected values compared: sum(e-p)/sum(n)

To check the root mean square error (an error is an error, regardless of whether it is negative or positive): sqrt(sum((e-p)^2)/sum(n))

To use Pythagoras to calculate a similarity score: sqrt(sum((e1-p1)^2+(e2-p2)^2+(e3-p3)^2)/sum(n))
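For readers who prefer code to formulas, here is a minimal Python sketch of the three checks exactly as written above. It assumes e and p are aligned per-player lists (the actual and projected values being compared), and that n is a per-player weight list (n = 1 per player gives a plain unweighted average); those readings, and the function names, are our own illustration rather than the author's code.

```python
import math

def mean_error(e, p, n):
    """Signed error, sum(e - p) / sum(n). Positive when projections run low."""
    return sum(ei - pi for ei, pi in zip(e, p)) / sum(n)

def rms_error(e, p, n):
    """Root mean square error, sqrt(sum((e - p)^2) / sum(n)).
    Squaring means misses in either direction count the same."""
    return math.sqrt(sum((ei - pi) ** 2 for ei, pi in zip(e, p)) / sum(n))

def similarity_score(e_rows, p_rows, n):
    """Pythagorean distance over several stats per player (e.g. BA/OBP/SLG):
    sqrt(sum((e1-p1)^2 + (e2-p2)^2 + (e3-p3)^2) / sum(n))."""
    total = 0.0
    for e_row, p_row in zip(e_rows, p_rows):
        # Squared distance between one player's actual and projected lines.
        total += sum((ei - pi) ** 2 for ei, pi in zip(e_row, p_row))
    return math.sqrt(total / sum(n))

# Two hypothetical players, one projected 50 points low and one 50 points high.
e, p, n = [0.300, 0.200], [0.250, 0.250], [1, 1]
print(mean_error(e, p, n), rms_error(e, p, n))
```

The example shows why both checks are needed: the signed errors cancel, so the mean error is essentially zero, while the RMS error still reports a typical miss of about .050.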

As I said in my last comment, the control group hit 270 in their first 2-3 seasons of MLB. Based on only High A or lower stats, MLB Direct projects them to hit 267. All Chained projects 229. Even if MLB Direct has some bias, All Chained has other biases which clearly outweigh it and make the results unusable. If a large group of players hits 270, and a projection says they are 229 hitters, I would say that projection is wrong.
blcartwright
5/19
Comments and some things I've learned this week -
I've been a BP subscriber for about 10 years, and this is the fourth site where I have published articles. One of the problems when writing for a new audience is knowing the type of article they expect. When I first started reading BP, it was a stats site, but apparently it is not so much anymore, although there are some (many?) of us who wish it still were. I am still primarily a stats guy and would like to write mainly stats articles for BP, but wherever I write, I will be learning how to tailor my message to the audience.

As someone who works with the stats virtually every day, the numbers quoted in the article were in my language and had lots of meaning for me. When someone says "superior projection of a few hundredths of OBA+Slg," I realize I have not made it clear in terms the readers are familiar with.

Restating the test results:
BA   OBP  SLG   Source
270  327  427   Marcel of MLB stats, 368 players in test sample
267  323  419   A+ using MLB Direct
242  295  368   A+ using MLB Chained
234  284  345   A+ using MinPA Chained
229  279  338   A+ using All Chained

Collectively, the 368 players had an MLB line of 270/327/427 in their first 2-3 seasons. Using only batting data from High A and below, the MLB Direct approach would tell us that they project to 267/323/419, very close. A projection system that uses an All Chained method would project these same players, based on the same set of stats, to have a 229/279/338 line, which is clearly not good enough to even play in the majors, rather than the MLB-average line they actually produced, and is therefore not a useful method.
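To put the restated table in one number per method, here is a small Python sketch that takes the group lines from the table (in its shorthand, so 270 means .270), reports each projection's gap from the actual line, and applies the Pythagorean distance from the similarity-score formula earlier in the thread. Applying that distance to group means, rather than player by player, is our illustration and not the author's published test.

```python
import math

# Group lines from the table above, in the table's shorthand (270 = .270).
ACTUAL = (270, 327, 427)  # BA, OBP, SLG of the 368-player test sample in MLB
PROJECTIONS = {
    "MLB Direct": (267, 323, 419),
    "MLB Chained": (242, 295, 368),
    "MinPA Chained": (234, 284, 345),
    "All Chained": (229, 279, 338),
}

def gaps(actual, projected):
    """Projected minus actual, in points, for BA/OBP/SLG."""
    return tuple(p - a for a, p in zip(actual, projected))

def distance(actual, projected):
    """Pythagorean distance between two BA/OBP/SLG lines."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, projected)))

for method, line in PROJECTIONS.items():
    print(f"{method:14s} gaps={gaps(ACTUAL, line)} distance={distance(ACTUAL, line):.1f}")
```

MLB Direct misses by (-3, -4, -8) points, a distance of about 9, while All Chained misses by (-41, -48, -89), a distance of about 109, so the ranking of the four methods matches the one described in the comment.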
Oleoay
5/20
I like applied theory and analysis, particularly new concepts or anything that excites my creativity and lets me take that idea and apply it to other questions. While there is value in "building a better light bulb," we've seen on this site that any number or statistic is open to sample size questions, luck, statistical noise and other kinds of deviations... so I tend to look at and think about stats in a more general sense unless something really profound catches my attention.
Oleoay
5/20
In direct response to the 368 players and how they compare between MLB Direct and All Chained... even if the MLB Direct would project that group as major leaguers, they aren't people I'd go crazy about... pretty much the definition of replacement level. So does it matter in the long run if MLB Direct overprojects a few bench players?

Now, if you came up with a similar system that could separate the "star" prospect wheat from the chaff given different minor league environments, that I'd find more interesting. If you could project the kinds of prospects that do well in A ball but tend to flame out at Double-A, that would also be interesting (and useful to a major league team)... or the reverse, a player profile that tends to do horribly in A ball but tends to mature/"fill out" into a star player, that'd also be interesting.

In other words, I don't care as much about the average A baller, since I won't see most of them at Coors Field or read about them on ESPN... but I would be interested in the "news" names from the draft that flame out, or the "sleepers".
blcartwright
5/20
Richard, these are 368 guys who have played at A ball sometime in the past 11 seasons (my minor league db covers 1998-2008) and have since made the majors. What about the guy in A ball in 2009? I can judge his chances by comparing him to the players who have come before him. This article was a technical look at only one part, although critical, of developing projections. Some of the comments and questions have ventured outside that into projections in general. Once we determine the best method of analyzing those previous players, we can apply the translation to the current ones. Next, look for similar players in the past. How many of those became stars/scrubs/never made the majors? Why did they? That fills out the projection, such as PECOTA does, but was not intended to be the subject here.

I realize there is a fine line on responding to comments. I don't like to go back and forth too many times on the same point. I'll clarify, but would like to move on to another question. Sometimes the reader doesn't get it - I'll look to see if it's my fault for not explaining it well enough.
Oleoay
5/20
Ok, I can understand that. Here's the thing, though... would I rather read an article on the development/refinement of an analytical tool, or read an article about the practical application of that tool once it has been refined? As I indicated, I like applied theory and analysis and what it means in the real world of baseball. Nothing against an article about the work you've done on your tool, but I'm more looking forward to seeing the tool in action.

Just my personal taste. I didn't say I disliked the article, it just didn't do much for me. Others appreciated it more than I did and perhaps they're right. I can admit it when I'm wrong :)
dpowell
5/20
I'm guessing this is not an original point, but the big problem with MLEs seems to be the interpretation. The desired (in my opinion) interpretation is, "What can we expect a player in Double A to produce if promoted to the majors given his stats?" But, instead, we get the expectation of Double A players _that were promoted_. This is a selected sample comprised of players who, on average, are more ready for the next level given their stats. If you randomly promoted a player, it's unlikely you'd get the same stats. The difference is basically whether you want MLEs to be descriptive of what happens or predictive of what would happen in other situations.

I bring this up because the different methods described above are different ways to select the sample. You're not just changing the "elapsed time" or accounting for the "pinch hit penalty." You're also changing the bias due to selection. By selecting based on plate appearances per game, you're dropping players who started in Triple A but were only good enough to pinch hit in the majors. But you're keeping players who started in Triple A and were good enough to play regularly in the majors. Your sample ends up only including the successful players. Similar points can be made for the other permutations.

This isn't a fatal flaw, but I think the differences in the results need to also be discussed in this light. Thoughts?

blcartwright
5/20
The general point is that intuition may tell you that a particular method might be better because it avoids such and such bias, but you should model several different possible methods and then test them empirically to see which is actually more accurate.

A control group has to consist of people or objects that exist in both circumstances. In this case, there are more than two circumstances, so you have a choice between chaining and direct comparison. The tests show that chaining multiplies biases.

If there are two players who hit the same at a given level, and one got promoted and the other didn't, you can look at other things like speed, defense and defensive position.
dpowell
5/20
I think I don't understand who's in the sample used to get the MLEs and who's in the sample used to test the accuracy of the predictions. It's not surprising that when you limit the sample to players who make the majors, the predictions for major league players are better. You've eliminated the players who didn't get past Double A and Triple A and only kept the exact same players you're using to test the different methods (or, at least, a sample that is very similar). Do you include the "pinch hit" players in the final sample to test the methods?

Take your method to a different (admittedly, strange) degree. Say I take the Triple A stats and major league stats for everyone who makes an All-Star team and calculate conversion rates. Then I test the accuracy of this prediction by looking at the sample of All-Star players. You'll inevitably get a strong prediction, but it doesn't make it useful. I'm interested in figuring out who - given their Triple A stats - is going to make the All-Stars. Instead, I have the conversion rate of the players who make All-Star teams. That's probably not representative for the average Triple A player.

In your context, I want to know who should play in the majors given their minor league stats. Not the conversion rates of those who did play in the majors. This isn't your cross to bear, obviously, but it should be addressed when altering the method to get MLEs. A better correlation between predicted stats and actual stats doesn't mean it's more useful if it loses some of its "representative-ness."
blcartwright
5/20
MLB Chained and MLB Direct used the same sets of players, the same stats. If there are biases in the selection of players, they cancel out. The difference between these two methods is whether A+ is compared to MLB directly, or A+ to AA to AAA to MLB. For the same players, chaining gives a result with more error, an error that says the test players (as a group) are not good enough to even play in MLB, when in fact they performed at MLB average. If there's a bias that creates a result that's 95% of true value at AAA, then by chaining the result is 90% in AA, 86% in A+, 81% in A, 77% in A-...it just keeps getting worse the more steps away from MLB.
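The compounding argument is easy to check. The Python sketch below assumes, as the comment does, a hypothetical bias that keeps 95% of true value at each step of the chain, and multiplies it once per level between the minor league level and MLB. A direct comparison applies the bias only once, no matter how far the level is from the majors. The 0.95 figure is illustrative, not a measured value.

```python
# Hypothetical: each link in the chain keeps only 95% of the true value.
STEP_RETENTION = 0.95
# Levels in order of distance from MLB, as listed in the comment.
LEVELS = ["AAA", "AA", "A+", "A", "A-"]

def chained_retention(step_retention, levels):
    """Cumulative share of true value left after chaining down each level.

    A chained translation multiplies one factor per level, so a constant
    per-step bias compounds geometrically: step_retention ** k after k steps.
    """
    result = {}
    share = 1.0
    for level in levels:
        share *= step_retention
        result[level] = share
    return result

for level, share in chained_retention(STEP_RETENTION, LEVELS).items():
    print(f"{level:4s} {share:.0%}")
```

This prints 95%, 90%, 86%, 81% and 77% for AAA through A-, since 0.95 raised to the fifth power is about 0.774.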
dpowell
5/21
1) Right, I was focusing more on the other 2 methods. The larger point is that a higher correlation does not mean a method is better. My intuition would be that the Direct method would produce higher conversion ratios, but the fact that this correlates better with the entire sample is probably just chance. The Direct method, mechanically, predicts better for those that were promoted all the way to the majors. The Chain method, mechanically, predicts better for those that were promoted one class at a time. The fact that one predicts the entire sample better than the other is a result of chance and sample composition. Basically, the way you're checking the validity of your methods is pretty mechanical so don't draw any conclusions about which method is better based on it.

2) Just to help explain how you practically did this - why would the MLB Chained and MLB Direct methods produce different results for AAA? It seems like those should be exactly the same.

3) Here's the broader point: You start out saying, "Major League Equivalencies (MLEs) are a set of formulas that will translate a player's minor league statistics into those that he would be expected to produce if he was in the major leagues." I agree with this - we want to know the expected production of a randomly promoted AAA hitter given his stats. This is different from what you later say, "If the MLEs are being used to judge how well a player will perform if and when he makes the majors, is it correct to base the factors partly on the records of players who failed to advance?" The promoted players are not random, biasing the results. All MLEs (that I know of) have this problem, but you exacerbate it by selecting the sample based on the best-of-the-best, the ones that eventually make the majors (or, worse, the ones that get > 2.5 PA/G). Yes, the eventual translations are better predictors for that group, but that would be true no matter how you sliced the sample - the average for a group is a better predictor for that group than some other average. But is it useful outside of that group in some way? Improvements in MLEs need to answer that question.
blcartwright
5/21
1) I picked methods 3 and 4 in my comparison because of their similarity - (almost) all of the difference was in chaining or not. I don't understand what you mean (or what your evidence is) that chaining predicts better "for those that were promoted one class at a time".

2) Slight differences between MLB Chained and MLB Direct for AAA to MLB are because Direct did not have a requirement that the two seasons be within a year of each other.

3) When I said "is it correct to base the factors partly on the records of players who failed to advance?" I was proposing an alternate scenario, predicting an argument that could be used to back another method. I was simply trying to find several different methods to test.

There will always be some biases. When you get rid of one, you create another. By empirically testing the results, I was trying to find the method that was least affected. This was not a case of trying to prove my method was right. Last year, when I was designing my projections, I spent a lot of time looking for a method I felt was accurate enough, more so than the other projections available. This article describes my process of discovery.

Good questions.
dpowell
5/21
1) I just meant that I'm sure your chaining method is a better predictor for players who were moved up level-by-level (since the method does something similar). The Direct Method is probably really great for those that were directly promoted to the majors since you're just checking the average against itself.

2) Thanks.

3) To summarize my position - I don't think your test tells us anything about biases since it's self-reinforcing. Given that, we have to evaluate the methods on their own. I'm sure there will always be some bias at least, but we should try to move towards less bias. Your selection method seems to make it much, much worse. Now, for a projection system, this isn't necessarily bad. You don't want translation ratios for random players - you want ones for promoted players because you want something that's descriptive of what does happen. If you were a GM and you wanted to start promoting people based primarily on stats, then you want the ratios for a randomly-promoted player (your first sentence). I would think this would be an interesting avenue for research - finding random (non-performance) shocks that cause players to get promoted (injuries at higher levels?).
rbross
5/21
This is brilliant, Brian. And useful. You'd get my vote if we were voting on these entries. Well done.
Oleoay
5/21
Um, in American Idol, you vote for the people who you want removed from the contest... so if you like him, that means you would not give him a vote...
Oleoay
5/21
Erps never mind, I was thinking Survivor.

So I guess we each get to vote for one article, and whichever article has the fewest votes goes away?
rbross
5/21
I hope that's the way it is. I don't like the idea of picking the worst article. As much as the fundamental result will be the same, that just seems too cruel.