Will Carroll’s experiment with using comments as something more than just comments was a rousing success, and I’d like to try a version of my own. I don’t do a lot of quantitative work; it’s not that I’m incapable of it, but I often end up with a lot of information and have difficulty separating the signal from the noise. So I have a lot of data here, but no real conclusions. What I’m going to do is throw it all on the table, and then we can discuss it together in the comment thread and see what comes of it.
This all began with an e-mail from a subscriber named Steve:
The Indians signed RHP Chen-Chang Lee:
He is listed at 5’7″. I remember the bromide, ‘teams have a bias against short right-handed pitchers,’ something the Astros tried to leverage over the years. But ‘short’ meant 5’11” or 6’0″ (Roy Oswalt), not 5’7″ (David Eckstein). Who was the last effective major league pitcher to be that short? I’m not saying this is a silly signing or Lee can’t be successful (he was very good in the Olympics and can apparently throw 94 mph), but I’m honestly curious what other truly short pitchers there have been.
This prompted me to start thinking about pitcher size. I write a lot about ideal pitcher frames, and I complain about short pitchers as much as the next guy, but does it really matter? I’m not convinced of it, but here’s some work I’ve done, without coming to any real conclusions, that I hope will spur some discussion below.
The first thing I did was call the Indians to get more information on Lee, and I reached a team official who informed me that Lee is actually five-foot-eleven. “Look, I’ve stood next to the guy, and he’s taller than me,” said the official, who added, “I don’t think we’d be giving that kind of money to a pitcher who was five-foot-seven.” Lee also differs from most pitchers in that he’s a side-armer. A converted shortstop who did not begin pitching until he was 14 years old, he has a fastball that sits at 88-92 mph coming from that lower angle, and he also features a solid slider. He’ll likely begin next year in the rotation at High-A Kinston in order to get him innings.
So let’s shift our focus to the major league starters. Thanks to some great work from our data guru, William Burke, we can examine some pitcher profiles and see what they look like. First we have the performances by height of every starting pitcher this year:
Height   GS      IP      ERA
5-10     78     503.0    4.51
5-11    162     998.0    4.47
6-0     444    2653.2    4.16
6-1     673    4276.0    4.23
6-2     888    5677.2    4.41
6-3     784    5069.1    4.32
6-4     578    3620.1    4.83
6-5     454    2724.0    4.69
6-6     176    1265.0    3.99
6-7     169    1085.2    3.93
6-8      16     134.1    4.49
6-9      51     319.2    5.41
6-10     43     251.1    4.23
The average height for a starting pitcher is 6’2¾″, and the midpoint of the distribution lies just barely over 6’2″; the tallest pitchers pull the average up significantly. There are 279 starts made by pitchers five or more inches over 6’2″, but none by pitchers five or more inches under it. While they’re not the largest groups, note that the only heights with ERAs under four are 6’6″ and 6’7″, sizes generally associated with the classic power pitcher’s build.
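As a quick check on those averages, here is a minimal Python sketch that weights each listed height by its number of starts. It assumes each height converts to whole inches (5-10 = 70, 6-10 = 82) and treats the midpoint as the height bucket that contains the middle start; the function names are illustrative only.

```python
# Starts per height (in inches), copied from the starter table above.
starts_by_height = {
    70: 78, 71: 162, 72: 444, 73: 673, 74: 888, 75: 784, 76: 578,
    77: 454, 78: 176, 79: 169, 80: 16, 81: 51, 82: 43,
}


def weighted_mean(counts):
    """Mean height, weighting each height by its number of starts."""
    total = sum(counts.values())
    return sum(h * n for h, n in counts.items()) / total


def weighted_median(counts):
    """Height bucket containing the middle start when starts are lined up by height."""
    total = sum(counts.values())
    midpoint = (total + 1) / 2
    running = 0
    for h in sorted(counts):
        running += counts[h]
        if running >= midpoint:
            return h
    raise ValueError("empty table")


def feet_inches(inches):
    """Format inches as feet'inches", e.g. 74.73 -> 6'2.73"."""
    return f"{int(inches // 12)}'{inches % 12:.2f}\""


mean_h = weighted_mean(starts_by_height)
median_h = weighted_median(starts_by_height)
# Starts by pitchers five or more inches over 6'2" (74 in.), i.e. 6'7" and up.
tall_starts = sum(n for h, n in starts_by_height.items() if h >= 79)

print(feet_inches(mean_h), median_h, tall_starts)
```

This prints 6'2.73" 75 279: a GS-weighted mean of about 6’2¾″, a middle start that lands 13 starts into the 6’3″ group (just past the end of the 6’2″ group), and 279 starts at 6’7″ and above.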
Build connotes both height and weight, though, so I took this one step further. Using the top 40 starting pitchers this year as measured by VORP, I calculated the BMI (body mass index) for each player. Now BMI is a pretty silly system when you check out how it’s used; by this measurement, Dan Haren and Jamie Moyer are overweight, while Matt Cain is obese. But if we can ignore the labels, it does give us a good sense of the player’s bulkiness.
The average weight of a starting pitcher this year is 213.05 pounds. Combined with our average height, that gives us a BMI of 26.8. From there I built a matrix using standard deviations from the average height and BMI, treating anything within 0.5 standard deviations of the average as normal, 0.5 to 1.5 standard deviations away as significant, and more than 1.5 as extreme. Thus, we have:
Height rows and BMI columns use the same cut points (standard deviations from average): below -1.5, -1.5 to -0.5, -0.5 to +0.5, +0.5 to +1.5, above +1.5.

Height/BMI               Skinny  Thin  Normal  Beefy  Fat  Total
Skyscraper (> +1.5)         0      0      3      1     1      5
Tall (+0.5 to +1.5)         0      1      2      1     0      4
Normal (-0.5 to +0.5)       0      6      5      5     1     17
Short (-1.5 to -0.5)        1      4      5      2     1     13
Diminutive (< -1.5)         1      0      0      0     0      1
Total                       2     11     15      9     3     40
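The BMI figure and the banding behind this matrix can be reproduced with a short Python sketch. BMI uses the standard imperial formula (703 × pounds / inches², with 703 the usual rounded constant). The article doesn’t report the standard deviations it used, so HEIGHT_SD and BMI_SD below are hypothetical placeholders, as are the two sample pitchers; only the averages (6’2¾″, 213.05 lb) come from the text. Values landing exactly on a cut point are assigned to the band nearer the average, which is an arbitrary choice.

```python
def bmi(weight_lb, height_in):
    """Body mass index from pounds and inches (703 is the usual rounded factor)."""
    return 703 * weight_lb / height_in ** 2


def band(value, mean, sd):
    """Return -2..+2 by distance from the mean in standard deviations:
    within 0.5 SD -> 0, 0.5 to 1.5 SD -> +/-1, beyond 1.5 SD -> +/-2."""
    z = (value - mean) / sd
    if z > 1.5:
        return 2
    if z > 0.5:
        return 1
    if z >= -0.5:
        return 0
    if z >= -1.5:
        return -1
    return -2


HEIGHT_LABELS = {2: "Skyscraper", 1: "Tall", 0: "Normal", -1: "Short", -2: "Diminutive"}
BUILD_LABELS = {2: "Fat", 1: "Beefy", 0: "Normal", -1: "Thin", -2: "Skinny"}

AVG_HEIGHT_IN = 74.75                           # 6'2 3/4", from the article
AVG_BMI = round(bmi(213.05, AVG_HEIGHT_IN), 1)  # 26.8, from the article's averages
HEIGHT_SD = 2.0  # hypothetical: not reported in the article
BMI_SD = 1.5     # hypothetical: not reported in the article


def classify(weight_lb, height_in):
    """Label a pitcher as 'Height/Build' using the bands above."""
    h = HEIGHT_LABELS[band(height_in, AVG_HEIGHT_IN, HEIGHT_SD)]
    b = BUILD_LABELS[band(bmi(weight_lb, height_in), AVG_BMI, BMI_SD)]
    return f"{h}/{b}"


print(AVG_BMI)            # 26.8
print(classify(170, 70))  # hypothetical pitcher: Diminutive/Skinny
print(classify(250, 74))  # hypothetical pitcher: Normal/Fat
```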
The next thing I did was play Olympic diving judge and get rid of the highs and lows, so anyone in the extreme categories is out. Before we do that, however, let’s quickly pay homage to the two opposite ends of the spectrum. There is only one Skyscraper/Fat pitcher (CC Sabathia) and only one Diminutive/Skinny pitcher (Tim Lincecum), and both are among the best in the game. I’m not sure that tells us anything other than that there are no absolutes. Focusing on the remaining nine categories, we end up with the following.
Tall/Thin: Dan Haren
Tall/Normal: Scott Baker, James Shields
Tall/Beefy: Carlos Zambrano
Normal/Thin: Zack Greinke, Cole Hamels, Cliff Lee, Jon Lester, Mike Mussina, Ervin Santana
Normal/Normal: Aaron Cook, Ryan Dempster, Justin Duchscherer, Kyle Lohse, Joe Saunders
Normal/Beefy: Felix Hernandez, Paul Maholm, Ricky Nolasco, Brandon Webb, Todd Wellemeyer
Short/Thin: Shaun Marcum, Jamie Moyer, Roy Oswalt, Jake Peavy
Short/Normal: John Danks, Jeremy Guthrie, Scott Kazmir, Daisuke Matsuzaka, Edinson Volquez
Short/Beefy: Johan Santana, Ben Sheets
Look at these lists, combine them mentally by height and by BMI, and you start to see some trends. Which group is the best? Which group would you think is more likely to give you 225 innings? Which group has the best health record? There are some interesting answers here. Normal/Thin is the most impressive list overall, but the beefy list gives me far more confidence with regard to durability.
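Combining the lists by height and by BMI amounts to summing the three-by-three grid along each axis. Here is a small Python sketch of that tally using the 31 names above. The dictionary layout is just one convenient choice.

```python
from collections import Counter

# The nine non-extreme cells from the lists above, keyed by (height, build).
cells = {
    ("Tall", "Thin"): ["Dan Haren"],
    ("Tall", "Normal"): ["Scott Baker", "James Shields"],
    ("Tall", "Beefy"): ["Carlos Zambrano"],
    ("Normal", "Thin"): ["Zack Greinke", "Cole Hamels", "Cliff Lee",
                         "Jon Lester", "Mike Mussina", "Ervin Santana"],
    ("Normal", "Normal"): ["Aaron Cook", "Ryan Dempster", "Justin Duchscherer",
                           "Kyle Lohse", "Joe Saunders"],
    ("Normal", "Beefy"): ["Felix Hernandez", "Paul Maholm", "Ricky Nolasco",
                          "Brandon Webb", "Todd Wellemeyer"],
    ("Short", "Thin"): ["Shaun Marcum", "Jamie Moyer", "Roy Oswalt", "Jake Peavy"],
    ("Short", "Normal"): ["John Danks", "Jeremy Guthrie", "Scott Kazmir",
                          "Daisuke Matsuzaka", "Edinson Volquez"],
    ("Short", "Beefy"): ["Johan Santana", "Ben Sheets"],
}

by_height, by_build = Counter(), Counter()
for (height, build), names in cells.items():
    by_height[height] += len(names)  # row totals
    by_build[build] += len(names)    # column totals

print(dict(by_height))  # {'Tall': 4, 'Normal': 16, 'Short': 11}
print(dict(by_build))   # {'Thin': 11, 'Normal': 12, 'Beefy': 8}
```

These 31 pitchers are the 40 from the matrix minus the nine who fall in an extreme row or column.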
Now back to the original e-mail, which asks about five-foot-seven pitchers (even though we now know that Lee is merely short, as opposed to off-the-charts small). With no pitcher under five-foot-ten starting a game this year, William Burke compiled the top seasons by pitchers under that height in the modern-modern era (since 1969), and the list is dominated by two names:
Year  Ht (in.)  GS     IP    ERA   VORP  Pitcher
1977     68     34  221.1   3.38   38.3  Fred Norman
1997     69     25  182.2   3.74   36.4  Tom Gordon
2004     69      0   89.2   2.21   36.3  Tom Gordon
1993     69     14  155.2   3.58   34.0  Tom Gordon
1974     68     26  186.1   3.09   33.3  Fred Norman
1979     68     31  195.1   3.64   33.2  Fred Norman
1973     68     35  240.1   3.60   32.1  Fred Norman
1973     69      4   91.0   1.68   29.1  Fred Beene
1994     69     24  155.1   4.35   29.0  Tom Gordon
1998     69      0   79.1   2.72   28.3  Tom Gordon
1983     69      0   87.1   2.47   25.7  Salome Barojas
2005     69      0   80.2   2.57   25.3  Tom Gordon
1976     68     24  180.1   3.09   24.3  Fred Norman
1977     69     10  116.2   2.70   24.1  Pablo Torrealba
1989     69     16  163.0   3.64   23.0  Tom Gordon
1975     68     26  188.0   3.69   22.0  Fred Norman
1987     69     22  158.2   4.37   21.8  Guy Hoffman
1973     69      0   89.2   2.41   21.6  Ramon Hernandez
1992     69     17  100.0   3.06   21.1  Brian Barnes
1969     68     33  202.0   3.52   20.9  Tom Phoebus
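To put numbers on “dominated by two names,” the Python sketch below counts seasons per pitcher and totals their innings. Box-score IP puts thirds of an inning after the decimal point (221.1 means 221⅓), so the sketch converts to outs before adding. The helper names are illustrative.

```python
from collections import Counter, defaultdict

# (pitcher, IP) for the 20 seasons in the table, in table order.
seasons = [
    ("Fred Norman", "221.1"), ("Tom Gordon", "182.2"), ("Tom Gordon", "89.2"),
    ("Tom Gordon", "155.2"), ("Fred Norman", "186.1"), ("Fred Norman", "195.1"),
    ("Fred Norman", "240.1"), ("Fred Beene", "91.0"), ("Tom Gordon", "155.1"),
    ("Tom Gordon", "79.1"), ("Salome Barojas", "87.1"), ("Tom Gordon", "80.2"),
    ("Fred Norman", "180.1"), ("Pablo Torrealba", "116.2"), ("Tom Gordon", "163.0"),
    ("Fred Norman", "188.0"), ("Guy Hoffman", "158.2"), ("Ramon Hernandez", "89.2"),
    ("Brian Barnes", "100.0"), ("Tom Phoebus", "202.0"),
]


def ip_to_outs(ip):
    """'221.1' -> 664 outs: whole innings times three, plus the thirds digit."""
    whole, _, thirds = ip.partition(".")
    return int(whole) * 3 + int(thirds or 0)


def outs_to_ip(outs):
    """Back to box-score notation, e.g. 3635 -> '1211.2'."""
    return f"{outs // 3}.{outs % 3}"


season_counts = Counter(name for name, _ in seasons)
outs_by_pitcher = defaultdict(int)
for name, ip in seasons:
    outs_by_pitcher[name] += ip_to_outs(ip)

for name, n in season_counts.most_common(2):
    print(name, n, outs_to_ip(outs_by_pitcher[name]))
# Tom Gordon 7 906.1
# Fred Norman 6 1211.2
```

Between them, Gordon and Norman account for 13 of the 20 seasons on the list.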
So where do we go from here with this data, or indeed, is there anywhere to go? Let’s begin a discussion in the comments section about where we are so far.
This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.
I truly believe that baseball is a Darwinian process. If a pitcher has the talent, then he will find somewhere to pitch regardless of his size. And if he puts up the numbers, he will get a shot at the next level.
Your data shows the result of a filtering process called High School, College, Indy Leagues and Minor Leagues. Scouts pick from the pool of available talent. And most of the talent is tall.
Also, compared to basketball, swimming, and um, jockeying(?), size has much less of an impact on the success or failure of an individual in baseball. Therefore, other factors dominate height in the Darwinian process.
You need two groups: the first would have the top 40 (or so) pitchers by VORP (or another metric), and the second would have everybody else. Then take the heights of the players and compare the means. It should give you a pretty clear answer.
I'll see if I can't dig something up because I need practice doing this anyway.
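The two-group comparison suggested above could be sketched in Python as Welch’s t statistic on mean height. Welch’s version does not assume the two groups have equal variances. The height lists below are made-up placeholders, not real pitcher data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference in means of samples a and b.

    statistics.variance is the sample (n - 1) variance.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical heights in inches; real inputs would be the top-40 group
# and everybody else.
top_group = [74, 76, 75, 77]
rest = [73, 74, 75, 74, 73]
print(round(welch_t(top_group, rest), 2))  # 2.28
```

If SciPy is available, `scipy.stats.ttest_ind(top_group, rest, equal_var=False)` computes the same statistic along with a p-value. With samples this small, neither number would mean much.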
One thing to keep in mind when doing any analysis on this topic is that the population of 6'2" people is considerably larger than that of 6'6" people, more so than the height data of MLB pitchers will reflect. Because of this, comparisons get hard. MLB "Skyscrapers" may represent the top .00001 percent of all "Skyscrapers" in the world, while MLB "Normals" may represent the top .0000000000001 percent of "Normals" in the world. Comparing the average of one group's "very good" to another group's "elite" is not going to be very informative.
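The base-rate point can be made concrete with a normal approximation to adult male height. The mean (69.5 in.) and SD (3 in.) below are rough assumptions for illustration, not figures from the thread. The sketch compares the share of people whose height rounds to 6'2" with the share whose height rounds to 6'6".

```python
from math import erf, sqrt

MEAN_IN = 69.5  # assumed adult male mean height (inches)
SD_IN = 3.0     # assumed standard deviation (inches)


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def share_at(inches, mu=MEAN_IN, sigma=SD_IN):
    """Fraction of the population whose height rounds to `inches`."""
    return normal_cdf(inches + 0.5, mu, sigma) - normal_cdf(inches - 0.5, mu, sigma)


ratio = share_at(74) / share_at(78)  # 6'2" vs. 6'6"
print(f"{share_at(74):.2%} vs {share_at(78):.3%}: about {ratio:.0f} to 1")
```

Under these assumptions there are well over ten times as many 6'2" men as 6'6" men. That is the point of the comment: MLB pitchers at different heights come from very different-sized pools.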
Interesting findings, but two points to consider.
1) As others have already commented, much of the natural selection takes place at the amateur level, just like with our lack of left-handed catchers.
2) Your first height/production chart does not account for small sample size issues. I think it would be more instructive to look only at starting pitchers who pitched at least 50 innings as starters. It might also be instructive to view the same issue with relievers and then compare the charts; I would make the minimum cut-off around 30 appearances. Why should Danny Ray Herrera skew the numbers all by his lonesome?
Thanks
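Here is a minimal Python sketch of the minimum-innings cut-off suggested above. It computes pooled ERA by height (9 × earned runs / innings), counting only pitchers who clear the threshold. The rows are hypothetical. Innings are assumed to be in true decimal form already (thirds converted), not box-score notation.

```python
from collections import defaultdict


def era_by_height(rows, min_ip=50.0):
    """Pooled ERA per height, skipping pitchers under min_ip innings.

    rows: iterable of (pitcher, height_in, innings, earned_runs).
    """
    totals = defaultdict(lambda: [0.0, 0])  # height -> [innings, earned runs]
    for _pitcher, height, innings, earned_runs in rows:
        if innings < min_ip:
            continue  # too small a sample to count
        totals[height][0] += innings
        totals[height][1] += earned_runs
    return {h: round(9 * er / ip, 2) for h, (ip, er) in sorted(totals.items())}


# Hypothetical pitchers: B's 20 innings are the kind of small sample the filter drops.
rows = [("A", 72, 180.0, 80), ("B", 72, 20.0, 20), ("C", 76, 100.0, 40)]
print(era_by_height(rows))            # {72: 4.0, 76: 3.6}
print(era_by_height(rows, min_ip=0))  # {72: 4.5, 76: 3.6}
```

A reliever version would filter on appearances instead of innings.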
Numbers in chart are AVG IP over last 3 seasons
Height Thin Normal Beefy
Tall 218 189 206
Normal 193 184 183
Short 185 178 187
This fits with the conventional wisdom, but it might be better to rerun with VORP instead of IP to see how that plays out.
That means ALL Major League pitchers are above average height, even the "diminutive" Lincecum. That seems extreme enough to be more than just bias.
However, how accurate are players' reported heights? Certified doctors and nurses have measured my height with an error of +/- THREE inches. If there is a bias for taller pitchers, wouldn't most teams tend to err on the plus side and throw off the data?
I do think that the questions you asked, like "Which group would you think is more likely to give you 225 innings? Which group has the best health record?", are good questions if you consider the roles of pitchers the way an NFL coach would analyze the role of the WR. Some obviously make better slot receivers, others would be better for blocking purposes, and others would be useful for diversions.
In baseball, at least in today's game, you have pitching roles. So maybe this is one way to decide where in the rotation you should place a pitcher. A frame capable of higher workloads would go toward the front of the rotation, while other pitchers should be relievers instead of starters. However, there are probably many pitchers who would defy these roles because of their own better mechanics. There are exceptions to general rules all the time.
But what we may be able to develop from this sort of data, with the right questions, is probabilities of efficiency. I would define efficiency in this context as the maximum output for the longest period of time. At some point (a pitch count or inning count on the season), certain body types may prove to become less efficient in certain roles. But if a pitcher, because of his body type, were in another role, perhaps his effectiveness could be sustained for the full season rather than decreasing when the pennant race is in full swing, which is exactly when effectiveness matters most. I also believe this would help evaluate whether certain pitchers are worth more than other pitchers of the same body type. Financially you want the most for your money, and if you get above-average efficiency, then you get more for your money.
First, the research would need to be done on a broad scale (beyond the top 40 by VORP) to see the average pitch count or inning count at which each body type begins to decline. From there you should be able to determine roles, or which pitchers are financially sounder than others of the same body type. I think you'd also need to consider some age analysis. Comparisons within the same age and body type would help, since body type might affect the age at which a pitcher declines. Perhaps the tall-thin/normal crowd remains efficient longer than other body types?
Does this make sense? There are a lot of ways this can go to help a baseball team, whether in the front office, or managing in game scenarios over the course of a season.
Looking at 3 year averages, only for pitchers with at least 20 starts:
Best VORP by subgroup is Short/Beefy (Santana seems to dominate the 3-year averages). Best VORP by group is Short, with Beefy and Thin groups also being close.
Highest average IP by subgroup is Tall/Thin, but it's one data point. Tall/Beefy is also high on the list. But what's surprising is that the highest average IP by group is Thin, with the Beefy and Tall groups also close.
Best ERA by subgroup is also Short/Beefy. Best ERA by group is Short.
In general, the biggest outlier is the Normal weight. Everything else looks like random noise punctuated by certain data points (Haren and Santana, once again on opposite ends of the spectrum).
If you break it down to just this year, where the data is more complete because the group of pitchers was sorted by this year:
Best VORP is still Short/Beefy, but best group is Thin or Short.
Best ERA is also Short/Beefy, but best group is Short or Thin.
The lack of data points makes the data skew quite a bit, with Zambrano hurting the Tall/Beefy category and Haren and Santana helping the Tall/Thin and Short/Beefy categories. But it still falls out that Normal seems to have the worst performance in general.
1) Listed height/weight for professional athletes is notoriously inaccurate. You say that no pitcher listed at 5-10 has made a start this year. Well, I've stood next to Radhames Liz and he's about as 5-10 as it gets. Unfortunately, the O's list him at 6-0 or 6-1 or something.
2) As others have mentioned, there is a selection bias to the data. Tim Lincecum and CC Sabathia are at the extremes and they both are in the data pool because of their exceptional talent. Their talent led them to be put on the mound as children, they would've had extra chances because they had large signing bonuses (not that they needed them), etc. How many other 5-11 or 300 lb guys can we say that about?
So what do I suggest?
Well, there's nothing we can do about these faults in the data, but we can try to draw some conclusions from it anyway. For instance, we could see HOW a 300 lb man or a skinny 5-11 kid succeeds. Certainly, they use different arsenals, arm speeds, release points, and other variables to be among the most effective pitchers in the world. If CC tried to pitch like Tiny Tim, or vice versa, I'm sure their effectiveness would be compromised.
I guess I am saying that the issue isn't what an ideal pitcher's frame is; it's more like how pitchers with certain frames tend to find more success. If there were an easy way to quantify the variables I listed above, it actually wouldn't be that hard to throw together.
Perhaps the college ranks are a good place to look ... as their objective is "winning" rather than "development for MLB" ...
I've seen many good pitchers who are short ... such as Lincecum, obviously ... the other example who comes to mind is former Oregon St closer Kevin Gunderson ... http://itmightbedangerous.blogspot.com/2008/09/organizational-consistency.html ... listed as 5'10" ... hmm, I'll bet he's not 5'10" :-)
How can it be that guys like him don't make MLB because of "bias" ... as opposed to "results"? He's getting enough of an opportunity to make it or not based on results, imo.
And I guess what I'm saying is that even D1 college programs don't have the luxury of making the choice you are proposing.
I think the biggest impact of the "bias" is that it results in mis-reporting of actual player heights ...
If there is a selection bias, where amateur managers/scouts/general managers/etc... tend to look for pitchers with a specific body type, the players that make up that body type will not necessarily represent the best available talent, because they have a different selection criteria applied. But a pitcher who makes the major leagues with an extreme body type (Sabathia/Lincecum even Santana/Haren) will have to have a talent level that overcomes the selection bias. So rather than seeing a normal distribution of talent around a normal distribution of body types, you will see an inverse distribution of talent. Which is essentially what you see (albeit with the small sample size/rough analysis caveat).
My suggestion to test it with a real sample size is to use PECOTA. Take all of the pitchers in the pool and give them a height/weight in the middle of each of the subgroups and re-run PECOTA. Since PECOTA takes these factors into account, you will see an increase or decrease in projected value IF the data is more than just noise. I will bet that you will find that putting more normal body types into the projection will cause a very conservative projection, whereas each of the extremes will cause some increase in the projection.
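The inverse-distribution argument can be illustrated with a toy simulation in Python. It assumes that talent is standard-normal and independent of body type, that 20% of prospects have an extreme body type, and that scouts make extreme body types clear a bar one standard deviation higher. All of these numbers are hypothetical, chosen only to show the mechanism, not to model real scouting.

```python
import random
from statistics import mean


def survivor_talent(n=100_000, bar=2.0, extra_bar=1.0, extreme_share=0.2, seed=1):
    """Mean talent of prospects who get through, split by body type.

    Talent ~ N(0, 1) regardless of body type; extreme body types must exceed
    bar + extra_bar, normal ones only bar (all parameters hypothetical).
    """
    rng = random.Random(seed)
    survivors = {"normal": [], "extreme": []}
    for _ in range(n):
        body = "extreme" if rng.random() < extreme_share else "normal"
        talent = rng.gauss(0, 1)
        threshold = bar + (extra_bar if body == "extreme" else 0.0)
        if talent > threshold:
            survivors[body].append(talent)
    return {body: mean(t) for body, t in survivors.items()}


result = survivor_talent()
print({body: round(t, 2) for body, t in result.items()})
```

With these settings, the surviving extreme-body-type prospects average noticeably more talent than the survivors with normal builds, even though body type has no effect on talent in the model. Setting `extra_bar=0` removes the gap, apart from sampling noise.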
The analysis should really be looking at domestic players drafted (and a controlled subset of foreign-born undrafted players to avoid the Nomos and Dice-Ks of the world) to see whether height impacted whether they 'made it'. If in the first four rounds of the draft 70% of pitchers over 6'3" make it, but that number drops to 40% for pitchers over 5'11", that would be telling data, particularly as the pitchers taken in similar rounds will carry similar expectations.
It would also be useful if we had reliable measurements, but you work with what you get.
I have to disagree. The qualifier of *in the first four rounds* means you have just introduced a new variable. One might hypothesize that shorter pitchers have trouble getting drafted early, due to a lack of projection. That could certainly impact the results beyond a simple "Tall pitchers succeed/fail more often" type of conclusion based on that data set. All it would tell us is that tall/short pitchers drafted early succeed/fail at a higher rate. That could just end up telling us, for instance, that clubs are bad at identifying which short pitchers to draft early.
Again, I think answering a question on ideal body type is pretty impossible given the data to work with and the amount of noise you're going to get with all the variables everyone has come up with.
The better approach is to quantify pitch velocity/movement with PITCHf/x, find a simple way to quantify release point (mlb.tv pixels?) and arm speed (you could probably use max velocity as a proxy for now), and run some multivariate testing within each body type group.
You wouldn't find an ideal body type, but you might find some pitching styles that are more likely to be successful within each group. For instance, you might find that over-the-top deliveries are more successful as player height increases (just speculating).
The more I think about this, the more this kind of study seems feasible.
As many have mentioned, the "shorter" pitchers at the ML level have been selected out b/c they are special. Therefore they don't necessarily serve as the best data points. The real question is whether the selection bias at the lower levels is appropriate, which is a tough question to answer quantitatively.
I know he is only in the Midwest League but his stats were pretty good...
His name is Tim Collins
http://web.minorleaguebaseball.com/milb/stats/stats.jsp?sid=milb&t=p_pbp&pid=525768
I'd be interested to know the percentage of "short" lefties vs. righties who have been successful. Granted, there are lots of selection biases based on population (short people who are left-handed are rarer than other demographics). My hunch is that most of the successful short pitchers are lefties.
The most relevant questions this type of investigation could answer that I can think of offhand are:
1. Do teams over- or under-draft pitchers of a certain build?
2. Do pitchers of a particular build take longer to reach their peak . . .
3. . . . develop further . . .
4. . . . more durable . . .
5. . . . have longer lasting careers . . .
6. Taking 1976reds' question a step further, is there a difference at any height between the development of lefties vs. righties? There is a myth begging for verification.
The questions about fooling the hitters and pitch movement really boil down to success, which is ultimately all that matters.
The first question above regarding under/over-drafting would obviously be looked at from the point of view of the draft as suggested by tballgame. Some fair way to fudge the pitchers drafted lower due to unaffordable expected salary demands would be helpful. A control would have to be in place to make sure high school pitchers are not shorter or taller than pitchers drafted out of college. And, yes, as mikehollman points out, excluding lowest draftees might have some bias as well. However, I think some cut off could be allowed for those drafted so low that there is very little chance of their reaching the majors.
Questions 2-5 above are somewhat related. As drmboat suggests, PECOTA could be a useful tool. As PECOTA is based on historical data, I don't understand Corkedbat's objection.
I generally agree with the comments warning about sample size, but I think some generalities might be found if a broad enough perspective is taken.