Spitballing: Checks and Balances

June 23, 2011

Background: You’ve got to admit they’re getting better

“When the 100-meter freestyle is held today in high school girls’ regional swimming meets, it is generally won by a girl who swims the distance in just under 60 seconds. That time would have won the men’s Olympic competition in 1920, or any year before it.”—Baseball Between The Numbers

Common sense should tell us that baseball players are better now than they have ever been before. Other sports, in which achievements are standardized and enumerated, permit little debate. In baseball, there is no question that players are bigger and better able to recover from injuries, have greater incentives to succeed, and come from a larger and more diverse pool than in years past. But given that in every season only one team wins the World Series, the average team totals 81 wins over 162 games, and hitters score exactly as many runs as pitchers allow, it’s difficult to put a number on changes in player talent.

The trick is in the control. It wouldn’t be too hard to evaluate players over time if their competition and environment remained constant. There is one skill among one group of players that has likely held constant over time: pitcher hitting. Because pitchers are selected independent of their hitting ability, any league-wide variation in how pitchers perform as hitters can be attributed to the quality of the pitching they face. Therefore, the league’s quality of pitching can be, and has been, measured by using pitchers as hitters as the control. (I can imagine a time when pitchers’ hitting abilities will be valued more highly, which would cause their hitting averages to go up even as they continued to get better at pitching.)

In Baseball Between the Numbers, Nate Silver introduced a method of quantifying league difficulty over time by using any hitter who played in two consecutive years as his control. By seeing how those players performed in consecutive years, variation in performance could be accounted for by the competition. Mitchel Lichtman similarly studied the drop in run-scoring in 2010 by selecting a control group of average major-league players and seeing how they performed from one year to the next.

My methodology will be most similar to that of Tom Tango, who searched for the reason behind changes in home run rates over time by controlling for hitter, pitcher, and park. My question is: How much better have hitters and pitchers gotten during the Retrosheet Era? To answer, I will focus on the Three True Outcomes, and although I probably should, I will not control for things like park or league and will instead consider all other changes to be “environmental” effects.

How to interpret this chart?

Basically, I compared how players who faced each other in consecutive years fared in those confrontations and in their matchups against everyone else. If they did better against the same players, that means their environment was different.* If they did worse against everyone else than they did against the same players, that means everyone else got better. That’s about it. Feel free to skip to the results now if you want.

*This entire analysis hinges upon this assumption. If, say, over a period of time, batters began to get better than pitchers by aging in an uncharacteristic way, that era might throw things off.

Details: In which I take the harmonic mean of harmonic means and in so doing lose sensation in my face.

I established two bins. Bin one was my control group. It contained the number of plate appearances, homers, walks, and strikeouts for year n and year n+1 for all batters and pitchers who faced each other in consecutive years. Bin two, my experimental group, contained the number of plate appearances, homers, walks, and strikeouts for year n and year n+1 for all batters and pitchers against everyone else.
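The two bins can be sketched in Python. This is an illustration only, not the code behind the charts: the row layout (season, batter, pitcher, outcome), the function name `split_bins`, and the sample rows are all made up, and it tallies league-wide totals for one pair of seasons rather than separate lines for each player.

```python
from collections import Counter

def split_bins(rows, year):
    """Tally PA, HR, BB, and K for `year` and `year + 1`, split into bins.

    rows: iterable of (season, batter, pitcher, outcome) tuples, where
    outcome is "HR", "BB", "K", or anything else for other results.
    Returns {(bin_name, season): Counter}, with bin_name being "control"
    (the batter and pitcher met in both seasons) or "experimental".
    """
    rows = [r for r in rows if r[0] in (year, year + 1)]
    met_first = {(b, p) for s, b, p, _ in rows if s == year}
    met_second = {(b, p) for s, b, p, _ in rows if s == year + 1}
    repeat_pairs = met_first & met_second  # faced each other in both seasons

    tallies = {}
    for season, batter, pitcher, outcome in rows:
        bin_name = "control" if (batter, pitcher) in repeat_pairs else "experimental"
        counts = tallies.setdefault((bin_name, season), Counter())
        counts["PA"] += 1
        if outcome in ("HR", "BB", "K"):
            counts[outcome] += 1
    return tallies

# Made-up plate appearances, purely for illustration.
sample = [
    (2009, "batter_a", "pitcher_x", "K"),
    (2009, "batter_a", "pitcher_x", "OUT"),
    (2010, "batter_a", "pitcher_x", "BB"),
    (2009, "batter_a", "pitcher_y", "HR"),
    (2010, "batter_a", "pitcher_z", "K"),
]
bins = split_bins(sample, 2009)
```

Here only the batter_a versus pitcher_x matchup lands in the control bin, because it is the only pairing that occurs in both seasons.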

Within each bin, I paired up year n and year n+1’s HR rates, BB rates, and K rates. I found the average difference contained in each pair for each player in each bin, weighting by the harmonic mean of plate appearances in both years.
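As a rough sketch of that weighting step, assuming each player in a bin is reduced to four numbers (plate appearances and events in each of the two years), the averaging might look like the following. The function names and the sample figures are invented for illustration, and this is not the code behind the charts.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two counts; 0.0 if either count is zero."""
    return 2 * a * b / (a + b) if a > 0 and b > 0 else 0.0

def weighted_rate_change(players):
    """Average year-over-year change in a rate stat across players.

    players: list of (pa_year1, events_year1, pa_year2, events_year2).
    Each player's change in rate is weighted by the harmonic mean of
    his plate appearances in the two years.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for pa1, ev1, pa2, ev2 in players:
        weight = harmonic_mean(pa1, pa2)
        if weight == 0:
            continue  # no plate appearances in one of the years
        weighted_sum += weight * (ev2 / pa2 - ev1 / pa1)
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0

# Made-up players: (PA year n, K year n, PA year n+1, K year n+1).
players = [(100, 20, 100, 25), (300, 30, 100, 10)]
change = weighted_rate_change(players)
```

The harmonic mean is pulled toward the smaller of the two samples, so a player with 600 plate appearances one year and 10 the next gets a weight of about 19.7 rather than 305.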

I then averaged all of those averages over every pair of year n and year n+1, this time weighting by the harmonic mean of the two groups’ harmonic means. It was also at this point that I realized I would not be able to clearly explain what I was doing.

Results: Environment

The following graph shows the difference in each rate stat from the previous year among players who faced each other in both years. For example, in 2009, our control homered in 3 percent of PAs, while in 2010, that number fell to 2.75 percent. The difference is -0.25 percentage points. I have also plotted the lines of best fit.

It appears that over time, the difficulty of recording a strikeout has increased, and it’s become a bit easier to draw a base on balls. I assume this is a function of the strike zone’s ever-shrinking nature.

We can point to specific changes in MLB, such as rule changes or new ballparks, to see how much they changed our control group. This also serves as a sanity check, as we already know that the lower mound in 1969 raised run-scoring. But how does that bear out in the numbers?

1968 was the original Year of the Pitcher. In 1969, the mound was lowered. This resulted in the second-highest increase in homers, highest increase in walks, and biggest decrease in strikeouts in the control group, which dates back to 1954.

The biggest jump in home runs in the control group came in 1977, the year Rawlings succeeded Spalding as the official supplier of major-league baseballs, per Wikipedia.

In 1963, when the top of the strike zone was changed from “between the batter’s armpits and the top of his knees” to “the top of the batter’s shoulders and his knees” (per Baseball Almanac), home run rates and walk rates fell 0.8 percent and 0.24 percent, respectively, as strikeout rates rose 0.34 percent. In 1969, the zone was changed back to the armpits, which served as another reason for the bump in offensive output that year, in addition to the mound being lowered.

The arrival of the Colorado Rockies in 1993 (at Mile High Stadium; Coors Field did not open until 1995) coincided with a 0.43 percent increase in home runs per plate appearance in our control group.

Results: Pitchers

Pitchers are continually getting better at striking batters out. Every single year.

We already know pitchers are throwing harder than they did just a couple of years ago. The number of Pinedas and Strasburgs entering the league is outweighing the number of Mike Mussinas and Andy Pettittes departing. Furthermore, new pitches have been introduced, such as the splitter in the 1980s, making pitchers more difficult to hit.

One notable year was 1984, when the experimental pitcher group added 2 percent to its strikeout rate. Most of that can be attributed to rookies recording a strikeout rate 2 percent higher than non-rookie pitchers. It’s a rarity for rookies to perform as well as veterans in anything, but 1984 was the year of possibly the greatest rookie pitcher ever, Doc Gooden, who entered the league as baseball’s best pitcher.

Results: Hitters

My interpretation of this chart is that home runs are the driving force. Batters are exchanging more homers for more strikeouts and fewer walks. Just as pitchers are throwing faster, I assume batters are swinging harder.

Conclusion

I believe the data show three trends. The league has discouraged strikeouts over time, presumably through a smaller and smaller strike zone. Pitchers have more than counteracted that effect, and I think that’s because they can throw harder than they used to. Batters are also better at hitting home runs than they used to be, which I believe to be a result of increased strength, similar to that of pitchers.

The cause of fluctuations in MLB run-scoring is important. Consider: a change in MLB pitching quality one year does not necessitate a change in replacement level pitching quality that year. Were a drop in run-scoring the result of environmental differences or inferior batters, that would mean nothing in evaluating pitchers. However, if all pitchers in MLB improved as replacement level remained constant (a possibility, albeit a reach), that would impact their value. We like to say that a drop in run-scoring is caused by better pitching or worse hitting, but without performing this type of analysis, it’s very difficult to know for sure.

Jeremy Greenhouse

markpadden

6/23

So, essentially, your "experimental" group is comprised of baseball's youngest players, correct? And these youngest pitchers have been improving their strikeout rates at an average of ~1.5% per year -- so ~15% aggregate over the past 10 years? Just making sure I am interpreting correctly.

Meanwhile, the veterans have been striking out fewer and walking more batters. So the conclusion is that rookie pitchers over the past couple of years have driven the overall decline in run scoring?

jgreenhouse

6/23

Every player is included in the experimental group. It consists of players who did not fit in the control. So, some pitchers faced Derek Jeter two years in a row and he is part of the control group, while other pitchers are facing him for the first time, and for them, he is part of the experimental group.

markpadden

6/24

Thanks. But wouldn't the experimental group (matchups between guys who did not see each other last year) be over-represented by young players? It seems like young and/or injured-part-of-last-season players would be selected for abnormally in the exp. bin, as they would often have far fewer matchups last season vs. this season. I.e., they would generate a disproportionate pct. of the "new" matchups this season by virtue of their limited playing time last season.

I'm not sure I am interpreting your methodology correctly, so the above comments could be off. But maybe you could write a follow-on article to break down your exact process and also what each graph represents in more detail.

jgreenhouse

6/24

Exactly, it will be represented by many old players in year 1 and many young players in year 2. They are represented proportionate to their plate appearances.

markpadden

6/25

The problem I have is that you are measuring the diff. between how given players perform vs. past opponents and comparing it to how they perform vs. brand new opponents, during a given season. From that comparison, you are concluding that strikeout skill is going up for pitchers each and every year.

For example, looking at the year 2 comparison of control vs. experimental for pitchers... I would argue that the experimental bin for these pitchers will contain an abnormal pct. of rookie batters or other young batters whose playing time has increased from year 1 to year 2. A batter with increasing playing time from year 1 to year 2 will be more likely to have his plate appearances qualify for any given pitcher's experimental bin. Younger batters tend to strike out more. The net result is that the experimental bin for a pitcher in year 2 will contain more plate appearances vs. high-K batters than the control bin will for that same pitcher. So the effect of pitchers' improving K skill every year may just be a measurement of the bias between the control and exp. groups.

A similar and additive bias should also be present in the year 1 control vs. exp. comparison. The exp. group will have an abnormal number of players with declining playing time (older players), which will depress observed K skills for pitchers (older batters are harder to strike out). Since you are comparing year 2 to year 1 to assess pitcher skill change, this will further inflate the increase in K skill.

Overall, comparing the experimental year 1 results with the experimental year 2 results requires the assumption that the two experimental groups of opponents (batters) are roughly equal in skills. I'm not sure they are.

Does this make sense?

jgreenhouse

6/25

Everything you say makes sense, but to me you're describing why the method works as opposed to the method's flaws.

The control groups were almost always better than the experimental groups. However, I'm not comparing them directly. I'm comparing the control group in year 1 to the control group in year 2, and the difference between them, with the difference between the experimental group in year 1 and the experimental group in year 2. There is no assumption being made that the experimental groups in both years will be equal in skill.* In fact, I am trying to find the difference in skill between the two groups. If good young players are replacing bad old players, then the league is truly getting better.

*There is an assumption being made that the control groups are equal in skill both years.

markpadden

6/27

So, you are performing the following calc., right: (ExpYear2-ExpYear1)-(ControlYear2-ControlYear1)? And the result is construed as the change in skill for year 2?

I think I get it now...

My only remaining concern would be if the avg. age of the exp. group was for some reason trending one way over a decade or so, it could lead to erroneous conclusions about true player ability changes (which would be better explained by changes in how aggressively/cautiously players are brought up from the minors). No idea if this is the case, but just mentioning as a potential factor.

studes

6/23

Jeremy, did you regress the stats of the first year? If not, I believe you have a selection bias.

studes

6/23

That said, I'm still digesting the methodology.

jgreenhouse

6/23

Studes, I didn't regress, and I hadn't really given the matter of regression any thought. Could you elaborate?

TangoTiger1

6/23

Any time you have a matched pair, you imply survivorship. That by itself implies that the first in the pair was produced by someone with more good luck than bad luck. That said, the pitcher also survives, so he also had more good luck than bad luck.

Do those two things cancel out? I've done similar work that says: not. The pitcher benefits from a bigger surplus of good luck over bad luck than the hitter does, in terms of being allowed to survive into the next year.

I mean, it's a pretty tiny effect, but we're looking at tiny effects to begin with. Just something to be very careful about.

jgreenhouse

6/23

That's very interesting. What might that mean with regards to the results shown?

TangoTiger1

6/25

I don't know! What I do know is that every time I've done these kinds of studies, I worry if my conclusions seem too far-reaching.

Here's a good example of what I'm talking about.

(Do you get notified when we post?)

studes

6/23

Okay, so I guess I don't quite understand this:

Basically, I compared how players who faced each other in consecutive years fared in those confrontations and in their matchups against everyone else. If they did better against the same players, that means their environment was different.* If they did worse against everyone else than they did against the same players, that means everyone else got better.

How does this paragraph translate into the numbers on the graph above it?

jgreenhouse

6/23

The paragraph tries to explain how I plan on interpreting the chart. It tries to translate into the numbers in all of the following charts.

studes

6/23

Okay, I thought you were interpreting that graph.

I admit that I don't understand your methodology. I mean, I understand your description as far as it goes, but I don't understand how that translates into specific numbers on your graphs.

Probably just me. ;)

jgreenhouse

6/24

I doubt anyone was able to parse the explanation of my methodology, which certainly wasn't written too thoroughly.

My methodology was to find how the same hitters perform against the same pitchers over two consecutive years as well as against everyone else. For example, same hitters strike out 10% of the time in year one and 9% in year two. The "environmental" effect is -1%. Those same hitters strike out against everybody else 5% of the time in year one and 4% of the time in year two. The change in pitcher quality effect is 0%. That's about it.
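That arithmetic can be written out in a few lines of Python. This is only a sketch of the example in this comment, not the code behind the charts; the function name `decompose` and its argument names are invented for illustration.

```python
def decompose(control_y1, control_y2, exp_y1, exp_y2):
    """Split a year-over-year rate change into two pieces (percentage points).

    environment: change among repeat matchups (same hitters, same pitchers).
    opposition: change against everyone else, net of the environment change.
    """
    environment = control_y2 - control_y1
    opposition = (exp_y2 - exp_y1) - environment
    return environment, opposition

# Strikeout rates from the example above, in percent.
environment, opposition = decompose(10.0, 9.0, 5.0, 4.0)
print(environment, opposition)  # -1.0 0.0
```

Had the rate against everyone else instead fallen from 5% to 3%, the second number would come out to -1.0, which is the part of the change this reading would attribute to the opposition rather than to the environment.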
