Baseball Prospectus home
  
  

April 14, 2005

Crooked Numbers

Sizing Up Small Sample Size

by James Click

Every year is a fresh start. For teams and players alike, a winter's worth of work is finally on display. Most of baseball stays the same from one year to the next, but there is an adjustment period early in the season as teams and players settle in.

Small sample size doesn't mean no sample. While there's meaning in how a team starts off, it's also important to determine whether the early parts of the season can be deceptive for reasons other than the lack of sufficient data, especially when considering individual player performances. There's already evidence that hitters tend to perform better in the first half of the season than in the second half. There's the conventional wisdom that pitchers dominate in the colder months early in the season while August is when the bats wake up. Then there are the A's fans who keep looking at Barry Zito's 4.51 ERA the last three Aprils followed by five months of 2.74, 3.80, 3.46, 3.13, and 3.34.

April results that don't fit the public perception are usually attributed to some change discovered by the media looking for the cause. A hot start by a hitter is attributed to a change in batting stance, weight, or physique. This year's example is Eric Hinske, whose new stance is the easy answer to his hot start. With pitchers, learning or mastering a new pitch or changing the delivery are the easy answers for early success. Teams off to hot starts are said to have new veteran leadership or youthful exuberance.

Inherent in a lot of this discussion is the idea that other players have yet to adjust to the changes in their opponents. In the matchup of batter and pitcher, we usually assume that the pitcher benefits more from deception and lack of information than the hitter. Pitchers never before seen by hitters can hide the ball in different ways, mix in unexpected pitches, and throw off a hitter's timing with a new windup. Warren Spahn put it best: "Hitting is timing; pitching is upsetting timing."

One quick way to determine whether pitchers see an early-season advantage is to look at league-wide ERA broken down by month for 2000 through 2004:

There are several different trends depending on the year. 2000 saw ERA decline steadily throughout the season until September; 2001 and 2003 peaked in June, 2002 in July, and 2004 in August. More importantly, there doesn't appear to be any distinct trend towards lower ERAs in April; if anything, there's a slight dip in May and a rise in June in four of the five seasons, perhaps as teams begin to weed through who's playing well and who's not before the hitters catch up in the hotter months.

Month-by-month ERA may not be the best indicator of any inherent advantage by newer pitchers or pitchers who have changed their repertoire since last season. Instead, let's break things down by a pitcher's starts against a particular team. To do so, I'll look at each pitcher's performance broken down by the number of times he's seen that team, including the current appearance. Let's see what we get when looking at 2004 numbers:

App Year     IP   ERA  K/PA BB/PA HR/PA  H/PA
1  2004 21305.0  4.50  .166  .087  .029  .237
2  2004 10795.0  4.56  .166  .085  .030  .240
3  2004  5209.0  4.28  .173  .085  .029  .228
4  2004  2859.0  4.42  .177  .083  .030  .237
5  2004  1482.7  4.38  .179  .084  .029  .236
6  2004   769.7  4.20  .183  .086  .028  .233
7  2004   410.7  3.75  .187  .091  .024  .224
8  2004   251.3  4.08  .202  .105  .033  .225
9  2004   164.3  3.40  .228  .077  .029  .228
10 2004    90.3  3.49  .211  .110  .018  .175
11 2004    43.7  4.12  .199  .044  .017  .249
12 2004    11.3  2.38  .163  .102  .041  .184
13 2004     2.0  0.00  .429  .000  .000  .143

Or if you're a more visual person:

(For the curious, those 2.0 IP in the thirteenth appearance against a team were contributed by Tom Gordon against the Orioles, Scott Eyre against the Diamondbacks, and Joe Nathan against the Tigers. The highest since 1990 was Mike Myers against the Diamondbacks in 2001 with 15 appearances. Maybe that's why the Snakes acquired him that winter.)
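For readers who want to reproduce this kind of breakdown, the appearance numbering can be sketched roughly as follows. This is a minimal illustration, not the query behind the table above: the game log, the pitcher names, and the function `era_by_appearance` are all hypothetical, and it assumes the log is already sorted chronologically. ERA is computed from outs as 27 * ER / outs, which is the same as 9 * ER / IP.

```python
from collections import defaultdict

# Hypothetical game log in chronological order:
# (pitcher, opponent, outs recorded, earned runs allowed)
game_log = [
    ("Pitcher A", "BAL", 18, 3),
    ("Pitcher A", "BOS", 21, 2),
    ("Pitcher A", "BAL", 15, 4),
    ("Pitcher B", "BAL", 3, 0),
    ("Pitcher B", "BAL", 3, 1),
    ("Pitcher B", "BAL", 6, 0),
]


def era_by_appearance(log):
    """Return {appearance number: ERA}.

    Each pitcher's outings against a given opponent are numbered from 1,
    counting the current outing, and all outings sharing a number are pooled.
    """
    seen = defaultdict(int)    # (pitcher, opponent) -> outings so far
    outs = defaultdict(int)    # appearance number -> total outs
    earned = defaultdict(int)  # appearance number -> total earned runs
    for pitcher, opponent, outs_recorded, earned_runs in log:
        seen[(pitcher, opponent)] += 1
        app = seen[(pitcher, opponent)]
        outs[app] += outs_recorded
        earned[app] += earned_runs
    # 9 * ER / IP, with IP = outs / 3
    return {app: 27 * earned[app] / outs[app] for app in sorted(outs)}


result = era_by_appearance(game_log)
for app, era in result.items():
    print(app, round(era, 2))
```

Pooling outs rather than averaging each pitcher's ERA keeps a one-out relief outing from counting as much as a seven-inning start.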

In the more than 30,000 innings that came in a pitcher's first or second appearance against a team in 2004, ERAs were 4.50 and 4.56. After that, as appearances increase, ERA declines steadily. Of the four rate metrics accompanying ERA, K/PA increases while ERA decreases--as we would expect--but BB/PA increases as well. (Nate Silver has already discussed the advantages of using K/PA rather than K/9, so I'll endeavor to use K/PA in the future. For reference, Randy Johnson and Johan Santana led all qualifiers last year with a K/PA of .301 while Kirk Rueter finished last with .067.)

Lest we think that 2004 was bucking a trend, or that the small sample sizes at higher appearance counts are driving the result, here are the numbers for 2003 and 2002:

The most obvious explanation here is that pitchers who are called upon to face a team many times are going to be among the best in the league. To check for that bias, here's what we would have expected each group to do in 2004 based on its pitchers' weighted full-season performances:

App Year     IP   ERA  K/PA BB/PA HR/PA  H/PA
1  2004 21305.0  4.73  .164  .087  .030  .239
2  2004 10795.0  4.50  .169  .085  .029  .236
3  2004  5209.0  4.31  .173  .084  .028  .234
4  2004  2859.0  4.20  .175  .084  .028  .232
5  2004  1482.7  4.09  .179  .085  .027  .229
6  2004   769.7  4.03  .183  .088  .026  .227
7  2004   410.7  3.73  .192  .089  .024  .221
8  2004   251.3  3.63  .200  .089  .023  .219
9  2004   164.3  3.42  .199  .085  .021  .218
10 2004    90.3  3.31  .205  .084  .021  .217
11 2004    43.7  3.42  .207  .085  .022  .216
12 2004    11.3  2.90  .198  .080  .018  .202
13 2004     2.0  2.37  .276  .076  .017  .168
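One plausible reading of "weighted season performances" is an innings-weighted average: each pitcher's full-season ERA counts in proportion to the innings he threw within that appearance group. The sketch below works under that assumption. The article does not spell out the exact weighting, and the sample numbers and the name `expected_era` are made up for illustration.

```python
# Hypothetical inputs for one appearance group:
# (pitcher's full-season ERA, innings he pitched within this group)
group = [
    (3.00, 6.0),
    (4.50, 2.0),
    (6.00, 1.0),
]


def expected_era(pitchers):
    """Innings-weighted average of full-season ERA for one appearance group.

    Assumes the group contains at least one inning pitched.
    """
    total_ip = sum(ip for _, ip in pitchers)
    return sum(era * ip for era, ip in pitchers) / total_ip


print(round(expected_era(group), 2))
```

The same weighting would apply to K/PA, BB/PA, HR/PA, and H/PA, presumably with plate appearances rather than innings as the weight.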

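Subtract the actual numbers in the first table from these expected numbers (expected minus actual, so a positive ERA figure means pitchers did better than expected) and we get the following: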

App Year    IP   ERA   K/PA BB/PA HR/PA  H/PA
1  2004 21305.0  0.23 -.002  .000  .001  .002
2  2004 10795.0 -0.06  .003  .000 -.001 -.004
3  2004  5209.0  0.03  .000 -.001 -.001  .006
4  2004  2859.0 -0.22 -.002  .001 -.002 -.005
5  2004  1482.7 -0.29  .000  .001 -.002 -.007
6  2004   769.7 -0.17  .000  .002 -.002 -.006
7  2004   410.7 -0.02  .005 -.002  .000 -.003
8  2004   251.3 -0.45 -.002 -.016 -.010 -.006
9  2004   164.3  0.02 -.029  .008 -.008 -.010
10 2004    90.3 -0.18 -.006 -.026  .003  .042
11 2004    43.7 -0.70  .008  .041  .005 -.033
12 2004    11.3  0.52  .035 -.022 -.023  .018
13 2004     2.0  2.37 -.153  .076  .017  .025
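As a quick check on the sign convention, the ERA column can be recomputed from the two earlier tables. The values below are copied from the first five rows of the actual and expected 2004 tables; the variable names are only for illustration.

```python
# 2004 ERA by appearance number, copied from the expected and actual tables.
expected = {1: 4.73, 2: 4.50, 3: 4.31, 4: 4.20, 5: 4.09}
actual = {1: 4.50, 2: 4.56, 3: 4.28, 4: 4.42, 5: 4.38}

# Expected minus actual: a positive value means pitchers allowed fewer
# runs than their season lines would predict.
diff = {app: round(expected[app] - actual[app], 2) for app in expected}

for app, d in diff.items():
    print(app, f"{d:+.2f}")
```

The results match the ERA column in the table above: +0.23, -0.06, +0.03, -0.22, and -0.29 for appearances one through five.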

Contrary to the apparent improvement in performance as appearances increase, pitchers actually perform worse relative to expectation as their appearances mount. Pitchers performed about a quarter of a run better in their initial appearance against a team than we would expect from their complete season performance, but performed steadily worse as appearances mounted. The discrepancy between the expected and actual ERA in the initial appearance against a team is especially convincing given the massive sample of innings involved. Teams may be pretty good about selecting the correct pitchers for the majority of the playing time, but returns diminish as those pitchers face the same teams more and more during a season.

Though there isn't any apparent improvement in pitching performance in April compared to other months of the season as evidenced by league ERA, pitchers do appear to see a slight advantage in their initial appearance against an opposing team. Things tend to even out in the second or third appearance, but after that, the batters appear to have figured things out and the advantage is now gone. Adding a new pitch or a new wrinkle to a pitcher's motion may work for a while, but don't expect that advantage to last all season. This trend doesn't bode well for struggling players like Zito, so if you're an A's fan, perhaps you should just forget you read any of this and read up on regression to the mean.
