
As expected, the Toronto Blue Jays are scuffling, playing .500 ball through two weeks and hanging around at the bottom of a deep AL East. Their $100 million investment in pitching has yielded just ten innings of work so far, as A.J. Burnett has made only one start while B.J. Ryan pitches about as often as a modern closer does. The back end of the rotation has suffered from the winter’s defensive downgrades, especially Josh Towers, who’s allowed 25 hits in 12 2/3 innings. The offense is averaging six runs a game, but that’s not sustainable–they’re not going to hit .321 all year. Look for the Blue Jays to stay within a handful of games of .500 throughout the season, and be disappointed by their final record.

Out in Oakland, the A’s are off to an unimpressive 6-7 start, albeit one that has them tied for first in the AL West. They’ve allowed more than five runs a game, a figure that belies a stat line showing them to have the highest strikeout rate in the AL, a better than 2-to-1 strikeout-to-walk ratio, and the fourth fewest home runs allowed in the league. That 5.26 ERA is going to fall, and when it does, the A’s will get separation in the West on their way to a division title.

The two paragraphs above are factually accurate, deceptively analytical…and a load of crap.

I’m a bit dogmatic on the idea of not drawing conclusions off of small sample sizes. The statistical reason is that baseball performance, by both teams and players, can vary widely over the course of the season. Two weeks of play simply isn’t enough time for the underlying ability to shine through the variance, rendering the data essentially unusable.
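To put a rough number on that variance, here is a minimal Python sketch. It assumes every game is an independent trial with a fixed win probability, which is a simplification for illustration and not a model of how baseball actually works. The function name `win_pct_std_dev` is made up for this example.

```python
import math


def win_pct_std_dev(true_win_pct: float, games: int) -> float:
    """Standard deviation of the observed winning percentage over `games`
    games, ASSUMING independent games with a fixed win probability
    (a simple binomial model, used here only as an illustration)."""
    return math.sqrt(true_win_pct * (1.0 - true_win_pct) / games)


if __name__ == "__main__":
    # 13 games is roughly two weeks of play; 162 is a full season.
    for games in (13, 162):
        sd = win_pct_std_dev(0.500, games)
        print(f"{games:>3} games: a true .500 team's winning percentage "
              f"varies by about +/- {sd:.3f} (one standard deviation)")
```

Under that assumption, one standard deviation for a true .500 team is about .139 of winning percentage after 13 games, against about .039 over 162. In other words, a 6-7 record and an 8-5 record are both well within the range you would expect from the same .500 team after two weeks.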

The stickier problem, though, is confirmation bias. Confirmation bias is the very human tendency to assign importance to the data that fits our hypothesis, and dismiss the data that undermines it.

The Blue Jays are off to a .500 start and playing lousy defense behind a contact staff? That’s to be expected when you overspend on so-so free-agent pitchers and trade away a great glove man like Orlando Hudson. And that six runs a game they’re scoring (in part due to acquired-for-Hudson Troy Glaus and his 1037 OPS)? That won’t last. The A’s, however…their 6-7 start isn’t something to be taken seriously. The core talent is very good, and they’ve just been unlucky in how many runs they’ve allowed in the early part of the season.

It’s insidious, and quite frankly, it’s a lot more dangerous in the work of someone like me, who lards their arguments with information and data rather than random opinions about character and fortitude and clutch and heart and spleen. It’s easier to see through mainstream “analysis,” with its high daily level of nonsense, but when a performance analyst is pointing to numbers and making a case, it’s harder to see through the biases.

Think about last season, when even at the All-Star break I was still dismissing the Chicago White Sox, focusing on their record in one-run games rather than their terrific defense and functional, if wildly misunderstood, offense. All of the points I made about the Sox were true, but the team’s record in one-run games meant more to me because it fit my preconception that they were a sub-.500 team that was getting lucky. They were actually a .565 team that was getting lucky, but because I dismissed the information that pointed to a good team, I didn’t see their true ability. That’s confirmation bias.

Perhaps the most notable example of this in the early part of the 2006 season is Barry Bonds. Bonds is hitting .192/.488/.269 with no home runs in 26 at-bats, amidst a media circus and with knee and elbow problems. (Worth noting: Bonds has a .305 EqA, even with the low BA and no power; OBP is life, life is OBP.)

If you’re inclined to believe that Barry Bonds is simply going through a normal decline, you point to the surgically-repaired knees, the bone chips in his elbow, and the microscopic sample size. You note that Bonds played the first two weeks in three of the lousiest hitting environments in MLB, much of that time spent in cool, damp weather, and that he’s dealing with a level of scrutiny that we don’t put on nominees for Cabinet positions.

If, instead, you believe that Barry Bonds is a steroid-using cheater who has been scared off of the juice by the new penalties, a book about his usage and the opprobrium of the American populace, you look at the goose egg in the home-run column, the sub-Mendoza batting average, and the handful of warning-track flyballs and you snicker. Of course, Bonds is done; he’s stopped using steroids, which were the only reason he played so well so late in his career.

In either case, you’re wrong. There’s not enough evidence to support either conclusion at this point, and in choosing your data, you’re guilty of confirmation bias. Bonds may be done, or he may just be going through an injury-enhanced slump. We won’t know for some time, and that’s the only reasonable conclusion.

Being aware of confirmation bias is important at all times, but it is especially so early in the season, when you can pretty much find information to support any conclusion you care to draw, and dismiss that which doesn’t suit you as “small sample size.” It’s all small sample size, which is why any analysis that emphasizes on-field performance in April is to be dismissed. That kind of analysis definitely has a sample-size problem, and it most likely comes with confirmation bias as well.