

March 11, 2013

Baseball Therapy

Maybe I'm Wrong

by Russell A. Carleton


Well, this year's Sloan Sports Analytics Conference has come and gone, and I wasn't able to attend. Worse, I couldn't go to the SABR Analytics Conference either. Of course, I've followed along as best I could, but there's no substitute for actually being in the room.

With analytics having their big moment in the sun, and with the topic of how analytics fit into sports still something of an open cultural question, there have been a few writers who have considered that intersection and written something about it. Over at SB Nation, Andrew Sharp wrote a review of the Sloan Conference (seems that he was in the room), which contained this excerpt:

If there's genius on display at Sloan, it's this: When scouts or coaches or old school GMs get something wrong, it's an example of traditional scouting methods failing. When analytics get something wrong, it's "randomness" that you can't control. A small part of a much bigger process, and teams and fans should trust that process until they get a better outcome.

This might be the most damning critique of sabermetrics (and sports analytics in general) I've ever seen. Worse, it might be true.

Let's first name what we're talking about. It's a well-known phenomenon in psychology called the self-serving bias. When something good happens to you, you will tend to see it as the result of your own hard work and talent. When something bad happens, you will tend to blame bad luck. When something happens to someone else, especially to a rival, those attributions are generally switched. This has been proven in the psychological literature about 500 times. It's everywhere. You will do it today, I promise. I will too.

If you're reading Baseball Prospectus, you're probably tempted to reflexively say "Sharp is wrong." I was too. But, if we're going to be intellectually honest, this critique deserves a more thoughtful answer. One thing that sabermetrics can pride itself on is that we've made great strides in analyzing baseball while minimizing the biases that go along with "the human element." If we're going to be good scientists, we can't let this bias bring us down. Otherwise, we're just cheerleaders for spreadsheets.

There is a lot of randomness in baseball. Balls take weird hops when they hit a pebble. Air currents push a ball just inside the line for a double, rather than a loud foul ball. Teams have years where they win 80 percent of their one-run games. How much can we blame on random noise when we (as a group) don't get something right? How many of our correct predictions are the result of random noise breaking our way, rather than our own brilliant ideas? And is there room to say that "old school" methods (to the extent that the stereotype of the scout going purely by gut feel even exists anymore) might have some credence, but might also have to deal with the problem of bad luck as well? There isn't a way to directly answer these questions fully, but the questions themselves raise some important issues.
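To put a rough number on how far pure chance can carry a team in one-run games, here is a minimal sketch. It rests on assumptions that are mine, not taken from any real season: every one-run game is treated as an independent coin flip, a team plays 40 of them, and the league has 30 teams. The function names are hypothetical, chosen only for illustration.

```python
from math import comb


def prob_at_least(n_games, min_wins, p_win=0.5):
    """Exact binomial probability of winning at least min_wins of n_games."""
    return sum(
        comb(n_games, k) * p_win**k * (1 - p_win) ** (n_games - k)
        for k in range(min_wins, n_games + 1)
    )


def prob_any_team(n_teams, p_single):
    """Chance that at least one of n_teams independent teams pulls it off."""
    return 1 - (1 - p_single) ** n_teams


# Illustrative assumptions only: 40 one-run games, 80 percent = 32 wins.
single = prob_at_least(n_games=40, min_wins=32)
league = prob_any_team(n_teams=30, p_single=single)
print(f"One coin-flip team wins 32+ of 40: {single:.6f}")
print(f"At least one of 30 such teams does: {league:.6f}")
```

Whatever the numbers come out to, the structure is the point: a result that is rare for any single team is less rare once you look across a whole league, and rarer results still show up once you look across many seasons. That is no proof that a given team was lucky, only a baseline against which to weigh the claim.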

Consider that almost no one, sabermetrically inclined or not, predicted the Orioles or A's (see, Moneyball works!) to be even close to the playoffs last year. Everyone missed badly on those two, except for a few optimistic fans. Sure, the Orioles caught some very lucky breaks along the way, and if the season were played again a few million times over in parallel universes, I don't think that they would get that lucky again very often. But here I am blaming luck for something that I didn't get right. Then again, had one of my models predicted the A's as AL West champions, I would certainly point to this as confirmation of my awesome powers of awesomeness. This despite the fact that the "model" most likely to have produced that prediction would have been picking an AL West team out of a hat and proclaiming it my pre-season favorite.

There are plenty of well-scouted draft picks who busted, and not just due to injuries. Look at the recap of the first round of any draft over the past few years. It's fun to play the "He was drafted in front of these other three really good players" game... unless you're a scouting director. Then again, there are plenty of sabermetric darlings out there who were supposed to be the next big thing, but who just didn't make it either. Matt Murton will patrol the outfield for the Hanshin Tigers this year. If we're going to blame scouts for trying to draft blue jeans models, should we not also point out that OBP is not the only thing in the world that makes a baseball player good?

I think that one blind spot that sabermetrics hasn't yet dealt with is that we've misunderstood what it is exactly that we're good at. Sabermetrics is good at taking a lot of observations and sorting through the patterns in them in an unbiased way, at least within the constraints of how the mathematical model that sorts those data is defined. If the model is a good one, then it will perform well in describing the past and predicting the future. But what if the problem is that we just don't really understand what we're trying to model and our equations are off? It's great to have unbiased residuals, but how's the R-squared doing? (And why do we so rarely report it?)
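The gap between unbiased residuals and a useful R-squared can be shown with a toy case. This is a sketch under stated assumptions: the five win totals below are made up for illustration, and the "model" simply predicts the league-average total for every team. The helper names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def mean_residual(actual, predicted):
    """Average of (actual - predicted); zero means the errors are unbiased."""
    return mean([a - p for a, p in zip(actual, predicted)])


def r_squared(actual, predicted):
    """Share of the variance in actual that the predictions account for."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical win totals, invented for this example only.
actual_wins = [70, 95, 81, 62, 97]
# A "model" that predicts the average for everyone.
lazy_predictions = [mean(actual_wins)] * len(actual_wins)

print("mean residual:", mean_residual(actual_wins, lazy_predictions))
print("R-squared:", r_squared(actual_wins, lazy_predictions))
```

Predicting the average for everyone misses high as often as it misses low, so the residuals are unbiased, yet it explains none of the differences between teams. Unbiasedness tells you the errors cancel out; it says nothing about how large they are.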

To be clear, I do believe that the amount of randomness present in baseball can overwhelm even a good model. But the reflexive use of "I got unlucky over a small sample size" to explain away any variance is a problem that needs to be addressed in the field. Maybe it's the truth, but it's too easy to say and accept without really thinking about it. "Luck" gives you a get-out-of-jail-free card from having to examine what really went wrong. It short-circuits critical self-analysis, which is the key to breaking out of the self-serving bias.

To the extent that Andrew Sharp's critique is correct, it means that too often in sabermetrics, we sneakily start with the assumption that our models are correct. My model for processing information is inherently sound, and any variance is the result of chance. Paired with the other side of the self-serving bias (Your model is inherently flawed, no wonder you got such bad results), it begins with the assumption that I'm right and you have no idea what you're doing. There's a PR problem with that approach—one that probably has a lot to do with how contentious the field is in baseball more generally—but worse, at that point, we're not even doing science.

Maybe our models are brilliant. But maybe they're not. Maybe we don't know as much as we think we do or as much as we'd like other people to believe that we do. Maybe the "old school" models, even with their flaws, have strengths, but they too got unlucky over a small sample size. Maybe the reason that it stung so much when I read this critique was that when I thought about it, the very unsettling conclusion that I had to come to was "Maybe I'm wrong."

Russell A. Carleton is an author of Baseball Prospectus. 

21 comments have been left for this article.

lmarighi

Very thoughtful and important article. Being willing to examine failures and being willing to accept that sometimes one fails even if one did one's best is critical, but not easy.

Mar 11, 2013 03:12 AM
rating: 4
 
Karl Hungus

Great article. One key point that I think needs to be looked at is that there is a large percentage of people who right out of the gate despise statistics because they don't understand or don't want to understand the meaning derived from the math. Scouts and coaches are given the benefit of the doubt out of the gate while stats continually need to be vetted. In the words of the MLB Network resident genius Harold Reynolds, "I just know that that ain't right."

Mar 11, 2013 05:14 AM
rating: 1
 
ScottyB

As a stats guy myself (in another field), I cringe when I see error variance attributed to "luck," and this does happen a lot in sabermetric analysis (and most other walks of life, too).

Error variance is something we failed to predict. Some of it is randomness or luck (chance would be a better word than luck, though), but it is mostly due to things we neglected to consider in our model.

I would also love to see more variance built into sabermetric numbers. For example, given that we do (pretty much) know how big a sample we need for certain stats to stabilize, we can build in confidence intervals or something around our stats.

Then we can have a debate on candidate A with 6.2 (±0.5) WAR vs. candidate B with 5.9 (±0.3), or with one-year defense stats, something like 2.0 (±1.6). We kinda do this with PECOTA's quartile projections; I'd love to see this with backward-looking stats, too. It would keep us grounded in the fact that stats often have a lot of error built in.

Mar 11, 2013 07:43 AM
rating: 9
 
John Douglass

"given that we do (pretty much) know how big a sample we need for certain stats to stabilize, we can build in confidence intervals or something around our stats."

Right. Obvious example, that I harped on in the Angels depth chart already, is PECOTA's projection of Pujols walking almost 12% of the time this year. The explanation in the bounceback candidates piece a couple weeks ago which featured Pujols was that PECOTA has a long memory. But that long memory is going to ultimately make the PECOTA projection for Pujols look bad at the end of the year, with his OBP coming in way under his projection when he walks about 60% as much as is being projected.

For players like Zobrist and Bautista who were walking a lot in 2010, and still walking a lot in 2012, saying they will walk 13.2% (Zobrist) and 14.8% (Bautista) still makes sense because they demonstrably still had the high-walk skill at the end of 2012 in a big enough sample. Pujols did not. Isolating walk rates, strikeout rates, and other quick-to-stabilize rates in the projection engine, and giving them a different weight than the standard "long memory," would ultimately lead to better projections. Pujols' walk rates in 2009 and 2010, when he was a completely different player, are at this point virtually irrelevant.

Crediting PECOTA for its long memory without acknowledging where a long memory is a detriment is, I think, an example of what Russell is talking about.

Mar 11, 2013 10:28 AM
rating: 7
 
gpurcell

Absolutely, epsilon will contain true randomness, model mis-specification, and omitted variables.

In an effort to correct the omitted variable problem, I suspect that the model mis-specification problem has become more acute in SABR analysis as defense and (now) baserunning are rolled up into a single value statistic that corrupts the information from the much better specified hitter-value function.

Mar 11, 2013 11:15 AM
rating: 1
 
kmg1016

There's a great section in the Francona book that deals with this very issue from a manager's perspective:

"...the problem was that the number would change as we played. On Thursday Mike Lowell could be a good fit to play, but on Friday he could be a bad fit because of what happened to the numbers Thursday. It was a little too fluid for me."

At what point do you reach a level of statistical significance where that Thursday/Friday dynamic isn't the case?

Similarly, if you flip a coin and get heads seven times in a row, are the odds that you're going to get tails on the eighth flip 50/50? Perhaps, if you view the eighth flip as part of a series. But as an independent act, it's 50/50 just like the rest.

Good food for thought here...

Mar 11, 2013 08:03 AM
rating: 2
 
gpurcell

It's mainly a communication issue--rather than point estimates these data can be presented as trend information.

Mar 11, 2013 11:17 AM
rating: 0
 
TGisriel

One of the most interesting things I learned from following the Orioles last year was considering what (besides the knee-jerk reaction of "luck") are the characteristics of a team that wins a disproportionate number of one-run games.

It reminds me of the refinement of the Voros theory that pitchers don't influence the results of balls in play (but on further study some do, at the edges).

The conclusion I reached from following the Orioles was that there are good reasons to be skeptical that a team that wins a disproportionate number of its one-run games can keep doing so, but the teams with a really good bullpen are the ones most likely to sustain success in one-run games.

Mar 11, 2013 08:05 AM
rating: 1
 
Tony B

Nice article that brings up a salient point.

Why is r^2 so rarely reported in sabermetric work? Is it a sort of early convention that's persisted?

Mar 11, 2013 08:10 AM
rating: 3
 
Grasul

I think the biggest issue with sabermetrics is confusing better with good. There are plenty of examples of sabermetric models that have value and provide interesting levels of analysis. The biggest problem I see with sabermetricians is their extrapolation of something demonstrably better into being considered right. Those are different things. We can say WAR has value and is informative, which isn't the same thing as saying WAR correctly values players' contributions.

Mar 11, 2013 08:49 AM
rating: 3
 
gpurcell

The biggest issue, IMO, is reification of the useful simplifying assumption of replacement level talent. The baseline is built on quicksand (and it has only gotten worse throwing defense and baserunning into the mix).

Mar 11, 2013 11:16 AM
rating: 3
 
BarryR

I would give you a +1 but for some reason I can't do that.
I freely admit that in discussing players I never use WAR because I can't defend it. There was an article here the other day about "replacement level" where the definition was hard to pin down from paragraph to paragraph.

Mar 12, 2013 11:52 AM
rating: 0
 
pobothecat

A subtle, important point.

Mar 12, 2013 09:10 AM
rating: 0
 
jfribley

"...we sneakily start with the assumption that our models are correct."

The curse of economics. Although the upside is that when sabermetric models are wrong, they aren't catastrophically wrong! As in the case of economics.

Mar 11, 2013 09:07 AM
rating: 0
 
rweiler

The difference is in baseball, when the models don't predict the real world data, the model gets changed. In economics, when that happens, they keep the model and throw out the data.

Mar 11, 2013 12:41 PM
rating: 5
 
jfribley

A+

Mar 11, 2013 14:17 PM
rating: 0
 
vansloot

Great article. One of the big points that was made at the GM Panel at the SABR Analytics conference (either by Jed Hoyer or Rick Hahn, I forget which) is that if you give some information to a player and it doesn't work, he will never trust you again. It doesn't matter that you would be right 99.5% of the time. So, information presented to players needs to be presented correctly, even if it is "right".

http://sabr.org/latest/2013-sabr-analytics-general-managers-panel

Mar 11, 2013 09:26 AM
rating: 1
 
jsdspud

Great article. I'm an accountant and part of my job is to prepare budgets. I love doing it because if I'm right I look good. If I am wrong, then I make a comment that I can't predict the future or I would have hit the lottery years ago. I never tell my boss that my model was incorrect.

Mar 11, 2013 09:29 AM
rating: 1
 
fgreenagel2

Well done. I think one can never write about confirmation bias enough -- that belief is so ingrained in people. This article feels like a continuation of your essay at the back of the 2013 annual.

Mar 11, 2013 14:25 PM
rating: 0
 
Sean

One question I haven't solved: We'd all agree that judging the process instead of the results is most important. And a good manager hires people "smarter" than him, including, but not limited to, a position like an analyst. But if that manager has to resist judging the results, and he's unable to fully grasp the methods being used, how is he to evaluate the researcher?

Of course, I'm really limiting things to the quality of research itself, not communication or interpersonal abilities.

Mar 11, 2013 15:18 PM
rating: 0
 
jrmayne

I always thought that everyone had confirmation bias, and this proves it.

Mar 12, 2013 01:28 AM
rating: 10
 