Baseball Prospectus

August 15, 2012

Manufactured Runs

The Importance of Imperfect Models

by Colin Wyers


From the Twitters yesterday morning:

All the context you really need to know to understand that tweet:

  • He’s referring to the implementation of WAR on Baseball-Reference, and
  • The fielding metric currently in use there, DRS, has Barney’s defense rated at over twice what any other popular fielding metric does.

I leave the matter of Barney’s defensive rating as an exercise for the reader—that equine hasn’t stopped twitching, but I’ll hold off on the beatings for right now. No, I want to discuss something that came out of a long discussion on Twitter in response to that remark: What does it mean when we think WAR (or any other metric) is wrong? Can we still use it? Should we discard it until we’ve worked out all the imperfections?

Or take another example: CBS Sports recently ran a piece entitled “Aroldis Chapman broke FIP.” The heart of the piece:

So, that's the short version -- and here's the fun outlier, Chapman's FIP for July is -0.99 -- that's right, that little mark in front of what would be an impressive FIP means it's silly good. (His xFIP is a slightly less -- but still completely certifiable -- crazy -0.73.)

I'd seen this somewhere and looked it up myself -- it's true. And to try to figure out what that menat [sic], I emailed one of the smartest baseball stat folks I know (and who will return my emails), Dave Cameron of FanGraphs.com. Here's what he had to say about it.

Basically, he's been so good he broke the formula. Obviously, it's not possible for a pitcher to have an ERA lower than 0, so a negative FIP just means that based on his walk rate, strikeout rate and home run rate, the formula expects him to have given up zero runs this month.

What does the real world think about this theory? Chapman's appeared in 12 games this month, throwing 11 1/3 innings and allowed no runs. So it works!

Unlike the WAR example earlier, we can confirm that FIP is wrong in Chapman’s case, since negative runs are impossible in baseball. So if FIP can be wrong for Chapman, is it possibly wrong in other cases? Does that mean we should stop using FIP?

The answer to those questions, if you’re impatient:

  1. Yes. I would even go so far as to say it’s “wrong” in all cases, or at least the vast majority of them.
  2. Not really.

It’s quite easy to see how FIP “breaks” here: it’s a linear model, and the slope of the line means that it will go below zero if the conditions are right. Unlike reality, FIP has no floor at zero. If a pitcher’s strikeout rate, relative to his walks and home runs, gets very high, you will see FIP go negative. But FIP will bend before it breaks: there are going to be some above-zero pitchers who nonetheless have lower estimates than they would if FIP were realistically bounded at zero.
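The commonly published form of FIP makes the linearity easy to see: FIP = (13×HR + 3×(BB + HBP) − 2×K) / IP + C, where C is a league constant. The Python sketch below computes it for two hypothetical stat lines (invented for illustration, not Chapman's actual numbers) and, for contrast, passes the result through a hypothetical smooth floor, a scaled softplus curve that is not part of FIP or any published metric, to show what a model bounded at zero would look like. The constant of 3.10, the floor function, and its `scale` parameter are all assumptions.

```python
import math

# Assumed league constant. The real constant is recalculated each season so that
# league-average FIP lines up with league-average ERA; 3.10 is only a placeholder.
FIP_CONSTANT = 3.10


def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = FIP_CONSTANT) -> float:
    """Standard linear FIP: fixed weights on HR, BB+HBP and K, per inning, plus a constant."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def floored_fip(linear_value: float, scale: float = 0.5) -> float:
    """Hypothetical smooth floor (a scaled softplus), NOT part of any published metric.

    The result is always positive and always above the straight-line value;
    for values well above zero the two are nearly identical, while near zero
    the curve bends upward instead of crossing into negative territory.
    """
    return scale * math.log1p(math.exp(linear_value / scale))


if __name__ == "__main__":
    # Hypothetical month for a dominant reliever (not Chapman's actual line):
    # 11 1/3 innings, 0 HR, 2 BB, 0 HBP, 25 K.
    reliever_month = fip(hr=0, bb=2, hbp=0, k=25, ip=34 / 3)
    # Hypothetical season for an ordinary starter: 200 IP, 20 HR, 60 BB, 5 HBP, 180 K.
    starter_season = fip(hr=20, bb=60, hbp=5, k=180, ip=200)
    for label, value in (("reliever month", reliever_month),
                         ("starter season", starter_season)):
        print(f"{label}: linear FIP {value:.2f}, floored {floored_fip(value):.2f}")
```

With these made-up inputs, the reliever's linear FIP lands below zero while the floored version stays just above it, and the starter's two values are nearly identical. Because the floor sits above the straight line everywhere, the same sketch shows the "bend": a linear FIP hands low-but-positive pitchers lower estimates than a bounded model would.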

To be blunt, FIP is a model. What we can say about models is this:

  • All models are wrong.
  • Many are useful.
  • Some are more useful than others.
  • The utility of a model often depends on the question you are trying to answer.

Let’s take an example from physics. Most everyone is taught Isaac Newton’s theories of gravity in school these days, despite the fact that Newtonian physics has been shown to be at best incomplete, and in terms of working physics has been supplanted by Einstein’s theories of relativity and by quantum mechanics. There are real, observed phenomena (like measurements of the rotation of the galaxy) that point to problems with Newton’s theories. So why do we still teach them and use them?

The answer is simple—because they’re still useful in predicting the behavior of gravity as we can observe it in our everyday lives. The extreme cases where it breaks down, either at the level of large galaxies or of minute quantum particles, are simply not relevant to us. Moreover, learning Newton’s theories can still teach us quite a bit about gravity.

(Some of you may be trying to reconcile this statement with my opinions about, say, batted ball data. I do believe that imperfect models, which is to say all models, can be useful. But that isn't to say that all of them are. When looking at batted ball data, there is evidence that suggests that the model isn't adding anything to our understanding, while adding to our complexity. When it comes to accepting a model, I tend to err on the side of parsimony, which is to say the idea that the simplest theory that explains what we're looking at is best. That doesn't mean that complexity is bad, but for a more complicated model to be embraced it should offer conclusive evidence that it's adding to our understanding.)
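The article doesn't spell out how it weighs parsimony against fit. One standard way to formalize "a more complicated model has to earn its keep" is the Akaike information criterion (AIC), which charges every extra parameter a fixed penalty. The Python sketch below is an illustration under that assumption, not the author's method: it fits a two-parameter line, and a three-parameter model that adds an unrelated predictor, to randomly generated data and compares their AIC. The data are synthetic and the helper names (`fit_least_squares`, `aic`) are hypothetical.

```python
import math
import random
from collections.abc import Sequence


def fit_least_squares(columns: Sequence[Sequence[float]], y: Sequence[float]) -> list[float]:
    """Ordinary least squares via the normal equations, solved by Gaussian elimination."""
    k = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * b for a, b in zip(columns[i], y)) for i in range(k)]
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]  # augmented matrix [X^T X | X^T y]
    for col in range(k):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    coefs = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * coefs[c] for c in range(r + 1, k))
        coefs[r] = (aug[r][k] - tail) / aug[r][r]
    return coefs


def aic(columns: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Gaussian-error AIC up to an additive constant: n * ln(RSS / n) + 2k."""
    coefs = fit_least_squares(columns, y)
    n = len(y)
    rss = sum((yi - sum(b * col[i] for b, col in zip(coefs, columns))) ** 2
              for i, yi in enumerate(y))
    return n * math.log(rss / n) + 2 * len(columns)


if __name__ == "__main__":
    rng = random.Random(7)
    x = [i / 10 for i in range(50)]
    y = [1.0 + 2.0 * xi + rng.gauss(0, 0.5) for xi in x]  # truth: a line plus noise
    junk = [rng.gauss(0, 1) for _ in x]                    # a predictor unrelated to y
    ones = [1.0] * len(x)
    simple = aic([ones, x], y)
    extra = aic([ones, x, junk], y)
    print(f"simple AIC {simple:.2f}, extra-column AIC {extra:.2f}")
    print("prefer the", "simpler" if simple <= extra else "more complex", "model")
```

Because the models are nested, the extra column can only lower the residual sum of squares, so the bigger model always fits at least as well in-sample. AIC asks whether that improvement is large enough to pay the two-point penalty for the extra parameter.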

The world is not as simple as being right or wrong. As science fiction author Isaac Asimov wrote, "When people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together." As Patriot (whose blog is the sabermetric site with the lowest ratio of readers to useful insights out there, and which you owe it to yourself to read more often) put it:

The statistician George Box once wrote that “Essentially, all models are wrong, but some are useful.” Whether the context in which this was written is identical or even particularly close to the sabermetric issues I’m going to touch on isn’t really the point. Perfect models only occur when severe constraints can be imposed. Since you can’t have a perfect model, the designer and user must decide what level of accuracy is acceptable for the purposes for which the model will be used.

So let me tell you a little story.

I was in the kitchen, doing dishes. Suddenly, I heard the sound of a child’s tears from the living room, and as I looked over my shoulder, I saw my little girl standing in the doorway of the kitchen. In her left hand was the saucer section of the USS Enterprise, registry number NCC-1701-D. In her right hand was the engineering section of the USS Enterprise.

That particular model kit (which I had assembled all the way back in high school) was not intended to detach the two sections of the ship from each other. She looked up at me and said, “I broke your spaceship, daddy.”

Now, I am going to tell you what I told her that day.

Sometimes, these things happen. Sometimes models get old. Sometimes they’re brittle. Sometimes they’re built by teenagers living in their mother’s basement and they’re not built to last. That’s okay. It was put out there to be used, played with, and enjoyed. We’ll see if it’s still usable, or if we can fix it. If not, we’ll throw it away and get a new one.*

This is of course a fine line to be walked. We don’t want to keep models around after they’ve long been supplanted by superior ones. We want to keep progressing, and if we’re too accommodating, we’ll stagnate. (This is, if you think about it, one of the key reasons Bill James was able to have the impact he did—because by and large that’s what the generation of baseball writers before him had done.)

But if we wait around for perfect models, we will be waiting forever. This doesn’t mean we shouldn’t be humble about how we use our models, because we should. And we should be infinitely more cautious when using a model where we haven’t found flaws than using a model where we have found flaws. We should avoid false certainties, instead presenting our uncertainties and our uncertainties about our uncertainties. But we can do that only by playing with our toys more, not less. We learn so much more when we take the toys off the shelves and start to use them.
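One concrete way to present an uncertainty rather than a single number is a bootstrap interval. The Python sketch below is an illustration, not a method from the article: it resamples a hypothetical record of successes and failures to get a 90% percentile interval for the rate, and compares the same 80% rate on 30 chances and on 300. The records, the 90% level, the resample count, and the seed are all assumptions.

```python
import random
from collections.abc import Sequence


def bootstrap_interval(outcomes: Sequence[int], level: float = 0.90,
                       n_resamples: int = 2000, seed: int = 2012) -> tuple[float, float]:
    """Percentile bootstrap interval for a success rate (outcomes are 0s and 1s)."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample the observed outcomes with replacement and recompute the rate each time.
    rates = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples))
    lo_index = int((1 - level) / 2 * n_resamples)
    hi_index = min(n_resamples - 1, int((1 + level) / 2 * n_resamples))
    return rates[lo_index], rates[hi_index]


if __name__ == "__main__":
    # Hypothetical records with the same 80% success rate but different sample sizes.
    small = [1] * 24 + [0] * 6      # 30 chances
    large = [1] * 240 + [0] * 60    # 300 chances
    for label, record in (("30 chances", small), ("300 chances", large)):
        lo, hi = bootstrap_interval(record)
        rate = sum(record) / len(record)
        print(f"{label}: rate {rate:.3f}, 90% interval {lo:.3f} to {hi:.3f}")
```

Both records report the same point estimate, but the interval on 30 chances comes out far wider than the one on 300, which is exactly the information a bare number hides.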

*In case you were wondering – the Utopia Planitia shipyards were unable to salvage the craft. In life, as in art, it has been replaced by the Sovereign-class USS Enterprise, NCC-1701-E. It lights up and makes cool noises when you press the buttons on it.

Colin Wyers is an author of Baseball Prospectus. 

17 comments have been left for this article.

Lindemann

Lovely piece, finds the balance between humility and hubris when it comes to statistical questing. Also I am glad you have a new Enterprise to play with.

Aug 15, 2012 06:14 AM
 
Ian in Chicago

I wonder if "all models are false," is really just another way of saying, "all models are models." A model is inherently an abstraction and necessarily a simplification. Thus labels of "true" or "false," "right" or "wrong," don't make sense. As you nicely put it, models can only be more or less useful for studying a given feature.

Aug 15, 2012 06:53 AM
 
Shaun P.

Excellent article with a great lesson and an even better ending. You're an excellent writer, Colin - how do we get you to write more articles?

Aug 15, 2012 07:48 AM
 
Dubey89

Very enjoyable read. Thank you!

Aug 15, 2012 07:53 AM
 
Richard Bergstrom

Models can be good as a framework of discussion provided the model outputs something meaningful/relevant. The trick is the model itself should be something stable/consistent. FIP, WAR, etc work in a lot of cases and rarely "break", so they make for a good "set of rules" to begin a discussion.

Aug 15, 2012 08:50 AM
 
BP staff member Russell A. Carleton

Direction, then precision.

Aug 15, 2012 09:05 AM
 
Dan McKay

Really well done.

Aug 15, 2012 09:20 AM
 
Richard Bergstrom

"•The fielding metric currently in use there, DRS, has Barney’s defense rated at over twice what any other popular fielding metric does."

Btw Darwin Barney's 2012 FRAA is at 8.7...

Aug 15, 2012 09:25 AM
 
Peter7899

The DRS model, much like the Newtonian model of physics, does break down at the edges, yes. A great example is Lawrie playing rover in RF as a 3B.

I still don't see how the model broke down, per se, for a second baseman. What gravitational constant went haywire for Barney to have such a high defensive rating?

You can wave your hand and dismiss his rating by stating "models aren't perfect", but I'd find it a lot more interesting if you uncovered how, exactly, Darwin Barney broke the model.

Aug 15, 2012 10:27 AM
 
smitty

Yeah, I agree with this. Figuring stuff out like that will help with understanding defensive stats, I'd guess. This is really a fantastic article as well. BP has been bringing some heat this summer with several really fine pieces demonstrating the way sabermetrics is maturing, as we are getting really good at understanding that there are still a lot of things to learn and a lot of things we still can't solve to certainty. That's the coolest thing about all these stats and studies. As Byrum Saam used to say many years ago, "You never do know." And that's a wonderful thing.

Aug 15, 2012 12:04 PM
 
Richard Bergstrom

It's hard to tell what goes on under the hood with a lot of these defensive metrics. I remember last year looking at Barney's Range Factor and Fielding Percentage and that he only rated better than Weeks and Uggla among NL second basemen. And yeah, RF and PCT are flawed, but just from those "baseball card" stats it is hard to see why other "advanced" metrics give him a lot of credit since it's not clear what actually makes up those metrics.

Aug 15, 2012 12:29 PM
 
Matt

I second this request to inspect how the metrics measure Barney's year and why they diverge. Is it so important that the models are correct on Darwin Barney? No. But it should be very useful to know why they diverge so much, if at least from a curiosity standpoint. That can give us a better way to know what questions the models answer for us. I have wanted the same thing for that year FRAA diverged for Peter Bourjos.

Aug 15, 2012 13:05 PM
 
BarryR

I'll join the chorus on this one; I'd like to know how it happened. Of course, I have little faith in any defensive metric, so DRS "breaking", if it did, doesn't concern me that much, except that it holds other, perhaps more valid metrics (mostly offense) up to ridicule by the unwashed. BP has a defensive metric in which players' defensive value is increased or diminished based on the OFFENSIVE ability of those who play their position, which makes no sense at all. And if Darwin Barney's numbers are off the chart, what does that make the numbers on Don Mattingly's and Keith Hernandez's BP cards, which show Mattingly as having negative defensive value in 1985 and '86, and Hernandez being negative in 1987 and 1988? Barney's numbers can't be sillier than those.

Aug 15, 2012 17:26 PM
 
Richard Bergstrom

I remember how the last FRAA revision wreaked havoc on Jaffe's JAWS scores.

Aug 15, 2012 18:11 PM
 
BP staff member Colin Wyers

Positional offense isn't a factor in FRAA at all.

Aug 15, 2012 19:49 PM
 
Richard Bergstrom

Why does FRAA rate Darwin Barney highly?

Aug 15, 2012 21:54 PM
 
Richard Bergstrom

?

Aug 19, 2012 20:30 PM