
The hardest part of explaining sabermetrics to someone who’s versed in traditional baseball stats is explaining that they’re different not just in degree, but also in kind. The definition of an RBI, for instance, hasn’t changed since it was made an official statistic in 1920. The stats created by sabermetricians are much more prone to revision. Some look at this as a bug, because they view sabermetrics only as potentially better versions of traditional stats.

But sabermetrics isn’t ever a finished product. (This is not, in fact, a bad thing.) So instead of expecting our stats to calcify, we should be expecting them to grow and change as we develop the ideas beneath them. So it is with WARP, which has undergone any number of changes over the years. And now we’re going to be changing WARP again. But we’re going to be throwing open the doors and letting you watch us while we work. So we’re kicking off a series of articles, running each Wednesday, where we’ll take you inside what we’re doing. There’ll be a lot of math, but also a lot of discussion about what WARP is trying to measure and the philosophy behind various choices.

We hope this will do several things. We think that talking openly about what we’re doing will help us build better metrics, because we’ll be getting more feedback earlier in the process. And we think it will help you all to understand the metrics we’re building, because you’ll have more insight not only into what is being done, but why. And this is not a finished product—the goal of this series is to have readers looking over my shoulder as I work. If what you want is a final summation, that will be coming down the road. If you want to watch the development process at work, this is for you.

There are a handful of goals we want to accomplish by doing this, which I’ll outline below. But before we get into that, I’d like to talk a bit about the goal of WARP. Why do we need a total value metric? What are we trying to achieve? What should it be used for, and where should we be cautious about using it?

WARP is an answer. To figure out how to arrive at it, we must pose a question. The act of posing one question entails not asking others, which doesn’t necessarily mean that they aren’t worth asking. Having said that, we are picking this question because we think it’s a useful one. And we’re going to ask a question like you should ask for a wish from a genie: with extreme precision.

What we want to know is: How do we estimate what a player has done to contribute to winning baseball games for an average baseball team? There’s really a lot packed into a small space there, so let’s unpack it.

  • We want an estimate. We are willing to accept a certain amount of imprecision (although ideally we’d like to know what that imprecision is). Importantly, this means we do not ask if we are right or wrong; we are very often both. Instead we ask how close we are.
  • We’re interested in an individual baseball player. It has been said that baseball is an individual sport masquerading as a team sport. It isn’t so. Even the individual records of baseball reflect team accomplishment—how many runs a player has batted in depends substantially on how well batters ahead of him have done at getting on base and advancing themselves or others into scoring position, for instance. We need to be careful at all points to ask how much a player’s teammates have contributed to the raw numbers.
  • We want to know what a player has done. To use the technical terms of statistics, we view a player’s performance in a given time period as a population, not a sample. If you redid that sample a thousand times, that player could have done a lot of things. If you look at other samples, it’s very likely that this player has done different things. It doesn’t matter. We aren’t interested in what a player could have done, but what he actually did.
  • We want to know about wins. That doesn’t mean that wins are the only thing in baseball worth caring about. However, it is about the only thing that’s worth measuring objectively—most of the rest is all either subjective or trivial (it doesn’t take a lot of math to count home runs). (Revenues, profits and the like are the exception.) It’s also an important element of baseball—winning or losing is the fundamental objective of the game, after all.
  • We want to know how a player would have helped an average team. Very likely any particular player we might be interested in will be on a team that is not, in fact, average. But to make comparisons between players, we want to convert everything into a common baseline.

There’s one other thing we want to consider, and that’s how we split the responsibility for events. Baseball’s scorekeeping system revolves around the concept of double-entry bookkeeping: every hit for the offense is also a hit allowed by the defense. Every run scored is a run allowed. Every win for one team is a loss for some other. This is intrinsic to how the sport is constructed. Others, like our own Russell Carleton, have attempted to build models that don’t follow from this premise. It’s interesting work, and it has its uses. But in a system that attempts to explain wins and losses, a batter’s strikeout doesn’t explain more of his team’s performance than a pitcher’s strikeout explains of his, even if we think that batters have more “skill” in striking out or not than pitchers do in inducing them. We leave such questions of skill to another place and time (again, recognizing that to pose some questions means not posing others).

A perfect model, with our objectives in mind, would reconcile flawlessly between what it says on offense and what it says on defense for any given event. We lack a perfect system. But at the very least, we have a goal to bear in mind. Falling short of that goal is unfortunate, but having a goal at all will help point us in the right direction.

So what are we trying to achieve with this WARP revamp? We have a few goals in mind. First, we want to pay special attention to the concept of replacement level. It’s an important and widely used concept in player evaluation. But it seems to be one of the hardest concepts to sell to those who aren’t fully on board with the sabermetric movement. And even among the casual adherents to sabermetrics, it seems to be one of the more misunderstood and debated topics. We want to look at what replacement level does, why it’s important, how it affects our metrics and how it changes over time.

The next thing we want to focus on is the interaction between pitchers and defense. It’s probably the greatest area of controversy between various methods of player evaluation. It’s also the area where our metrics seem to come up with the largest number of counterintuitive conclusions. We want to look at this with a fresh eye and see what comes out of it. We’ll be examining DIPS from a new perspective and seeing how well it holds up. And of course, we’ll look at how to measure fielding.

Lastly, and perhaps most importantly, we’ll be looking at how we assess our work. Sabermetrics is in many senses a scientific endeavor. But there are a lot of pieces to a total value metric, and not all of them have received the same amount of scrutiny. It is not enough for us to propose ways to measure something; we also need to propose ways to measure how well we’ve measured it.

One commonly proposed test of how well a total value system does its work is looking at how well it predicts team results. This is dangerous ground for sabermetricians to stand on; it turns out that old-school stats like RBIs and pitcher wins do a much better job of that than any total-value stat proposed. The entire point of this exercise is that we are willing to sacrifice the best possible accounting of team wins in order to do better at expressing an individual player’s contributions, isolated as best we can from those of his teammates. Using reconciliation to team results causes us to lose sight of that goal and provides very little insight into the quality of our work.

We do want to assess our work, however. So along the way, we’ll propose methods of testing various components of the work we’re doing. This will also give us the opportunity to truly assess the accuracy of our estimates, and to produce error bars for the work we’re doing. One of the steps forward made by PECOTA was that the point forecast was accompanied by an estimate of the full range of performance around that forecast. Hopefully, we can make the same step forward with WARP.

Sabermetrics is a key part of a lot of modern debates about the MVP award and the Cy Young and the Hall of Fame and a host of other things. There is little that we of a sabermetric bent can do to make everyone open to our ideas. That’s going to take time. But there are people out there who, while not complete converts to a sabermetric worldview, are open to treating our ideas and comments with respect. We can do more to advance the discussion by moving away from stridency and certainty and moving toward an embrace of uncertainty. We’re numbers people; we measure things. It’s what we do. We can, and should, measure how well we measure as well.

So here’s your chance for feedback, on both our goals for WARP and how you think we should go about getting there. We look forward to hearing from you, and for the chance to try to do something new and exciting together.

Thank you for reading


gcarbert
8/21
Wish my high school math teacher made things this easy to understand. Looking forward to seeing how this new sausage will get made.
sitdancer
8/21
I'm looking forward to the article series.

Have you or others ever thought about modeling how team WARP ends up as more than a simple sum of the players' WARP in order to get a better tie into predicting team outcome?
jfcross
8/21
I'm looking forward to this too. It would be good to see some discussion of openWAR:

http://www.math.smith.edu/~bbaumer/pub/jsm2013_openWAR_slides.pdf

since it seems like you guys have some of the same admirable goals (openness, conservation of runs and error estimates).
cwyers
8/21
I've been corresponding with Ben some over the past week or so, and I've gotten my hands into his source code a little bit. It's certainly something I'm open to discussing.
jdouglass
8/21
Colin, one of the things I thought was great in the openWAR work was the idea of error estimates.

I'll use a TV analogy. I loved the season premiere of Breaking Bad because it addressed the elephant: Walt knows. Hank knows that Walt knows. Walt knows that Hank knows that Walt knows. Now the writers can get down to what's really important in the show, and not let plot manipulations dictate the last 7 hours.

Similarly, the folks out there who choose to poo-poo the versions of WAR often include in their argument that it can't be exact to a decimal point. But you know that, Sean knows that, Tango knows that, all of your literate readers know that. An error estimate tied to WARP, to me, would address the elephant. WARP knows that it's not perfectly exact. It knows that the--I don't want to say illiterate--not-as-literate reader knows that it's not exact. Now the not-as-literate reader knows that WARP knows that they know it's not as exact. Now we can move on to more important stuff, like how WARP, rWAR and fWAR differ, why that is, and why those ideas are important.

I'd love love love love to see a +/- column next to WARP in your stats pages that use that value metric.
cwyers
8/21
I think you might enjoy what we're doing next week.
gpurcell
8/26
Thanks for that link!
nicholj
8/21
I like your comments about double-entry bookkeeping and never really thought of baseball stats that way. It would be nice if all baseball statistics could be so symmetrical but unfortunately they are not. For example, due to the 'error' there are some inconsistencies. A batter who reaches base via the error is credited with an out but the pitcher is not credited with recording an out. Even more absurd when the error occurs on a strikeout - the batter is credited with an out, the pitcher with a strikeout but the pitcher is not credited with an out creating a flaw in some pitcher stats like K/9. Also, this leads to other uneven entries like batter runs scored being undifferentiated whereas pitcher runs allowed are differentiated between earned and unearned and the total number of runs scored not equalling the total number of runs batted in.

Also there are base-running errors where pitchers are credited with an out (despite not doing anything to earn that out) but batters are not credited with recording an out in such situations. A simple example is getting thrown out at second trying to stretch a single into a double. The pitcher is credited with the out despite giving up the hit but the batter is credited with a hit despite getting out. As a result there are more pitcher outs than batter outs.

I am very curious as to whether you will attempt to address these unsymmetrical situations to create a true double-entry system or accept the imprecision.
cwyers
8/21
In the official stats, that's true. In the play-by-play record, though, you can reconcile things very easily. WARP as it stands now accounts for ROE as a time on base, not an out, for instance.
TheRedsMan
8/21
While perhaps you've made a business decision to keep BP stats separate from the work done by Baseball Reference & FanGraphs (or perhaps to use ESPN as your distribution channel), as a stat-friendly reader, I find it a tad frustrating that BP seems to be off operating in its own world without active interaction on the pages of its blog with the work being done elsewhere.

At the most basic level, "WAR" has reached the public market for stats, but BP insists on calling their version of the model WARP. I'm very, very interested in this series, I just hope it is additive to the work being done elsewhere -- and consciously so.
mgolovcsenko
8/21
Spot on.

I hear (on podcasts & increasingly so on TV/radio) references to fWAR & bWAR ... never can recall a reference to bpWARP.

If you build this hopefully better tree in a walled-off forest, there's not enough people to watch it stand (or fall).

You're not just trying to improve on the measurement of WAR ... but also build awareness and broader usage of whatever you end up with.

cwyers
8/21
I think in the long run, more harm than good has been done by giving two different metrics identical nomenclature. I don't see a reason to add to that confusion by creating a third name. Once you come to that conclusion, it really doesn't seem like there's a reason to abandon the name WARP.
dpease
8/21
Different stats being named the same thing is a bug, not a feature.
TangoTiger1
8/21
WAR is the framework or specification.

fWAR is *an* implementation of WAR
rWAR is *an* implementation of WAR

It's not identical, any more than Oracle's implementation of SQL92 is identical to DB2's version.

cwyers
8/21
rWAR and fWAR are unofficial nicknames, the official name for both is simply WAR. As Dave says, that's a bug -- it introduces a lot of unnecessary confusion where someone doesn't know what you're referring to when you say WAR. Hence, people have on their own come up with ways to differentiate the two from each other. As we already have a name that differentiates, there doesn't seem to be a reason to change.
TangoTiger1
8/21
I would prefer that Fangraphs and BR.com go with the names I've coined. Fangraphs originally called it "Win Values" or something like that, presumably as a differentiator. They went to WAR at some point.

I don't know if fWAR/rWAR is a feature or not. I don't know if they both call it WAR, but have different methods of calculation, is a feature or not.

If this was a court case, we'd each take sides, and just explain one point of view. I think you can reasonably make a case either way.
bobbygrace
8/21
Fielding metrics seem to be less widely accepted and possibly less accurate than pitching and hitting metrics. Is that fair to say? And, if so, is there any consideration of giving them less weight in the equations for DIPS and WARP?
cwyers
8/21
That's actually a feature of the current WARP; we'll be talking more about that soon.
jwoodfield
8/21
I think this is the biggest concern with all of the different WAR(P) metrics; that the data going into calculating it may not be as accurate as you need (i.e. defensive metrics). While I applaud any tweaking and revamping to make a better statistic, I feel the real advancement will come when Field F/X is in all 30 stadiums and measuring defense becomes much more reliable and concrete. Defense is a critical part of the game and I worry about giving it less weight simply because the data is less reliable....but I understand it.
draysbay
8/21
An admirable endeavor, for sure, and thank you for eschewing the black box in favor of open collaboration. It would seem that the first step would be establishing replacement level. There has been much discussion on this, but to me it doesn't matter whether that bar is set at a 35% winning percentage or 40% or whatever as long as it's consistent. Silly to get bogged down on this detail.

In this same vein, will defensive replacement value continue to be at 0? Correct me if I'm wrong, but it seems that most systems use performance vs. replacement level for offense and then performance vs. the average (0 runs) for defense. I have no idea where you would put replacement level defense and maybe it is zero runs, but it would be good to establish this early.

Another tangent in this line would be how much value you want to assign to the three pillars of batting, fielding, and base running? If batting is 60% of the value of a position player then would you consider fielding to be 30% and base running the last 10%? Have I/we missed something here that we're not factoring? If those percentages don't make sense then what does and is it possible to create a regression equation using historical data to derive this information. Would this research also allow for a passage to being able to watch the watchmen, so to speak, which is a solid and underrated point you've made about being able to compare your output to reality.

These are just some initial thoughts off the top of my head, and I look forward to hearing those of others as this is sure to be an exciting project. Thank you for making us, the community, feel a part of the process. It's quite refreshing.
cwyers
8/21
It's kinda begging the question, isn't it? If you start from the assumption that replacement level doesn't particularly matter, then it's silly to get bogged down in it. We're going to be talking a lot about how to self-assess, and one of the things we're going to assess is "how do we measure the accuracy of replacement level?" Once that's done, and once you figure out why you want a replacement level to begin with, then I think you can answer the question of how much it matters to get it precisely to one figure or another.

(It won't be the first place we start -- it was going to be, but it turns out that to discuss replacement level well, you need at least a few other concepts in your head first, so that's what I've hit upon as the best place to start off.)

In terms of the relative value of hitting, fielding, and baserunning -- it comes down to two things, how many runs is it worth relative to X baseline, and how well can you measure it. We'll be getting into all of that.
TangoTiger1
8/21
"Correct me if I'm wrong, but it seems that most systems use performance vs. replacement level for offense and then performance vs. the average (0 runs) for defense. "

Since you asked: you are wrong.
draysbay
8/21
Thank you for correcting me. I haven't seen it directly stated, but the common currency is that 0 runs is average and then defensive runs are calculated against that baseline. Could you go into more detail about what level defensive replacement is set against? Or is it that 0 runs is replacement level?
cwyers
8/21
I think Tango's position is more reductive than it needs to be. If you look at how WAR is implemented on Baseball Reference, it seems to behave exactly as you describe (hence oWAR+DefensiveMetricConvertedToWins = WAR). And Fangraphs WAR is implicitly designed along the same lines.
TangoTiger1
8/21
BR.com has a confusing presentation, one that many of us have complained about.

Fangraphs follows the framework I have, which is that every component is compared against the AVERAGE.

You can see that presentation at the bottom of any of the player pages. You can also see it at BaseballProjection.com, which is the precursor to Baseball Reference's WAR.
cwyers
8/21
The presentation may be confusing, yes. But the mouse-over glossary definition says "For this calculation, we use a replacement-level on defense is the league average." Perhaps you and Forman disagree on the definition aspects of this, but I think the original statement is basically correct. (Moreover, I don't see any practical difference between his position and what you advocate, which is why we can continue to have this argument.)
TangoTiger1
8/21
Well, Sean is wrong if he has that written.

Anyway, there is a difference in how I present it (each component at the league average, then one sweeping replacement level number at the player level).

The practical difference is that by keeping things as I'm presenting it, then we don't have to have these conversations about "replacement level offense", "replacement level defense", etc, things that don't in fact exist.

This was the #1 problem with the original WARP, which Clay finally agreed to in the end. And this would have been more obvious if we simply stuck to the presentation I advocate.

It's tiring that we need to constantly correct readers who are not as knee-deep in this, because the damage was done, and continues to be done.
cwyers
8/21
I trust Sean to describe what he's actually doing with his calculations, unless I see a demonstration that his definitions don't match with published values.

I think everyone can pretty much agree that a typical replacement level player is a below-average hitter for his position but an average fielder. If you want to call that "replacement level offense and defense" or not doesn't really change the results of a value metric any.
TangoTiger1
8/21
We're just going in circles. The end-result is the same, just as the end-result is the same if I say you have replacement-level fielding and average offense, or above-average fielding and very below average offense.

That's not the argument I'm making.

The argument I'm making is that you compare each component to the average, because that makes sense.

And replacement level is a concept at the PLAYER level, not at the component level. That's the argument. That's the way I've presented WAR and that's how I've sold WAR.

KDynan
8/21
I think what everyone really wants to know here is: will the P in WARP be staying?
TangoTiger1
8/21
On a related note: Colin, didn't you have error bars for nFRAA? I checked a couple of players, but I didn't see it. Were those removed?
cwyers
8/21
The error bars are used to regress FRAA, but they aren't currently being displayed themselves. So they're used as a WARP input, but they're not broken out separately. We've been taking steps towards error bars on various parts of WARP, but it hasn't really coalesced yet. We're going to be doing more of that now.
TangoTiger1
8/21
Good.

The seasonal error bars are not going to add up linearly for the career totals (should add up following RMSE). Have you figured out how to explain that for the masses?
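For readers who want to see the arithmetic behind that point, here's a minimal sketch (all numbers invented for illustration): independent seasonal error bars combine in quadrature (root-sum-square), not by simple addition, so a career error bar is smaller than the linear sum of the seasonal ones.

```python
import math

# Hypothetical seasonal error bars on a fielding metric, in runs.
# Four seasons, each measured to +/- 5 runs; values are made up.
seasonal_errors = [5.0, 5.0, 5.0, 5.0]

# Naive linear sum overstates the career uncertainty.
linear_sum = sum(seasonal_errors)

# If the seasonal errors are independent, they add in quadrature.
quadrature = math.sqrt(sum(e ** 2 for e in seasonal_errors))

print(linear_sum, quadrature)  # 20.0 vs 10.0
```

The gap between the two figures is exactly what would need explaining "for the masses": four seasons each known to ±5 runs yield a career total known to ±10, not ±20.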
cwyers
8/21
You're right about this. Bear in mind that this is sort of a lab concept -- this isn't a change to WARP that is finished, it's a design document for a WARP change that we'll be working on over time. So I certainly have some ideas there, but we're going to be interacting with readers and the larger community and seeing what works and what doesn't, in terms of how to explain and present what we're doing.
TangoTiger1
8/21
This is ultimately the problem that people aren't going to want to discuss.

You can have Andrelton Simmons in 2013 be worth +30 runs on fielding +/- 15 runs.

But, if he continues to pile up +30 run seasons, you'll be able to restate his 2013 season as say +30 +/-10, with the knowledge of future seasons.

Again, most people are going to not like this idea, thinking that all seasons should remain independent. But, the reality is, they are not.
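One simplified way to see why later seasons let you restate an earlier one more tightly: treat each season as an independent, equally noisy read on the same underlying true talent. (The ±15 figure below is illustrative, matching Tango's example, and the independence assumption is a deliberate simplification.)

```python
import math

# Invented figure: a single season's fielding estimate carries a
# +/- 15 run error bar around the player's true talent.
sigma_season = 15.0

# Pooling n independent seasons shrinks the error on the talent
# estimate by a factor of sqrt(n); that tighter talent estimate is
# what lets you restate the original season with a smaller margin.
pooled_error = {n: sigma_season / math.sqrt(n) for n in (1, 2, 4)}

print(pooled_error)  # {1: 15.0, 2: ~10.6, 4: 7.5}
```

So after a few more +30 run seasons, the 2013 estimate can honestly be quoted with a narrower band, precisely because the seasons are not independent evidence about different things, but repeated evidence about one player.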
cwyers
8/21
Well, that's two different discussions. One is, what's the margin of error of 2010 through 2013, cumulatively? The other is, what is the margin of error of 2013, given 2010 and 2011? It's a bit much to discuss in a comment, but we'll certainly be discussing that.
TangoTiger1
8/21
Right. You should devote an entire article on it, so it doesn't get lost in anything else you will be discussing.
tbunns
8/21
Is the time period for calculating the value of an average team one of the potential discussion points, or is it assumed to be a baseball season?
TangoTiger1
8/21
It's definitely a point for discussion, especially in terms of AL v NL being highly imbalanced.

And if you look at it historically, obviously the WWII years were lacking in huge talent.

So, yes, that should be on the table.

cwyers
8/21
I'm willing to discuss pretty much anything. Can you clarify what you mean here?
tbunns
8/21
If you are basing WARP on the difference between average and what the player did, the average of whatever has to be calculated for the group in a time period...whether that be a season, 3 years, 5 years, etc.

Mr. Tango's comment about the WWII years lacking in talent would be exactly the type of thing I'm curious about how "average" takes that into account vs. now for example.
TangoTiger1
8/21
For those who don't know, Baseball Reference followed the implementation used here:

http://www.baseballprojection.com/war/e/erstd001.htm

And that's based on the framework I've described on my blog: every component compared to average, with the replacement level treated as its own component.

TangoTiger1
8/21
Another point in favor of using the same WAR term (though different method of calculation) is that it promotes each individual person to come up with his own WAR implementation. If you don't like fWAR or rWAR, then come up with your own WAR:

http://www.insidethebook.com/ee/index.php/site/comments/everyone_has_their_own_war/
dethwurm
8/21
If we're making a wish-list, one thing I've wanted for awhile now (and have mentioned in several posts over the years) is a more in-depth discussion/justification of FRA and some of the apparent oddities in the WARP results it generates (e.g. Randy Jones in the 70s, Felix's Cy year). Will it or similar concepts be discussed in the DIPS section you mentioned?
TangoTiger1
8/21
I agree. I consider FRA to provide such dubious results that I discard it completely. On my blog, we discussed this issue, and it seems to be a GB-bias, but who knows really. Seeing the career totals of Felix, Maddux, and Doc, with no justification at all for why they fare so poorly, is enough for me.

And since FRA is the central component to WARP for pitchers, that means WARP for pitchers is useless to me.
elannon2
8/21
I'm probably stating the obvious, but for fielding, if a player's exact range was found, this would help with fielding wins.
Things like the Trout vs. Cabrera race should be at least temporarily solved, because, even with the stats Cabrera puts up being of the mostly nonsabermetric sort, there's still no way that Trout's that much better than everyone else. Trout should be a player of focus in reviewing this batch of WARP, to see if there is some major misstep allowing him to do this well or if he really is this good.
TangoTiger1
8/21
You are wrong in your position with Trout being solved with the margin of error. The margin of error goes BOTH ways, which means it's just as likely Trout is much better than he's being shown as he's much worse than he's being shown.

If Trout is +8 +/-1.5 runs, then that makes his range +6.5 to +9.5. And if Miggy is +7.5 +/-1, then he's at +6.5 to +8.5.

If anything, this kind of thing will reinforce Trout's greatness.

(All numbers for illustration purposes only.)
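Tango's illustration can be worked through directly. The helper below is hypothetical and the numbers are his made-up ones; the point is that the two players' intervals overlap, so a small gap in point estimates isn't conclusive either way.

```python
def interval(point, margin):
    """Range implied by a point estimate and a symmetric margin of error."""
    return (point - margin, point + margin)

# Tango's illustrative figures, in runs; not real measurements.
trout = interval(8.0, 1.5)   # (6.5, 9.5)
miggy = interval(7.5, 1.0)   # (6.5, 8.5)

# Width of the region where both ranges are plausible.
overlap = max(0.0, min(trout[1], miggy[1]) - max(trout[0], miggy[0]))

print(trout, miggy, overlap)  # (6.5, 9.5) (6.5, 8.5) 2.0
```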
elannon2
8/21
I understand this concept which you are showing me, but I think you are either missing my point or you know much more about this process than I do, which I don't doubt. My point was that maybe there is some underlying statistic which is being under-, or overvalued, causing Trout's data to be separated so greatly from the others. This said, I don't doubt the greatness of Mike Trout. Could you please reply back to this and let me know if I'm on the right track or completely wrong altogether? Thanks.
TangoTiger1
8/22
I'm not even sure I know what the question is.

Mike Trout is an above average hitter, above average fielder, and above average runner. And you can put "way" in front of any or all of those.

All I can say is that everyone should try to develop their own implementation of WAR. The WAR framework is there for everyone to use. The presentation at BaseballProjection.com gets it exactly right.

At this point, just work through it yourself, let's see where you end up, and we can take it from there.
cwyers
8/22
Assuming the error is symmetrical and unbiased. Which may be the case, but I wouldn't assume it. (The error bars for Cabrera are going to be bigger than the ones for Trout, too, I'm betting.)
Mooser
8/21
A couple of requests:

- Please make it easier to find WARP in the Statistics page. Right now I can find WARP, but to find the individual components you need to really search. I would love it to be on the front page as a continuous update similar to B-Ref and Fan Graphs.

- ARM ratings need to be part of FRAA (if they are not already). I know you have said they are part of WARP, but they are still not available on your site.

- Please leverage the work Max Marchi is doing on Catcher fielding (ie: framing / game calling) in FRAA. This will truly separate WARP from fWAR and rWAR.

Thanks
mattidell
8/21
Very much looking forward to this.
Grasul
8/21
Do any of these calculations take into account the competition profile within the season? I think this is one of the most unexplored issues with baseball statistics.

The most obvious example is NL pitchers facing the pitcher every 9th PA compared to AL pitchers facing a DH. It's been common for years to add a half run or some amount to ERA when trying to make an eyeball comparison between pitchers in opposite leagues.

But even more interesting is controlling for the statistics of players against the expectation of who they faced. Using a starting pitcher as an example; it should be fair to say a guy that gets 24 starts against the top 5 offenses in the league has a more difficult job than a guy that gets 24 starts against the worst 5 offenses (and of course this concept should be reduced down to specific pitchers and batters faced). The former pitcher is arguably a much better player than the latter with the same stats, and who is on the schedule is entirely outside the control of the players and in an ideal world should be controlled for.

Apologies in advance if this is the proverbial dumb question.
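A toy sketch of the schedule adjustment being asked about, with invented per-start numbers: compare the runs a pitcher actually allowed to what his slate of opponents scores against an average pitcher.

```python
# Each tuple is (opponent's average runs per game, runs allowed in
# that start). All values are invented for illustration.
starts = [(5.2, 3), (4.8, 2), (3.9, 4), (5.0, 1)]

# What an average pitcher would be expected to allow against this slate.
expected = sum(opp_avg for opp_avg, _ in starts)
actual = sum(runs for _, runs in starts)

# Positive means the pitcher beat his schedule, not just the league.
runs_saved_vs_schedule = expected - actual

print(expected, actual, runs_saved_vs_schedule)
```

A pitcher who faces the league's best offenses gets a higher `expected` baseline, so identical raw run totals translate into more schedule-adjusted credit, which is exactly the comparison the comment is asking for.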
Grasul
8/21
Actually, in thinking about it, I think I'm asking a different question. Your question tries to answer a value question: who did the most things that helped their team? My question is subtly different: who was the best player? Or, more exactly, who most overachieved expectation?
HPJoker
8/22
I have a few questions.

1) Wouldn't a full stadium view of the playing field be extremely helpful in judging defense? You could time an outfielder's jump, mark a player's exact position, and with some perspective judge the distance/difficulty of throws.

2) Why can't replacement level be the bottom 5th percentile of major league performers or something? I know this is overly simplistic thinking, but wouldn't something like that catch the essence of what a replacement level player is?

3) Will opponent factor into this at all? For example, if Matt Harvey shuts out the Red Sox at Citi Field, will he get the same amount of credit as if he shut out the Marlins at Citi Field? I don't know if this is already taken into account in WARP or the other total value metrics, but there is a discussion between Joe Sheehan and Brian Kenny about Harvey/Kershaw, and one of Sheehan's points is that Matt Harvey has had the easiest schedule of anyone this year. Shouldn't that factor into WARP, if it isn't already?

4) I hope the little things like the utility study by Mr. Carleton and the catch framing studies done by Mr. Lindbergh will be accounted for in this new version of WARP.

5) Will outside studies be used to help formulate the new version of WARP?

6) After this is all said and done, could you possibly make it easier to access WARP by maybe putting a leaderboard on the main page à la FanGraphs & B-R?
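The percentile idea in question 2 can be sketched directly: take a pool of per-player rate stats and read off the 5th percentile as a candidate replacement level. This is a minimal illustration only; the player pool, the numbers, and the choice of OPS are all made up, and this is not how BP actually sets replacement level:

```python
# Hypothetical sketch: define "replacement level" as the 5th percentile
# of a qualifying player pool. The OPS values below are invented.
def percentile(values, pct):
    """Linearly interpolated percentile of the values."""
    xs = sorted(values)
    k = (len(xs) - 1) * pct / 100.0
    lo, hi = int(k), min(int(k) + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

ops_pool = [0.610, 0.655, 0.700, 0.720, 0.745,
            0.760, 0.780, 0.810, 0.850, 0.920]
replacement_ops = percentile(ops_pool, 5)
print(round(replacement_ops, 3))
```

The hard part, as Plucky notes below, is not the arithmetic but deciding who belongs in the pool and how to handle position value.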
Plucky
8/22
Several unconnected questions/musings (in all cases, I use "WAR" to describe the concept and "WARP" to describe the calculated statistic):

1) Both the play outcome --> run value and run value --> win value legs of any WAR methodology are things that ought to float from year to year (or at least era to era) based on overall environment. A run in 1962 is worth a lot more than a run in 1998. Similarly, the relative value of HR vs BB would be different, given that in a lower offensive environment a walk would have a lower expectation of scoring. Is that something already accounted for in WARP? If not, is it something you plan to include? Would you make similar AL/NL adjustments? The presence/absence of the DH would have similar effects on runs->wins and outcomes->runs.
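The environment point in 1) can be illustrated with a toy calculation. Using the classic Pythagorean expectation (exponent 2, purely for illustration; actual run-to-win conversions are more refined than this), the same ten marginal runs buy noticeably more wins in a 1968-style scoring environment than in a 1998-style one:

```python
# Toy sketch: the win value of a marginal run depends on run environment.
# Classic Pythagorean expectation with a fixed exponent of 2; the R/G
# figures are rounded, era-flavored examples.
def pyth_wins(runs_scored, runs_allowed, games=162, exp=2.0):
    pct = runs_scored**exp / (runs_scored**exp + runs_allowed**exp)
    return games * pct

for rpg in (3.4, 4.8):            # ~1968-like vs ~1998-like per-team R/G
    base = rpg * 162              # season run total for an average team
    extra = pyth_wins(base + 10, base) - pyth_wins(base, base)
    print(f"{rpg} R/G environment: +10 runs is worth about {extra:.2f} wins")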

2) The discussion above on offensive, defensive, and baserunning WAR is a bit circular, but gets at an important point: Is "replacement level" going to be defined at the player level or the skill-component level? That is to say, is a 0 WARP player someone who is not just a bad hitter, but slow and stonehands as well? Or is 0 WAR going to mean awful with the stick but not a disaster in the field or on the basepaths (or some equivalent skill mix that gets to the same place)?

2b) The component-level vs player-level definition of replacement level gets at the fundamental assumption of talent distribution behind WAR, which is that it's a pyramid (or some kind of highly skewed distribution like Pareto or the truncated edge of a normal curve, etc.), and that such a distribution implies there's a certain production level so ubiquitous as to have zero scarcity value. This is pretty demonstrably the case for hitting, and probably for pitching (I'm not 100% sold on it), but given that defensive stats are by no means mature, can we assert with any confidence that it is also true in the field? Even if it's true in some metaphysical sense, is it true in the population of players who hit well enough to potentially be put on a 25-man roster?

3) I would think avoiding "average" in any mathematical formulation of WARP would be ideal if possible. The whole conception behind WAR is the skewed talent distribution, and a big part of the usefulness of WARP is that it measures that skew. By building any reference to "average" into the definition, you are inherently assuming a particular level of skew, which then creates a circular measuring-what-you-assume problem. A system that defines replacement level based on percentile-of-players is a much better fit with the concept of WAR. That said, going with a percentile-based (or ranking-based) definition presents its own set of difficulties, particularly when you start trying to measure position value.

3b) I put "average" in scare quotes because it's a very slippery word with several possible meanings in a player-value context. First of all, given that we're dealing with a skewed talent distribution, do we really mean "median" rather than average? Even if we are talking about an average rather than a median, what kind of weighting do you use? Averages based on league-wide aggregate stats are weighted by plate appearance, and for obvious reasons better players will have more PAs than worse players. The result is that "average" calculated from leaguewide aggregates will be higher than "average" calculated in a way where each player counts equally.
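The weighting point in 3b) is easy to see with a toy roster. A league "average" computed from aggregate totals is PA-weighted, and because better hitters get more PAs, it sits above the equal-weight (per-player) average. The players and numbers below are invented purely to show the gap:

```python
# Sketch: PA-weighted "average" vs. equal-weight per-player average.
# Toy data: (plate_appearances, on_base_pct), all invented.
players = [
    (650, 0.370),  # regular, good hitter
    (600, 0.345),  # regular
    (300, 0.310),  # bench bat
    (150, 0.290),  # September call-up
]

pa_weighted = sum(pa * obp for pa, obp in players) / sum(pa for pa, _ in players)
per_player  = sum(obp for _, obp in players) / len(players)

print(pa_weighted, per_player)  # the PA-weighted figure comes out higher
```

Here the PA-weighted OBP is about .344 against an equal-weight .329, so which "average" you pick moves the baseline by a nontrivial amount.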


WaldoInSC
8/23
This endeavor demonstrates such maturity. Congratulations to everyone involved in advance for your collaborative approach.