TangoTiger
314 comments | 399 total rating | 1.27 average rating
Your search returned 313 results...
TangoTiger
(57181)
Comment rating: 5

Thanks Pizza, good job. I agree that Barrels is one subset, which is intentional. There's likely going to be six subsets, as shown here: https://twitter.com/tangotiger/status/781674636866191360 By doing this, we'll be able to create a "scouting profile" for a hitter. The question of being predictive would necessitate a slightly different approach.

 
TangoTiger
(57181)
Comment rating: 0

Link to AL/NL by team: https://twitter.com/tangotiger/status/778404393163063296

 
TangoTiger
(57181)
Comment rating: 0

First, a big {clap clap clap}. Just to confirm your numbers, roughly speaking (and understanding it's not linear, but focusing on the temperatures around 60-110), each 6 to 10 degrees Fahrenheit increases exit velocity by 0.1mph, correct?

 
TangoTiger
(57181)
Comment rating: 2

I think Pizza does a good job of laying out the groundwork as well as highlighting the potholes. I agree that we can't come to any conclusion given the data and research publicly available. MGL does a good job of noting all the additional variables that need to be considered; lacking those, any conclusion is definitely premature.

 
TangoTiger
(57181)
Comment rating: 1

I also see that your inning parameter, which was so prevalent in your original model, has all but disappeared, limited to just "extra innings or not". That one was almost certainly overfitting, especially when those values would change each year. So, it's good that you are using baseball knowledge in constructing your models, rather than relying on a regression/kitchen-sink approach. In that respect, you'd probably want a "9th inning, tying runner at bat or on base" parameter, which should be similar to the XI parameter you have. After all, a bottom-of-the-9th tie game is much closer in impact to an XI tie game than to a 3rd-inning tie game.

 
TangoTiger
(57181)
Comment rating: 1

Jonathan, thanks for the continued improvements. Just so I am following along, is your chart showing that IBB is dependent on the identity of the catcher (over and above the team that the battery is on)? If so, can you show us this "IBB impact" for the leaders/trailers among catchers? Also, the putout impact of the 1B is dependent on the putout impact of the 3B? If so, is that because the talent level of the 3B allows the SS to move over, which allows the 2B to move over? But if you only focused on the impact of the 2B, that had no relevance on its own? Fascinating if true.

 
TangoTiger
(57181)
Comment rating: 0

Pizza: I think if you split it up between home/road, then the effect is reduced substantially. So, the team that loses the lead is most likely to be: (a) below average, as your last column shows, but also (b) be the road team.

 
TangoTiger
(57181)
Comment rating: 3

It's not a strawman. We've talked about it on my blog on many occasions, and also as it applies to other sports. Here's the idea applied to hockey: http://www.insidethebook.com/ee/index.php/site/comments/compassionate_referee/ Another article applied to the base-out state: http://www.insidethebook.com/ee/index.php/site/comments/the_cruel_and_compassionate_umpire/

 
TangoTiger
(57181)
Comment rating: 2

No one ever says "always". That's a strawman. The point is that unless you can somehow know, then you have to rely on the averages. If you go with the averages blindly, you'll be right 60% of the time. If you go with the averages AND your gut, you'll be right 50-65% of the time. The odds are stacked against you.

 
TangoTiger
(57181)
Comment rating: 2

This is a great topic, thanks for continuing to talk about it. I also agree that regardless of the actual reason, the "marker" (good word) of times through the order is easier to sell than the nebulous concept of pitches. I'd also like to point out the research presented by MGL here: http://www.baseballprospectus.com/article.php?articleid=22156

 
TangoTiger
(57181)
Comment rating: 0

The scale is not "difficulty". It's "defensive impact".

 
TangoTiger
(57181)
Comment rating: 4

I think fan participation is crucial. No one might care about your opinion in isolation, since it seems like it could very well be just a fart in the wind (*), but when you are part of a collective, your opinion is extremely valuable. I encourage you to always participate when you can, and the effort is low. Your opinion has more value than you can possibly fathom (**). (*) Great movie. (**) Another great movie.

 
TangoTiger
(57181)
Comment rating: 4

My scientific observations have led me to the fielding spectrum I noted above.

 
TangoTiger
(57181)
Comment rating: 2

BPRo makes it easy to find the data. http://www.baseballprospectus.com/other/iba2014/index.html 2014 had 563 votes for AL MVP; 2013: 644; 2012: 654. My guess is that they put out the pages late, or had a smaller voting window.

 
TangoTiger
(57181)
Comment rating: 3

"I went to a baseball site, and a political debate broke out!"

 
TangoTiger
(57181)
Comment rating: 0

Right, I was flip-flopping between the two to make the better example. Either way works.

 
TangoTiger
(57181)
Comment rating: 2

I've been talking about positional value for 15 years on my blog. It's one of the more intricate topics to discuss. http://www.baseball-reference.com/leagues/split.cgi?t=b&lg=MLB&year=2014#defp::none In 2014, OPS by position: .724 LF, .719 CF, .715 3B. In 2015: .755 3B, .739 CF, .736 LF. As I said, it becomes a fairly intricate and complex topic to discuss. You can check my old blog if you have interest in this topic. http://www.insidethebook.com/ee/index.php/site/categorylinks/

 
TangoTiger
(57181)
Comment rating: 2

Hmmm... I messed up something. Let me try again: C SS 2B/3B/CF {large gap} RF {tiny gap} LF 1B DH

 
TangoTiger
(57181)
Comment rating: 1

This is the scale: C SS 2B/3B/CF <big gap> RF <tiny gap> LF 1B DH

 
TangoTiger
(57181)
Comment rating: 3

3B and CF are considered equivalent in terms of their defensive impact.

 
TangoTiger
(57181)
Comment rating: 1

As I understand it, openWAR uses the base-out states in all its calculations. They look at the change in run expectancy, and then attribute that to the players involved, and this is true for hitting, running, pitching, fielding. In Fangraphs/BR.com parlance, this is the RE24 value. So, a bases loaded wild pitch with 0 outs would be counted much differently than a WP with a runner on 1B and 2 outs. The other systems basically assume it's a WP that occurred in a random situation in a random game.

 
TangoTiger
(57181)
Comment rating: 0

[quote]LI records it as "He was in the game with an LI of 0.02, and we're going to weight that equally with the 5-out, came into the 8th inning with 2 runners on, up by 1 save he had last week."[/quote] That should read "WAR records it as". And yes, that is what it is doing, just like it's doing it for all pitcher and hitter stats. Now, you could decide to do the "halfway" with WPA instead. You'd take the LI at the PA level you find at Fangraphs, and take it halfway to 1.0. Then divide WPA by that figure. Then add in the replacement level portion. But if you do this for a group of closers, you won't find any difference.

 
TangoTiger
(57181)
Comment rating: 4

Russell, you can easily show the "before" data, in terms of IP, K, BB, and ERA, and you can show the "after" data in the same way, split by 4 days rest and 5+ days rest.

 
TangoTiger
(57181)
Comment rating: 2

The premise to the article is great. The process to get the answer is great. The entire payoff was this: "The results? Nothing. Extra days of rest earlier in the season didn’t help a pitcher in September (or hurt him). Everyone pitched about how their seasonal stats would predict he would, at least in the aggregate." Not a single piece of data to hang our hat on. If Verducci did this, we'd blast him for it. Baseball Prospectus? Hugely disappointed. At a pure minimum, can't you just provide a link to the results of the data?

 
TangoTiger
(57181)
Comment rating: 0

"within the normal bounds of decency" That should actually be: within the bounds of the data at hand. So in your dataset, how many batters and innings did the top 10% in workload average?

Sep 30, 2015 11:24 AM on Let Him Pitch!
 
TangoTiger
(57181)
Comment rating: 1

Great, thanks for running it. Then really, if you get substantially the same results, with no one pitcher an outlier, then just choose whichever is easiest to implement and present.

 
TangoTiger
(57181)
Comment rating: 1

Can you show us the difference in results (say for the Cy contenders, meaning Greinke, Kershaw, Arrieta, Price, Keuchel) if you treat park effects as a fixed effect in one model and a random effect in the other model?

 
TangoTiger
(57181)
Comment rating: 2

The second paragraph should read "better than replacement".

Aug 26, 2015 9:27 PM on An Introduction
 
TangoTiger
(57181)
Comment rating: 2

It looks to me that WARP is based on DRA. And RAA is based on a REGRESSED DRA. His DRA is 2.05, which is 2.92 runs per 9 IP better than replacement. 2.92/9*172 = +56 runs better than replacement. Divide by 9.2, and you get 6.1 WARP. But in no way is 2.05 only +23 runs better than average; 2.05 is 35 or 40 runs better than average. However, the hidden thing that no one is talking about is that the data is being regressed. This is huge. So, they are regressing someone with 700 or whatever PA at 30 or 40%, and so bringing the +35 down to +23.

Aug 26, 2015 9:20 PM on An Introduction
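The arithmetic in the comment above can be checked in a few lines. The 2.92 runs per 9 IP, 172 IP, and 9.2 runs-per-win figures are all taken from the comment itself; this is just a sketch of that calculation, not the official WARP formula:

```python
# Sketch of the DRA -> WARP arithmetic described in the comment above.
# Inputs taken from the comment: 2.92 runs per 9 IP better than
# replacement, 172 IP pitched, roughly 9.2 runs per win.
ip = 172
runs_per_9_above_repl = 2.92
runs_per_win = 9.2

runs_above_replacement = runs_per_9_above_repl / 9 * ip
warp = runs_above_replacement / runs_per_win

print(round(runs_above_replacement, 1))  # 55.8 -- the "+56 runs"
print(round(warp, 1))                    # 6.1 WARP
```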
 
TangoTiger
(57181)
Comment rating: 2

http://www.hardballtimes.com/it-never-raines-in-san-diego/

 
TangoTiger
(57181)
Comment rating: 0

"He received no free agent offers the following off-season except from his former employer, the Montreal Expos. " Actually, he did. I think it was Padres, Astros, Mariners maybe. They were below market value, like 30-40% below market. He said he was packed, ready to go, but then the offer was pulled.

 
TangoTiger
(57181)
Comment rating: 7

This is a model as to how all saberists should approach their work. A terrific piece. *** Re: WOWY, I just want to point out that there is no particular reason(*) to limit the "without" part to same-season. And in all the WOWY that I do, I do not limit to same-season. So, it seems from the WOWY that's applied in this article, it's a self-imposed constraint, which leads to the massive overfit on occasion. (*) Except for aging. You would need to control for aging to some extent. The "Ripken" issue is really the "Ripken-Garland" issue, where Garland had Ripken almost exclusively as his SS. But other Orioles pitchers, lucky enough (for us) to have pitched on a non-Oriole team, would not have this issue. So, just wanted to clarify that.

 
TangoTiger
(57181)
Comment rating: 1

That includes arb and pre-arb players. You'd have to look at free agent average players only if you want to compare to free agent stars.

 
TangoTiger
(57181)
Comment rating: 0

I meant like the illustration you had of Trout, that because he was flat (in the past), that he'd be easier to forecast (in the future). MGL below says that his research DOES support your claim. I wanted to see those results. It's one thing to say we'll get smaller RMSE with players who have a lower standard deviation in their past 3 or 4 years of wOBA. It's another to say that those guys will have an RMSE of .028 and the variable guys will have an RMSE of .029. So, I was looking to see the extent we're talking about, in turning that into the English word "easier".

 
TangoTiger
(57181)
Comment rating: 0

MGL: excellent point. I had meant individual RMSE, but you make a good point in terms of the collective.

 
TangoTiger
(57181)
Comment rating: 0

Robert: you made a point of saying that a "flat" player is easier to forecast than someone who is up-and-down. Can you actually test that and report the results?

 
TangoTiger
(57181)
Comment rating: 0

Craig: can you add one column to each chart, and that is "Unlisted", meaning the number of ballots where the player was not listed?

 
TangoTiger
(57181)
Comment rating: 0

One SD is around .008 outs per BIP, based on research done several years back. The point at which random variation is equal to the true talent spread (i.e., r=.50) is at around 3000 - 4000 BIP.

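The "r = .50 at around 3000-4000 BIP" point can be illustrated with the standard regression shorthand r = n / (n + k), where k is the sample size at which true-talent spread and random variation contribute equally. This is a hedged sketch: k = 3500 is an assumed constant chosen to land in the range quoted above, not a published figure:

```python
# Hedged sketch: reliability approximated as r = n / (n + k).
# k = 3500 BIP is an assumption landing inside the 3000-4000 range
# mentioned in the comment above.
def reliability(n_bip, k=3500):
    return n_bip / (n_bip + k)

print(reliability(3500))            # 0.5 -- talent and noise contribute equally
print(round(reliability(1000), 2))  # 0.22 -- mostly noise at small samples
```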
 
TangoTiger
(57181)
Comment rating: 1

10,000 games is not enough. You need one million games to get the rounding error to under 1 run per 162 games. Overall though, you can run a model here, add +.15 or subtract .15 runs to each of the values in the chart, and you will find a change of 0.4 runs per game as the impact of speed in terms of taking the extra base. http://tangotiger.net/markov.html Of course, speed comes into play in hitting and fielding and basestealing. When we talk about speed for hitting, it turns say a guy with a .300 wOBA into a .330 wOBA because of his speed, etc. So, you have to be careful how you frame the discussion. You may find this old article by Tom Tippett interesting: http://207.56.97.150/articles/ichiro.htm

 
TangoTiger
(57181)
Comment rating: 1

Wonderful!

 
TangoTiger
(57181)
Comment rating: 0

Yeah, I hadn't considered it either, and it's a huge thing. You can check my blog for more comments...

 
TangoTiger
(57181)
Comment rating: 1

I concur with MGL. I'm not buying these results in the least: "As an example, take a situation in which we might otherwise expect a 20 percent chance of a strikeout. For a returning Houdini, the chances of actually striking the hitter out fell to around 12 percent." Not to mention that since these results are based on both groups of relievers, those who inherited runners and those who placed runners, and since Russell concluded that those who inherited runners weren't affected (i.e., 20% before would also be 20% post-Houdini), that would mean that the other subset (those who placed runners themselves) would be far lower than 12% post-Houdini. Like 5% or something. Not buying it at all. How can you have an entire article like this and not show the data? Seriously. It does not have to be in the article. Just put a link to the source data, aggregated by pitcher.

 
TangoTiger
(57181)
Comment rating: 1

I believe that's spelled: Bone For Tuna.

 
TangoTiger
(57181)
Comment rating: 0

In that case, can you look at past years' correlation of OBP, year T to year T+1? Because an r of close to .50 with 480 PA is an extremely low number. It should be closer to .70.

 
TangoTiger
(57181)
Comment rating: 0

Russell: something looks odd to me. For the seasonal correlations, those look like r-squared results. But for the short-season correlations (those based on 130 or 100 PA), those look like r results. Can you confirm that (a) all the results in the table are of the same form, and (b) whether it is r or r-squared?

 
TangoTiger
(57181)
Comment rating: 0

Well, the last one is a given. The others, please, thanks.

 
TangoTiger
(57181)
Comment rating: 1

Can you tell me the average number of PA for each of these entries?
Overall Seasonal OBP: .474
Regressed Seasonal OBP: .480
End of season talent estimate: .380
Highest estimated OBP: .302
Lowest estimated OBP: .243
OBP in last 100 PA of season: .297

 
TangoTiger
(57181)
Comment rating: 1

Zach: the NHL has one-day contracts. http://www.capgeek.com/faq/how-do-emergency-goaltender-tryout-contracts-work

 
TangoTiger
(57181)
Comment rating: 0

Unfortunately Eric, the logic won't hold. I had this discussion ten years ago, and it'll break down eventually if you follow that line of reasoning. If you want to have this discussion, please start a thread on my forum. I don't want to get into a tangent here. http://tangotiger.com/index.php/boards/viewforum/2/

 
TangoTiger
(57181)
Comment rating: 0

The league average LI is 1 by definition. That won't necessarily apply at the team level.

 
TangoTiger
(57181)
Comment rating: 0

The derivation is the shortcut approximation of a much longer calculation called "chaining", where you refigure how the bullpen would do, when one of the guys is removed from the chain, and everyone's role is readjusted, as the new guy coming in, comes at the bottom of the chain. Whether Mariano is on the roster, is injured or is retired, the leverage opportunities will still exist. And someone pretty good will pick up some of that slack.

 
TangoTiger
(57181)
Comment rating: 3

Thank you for your diligence. Actually, it's a bit more complicated! With relievers, because the leverage is "bequeathed" to the manager, we only give the reliever a portion of the leverage, reasoning that the manager has to give the leverage to SOME reliever. Hence, if a reliever enters with an LI of 1.8, we give the reliever an LI of 1.4, for purposes of figuring his contributions. But for starting pitchers, it's not the same thing. They directly have a hand in creating their own leverage, as you properly note. If Felix or Cliff Lee or some other pitcher gets an LI of 1.1 because they pitch deep in games, they may deserve most if not all of that LI. So, it's a bit more nuanced. In terms of "worth it": it's easy for us to sit here and tell David at Fangraphs and Sean at Baseball Reference "uh, do it!". It's not our effort being measured here. It's definitely a valid point, and should be given its due consideration. Great job!

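The halfway rule described above, crediting a reliever with leverage midway between neutral (1.0) and the LI he actually entered with, is a one-liner. The 1.8 and 1.4 figures come from the comment itself:

```python
# Credited leverage for a reliever: halfway between neutral (1.0) and
# his entering LI, per the chaining argument in the comment above.
def credited_li(entering_li):
    return (1.0 + entering_li) / 2

print(round(credited_li(1.8), 2))  # 1.4, the figure quoted above
```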
 
TangoTiger
(57181)
Comment rating: 1

Generally speaking, over the course of a season, the LI of every starting pitcher will hover around 1 (say, 0.95 to 1.05, with outliers just outside that). Hence, it's not worth the effort to try to get that precise. That said, you can definitely argue that maybe it should be that precise.

 
TangoTiger
(57181)
Comment rating: 1

Russell is wrong on multiple levels here. WAR properly accounts for the leverage issue, by giving the pitcher credit for halfway between standard leverage (1.0) and actual leverage (whatever he got, say 1.8). And that's done for the very reason Russell notes: that a manager can indeed leverage his reliever in the future. BUT, we don't give full credit, because the next best reliever could do almost as good a job. Hence, once you "chain" all this, you give 1.4 credit in terms of leverage, and that's what WAR does. Secondly, his view of "sabermetric orthodoxy" is not at all accurate. No one suggests "that any idiot with a right arm can close a game", in any circle. It's a straw man. I posted my views on my blog.

 
TangoTiger
(57181)
Comment rating: 1

Wonderful research, great presentation.

 
TangoTiger
(57181)
Comment rating: 2

Can you post a table that summarizes the data along these lines:
a - home/away
b - inning when the offense scored to tie or take a 1-run lead
c - number of times this happened
d - number of times this team came into the next half-inning and allowed no runs
e - d divided by c
f - % of scoreless innings by this pitcher in the season (or at least his RA9, which we can run through to estimate this)
g - difference between e and f

 
TangoTiger
(57181)
Comment rating: 0

Newsense is absolutely correct and explains it perfectly.

 
TangoTiger
(57181)
Comment rating: 4

I love your coin analogy. *** When I poll readers on my blog, there's an equal divide. Some are interested just in what happened at the time it happened, and others are just interested in the outcomes as if they happened in a vacuum. Basically, we always need to have multiple versions of whatever metric you create, because everyone is coming to the table looking for answers to different questions. And we're not able to find "common" questions enough to allow us to present just one version.

 
TangoTiger
(57181)
Comment rating: 1

That's correct. I expand further on my blog if you are interested.

 
TangoTiger
(57181)
Comment rating: 0

Correct. Or in other words, RE24 uses a chart similar to this: http://www.tangotiger.net/lwtsrobo.html While linear weights uses a chart similar to what Colin showed above.

 
TangoTiger
(57181)
Comment rating: 0

Perfect! *** As for how "small" small is for RE24, in some states (bases empty 0 outs), it's very tiny. In other states, it's larger than you might think. In some league-years, you'll have say the man on 2B state have a HIGHER run value than the man on 3B state. This is easily corrected by using Markov chains.

 
TangoTiger
(57181)
Comment rating: 0

Alan: I think this is part of the confusion with what Colin is doing. He's really presuming that the RE24 model is the target model, and he's presuming that he's unaware of Trout and Cabrera's performance in the 24-base-out states, and so, that's what the "error" term is about. Setting that aside, you are correct that if say all of Trout and Cabrera's hits were singles, and we were in fact doing an error term of the single the correct way (we'd end up with a value of something like .002), then their error terms (which at this point would be extremely tiny, less than 1 run) would move in the exact same direction. But, that's not what Colin is doing.

 
TangoTiger
(57181)
Comment rating: 0

This is handled by "base-out Leverage Index" (boLI), though in retrospect, I should have called it LI24. Baseball Reference tracks it, and you will see that there is not much deviation.

 
TangoTiger
(57181)
Comment rating: 0

Well, those on the leading edge need to support RE24 more. Google RE24 and you'll get some good articles. But, we need more people spreading the word.

 
TangoTiger
(57181)
Comment rating: 4

If the purpose is to track offensive impact by the base-out situation, then yes. That answers that question. If the purpose is to assume that the event would have typically occurred in a typical situation, then no. All you have to answer is this question: how much weight do you want to give a bases loaded walk, compared to a bases empty walk, if both occurred with two outs? If you want to give the same value, then use standard linear weights, and both get around .3 runs. If you want to give them different values, then go with RE24, where one walk gets exactly 1.0 runs and the other gets around .13 runs. It's a personal choice. No wrong answer.

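The walk comparison above can be made concrete with illustrative run-expectancy values. The RE numbers below are assumptions typical of a modern run environment, not an official table; they are chosen to reproduce the ~0.13 and 1.0 figures in the comment:

```python
# Hedged sketch: RE24 value of a walk in two different 2-out states.
# Run-expectancy values are illustrative assumptions, not an official table.
RE = {
    ("empty", 2): 0.10,   # bases empty, 2 outs
    ("1B", 2): 0.23,      # runner on first, 2 outs
    ("loaded", 2): 0.80,  # bases loaded, 2 outs
}

# Bases-empty walk: state moves from empty to runner on 1B, no run scores.
empty_walk = RE[("1B", 2)] - RE[("empty", 2)]

# Bases-loaded walk: one run is forced in and the state stays loaded.
loaded_walk = (RE[("loaded", 2)] + 1.0) - RE[("loaded", 2)]

print(round(empty_walk, 2))   # 0.13, as in the comment
print(round(loaded_walk, 2))  # 1.0 -- exactly the forced-in run
```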
 
TangoTiger
(57181)
Comment rating: 0

Sky is correct.

 
TangoTiger
(57181)
Comment rating: 0

I'm not even sure I know what the question is. Mike Trout is an above average hitter, above average fielder, and above average runner. And you can put "way" in front of any or all of those. All I can say is that everyone should try to develop their own implementation of WAR. The WAR framework is there for everyone to use. The presentation at BaseballProjection.com gets it exactly right. At this point, just work through it yourself, let's see where you end up, and we can take it from there.

Aug 22, 2013 2:34 AM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 1

You are wrong in your position that the Trout question is solved by the margin of error. The margin of error goes BOTH ways, which means it's just as likely Trout is much better than he's being shown as much worse. If Trout is +8 +/-1.5 runs, then that makes his range +6.5 to +9.5. And if Miggy is +7.5 +/-1, then he's at +6.5 to +8.5. If anything, this kind of thing will reinforce Trout's greatness. (All numbers for illustration purposes only.)

Aug 21, 2013 1:02 PM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 3

I agree. I consider FRA to provide such dubious results that I discard it completely. On my blog, we discussed this issue, and it seems to be a GB-bias, but who knows really. Seeing the career totals of Felix, Maddux, and Doc, with no justification at all for why they fare so poorly, is enough for me. And since FRA is the central component to WARP for pitchers, that means WARP for pitchers is useless to me.

Aug 21, 2013 12:58 PM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 1

Right. You should devote an entire article on it, so it doesn't get lost in anything else you will be discussing.

Aug 21, 2013 12:21 PM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 1

Another point in favor of using the same WAR term (though different method of calculation) is that it promotes each individual person to come up with his own WAR implementation. If you don't like fWAR or rWAR, then come up with your own WAR: http://www.insidethebook.com/ee/index.php/site/comments/everyone_has_their_own_war/

Aug 21, 2013 12:15 PM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 1

It's definitely a point for discussion, especially in terms of AL v NL being highly imbalanced. And if you look at it historically, obviously the WWII years were lacking in huge talent. So, yes, that should be on the table.

Aug 21, 2013 12:12 PM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 1

For those who don't know, Baseball Reference followed the implementation used here: http://www.baseballprojection.com/war/e/erstd001.htm And that's based on the framework I've described on my blog: every component compared to average, with the replacement level treated as its own component.

Aug 21, 2013 12:11 PM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 3

This is ultimately the problem that people aren't going to want to discuss. You can have Andrelton Simmons in 2013 be worth +30 runs on fielding +/- 15 runs. But, if he continues to pile up +30 run seasons, you'll be able to restate his 2013 season as say +30 +/-10, with the knowledge of future seasons. Again, most people are going to not like this idea, thinking that all seasons should remain independent. But, the reality is, they are not.

Aug 21, 2013 12:08 PM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 4

We're just going in circles. The end-result is the same, just as the end-result is the same if I say you have replacement-level fielding and average offense, or above-average fielding and very below average offense. That's not the argument I'm making. The argument I'm making is that you compare each component to the average, because that makes sense. And replacement level is a concept at the PLAYER level, not at the component level. That's the argument. That's the way I've presented WAR and that's how I've sold WAR.

Aug 21, 2013 12:05 PM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 1

Good. The seasonal error bars are not going to add up linearly for the career totals (they should add up following RMSE). Have you figured out how to explain that for the masses?

Aug 21, 2013 11:56 AM on The Series Ahead
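The RMSE point above: independent seasonal error bars combine in quadrature, not linearly. A sketch with hypothetical seasonal uncertainties (the +/-15 runs figure is an assumption for illustration):

```python
import math

# Hedged sketch: career error bar from independent seasonal error bars.
# Seasonal sigmas are hypothetical (runs of fielding uncertainty).
seasonal_sigmas = [15, 15, 15, 15]  # four seasons at +/-15 runs each

linear_sum = sum(seasonal_sigmas)                           # 60: overstates it
quadrature = math.sqrt(sum(s**2 for s in seasonal_sigmas))  # 30: RMSE-style

print(linear_sum, round(quadrature, 1))  # 60 30.0
```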
 
TangoTiger
(57181)
Comment rating: 0

On a related note: Colin, didn't you have error bars for nFRAA? I checked a couple of players, but I didn't see it. Were those removed?

Aug 21, 2013 11:48 AM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 0

Well, Sean is wrong if he has that written. Anyway, there is a difference in how I present it (each component at the league average, then one sweeping replacement level number at the player level). The practical difference is that by keeping things as I'm presenting it, then we don't have to have these conversations about "replacement level offense", "replacement level defense", etc, things that don't in fact exist. This was the #1 problem with the original WARP, which Clay finally agreed to in the end. And this would have been more obvious if we simply stuck to the presentation I advocate. It's tiring that we need to constantly correct readers who are not as knee-deep in this, because the damage was done, and continues to be done.

Aug 21, 2013 11:44 AM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 0

BR.com has a confusing presentation, a concern many of us have expressed. Fangraphs follows the framework I have, which is that every component is compared against the AVERAGE. You can see that presentation at the bottom of any of the player pages. You can also see it at BaseballProjection.com, which is the precursor to Reference's WAR.

Aug 21, 2013 11:28 AM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 3

I would prefer that Fangraphs and BR.com go with the names I've coined. Fangraphs originally called it "Win Values" or something like that, presumably as a differentiator. They went to WAR at some point. I don't know if fWAR/rWAR is a feature or not. I don't know if them both calling it WAR, but with different methods of calculation, is a feature or not. If this was a court case, we'd each take sides, and just explain one point of view. I think you can reasonably make a case either way.

Aug 21, 2013 11:23 AM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 3

"Correct me if I'm wrong, but it seems that most systems use performance vs. replacement level for offense and then performance vs. the average (0 runs) for defense. " Since you asked: you are wrong.

Aug 21, 2013 11:12 AM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 2

WAR is the framework or specification.
fWAR is *an* implementation of WAR.
rWAR is *an* implementation of WAR.
They're not identical, any more than Oracle's implementation of SQL92 is identical to DB2's.

Aug 21, 2013 11:10 AM on The Series Ahead
 
TangoTiger
(57181)
Comment rating: 2

It used to be available on the Kindle and eventually will be again. It is available on the iPad. There is a link on my blog.

 
TangoTiger
(57181)
Comment rating: 2

Well presented and an inspired approach.

 
TangoTiger
(57181)
Comment rating: 4

One of the best research pieces of the year. {clap clap clap}

 
TangoTiger
(57181)
Comment rating: 1

MGL is correct. Which is why 99% of the people won't accept altering the quality of a SS's performance in 2013 based on how well he was estimated to play in 2011-2012. Furthermore, we'd have to constantly revise those estimates as future information comes in. We had no information with, say, Andrelton Simmons, and suddenly we've got tons. What looked like a favorable view of the quality of his chances in 2012 is now going to be changed to "fair," and possibly even "too harsh," based on his performance in 2013. MGL got backlash when he updated how UZR was calculated in the off-season. Imagine the backlash as we have to update these things in-season.

 
TangoTiger
(57181)
Comment rating: 4

That's how Fangraphs shows it (bottom of every player page). The reader does have the choice of adding up the separate components (or picking and choosing), or letting Fangraphs automatically do it for them.

 
TangoTiger
(57181)
Comment rating: 3

5 points is 0.005, which times 600 is 3.

Apr 25, 2013 9:46 AM on Pitcher BABIP and Age
 
TangoTiger
(57181)
Comment rating: 2

Five points in BABIP is 0.10 in ERA.

Apr 25, 2013 8:54 AM on Pitcher BABIP and Age
 
TangoTiger
(57181)
Comment rating: 2

Mazzone was all the rage with WOWY a few years back. Can you break down his impact when he was with the Braves and when he was with the Orioles?

 
TangoTiger
(57181)
Comment rating: 2

Pizza: hate to burst your memories, but BR.com has his career BABIP at .232 (and Fangraphs at .230), which is even better than you thought! In addition, he was not "stable" inside any range. His performance followed a random distribution around that .232 mean.

 
TangoTiger
(57181)
Comment rating: 0

Right, if you include playing time (and on top of that a baseline), it's going to get funky. You are asking for a distribution of this equation, basically: PA x (TAv - baseline) That baseline is fixed, the TAv follows a binomial-type of distribution, and PA is going to be heavily skewed.

 
TangoTiger
(57181)
Comment rating: 1

TAv follows closely to OBP, which is a binomial metric, and so follows the binomial distribution. If your OBP is .330, we’re going to observe a slight skew given 600 PA, but, really, you won’t notice it. For ERA, it’s proportionate to the SQUARE of TAv (or similarly, the square of OBP). Just think of Bill James’ Runs Created which is, at its core, OBP x SLG. Which is kinda like saying OBP squared. So, if you take the square root of ERA, you’ll get something that is proportionate to OBP, and so, you should get that kind of distribution. Theoretically anyway. It shouldn’t be too hard for someone out there to prove me right (or wrong!).
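A quick way to check the first part of this claim: the skewness of a Binomial(n, p) distribution is (1 - 2p) / sqrt(np(1 - p)). A minimal Python sketch (the function name is mine):

```python
import math

def binomial_skewness(n, p):
    # Skewness of a Binomial(n, p) distribution:
    # (1 - 2p) / sqrt(n * p * (1 - p)).
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

# A .330 OBP over 600 PA: the skew is small enough that
# the distribution looks symmetrical to the eye.
obp_skew = binomial_skewness(600, 0.330)
```

At 600 PA and p = .330 the skewness works out to about 0.03, small enough that you won't notice it, just as described. The ERA/square-root claim would take a real dataset to test.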

 
TangoTiger
(57181)
Comment rating: 0

I think for all intents and purposes, TAv will be symmetrical (or close enough that you can barely notice it's not). ERA won't be (can't be). But the square root of ERA should be close to symmetrical. From the few I looked at, however, ERA looks fairly symmetrical.

 
TangoTiger
(57181)
Comment rating: 4

I don't really want to get into a semantic debate about evidence and claims. All I'm asking is for you to provide something that you've provided in the past, like this:

 
TangoTiger
(57181)
Comment rating: 4

Colin: you provided a conclusion with no evidence. You know how I feel about that. I'd like to see you redo the work you did here: http://www.insidethebook.com/ee/index.php/site/comments/pecota_percentiles_finally/ You provide the evidence, the conclusion follows that evidence, and the reader can feel comfortable that your conclusion is valid.

 
TangoTiger
(57181)
Comment rating: 3

"We’ve tested the percentiles against historical data, and we can report that they behave how you’d expect—80 percent of batters fall between their 10th- and 90th-percentile forecasts for TAv, for instance." Can you show those test results?

 
TangoTiger
(57181)
Comment rating: 11

I've half-joked that if I ever roll out my own implementation of WAR, I'd call it W/W (Wins over Willie, for Bloomquist), since he embodies the ideals of the replacement-level player, someone who manages to find a job, usually with an under .500 team, rarely gets to be more than a platoon player, and can play any position on the field.

 
TangoTiger
(57181)
Comment rating: 0

I'd consider MLS to be of some force in Montreal. Their attendance base is 20K, and they've drawn 60K at Olympic Stadium for their opener. I'd consider them in the same ballpark as Alouettes. http://www.cbc.ca/sports/soccer/story/2012/03/28/sp-mls-soccer-montreal-impact-stadium.html

 
TangoTiger
(57181)
Comment rating: 1

Maury: since you include MLS, then Montreal has the Impact. As for NY/NJ, I'd favor Brooklyn. If NJ, then somewhere closer to Edison, as it's the transportation hub in central NJ. Obviously, Yanks/Mets will fight regardless.

 
TangoTiger
(57181)
Comment rating: 3

More data is almost always better than less data. There are two kinds of biases: random variation and systematic.

An example of random bias would be that someone mis-entered all the fielding data for one month, and sometimes marked Beltre as the 3B, and sometimes Zimmerman, and sometimes Longoria, etc. Just no rhyme or reason to it. That data is pure junk (i.e., noise). We end up with 5/6ths of the data being valid and 1/6th of the data being junk. Even knowing that we have that situation, the overall data is still good. It simply adds a level of uncertainty.

An example of a systematic bias is if data is always recorded differently for a fielder named Deter than for a fielder named Bryan. In this case, more data is WORSE, because the systematic bias will rule the results.

So, the question we have is how much systematic bias is there in the data, and can we somehow account for it? For example, Coors is a systematic bias if we look at Todd Helton's hitting stats. But if we can account for that, if we know that Todd Helton in fact hit 50% of the time at Coors, and we know how he did at Coors, then we can handle that.

The issue with the fielding data is that we're not entirely sure how much systematic bias there is. Does that mean that because we don't know, we should just throw all the data away? Well, you can argue that. You can also argue that you can still use the data and increase your uncertainty level.

Jul 18, 2012 2:08 PM on Getting Shifty Again
 
TangoTiger
(57181)
Comment rating: 0

Maury, can't be too hard, can it? Put teams in various market sizes, find World Series over the years that have similar markets involved. Dodgers/A's played in both 1974 and 1988. But Angels/Giants played in 2002. They probably match up well, market-size-wise. Phillies/Royals in 1980 is a smaller market than Phillies/Yanks in 2009. How much difference was there in the TV ratings? Cards/Rangers v Cards/Tigers v Cards/Royals v Cards/Twins v Cards/Redsox v Cards/Brewers. Did the ratings follow the market size? By how much? I think if you sit down and try different models, you can come up with something reasonable.

 
TangoTiger
(57181)
Comment rating: 1

You can adjust for the teams involved, if you insist. But at the very least, the first step is to present the data. Is it the dramatic falloff that we see with the All-Star game (I'd bet no). World Series games still appear in the "top 10" events of the week. The All-Star game used to. Does it still? So, before we start to talk about how we need to adjust (and we do), the first step is to present the unadjusted data. Then, let's worry about how much the adjustment can really impact things.

 
TangoTiger
(57181)
Comment rating: 0

Maury, the "control group" would be World Series. So, can you show the ratings of the World Series (say Game 1) for every year, and show the % of that audience that the All-Star game got?

 
TangoTiger
(57181)
Comment rating: 5

This is a good sabermetric piece that is simply hard to comment on. I would hope that Max and his supporters see that the lack of comments doesn't signify anything other than that.

 
TangoTiger
(57181)
Comment rating: 0

Adam is one of the best there is. I enjoyed reading this article.

 
TangoTiger
(57181)
Comment rating: 3

For the record: Marcel adds 1200 PA of regression only if you use the 5/4/3 scheme. If you use the 1/.8/.6 scheme, then you would add 1200/5 = 240 PA of regression. In the example in the article, a player with 650 PA each year for 3 years will regress 11% toward the mean, not 40%. Marcel shows exactly how much each player was regressed, if you download the Marcel files. There's a column called "r", that shows the percentage of his performance stats that were used for the forecast. When you see r=.89, that means that regression was 11% toward the mean.
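To illustrate the scaling point, that 1200 PA of regression under the 5/4/3 scheme is the same thing as 240 PA under 1/.8/.6, here is a toy sketch (illustrative only; the exact regression percentage for a real player also depends on his actual PA and Marcel's other adjustments):

```python
def regression_fraction(pa_by_year, weights, regression_pa):
    # Share of the forecast that comes from regression toward
    # the league mean, given per-year PA and scheme weights.
    weighted_pa = sum(w * pa for w, pa in zip(weights, pa_by_year))
    return regression_pa / (weighted_pa + regression_pa)

pa = [650, 650, 650]
# 5/4/3 with 1200 PA of regression ...
f1 = regression_fraction(pa, [5, 4, 3], 1200)
# ... is identical to 1/.8/.6 with 1200/5 = 240 PA.
f2 = regression_fraction(pa, [1, 0.8, 0.6], 240)
```

The two schemes give the same regression fraction for any inputs; the point is that the regression amount has to scale with the weights, which is where the "40%" error comes from.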

 
TangoTiger
(57181)
Comment rating: 8

Max is an outstanding saberist, in addition to being good people. He accepts constructive criticism in a very positive manner, and is always a pleasure to deal with. It's just a pleasure to see the next wave of saberists arrive, and Max is a great example of what you'd like to see. His work speaks for itself.

Jan 27, 2012 11:05 AM on Marking My Debut
 
TangoTiger
(57181)
Comment rating: 4

1. I agree that putting all the Fungoes articles together would be good. Even just a PDF would be nice.

2. I'm surprised by some people's expectations of comments. Good comments drive out bad comments. Fools are intimidated by an intelligent discussion, and BPro subscribers have proven that to be the case for the most part.

 
TangoTiger
(57181)
Comment rating: 1

Presumably, the launch angle is directly related to the incoming pitch angle. That is, how much you can square the ball. To make the best contact with a 12-6 curveball, for example, you'd launch at a higher angle. However, that presupposes that you will actually make contact like that. For a submarine pitcher, you'd want a flat, maybe even negative, launch angle. So, if the pitch angle is coming in at say +15 to -45 degrees, your launch angle should be say from 0 to +30 degrees. Something like that. Is that about right, Mike (the general idea anyway, not necessarily the numbers)?

 
TangoTiger
(57181)
Comment rating: 2

I object (strenuously!, a la Demi Moore) to Bill's contention that a team's bullpen is so taxed to the limit that it couldn't find 17 innings somewhere.

Let's first start with what could be the "limit" for a team's bullpen. Last year, the Braves led the NL in bullpen ERA AND were #2 in most IP (at 522). Fine, the rookie stars on the Braves are an exception, you say. The Nationals had almost as many IP (520), with an ERA of 3.20, 4th in the league. The Pirates led with the most IP (526), and their reliever ERA (3.76) was a bit worse than the league (3.59). If you take the top 4 in the NL in IP by relievers (average of 521 IP), the average ERA for those teams was 3.36. It seems to me that we can say that the "limit" to a team's bullpen is 520-530 innings, and that's really being conservative.

Since MGL is saying that we want to add 17 innings to the bullpen, any team that had fewer than 500 innings on its bullpen would be able to figure out how to get those 17 innings. Nine of the 16 teams had under 484 innings, and therefore would EASILY be able to add 17 innings to their bullpens. The Reds, Cubs, and Rox were all on the 500-inning bubble, so they'd have to stretch a bit to get to 17 innings, but even doing so, they'd still end up with fewer innings than the league-leading Pirates. Only 4 of the 16 teams were close enough to that 520-530 limit that they couldn't add another 17 innings.

I should also note that the league leader in IP in 2010 was the Nationals (at 546 innings) with a low ERA of 3.35. The Padres in 2009 led with 571 innings, and also a better-than-league-average ERA (3.75). Therefore, you can make the case that every NL bullpen in 2011 had plenty of room to add 17 innings.

So, what Bill is saying sounds nice, and seems reasonable, but when you look into it, you see that it's not a concern most of the time.

 
TangoTiger
(57181)
Comment rating: 12

I just love the outright candor of Steve's post, while accepting the legitimacy of the readers' opinions. Nothing to parse, and everything taken at face value. If only politicians were more like Steve has shown himself to be here.

Oct 04, 2011 1:54 PM on The O-Swing of Things
 
TangoTiger
(57181)
Comment rating: 0

It's the same metric, set to a different scale, if you want to get technical. *** For Fangraphs readers, it's akin to wRC+ on their site. All the metrics at Fangraphs and now at BPro courtesy of Colin, have as their basis Linear Weights (long live Pete Palmer). They differ basically in their park adjustments. You see that at the very top with Bautista at 181 at Fangraphs and 199 at BPro. Otherwise, the two lists are reasonably similar.

Sep 22, 2011 10:52 PM on Addition by Addition
 
TangoTiger
(57181)
Comment rating: 0

"Once you control for sequencing and defensive support via our new version of Fair Run Average" Actually, FRA does not control for sequencing. It *presumes* the sequencing (unlike, say, FIP, which ignores the sequencing). In that respect, it is much closer to actual runs allowed. If you think a pitcher has a men-on-base skill, then you should like FRA's idea. What FRA controls for is the balls in play, by specifically removing the hits and outs and replacing them with batted ball outcomes (GB, pops, flies) and estimated hits and outs. In any case, seeing the FRA numbers diverge that much among the big 3 doesn't leave me with a warm and fuzzy feeling that it's calculated properly.

 
TangoTiger
(57181)
Comment rating: 0

There's several articles linked here: http://www.insidethebook.com/ee/index.php/site/comments/value_of_a_called_ball/ I'm sure at least one will appeal to you.

 
TangoTiger
(57181)
Comment rating: 1

A crude way to think about the run value of a strike or ball is this way: The run value of a walk is around +.30 runs, and the run value of a strikeout is around -.27 runs. So, going from an 0-0 count to a 4-0 count means that each called ball is +.075 runs. Going from 0-0 to 0-3 means that each called strike is -.09 runs. That means that switching a called ball to a called strike is going from being at a +.075 run state to a -.09 run state, or around a .16 run swing. So, getting that one call every game for 150 games means .16 runs x 150 calls = 24 runs. This is just a quick crude way to try to frame the expectation.
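That crude arithmetic can be written out directly; all the run values below are the comment's own approximations:

```python
# Approximate run values from the comment.
walk_value = 0.30        # runs for a walk
strikeout_value = -0.27  # runs for a strikeout

run_per_ball = walk_value / 4          # ~ +0.075 per called ball
run_per_strike = strikeout_value / 3   # ~ -0.09 per called strike

# Flipping one called ball into a called strike:
swing = run_per_ball - run_per_strike  # ~0.165 runs per flipped call

# One flipped call per game over 150 games:
season_runs = swing * 150              # ~24-25 runs, the comment's ballpark
```

Again, this is just a framing exercise, not a count-by-count run expectancy model.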

 
TangoTiger
(57181)
Comment rating: 3

{clap clap clap} Great stuff!

Aug 04, 2011 4:54 AM on Why When You Go Matters
 
TangoTiger
(57181)
Comment rating: 0

I'm not sure why you are getting minused on your last 2 posts. SIERA purposefully limits itself to:

1. not using HR
2. not using prior years

Those are constraints it imposes upon itself. Marcel purposefully uses HR and prior years, as does PECOTA. FIP purposefully limits itself to BB, HB, SO, HR, and the current year. All these decisions are made because each metric is trying to answer a specific question. So don't expect any of them to change to be "better," because they have already decided that they want to be limited to some extent.

Jul 28, 2011 1:50 PM on Lost in the SIERA Madre
 
TangoTiger
(57181)
Comment rating: 9

I think Richard Bergstrom makes several fantastic posts.

Jul 25, 2011 5:57 PM on Lost in the SIERA Madre
 
TangoTiger
(57181)
Comment rating: 0

I only said that the findings of SIERA deserve further exploration. I didn't imply anything beyond that.

Jul 25, 2011 5:27 PM on Lost in the SIERA Madre
 
TangoTiger
(57181)
Comment rating: 3

My comment regarding nitpicky relates only to the ERA / RA9 testing. I agree that the presentation of SIERA obfuscates more than it enlightens. I also know that there are kernels of truth in there that would be much more powerful if it followed the model that Patriot is espousing. So, SIERA would benefit from further exploration and better presentation. It still deserves a place, and whether it's at Fangraphs or here, it doesn't really matter. I mean FIP deserves a place too, and it took until you came here to give it that place, which is a ridiculously long time to present such a simple stat.

Jul 25, 2011 4:03 PM on Lost in the SIERA Madre
 
TangoTiger
(57181)
Comment rating: 0

Colin said: "But by regressing against ERA, and including terms such as ground ball rate and starter IP percentage, SIERA perpetuates the other two failings of ERA that I note above."

1. The starter IP percentage acts as a proxy for the "Rule of 17" that I noted on my blog a year or two ago. That is, the more you pitch as a starter, the higher your BABIP, and the higher your HR/PA. Since SIERA purposefully ignores both BABIP and HR, including starter IP percentage is an excellent parameter to use. That's not to say it's used in the best way, but that it's used at all shows that Matt has a great insight here. That SIERA is the only metric that thought to include it is a huge feather in its cap. As an example, it would be ridiculously easy for me to change the "3.2" constant in FIP to something else, like "3.3" for starters, "2.9" for relievers, and a sliding scale for the rest.

2. I agree that the true test must be against RA9. Though, in reality, any test against RA9 will deliver virtually the same result as ERA (if you select a random group of pitchers... but if the only pitchers you select are all GB pitchers, then ERA will not be a good test). The distinction between ERA and RA9, with regards to what we are talking about here, is being very nitpicky. That's not a bad thing, but it's not a good thing either.
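To make that FIP aside concrete, here is a sketch of what a role-adjusted constant could look like, using the comment's illustrative 2.9 (pure reliever) and 3.3 (pure starter) endpoints with a linear slide between them. The function and the endpoint values are illustrative, not a fitted model:

```python
def fip_with_role_constant(hr, bb, hbp, so, ip, starter_ip_pct):
    """FIP with a role-dependent constant instead of a single 3.2.

    starter_ip_pct: fraction of the pitcher's innings thrown as a
    starter (0.0 = pure reliever, 1.0 = pure starter). The 2.9-3.3
    endpoints are the comment's hypothetical numbers.
    """
    core = (13 * hr + 3 * (bb + hbp) - 2 * so) / ip
    # Sliding constant: 2.9 for a pure reliever, up to 3.3
    # for a pure starter.
    constant = 2.9 + 0.4 * starter_ip_pct
    return core + constant
```

In practice the constant in FIP is set so that league FIP matches league ERA, so a real implementation would calibrate the endpoints against the data rather than hard-code them.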

Jul 25, 2011 1:44 PM on Lost in the SIERA Madre
 
TangoTiger
(57181)
Comment rating: 2

And in any case, it doesn't matter to WARP. Felix's RoS WARP is wrong. There's a bug in there.

 
TangoTiger
(57181)
Comment rating: 1

The runs scored at home relative to away in 2011 is 93%, just like in 2009. In 2010, it was 81%.

 
TangoTiger
(57181)
Comment rating: 1

Since I was mentioned twice (for each side!), here is the thread I have on my blog. My explanation as to how to do RoS is in post 9. And I agree in post 12 that you can't keep the weights constant. Dan chimes in in post 14, and if you look at that post and the link in post 10, it sounds like Dan implemented his weighting scheme (irrespective of actual PA) as a quick effort to get something rolling for this year. I'd expect Dan to improve upon it for next season.

***

In any case, of all the things where we have disagreement in the saber community, weighting of performance by timeline is not one of them. The weighting of daily performance will ALL follow something along the lines of:

weight = .9994^daysAgo for hitters
weight = .9990^daysAgo for pitchers

That is, the further back in time, the less weight. You can quibble about whether to use .9992 or something for hitters, etc., or that you want it to accelerate faster, like:

weight = .9998^(daysAgo^1.2)

but, basically, we're all dancing around that scheme.

***

I agree with Colin that all discussions should take place out in the open. It makes life easier, and 2000 heads are better than 2.
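Evaluated a full season back, those daily decay rates reduce to familiar annual weights: roughly 0.80 for hitters and 0.69 for pitchers. A one-liner check:

```python
# The comment's daily decay weights, evaluated 365 days back,
# i.e., the effective weight on last season's performance.
hitter_weight_1yr = 0.9994 ** 365   # ~0.80
pitcher_weight_1yr = 0.9990 ** 365  # ~0.69
```

So the daily scheme is just a finer-grained version of the familiar season-level 5/4/3-style weighting.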

 
TangoTiger
(57181)
Comment rating: 0

I had considered that you have a different park factor, but WARP takes care of that, right? And so, having a forecast of 3.2 WARP on 98 IP is a rate that is far higher than his best season. That was the point of my showing it in tandem with his WARP in 2009 and 2010. Can you show the top 10 in RoS WARP (and their IP), both entering 2011, and right now?

 
TangoTiger
(57181)
Comment rating: 2

Colin, I think you have a bug with the "rest of season" (RoS) forecasts. I remember a while ago, Felix's RoS was 2.30 ERA, which was quite bold, considering that: a) his mean forecast entering 2011 was around 2.60 b) his season performance to then was worse than 2.60 So, given more information, his RoS should have been somewhat worse than 2.60. Now, his RoS is even lower at 2.19: http://www.baseballprospectus.com/card/card.php?id=HERNANDEZ19860408A That is even bolder since his current performance is basically a match to his career totals, and so, you'd need to have his mean forecast be higher than 2.60. His RoS is a 3.2 WARP, on RoS 98IP. His WARP in his Cy season was 4.4 (250 IP) and 5.6 in 2009 (239 IP). *** Also, the percentile forecasts are showing this timestamp: Last Update: 3/26/2010 14:48 ET Note the year (2010, not 2011).

 
TangoTiger
(57181)
Comment rating: 0

I don't know! What I do know is that every time I've done these kinds of studies, I worry if my conclusions seem too far-reaching. Here's a good example of what I'm talking about. (Do you get notified when we post?)

Jun 24, 2011 6:47 PM on Checks and Balances
 
TangoTiger
(57181)
Comment rating: 0

Any time you have a matched pair, you imply survivorship. That by itself implies that the first in the pair was produced by someone with more good luck than bad luck. That said, the pitcher also survives, so he also had more good luck than bad luck. Do those two things cancel out? I've done similar work that says: not. The pitcher benefits from more good luck than bad luck than the hitter, in terms of being allowed to survive into the next year. I mean, it's a pretty tiny effect, but we're looking at tiny effects to begin with. Just something to be very careful about.

Jun 23, 2011 11:06 AM on Checks and Balances
 
TangoTiger
(57181)
Comment rating: 0

Here you go: Elapsed Days

Jun 16, 2011 8:10 PM on Another look at ZiPS
 
TangoTiger
(57181)
Comment rating: 0

Perfect. Exactly the process I follow.

 
TangoTiger
(57181)
Comment rating: -1

daysAgo is human days, because that's how a player ages, based on real-time.

Jun 10, 2011 4:21 AM on Another look at ZiPS
 
TangoTiger
(57181)
Comment rating: 0

I've already provided that in the past. The weights are as follows:

.9994^daysAgo for hitting
.9990^daysAgo for pitching

Pretty simple. You can try to fine-tune it with better empirical testing, and by changing the weighting by component. But this is not such a big unknown problem. I've published the above several times over the past several years.

Jun 09, 2011 1:29 PM on Another look at ZiPS
 
TangoTiger
(57181)
Comment rating: 1

(Cross-posted from my blog.) This is a pretty simple issue as to what to do. Suppose you have the weights for the past seasons as 5, 4, 3, and a regression-toward-the-mean weight of 2. You multiply each weight by the number of PA (with the regression part being 600 PA). That gives you the effective weight for your pre-season forecast. Your in-season forecast would get a weight of 6 or 7. That's it. That's pretty much all you have to do (with the understanding that weighting by component is better).

So, let's take Bautista. His PAs:

2011: 249 x 7
2010: 683 x 5
2009: 404 x 4
2008: 424 x 3
Regr: 600 x 2

So, before 2011, you add up those PA (meaning 2010, 2009, 2008, regression), and you get a PA weight of 7503. If you include 2011, your total weight is 9246. Therefore, the pre-season forecast will have a weight of 7503/9246 = 81%. Guy is saying that ZiPS is using 78% / 22%? Doesn't seem outlandish considering I'd use 81% / 19%.

By using 78% / 22%, if you start with the 2010-and-earlier weight (7503), that implies a total weight of 7503/.78 = 9619. That means 2011 is getting a weight of 9619 - 7503 = 2116. Given that he has 249 PA, that means he's counting 2011 as 2116 / 249 = 8.5. It is an overweight, though I'm not sure I'd count it as a severe one. That would make the 2010 weight 60% of the 2011 weight. That's fairly defensible. I use 80%, but I think the best testing by Brian Cartwright has shown using 70%.
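The Bautista arithmetic above, written out (the numbers are exactly the comment's):

```python
# Reproducing the Bautista weighting arithmetic.
weights = {2011: 7, 2010: 5, 2009: 4, 2008: 3}
pa = {2011: 249, 2010: 683, 2009: 404, 2008: 424}
regression_weight = 600 * 2  # 600 PA at a weight of 2

# Pre-season weight: 2008-2010 plus regression.
prior = sum(weights[y] * pa[y] for y in (2008, 2009, 2010)) + regression_weight
# Total weight once 2011 is included.
total = prior + weights[2011] * pa[2011]
preseason_share = prior / total  # ~0.81, i.e., the 81% / 19% split

# Back out the 2011 weight implied by ZiPS's 78% / 22% split:
implied_total = prior / 0.78
implied_2011_weight = (implied_total - prior) / pa[2011]  # ~8.5
```

Note that as 2011 PA accumulate, `preseason_share` falls on its own, which is why the pre-season/in-season split can't be a fixed percentage.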

Jun 09, 2011 10:37 AM on Another look at ZiPS
 
TangoTiger
(57181)
Comment rating: 0

The implication is that you'd have to apply the rule to all 4-pitch walks (and likely no-strike hit batters).

Jun 03, 2011 11:51 AM on Walk of Life
 
TangoTiger
(57181)
Comment rating: 0

The questions were all posted in the original thread. I answered, as best I can tell, all of them. No, this won't be a recurring feature. But, I do pretty much this on my blog, so you can hang out there if you like.

 
TangoTiger
(57181)
Comment rating: 0

The exchange is happening mostly on Bill's site (sub required). But I've also posted my research here: http://www.insidethebook.com/ee/index.php/site/comments/k_minus_bb_differential_or_ratio/ It's a fun read FWIW.

 
TangoTiger
(57181)
Comment rating: 1

The big thanks should go to the BPro readers, who came out with a great set of questions. I just filled in the holes.

 
TangoTiger
(57181)
Comment rating: 0

You should see some of the discussion here: http://www.insidethebook.com/ee/index.php/site/comments/batted_ball_puzzler/ Note that you may fall into the same trap as I did. If you limit yourself to the "A" dataset's 4 batted ball parameters, and ignore the other 3 performance numbers, specifically BACON_PRIOR, you are conferring an advantage to the "B" dataset. That's because the B dataset's 4 batted ball parameters are actually the other 3 performance numbers, but translated into 4 batted ball parameters. Therefore, you cannot focus on the labels, and presume that LD in A has any relationship to LD in B. And that means you can't discard the 3 performance numbers in the A dataset.

May 26, 2011 10:42 AM on A Batted Ball Puzzler
 
TangoTiger
(57181)
Comment rating: -1

This was supposed to be an impromptu poll, and you could only vote up (not down). I guess I should have made that clearer.

 
TangoTiger
(57181)
Comment rating: -1

This was supposed to be an impromptu poll, and you could only vote up (not down). I guess I should have made that clearer.

 
TangoTiger
(57181)
Comment rating: -5

+1 if you believe this statement "Inertia stops progress!"

 
TangoTiger
(57181)
Comment rating: -5

+1 if you believe this statement "Inertia is great!"

 
TangoTiger
(57181)
Comment rating: 0

That's about right. Say X = 5MM$; then the ace pitcher gets 4 or 5X. The WAR for your 4 pitchers is 1, 2, 3, 5. There's nothing inconsistent between what you just said and what I just said, if you just dial back your "5 or 6" to "4 or 5".

 
TangoTiger
(57181)
Comment rating: 0

There's no roster spot issue. I said that, given a choice between a pair of players, a team will pay each pair the same if the sum of each pair is the same WAR. So +5 +1 = +3 +3 in terms of wins and salary.

 
TangoTiger
(57181)
Comment rating: 0

That doesn't happen. A team will pay as much for a 1 WAR and 5 WAR player as it would for two 3 WAR players.

 
TangoTiger
(57181)
Comment rating: 0

Nice stuff Vince. It looks like one of the things Boras does is get an extra year that other agents couldn't. But the way you do it, you are giving the other agent 0$ for that extra year. I'm sure if the other agent asked for some low number, but greater than 0, then he'd get that. Take Zito. I agree that 16 x 6 is what the average agent would have gotten, and so 18 x 7 is impressive. But if the other agent wanted 7 years, he could have asked for an extra say 5.5MM$, and it would have happened. So, 14.5 x 7 is what he could have gotten. To make the comparisons fair, you need to hold constant the number of years. *** Would it be possible to have access to your dataset used in this article? tom~tangotiger~net

 
TangoTiger
(57181)
Comment rating: 4

Great job!

Apr 08, 2011 12:21 PM on The Rookie Effect
 
TangoTiger
(57181)
Comment rating: 3

Beautiful! {clap clap clap}

 
TangoTiger
(57181)
Comment rating: 0

The Leverage Index of that situation is 0.4. No reliever has an LI as low as 0.4 for the season, meaning that that IS the situation in which you would want to use your mopup reliever. And, it's hardly "giving up the game". How is putting in a reliever that gives up 5 runs per 9IP a huge drop compared to a reliever that gives up 4 runs per 9IP? It's a drop, but it's not giving up the game.

 
TangoTiger
(57181)
Comment rating: 0

I posted a reply to Greg's paper on my blog, as well as posting his paper there.

 
TangoTiger
(57181)
Comment rating: 1

I'm a proponent of tandem pitchers. 96 runs would seem to be a huge exaggeration. I'd be happy to review Greg's research if he wants to email it to me: tom~tangotiger~net

 
TangoTiger
(57181)
Comment rating: 0

It's your internal ID number.

Mar 04, 2011 11:15 AM on Fourth Time's the Harm
 
TangoTiger
(57181)
Comment rating: 0

David, given the way you said this ("Crisp, who has a background in the sweet science, told his side of the story") and the fact that everything was in the first person, I presumed that this was an op-ed kind of piece, that Crisp was asked or offered to write up a first-person account. If instead this is you transcribing or interpreting what Crisp said, at the prompting of various questions of yours, you should make that much clearer. Nonetheless, I quite enjoyed the piece.

Feb 25, 2011 10:46 AM on Coco Crisp
 
TangoTiger
(57181)
Comment rating: 0

Nice work Jeremy. I posted a reply on my blog. Thanks for giving me more to think about.

 
TangoTiger
(57181)
Comment rating: 0

I agree, it would definitely have been a bug. When I do my valuations, I don't do a RP/SP split, and simply let the forecast and league structure dictate the value. And that value is what I said: the top relievers would come in at 15-20$. Now, if the league dictated that each team select at most 2 relievers, then I don't see how you will get the top relievers at 17-20$. Maybe if you have 12 teams in an AL-only league? I'd have to work it out. The fantasy product is blocked at the office (am I the only one? maybe something other than "fantasy" can be in the URL?), so maybe you can try this for me. Re-run forcing 1 RP, 2 RP, 3 RP, 4 RP, and then report back how much the top 6 relievers get under each setting. Just guessing, but I'm thinking the numbers should be something like 7$, 10$, 13$, 16$ respectively. Something like that. The values certainly cannot stay fixed across settings. Or, if you want to make it even clearer, do it with 1 SP, 2 SP, 3 SP, 4 SP. You should see a similar situation, where the numbers might be 10$, 14$, 18$, 22$ or something.

 
TangoTiger
(57181)
Comment rating: 0

Eric: are you sure this is how it works? I can understand for individual insurance policies, that's how it works. But, teams purchase group policies. My understanding, at least on the NHL side, is that teams will pay a premium based on their payroll, and the teams get to decide how to spread the coverage. For example, a team pays 10MM$ in premiums for 100MM$ in coverage, and the team gets to choose how to distribute that coverage, be it on 5 players or 15 players. (All numbers for illustration purposes only.) Or, are you suggesting that teams have the option to purchase either/or?

Feb 22, 2011 9:04 AM on Paying the Premium
 
TangoTiger
(57181)
Comment rating: 0

A "sample of 20 goals"? He didn't say that. He looked at ALL the goals for an entire season across the league, meaning he looked at over 6000 goals. The "15 out of 20" is the average per team.

Feb 22, 2011 4:17 AM on Scorecasting Review
 
TangoTiger
(57181)
Comment rating: 0

The very best closers should be worth 15-20$. I just ran the default PFM and Nathan was at 18$. Can you elaborate on what kind of league setting gets you a top closer at $3? Also, where exactly did Ben assert this, as clearly, it's got to have a very specific condition attached to it.

 
TangoTiger
(57181)
Comment rating: 2

If you sort the batters by descending WARP, and select the players with enough PA to get you to 180,000, you will get about 570 wins, which is just about correct.

 
TangoTiger
(57181)
Comment rating: 0

Presumably, Age 41 is where Bonds either was hurt or had the huge dropoff from his preceding 4 other-worldly years. I think if you were to draw a smoother line from his age 31 comps to his age 41 comps, you might see something more reasonable. Flat for 10 years simply tells me that you've got some unrepresentative player or players that skew the results. Basically, too much weight is being placed here. Age 31, 32, even 33 seems believable. Age 41 seems believable. Age 34-40 seems too optimistic.

Feb 18, 2011 9:43 AM on Projecting Pujols
 
TangoTiger
(57181)
Comment rating: 1

Colin: From age 32 through age 40, his OBP+SLG numbers are extremely flat. I also presume that of the 40 comps, Bonds makes up 3 of them (age 29, 30, 31), or 8% of the comp? What does it look like if you remove Bonds from the comp group?

Feb 18, 2011 8:29 AM on Projecting Pujols
 
TangoTiger
(57181)
Comment rating: 0

Fangraphs has just over 1000 wins above replacement, while Rally has it just under 900. That would set the win% level at close to .290 for Fangraphs and .320 for Rally (Baseball-Reference). With 724 wins, BPro is coming in at .350. Past articles by Matt and others made me think that BPro was set at .250.

 
TangoTiger
(57181)
Comment rating: 3

Love the Ichiro chart. Unique presentation and very descriptive too.

 
TangoTiger
(57181)
Comment rating: 1

I would actually like to hear from umpires. Specifically, will an umpire position himself more toward where the catcher sets up? If so, then he's got the "perfect" angle if a pitch comes that way, and he'll be offset at an angle if the pitch goes away from the catcher's target. Ideally, the umpire is positioned directly behind home plate in line with the pitcher's mound. But, since the umpire may be moving left/right pitch to pitch, we've got a potential bias in play.

Feb 16, 2011 6:25 AM on The Real Strike Zone
 
TangoTiger
(57181)
Comment rating: 11

{clap clap clap} Fantastic research. If commenting is sparse today, it's only because the readers are stunned.

Feb 16, 2011 4:58 AM on The Real Strike Zone
 
TangoTiger
(57181)
Comment rating: 0

I remember seeing Robin "Batman" Ventura get a lot of play during his college streak.

Feb 14, 2011 6:39 AM on Baseball's Superheroes
 
TangoTiger
(57181)
Comment rating: 0

This post seemed reasonable enough to me. I already see two minuses, and then I saw the poster's name. I think the "minusing" system is a terrible system, acting as both "disagree" and as "disagreeable". Change the +/- to:

+ Worthwhile reading
- Worthless reading (pollution)

In neither case do you need to agree or disagree with the point being made.

Feb 13, 2011 1:00 PM on Slash Lines
 
TangoTiger
(57181)
Comment rating: 0

I agree that minuses are reserved not only for tone, but if the reader simply disagrees with the comment. Ideally, enough people will disagree with this assessment by giving me a minus, and thereby proving my point!

Feb 12, 2011 6:03 PM on Slash Lines
 
TangoTiger
(57181)
Comment rating: 0

If that's the case, then I don't understand the point of Jay saying: "It's probably worth pointing out that whether you liked this piece or not, none of you paid money to read it - it was out in front of the paywall, free to anyone who stopped by." I don't see why it was worth pointing out at all other than to tell us to have a different standard.

Feb 12, 2011 6:00 PM on Slash Lines
 
TangoTiger
(57181)
Comment rating: 0

Just to be clear, I'm not speaking for myself. I personally have no opinion on the article. There are plenty of people that get here from their Google Reader. They won't see any indication that it's free or not.

Feb 12, 2011 5:58 PM on Slash Lines
 
TangoTiger
(57181)
Comment rating: 5

Actually, I thought that was an intentional dramatic pause. It read much better that way. Giving us your director's cut is similar to getting the alien to shoot at Han Solo first. Ya killed it Craig!

Feb 12, 2011 7:20 AM on Slash Lines
 
TangoTiger
(57181)
Comment rating: -1

Jay: I don't always notice on the home page if there's a logo. And even if I do, once I click on the article, I won't remember if there's a logo there or not. If a reader's rights are somehow different for an article that is premium or free, then make it more clear on the article itself. Why not a logo on the article itself, rather than the page prior? Why not at the end of the article, saying "this page available for all and not part of your subscription". Unless you do that, then BPro should not expect a subscriber to respond differently in the comments section based on whether the article was free or not.

Feb 12, 2011 6:22 AM on Slash Lines
 
TangoTiger
(57181)
Comment rating: 1

Jay, I'm not sure how a subscriber is supposed to know if an article is behind the pay wall or not. I'm also not sure that because the article is available to everyone means that the subscriber has less "rights" in terms of voicing their opinion. I recognize the general sentiment of those offended that this article was an enormous change of pace, and completely unexpected in terms of, well, everything! A South Park / Jackass type of alert at the beginning would have certainly gone a long way. Then again, who knew?! Even the Jackass notice likely didn't occur on the very first episode.

Feb 11, 2011 4:58 PM on Slash Lines
 
TangoTiger
(57181)
Comment rating: 5

Comment of the year as far as I'm concerned!

Feb 11, 2011 12:35 PM on Slash Lines
 
TangoTiger
(57181)
Comment rating: 2

Actually, you DID AND DO have a "NSFW" tag on the home page in the teaser. I think if that had carried over to the main article, that would have helped (somewhat, though presumably someone will always say it's not enough). Then again, many readers might not even know what NSFW means.

Feb 11, 2011 12:20 PM on Slash Lines
 
TangoTiger
(57181)
Comment rating: 4

I agree with the general sentiment from the readers about how uncertain some of the metrics are for the purposes here. What you can do instead is ask: what are the chances that this player really is above average in this category? So, a player at -1 in baserunning runs might be at 40%, and someone at +1 in baserunning might be at 60%. And you do that for every metric, and then multiply each of the 5. So, Pujols might be 100% for average, and 100% for power, and 55% for baserunning and 55% for functional arm, and you multiply them all and say: he's 25% chance of being above average across all 5 tools. Do this for everyone, and see what you get.
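The multiplication described above can be sketched in a few lines. All of the per-tool probabilities below are made-up illustrations, not real estimates for any player:

```python
# A minimal sketch of the idea above: for each tool, estimate the chance
# the player's true talent is above average, then multiply the chances to
# get the chance he is above average across all tools. The probabilities
# here are illustrative assumptions, not real Pujols numbers.

def prob_all_above_average(tool_probs):
    """Multiply per-tool P(above average) to get P(above average in all)."""
    p = 1.0
    for prob in tool_probs:
        p *= prob
    return p

# e.g. 100% hit, 100% power, 55% run, 55% arm, 85% field (made-up numbers)
pujols_like = [1.00, 1.00, 0.55, 0.55, 0.85]
print(round(prob_all_above_average(pujols_like), 3))  # roughly the ~25% cited above
```

The same function works for any number of tools, so it extends naturally to the full five-tool profile.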

Feb 10, 2011 3:19 PM on Tooling Around
 
TangoTiger
(57181)
Comment rating: 0

I like what the other reader said about "functional speed" as opposed to pure speed. Since Ben includes footwork and accuracy, we're talking about "functional arm" as opposed to "arm strength". By that definition, David Eckstein at his peak could have been considered to have a plus "functional arm". If readers don't like the answer, then they should change the definition! *** Setting that aside, polls certainly are subject to biases. And there are three, all of which can unfairly push Pujols up:

1. Halo effect
2. Past performance / memory (i.e., denial)
3. Positional

It's certainly possible that Pujols' score of 57 in the arm strength category, where 50 is MLB average (neutral position), is biased, and that it should maybe be a 40 or 45. Even so, he gets top marks in the other categories, so his "functional arm", as Ben defines it, would still seem to put him at least average. But, like I said, sometimes it's hard to judge Pujols' arm accuracy if he's only making 30 foot throws. Is Phil Simms a more accurate thrower than Warren Moon? Maybe on short passes? Put Jay Bruce or Evan Longoria at 1B, and who knows, Pujols might out-accuracy them.

Feb 10, 2011 12:55 PM on Tooling Around
 
TangoTiger
(57181)
Comment rating: 1

Ben: Good stuff. One option you can consider is to use standard deviations in the 5 categories, and simply multiply them all (setting negatives to 0). This way, a guy who is barely below in one category (Tulo gets 0) and barely above in another category, will both come out low. You can also try "standard deviations above replacement", and add 1 to each SD before multiplying them (and setting the rest to 0). Say for example Tulo is +2SD above the mean in 4 categories, and is -0.1 SD in the 5th. You give him 3,3,3,3,0.9. Multiply them all and you get 72.9. (If you want, put that to the power of one-fifth, and subtract 1 to get you to +1.36 "average" SD.) Someone who is +1 SD in each of the 5 categories will come out to an average of +1.00 average SD. Something like that...
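The scheme in this comment can be sketched as follows; the Tulo-like numbers are the illustrative ones from the comment itself:

```python
# A sketch of the "standard deviations above replacement" scheme described
# above: add 1 to each tool's SD (floor the factor at 0), multiply, then
# take the fifth root and subtract 1 to get an "average SD" per tool.

def average_sd(sds):
    factors = [max(0.0, 1.0 + sd) for sd in sds]
    product = 1.0
    for f in factors:
        product *= f
    return product ** (1.0 / len(sds)) - 1.0

tulo_like = [2.0, 2.0, 2.0, 2.0, -0.1]   # factors 3, 3, 3, 3, 0.9 -> 72.9
print(round(average_sd(tulo_like), 2))    # about +1.36 "average" SD
```

As a sanity check, a player at +1 SD in all five categories comes out to exactly +1.00 average SD, matching the comment.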

Feb 10, 2011 7:37 AM on Tooling Around
 
TangoTiger
(57181)
Comment rating: 1

I remember that game! The Expos were leading 15-2... and Jeff Reardon ended up with the save.

 
TangoTiger
(57181)
Comment rating: 2

My response is on my blog.

Feb 08, 2011 11:49 AM on They're Here!
 
TangoTiger
(57181)
Comment rating: 0

SIERA excludes HR. So anyone with a HR skill will under/over. Brett Myers for one. FIP excludes batted balls. So anyone with a batted ball distribution skill will over/under. Felix maybe is one. Basically, whatever parameter is being ignored is a candidate for being over/under. These metrics purposefully ignore parameters because they want to, not because they necessarily think there's no skill there.

 
TangoTiger
(57181)
Comment rating: 0

I started a thread on my blog.

 
TangoTiger
(57181)
Comment rating: 0

Way ahead of you. Check my blog for FutureFIP.

 
TangoTiger
(57181)
Comment rating: 0

If there was a common thread, it would be identified and included as a parameter in the estimator.

 
TangoTiger
(57181)
Comment rating: 0

Beautiful! So, 7.5 LHP to 50 RHP, or 13%. And since Ben said the general population is 10% - 15% LH, it looks like we've got a reasonable model.

Feb 03, 2011 12:26 PM on A Little Bit Softer Now
 
TangoTiger
(57181)
Comment rating: 0

I would hope it's by pitcher. One "trick" I use is to use a pitcher's 25% fastest pitches and treat that as his "top speed". Other than a couple of knucklers who we would flag anyway, it works pretty well. Try it out Joe, and let us know what you get.
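The "top speed" trick can be sketched like this (the pitch speeds are made-up for illustration):

```python
# A minimal sketch of the trick above: take each pitcher's fastest 25% of
# pitches and average them to estimate his "top speed".

def top_speed(speeds, fraction=0.25):
    """Mean of the fastest `fraction` of a pitcher's pitches."""
    n = max(1, int(len(speeds) * fraction))
    fastest = sorted(speeds, reverse=True)[:n]
    return sum(fastest) / n

pitches = [88, 90, 91, 92, 93, 94, 95, 96]  # mph, illustrative
print(top_speed(pitches))  # mean of the fastest two (95 and 96) -> 95.5
```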

Feb 03, 2011 12:12 PM on A Little Bit Softer Now
 
TangoTiger
(57181)
Comment rating: 0

The error definitely can't be symmetrical. But, that's a vagary of using runs per out. If you instead used the square root, you'd get something closer to symmetrical. That is, if we treat runs as a multiplication of OBP and SLG (for illustration purposes), and if each of those has a symmetrical error, then multiplying the two won't give you a symmetrical error.
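A small deterministic illustration of the point: treat runs as OBP x SLG, give each component a symmetric error, and compare the up and down swings of the product versus its square root. All numbers are invented for illustration:

```python
# Treat runs as OBP x SLG (per the comment), give each a symmetric +/- .04
# error, and compare the swings. The product's up swing exceeds its down
# swing (skewed), while the square root's swings are nearly equal.

obp_lo, obp_mid, obp_hi = 0.30, 0.34, 0.38   # symmetric +/- 0.04
slg_lo, slg_mid, slg_hi = 0.39, 0.43, 0.47   # symmetric +/- 0.04

prod = (obp_lo * slg_lo, obp_mid * slg_mid, obp_hi * slg_hi)
up, down = prod[2] - prod[1], prod[1] - prod[0]
print(round(up, 4), round(down, 4))          # up swing > down swing: skewed

root = tuple(p ** 0.5 for p in prod)
up_r, down_r = root[2] - root[1], root[1] - root[0]
print(round(up_r, 4), round(down_r, 4))      # nearly equal: roughly symmetric
```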

 
TangoTiger
(57181)
Comment rating: 0

I wouldn't use the word within, as I think it would imply +/-. In your case, you are saying that the *range* is 2.1 runs (i.e., +/- 1.05). But, yeah, ERA is notoriously difficult to estimate because of BABIP and sequencing.

 
TangoTiger
(57181)
Comment rating: 0

Mike: excellent stuff. Since the RH pitchers were selected for things other than their fastball speed, but the top 10 LH pitchers were selected solely for their fastball speed, it's not a surprise that those guys would post a higher fastball speed. In order to counteract this bias, Mike went to the top 23 LH pitchers to match on fastball speed. 23/(23+95) = 19% (rather than the 10% or so of the general population that is LH) Another way to say it is if Mike limited his pool to fastest 50 RH pitchers, then we might get a closer match to say the 5 or 6 fastest LH pitchers.

Feb 03, 2011 10:42 AM on A Little Bit Softer Now
 
TangoTiger
(57181)
Comment rating: 0

Good stuff Ben. *** Can you give us the following breakdown:

- Start with a count of the number of RH starting pitchers (say that's 100).
- Take the fastest 10 LH starting pitchers (or one-tenth the RH count).

What is the fastball speed of these two groups? Will it be similar? Basically, we want to keep the proportion of pitchers equivalent to the proportion in the real world. The extra LH pitchers are all those who are the softer tossers and are bringing down the overall average. *** It's not exactly this simple, as fastball speed is not the sole determinant. In any case, if you show a breakdown like this, you might get something that is more illustrative.

Feb 03, 2011 8:15 AM on A Little Bit Softer Now
 
TangoTiger
(57181)
Comment rating: 0

Agreed.

 
TangoTiger
(57181)
Comment rating: 0

It's not clear at all "short term". If Livan has say a 2.50 FIP over the first two months, but a 5.50 SIERA over those same two months, and the question you are asking is "How will he do over the next 4 months?", well, my answer is "Use his entire career." You are suggesting that if you intentionally limit yourself to only using two months of short-term data, and discarding the rest of his past data, then SIERA will do better. Well, given that the batted ball distributions stabilize faster than HR rates, you are correct. But, there is no reason to limit yourself to only looking at his first two months of data. What we have with Livan is a history, and you use that history. And this is exactly what you have shown: that if you look at all pitchers with a minimum of 400 IP, then FIP does a bit better than SIERA. That is, knowing his HR allowed (that's what's in FIP but not in SIERA) is better than knowing his batted ball distribution (that's what's in SIERA but not in FIP). So, if you want to argue that for guys with less than 200 career IP you prefer SIERA, then fine. *** Two more points:

1. I'll keep repeating this, but as long as you compare SIERA to park-adjusted future ERA, and you compare FIP to park-adjusted future ERA, you are biasing the results against FIP. You should no longer perform that test, ever. If Ubaldo has a high FIP one year because of HR, he'll have a high FIP the next year because of HR, and you can't compare it to park-adjusted ERA (which presumes a flatter HR rate).

2. FIP is not meant to be predictive! FIP merely represents current performance. In no way should one even think that you would regress K rates the same as HR rates. If I wanted a "predictive FIP", I would probably do something like (5*HR + 2*BB - 2*SO)/PA + constant, or something. I think anyone here can find a stat that predicts future RA9 better than FIP and better than SIERA by focusing only on HR, NIBB+HBP, SO. There's my next challenge to the community.
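The hypothetical "predictive FIP" kernel mentioned in point 2 can be sketched as below. The constant is an assumed placeholder to land near a league RA9 scale, not a fitted value:

```python
# A sketch of the hypothetical kernel suggested above:
# (5*HR + 2*BB - 2*SO) / PA + constant. The constant of 5.0 is an assumed
# placeholder, not a fitted value; the weights are the ones from the comment.

def predictive_fip(hr, bb, so, pa, constant=5.0):
    return (5 * hr + 2 * bb - 2 * so) / pa + constant

# Illustrative pitcher season: 20 HR, 50 BB, 180 SO over 800 PA
print(round(predictive_fip(20, 50, 180, 800), 2))  # (100+100-360)/800 + 5.0 = 4.8
```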

 
TangoTiger
(57181)
Comment rating: 0

Right, all legitimate questions. The test is the following: given all known information for each pitcher (his career past performance, his recent past performance, his batted ball distribution, his performance with men on base, the fielding talent of his past fielders, his parks, his past teams, his 2011 team, his 2011 fielders, etc), what will be his RA9 in 2011? Now, FIP is saying: "I don't care about anything, other than his BB, K, HR, HBP numbers. I'll make my estimate based solely on that." PECOTA, SIERA, et al would say: "My god, I definitely need all that past information. It's critical that I know all that. I'll make my estimate based solely on that." And when 2011 comes to a close, what's going to happen? I think you can make a decent case that all that extra effort may bring you very little, and perhaps will even be a negative (i.e., over adjusted). So, that's the real test. Until then, we're dancing around the entire issue with these various other tests, because they are all going to be biased to some extent toward one metric or another based on however you setup the various other tests.

 
TangoTiger
(57181)
Comment rating: 0

So given 400 IP or more, FIP wins? Even though you are comparing against park-adjusted ERA instead of actual ERA? This seems like a big deal, a huge deal, no? This is saying that the batted ball data is worse than just knowing the number of HR allowed. Am I misinterpreting?

 
TangoTiger
(57181)
Comment rating: 0

Good stuff, I didn't note the distinction in that table.

Jan 31, 2011 12:45 PM on Testing SIERA
 
TangoTiger
(57181)
Comment rating: 2

Ah, RA9. Perfect name. I keep using ERA and RA as pairs (each denoting a rate stat), and it's never sat well with me, since the term RA is also used as a counting stat. RA9. I like it. Now we just need all the saber-stat sites to use it.

Jan 27, 2011 12:43 PM on Testing SIERA
 
TangoTiger
(57181)
Comment rating: 2

Can you also post your dataset used to create all these charts, so the rest of us can do our own testing?

Jan 27, 2011 8:43 AM on Testing SIERA
 
TangoTiger
(57181)
Comment rating: 8

Also, can you test with unadjusted ERA, not the park-adjusted? After all, FIP itself is unadjusted, and you can't take Ubaldo's unadjusted FIP in 2009 and compare it to his park-adjusted ERA in 2010. This is clearly unfair to FIP. Alternatively, park-adjust FIP in 2009 to test to park-adjusted ERA in 2010. SIERA positions itself as being park-neutral, so comparing park-neutral SIERA in 2009 will obviously give it an advantage to park-neutral 2010 ERA. Finally: we really should care about RA, not ERA. The UER is a biased construct, as FB pitchers like Santana will give up far fewer UER than GB pitchers like Brandon Webb.

Jan 27, 2011 8:42 AM on Testing SIERA
 
TangoTiger
(57181)
Comment rating: 1

Good stuff Matt. Can you include Batted Ball FIP in your testing?

Jan 27, 2011 8:36 AM on Testing SIERA
 
TangoTiger
(57181)
Comment rating: 0

Is there a reason that BPro members can't email each other directly? (We can set up our profile to allow us to be reachable or not.) Anyway, BurrRutledge: email me at tom~tangotiger~net, and replace the ~ as appropriate.

 
TangoTiger
(57181)
Comment rating: 1

Studes' solution here is one of the best ones actually. He should link to it...

 
TangoTiger
(57181)
Comment rating: 5

Jay, for the next 364 days, you can do no wrong in my book.

 
TangoTiger
(57181)
Comment rating: 0

David does some of the best interviews, and I always look for his articles.

Dec 31, 2010 12:30 PM on Best of Q&A 2010
 
TangoTiger
(57181)
Comment rating: -1

Just looking at his card, he had a two-year jump that is equivalent to the top dogs. Can you do that kind of list too?

Dec 30, 2010 9:32 AM on WARP Speed
 
TangoTiger
(57181)
Comment rating: -1

Eric: I was thinking Roy Halladay. How did he rank in the one-year category?

Dec 30, 2010 9:30 AM on WARP Speed
 
TangoTiger
(57181)
Comment rating: 0

Presuming 65 single-year relievers, who averaged 0.8 WARP and 2.2MM$ salary above the minimum, that's 52 wins and 143MM$ above minimum. Add to that the 38 wins from the multi-year relievers and their 215MM$ above minimum, and you get to 90 wins and 358MM$ above minimum. That's an average of 4.0MM$ per win. Just last week, I posted that I estimated the average marginal $ per marginal WARP win at 4MM$ per win. From this standpoint, the amount of dollars allocated to relievers is just right, but there is an obvious inefficiency between single-year and multi-year. And this is true for all positions, not just relievers.
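The arithmetic in this comment, redone step by step (the inputs are the figures quoted above):

```python
# Re-running the arithmetic from the comment above. The single-year counts
# (65 relievers at 0.8 WARP and $2.2MM above minimum apiece) and the
# multi-year totals (38 wins, $215MM) are the figures quoted in the comment.

single_wins = 65 * 0.8                # 52 wins
single_dollars = 65 * 2.2             # $143MM above minimum
total_wins = single_wins + 38         # 90 wins
total_dollars = single_dollars + 215  # $358MM above minimum
print(round(total_dollars / total_wins, 1))  # about $4.0MM per marginal win
```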

 
TangoTiger
(57181)
Comment rating: 0

Eric, good job! Can you tell me the total number of 1-yr relievers, and the total number of dollars paid and total number of WARP in Y1?

 
TangoTiger
(57181)
Comment rating: 0

I think Kaiser's point is one that permeates everywhere: if you see a correlation of r=.10, then, that's it. It's r=.10. But, it's only r=.10 when BIP=400 (or whatever Matt used). Indeed, the r numbers are useless unless you also know the number of opps. If r=.10 when BIP=400, then r=.50 when BIP=3600. Unless you have a systematic bias, you can get r to approach 1 on almost anything, as the number of trials approaches infinity.
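The r-versus-opportunities relationship behind this point is commonly modeled as r = n / (n + k); solving for k from one (r, n) pair lets you project r at any other sample size. A minimal sketch using the comment's figures:

```python
# Under the usual regression-to-the-mean model, reliability r = n / (n + k)
# for some constant k. From r=.10 at 400 BIP (the comment's figures),
# k = 3600, which projects r=.50 at 3600 BIP.

def k_from(r, n):
    """Solve r = n / (n + k) for k."""
    return n * (1 - r) / r

def r_at(n, k):
    return n / (n + k)

k = k_from(0.10, 400)   # k = 3600
print(r_at(3600, k))    # about 0.5 at 3600 BIP
```

As n grows, r_at(n, k) approaches 1, which is the "r approaches 1 as trials approach infinity" point in the comment.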

 
TangoTiger
(57181)
Comment rating: 0

I've mentioned this to Matt on my blog, but I'll repeat it here. FIP takes no position on the amount of skill level captured in any metric. The single and sole purpose of FIP is to capture the HR, BB, HB, SO observed results (a subset of a pitcher's results) and express them as a single number. (And, for ease of use, scale it to ERA.) This is no different than OBP treating BB and HR identically. That FIP includes persistent results like SO and less-than-persistent results like HR is irrelevant. That's exactly the case with OBP as well, as it includes a persistent result like BB and less-than-persistent results like singles. If you want to know how the observed results, things that actually happened, are associated with runs, then FIP gives you that. SIERA will not give you that. That's not a knock on SIERA, but neither are we going to hold it to the standard that it doesn't tell us what actually happened, when it doesn't purport to do that to begin with. Observed results are a combination of the true talent level, other biases, and random variation. If you want to know the pitcher's true talent level inside that observed result, then clearly you have to do something to FIP in order to use it for that. Back to Matt's article now...

 
TangoTiger
(57181)
Comment rating: 0

Even if the observed .550 is significantly different from .540, that does not mean that the true level is now .550. It just means that it's above .540. It could now be a true .541. After all, if you make the baseline .545, an observed .550 is no longer statistically significant, and therefore, we have to accept .545 as possibly still being the true level. In other words, the observed performance only indicates that the true level has moved, but not that it has moved completely to the new observed level.

 
TangoTiger
(57181)
Comment rating: 1

Christina, thank you for that very balanced retort. This is exactly the kind of mindset to have in these discussions. *** To no one in particular: It's important to note the strengths and weaknesses (together) of any metric that we use, so that this doesn't become a political debate, where we advance our own agenda. Rather, the point is to advance knowledge, and let the reader be the one to be able to come up with his own conclusions, satisfied he's been provided with relevant data from both sides, even if presented by a single source.

Dec 01, 2010 12:39 PM on Short Shrift?
 
TangoTiger
(57181)
Comment rating: 0

Mike, Christina linked to Colin's piece to support her position: "The problem is that, substantively, where defense is concerned we're really no better off than we were a decade ago." That is a conclusion. A conclusion that is unsupportable based on the evidence in that article. Colin's article is asking questions, about being a good saberist. Your own comments are also about how Colin's article was about asking questions. If Christina wanted an article to support her summary opinion, that article was not it. *** Yes, if you read Christina's comments with a broad stroke, and ignore her summary conclusions, then her points are somewhat valid. Her point would have been more honest had she made it with doubt and confusion, rather than the certainty of the conclusions that she included in her points. She threw in digs about "horribly flawed" ZR, but not about "horribly flawed" Davenport. I read her two paragraphs as her applying typical confirmation bias.

Dec 01, 2010 8:49 AM on Short Shrift?
 
TangoTiger
(57181)
Comment rating: 0

Christina said: "The problem is that, substantively, where defense is concerned we're really no better off than we were a decade ago." That is a terrible conclusion to Colin's article. Christina echoed a reader in the linked article, who said: "Seeing as we haven't made much progress in 20 years with defensive metrics..." This is what I said in the comments section in reply to that statement:

================== Tangotiger ===================
However, I am bothered that a reader, after reading Colin's piece, would come to a conclusion like: "Seeing as we haven't made much progress in 20 years with defensive metrics..." The only fair conclusion to make is that we don't know how much progress we have made, and not that we haven't made much progress. We've made "some". Is that a little? A lot? You can't say "not much". This is part of the nuance in Colin's piece that may be glossed over, if said reader is representative of a portion of the readership.
==========================================

This is what Mike Fast said in reply to that reader:

=============== Mike Fast =================
Colin said he wasn't sure and wasn't sure how to tell. You seemed to wipe away the uncertainty and conclude that we have made no progress. There is reason to believe we have made progress, certainly on the theory side. But until we can test, we won't know for sure. That's historically been the standard in sabermetrics. However, people who have developed fielding metrics will take your criticism very differently if you say, "I don't see how to tell how accurate your metric is" versus "Your metric is worthless." The former is a statement of fact that can be contested and explained, though it may raise some emotions. The second is a very value judgment that comes across as very dismissive and not focused on the examination of facts.
==========================================

As for the +10 and -10 argument, that applies to everything, be it fielding, offense, pitching, hockey, football, car accidents... these are all results of sample observations.

Dec 01, 2010 7:21 AM on Short Shrift?
 
TangoTiger
(57181)
Comment rating: 2

If the old Barry Larkin could prevent the young Pokey Reese from playing SS, Derek Jeter could prevent any youngster except Ozzie Smith.

 
TangoTiger
(57181)
Comment rating: 0

You aren't missing anything. Ideally, we always want to express things on two dimensions: .600 win%, 30 decision, or 18-12, etc. The problem is when we insist on expressing things as a single dimension. Do you show that as +3 wins above average? Do you show that as +6 wins above replacement? If everyone has 30 decisions, then it won't matter. But, suppose you have someone who is .800 win%, 15 decisions, or 12-3? Do you show that as +4.5 wins above average? Do you show that as +6 wins above replacement? Basically, in an ordered list, do you want to see 18-12 appear before, after, or tied with the guy who is 12-3? (Note: I'm using W/L as a proxy for a pitcher's ERA or FIP or your favorite pitching stat.)
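The two baselines can be sketched with the comment's own W/L proxy; the .400 replacement level here is an assumed figure for illustration:

```python
# A sketch of the two baselines discussed above, using W/L as the proxy the
# comment uses. Wins above average uses a .500 baseline; the .400
# replacement baseline is an assumed level for illustration.

def wins_above(win_pct, decisions, baseline):
    return (win_pct - baseline) * decisions

for record, pct, dec in [("18-12", 0.600, 30), ("12-3", 0.800, 15)]:
    waa = wins_above(pct, dec, 0.500)   # wins above average
    war = wins_above(pct, dec, 0.400)   # wins above replacement
    print(record, round(waa, 1), round(war, 1))
# 18-12 is +3.0 above average but +6.0 above replacement; 12-3 is +4.5
# above average and also +6.0 above replacement -- the ordering depends
# entirely on the baseline you pick.
```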

Nov 05, 2010 10:24 PM on Replacing Replacement
 
TangoTiger
(57181)
Comment rating: 0

Well, I wouldn't be surprised. Virtually every first or second year player is going to be paid 400,000$ to 500,000$, and that includes all the great rookies and sophs. In no way did I mean to suggest that MLB min = replacement level. I DID mean to suggest that a free agent paid the MLB min IS replacement level. Those are two completely different things.

Nov 04, 2010 1:27 PM on Replacing Replacement
 
TangoTiger
(57181)
Comment rating: 0

As someone who has done his part in creating wins above replacement (WAR), I'm all for doing the math myself, and then giving out the answer. I'm trying to give a different angle to those people who don't buy into the replacement concept that we can still meet on Canal Street if they get off at Lafayette instead of Broadway. That two sane people would value (in wins or dollars) Ben Sheets expected to pitch 81 innings the same as an average starter expected to pitch 200 innings, without one guy's framework necessarily being better than the other's.

Nov 04, 2010 1:25 PM on Replacing Replacement
 
TangoTiger
(57181)
Comment rating: 0

I should also point out that studes uses "Wins Above Bench", arguing that the bench player is the replacement player (similar, again, to the way we'd think of the NBA). The only issue there is that the bench player is not paid the league minimum, so that's not the true zero point. However, given the salary paid to the bench player (say 750,000$), it would be easy enough to extrapolate that to 400,000$, by reducing his win impact slightly downward. Again, this goes to the idea of the de facto replacement level player (the talent level at which a team will not pay for a player).

Nov 04, 2010 11:42 AM on Replacing Replacement
 
TangoTiger
(57181)
Comment rating: 0

But that value in terms of wins is analogous to the value in terms of dollars. Having Ben Sheets for 81 innings is equivalent, in impact of wins, or dollars, to an average pitcher with 200 innings. You don't need to have the replacement concept to show that, since I showed that you can make the comparison without it. The replacement concept makes it clearer, certainly. It's just not a necessity. As others have mentioned with NBA, do we really need to compare to the "13th man"? Or can we get there in different ways, especially when you consider "chaining"? It's not the 13th man that replaces the starter, but the 6th man. And his spot is taken by the 7th man, and so on.

Nov 04, 2010 11:40 AM on Replacing Replacement
 
TangoTiger
(57181)
Comment rating: 0

I put in my two cents here, which I'll recopy here:

[b]The need for the de facto replacement level[/b]

It doesn't matter if you show a pitcher is a .650 pitcher, and another is a .500 pitcher, or you show them as +.250 and +.100 above some baseline. They are still in the same order, and in the same degree of difference. What is missing is the playing time component. So, suppose you have the .650 pitcher with 81 innings (9 full games), and the .500 pitcher with 202.5 innings (22.5 full games). If you use the .400 baseline as the "zero" level, then these two pitchers have the same value: they will be paid by their teams the same amount. They both have 2.25 WAR (that is, +.250 x 9 = 2.25; +.100 x 22.5 = 2.25). They will both get paid about 9MM$. And you get that in the marketplace, if you think of Ben Sheets and an average pitcher. We didn't HAVE to have the .400 baseline as the comparison point. You could have broken it down into two steps: the marginal dollars over average, and what you are paying for average. The above pitchers look like this:

.650 win%, 9 full games = +.150 x 9 = 1.35 wins over average
.500 win%, 22.5 full games = 0 wins over average

Say for example that a pitcher is being paid 4MM$ for each win above average. So the first pitcher is getting 5.4MM$ over average, and the second one is at zero. Now, the question is: how much do you pay for average innings? Let's say that we are paying 0.4MM$ for each complete game. So, the Ben Sheets guy, the 81 innings or 9 games, will get 3.6MM$ if he pitches average. Added to that is his 5.4MM$ for pitching above average, and he's paid 9MM$. The second guy is getting 0.4MM$ x 22.5 games = 9MM$. And with zero wins above average, he still gets to 9MM$ total. See? It's the EXACT SAME THING. All the replacement level does is let you merge the two steps into one. If you look at JC's book, he breaks it down into these two steps exactly like I'm doing it. Except he assigns only 1MM$ per marginal win (more or less), and then he has an insanely high value of MM$ per playing time. So, when you try to combine the two steps into one, he ends up with a de facto replacement level that is insanely low. Therefore, by thinking in terms of replacement level, we are exposing exactly where the zero point is: the point where it doesn't matter how many PA or IP you give, no team will pay you for that crappy level of talent. Indeed, this is EXACTLY how teams think. "He's a .300 pitcher? Don't even talk to me about him being a workhorse... he's costing us wins." So, this is why replacement level is important: it reflects reality. That's what economic theory is supposed to do: it has to reflect reality. And if it doesn't, then it has to tell us why reality is wrong. We don't need replacement level. But, by having a replacement level, we are making our lives much easier, and we are ensuring we don't end up with ridiculous results. Until you know exactly what you are doing, use replacement level.
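The two-step and one-step valuations from the comment can be checked numerically. The rates ($4MM per win above average, $0.4MM per average complete game, a .400 replacement baseline) are the comment's illustrative figures; the equivalence holds because $0.4MM per game equals $4MM per win times the .100 win% gap between average and replacement:

```python
# Re-doing the two-step vs one-step valuation from the comment above, with
# the comment's own illustrative rates. Both frameworks value the Ben
# Sheets-like pitcher (.650 over 9 games) and the average workhorse (.500
# over 22.5 games) at the same $9MM.

DOLLARS_PER_WIN = 4.0    # $MM per win above average (illustrative)
DOLLARS_PER_GAME = 0.4   # $MM per average complete game (illustrative)

def two_step(win_pct, games):
    above_avg = (win_pct - 0.500) * games * DOLLARS_PER_WIN
    playing_time = games * DOLLARS_PER_GAME
    return above_avg + playing_time

def one_step(win_pct, games, replacement=0.400):
    # 0.4 MM/game at average = 4 MM/win times the .100 gap below average
    return (win_pct - replacement) * games * DOLLARS_PER_WIN

print(round(two_step(0.650, 9), 2), round(one_step(0.650, 9), 2))
print(round(two_step(0.500, 22.5), 2), round(one_step(0.500, 22.5), 2))
```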

Nov 04, 2010 8:03 AM on Replacing Replacement
 
TangoTiger
(57181)
Comment rating: 0

Great, thanks, I appreciate the quickness and thoroughness.

 
TangoTiger
(57181)
Comment rating: 0

Mike, I don't think "perfect is the enemy of the good" is worth bringing up. That's like talking about PEDs and someone uttering "won't someone think of the children": it's not relevant, because that's not the point being discussed. I think what would be helpful for us neophytes is understanding the choices being made by the technologists, and what drawbacks the systems they considered and discarded have. For example, how can we, or can't we, apply the tennis line-calling technology? Can we attach sensors to home plate, presumably making it a fixed point? What kind of miscalibration can we have? Will these help in conjunction with the rest of PITCHf/x? And can Trackman complement, or usurp, PITCHf/x? How about the FoxPuck on a baseball (presumably the baseball is too soft, compared to a puck)? Basically: what are our choices, what do they do, what don't they do, and why ultimately are they not helpful? A followup article would be cool...

 
TangoTiger
(57181)
Comment rating: 0

The last two links on my site give you the best estimate of run scoring distribution. Keith used this in the BPro annual from a few years ago.

 
TangoTiger
(57181)
Comment rating: 0

When would the range of the 0th to 10th percentile ever be smaller than the 40th to 50th percentile as an estimate of the true mean? It's like saying this:

50th: .300
40th: .290
30th: .280
20th: .270
10th: .260
 0th: .255

It's not going to happen. This is how it would look:

50th: .300
40th: .290
30th: .278
20th: .262
10th: .242
 0th: .212 (or .000 technically)

All numbers for illustration purposes only.
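The widening toward the tails falls out of any bell-shaped estimate of true talent. Here is a minimal sketch using Python's statistics module, assuming a normal curve; the mean and SD are made up for illustration, and the 0th percentile is unbounded so the 1st stands in for it:

```python
# Percentile gaps of a bell-shaped talent estimate widen toward the tails:
# the 1st-to-10th band is always wider than the 40th-to-50th band.
# Mean and SD are illustrative only.
from statistics import NormalDist

talent = NormalDist(mu=0.300, sigma=0.015)  # hypothetical estimate
pcts = [0.50, 0.40, 0.30, 0.20, 0.10, 0.01]
levels = {p: talent.inv_cdf(p) for p in pcts}
for p in pcts:
    print(f"{int(p * 100):>2}th: {levels[p]:.3f}")

mid_gap = levels[0.50] - levels[0.40]   # 40th to 50th
tail_gap = levels[0.10] - levels[0.01]  # 1st to 10th
assert tail_gap > mid_gap  # the tail band is wider, as the illustration shows
```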

 
TangoTiger
(57181)
Comment rating: -1

Agreed, it *must* be wider. And PECOTA instead has it narrower.

 
TangoTiger
(57181)
Comment rating: 0

Note: all numbers for illustration purposes only.

 
TangoTiger
(57181)
Comment rating: 0

"the ranges are nice to see -- if they work." and "If you want to say that "most" starting regulars should have fairly similar ranges, I'd agree. But not "all" or anything close to it."

In reality, you are right. Insofar as what the data can possibly tell us, our ESTIMATES will have their ranges virtually all similar (beyond whatever their past number of PA would indicate). Only cases like Ben Sheets or other players with injuries will be exceptions. Otherwise, I would be shocked if the 90th percentile of every player is not something like mean TAv + 1.15 to 1.20 and the 10th percentile is not TAv -1.25 to -1.30. Something along those lines. If someone is arguing that you are going to have some players at TAv +1.10 and others at TAv +1.40, then I don't think your expectations are going to be reasonable. (Again, presuming we are looking at similar past PA for the players in question, and injuries notwithstanding.) You might get a skew based on age, but again, that would apply across the board to everyone at that age. Anyway, let's see what Colin will discover with the refreshed PECOTA, let him make his claim, and then just test it.

 
TangoTiger
(57181)
Comment rating: 0

But the range will be virtually the same for all starting regulars, and all starting pitchers. It's not going to differ by any amount that would be of any help to anyone. If you are talking about rookies and guys with limited playing time, sure.... but bench players you don't care about, and all the rookies will have such wide ranges as to be useless as well. Same thing for relievers... they'll all have similar ranges. So, I see no practical use for a Fantasy player for the ranges. What you DO want to have the ranges for is playing time. That's where the value is.

 
TangoTiger
(57181)
Comment rating: 1

The *idea* behind having uncertainty around your estimated forecast is good. Indeed, we devote several pages in The Book not only to the need for, but the method to calculate, the uncertainties. When I publish the Marcels, I include a "reliability" figure, which acts in a similar way.

Colin is accepting the position I've held, and MGL reiterated, and, really, what any stats professor would tell you: that the uncertainty of your estimate is based on the size of your observed sample. What has been frustrating for me is that this is so obvious and commonly accepted that I was getting pushback on it (not from Colin). Now, Colin is going to be novel about it, and add more to the uncertainty by looking at the kind of player you have (maybe there's more uncertainty in the mean of old players, or fast players, or whatever). That's good, but more important is getting the basics down, which is what he is going to do.

Now, is it necessary to publish the 10th and 30th and 80th percentiles? Why not just say:

Pujols .330 +/- .030

(where that's one standard deviation)? Why does this help? Because you can then do this for Pujols' PA:

Pujols 610 +/- 70

The way the percentiles are currently laid out, it tries to give you both, but not really. As Colin noted, it "infers" all the component stats based on the TAv stat. Why not do:

Pujols (K/PA): .07 +/- .02
Pujols (BB/PA): .18 +/- .03

And so on. Wouldn't that convey far more information, while using up the same amount of real estate?

(Note: not all things are symmetrical. You can get away with that on the rate stats, but not on playing time. On that one, and that one alone, I would LIKE to see the percentile forecasts.)

You can also follow the thread at my site, where MGL made a good point.
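The "uncertainty is driven by sample size" point can be sketched with the usual binomial approximation. The rates and PA totals below are illustrative only; this is not PECOTA's or Marcel's actual method:

```python
# One-SD uncertainty of an observed rate shrinks with sample size,
# roughly as sqrt(p*(1-p)/PA) for a binomial-ish stat.
from math import sqrt

def rate_uncertainty(p, pa):
    """One-SD uncertainty of an observed rate p over pa plate appearances."""
    return sqrt(p * (1 - p) / pa)

# A .330 rate over 600 PA vs. over 150 PA:
print(round(rate_uncertainty(0.330, 600), 3))  # ~0.019
print(round(rate_uncertainty(0.330, 150), 3))  # ~0.038
```

Quadrupling the sample halves the uncertainty, which is why a full season's worth of PA should produce a much tighter band than a quarter-season.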

 
TangoTiger
(57181)
Comment rating: 1

This is Felix Hernandez:

PCT  ERA   EqERA
90o  3.23  3.31
80o  3.22  3.30
70o  3.27  3.35
60o  3.30  3.39
50o  3.54  3.63
40o  3.57  3.66
30o  3.68  3.77
20o  3.75  3.85
10o  3.87  3.97

Three points:

1. PECOTA is already giving us "EqERA", which is the peripheral or component or luck-free ERA we've been talking about.

2. In addition to that, PECOTA is giving regular ERA (which should be much wider, because it includes more luck from sequencing of events, etc.).

3. Look at Felix's forecast at the 80th and 90th levels. Obviously wrong. Look how wide it is at the 50-60 level, and then how tight it is everywhere else. You are naturally going to capture more players in the 50-60 level if you are putting in estimates that are much wider at those levels.

 
TangoTiger
(57181)
Comment rating: 1

On my blog, someone asked me this: "what do you make of the large clustering of outcome in the middle decile?"

I responded: I just took one guy to see what the shape looks like. This is ARod:

90o 0.323
80o 0.315
70o 0.309
60o 0.298
50o 0.288
40o 0.282
30o 0.280
20o 0.277
10o 0.273

Look at the gap between the 50th and 70th: 21 points. That's way wider than anywhere else. So, the reason that PECOTA is capturing so many players in the 50-70 range is that it provides such wide latitude at the 50-70 range. It won't catch much in the 30-40 range, because, well, look, there's almost no gap there. I don't know if ARod is an example or an exception. But, given that I've seen funny stuff, like Felix having a WORSE forecast at the 90th level than the 80th level, I think there is a serious programming bug as well.

 
TangoTiger
(57181)
Comment rating: 1

"And that is what is interesting about the histogram. If PECOTA works right, the results *should* cluster around 50th-percentile predictions, and indeed, they do."

No. For it to work right, the percentiles should remain the same, BUT the estimate of the percentile levels should be much narrower. For example, PECOTA would give this:

Pujols: 10th .290, 50th .330, 90th .370

(Or whatever.) IDEALLY, the best forecasting system would give something like this:

10th .310, 50th .330, 90th .350

That is, the estimate of each level is as tight to the 50th as possible. However, the histogram *must* show 10% of players (of whatever population it's based on) in each 10-percentile grouping.

***

You seem to be saying that we should keep it like this:

10th .290, 50th .330, 90th .370

And then be happy that 95% of the data falls between the 10th and 90th points. Well, from that standpoint, why not set the percentile ranges so wide as to ensure that 95% of the data falls between the 45th and 55th points?

***

I think you are conflating the issue of accuracy with the issue of bias. The histograms here speak only to the issue of bias. They say nothing about the accuracy of the mean forecasts. They only say something about the "accuracy" of setting appropriate ranges.
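The decile-histogram test being argued about here can be simulated: if the forecast distribution matches the real one, each bin catches about 10% of outcomes; if the bands are too tight, the extreme bins overflow. All distributions and numbers below are hypothetical:

```python
# Calibration check: bin actual outcomes by the percentile the forecaster
# assigned them. A calibrated forecaster puts ~10% in each decile bin,
# no matter how tight or wide its bands are. All numbers hypothetical.
from statistics import NormalDist

TRUE = NormalDist(0.330, 0.030)  # hypothetical real spread of outcomes

def bin_counts(forecast_sd, n=10_000):
    """Fraction of outcomes landing in each of the 10 forecast decile bins."""
    forecast = NormalDist(0.330, forecast_sd)
    counts = [0] * 10
    for x in TRUE.samples(n, seed=42):
        counts[min(int(forecast.cdf(x) * 10), 9)] += 1
    return [c / n for c in counts]

print(bin_counts(0.030))  # matched SD: every bin lands near 0.10
print(bin_counts(0.015))  # bands too tight: the 0-10 and 90-100 bins overflow
```

This is exactly why a flat histogram says nothing about how *accurate* the mean forecasts are: it only tests whether the stated ranges are honest.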

 
TangoTiger
(57181)
Comment rating: 0

Can you forward to tom~tangotiger~net (replacing ~ as appropriate). I can't check my Yahoo account from the office.

 
TangoTiger
(57181)
Comment rating: 0

"HOWEVER: In terms of how well **PECOTA** performed"

But this is not the subject of this article or that histogram. That histogram is about tracking the information in your paragraph here:

"Then players 1 and 4 would be binned in the 90-100 percentile bin in terms of how they did relative to PECOTA projections, i.e., they grossly overperformed what PECOTA expected; players 2 and 3 would be similarly in the 0-10 bin; and player 5 would be somewhere around his own 60th-percentile performance -- the exact percentile he achieved would be dependent upon more details of the PECOTA projections, but it would be somewhere above the 50th but well below the 90th."

The counts would be:

n, percentile
2, 90-100
1, 50-60
2, 0-10

That's what the histogram would show from your example.

 
TangoTiger
(57181)
Comment rating: 1

"From this thread." Well, you should, because how do we get to a resolution unless we see the problem? And maybe it's not a problem, but a misinterpretation. As it stands, you pointing "something" out means nothing at all, since we (I) have no idea what you are talking about.

 
TangoTiger
(57181)
Comment rating: 0

I don't think I am following you. Suppose you have this player: mean forecast: .330 90th percentile: .370 10th percentile: .290 Actual performance: .375 This player would count in the 90-100 bin. Are we agreed so far?

 
TangoTiger
(57181)
Comment rating: 1

"But there's enough finger pointing, accusing, and comparisons going on to make it really annoying to a casual reader." I think you should be more explicit by pointing to actual examples. I will grant you that to a casual reader who might be giving cursory views to comments, it may seem combative. But, once you go deep into it, we're all a happy sabre family.

 
TangoTiger
(57181)
Comment rating: 0

"Those in the first hump (players who vastly underperformed their projections) are those who missed significant playing time due to unanticipated injury or got sent to the minors."

I highly doubt it. But even if that's the case, then what in the world does a "10th percentile" forecast mean? If you want to say that 50 of the 250 players (or whatever) with at least 300 PA had a very down year, then why are you setting the benchmark so high that 20% of the players reached a level that you said only 10% of the players should reach? That is, vastly underperforming, or getting sent to the minors while still reaching 300 PA, is not a phenomenon limited to the year 2010.

You are *starting* with the position that only 10% will fall below some baseline (hence the 10th percentile). Then you have to ask: "how much below my mean will that be?" And if the performance level of a group of players that you thought should have had a .270 TAv in fact reached only .230, then that's where the 10th percentile forecast should have been set, and not the .240 or .250 level that IS being set, such that 20% (instead of 10%) get below that level. (All numbers for illustration purposes only.)

It goes back to exactly what I am saying: once you decide on the parameters of your subpopulation, then it's at that point that you test for the 10th and 90th percentiles. As it is, we have no way to test, because we are not being told what subpopulation to test against.

 
TangoTiger
(57181)
Comment rating: 7

Please guys, don't "minus" Bill's comment. Just because he is wrong in his position, doesn't mean we should not read it. It's a view that many might have shared because they didn't think of it the right way, but it's worth discussing.

 
TangoTiger
(57181)
Comment rating: 1

Right. To put it another way: under what conditions should we see 10% of the players exceed their 90th percentile forecasts? That is, what subpopulation of the 1000 batters in 2010 are we looking at? And once you look at that subpopulation, do we also see 10% of them going below their 10th percentile forecasts? I would bet that there is NO subpopulation that you can select where the percentiles come anywhere close to 8%-12% in each 10% bucket.

 
TangoTiger
(57181)
Comment rating: 0

Right, but not totally. If you compare relievers to starters, you will see that the ranges are similar. And the variance around the true expectation of starters should be much smaller than that of relievers. You see that in some cases (Felix, CC), but most of the time, that's not the case. So, it's two failures: a failure on the true estimate, and a failure on the performance.

 
TangoTiger
(57181)
Comment rating: 0

Then you would get a right-skew. Instead of it being:

10, 10, 10, 10, 10, 10, 10, 10, 10, 10

It would be:

4, 5, 6, 7, 8, 10, 12, 14, 16, 18

Or something. And no way do we see anything like that. But the point still stands: if PECOTA is saying, "I expect this player to exceed his 90th percentile 10% of the time," then how are we to evaluate that? Are we to look at all 1000 batters in MLB, with no PA minimum? Does the claim only hold if the player is allowed to have 300 PA? PECOTA is the one making the claim. Therefore, let's see the conditions under which we expect the 90th percentile to be exceeded, and let's test it on that basis.

***

In any case, the #1 problem with the percentiles is that the uncertainty range has to be based almost entirely on the sample size of the player's past performance. And this is not at all what PECOTA has been doing. Colin himself acknowledges it exactly: "This is a relatively simple fix—the uncertainty in a forecast is largely a function of the amount of data you have on a player."

 
TangoTiger
(57181)
Comment rating: 1

Ditto.

 
TangoTiger
(57181)
Comment rating: 0

Great stuff, Matt. One more check if you can: OK, they improved their walk rate, but did their overall performance get better as well? If the reason they had a lower walk rate is that they gave up more first-pitch strikes, and those first-pitch strikes are easier to hit, then that might be the explanation: it was a trade of fewer walks for more hits. Can you check into this?

Oct 01, 2010 6:44 AM on Pitch Data and Walks
 
TangoTiger
(57181)
Comment rating: 2

Bill: no, it has to be uniform. If PECOTA is saying that something is going to be between the 70th and 80th percentile, then we'd expect 10% of those somethings to occur between the 70th and 80th percentiles. You might be thinking of, say, between 1.0 and 1.5 standard deviations (a scale of SD, not percentile), and in that case, you would be correct.

 
TangoTiger
(57181)
Comment rating: 5

The chart is here:

 
TangoTiger
(57181)
Comment rating: 4

Let there be no doubt that Colin is now taking PECOTA by the b-lls. After several years of me shouting from the rooftops and my padded room that the PECOTA percentile forecasts are highly suspect, and providing probability proof, we now have empirical confirmation.

Colin shows us how often players who have at least 300 PA had their TAv land in each of the percentile ranges (10-20, 20-30, ... 80-90). In a perfect world, you'd have 10% of all players in each 10% group. In the real world, we'd expect say 8-12% across the board. But this is not at all what we get.

While Colin showed the numbers for the 10-90 group, he did not show the 0-10 and the 90-100. We do know that the total of these two groups is 36.5% (Colin reports that the 10-90 group is 63.5%). Here is what that chart looks like if we just split the 36.5% evenly between the two extreme groups:

[chart]

(Note to Colin: I definitely think you should update your chart to reflect the 0-10 and 90-100 numbers. I think this makes it far more clear, considering that the area above the 10% line has to equal the area below the 10% line.)

That's for hitters. For pitchers, it's even worse. 50% of the pitchers (min 70 IP) had an ERA outside the 10-90 percentile range, whereas we would have expected just 20% total. It's an alarming total. When Felix Hernandez's 90th percentile is 3.20, and he, for two years in a row, achieves an ERA below 2.50, then you know something is dreadfully wrong.

Now, Colin makes a good point that ERA includes sequencing, something we've talked about a lot here in the past few weeks. The equivalent to a hitter's TAv would be a pitcher's peripheral ERA (component ERA, BaseRuns ERA, or what have you). If we do that, we get something similar for pitchers as for hitters. Therefore, if the test is not going to be against ERA, but peripheral ERA, then the PECOTA percentile page should show the header as peripheral ERA.

Nonetheless, a huge issue. Thanks, Colin. You should be proud for doing the right thing.
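As a back-of-envelope check on that 36.5% figure, assuming normal-ish outcomes (an assumption, not anything from Colin's data), we can solve for how understated the forecast bands would have to be to push that much mass outside the 10th-90th band:

```python
# If 36.5% of hitters fall outside the 10th-90th band (vs. the expected 20%),
# how narrow are the forecast bands relative to the true spread of outcomes?
# Assumes normality throughout -- a rough sketch, not a fit to the real data.
from statistics import NormalDist

std = NormalDist()
z_band = std.inv_cdf(0.90)            # band edge sits ~1.28 forecast SDs out
z_observed = -std.inv_cdf(0.365 / 2)  # edge in true-SD units, 18.25% per tail
ratio = z_observed / z_band           # forecast SD as a fraction of true SD
print(round(ratio, 2))  # ~0.71: bands roughly 30% too narrow, under these assumptions
```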

 
TangoTiger
(57181)
Comment rating: 2

Fantastic!

Sep 30, 2010 6:16 AM on Aches and Pains
 
TangoTiger
(57181)
Comment rating: 0

Have you read The Book?

 
TangoTiger
(57181)
Comment rating: 0

An excellent point, and very well said. It would act as a great mission statement. As long as we can get away from "mine is better than yours", and into "mine works best here, and yours works best there", that would go a long way. Unless of course in those situations where something is deficient and should be supplanted. So, Chone, Marcel, ZiPS, PECOTA can all live happily together, each having its own strengths, with none deserving to be discarded. Great job Randy, great job.

 
TangoTiger
(57181)
Comment rating: -1

Oh, and no slight intended to anyone else I have read. Always a danger when you write a list of names, you forget someone.

Sep 29, 2010 1:15 PM on After The Knife
 
TangoTiger
(57181)
Comment rating: 2

I agree, compared to the start of last year, it's a talent increase. Compared to the start of this year, yes, there hasn't been enough replacements. How many quality writers do you need in order to justify your subscription? I think for me, that number is around 5. "My five" are Colin, Matt, Eric, and a bit here and there from Jeff, Ken, Tommy as the mood strikes me. All came on board since 2009 I think. Jay too from the old-timers. I've never really gotten into the non-saber articles here, but, I'm obviously in the minority. So, rather than looking at who has left, look at who is here, and decide if that is enough.

Sep 29, 2010 1:11 PM on After The Knife
 
TangoTiger
(57181)
Comment rating: 1

I've never seen a misuse of "Gold, Jerry, Gold!" yet, so never apologize for it. It's one of those terms that is.... uh, gold.

 
TangoTiger
(57181)
Comment rating: 2

Shandler makes an excellent point with his article. As I've shown, there's a half-dozen legitimate ways to test for accuracy, all depending on exactly what it is that you want. In one test, I had Marcel as #1 in a group of 22 forecasting systems. In another, it was middle of the pack. My preferred method is to run all half-dozen ways, report the results, and let the reader choose which way most closely aligns with his needs.

 
TangoTiger
(57181)
Comment rating: 0

From that standpoint, it's like Howard calling himself the King of All Media, and the rest in the media parroting Howard's claim, even though it was a joke. And since it's a big joke, there's no need to support the claim, nor defend it from others as they object to it. Therefore, kudos to Steven for the genius of it all. You got me.

 
TangoTiger
(57181)
Comment rating: 0

I'm one of the people who is irked. It's when people believe this press release that it becomes a problem. People start acting like PECOTA is the leader, when test after test shows that it's possibly above-average, and possibly below Marcel. It's not something to be boastful about. If the intent was limited to a quasi-joke, why do you have it here: http://www.baseballprospectus.com/subscriptions/ "Complete depth charts and forecasts for AL and NL pitchers and hitters using Baseball Prospectus' deadly-accurate PECOTA projection system--the same one used in MLB front offices."

 
TangoTiger
(57181)
Comment rating: 0

{clap clap clap}

Sep 17, 2010 8:21 AM on Russell Branyan
 
TangoTiger
(57181)
Comment rating: 1

When I talk about replacement level, that's exactly my example: [quote]You guys know how much I like the guy. He is the prototypical replacement-level player. I would even say we should change the name of Replacement Level to Bloomquist or Willie. It’s not WAR, but WAB or WOW (wins over willie). [/quote] He also has the advantage of playing every position.

 
TangoTiger
(57181)
Comment rating: 0

Another thing that I love about WAR is that you can take all players, pitchers, non-pitchers, starters, relievers, etc., and list them on one scale, like say on the 1994 Expos. You might get a surprise here or there, but overall, it conforms to expectations. And if it does that, it's a lot easier to trust for teams you are not too familiar with.

Sep 14, 2010 5:23 PM on Missing the WAR
 
TangoTiger
(57181)
Comment rating: 15

I agree wholeheartedly with Colin. When I was developing the framework for WAR, it was all about breaking it down by components, so that we can see how it works, and, if one so chooses, replace the calculations of one or more components with other sets of calculations. WAR is a framework that is easy to follow and accept. As an example, look at the way Fangraphs lays it out for Ryan Zimmerman. We see that he's +31 runs above average in offense, +16 runs above position average in fielding, +19 runs for playing time, +2 runs for his position, for a total of +67 runs (rounding issues notwithstanding). The conversion to wins makes it +6.9 wins above replacement according to Fangraphs' implementation of the WAR framework (fWAR). Now, suppose you don't like the fact that fWAR uses UZR. You are a Total Zone maven. Well, guess what, you simply move one number in, and move one number out. It doesn't invalidate the rest of the metric. Suppose you think replacement level is set too high, or too low. Well, change that too. Suppose you think Linear Weights makes no sense, and prefer BaseRuns. Well, go ahead, knock yourself out. Suppose you think that 3B is easier to play than 2B. Change that too. The important point is that you have a FRAMEWORK. Create that, adopt that, follow it. That's WAR. Now, once you have a framework, you need an implementation. You can be lazy and let Fangraphs (fWAR) and Baseball Reference (rWAR) figure that out for you. Or, gulp, you can do as Colin says here and think for yourself. What you can't do is just throw your arms up and say the solution is too difficult AND THEN proceed to give us your opinion as to who is the most outstanding player! If it's too hard to find the solution, then your opinion becomes irrelevant. It's a bullsh!t opinion, because it's a summary opinion without evidence. So, this is what sabermetrics is about, the journey, the thought process, the critical thinking. Do it, because we can never have enough people doing this.
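The framework idea above can be sketched as a function of swappable run components. The runs-per-win divisor of 10 is the conventional round number, not Fangraphs' exact seasonal figure (which is why this lands at 6.8 rather than the quoted 6.9), and the Total Zone number is purely hypothetical:

```python
# WAR as a framework: a sum of run components converted to wins.
# Swap any single component (UZR vs. Total Zone, a different replacement
# level, a different positional adjustment) without touching the rest.

RUNS_PER_WIN = 10.0  # conventional approximation; Fangraphs' figure varies by year

def war(batting, fielding, replacement, positional, runs_per_win=RUNS_PER_WIN):
    """Sum the run components and convert to wins."""
    return (batting + fielding + replacement + positional) / runs_per_win

# Zimmerman, per the Fangraphs-style breakdown quoted above (UZR fielding):
zim_uzr = war(batting=31, fielding=16, replacement=19, positional=2)
print(round(zim_uzr, 1))  # 6.8

# A Total Zone maven swaps only the fielding input (12 is hypothetical);
# the rest of the metric stands untouched:
zim_tz = war(batting=31, fielding=12, replacement=19, positional=2)
print(round(zim_tz, 1))
```

The design point is the one in the comment: because each component enters additively, disagreeing with one input never invalidates the framework itself.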

Sep 14, 2010 6:58 AM on Missing the WAR
 
TangoTiger
(57181)
Comment rating: 1

To get around the issue of 90s and 00s and where to put Pettitte and David Cone, etc., I go by birth year. So, 1962 (Clemens) to 1971 (Pedro) gives us those 9 pitchers you listed (which is exactly the same 9 I always give, plus Mariano Rivera). The next group would be pitchers born 1972 to 1981. Unfortunately, it's pretty early to call it. For example, if we rewind 10 years to 2000, and see how the 1962-1971-born pitchers stood, you have David Cone (b. 1963) at 56 WAR and Randy Johnson (born the same year) at 57 WAR. Then RJ just got even better. However, all 9 of our studs were in the top 14 in WAR. Given the pretty lackluster group in comparison, Pettitte will almost certainly come in the top 10 for pitchers of the current generation. He'd be the David Cone pick basically, without having to face the Pedro/Clemens/RJ/Maddux quartet. I think, Eric, that you may have been trying to compensate for your personal bias? Three others that deserve honorable mention and in 10 years could be part of the group: Javy Vazquez, Barry Zito, and Brandon Webb, all depending of course on whether they can put up 2-3 dominant years.

 
TangoTiger
(57181)
Comment rating: -2

Ken presents this better than anyone else. He acknowledges the arguments from both sides, admits he is almost definitely wrong, but clings to whatever he can to ensure his belief system remains in place. For better or for worse, his belief is the very thing that sabermetrics is trying to obliterate. In some ways, sabermetrics has ruined certain aspects of baseball, especially if you don't fill that void with other aspects of baseball that sabermetrics has taught us.

 
TangoTiger
(57181)
Comment rating: 1

{clap clap clap}

Jul 29, 2010 12:05 PM on Manufacturing Promotions
 
TangoTiger
(57181)
Comment rating: 0

Great job Jay. Only one issue: "[Raines] was better in the field" Dawson was a legitimate Gold-Glover in CF during his Expos years, well-deserving of his Hawk nickname. Raines was possibly an above-average fielder in LF. And I'm saying this as the biggest Raines fan around.

Jul 23, 2010 9:22 AM on If Hawk, Then Rock
 
TangoTiger
(57181)
Comment rating: 3

Number of DL trips, all players, 2002-09, by month:

Month  DL trips
 1     0
 2     0
 3     419
 4     784
 5     634
 6     562
 7     490
 8     531
 9     93
10     1
11     0
12     0

So, yeah, you have a definite bias in DL assignments. July has about 10-15% fewer games (because of the All-Star game), and so it's no surprise that DL assignments in July are about 10-15% lower than in June and August. The average of March/April is 600, and that's in line with May/June. All in all, nothing there.

 
TangoTiger
(57181)
Comment rating: 2

It should also be noted that even if a recorder accurately describes a batted ball, there are errors in the transcription. That is, some 1% or 2% of plays (or higher in some parks) have clearly "impossible" data points. For example, the computer operator will select the wrong position (player) from the drop-down list, but mark the batted ball in the correct location. So, it might look like Jayson Werth ranged into LF to catch a ball.

 
TangoTiger
(57181)
Comment rating: 3

I really don't disagree with anything in this post. *** By the way, if we look at how much progress we've made over the last 20 years on offensive metrics, the answer will be: much less than for fielding metrics. Palmer's Linear Weights and the Run Expectancy matrix hold up fantastically well. And where the gains have been made (baserunning), they end up having limited to no impact for most players.
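The way linear weights come out of the Run Expectancy matrix can be sketched in a few lines: an event's run value is the change in RE plus any runs that score, averaged over the situations it occurs in. The RE values below are illustrative placeholders, not Palmer's actual figures:

```python
# Linear weights from a run-expectancy matrix: an event's value is
# RE(after) - RE(before) + runs scored on the play.
# RE values are a tiny illustrative slice of the 24-state matrix.

RE = {
    ("empty", 0): 0.48,  # bases empty, 0 outs (placeholder values)
    ("1st", 0): 0.85,    # runner on 1st, 0 outs
    ("empty", 1): 0.25,  # bases empty, 1 out
}

def run_value(before, after, runs_scored):
    """Change in run expectancy plus runs that scored on the play."""
    return RE[after] - RE[before] + runs_scored

# A leadoff walk: empty/0 outs -> runner on 1st/0 outs, nobody scores.
print(round(run_value(("empty", 0), ("1st", 0), 0), 2))   # 0.37
# A leadoff out: empty/0 outs -> empty/1 out.
print(round(run_value(("empty", 0), ("empty", 1), 0), 2))  # -0.23
```

Averaging these state-change values across every base-out situation an event occurs in is what produces the familiar linear-weight coefficients.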

 
TangoTiger
(57181)
Comment rating: 0

I guess this is part of the nuance. Colin said this: "The short version: I’m not really sure that we’ve gotten any further than where we were when Zone Rating and Defensive Average were proposed in the '80s. And if we have gotten further, I’m not sure how we would really tell." So, he's asking two questions: "Have we gotten any further?" "How can we tell if we have?" Rather than specifically making it questions, he's wondering. But, he's not concluding. His actual conclusion was questions: "To me, this opens up a simple question—how good are our defensive metrics? Are they useful? How useful?" And that's where we are. We're in the investigation stage. And in the noted thread, I said this: "Room for improvement and discussion, as long as you start with the [recorded, not estimated] data. "

 
TangoTiger
(57181)
Comment rating: 8

I had already responded to Colin's point regarding the comparison of correlation for offense and defense. I will reproduce here: What is the relevance here? You are taking known hits, known extra base hits, and known outs, and you have one system that arranges it one way and another that arranges it another way. The correlation would have to be in the high r=.9x. With fielding systems, you are taking known outs (in some systems), estimated outs (in another), and estimated hits (for all systems), and trying to find the correlation. *** Otherwise, I share Colin's general skepticism of subjective data being treated as objective data. He fairly asks legitimate and nuanced questions regarding the advancement level of fielding stats. However, I am bothered that a reader, after reading Colin's piece, would come to a conclusion like: "Seeing as we haven't made much progress in 20 years with defensive metrics..." The only fair conclusion to make is that we don't know how much progress we have made, and not that we haven't made much progress. We've made "some". Is that a little? A lot? You can't say "not much". This is part of the nuance in Colin's piece that may be glossed over, if said reader is representative of a portion of the readership.

 
TangoTiger
(57181)
Comment rating: 0

I agree, just a fantastic interview. Indeed, one of the few interviews I will actually re-read. (Btw, it's David Appelman at Fangraphs, not "Jason".)

Jul 15, 2010 8:56 AM on Jeff Ma
 
TangoTiger
(57181)
Comment rating: 0

I seem to remember reading about a team telling its minor-league hitters that they need to walk at least once every 10 PA. (I don't necessarily agree with that kind of hard rule.) I agree, "lower FIP" wouldn't even be said.

Jul 15, 2010 8:32 AM on Business Casual
 
TangoTiger
(57181)
Comment rating: 1

See Eric's Rule #2.

Jul 14, 2010 9:11 AM on K/BB Ratio Redux
 
TangoTiger
(57181)
Comment rating: 4

{clap clap clap} An "on pace" article that doesn't simply extrapolate blindly. Excellent. We can see why most people don't do this: it's a lot of work. So, be thankful Eric did all that work.

Jul 14, 2010 7:23 AM on K/BB Ratio Redux
 
TangoTiger
(57181)
Comment rating: 23

"I'm sick of listening to you act as if you've had 1% of the success the people you criticize have had" I didn't realize that my success level was required to offer my opinion. Joe, why is BPro asking for opinions of the rest of us little people then? In any case, I made no insulting or disparaging remark. "Stop jumping in here and cheap-shotting a business that you've never comprehended on your best day." First, don't tell me what to do. Secondly, exactly what did I cheap-shot? You read whatever you wanted to read into what I said, and decided to use that as a launching pad to tell me whatever you wanted to tell me. And, for some reason, rather than send me an email, you needed to tell everyone else this. I hope BPro leaves your post here. *** Just before Joe posted this, I put a link on my blog to Joe's newsletter. Joe is one of the best writers around, and I stand by my position to support him, and I will certainly leave that link there.

 
TangoTiger
(57181)
Comment rating: 4

Fangraphs has forums, and they don't charge their readers. Primer has forums, and they don't charge their readers. You've got to have a better reason for not having a forum considering that you are already charging readers.

 
TangoTiger
(57181)
Comment rating: 7

By the way, it's ridiculous that because dawhipsaw's post was given a "-5", every comment threaded to his is also collapsed. There are good comments in that mini-thread by Richard, pobo, and evo (along with OK ones by me), and they are hidden. Anyway, I'm going to give dawhipsaw a +1 just to keep it visible. dawhipsaw makes reasonable enough points, and just because we disagree doesn't mean we should hide it. Readers here are way too focused on rating the "niceties" of a post, rather than its merits. dawhipsaw makes worthy comments. That several of us are responding to it should count for something.

 
TangoTiger
(57181)
Comment rating: 0

"Kind of like the difference between an academic presenting a research paper at a conference instead of chatting about it in the faculty lounge?" Excellent analogy.

 
TangoTiger
(57181)
Comment rating: 1

You originally said this: "His last article was the very worst I've ever read on this site." What if one of the BPro authors chose one of your comments and said: "dawhipsaw's last comment was the very worst I've ever read on this site"? Are you suggesting that the financial motive of the site should allow the readers to be far harsher than the authors in peer-to-peer interaction? If that's the case, then it's no wonder ESPN authors rarely, if ever, are found in the comments section of their own articles. And it's no wonder that BPro authors are so tempered, and not often seen, here. On the other hand, if we ignore the financial motive, and simply allow the authors the right to interact at the same level and with the same rules as the readers, then what Matt said, and the way he said it, was in line with the original commenter. This is really an editorial decision by BPro as to how much they want to handcuff their authors. I would also think that to the intelligent readers of BPro, the financial motive is benign. Otherwise, you are suggesting that as customers, you are allowed to insult those who are providing you with a service, as long as you are paying for it. And the service provider has to take it because you are paying them. Matt responded in kind, and therefore, there is no issue. *** As for the bias: given that I've had my own somewhat unpleasant exchanges with Matt, I don't think I'm predisposed to argue for (or against) him. In any case, he contributes to my blog as a commenter, much as you do. I don't know that that makes me territorial around him or other commenters at my blog. If I am protective of him, it's because he provides value, regardless of where I've read him.

 
TangoTiger
(57181)
Comment rating: 2

This issue is really a matter of opinion, not fact. So, neither of us can be "wrong". I for example would respond sternly (though not necessarily harshly, though that may have happened) on my blog, and as long as I'm fair about it (not personal), then no one really has an issue about it. Perhaps at BPro, there's an expectation (from some? many? few?) readers that authors should be much more tempered in their responses. ESPN for example won't ever let their authors strongly challenge their readers. To me, that simply makes it a less colorful atmosphere. I would much prefer to see Rob Neyer openly challenge his readers, and even take some of them down a peg or three. You get rid of the riffraff, and you end up with a better setting. I find it bothersome that to get to appreciate Colin and Matt's personalities, you'd have to read their comments on my site or at Primer, and not on this site. It's an (implied) editorial position I don't agree with, but again, I may be in the minority here. IMO.

 
TangoTiger
(57181)
Comment rating: 6

Matt is one of the best things, if not the best, about BPro. As for his "hostility": I find that a response should be a bit more tempered than the comment it's answering. That first comment, which I presume is what you are talking about, came in with guns blazing. So, Matt was fine to be harsher in his comments than he usually is, because the context warranted it.

 
TangoTiger
(57181)
Comment rating: 0

Matt, just to make sure I understand: the difference between pick 1 and pick 30 is 10.92 wins over 6 seasons? I agree that there is incentive to tank (Mario Lemieux v Kirk Muller), and in some sports, there is a huge gap in how much more value the #1 player has over #2 or #4. But, as your numbers are showing, this is not the case in MLB. It's a lot more subtle. Given that, the kind of tanking-type efforts MLB teams engage in are also subtle enough in return. No MLB team will go in for big-time tanking, because there will hardly exist situations where tanking is going to give them the returns. In any case, I agree on changing the draft just on principle, because of the un-American nature of it. If subtle incentives to tank are one reason, then I'm happy to add that to the arsenal.

 
TangoTiger
(57181)
Comment rating: 0

Right, the correct answer would have to be between the two. But, I would think it's far closer to me, because I would be shocked if you can find 10% of starting pitchers who were removed mid-inning who did not allow a runner to reach base. So, the reality is that virtually all mid-inning removals occur when the inning is no longer perfect, and therefore, you have to count that as "1" opportunity, and not "1/3" or "2/3" as your model would specify.

Jun 10, 2010 6:54 AM on Perfection
 
TangoTiger
(57181)
Comment rating: 0

You are right about your first point, and I made a response yesterday on my blog to that effect. I also put in more calculations on my blog that are more enticing.

Jun 09, 2010 10:27 AM on Perfection
 
TangoTiger
(57181)
Comment rating: 0

Agent is absolutely right, as evidenced by followup comments. In order to have a 38% chance of a 1-2-3 inning, the average OBP of the three hitters is .275. This is going to happen with a combination of good pitchers and bad hitters. In a followup inning, you will still have the good pitchers, but you will face much better hitters. So, it would be better to look at a perfect 9 outs, so that you take the lineup order out of the equation. Indeed, you can start by counting the starters who opened the game with a perfect 9 outs, and divide that by the number of starts made. Let's guess this number is going to be 3%. That is, 3% of the time, a start opens with a perfect 9 outs. You raise that number to the power of 3, and you get 0.000027. That's about one in 37,000. And with 350,000 starts since 1900, that works out to roughly 10 expected perfect games.
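The empirical shortcut above can be sketched in a few lines of Python (an editor's illustration, not part of the original comment; the 3% perfect-first-9 rate is the comment's guess, not a measured figure):

```python
# The 3% rate of starts opening with a perfect first 9 outs is a guess;
# cubing it approximates the chance of a perfect 27 outs.
perfect_nine_rate = 0.03
perfect_game_rate = perfect_nine_rate ** 3      # 0.000027, about 1 in 37,000
starts_since_1900 = 350_000
expected = starts_since_1900 * perfect_game_rate
print(round(expected, 1))                       # ~9.4, i.e. on the order of 10
```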

Jun 08, 2010 1:37 PM on Perfection
 
TangoTiger
(57181)
Comment rating: 0

Beautifully said. So if you have 9 guys with the following OBPs: 0.290 0.300 0.310 0.320 0.330 0.340 0.350 0.360 0.370, the chance of a perfect trip through the order (three perfect innings) is .027. And three of those perfect trips gives: 0.0000197. Compare that to 27 straight outs against a .330 OBP hitter: 0.0000201. As you can see, it doesn't really matter how the individual players are spread out. I have other relevant calculations on my blog for those interested (post 5).
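The comparison can be verified with a quick sketch (mine, not from the original comment):

```python
from math import prod

obps = [0.290, 0.300, 0.310, 0.320, 0.330, 0.340, 0.350, 0.360, 0.370]

# One perfect trip through the order (9 outs), cubed for 27 outs
p_varied = prod(1 - obp for obp in obps) ** 3
# 27 straight outs against a uniform .330 OBP lineup
p_uniform = (1 - 0.330) ** 27

print(f"{p_varied:.3g} vs {p_uniform:.3g}")   # 1.97e-05 vs 2.01e-05
```

The two probabilities land within a couple of percent of each other, which is the point: the spread of OBPs across the lineup barely matters.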

Jun 08, 2010 1:11 PM on Perfection
 
TangoTiger
(57181)
Comment rating: 0

Colin, I meant for the denominator (opportunity). The way I read Tommy, if someone had three outings of 6.1 innings (i.e., 6⅓ each), that would count as 19 innings, when it should count as 21 opportunities.

Jun 08, 2010 1:00 PM on Perfection
 
TangoTiger
(57181)
Comment rating: 2

Tommy, I think you might have a problem with your 40% or 38% figure. You are probably counting three one-third innings as one inning, rather than three innings, to begin with. As for the chance of a perfect inning, why not do it the even easier way, and use OBP (the real OBP, where reaching on error counts as reaching base, etc.)? The chance of a pitcher retiring a batter is around 66%, so a perfect 1-2-3 inning is .66^3 = .287. A perfect 27 is .66^27, around 1 in 75,000. Presuming that perfect games occur in slightly more conducive settings, say a true OBP of .300, that becomes about 1 in 15,000. With about 350,000 starts in MLB since 1900, that works out to 22 expected perfect starts.
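As a sketch of that arithmetic (the .66 and .70 out rates are the comment's round figures; the code is an editor's illustration):

```python
p_out_avg = 0.66   # average chance a pitcher retires a batter
p_out_fav = 0.70   # in the favorable settings where perfect games tend to happen

p27_avg = p_out_avg ** 27   # about 1 in 75,000
p27_fav = p_out_fav ** 27   # about 1 in 15,000

starts = 350_000
print(round(starts * p27_fav))   # 23, in line with the comment's rough 22
```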

Jun 08, 2010 6:58 AM on Perfection
 
TangoTiger
(57181)
Comment rating: 0

Jeff, can you find out more about Wakefield? His special covenant was that he gave his team a perpetual option at the same salary (i.e., the reserve clause). This was changed when Weiner took over, probably because this covenant didn't look like a benefit to the player. Readers at my blog, however, suggested that rather than Wakefield being worth the 4MM$ he was paid every year, he was really worth 3MM$, and that the extra 1MM$ a year was in exchange for the perpetual team option. Kind of iffy in my book, but plausible to some extent. I'd just like to know if basically Weiner came in, laid it out, and made the Red Sox change it to something more conventional, like he did with the Marlins, etc.

Jun 03, 2010 9:25 AM on Restructuring Deals
 
TangoTiger
(57181)
Comment rating: 0

Jay, if you have a certain degree of uncertainty regarding BPro's fielding metric (or any fielding metric for that matter), you can look at other metrics to see where he stands. Fangraphs has three such metrics. Total Zone (created by Baseball Projection) has Rolen at +143 runs (+97 since 2002). UZR (since 2002) has him at +95 runs. Dewan's Runs Saved (since 2002) has him at +90 runs. So, calling Rolen around a +150 for his career sounds ok. Clay has Rolen at +184 runs, which is certainly believable enough, in light of the above numbers, and in light of actually seeing him play.

 
TangoTiger
(57181)
Comment rating: 0

You have these three catchers facing 1000 runners on first base for the season: IRod, 20 SB, 20 CS; Carter, 80 SB, 60 CS; Piazza, 140 SB, 100 CS. Which one is more valuable? For the moment (just for the moment, for this one moment), presume that DP rates, hit-and-run rates, opening the hole between 1B and 2B, and taking extra bases on singles and doubles are non-factors. (Just for the moment.)
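For readers who want to work the question out, here is a hypothetical sketch. The +0.2/−0.4 run values are ballpark linear-weights figures I am assuming for illustration; they are not numbers from the comment, and the conclusion shifts with the weights you pick:

```python
# Assumed linear-weight run values (ballpark figures, offense's view):
RUN_VALUE_SB = 0.2    # a stolen base gains the offense ~0.2 runs
RUN_VALUE_CS = -0.4   # a caught stealing costs the offense ~0.4 runs

def net_offense_runs(sb, cs):
    """Net runs the offense gains on steal attempts (lower favors the catcher)."""
    return sb * RUN_VALUE_SB + cs * RUN_VALUE_CS

for name, sb, cs in [("IRod", 20, 20), ("Carter", 80, 60), ("Piazza", 140, 100)]:
    print(name, net_offense_runs(sb, cs))
```

Under these assumed weights, IRod comes out at −4, Carter at −8, and Piazza at −12 net runs for the opposing offenses: the most-attempted-against catcher is the one whose arm erased the most run value, which is the trap in the question.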

May 18, 2010 9:09 AM on Caught Quantifying
 
TangoTiger
(57181)
Comment rating: 0

I took Matt's data for the 3-yr and 4-yr players, parsed it, added a few columns of my own (to match the discussion point above), and have posted it here. If there are any errors or additions, please email me. This is what sabermetrics is all about.

May 17, 2010 12:58 PM on The Cost of OPP
 
TangoTiger
(57181)
Comment rating: 0

Focusing on the 19 guys with the 4-yr deals: 7 were pitchers, and 12 non-pitchers. Of the 7 pitchers, only one re-signed (Hudson: 47MM$, 11 WARP, 4.3MM$ per win). The 6 pitchers who were new-teamers averaged 39MM$, 7 WARP, 5.6MM$ per win. Again, pitcher inefficiency once more. The 12 non-pitchers had 5 same-teamers (38MM$, 8.9 WARP, 4.3MM$/win) and 7 new-teamers (40MM$, 11.6 WARP, 3.4MM$/win). This last one goes against what Matt is saying.

***

Merging the three-year and four-year players, this is what we get:
35 pitchers: 25.5MM$, 3.8 WARP, 6.7MM$ per win
32 non-pitchers: 27.0MM$, 7.8 WARP, 3.5MM$ per win

So, a huge source of bias is how much pitchers are getting paid. This should come as no surprise to anyone.

The breakdown of pitchers:
9 same-teamers: 27.6MM$, 6.2 WARP, 4.5MM$ per win
26 new-teamers: 24.9MM$, 3.0 WARP, 8.3MM$ per win

The breakdown of non-pitchers:
12 same-teamers: 29.4MM$, 9.4 WARP, 3.1MM$ per win
20 new-teamers: 25.7MM$, 6.9 WARP, 3.7MM$ per win
11 same-teamers: 28.7MM$, 8.3 WARP, 3.5MM$ per win (excludes Chipper Jones)

***

The strongest conclusion: pitchers are severely overpaid relative to non-pitchers. The second strongest conclusion: teams signing pitchers other than their own to long-term deals are taking a huge risk. Thanks Matt for the data. This second one really surprised me, to the extent it's happened.
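The $/win figures above are just average dollars divided by average WARP; a tiny helper (an editor's sketch, not Matt's code) reproduces a couple of them:

```python
def dollars_per_win(avg_millions, avg_warp):
    # Average contract dollars (in $MM) divided by average WARP
    return avg_millions / avg_warp

# Merged groups from the comment:
print(round(dollars_per_win(25.5, 3.8), 1))   # 6.7 for the 35 pitchers
print(round(dollars_per_win(27.0, 7.8), 1))   # 3.5 for the 32 non-pitchers
```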

May 17, 2010 12:42 PM on The Cost of OPP
 
TangoTiger
(57181)
Comment rating: 2

"Tango, it appears you've selected on the dependent variable." Matt, I obviously didn't make myself clear. Of course you CANNOT do what I did. I said this: "again, I'm ONLY using this as a proxy, until Matt can generate this critical piece of data". I shouted that out! Therefore, there's no need for your first two paragraphs, since I already conceded that you can't do what I did.

***

I'm trying to show whether there's bias in the data. I would like to have the previous season's (or PECOTA's, actually) WARP and PA or IP. You are suggesting that a good proxy for that is the salary signed. That's a good enough suggestion (though it is still a proxy for what we want). If we break the 48 3-yr players into thirds, we can put the 16 guys who signed for more than 24MM$ in the "great player" pile, the 16 at under 16MM$ in the "average player" pile, and the other 16 in the "good player" pile. Here's what we have:

GREAT players
8 same-teamers: 29MM$, 8.2 WARP, 3.5MM$ per win
8 new-teamers: 31MM$, 2.1 WARP, 14.8MM$ per win (!!!)
Six of the 8 new-teamers were pitchers, only 2 were position players.

GOOD players
4 same-teamers: 23MM$, 6.4 WARP, 3.6MM$ per win
12 new-teamers: 19MM$, 2.7 WARP, 7.0MM$ per win
Eight of the 12 new-teamers were pitchers. Three of the 4 same-teamers were pitchers.

AVERAGE players
3 same-teamers: 14MM$, 7.0 WARP, 2.0MM$ per win
13 new-teamers: 12MM$, 3.2 WARP, 3.8MM$ per win

Yes, very damning, if the salary paid can be used as a proxy for talent.

***

It seems to me that pitcher v non-pitcher deserves its own breakdown. I count 28 pitchers, 20 non-pitchers.

PITCHERS
8 same-teamers: 25MM$, 5.5 WARP, 4.5MM$ per win
20 new-teamers: 20MM$, 1.8 WARP, 11.1MM$ per win (!!!!)

NON-PITCHERS
7 same-teamers: 23MM$, 9.7 WARP, 2.4MM$ per win
13 new-teamers: 18MM$, 4.4 WARP, 4.1MM$ per win

First thing we see: pitchers are way overpaid. So, that's going to be a huge source of bias, and therefore, any subsample that doesn't have the same pitcher to non-pitcher split carries that bias. Even so, we still show a huge split. So, this still goes toward Matt's point.

Chipper Jones is one of the 48 players, generating 21 wins and being paid only 37MM$. Now, it's certainly not like the Braves knew something that everyone else didn't here. But any sample that Chipper is a part of will end up looking really, really good.

That said, the most striking thing in the sample, and this goes toward Matt's point, is this: of the 8 pitchers who generated the most wins, 6 did it with the same team, and 2 with new teams. Of the 20 pitchers who generated the fewest wins, 18 did it with new teams, and 2 with the same team. Here's the list of pitchers, with an "x" meaning same-teamers:

x Jason Isringhausen
x Livan Hernandez
x Brad Penny
Jason Marquis
A.J. Burnett
x Ryan Dempster
x Kelvim Escobar
x Freddy Garcia
Braden Looper
Bob Howry
Chad Bradford
Jon Lieber
x Vicente Padilla
Tom Gordon
Kyle Farnsworth
Scott Eyre
Danys Baez
Jamie Walker
Miguel Batista
Scott Schoeneweis
Armando Benitez
Matt Morris
Esteban Loaiza
x Odalis Perez
Eric Milton
Jaret Wright
Jason Schmidt
Adam Eaton

Notice the clumping. The non-pitcher list certainly looks much more random than this, though with a bit of clumping at the bottom:

x Chipper Jones
Rafael Furcal
x Carlos Guillen
Omar Vizquel
x Randy Winn
Mark DeRosa
Bengie Molina
x Brian Giles
x Geoff Jenkins
x Melvin Mora
Jacque Jones
Aubrey Huff
x Michael Barrett
Shawn Green
Alex Gonzalez
Frank Catalanotto
Dave Roberts
Adam Kennedy
Juan Encarnacion
David Dellucci

So, I think Matt has a great point, at least as it pertains to pitchers. And, I think, it's easy to explain: pitchers get injured far more, their talent hinges on their arm, and teams know far more about their own pitchers.

I will disagree with Matt with regards to Ryan Howard, for three reasons:
1. Howard is not a pitcher
2. Howard is young
3. Howard was signed two years out

Therefore, this discovery, which looks great, won't apply very much, if at all, to Ryan Howard. Basically, Howard's representative group is so extremely limited that it's hard to look at these results to infer something about the Phillies and Howard. Matt's research stands well on its own, and it doesn't need any tie-in to Ryan Howard to carry weight.

May 17, 2010 11:00 AM on The Cost of OPP
 
TangoTiger
(57181)
Comment rating: 0

Two pieces of information that I would like to see:
1. the number of PA or IP in the year prior to signing
2. the WARP in the year prior to signing

The reason is to see if there is a bias there, that perhaps the overpaid players are the scrub players, not the star players. This is critical.

Just for fun, I'll try to approximate that. I took your list of 48 players that signed 3-year deals. If in any of the three following seasons they had at least 2 WAR, then I will *presume* they were one of the good players in the season prior to re-signing (again, I'm ONLY using this as a proxy, until Matt can generate this critical piece of data).

We had 20 players who signed a 3-yr deal at the age of 30 or under. Of those 20, 13 were "good" (based on the above definition) and 7 were not. Of these 13 good players aged 30 or under, 8 re-signed with the same team, and 5 signed with new teams. The 8 same-teamers averaged a 21.5MM$ deal, for an average 3-year WARP of 6.5 per player. That's 3.3MM$ per win. The 5 new-teamers were paid an average of 25MM$ and generated an average of 6.0 WAR. That's 4.2MM$ per win.

Obviously, this is only 8 and 5 players, so we've got huge uncertainty bands here. But this is the way I'd like to see the analysis progressing. The focus should be on players who are considered good and who are no older than 30 (maybe 31). Under this more focused view, though with a limited sample, Matt might be showing something interesting. More work needs to be done.

May 17, 2010 9:29 AM on The Cost of OPP
 
TangoTiger
(57181)
Comment rating: 1

Great stuff, and thanks for posting all the research. This is just how I like to see this done, and this will be fun to go through.

May 17, 2010 7:23 AM on The Cost of OPP
 
TangoTiger
(57181)
Comment rating: 1

Matt: sure, Rivera's Fangraphs WAR looks low. From 2002-2009, Rivera's WPA was +29 wins (which becomes the upper boundary of what you can assign him). His WPA/LI (which removes all traces of leverage) was +15 wins. So, I would give him +22 wins. Fangraphs' WAR is somewhat lower at 19.5 because it uses FIP, which, while great for Rivera, is not as obscenely great as his BaseRuns would be. So, I agree with you on Rivera (specifically, not relievers in general) that his fWAR is too low. This is not a chaining issue, though. It's a FIP / BABIP issue, and as I said, I agree with you there, which is why I split the difference. *** Yes, we would need to look at the service class for the relievers and starters, since that may bias the salary. I'm pretty sure, though not positive, that the ages were the same. Pretty sure anyway. *** Your conclusion is fine that this is a hard topic. That's pretty much the only summary opinion we can agree with.

 
TangoTiger
(57181)
Comment rating: 0

According to the 2009 team totals on Fangraphs for pitchers and non-pitchers, I get:
593 wins, non-pitchers
461 wins, pitchers
1054 wins, total

The total wins (1054, or 35 per team, implying a .283 replacement win%) is pretty close to where I like to see it. Pitchers got 43.7%, which is right around what I give. So far, so good. Unfortunately, it's not apparent in the WAR calculation how much is derived from relief innings and how much from starter innings. We can try to work it backwards. Fangraphs does show the RAR (runs above replacement) as 3588 for starters and 868 for relievers (total of 4456). A straight 10:1 conversion would imply 446 wins, so there's an extra 15 wins unaccounted for, which could be the leverage portion for the ace relievers. If you make it 86.8 wins for the relievers, unleveraged, and another 15 wins for the leverage portion, that gives us 101 WAR for relievers, which is 9.6% of all wins. IIRC, my prior research shows that relievers get 10% (maybe 11%?) of salaries. I don't see much of an issue here, so I don't see the basis for saying that reliever WAR on Fangraphs is much too low. The evidence would suggest it's spot on.
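The back-calculation can be written out explicitly. This is an editor's sketch of the comment's arithmetic; the 10 runs-per-win conversion is the comment's assumption:

```python
total_wins    = 1054    # all players (Fangraphs 2009 team totals, per the comment)
pitcher_wins  = 461
rar_starters  = 3588    # runs above replacement
rar_relievers = 868

runs_per_win = 10.0
unleveraged = (rar_starters + rar_relievers) / runs_per_win   # 445.6 wins
leverage_bonus = pitcher_wins - unleveraged                   # ~15 wins unaccounted for
reliever_wins = rar_relievers / runs_per_win + leverage_bonus # ~102 wins
print(f"{reliever_wins / total_wins:.1%}")                    # 9.7%, near the ~10% of salaries
```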

 
TangoTiger
(57181)
Comment rating: 1

FIP: I agree with you, as I have similar reservations. I usually split the difference, and take it half FIP and half (component) BaseRuns. Chaining: I don't know what Colin's criticism of the creation of leverage is. If it's that someone like K-Rod forces his own leverage to be higher, then I agree. However, there are different ways to calculate Leverage Index (not to be confused with BPro's similarly named, though completely different, animal LEV), and you should use the one that makes the most sense. In this case, you use "gmLI", which is the Leverage Index of the game when the pitcher entered. As for the "bumping" up of the leverage, such that everyone moves up a notch once the top dog goes down: yes, that is pretty much presumed to happen. Absent an actual study, this would have to be the status quo, no? That if Mariano Rivera goes down, Joba Chamberlain takes his place, every other reliever moves up a rung, and the minor league reliever becomes the mopup guy. The same applies, let's say, with NHL defensemen. I still don't see the "such low win values" though. I also don't see any evidence that Fangraphs says that each of the 30 teams overpays for their relievers. Can you cite your reference or work for that claim? When I do my WAR (tWAR if you will), I get 10% of all wins going to relievers, and 10% of all salaries also goes to relievers. I'm pretty sure I verified that against rWAR (Rally's WAR), though I haven't against fWAR (yet, though I would guess this to be true as well). Can you tell me what the total WARP is for relievers, relative to all players?

 
TangoTiger
(57181)
Comment rating: 1

Matt, can you explain why you think the Fangraphs numbers come out "very low" for relievers? You are making a summary opinion while presenting no evidence. You know where I stand on people who do that. If it's the use of FIP, then I can certainly see your point. If it's the chaining, then I don't (obviously, since I'm the one who supplied the chaining algorithm). BaseballProjection doesn't use FIP (I think he uses runs allowed after removing fielder effects), and he does use chaining (I think my version). Do you agree with them, or do you have an issue there?

 
TangoTiger
(57181)
Comment rating: 0

As noted by the readers at The Book Blog, the value of the "reputation" comes not in shutting down the running game (which, as MGL has shown, is already handled by the SB and CS numbers), but in the other parts of the running game, like taking the extra base on hits. For example, if you have Superman behind the plate, and no one tries to steal, AND they also take a shorter lead off first base, this might prevent the runners from taking an extra base on singles and doubles, and might get them doubled-up more often. Setting that particular point aside (perfectly valid, but not really the particular trait being discussed here), I'd encourage the readers who are skeptical to actually work out the numbers specifically. By the way, this is exactly the same situation with the baserunning numbers. Dan Fox introduced them in the annual a few years ago, and Dan did it exactly like MGL says we should do the catcher and arm numbers. They all work identically.

May 16, 2010 8:32 PM on Caught Quantifying
 
TangoTiger
(57181)
Comment rating: 1

MGL is right in the grand scheme of things. If you wanted to break down his baserunning game to "profile" him, you can include his deterrence. But, in terms of his overall value or impact, it makes no difference. This is no different than an outfielder whom everyone runs on, but who manages to throw out 20% of the runners, versus an outfielder no one runs on at all. Overall they are equivalent. The interesting thing is to profile them, but that's an aside.

May 14, 2010 4:27 AM on Caught Quantifying
 
TangoTiger
(57181)
Comment rating: 0

I'll recommend these articles. Mine: http://www.tangotiger.net/catchers.html Plus the followup in THT 2008 Annual on Google Books. Max Marchi: http://www.hardballtimes.com/main/article/two-dimensions-of-catching/ Chris Dial: http://www.baseballthinkfactory.org/files/primate_studies/discussion/cdial_2003-01-29_0/

May 13, 2010 11:10 AM on Caught Quantifying
 
TangoTiger
(57181)
Comment rating: 0

Wonderful, thank you.

 
TangoTiger
(57181)
Comment rating: 0

I think what would be interesting is if you break it down further by WARP class in the preceding year. So, anyone with a WARP of 3+, WARP of 1.5 to 3.0, and WARP under 1.5. Something to make sure that the two classes you are comparing are differentiated only on the basis of being re-signed or not. As it stands, the bias in talent level might be fairly strong in explaining the discrepancies. Or, include dummy parameters (for WARP class and age class), run a regression, and let's see what you get. I'd like to see the regression equation, especially if it shows a 0.5 or higher for "re-signed with same team".

 
TangoTiger
(57181)
Comment rating: 0

Matt, I presume you mean that the list of free agent players was the proprietary list that someone compiled for you? The actual salary they signed is possibly proprietary or possibly from Cot's Contracts? The WARP data is obviously from BPro. Did I get that right? If so, then the list of players and salaries from 2006-2010 can be obtained here: http://sports.espn.go.com/mlb/features/freeagents?season=2006 The pain in the butt is always in the linking of the ESPN ID to the BPro ID. Obviously, you linked your source's ID to BPro's ID. Anyway, I'm trying to understand how far you can give out the data within whatever arrangement you agreed to. It seems that for us to replicate and extend your work, someone would need to simply link ESPN to BPro IDs.

 
TangoTiger
(57181)
Comment rating: 0

Matt, I think it would be helpful if you can provide a data dump as I requested, so we can look for possible bias, and perhaps some of us can do our own studies based on the same dataset. Is this a reasonable request?

 
TangoTiger
(57181)
Comment rating: 1

Matt, thanks for the data. Looking at it, it becomes imperative to find matching pairs to avoid selection bias. For example:

26-31 YEARS OLD
Re-signed: 1.64, 2.48 (N=9)
Newly signed: -0.34, -0.18 (N=5)

Those newly signed are pretty much useless. I don't see what their WARP was in the year prior to being signed (which I hope you can also include), but I'm going to guess it's also low. They were probably injured. Something about them explains how they can sign a two-year contract and become replacement level. Now, if you tell me that BOTH groups had a WARP of, say, 1.80 in the year prior to signing, then that would be VERY interesting. (Again, presuming no health issues.)

Look at this data:

THREE-YEAR DEALS, 27-30 YEARS OLD
Re-signed: 1.99, 2.11, 1.59 (N=9)
Newly signed: 1.13, 1.43, 0.73 (N=12)

Again, there's no way these are two similar groups of players. The newly signed must have included a lot of platoon players, while the re-signs are regulars. Unless of course you tell me that their WARP in the preceding seasons was around 1.80 for both, which I doubt would be the case.

And here:

FOUR-YEAR DEALS, 27-30 YEARS OLD
Re-signed: 1.47, 4.17, 2.87, 1.03 (N=3)

Clearly, with 3 players and that kind of jump, there was one superstar season in there that skews it all.

***

If I do a weighted average of all the 27-30 year olds in your list, this is what I get: 21 players re-signed, with a first-year WARP of 1.77 and a second-year WARP of 2.56. That's pretty interesting, for sure. The question is why. Is it because of insider knowledge? Were they 2.10 players who DROPPED to 1.77 and then rose to 2.56? We desperately need more context, specifically their WARP in the preceding season. And PA (plate appearances) would be good too. 23 players signed with new teams, with a first-year WARP of 0.91 and a second-year WARP of 1.20. It looks to me like players who signed multi-year deals with new teams include a lot of platoon players.

Given the small number of players (44 players 30 and under) who signed multi-year deals, a couple of big seasons could skew things. So, more information please, more context, as noted above, and I think we can move forward better. Thanks, and I love it when others roll up their sleeves and do all the hard work. Everyone benefits.

 
TangoTiger
(57181)
Comment rating: 3

Matt, interesting stuff. Can you publish your data, something like: playerid,ageSigned,yearSigned,DollarsSigned,NumYears,WARP0,WARP1,WARP2,WARP3,WARP4 (for each year they were signed, with WARP0 being the WARP prior to signing)

 
TangoTiger
(57181)
Comment rating: 1

Matt, good job. I like seeing stuff like this. The numbers seem fairly high (both the WARP numbers and the MORP). Can you show the sum totals by the three service classes, for WARP3, MORP, and Cost?

 
TangoTiger
(57181)
Comment rating: 3

Russell is one of the best saberists out there. I encourage everyone to read all his works, be it here or at MVN. *** One of my issues with regression at this granularity level is when I see something like this: ".742 * inning breaks". Well, we know that inning breaks are 2 to 3 minutes each, depending on which TV network is involved. So, what the regression is saying is that there are some 1.5 to 2.0 minutes it's removing from what we know and distributing to other variables, even though, in this particular case, the variable should be completely independent. That is, the between-innings break has no relationship whatsoever to any other event. But the regression is finding some relationship. *** Cutting one minute from the non-action between innings would save some 17 minutes of game time. The players loaf around too much by their own admission. But, as one of the players recently admitted on his blog, "we got to gets paid". So, this is really the issue: how can you cut down on game time while not touching the non-game time? Which is a very weird thing to try to optimize from a fan-experience standpoint. Indeed, what's to stop MLB from increasing between-innings time, even if we reduce the actual game time, so that we are always pushing the same rock up the same hill? Sisyphus, anyone?

May 03, 2010 4:30 AM on Why Are Games So Long?