Reducing the bonus for having your QO player signed by another team is logical, for the same reason that reducing the penalty for signing a QO guy is.
Basically, if I get a 1st round pick for someone else signing my just-over-$17-million-a-year guy, that means I need to be able to sign him for like 4 years at $50 million to make forgoing the $20 million compensation pick worth it. This reduces the market for the player, just like having to forfeit a $20 mill pick did (though by only 1 team), so clearly it's something the players would be against. It also really strongly works against the whole principle, which is about small market clubs having their players "stolen" by larger ones.
The $50 million minimum contract and once-per-player QO rules make these sorts of marginal gambles unappealing. If a guy is possibly going to sign for less than $50 million, maybe it's better to just pay him and hope he gets better enough to be worth a bigger deal (and a QO) when the contract ends. Likewise, if the team can't make a QO a second time, the player might be more willing to grab a just-under-market $17 million and go for a more lucrative deal the next year. So now the team has incentive to sign guys like that and a disincentive to gamble on the QO for guys they don't want to keep.
Having been an avid baseball fan when Bonds was at his peak, I will say that I was both amazed by what he was doing and perhaps under-appreciating it. One aspect of that is probably just that it's very difficult to revere a thing that's actively occurring - much easier to write and appreciate legends after the fact, and I'll bet that most of the best players of all time were under-appreciated at their peak (though most of them didn't "retire" while still at that peak).
In the case of things like the single-season <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=HR" onmouseover="doTooltip(event, jpfl_getStat('HR'))" onmouseout="hideTip()">HR</a></span> record, though, McGwire had just set it, so while Bonds accomplished something that may last for a generation or two, at the time it seemed almost inevitable that the record would be pushed upward. For the career HR record, people were already aware of the steroids allegations and it all seemed pretty damning. As for the other measures by which he was amazing - a lot of those measures weren't really mainstream. Only those of us following sabermetrics at the time had any idea.
I will say that Bonds was an impossibly dangerous hitter, even before the juice (though I guess we don't know exactly when that started). I remember watching <span class="playerdef"><a href="http://www.baseballprospectus.com/card/card.php?id=17635">Buck Showalter</a></span> call for an intentional walk with the bases loaded and thinking that move actually made perfect sense (Bonds was basically averaging a double per hit, which means any hit he got was likely to clear the bases, therefore the slightly lower chance that <span class="playerdef"><a href="http://www.baseballprospectus.com/card/card.php?id=1388">Brent Mayne</a></span> would get a hit seemed worth it).
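To make that logic concrete, here's a toy model of the decision - every probability is invented for illustration, not an actual stat:

```python
# Toy model of the bases-loaded intentional walk above. Assume a 2-run
# lead with 2 outs: the defense cares about the chance of giving up 2+
# runs, not the expected number of runs. All numbers are made up.
p_hit_bonds = 0.35  # Bonds gets a hit; "a double per hit" likely clears the bases
p_hit_mayne = 0.25  # Mayne, a much less dangerous hitter

# Pitch to Bonds: a hit probably scores 2+ and erases the lead.
p_blow_lead_pitching = p_hit_bonds

# Walk Bonds: concede 1 run for sure, but now only a Mayne hit costs the lead.
p_blow_lead_walking = p_hit_mayne

print(p_blow_lead_pitching, p_blow_lead_walking)  # 0.35 vs. 0.25: the walk wins
```

Under these assumptions the walk trades a guaranteed run for a lower chance of losing the lead entirely, which is exactly the trade Showalter made.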
The second Wild Card isn't there to fix the playoffs. It's there to fix the regular season - the more teams that are in contention late in the year, the more fans are engaged with the sport all summer. Baseball's problem is that the season is so long it's difficult to keep people interested in a bad team, and the records are so well-established and important that shortening the season isn't really an option.
The Wild Card game is a pretty big disadvantage for the 2 teams that have to play in it, even if it leaves them on (mostly) the same footing once they win.
I do think one of those "sterile" things about the mathematical recommendations is that they don't incorporate the fans. Or, more accurately, they assume that winning is everything and that the fans agree. Clearly this isn't true given things like the effects of celebrity and of team loyalty on ticket sales, but that stuff is generally hand-waved. If we get the wins-based value right, we can always adjust for the star-power of a player after the fact, right?
But inherently, people watch these sports because of the drama and the human storylines. They aren't watching to see two spreadsheets generate random numbers at each other. Teams like the traditional closer because it creates a good story - the dominant alpha who comes in and locks down a win when you need it most. It sucks when your closer gets beat, but it's a much better story than when you are watching some scrub blow the game while the star sits on the bench.
The same is true about pulling the starter - you could pull the starter after two turns through the order, but you'd lose the myth of the big man who carries the team on his shoulders. So you'd need a really significant increase in win probability over the course of the season to make up for losing all those great stories of the starter who guts out one more inning to save a bullpen worn down the day before.
So all of that is about cultural shift - you'd need to find new storylines to replace the old ones, and they'd have to be compelling as well, not just "I'm going to call because I've got pot-odds to draw to my flush" but "I'm going to call because I think you're bluffing."
Are you looking for something beyond removing the frozen players from your pool, adding up the frozen costs, and then redistributing the leftover cash amongst the remaining players in the same way as you would if your initial pass had left you underbid?
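For what it's worth, here's a minimal sketch of how I'd read that redistribution step (the function name and the proportional-scaling choice are mine - just one way to do it):

```python
# One reading of the freeze adjustment: drop frozen players from the pool,
# subtract their salaries from the budget, then scale the remaining bids
# so they sum to the money actually left to spend.
def adjust_for_freezes(bids, frozen, total_budget):
    """bids: {player: bid value}; frozen: {player: frozen salary}."""
    pool = {p: v for p, v in bids.items() if p not in frozen}
    remaining = total_budget - sum(frozen.values())
    scale = remaining / sum(pool.values())
    return {p: v * scale for p, v in pool.items()}

# e.g. a $60 budget with a $5 freeze leaves $55 spread over $50 of bids
print(adjust_for_freezes({'A': 10, 'B': 20, 'C': 30}, {'A': 5}, 60))
```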
Great article, thanks Mike!
Some quick questions:
1) How do you set the initial bid for someone with little to no prior production (i.e. minors only or just a cup of coffee), or someone coming back from injury? Do you look for comps and set it based on them, or wait for an industry consensus?
2) How much effort do you spend preparing to monitor your categories during a draft? In other words, you mention mentally shifting away from buying closers if you've gotten a couple and thus looking to shift money into other categories, but do you do anything in particular to help track this for, say, <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=WHIP" onmouseover="doTooltip(event, jpfl_getStat('WHIP'))" onmouseout="hideTip()">WHIP</a></span> or steals? I'm thinking of something like associating players with specific stats either on the + or - end.
3) You refer to the bids as "limits" but then emphasize that they are a guide / frame of reference, so I was wondering how you view prices using them - are you just trying to grab a bargain early in the draft, then shifting money into the categories/positions you need later? Are you keeping a rough note of whether the market is tending to over-bid relative to your limits? This sounds like a draft strategy question, but I'm really trying to get at the bid limit philosophy - in other words, if you put $24 on <span class="playerdef"><a href="http://www.baseballprospectus.com/card/card.php?id=53395">Todd Frazier</a></span>, then notice that the consensus has him at $28, will you push him up to $26/$27 in order to better reflect the total market (and then choose not to draft him at $25 because you think that value is too high), or will you try to make your limits reflect prices you are actually happy to pay (so that any player you can get at or below the limit is worth grabbing unless you're saturated in a category/position)? This seems like a sticky issue, because you don't want to be too far away from the market price on players you don't like (or the rest of your prices will be too high), but you also don't want to set a bid limit you won't be happy about and then have to second-guess the limit if the market turns out to be soft.
I do wonder if this in-a-vacuum approach misses one big feature of what makes a big-leaguer: the ability to adjust. Of course, if you can pull that out of the statistics alone, then great, but it seems like people have had a difficult time with that. So having both a history of looks and a history of stats that informs an assessment of the adjustments being made seems like it would help the evaluation. I also think that work ethic and the ability to learn and adapt should be considered tools and graded at the same level as the other ones, instead of just being mentioned when the evaluator thinks of it. Putting a grade on them means making an effort to evaluate the evidence, and these "makeup" tools are probably the hardest to judge.
Which is why I think they need to agree ahead of time on a system of punishment and then just apply it. In other words, if the only case-by-case decision is "does this constitute a violation of the code of conduct?" it stops being some arbitrary call where Rice gets 2 games and Hardy gets 10, for example, or where you have to look at those numbers and say "the league thinks that N games is a sufficient penalty for domestic violence". Of course, critics will criticize, but it's much easier to simply say - "We take this seriously, we don't want any players behaving in this way, and we are applying the predefined penalties."
Which of course implies that the assumptions are unrealistic so a deeper analysis probably would need to really nail down player variance due to random luck. It would be interesting to see an analysis of that, though, and its effect on the relative value of a lottery ticket based on a team's expected performance. Might have big implications for bringing up young players during a season where you are on the cusp (a la the Mets and Cubs this year).
Great article, I wanted to explore this bit:
"Suppose that a team felt that it had a roster with 84-win talent and was choosing between a player who was a guaranteed 2-win upgrade and one who might bring them nothing or might bring them four wins. Eighty-six wins probably doesn’t secure a playoff spot, but 88 might, and it’s only by signing the higher variance guy that they have a chance to make the playoffs."
If you had a team where replacement level was 48 wins, 12 players were worth ~1 win (stdev of 0.1), 12 were 1-3 wins (mean 2, stdev 0.5 let's say), and the 25th man was 0 wins with 0.1 stdev., you have a mean wins of +36 (=84 expected wins), total variance of 3.13 wins (so stdev ~= 1.769), and your odds of hitting 88 wins are ~1.2%. Replace the 0-win guy with a 2 win guy (at 0.1 stdev), and your odds go up to ~12.9%. Replace with a uniformly distributed 0-4 win guy (mean is 2, variance is 1.333) and the odds are now ~17.2%. So the extra variance is effectively worth ~.324 wins.
However, if we now replace a 1-3 win player with a 6-8 win player (upping our mean to 89 without changing the variance), the higher-variance player is now worth ~.485 wins less than the more reliable 2-win player. So what this means is that by committing to a lottery ticket early in the offseason, you are hurting your chances of building a truly competitive team in the interest of building a possibly-competitive one.
Still, gaining 2 wins is worth something, even if that something changes from an effective 2.324 to an effective 1.515 as your team gets better. I also didn't account for how variance of the number of wins needed to make the playoffs would affect the outcome (and I assumed that there is no covariance between player outcomes, which is not really true, even if you've already factored in the team context). The effect was quite a bit larger than I expected, though.
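For anyone who wants to check the arithmetic, here's a minimal sketch of the same normal-approximation calculation (roster makeup and standard deviations as above, player outcomes assumed independent):

```python
from math import erf, sqrt

def p_at_least(target, mean, sd):
    # P(team wins >= target) under a normal approximation
    z = (target - mean) / sd
    return 0.5 * (1 - erf(z / sqrt(2)))

base_mean = 48 + 12 * 1 + 12 * 2          # 84 expected wins with the 0-win 25th man
base_var = 12 * 0.1**2 + 12 * 0.5**2      # variance contributed by the 24 regulars

print(p_at_least(88, base_mean, sqrt(base_var + 0.1**2)))         # ~1.2%: 0-win guy
print(p_at_least(88, base_mean + 2, sqrt(base_var + 0.1**2)))     # ~12.9%: steady 2-win guy
print(p_at_least(88, base_mean + 2, sqrt(base_var + 4**2 / 12)))  # ~17.2%: uniform 0-4 guy
```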
This isn't the problem the article is highlighting. The problem is precisely that public opinion and perception of reactions (by the league, the team, etc) are immediate, but due process is slow. If you are convicted of a crime, there are legal penalties applied, and it is easy for the league to mandate additional contractual or eligibility penalties. Would that be enough?
None of those penalties will be triggered for months or years, and the effectiveness of the justice system at assigning guilt is quite dubious (in effect, the system is designed to assign innocence as often as possible). So we are left with a dilemma - innocent until proven guilty is a great way to protect individuals from others who would seek to abuse the system, but it disproportionately favors the powerful and respected over the weak and unknown. Those who require protection find themselves fighting to receive it, rather than being granted it by default.
So how does this apply to baseball, and why should the league be involved at all? Well, the answer is basically that the major sports are, as the article says, charged with a cultural responsibility that comes from their visibility. The virtues the sport trumpets and the vices it punishes must reflect the values of society, or the sport risks losing its place in the pantheon. The same is true for movies, TV, music, youtube channels, games, celebrities, and so on - to be mainstream, you must celebrate mainstream values and come down hard against transgressions. You do this not just to follow the trends but to help reinforce the values that your fans believe in.
So, what should the standard be, and how do we punish transgressions? I don't know, honestly. Probably the best course is to create a system of punishment similar to the steroid abuse system - first code of conduct violation is a hefty suspension designed mostly as a public shaming, second violation is a year suspension, third is a lifetime ban. The standard of guilt should be "the preponderance of evidence" as opposed to "beyond the shadow of a doubt" - you are punished by the league if it seems more likely that you did it than that you didn't. If you look at how the ARod stuff played out, that seems like a reasonable model to me - the team is off the hook for payment while the player is suspended, but they are then saddled with any remaining contractual obligations to a player who has lost significant star appeal and marketing value, which gives them an incentive to help keep their players from breaking the code of conduct.
I think it worked really well to convey the feeling that many of us have about confronting these issues. It succinctly expresses the idea that this is one of those things you want to ignore until it goes away, but you know you can't.
The problem is that attaching the draft-pick compensation to the player effectively steals value from the player - other teams will be giving up a pick to sign him, so they will offer less money. Since the team basically gets to harm the player's value in this way, it isn't just an "offer".
That said, even pushing the value to $21 million doesn't solve the problem, it just narrows the number of players affected. But affecting fewer players isn't really the best way to do it, because then even fewer teams are getting compensated for the loss of their best players. And you will always have someone who is right on the border, so does it matter who that is? I mean, if there's a player who thinks he will get more than $15 million of 2016 value from a free-agent contract despite the draft pick, should we feel bad that he is being denied $5-10 million?
Here's a better system: leave it as a 1-year, average-of-the-top-125-salaries contract, but instead of transferring a draft pick from the signing team, calculate the AAV difference between what they sign the player for and the QO, then award the team that loses the player a competitive balance pick (with appropriate budget) and a budget bonus equal to that difference. The team gets to spend more in the draft, they get an extra pick of the appropriate value no matter how many players they lose or another team signs, and it doesn't arbitrarily adjust the cost of certain free agents - there is no extra cost to the signing team. Plus the team that lost the player can trade their bonus pick. If a team makes a QO to a player who just wants more years, they still get a pick when he signs elsewhere, they just don't get any bonus budget.
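A sketch of that bonus calculation (the function name and the units, millions of dollars, are mine):

```python
# The compensation idea above: the team losing the player gets a
# competitive-balance pick plus bonus draft budget equal to the AAV gap
# between the qualifying offer and the player's new contract.
def compensation_bonus(qo_aav, signed_aav):
    # Signing at or above the QO's AAV yields the pick but no bonus budget
    return max(0.0, qo_aav - signed_aav)

print(compensation_bonus(17.2, 12.0))  # a $5.2M draft-budget bonus
print(compensation_bonus(17.2, 20.0))  # player beat the QO's AAV: no bonus
```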
Good to know that it doesn't matter whether the pitcher gets on. Of course, you could argue that being put on base specifically to degrade his performance might have psychological effects on him - would he have tried to strike out to make sure he didn't get on base? Would he have "stolen" second to get thrown out? Would he have just taken the base like normal and then been angry about the idea that they did that?
In retrospect, of course, it was worth the shot since we know that he didn't give up 2 runs. In the moment, though, there would have to be a pretty strong effect to give up that extra runner. It seems obvious now that trying anything to get Davis off his rhythm would be desirable, but then again, would anyone have expected Familia to blow 3 saves? If he had shut the Royals down each time he came in, would the narrative have been that the Mets had the obviously, inevitably invincible closer (like he had been for half a season leading up to the World Series)?
I think it's either the right play all the time or none of the time (sounds like none of the time, based on this analysis). It was at least as likely that Davis would blow the save as it was that they'd end up needing him to in the first place.
Great article! I think the problems it points out are really interesting ones, and I had never thought about how the salaries at the low end of the scale for the front office might affect the diversity of the people in those positions.
That said, I don't know how you can enforce diversity of hiring for singular positions. It's one thing to penalize a team that fails to employ certain percentages of minorities and of women overall, but if you mandate that for the singular GM or managerial positions, how do you enforce it? If every team interviews at least 1 woman and 1 minority for their GM position, and then they all decide on the white guy, which teams were being racist/sexist?
So the bottom-up approach is better - increase pay at the bottom to attract more talent, impose reasonable standards for hiring diversity at the lower levels, and maybe even get to the point where people are being recruited into this type of work at the college level as preparation for the profession. Some day there could be a stathead draft alongside the Rule 4 draft in June!
One positive of the stathead trend is that being white and Ivy League isn't a prerequisite for being a good statistician. Once teams have broken the barrier of having people who never played minor-league ball as managers and coaches, it becomes a lot easier to open the door to women as managers, coaches, GMs, etc. It probably won't happen with this generation, but given that the current crop of managers seemed far-fetched in 2000, perhaps by 2030 anything is possible.
This piece deserves a closer editing pass:
- in the 5th paragraph, it says "affect such an approach", but I think it means "effect" (that is, I don't think the intent was to say that they are only looking for people who give the appearance of the approach, or that they are looking for people to change the approach, but rather that they want someone who will make the approach work).
- In the 6th paragraph, "It’s not owners, presidents or GMs down the forfeiting wins" should be "down there", I guess? Or "down with"?
- Then the 7th paragraph has "looked up and saw almost virtually no one" - "almost virtually" is redundant.
It's weird that the Royals had heart and team chemistry only in games 1, 2, 4, and 5.
I don't think there's anything wrong with using Familia in Game 3, but I do think Collins should have either started the 8th with him or let Clippard have the whole inning. Bringing him in to face Hosmer doesn't make any sense. That said, Clippard did mess up pretty badly by walking 2 guys, so what can you do?
If Duda gets a hit (or a walk-off homer, god forbid!) in the 9th, this story has a section on the questionable choice to use Davis to face the bottom of the order in the 8th, giving the middle of the order the chance to get to him in the 9th. Instead, it's a brilliant move that saved the bullpen.
If that same play happens in Game 4 of the NLCS, you shrug and go "oh well, tough hops happen" and then you complain that Collins left Clippard in too long and that Familia seems to have lost it at the worst possible time (the game was only tied after the error).
If it happens to a fielder known for his fielding, you shrug and go "oh well, tough hops happen" and then you complain about Collins and Familia and maybe Cespedes. Or you start some elaborate speech about how the Royals play the game right because they don't strike out *cough* <span class="playerdef"><a href="http://www.baseballprospectus.com/card/card.php?id=18203">Harold Reynolds</a></span> *cough*.
It happens to Murphy and the fun storyline is how he's cost them the World Series... because I guess if he makes that play then Familia strikes out Moustakas rather than him singling in the 2 runs? Or something like that.
Murphy screwed up... but so did Familia, Clippard, Collins, and Cespedes.
Well, if they wear pitchers down, all the more reason to pull pitchers out of games before that 3rd time through.
The real problem with that plan is the lack of arms capable of getting enough innings to make it work. A lot of starting pitchers struggle in the first inning of the game, so is it really better to bring them in then?
I mean, if you are going to yank one of the best starters in the game in the 5th inning to bring in a mediocre starter for a turn, which is worse? Is <span class="playerdef"><a href="http://www.baseballprospectus.com/card/card.php?id=46468">Jon Niese</a></span> better on his first turn through the lineup than deGrom is on turn 3? I think I'd need to see those stats to be convinced. It's one thing to say pitchers get worse the 3rd time through, it's another thing to say that every pitcher is better for one inning than the best pitcher is for his third time through the lineup.
Or, put differently, would you rather lose because Cueto outpitched Jacob deGrom, or because he outpitched Jon Niese and <span class="playerdef"><a href="http://www.baseballprospectus.com/card/card.php?id=66336">Hansel Robles</a></span>?
Ah nevermind, I see that Brooks Baseball uses 55' as the release distance.
Great stuff! One question: what's the typical release distance for pitches? Is it the same for all pitches or does it vary? I assume that it varies per pitcher, but what's the range of that variance?
I ask because the effect of the dragless gravity adjustment seems to be pretty different for different pitch types - slow curves have [V_Mov] - [Dragless V_Mov + Gravity] = 53.5, whereas for 4-seamers it's 23.8. Some back-of-the-envelope calculations imply a release distance of ~51 feet for the slow curve vs. ~38 feet for the 4-seamer, so clearly I'm leaving something out (probably an initial downward velocity produced by the arm motion, release snap, and plane of the pitch). I'm wondering how the adjustment is calculated (does PITCHf/x know the release distance of each pitch?).
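Here's the naive constant-speed, gravity-only version of that back-of-the-envelope calculation. The pitch speeds are assumed values I picked, and the model ignores drag, deceleration, and any initial vertical velocity, so it won't reproduce the numbers above exactly - which is part of the point:

```python
from math import sqrt

G_IN = 386.1  # gravitational acceleration in inches/s^2

def implied_release_distance(drop_in, speed_mph):
    # time for a dragless free fall of `drop_in` inches, then the distance
    # covered at a constant (assumed) pitch speed over that time, in feet
    t = sqrt(2 * drop_in / G_IN)
    return speed_mph * 5280 / 3600 * t

print(implied_release_distance(53.5, 66))  # ~51 ft for an assumed 66 mph slow curve
print(implied_release_distance(23.8, 93))  # ~48 ft for an assumed 93 mph 4-seamer
```

The gap between this naive 4-seamer figure and the ~38 feet above shows how sensitive the estimate is to the speed profile assumed over the pitch's flight.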
Lindor, Russell, Seager - let's say you can have 2 of these guys at minimum salary in a mixed keeper with 240 players rostered and salary considerations (flat salary increase for each kept player; if all players had the same salary, you'd have to drop 1/3 of them to make cap the next year). Which 2 are you going for?
So the <span class="playerdef"><a href="http://www.baseballprospectus.com/card/card.php?id=471">Jason Schmidt</a></span> thing brings up an interesting philosophical question: if you have a personal catcher, and that catcher does a terrible job of framing, do you bear no responsibility for the effects of that framing? In other words, how much of catcher framing "ability" is driven by the particular pitchers? My guess is that it's a lot easier for the numbers to be skewed by some quirk of a pitcher's pitches if you are only catching one pitcher in a season. Maybe Schmidt had such great deception he was fooling the umps sometimes. Maybe he had nasty late movement that the umps called inconsistently. Maybe CSAA already accounts for this stuff...
Just wanted to say that this article was great, really glad you guys did it.
What would be cool and would certainly be too time-consuming to really do, would be to work up a board for each team. Of course, that would have to be preceded by content about how boards are created and maintained, what each team values this year, how they are thinking about managing their draft money, etc.
Would be really cool, though, since you could "sim" a bunch of mock drafts to see how different picks would play out.
This is a really cool project.
Obviously this gets at some additional philosophical (and measurement) issues, but there are some factors that stand out to me as potentially missing, the most obvious being that teams are doing a lot more defensive positioning adjustments - is that adequately accounted for by <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=FRAA" onmouseover="doTooltip(event, jpfl_getStat('FRAA'))" onmouseout="hideTip()">FRAA</a></span> in a general sense? Even if it is, I wonder if the tendency to shift or not-shift against certain hitters in certain game situations could be accounted for? Ideally you would know whether the shift was on, but I'm not sure if that info exists right now...
Another thought that occurred to me was that players get tired as the at-bat, inning, game, or season continues. The SPP thing accounts for this to some extent (since relievers are fresher in a typical at-bat), but I wonder if accounting for pitch count (both in the game as a whole and in the inning so far) would make a meaningful difference? Would <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=IP" onmouseover="doTooltip(event, jpfl_getStat('IP'))" onmouseout="hideTip()">IP</a></span> so far this season make a difference?
Related to that, there's the factor of 2nd and 3rd time through the line-up and pinch-hitting. Would it make a difference to account for how many at-bats the player has already had in the game, and for how many times the pitcher has already faced him?
Anyway, those are probably all minor things.
Yeah, this struck me as a problem also - being much closer to RA/9 isn't actually a good thing, is it? It would be interesting to see how RA/9 correlates with itself year over year relative to <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=DRA" onmouseover="doTooltip(event, jpfl_getStat('DRA'))" onmouseout="hideTip()">DRA</a></span>'s correlation.
So then the question is, what are you trying to measure and how can you tell if you successfully measured it? I think the answer would be pretty hard to come up with. You are trying to measure how many fewer runs your performance should have resulted in, compared to an average performance in the same context. If we took the "should have resulted" runs from every pitcher in the league and added them up, should that equal the "actually resulted" runs? No, because we are laying some blame on the fielders and the catcher, and the hitters, and so on. So I guess that means that it should really be compared to RA/9 - (sum of all non-pitching sources of runs)/9, assuming that you trust those other metrics (like <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=FRAA" onmouseover="doTooltip(event, jpfl_getStat('FRAA'))" onmouseout="hideTip()">FRAA</a></span>, <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=BRR" onmouseover="doTooltip(event, jpfl_getStat('BRR'))" onmouseout="hideTip()">BRR</a></span>, and <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=BRAA" onmouseover="doTooltip(event, jpfl_getStat('BRAA'))" onmouseout="hideTip()">BRAA</a></span>).
If the average runs-allowed in season A is 3 and in season B it is 4, then a guy who generated -0.5 runs-allowed-above-average would have a 2.50 <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=DRA" onmouseover="doTooltip(event, jpfl_getStat('DRA'))" onmouseout="hideTip()">DRA</a></span> in season A, but a 3.50 DRA in season B. Even if this was done on a percentage basis (i.e., you measured his skill by saying he allowed 20% fewer runs than average) you would end up with different numbers (2.40 vs. 3.20). Conversely, a guy with a 3.00 DRA in season A is an average pitcher, whereas that same DRA in season B is well below (i.e. better than) the average.
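A trivial check of that run-environment arithmetic, in a few lines:

```python
# The same skill (half a run better than average, or 20% fewer runs than
# average) maps to different raw numbers in different league contexts.
for league_avg in (3.0, 4.0):
    additive = league_avg - 0.5        # -0.5 runs-allowed-above-average
    pct = round(league_avg * 0.8, 2)   # 20% fewer runs than average
    print(league_avg, additive, pct)   # 3.0 -> 2.5 / 2.4; 4.0 -> 3.5 / 3.2
```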
I think this visualization is pretty awesome, and the suggestions above (especially the hexagonal shape to de-emphasize one-out situations, and the use of the diagonals / dividing line to denote how the out was made) would improve it. I also think that the spiral idea is a potentially interesting one, emphasizing the flow of the plays.
One way to add the type of outs might be to extend each box into the adjacent space (perhaps with an extra trapezoidal section in the case of the hexagon layout) to visually tie the indicator (F, <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=G" onmouseover="doTooltip(event, jpfl_getStat('G'))" onmouseout="hideTip()">G</span></a>, K) to the <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=AB" onmouseover="doTooltip(event, jpfl_getStat('AB'))" onmouseout="hideTip()">AB</span></a> during which it occurred. This would also help display mid-AB outs like pick-offs or <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=CS" onmouseover="doTooltip(event, jpfl_getStat('CS'))" onmouseout="hideTip()">CS</span></a> (you could extend the color of the AB across both edges, with a CS or <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=PO" onmouseover="doTooltip(event, jpfl_getStat('PO'))" onmouseout="hideTip()">PO</span></a> in the spot for out type) and double-play outs (though I suppose that those could be represented with a gap on the corresponding edge - in other words, if there's a <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=DP" onmouseover="doTooltip(event, jpfl_getStat('DP'))" onmouseout="hideTip()">DP</span></a> with 1 out, then there was no AB with 2 outs).
It's true that this display makes it tough to pick out the performance of an individual. I wonder if it would be useful to cycle through some set of background textures for the boxes to allow the shading to be more distinct. For example, if you have empty texture, a light cross-hatch, and a light pinstripe or herring bone, then you could have 3 shades of gray / black, and perhaps switch to a dark blue for subs. Then even a pinch runner could be displayed by swapping the color/texture combo in the middle of a box.
So given all of that, perhaps the display would now be too cluttered. It would also still fail to indicate how and when runners advanced on plays like a wild pitch, sac fly/bunt, <span class="statdef"><a href="http://www.baseballprospectus.com/glossary/index.php?search=SB" onmouseover="doTooltip(event, jpfl_getStat('SB'))" onmouseout="hideTip()">SB</a></span>, etc. In most cases that's not too important, but it certainly loses some of the cool details of a game if it isn't there. There's also the question of reached-on-error vs. hit.
If you really want to go nuts, you could potentially treat each AB as a timeline, putting a little tick on the top/bottom for each pitch (to indicate ball/strike), and drawing a line of colored squares to indicate base-state at the start of an AB and at any moment it changes during an AB. So an AB might look like |,'',** where the line at the front shows the base-state, the commas are strikes, the quotes are balls, and the two stars indicate a double. Of course, this could become quite cluttered and downright horrible if there was, say, a 7-run first inning; this would work better as the "expanded display" of an AB.
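A toy rendering of that encoding (the symbols mirror the example above; the function itself is my own invention):

```python
# AB-as-timeline: '|' marks the start of the AB, ',' is a strike, "'" is
# a ball, and trailing '*'s encode bases gained on the final event
# (no stars = out). Mid-AB base-state changes are left out of this sketch.
def render_ab(pitches, bases_gained):
    """pitches: string of 'S'/'B' for strike/ball; bases_gained: 0-4."""
    marks = {'S': ',', 'B': "'"}
    body = ''.join(marks[p] for p in pitches)
    return '|' + body + '*' * bases_gained

print(render_ab('SBB', 2))  # a strike, two balls, then a double: |,''**
```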
So perhaps the real answer is that an interactive display is the only way to keep the nuances but leave a clean presentation - if something looks odd, you can drill down, but otherwise you leave the detail out.
I don't see a clear argument here for why it's bad to rely on pure <a href="http://www.baseballprospectus.com/fantasy/pfm/">PFM</a> (provided you do the normal adjustments for injury and for what you know about your league-mates). The only actual downside you list is getting cocky once it tells you your team is amazing.
So, what are the actual problems with <a href="http://www.baseballprospectus.com/fantasy/pfm/">PFM</a> that you are trying to correct by making your own limits? Do your limits start from the <a href="http://www.baseballprospectus.com/fantasy/pfm/">PFM</a> and then adjust in specific ways you find helpful? Are those ways actually helpful or do you just distrust the machines?
I don't know if agreements are needed unless the countries impose restrictions on their citizens. I mean, what agreement governs the current system? All this would be is the teams agreeing on a standard for who is and is not subject to the draft. Presumably, professional baseball players from other countries would remain free agents with respect to MLB.
This sounds like an argument for the Offensive / Defensive team. Which isn't that bad an idea - the main reason the pitcher slot is particularly bad at hitting and thus the DH is reasonable is that pitchers can only pitch every 5 days or so (and even relievers tend not to get into more than half the games), so they can't really get consistent at-bats.
Why not just allow teams to choose which players on their active roster bat, and which ones take the field, without any limits (except the limit of at most 9 players on the field at once)? The two "lineups" could be managed separately (so you could pull someone out of the field, but leave them in the batting lineup), but with the same "once you're out, you're out" rule. I suppose you'd also have to add rules about conditions under which you can change the number of "slots" in the field or at the plate. Late in the game, down by several runs? Pull out all your slots except your top two hitters! Actually, the rules would probably have to stipulate that you are not allowed to have fewer than (6 - current number of outs) hitting slots, to avoid silliness with a man on base coming up to hit.
Anyway, this would add lots of interesting choices for managing and for roster construction, and it would mean that two-way players are that much more useful (since you can actually slot in pitchers who aren't pitching to get them at-bats). You'd have to make trade-offs about offense vs. defense vs. pitching, since you only have 25 roster spots. This would probably be even better for salaries than the current system with the DH, since every single slot is potentially an "everyday" player in one or more capacities (instead of having 5-6 "back of the bench/pen" slots).
Does it have to be virgin? LA, NY, Philly, Boston - I bet they could all support another team, though it would sure be an uphill battle for a Boston one.
Yes please to this article and more like it.
By taking the average of the rankings across all years, aren't you basically skewing the top prospects' ranks downward (and thus raising the curve overall)? It would be interesting to see this with only the last year used (to remove the bias of the scouts themselves skewing towards ETA).
To use a true compromise effect, you would offer the reliever you like more and prefer not to trade, either for someone more valuable than your target or in a package for your target plus a more valuable guy. You'd basically say, "Well, Y is about as good as B," or "Y's a little worse than B, but if I threw in Z for C, that'd be about even, right?" - this is the $95 porterhouse, your guy that's priced too high to really consider. This doesn't have to be an outright fabrication - you can choose a guy you'd rather hang on to and legitimately claim that you want a premium for him because you like him. The point, though, is to offer up the guy you like not just to show that you're willing to trade valuable guys, but to make the smaller deal (X for A) seem like a good compromise.
Just wanted to say that I really liked these articles.
Or, can it be added as a column in the PFM?
Does the $17 for Kenley include the injury, or should we assume an update is coming?
How long does it take to rerun PECOTA on a sample like this? In other words, could you tweak the effects of memory (like drastically reduce the weights of previous seasons), and see what the projections look like for these specific players to see if that makes the effect go away (or reverses it)? Would at least give a sense of whether the memory is a good or bad thing, and perhaps a scale of how to tweak it to make it more responsive.
Cuddyer at 1 star? Is that purely because the move from Coors to Citi is terrible, or is it also assumed that he will be injured?
I know that it might be harder to make a clear separation, but I wonder if it would be reasonable, for next year's lists (or a mid-season update? Hint, hint...), to actually rank every guy at every position he's eligible at.
You'd have to clearly define what you mean, but for redraft leagues it would be reasonable to say something like, "If you are looking for someone to fill your 3B slot, where does this guy rank?" That's a reasonable question, whether the guy is 2B-eligible or not (though it will often be superfluous, since a guy will be drafted based on his best position and not be available to stick at a lesser one).
For keeper leagues (which I assume are the target for the 3-year rankings) you could pose the question as if you already have the guy - "If I keep X to play position Y next year, where does he rank against other options for that position (and how soon do I need a new option)?"
For dynasty, it's perhaps clearer to think of it in terms of the position a player will be eligible at over most of the next 3-5 years, instead of one-off best positions, but the waters are still muddied by guys like Zobrist or by uncertainty (ahem Baez). In the case of Zobrist it makes sense to rank him everywhere with the idea being "what is he worth as an X, given that I can also use him to back up Y, Z, and Q?" In the case of a Baez you could rank him as "What is he worth if he ends up at short?" vs. "What is he worth if he ends up at 2b?" etc, and just also give a take on which scenario is most likely.
I understand why you guys did it the way you did it, and there's logic to it; I'm just saying that it is also logical for people to ask the questions they are asking - the general idea that 3B is worth less than 2B, for example, doesn't help you decide whether to trade Rendon or Seager if you are looking to have only one 3B at the end of the year. Likewise, the overall rankings don't help with this question, because they will presumably include the value of Rendon as a 2B this year.
(Columns are AVG / R / HR / RBI / SB.)
Reyes last year: .287 / 94 / 9 / 51 / 30
Reyes projection: .286 / 84 / 11 / 60 / 29
Escobar last year: .285 / 74 / 3 / 50 / 31
Escobar projection: .259 / 73 / 4 / 46 / 30
Alexei last year: .273 / 82 / 15 / 74 / 21
Alexei projection: .266 / 62 / 9 / 59 / 21
Hanley last year: .283 / 64 / 13 / 71 / 14 (512 PA)
Hanley projection: .265 / 74 / 18 / 71 / 18 (580 PA)
Seems like PECOTA thinks only Reyes can keep his BA from last year, but almost everything else will stay the same (except that Alexei is going to have less power and a harder time scoring). How much of that $ value for Reyes comes from his projected BA? If you dropped it to Alexei's projected level, he'd still be 20 runs and 8 steals better, but that only gets him to like $14-15, right?
Which brings up a different question - if, instead of Reyes failing to hit .286, you instead assumed that a bunch of these other guys would also hit the same as last year (except Tulo) - that would bring up the BA available at SS, but would it significantly reduce the value of Reyes, or only slightly reduce it?
This is exactly the flaw, though - the PFM is being forced to put a $ valuation that adds up to spending all the money on the 168 best guys, but when you compare it at the end of the season, a huge number of those guys earn $0 (because they finished outside the top 168). I mentioned this last year, but basically the PFM is taking PECOTA's mean projections and treating them like end-of-year values, and it shouldn't be.
To make a simpler example, let's assume a points league (though I think this still applies, in a more complicated way, to the composite or stat-level valuations in a 5x5). So I have my projection system and it says that on average across all outcomes, player A is going to score 550 points. So, now we look at all the projections and we see that the #169 hitter is projected at 510 points. We go through and add up the difference between the top 168 hitters' projections and 510, and get a grand total of 10800 points above replacement, or 5 points per $ (to keep it simple). So we value player A at (550-510) / 5 = $8.
Our system came to its overall projection, though, by projecting a bunch of different possible outcomes, say scores ranging from 450-650, and giving each a likelihood (or a likelihood of being in its neighborhood). So if we divide up the range from 450-650, perhaps it believes that every 10-point bucket along that range has an equal chance (5%) - it wouldn't be uniform like that in reality, but I don't think the real distribution changes the argument. That means there's a 30% chance our hitter will produce $0 or less (the buckets 450-460, 460-470, etc. up to 510); then for each bucket above 510, we are adding $2 of production with a 5% chance, so we have $2 times the sum of all numbers from 1 to 14 (the 14 buckets up to 650) times 5% = $210 * 5% = $10.50. So the player earns an average of $10.50 in our post-season analysis.
Now you might say, "Well, if we let his value go negative we account for this," but that isn't true, because the value of a dollar is set based on this production. If we assumed that literally everyone else in the league exactly nailed their average production (which is, on average, a good assumption), then we are saying that 30% of the time player A is replaced in our list by player #169 (who earns 510 points), and player #170 becomes the new replacement level (let's say he earns 510 also), leaving us with 40 fewer points above replacement, so decreasing the points-per-dollar to 4.98 or so, basically distributing player A's $8 across the rest of the top 169 in proportion to their previous price. When he hits that 650 high-end and earns $28, he is pulling that extra $20 out of all the other players. So his effect on prices is only proportional to his performance when his performance lands him in the top 168 - the rest of the time it doesn't matter if he earns 508 points or 8 points.
So this is how the PFM should do the calculations - it should calculate the averages, then for each player it should take a set of projections, and calculate the $ value of that projection against the weighted averages for the rest of the league (but with a min of $0) and then multiply the projected $ values by the weights for the projections to come up with a weighted average $ value.
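A sketch of that procedure, using the toy 20-bucket example from the previous comments (replacement at 510 points, 5 points per dollar; nothing here reflects the PFM's actual internals):

```python
# Value each possible outcome against replacement with a $0 floor, then take
# the probability-weighted average, instead of pricing only the mean projection.
def weighted_dollar_value(outcomes, replacement, pts_per_dollar):
    """outcomes: iterable of (projected_points, probability) pairs."""
    return sum(p * max(0.0, (pts - replacement) / pts_per_dollar)
               for pts, p in outcomes)

# 20 equally likely 10-point buckets spanning 450-650, represented by their
# tops to match the $2-per-bucket arithmetic in the example.
outcomes = [(460 + 10 * k, 0.05) for k in range(20)]

mean_value = (550 - 510) / 5.0                           # $8: pricing the mean
floor_value = weighted_dollar_value(outcomes, 510, 5.0)  # pricing the outcomes

print(mean_value, round(floor_value, 2))  # 8.0 10.5
```

The gap between the two numbers is exactly the effect of the $0 floor: the busts can't earn negative dollars, so the weighted average sits above the mean-projection price.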
This will give positive value to a whole bunch of guys beyond the top 168, which is accurate (those guys have some chance of being worth paying for). It won't be possible for your league to spend its money that way (unless you have really deep benches), but what it will do is give you an accurate sense of how much someone is projected to earn. It will still miss on some players, but there is no reason for it to miss so wildly on aggregate groups when the stat projections themselves are accurate.
Can I flag my post as inappropriate for failing to use serial commas?
If you look at the blue dots for Donaldson, Seager and Johnson, or the red ones for 'Nado, Santana and Castellanos, I think the idea is that the white of the 2015 bid changes the color of the dot if it falls on top of the previous bid or previous production.
I wish these graphics were more effervescent. Could you add a shimmer of some sort, and perhaps a few bubbles? The bubbles could represent projected playing time.
How did you decide on the organization of the vertices for the radar plot? Personally, I would have put AVG at the top, SB in the bottom left, HR in the bottom right, R in the top-left, and RBI in the top-right. So your typical slugger would have a triangle heavy to the bottom-right, your pure hitter would have a blob at the top, your speedy guy would have a blob to the left, your Dee Gordon would have a triangle to the bottom-left. That's just a guess at what would be more readable, though, so I was wondering if you had tried out different layouts and discovered this one was better? I guess the default way to do the layout would be to maximize the correlation between adjacent categories, but I'm not sure that's the best way in this case, where there are distinct player profiles. In other words, you'd ideally cluster all the players first, then make sure to set up the vertices so that each cluster had a unique shape.
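The adjacent-correlation layout could be brute-forced for five categories. A sketch, with a correlation matrix invented purely for illustration (the real one would be estimated from player seasons, ideally after clustering):

```python
# Try every ring ordering of the five categories and keep the one whose
# adjacent pairs have the highest total correlation.
from itertools import permutations

cats = ["AVG", "R", "RBI", "HR", "SB"]
corr = {  # hypothetical pairwise correlations across players
    ("AVG", "R"): 0.5, ("AVG", "RBI"): 0.4, ("AVG", "HR"): 0.1, ("AVG", "SB"): 0.3,
    ("R", "RBI"): 0.6, ("R", "HR"): 0.5, ("R", "SB"): 0.4,
    ("RBI", "HR"): 0.8, ("RBI", "SB"): -0.1, ("HR", "SB"): -0.3,
}

def c(a, b):
    return corr[(a, b)] if (a, b) in corr else corr[(b, a)]

def ring_score(order):
    # Sum correlations around the ring; the last vertex is adjacent to the first.
    return sum(c(order[i], order[(i + 1) % len(order)]) for i in range(len(order)))

# Pin AVG at the top and try every arrangement of the other four vertices.
# (A ring and its mirror image tie, so maximizers always come in pairs.)
best = max((("AVG",) + p for p in permutations(cats[1:])), key=ring_score)
print(best, round(ring_score(best), 2))
```

Whether that objective beats a hand-built layout for distinct player profiles is exactly the open question, but it at least gives a baseline to compare against.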
I like having the lines there to show where you're at on the scale as a plot grows or shrinks, so I guess at the high level of "big shape = more value" the plots are useful.
Curious about how you weight the loss of eligibility in these rankings. For example, Odor and Profar - it's possible that they will both hang onto 2B eligibility, but is it likely?
For catchers, it's kind of obvious that even a really good one loses a ton of value if the C goes away (don't we all?), especially since it typically means 1B or DH-only, but going from 2B to 3B or SS isn't necessarily a hit to your team (ahem, Rendon), though of course, it can create a logjam if you also have someone good at that position.
On TINO you guys have asked questions like "is he an OF3 if he's only OF eligible next year?" Is being playable at the new position enough to not discount value, or are you factoring in a 0 times the probability the guy moves off the position? Or is it somewhere in between (some approximation of his value at the new position)?
Yeah, I think this is a key point - no one was going to take exactly the right guys in the reserve round of their draft, but by hitching your wagon to a rookie who may or may not be up in a few months, you prevent yourself from picking up the guys that will inevitably come out of nowhere.
Another factor that might go into these calculations is that the maximum upside isn't as high as it is in, say, a business portfolio. You can't take on 10 risky bets and make out well if only one of them hits, because your risky guys probably produce between -1 and 1 WARP most of the times they fail, but are unlikely to get above 4-5 WARP even when they succeed spectacularly.
Now, if making the playoffs were a winner-take-all cut-off, you could see that distort the calculation, and in fantasy leagues it often does - coming in 4th doesn't help if there are only prizes for 1-3, so if you are stuck in 5th, you might as well roll the dice and hope you hit a hard-way. In real baseball, though, a team that wins 86 games but was in the playoff hunt the whole way does worse than one that wins 88 games and makes the playoffs, but the difference isn't huge (unless they go on to win the World Series). So the marginal utility of those extra 2 wins is capped, whereas the marginal utility of the extra category point that puts you in the money is effectively infinite.
Did VMart not qualify for this list, or do you not see him gaining much in points leagues despite his silly K rate last year?
Yup, that makes sense, but you said that catchers were tied for second among positions with that 7.7% walk rate. So if, say, second basemen walked at a 7% clip but hit .250 in 27,000 PAs, their math would be: .07 * 27,000 = 1,890 BB, leaving 25,110 ABs * .250 = 6,277.5 hits, and 8,167.5 (H+BB) / 27,000 PA = .3025 OBP. So their raw OBP is still higher than catchers' (who had .3022), but by much less than their BA advantage: .3025 - .250 = .0525, less than the .0582 for catchers.
Note, I made up the numbers for second basemen, but mathematically I don't see how you can have a group with a higher collective batting average and a lower collective walk rate yet a higher collective OBP-AVG. If catchers had the second-highest walk rate and the lowest AVG, they must have had at least the second-highest OBP-AVG.
I think it would be interesting to include a term representing total career pitches seen (though I'm not sure how early in the career this should start - do college PAs count? HS? Should it only be pitches - or fastballs? - with >90 mph speed?).
The reason that term could be helpful is that one of the big factors making ML hitters so good is their ability to "chunk" visual images of the type they usually see. I read an article discussing why Jennie Finch was able to regularly strike out major league hitters (http://www.si.com/more-sports/2013/07/24/sports-gene-excerpt) that argued it had a lot to do with disrupting the typical patterns they look for at the plate. Using those patterns essentially gives hitters more time to react because they grasp the important information faster.
So I'm suggesting that hitters with more exposure to fastballs (and actually, now that I'm thinking about it, more exposure to an individual pitcher) will essentially react faster, even though their baseline reaction speed isn't any different.
Definitely excited about this column.
One question: "Their collective efforts ranked last among position players in OBP-AVG differential last year, driven almost entirely by a group batting average a full six points lower than any other position. Their 7.7 percent walk rate tied with outfielders for the second-best rate among the six positional groupings, but the net effect of that strong rate was diminished by the lower average."
I don't understand this. If they are tied for second best walk rate and the lowest AVG, shouldn't they have the best or at least second best OBP-AVG differential? Just as an example, a group with a .250 BA and an 8% walk rate should have (.25 * .92 + 0.08) = .310 OBP, while a group with 8% walk rate and .260 BA would have (.26 * .92 + 0.08) = .3192 OBP. So OBP - BA = .0592 in the second case and .06 in the first.
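Making the algebra explicit (these are the toy BA/BB% figures from the example above, not real positional stats): with OBP = AVG*(1 - BB%) + BB%, the differential is OBP - AVG = BB%*(1 - AVG), so a higher walk rate and a lower average both push the differential up.

```python
# OBP - AVG = BB% * (1 - AVG): verify the two toy group examples above.
def obp_minus_avg(avg, bb_rate):
    obp = avg * (1 - bb_rate) + bb_rate  # walks replace ABs at the walk rate
    return obp - avg

print(round(obp_minus_avg(0.250, 0.08), 4))  # 0.06
print(round(obp_minus_avg(0.260, 0.08), 4))  # 0.0592
```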
I think it would be great if the ballots were public, but it does make it harder to keep borderline candidates around in stacked years. I wonder if they should change the rules about keeping candidates on the ballot to be a fixed (lower) number of votes, instead of 5%. Like 5 votes and you stay on. Or even 1 vote. Then you can make the ballots public, people can all vote for the players who are obviously worthy (or else defend their non-vote with some actual reason, instead of using a non-vote for Pedro as a strategic way to keep more players on the ballot).
No discussion of Norris - his place on the list puts him as a high-end C2 in a 12-team league, but is there still some danger of him platooning? SD's not that much worse than OAK, and the improved lineup and potential for an everyday job give him enough extra opportunity to out-earn last year, right? What should we be expecting from that tier? Earnings in the mid-to-high single digits in mixed leagues?
Seems like Beane et al think the undervalued commodity is guys somewhere between replacement and average with lots of club control. Though I guess there could also be some (perceived) undervalued talent in the minor league haul (either individually or in aggregate).
I'm curious what their aggregate payroll looks like over the next 2-3 years compared to how it looked before the Russell trade. If they are 5-ish WARP worse next year, how bad is it in 2016, and how much did they gain in 2014? How much money did they save per win they dropped?
I've mentioned this on other articles, specifically as it relates to fantasy, but I think one of the best ways that PECOTA (or most other systems) could be improved is to really bake in the concept of marginal value and attempt to create realistic weighted outcomes.
Basically, if we reason from the basic conceit of PECOTA (that looking at the career trajectories of previous players whose careers most closely match the player of interest gives us a good prediction of the player of interest's future career), we have to make some choices about how to integrate data that is telling us different things.
To take an example: if I look at Player A, whose top 5 comps hit .250/.325/.425 the following year on average (after translating their production into 2014 league context), I could say that the most probable outcome for Player A is to hit .250/.325/.425 next year, then apply a normal distribution around that (pulling the variance for each statistic from the comps), and then perhaps apply some other modifiers, like league-context adjustments for 2015, or ballpark effects, etc. That gives me a range of possible production values - maybe 1 stdev above is .275/.340/.450 and 1 below is .225/.310/.400. Another approach would be to generate a 20th-percentile slash line by just taking the line of the worst comp, then the 30th percentile could be the average of the 2 worst, the 40th percentile the same as the 2nd-worst comp, and so on, with the top of the range set by the best of the 5 comps; with even more comps I can generate finer-grained (and more extreme) percentiles. And similarity scores can be used to weight comps and improve the accuracy (though they also add noise, potentially).
So that's great, I've used league context to adjust the performance of comps and applied it to my future performance. If I have a normal set of comps, though, there are some duds and some studs in my list, so the difference between a 90th- and 10th- percentile performance is pretty big. Saying "there's less than a 10% chance he ends up with a .290 TAv" is useful, but it means that a whole bunch of folks will miss their average projection by a whole lot. When people ask how someone is going to do next year, they are really asking, "How's he going to do when he's relevant to my interests?" They don't want you to average the 1 time in 4 that he gets -1 WARP with the 2 times in 3 that he gets 1 WARP and the 1 time in 10 that he gets 4 WARP and say that he projects to get 0.8 WARP. The question is really asking about that 3/4 of the time that he's a solid major league player. In those cases, he's a 1.4 WARP player, on average.
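The arithmetic in that example, with the rough probabilities from the text normalized so they sum to 1:

```python
# Outcome distribution from the comment: -1 WARP 1 time in 4, 1 WARP 2 times
# in 3, 4 WARP 1 time in 10 (loose figures, so we normalize).
outcomes = [(-1.0, 0.25), (1.0, 2 / 3), (4.0, 0.10)]
total_p = sum(p for _, p in outcomes)

mean_warp = sum(w * p for w, p in outcomes) / total_p

# Conditional on being a "solid major league player" (the positive outcomes):
good = [(w, p) for w, p in outcomes if w > 0]
cond_warp = sum(w * p for w, p in good) / sum(p for _, p in good)

print(round(mean_warp, 1), round(cond_warp, 1))  # 0.8 1.4
```

So the "0.8 WARP projection" and the "1.4 WARP player when he's relevant" are both correct; they just answer different questions.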
I'm not sure why we should expect the distribution of outcomes like OPS to be Gaussian. Skill might have a normal distribution, but the chances of being injured in a way that affects performance, BABIP, and non-injury skill changes (up or down) are complex systems that are much more likely to follow power-law distributions.
It's pretty noticeable, following the PECOTA predictions each year, that the system is very conservative. Statistically, that's justified - if you have an outcome with a 1/100 or even 1/10 chance, you are better off assuming that it won't happen than assuming that it will, so your single projection is going to be too conservative (or too aggressive) on guys who hit those low-probability shots, even though it will be accurate on a grand scale. In other words, getting the mean approximately right doesn't require you to properly estimate the variance.
The question is, though, what do we want a projection system to do? If you were making a projection system for rolling a bunch of dice with different numbers on their faces, would you ask it to tell you the expected value of each outcome, and be satisfied that it was going to miss high & low on some of them? That's what the systems are doing now. Asking them to predict which dice are going to roll their highest values is unreasonable, so you need to judge the success on something more nuanced than just RMSE. The projection needs to give different possible outcomes with a likelihood attached to each (or return some formula for generating the probability distribution of different stats) in order for it to be judged on anything but aggregate accuracy.
So for a normal die, you would want the algorithm to tell you there's a 1/6 chance of each number 1-6 appearing, since that's the best it can possibly do at predicting. Then you need some method of determining whether or not those predictions were accurate when there's no way to repeat the experiment. For example, figuring out how likely the actual result was, given the projections.
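One concrete option for that last step is a log-likelihood score: the forecast that assigned higher probability to what actually happened wins. A sketch (the roll sequence is made up, standing in for one season of un-repeatable outcomes):

```python
# Score probabilistic forecasts by the log-likelihood of the observed results.
import math

def log_score(forecast, outcome):
    """forecast: dict mapping outcome -> probability. Higher is better."""
    return math.log(forecast[outcome])

fair = {k: 1 / 6 for k in range(1, 7)}  # the honest 1/6-each forecast
overconfident = {1: 0.01, 2: 0.01, 3: 0.01, 4: 0.01, 5: 0.01, 6: 0.95}

rolls = [1, 4, 6, 3, 6, 2, 5, 6, 1, 3]  # sample rolls of a fair die
fair_total = sum(log_score(fair, r) for r in rolls)
conf_total = sum(log_score(overconfident, r) for r in rolls)
print(fair_total > conf_total)  # True: the honest forecast scores better
```

A projection system that published full distributions could be graded the same way across a whole league, without ever needing to repeat the season.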
So from a fantasy perspective, I guess the interesting next question is: how well is next-season fantasy value predicted by HT qualification? Would probably need to compare it to using this-season fantasy value and this-season K% - BB% as predictors since those are pretty common ones. I'm going to do that analysis for my league, I think, but it's a quirky points league.
I will mention that the importance of the shift in the mid-point of pitchers vs. hitters is much larger in points leagues than in rotisserie.
Great article. I have one quibble, though: I find that the people who complain the most about trades aren't usually people who would actually have offered more for the players the "winner" got. It's more often people who would have or did offer less for the players the "loser" got.
People who look at a trade and think "I would have offered more for those guys" need to start talking to other owners more to find those opportunities, but people who think "he should have had to pay more for those guys" or "he totally overpaid for those guys" are not looking to make more trades; they are looking to add a social "tax" onto trading that prevents other owners from gaining the value they are missing out on. Often this is because they are either gun-shy themselves or don't have as good a relationship with certain owners and therefore feel left out.
What about a bread bowl with chowder in it?
Cheesesteaks, tacos, hot dogs, pizza, certain styles of hoagie - these things are all characteristically different from sandwiches: ingredients are laid on a piece of bread, then folded up and eaten sideways, instead of being squeezed between two pieces of bread.
Pizza, especially when eaten properly, has many of the same characteristics as a hot dog. This does not make it a sandwich.
Maki is not a god damn sandwich, even though you've got one element enclosing another element. A jelly doughnut is also not a sandwich. Neither are combos, or M&Ms.
A bagel, sliced in half and eaten with cream cheese and lox on top, is not a sandwich. Put the two halves back together and it is now a sandwich.
This illustrates the real point - it's a sandwich if you are sandwiching something between two flat pieces. A KFC Double Down is a sandwich, no matter how horrifying. A taco isn't one. Neither is a burrito, tax law be damned.
If you need a super-class that encloses all these things for some nefarious purpose, I suppose you can call them whatever you want. But that has as much validity as referring to a piece of toilet paper as a Kleenex. We get what you're saying, but I'm still going to narrow my eyes as I hand it to you.
So pizza is a sandwich?
Leaving aside the issue that, of course, draft status doesn't really matter anymore - evaluations of the player himself should be about what he can do, as opposed to evaluations of the draft-day decision that led to him being picked - I think that latter question is actually still an interesting one. What does Moran have to produce to have been worth a #6 pick?
I guess there are two ways to evaluate that. One is to compare him to historical #6 picks and see where he shakes out. Clearly, if he produces positive WAR in his career, he will be an above-average pick compared to the whole history of the draft. That's not entirely fair, though, since scouting has improved so much. On the flip side, you can't really compare him to guys like Alex Jackson and Albert Almora directly in a numerical way, since none of them are in the majors and they are all playing in different contexts at different ages. So perhaps the fairest historical comparison is to look at the last 10 or 15 years (or wherever you think a reasonable cut-off is for "modern" scouting) and ask, "Knowing what I now know about each of these players, where would Moran rank if I were drafting only their careers to this point from draft day?" So, for example, this assumes that Ricky Romero was back to his draft-day age, but that his next decade is going to look like his last one.
That seems like a reasonable ranking. Perhaps you'd push various guys up or down a peg, and maybe Moran ends up better than Detwiler, or worse than Miller and Skipworth. Maybe Almora and Jackson flame out also and Moran ends up in 4th or 5th, but it seems like best-case he's the 4th best #6 pick of the past decade. And more likely he slots in between 7th and 9th.
Another way to look at it is opportunity cost. That's probably the fairest approach, since it's not like the Marlins could have taken Barry Bonds or Derek Jeter in that slot. We can do the same thing... what if, instead of Moran, the Marlins had taken one of the next 9 guys at random? Which of them would have been a better pick?
Hunter Dozier (35.9)
Austin Meadows (29.1)
J.P. Crawford (20.2)
Colin Moran (15.2)
D.J. Peterson (14.8)
Reese McGuire (10.8)
Braden Shipley (8.7)
Hunter Renfroe (5.4)
Trey Ball (3.2)
Dominic Smith (3)
Going purely by peak 5 upside, and ignoring Bickford, Moran shakes out at 4th. Peterson might be a better prospect because of his power, and McGuire because of his position, and there's some split opinion about Smith's potential, but I think it's pretty clear from this list that Moran wasn't the best possible pick, but also wasn't a particularly bad one.
So I think his problem is really that he's harder to dream on. You can look at guys like McGuire and Peterson and say, "If they put it together, holy cow," whereas with guys like Moran and Smith you think, "Well, I guess if everything breaks perfectly they could be John Olerud."
At the end you say that the HR numbers even out at number 40 - was that supposed to be 140 or 240? Since number 40 you already said was Rendon with 20 HRs, I'm just curious how far down the list we have to go to get to parity with previous seasons.
League-average is easier to calculate, but it accounts for a lot more people than are typically in a fantasy league. The average SLG for the top 120 or so sluggers would be a more interesting number, though also potentially misleading in that some of those guys would be SLG-only dudes you wouldn't want to start unless you were desperate. It's encouraging that Xander's .362 SLG is only a little below league average, though.
As for factors, I think one thing that should be considered is expansion. Offense skyrocketed in the 90s, and it might have been the case that PEDs were more prominent or more effective then than previously, or that workout routines were suddenly much better and the exploding salaries allowed the hitters to be in much better baseball form than previously, but really a lot of those things should affect pitchers as well as hitters - being able to work out more both because your muscles are juiced and because you don't need an off-season job / have access to far better nutrition and exercise expertise should be helpful regardless of your role.
The factor that everyone predicted would lead to the single-season HR record being broken, though, was expansion. The league added 4 teams in 5 years, creating about 15% more pitching jobs. The result should have been two-fold: pitchers who would have been in the majors with 26 teams should have gotten better, but also all hitters should have gotten better, especially those at the top, who can really feast on AAA-quality pitching. What we saw was actually that - guys like Pedro, Clemens, Johnson, Maddux, etc., and guys like Bonds, Griffey, McGwire, ARod, and so on down the line to the guys in the middle, having better years. Pitching lagged behind because it's harder to do and because its development was less of a priority previously. When hitters started going nuts, and all these new pitchers were needed, teams began to realize how far behind their pitching was and how much more of it they needed to insure against injury.
So teams got better at scouting and developing pitching talent, and they put a focus on stockpiling and preserving arms (for example, the development and application of pitch count limits and innings limits). I also wouldn't be surprised if the draft, especially outside the first round, put more emphasis on getting players with the raw ability to pitch in the majors, letting fewer of them slip to other sports or end up delivering pizza. Now we are seeing the benefits of that emphasis as the league has grown into its size.
One thing that may be helpful in setting up an analysis like this is the idea of creating clear criteria for the decision tree you are setting up. It will be inherently difficult to judge "I will get the 3 best cheap guys I can find at the end of the draft, then swap them if they struggle or someone better comes along." That's tough to judge because "struggle" and "someone better" are loosely defined.
I did a similar type of analysis, regarding my plan to invest heavily in relievers in my keeper points league's auction draft. There's obviously a ton of opportunity costs involved: I could have spent that same money on hitters or starters, either spreading it around or dumping it into a single player, and hoping to get some great pickups at RP.
The approach I took to evaluating the players I could have had was:
1. Identify all players who might have fit a distinct strategy (who are the mid-range guys I could have afforded? who are the breakout guys I could have picked up? etc).
2. Figure out what those players produced *while actually on a roster this year*
3. Figure out the average production per game I could have cobbled together assuming that I didn't have any better knowledge of breakouts than the people who did roster the players
4. Compare that to the average value of all RPs at the top end of the price range
5. Choose a contending team that spent very little on RP in the draft, and calculate how many points that team got from RPs
6. Compare the marginal points-per-dollar I got from my RPs (relative to that team's) to the marginal points-per-dollar I could have had by upgrading my hitters or starters to one of the scenarios I identified.
The important points here, I think, are
1. I didn't just look at the guys I did select (since that's a separate process from the general one of "spend money on RPs"), I also looked at guys that other people selected, and
2. I didn't compare to the best-possible guys I could have picked up, or to the average undrafted RP, or any other artificial construct, I compared to the actual guys actually picked up by others who needed them and the performance they had while on a team (Yahoo makes this easy, since the Team Logs show performance while actively slotted) - this accounts for injuries and guys who got hot, were picked up, and then got cold or who got cold, were dropped, and then got hot.
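Step 6's comparison can be sketched like this (the points and dollar figures are invented for illustration, not from my actual league):

```python
# Marginal points-per-dollar: extra points earned per extra dollar spent
# on RPs, relative to a baseline team that spent almost nothing there.
# All numbers here are made up.

def marginal_ppd(my_points, my_cost, base_points, base_cost):
    """Extra points earned per extra dollar spent vs. the baseline."""
    return (my_points - base_points) / (my_cost - base_cost)

# My RPs: 1200 points for $40; baseline team's RPs: 700 points for $4.
print(round(marginal_ppd(1200, 40, 700, 4), 1))  # 13.9
```

You'd then compute the same number for each hitter/starter upgrade scenario and see which use of the money clears the bar.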
So perhaps one of the articles of your series could focus on the process behind evaluating process ;)
If you guys are looking for topics for future articles, I think an under-discussed area is mid-season updates on former prospects. Not necessarily post-hype guys, but guys who have been in the majors enough to not be prospects anymore, yet still show room for growth.
In some sense, this is similar to the U25 rankings, but instead of fitting in young MLB players vs MiLB ones, it would be about re-assessing the ceiling, expectation, or floor for guys who were, say, top 25 or top 50 prospects on one of the last 3 preseason lists (to pick an arbitrary group of players) and are in the majors now.
I don't really understand the complaints about these sorts of trades, unless you have a farm-team effect where one owner only really trades with another specific owner. Even then, I think it's a bit silly to worry about it - if everyone thinks it's a bad thing, they will form their own coalitions.
To me, the more important problem is concentration of talent. You shouldn't be doing things to make it harder for the worse teams to get cheap talent, you should be making it easier. Something like the old free agent compensation rule seems more appropriate. Like, if you choose not to keep a player who was on your roster at the start of the previous year, you get to spend an extra 10% (or 25%) of his salary in the draft. Or with cut-offs: if you don't keep a $10+ player, you get $3 extra to spend, $20+, you get $6, etc. So now you aren't looking at a player with 0 value to you vs whatever someone will offer, you are looking at a player with a minimum value to you so any decent offer has to exceed that value.
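The cutoff version of the rule could be sketched like this (the $10/$3 and $20/$6 tiers are from my comment above; the $30/$9 tier is my extrapolation of the "etc"):

```python
# Tiered compensation: extra draft budget for releasing a player,
# based on his salary. Tier boundaries above $20 are assumptions.

def compensation(salary):
    """Extra draft dollars for not keeping a player at this salary."""
    bonus = 0
    for cutoff, extra in [(10, 3), (20, 6), (30, 9)]:
        if salary >= cutoff:
            bonus = extra
    return bonus

print(compensation(8))    # 0
print(compensation(12))   # 3
print(compensation(25))   # 6
```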
My league uses a flat salary appreciation ($465 total budget for 30 players, anyone can be kept at their draft salary + $7, draft salary = 0 if they were undrafted), which ends up being a fairly harsh drag against all but the absolute best bargains. Looking at the 2009 draft, there are 11 players who have been kept every year (meaning that they were considered worth it at $28 + their 2009 draft salary). Most of them were either $1 guys that year (Latos, CarGo, Strasburg), or cheap keepers from the prior year (Kershaw, Hamilton, Pence, Votto, Cano, Cliff Lee). The other two were Verlander and Cabrera. Also, Beltre, Posey, Price, Adam Jones, and Carlos Santana were picked up after the draft and kept since. So that's 16 players out of 256 (counting the DL slots, it's an 8-team league) after 5 years. It has the effect of making borderline keepers pretty low value, but it's really organic and flexible. You can make a bet on a Xander (who was drafted at $7 prior to the 2013 season) and just stick to it looking for that payoff, you can double down on guys like Cabrera (kept at $58 this year) and Darvish (kept at $64), or you can bargain-hunt the auction ($9 Cole Hamels? $10 Jon Lester?), or you can stash all the prospects and see who shakes out (Gausman, Stroman, Bryant, Rymer, Walker, CMart, Odorizzi, Heaney, Salazar, Miller?). There are decisions all up and down the cost spectrum.
I had some more specific questions, now that I've gone back to look at Brett's list:
You both have OT and Bryant above Buxton. Is this a preference for proximity, or have you seen something in the past month that either downgrades Buxton or upgrades the other two? Is Buxton still a "perennial top-5 pick" ceiling?
Likewise, Noah Syndergaard is up about 4 spots, while Sano is down 5 on your lists relative to Brett's. Is that recency bias, or has Syndergaard shown something in the past month to earn the upgrade? Has Sano lost ground, or would the injury have made him lower for you guys even last month?
I'm curious about the process you guys use to put a list like this together.
Do you start with a previous ranking (say, Brett's mid-season list, or the preseason one) and a list of guys who've impressed you that you'd like to include, and then go top-down, comparing players 1-1 (or removing guys who "graduated") and looking for places to slot in your extra guys?
Do you start with a much longer list like a top 250 and group the players into tiers, then refine the individual positions within tiers? Are the tiers based on overall ability, or do you break out by profiles like OFs, IFs, Cs, SPs, or by dominant tool and later stitch the list back together?
One analysis I haven't seen, though I would be surprised if it hasn't been done, is to base the measure of clutch hitting on the probability and potential of the result, not just the effect of it. So, for example, if you are up in a 1-run game with runners on 1st and 3rd and 1 out, hitting a deep fly ball is clutch, even if the outfielder throws out the runner or the runner stumbles or whatever. Likewise, hitting a walk-off homer in the bottom of the 9th is just as clutch as hitting a game-tying homer, even though it increased the odds of victory by more. Basically, if BABIP is noisy, and a batter's ability to improve in the clutch is noisy, it would be good to try to remove as much BABIP noise as possible when measuring clutch ability.
You could argue that a better analysis would be to categorize the PA into more controlled outcomes (BB, SO, GB, FB, LD), and see how the predicted results of an outcome of that type would affect Win Probability, with respect to the maximum (positive & negative) effect they could have. So if the worst you can do is drop the probability from .4 to .35, but the best takes it from .4 to .7, then you get a little credit for leaving it at .4; if the worst takes it from .4 to .25 and the best from .4 to .5, you get more credit for leaving it at .4. Or, in other words, some of clutch hitting is avoiding negative results at important moments.
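That normalization could be sketched roughly like this, using the hypothetical win-probability numbers from my example (these are illustrative values, not real WPA data):

```python
# Credit a plate appearance's result relative to the best and worst
# outcomes available in that situation, instead of raw WPA.

def clutch_credit(wp_after, wp_worst, wp_best):
    """Scale the resulting win probability onto [0, 1] between the
    worst and best possible outcomes of this plate appearance."""
    return (wp_after - wp_worst) / (wp_best - wp_worst)

# Downside is small (.40 -> .35 at worst) but upside is big (.40 -> .70):
# leaving the game at .40 earns only modest credit.
print(round(clutch_credit(0.40, 0.35, 0.70), 3))  # 0.143

# Downside is large (.40 -> .25) and upside small (.40 -> .50):
# the same neutral outcome earns more credit for avoiding disaster.
print(round(clutch_credit(0.40, 0.25, 0.50), 3))  # 0.6
```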
The other aspect that I'm a little less clear on is the probability of a result. WPA should take into account the typical distribution of results in that situation, but it has less to say about the rarity of a particular outcome. Even normalizing for leverage doesn't exactly remove that as a factor - a rare outcome probably changes the WP by more, but not necessarily in proportion. This could actually cut both ways - perhaps HRs are a powerful enough outcome that they are actually disproportionately represented in WP.
I think if we are talking about fans watching certain players and saying that guy is clutch, we have to assume that for that to really be true in a way that is meaningful, the effect has to be fairly large (at least, on the order of a perceptible difference in BA). But when a fan watches a situation, they don't see "well, a walk only adds .01 of a win, a double adds .05, and a homer adds .2" and then judge the double as a mediocre outcome - they see the possibilities of making an out, driving in a run, or getting on base. They are more likely to call a guy who drives in runs consistently a clutch hitter, obviously, but a guy who strikes out, grounds into a double play, or hits a lazy fly ball is going to seem a lot less clutch than a guy who hits it hard somewhere. So maybe the question we've been asking is "do hitters accomplish anything useful with clutch hitting, and can they repeat it?" and we should really be asking only one of those questions at a time.
Are the Reds not interested in Daniel Murphy? Haven't heard any recent rumors about the Mets' thinking on him, but Murphy for Ervin might make sense if the Mets thought the struggles at A-ball were a blip. Is that reasonable or do the Mets want more? Is that deal unattractive to the Reds?
In points leagues he's still pretty bad though. I was worried coming into the season, but I had him for cheap and he was so highly touted. The Ks just kill him though, he really needs to have a SLG over .550 to make a 30% K rate play, even his 2012 slash line would be borderline unplayable except in a deeper points league.
Random number generator is random.
I'm wondering if Upside would be better defined by taking, say, the average of the top 10 real seasons (i.e., not projected ones) after weighting by similarity, instead of averaging the non-negative WARP.
The idea is that it's supposed to measure how awesome the player could be (whereas the projections measure a risk-adjusted version of that). But there's a big difference between a guy with 12 comps that project to 1-3 WARP / year (and 8 that project to less than 0), and a guy with 4 comps in the 4-5 WARP / year range, and 12 in the 0-1 range.
You could even argue that the upside on a player is his single best similarity-weighted comp. If there's a chance you're Mike Trout, you have more upside than a guy whose best-performing comp is Angel Pagan.
Just wanted to echo the comments that this series, and integrating it into the player cards, is amazing.
Love the series. I had a question about this statement:
"PECOTA tends to compare players to others at the same position, but a shortstop in Low-A today might not be a shortstop in the major leagues, because he might not have the range for the position. Nevertheless, teams will keep that potential shortstop at that position in the minors, in hopes that he’ll develop at the position—he has positional upside, after all."
Does PECOTA not account for players who were, say, playing SS at the same point in their career, but ended up elsewhere? In the case of catchers, specifically, would a catcher today get Neil Walker as a comp if they had a similar profile through the same age, or is he eliminated because we now know he's a 2B? If the latter, then why?
More generally, it seems like it should at least be possible to account for the general trend of positional shifting when projecting minor leaguers. If (to pull numbers out of nowhere) 30% of players who are catchers at age 22 and reach the majors do so as 1Bs, 10% become 2Bs, and 10% become RFs, couldn't the positional upside of a player be discounted based on that trend? Or would this step on the toes of the comps?
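A rough sketch of that discounting, using the made-up transition rates from my example (the remaining 50% staying at catcher is implied) and invented positional-value multipliers:

```python
# Discount a prospect's positional upside by the historical odds of
# where players at his position/age actually end up. Both the
# transition probabilities and the positional multipliers are
# illustrative assumptions, not real data.

transition = {"C": 0.50, "1B": 0.30, "2B": 0.10, "RF": 0.10}
pos_value = {"C": 1.25, "1B": 0.85, "2B": 1.10, "RF": 0.95}

expected = sum(transition[p] * pos_value[p] for p in transition)
print(round(expected, 3))  # blended positional multiplier: 1.085
```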
I'm curious about this line: "Over the past season-plus, the Mets have about the same BABIP with him on the mound as they do with anyone else on there."
If the Mets are an above-average defensive team, and Wheeler has an above-average BABIP for a starter, then does this sentence mean that relievers & spot-starters tend to experience a higher BABIP that drives up the league average overall? Does it mean that the Mets have a park effect that results in higher BABIP?
Would be interesting to see the full version (perhaps trained on 3 years of data), incorporating RA to make a wins prediction.
Since the model's strength is its simplicity, I'm curious what the weights it's using are. Are you using one set of weights derived to optimize the predictive value at all points along the curve? Would it be better / close to just use [PECOTA * 162 + RS * (GP / 162)] / 2?
Very interesting that teams scored more of their runs early in the season, so improving on PECOTA's projection requires even more RS. If that's really a trend across all seasons, it has a lot of implications for fantasy...
I enjoyed this article and would love to see more like it.
This whole topic is really complex, since establishing those expected values for wins and price-per-win is not a trivial process. Even if you figure out how many wins each available player/employee is worth to each team, and how much total money each team is willing to spend, you still have to make guesses about how they are going to split their money when they can't have all their optimal players (will they pay full price for the one that gives the most marginal utility? Will they go after a bunch of smaller utility guys that have lower costs? etc), and that's really tough to do, even in the more straightforward setting of, say, a fantasy auction draft.
Even if you try to aggregate the numbers, you can easily make a really bad estimate. Let's say that there are 3 teams, and 5 players. Team A has $20 mill to spend, and will get 5, 4, 3, 2, and 1 win, respectively, from the players. Team B has $15 mill and will get 3, 3, 4, 3, and 0 wins, and Team C has $10 mill and will get 6, 3, 3, 4, and -1 wins.
         P1   P2   P3   P4   P5
Team A:   5    4    3    2    1
Team B:   3    3    4    3    0
Team C:   6    3    3    4   -1
If you got price per win by summing the maximum win value for each player, and summing the total cash available, then dividing, you'd get 19 wins, and thus ~$2.37 million / win. If you took the average win value of each player, you'd get 14.33 wins, and thus ~$3.14 million / win. But with neither of those estimates can team C afford player 1, so not only is there a big discrepancy between the estimates, there's also going to be an effect from budget concerns. Which is why certain players not signing can cause the whole market to pause. Until player 1 is off the table, team C might want to wait and see if players 2 and 3 go for high enough prices to give them a shot at player 1.
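The two estimates can be reproduced directly from the toy numbers:

```python
# Two naive price-per-win estimates from the 3-team, 5-player example.

budgets = {"A": 20, "B": 15, "C": 10}   # $ millions available
wins = {                                 # wins each player adds per team
    "A": [5, 4, 3, 2, 1],
    "B": [3, 3, 4, 3, 0],
    "C": [6, 3, 3, 4, -1],
}

total_cash = sum(budgets.values())       # 45

# Estimate 1: sum each player's maximum value across teams.
max_wins = sum(max(wins[t][i] for t in wins) for i in range(5))      # 19
print(round(total_cash / max_wins, 2))   # 2.37

# Estimate 2: sum each player's average value across teams.
avg_wins = sum(sum(wins[t][i] for t in wins) / 3 for i in range(5))  # ~14.33
print(round(total_cash / avg_wins, 2))   # 3.14
```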
I'm not saying you're wrong about that - I both agree with you and would trust your valuation over mine. I'm just saying that the reason that's true is because the market's valuation is accurate, not because straying from market's valuation would be bad regardless of how accurate it was.
If the roto value produced by the best pitchers on draft day was actually $110/team, then that would mean that the value produced by the best hitters on draft day would be $150, and thus you would be able to get a competitive piece of the pie if you were able to spend $150 on hitters, either because you made up the difference with pickups or because the value delivered by the top hitters was less than folks paid for them.
The point does still stand, though, that you can't use your split to assign values to the players - you have to estimate their market value, then decide how you want to split your budget. So you can't just drop the bid limits on every hitter by 15%, raise them on every pitcher by 15%, and then settle in to look for bargains from there - you need a better way to control your spending, or you'll either pass on so many hitters that you can't spend all your money or end up with a split that is out of whack with your target.
I think it would be interesting to see what strategies different folks apply to make sure they stay on-budget. What do you do if your league's valuations are weird / different from yours? How do you make sure that you don't blow your whole budget early or wait too long and miss out? In my keeper league, I put together groups of players at around the same value that I was targeting because I thought the league would undervalue them, then set limits for myself on how many players from each group I wanted to draft.
I don't get the premise here. You are saying that if I value the top 6 hitters more than other folks, then I will end up buying all 6 and spending all my money. But that could happen even with your bid limits - if no one was willing to go up to $1 less than your limits on those same 6 guys, would you end up buying all of them? Would you end up with Verlander, Felix, Darvish, and Sabathia given your limits vs. the avg salary?
The 70/30 split can be the default market position because it's the default market position, but that doesn't mean that it's the best way to spend your money. If you think the league undervalues pitching as a whole, and the split should really be 55/45, then you can go ahead and draft Verlander, Price, and 2 of Felix / Weaver / Darvish at +1 vs. the average salary, and spend $1 each on some guys to fill out your budget at around $117, or you can stop at Verlander and Price, then look for some mid-range guys later since your budget is depleted and you don't want to rely on $1 guys.
tl;dr If everyone looks like a bargain, you still have to fit them into your budget constraints. That's not a semantic difference vs. the market's split, unless you assume that the market's valuation is the correct one.
Just curious, would you consider dropping an injured Adam Lind for Singleton, or is that getting crazy?
Are PECOTA projections re-run taking into account stats so far, or are they only run prior to the season?
Interesting. And does it use projections for the comps, or only players who had actual production that could be compared?
Assuming it uses projections (given the comparables list on the PECOTA cards), I wonder how it would change if only completed seasons were used (so for prospects, all the comps would have to be from 6-10 years ago or more)
Oh, so average really is league-average, then, not average of the comps?
To continue talking to myself here: what happens with this example?
Guy A has 19 comps with PEAK5 of 0, and 1 with PEAK5 of 25, so a 95% chance of being replacement-level, and a 5% chance of being a 5 WARP/year player. UPSIDE = 50.
Guy B has 15 comps with PEAK5 of -5, 4 comps with -2, and one with 25. So the average is -2.9, meaning that the -2 guys count as above average? So this guy has upside of only 34.
Maybe that's alright though - if your floor is basically replacement level but you have a chance at being an MVP, maybe you are better than a prospect who will either flame out or become an MVP.
Thinking about this a little bit more, here's a guess:
Let's take a player like Bubba Starling, whose top 5 comparables are Brett Jackson, Daniel Fields, Julio Morban, Austin Jackson, and Michael Saunders.
Their PEAK5 WARPs are: -4*, -0.8, -6.4, 13.2, 6.1
The average is 1.62. The sum of the above-average ones, times 2, is 38.6, which is meant to represent an average PEAK5 of just-under 10, so I can't just divide by 5 to get the average it corresponds to, because it depends on how many players ended up counting as above-average.
*This one is approximate, since I didn't want to try to translate his minor league stats from last year into major-league WARP
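Here's a sketch of that guessed-at calculation - this is my reading of the formula (average the comps' PEAK5 WARP, then double the sum of the above-average ones), not BP's published method:

```python
# Toy UPSIDE: 2 * (sum of comps' PEAK5 WARP above the comp-group mean).
# This is a guess at the formula based on the numbers above.

def upside(peak5s):
    avg = sum(peak5s) / len(peak5s)
    return 2 * sum(w for w in peak5s if w > avg)

# Guy A: 19 comps at 0 WARP, one at 25.
print(upside([0] * 19 + [25]))                   # 50

# Guy B: 15 comps at -5, 4 at -2, one at 25.
print(upside([-5] * 15 + [-2] * 4 + [25]))       # 34

# Bubba Starling's top-5 comps (the -4 is approximate).
print(round(upside([-4, -0.8, -6.4, 13.2, 6.1]), 1))  # 38.6
```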
Really excited about this series. One question about the calculation of UPSIDE - you say it's "above-average" WARP. Above-average relative to what?
Correct me if I'm wrong, but the calculation seems to go like this:
- Find the 20 most-comparable seasons (which means the players who, at that point in their career, had had a career most similar to the one being considered).
- For Buxton, this would be Trout 2012, FMart 2009, Daniel Fields 2011, JUp 08, Heyward '10, Cutch '07, Angel Morales '10, Alen Hanson '13, Domingo Santana '13, Jaff Decker '10, Yelly '12, Luigi Rodriguez '13, Myers '11, Gose '11, Eddie Rosario '12, Billy Butler '06, Tabata '09, Snider '08, Nimmo '13, Taveras '12
- Starting with those seasons, get the actual or projected performance for each player in their best 5 years prior to age 28, or the 5 years following the comp season if they were at least 24. It then figures some average to compare those WARP values to, and sums the above-average ones.
- So that produces 20 5-year sums of WARP. For each, we double it (presumably to scale it to a similar level as the previous non-negative WARP version?) and multiply by similarity, then we add them all together and divide by the sum of all 20 similarities
- So now we have a number. What does this mean? For Buxton, the number is 219.7 which is way more than for anyone else. Yet his long-term PECOTA forecast (which is also supposed to be a weighted mean) has his best 5 years prior to age 28 as 4.5, 4.0, 4.8, 5.1, and 4.7 WARP, respectively. That totals to 23.1, so a) why is his UPSIDE not in the neighborhood of 20 x 23.1 = 462, and b) why would UPSIDE be more reliable than his PEAK5 WARP?
Love this discussion, thanks for humoring me - this is exactly the sorts of topics I was excited to see discussed with the expanded fantasy coverage. Wall of text incoming...
Even though we can't predict who will be injured, who will have unexpectedly poor performances, and who will have unexpectedly good ones, we can still estimate the odds of those things occurring. Even if those estimates are useless/unreliable on an individual level, we could at least apply the aggregate chance to everyone.
What we would see, I suspect, is basically what you're describing in the article, but with the ability to more easily adjust to different formats/leagues sizes/etc. Guys who are close to but just above replacement level will have the bottom part of their performance curve cut off, and thus bleed value to the guys below replacement level (who would have even more of their curve cut off). Guys like Trout and Cabrera would lose no value (except to the injury chances), and you could calculate the built-in inflation by looking at how much value you were projecting for the below-replacement guys.
For example, there are about 9350 runs, 8800 RBI, 2300 HR, 1700 SB and 17800 hits to be had from above-replacement players in a 12 team yahoo standard league (which has 10 hitters and 8 pitchers per team), according to the PFM. If I go down to guys worth at least -$3, the totals increase to about 10000 R, 2450 HR, 9400 RBI, 1800 SB, and 19000 Hits, representing 10 more hitters (at an average of about 65/14/60/9/129 in 496 ABs). If I go up to >$1, I cut out 10 hitters, who perform at an average of about 68/16/64/13/129 in 496 ABs. Now a full analysis would know the real projected distribution of each stat for each player, but just as a wild guess, let's say that the difference in the average stats equals 1 standard deviation. With that assumption, we are saying that about 3 of the hitters in the sub-replacement 10 will outperform the 3 worst of the above-replacement 10. The level of performance will therefore be the average of the top 7 players in the first group + the top 3 in the second group (to a first approximation). So this ups the overall totals for this group by 12.5 R, 8.5 HR, 17.4 RBI, 13.4 SB, and 2.2 hits. That ranges from 1-6% for the counting stats, and a lot less for the hits (which is a sub for BA), but more importantly it told us that the ~7 positive dollars assigned to the last 10 hitters should really be more like $4.90 (the 7/10 guys who won't be outproduced). Since you can't actually spend the extra $2 on guys beyond the roster size, it naturally gets spread to the rest of the guys in proportion to how good they are (with Trout and Cabrera getting more of it than anyone else).
I'll note here that I tossed in some numbers to make the math easy that aren't really good numbers to use - the +$1 / -$3 barriers were totally arbitrary, the assumption that they represented 1 standard deviation of performance is almost certainly not true, and treating the stats as composites instead of looking at each stat individually is obviously a bad way to do it. But hopefully this outlines the general approach I'm talking about. Would be cool to see it applied to the PECOTA percentile performances, but I don't have any good way to do that without a whole lot of manual data entry.
These are great rules of thumb, but the question I have is basically, why isn't this something the PFM is trying to do?
Or at least, why isn't there a setting to have the PFM do this? These don't seem like intractable concepts, mathematically - if they are accurate, then they should be measurable and the PFM should be able to compensate for it. So really, what is happening here is that the PFM is an imperfect tool because it makes projections and then treats those projections the way it would treat end-of-season data (and even there it does it poorly, because it doesn't account for time on the DL vs. playing).
It would be really cool if the PFM could be upgraded to try to incorporate this type of information. I see 3 main steps to this:
1) Deal better with PT uncertainty.
a) Projecting dollar value based on PT estimates needs to incorporate whether or not that estimate is affected by injury chances, job competition, etc. Hanley was worth a lot last season while he was playing. If he's projected to miss 100 PAs this season, those 100 PAs should be added back in at replacement-level rates when calculating his value.
b) Someone who might or might not play due to competition is a different story. They are a case of basically %chance to play vs. expected performance when they do play. So again, their total should be $10 if they have a 50% chance to be worth $20 and 50% chance to be replaced (as opposed to taking their stats, dividing them by 2, and then asking how much that player is worth)
2) Improved replacement level calculations for fantasy.
a) Replacement level isn't just the expected performance of the best guy just outside the top N. That's the easiest thing to calculate, but it's really pretty wrong. About 30% of the value earned by teams in my league last year came from guys who were not on teams to start the year.
b) The better calculation is basically, what is the expected production of all PAs/IPs that will actually be had by players in lineups. This is not easy to get at, but a quick improvement is essentially accounting for the fact that if we look at a points league with 200 hitters active, and hitters #196-205 are projected at 415-385 points, the expected earnings of that bunch is not 400 points/slot. It's a lot higher, because the 5 guys who do the best will get the slots.
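A toy simulation of that point - hitters #196-205 projected for 415 down to 385 points, with the last 5 active slots going to whoever in the bunch actually performs best. The noise level (sigma) is an invented assumption:

```python
import random

# Monte Carlo: 10 borderline hitters, 5 roster slots. The slots are
# filled by the best *realized* seasons, so points-per-slot beats the
# 400-point average projection. Sigma is a made-up noise level.

random.seed(7)

projections = [415.0 - (30.0 / 9.0) * i for i in range(10)]  # 415 .. 385
sigma = 40
trials = 20_000

slot_sum = 0.0
for _ in range(trials):
    seasons = sorted((random.gauss(p, sigma) for p in projections),
                     reverse=True)
    slot_sum += sum(seasons[:5])     # best 5 occupy the active slots

per_slot = slot_sum / trials / 5
print(per_slot > 400)                # True: slot value beats the projection
```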
3) "Spend all your money"
a) This is obviously more nebulous, but the idea here is basically this: when the draft is done, I should be able to judge how I did by adding up the projections of my players, and comparing that to the projections of players on the other teams.
b) The main thing that could be incorporated here is that while inflation might cause the value of the last good shortstop to skyrocket, and paying that value would be correct at that moment, the better play would be to keep yourself from being the guy who ends up paying it. Overpay $5 for Hanley so you don't end up overpaying $10 for Segura or Desmond. If this isn't already handled by valuations, it needs to be (and if it is, then you should never pay more than the PFM estimate unless you think the PFM projection itself is faulty).
Can we apply this year's discount to next year's purchase at MLB.tv? It seems kind of annoying to have to cancel auto-renew in order to wait for the discount next time around.
Non-career-ending injuries and just general not making it is why the prospects go up and then down - that seems very realistic. Look through old top 101s and tell me how many of those guys had some decent years and then settled into mediocrity?
Career-ending injuries and other forms of culling explain the ticks back up in even later years - players who stick around that long tend to play well or they wouldn't be employed.
This post and some of the similar complaints sound to me like people on poker forums complaining about AA being projected to lose 20% of the time against 22.
Those projections look fairly reasonable to me if you take them for what they are: the performance of comparable players. Naively, what would you expect from guys knocking on the door? They should have a high chance of flameouts/career ending injuries/general disappointment, shouldn't they? Most prospects are risky. But take a 22 year old who is nearly major league ready, and project him to still be playing 4 years from now, what do you get? The numbers should tick up. Especially for an outfielder, since they get fewer shots to come back. The ones who are around at 26 tend to actually be good.
My problem with the projections, though, is exactly this factor. If there's a 60% chance a guy is putting up a .220 league-adjusted TAv in AAA in 4 years, and a 40% chance he's putting up a .300 TAv in the majors, then it's not really accurate to project him as a .252 TAv, and it won't coincide with scout projections. I know it's nice to have a single number, but in this case the average just really doesn't convey useful information. Upside seems like a better approach.
Maybe for a single number it would be better to use the median value among the comps? Weighted by similarity, perhaps, so you'd add up all the similarity scores in the top 100, divide by 2, then count down until you checked off that much similarity, and just use that player's performance in that year (in other words, re-order by performance in a given year, then count off the similarity).
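That weighted-median procedure might look like this (the comps and similarity scores are invented):

```python
# Similarity-weighted median: sort comps by performance, then walk
# down until half the total similarity has been consumed.

def weighted_median(comps):
    """comps: list of (performance, similarity) pairs."""
    ordered = sorted(comps, key=lambda c: c[0], reverse=True)
    half = sum(sim for _, sim in comps) / 2.0
    seen = 0.0
    for perf, sim in ordered:
        seen += sim
        if seen >= half:
            return perf

# Five hypothetical comps: (TAv-like performance, similarity score).
comps = [(6.0, 40), (3.5, 55), (2.0, 70), (0.5, 65), (-1.0, 50)]
print(weighted_median(comps))  # 2.0
```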
Rios in the top 15 seems like a bit of a stretch - why do you think he's there ahead of guys like Upton, Puig, CarGo, or Jones?
I'd like to add to this with a request that future versions of the PFM have the option to report error bars for players. If it reported each separable prediction (perhaps playing time vs. averages, assuming that things like K%, BA, and SLG are not really separable) with its standard deviation, it would be easier to do analysis of risk.
The problem with applying the suggestions in this article is that humans make those "mistakes" with good reason - we know that we don't know everything, and that rare events are harder to predict. So if you tell me that I have a 5% chance of winning (or losing) $100k, it is hard to convince myself that you are estimating that chance properly (unless you specify a mechanism that makes it clear that the chance is precise, like rolling a d20). If you tell me that Risky Dude has a 50% chance of returning $50 and a 50% chance of returning $0, while Safe Dude has a 90% chance of returning $27, and a 10% chance of returning $7, then sure, mathematically these guys are the same, but do I really believe that Risky Dude's odds are well-understood? I'm going to assume that you are closer to correct in your prediction of Safe Dude than of Risky Dude.
Back to the PFM, though. If it is going to recommend a draft value, it seems like it really ought to be capable of running through the permutations of performance deviation and assigning a % chance that so-and-so is one of the top N players (where N is dictated by available slots in the league), instead of just relying on weighted means. This is a case where it really does matter, because the cut-off causes us to assign zero value to players whose expectation drops below the top N, but the reality is that a lot of the actual production across the season is generated by these zero-dollar players who get a lucky roll on their performance.
If you look at the top 10 1Bs, add up the homers they are projected to hit, and then assign each of them a $/homer based on the overall percentage of homers projected at all positions, you are not making the right prediction. As a simple example: if the #10 guy has a 25% chance of hitting 20 bombs, a 50% chance of hitting 15, and a 25% chance of hitting 10, you will factor him in as 15. But if there are 10 guys just below him who have 15-24% chances of 20 bombs, 50% chances of 15, and 26-35% chances of 10, then the odds are extremely high that the #10 *slot* will produce 20 home runs, so all home run production is worth slightly less.
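A quick Monte Carlo makes the point (Python; all the probabilities are the made-up ones above):

```python
import random

random.seed(1)

def sample_hr(dist):
    """dist: list of (probability, home_runs) pairs summing to 1."""
    r = random.random()
    cum = 0.0
    for p, hr in dist:
        cum += p
        if r < cum:
            return hr
    return dist[-1][1]

# The #10 guy: 25%/50%/25% chance of 20/15/10 bombs (weighted mean = 15).
ten = [(0.25, 20), (0.50, 15), (0.25, 10)]
# Ten guys just below him, each with roughly a 20% shot at 20 bombs.
pool = [[(0.20, 20), (0.50, 15), (0.30, 10)] for _ in range(10)]

trials = 100_000
slot_total = sum(max(sample_hr(d) for d in [ten] + pool)
                 for _ in range(trials))
avg = slot_total / trials
print(avg)  # ~19.6: the *slot* produces 20 bombs far more often than not
```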
Given the potential shakiness of Olt, Castro, and Valbuena, how likely do you think it is that this infield has Baez at short, Bryant at third, and Alcantara at 2B by Aug/Sept? Does that only happen if they are out of contention?
Does Bryant end up in RF with Castro or Olt at third? What happens to Castro when Baez is ready? 2B?
History of the game != hall of fame. The history of the game has to include Pete Rose too.
I think character should be paramount: someone who brought major benefits to the game should be in regardless of stats. Someone who brought major controversy to the game should be out. It isn't the hall of stats.
I was a fan of Bonds before the PED controversy, but I find it very difficult to see his achievements with anything other than an asterisk, and his prominent place in a field of folks who really hurt the integrity of the game makes him an important symbol. The same is true for all potential hall of famers - the standard should be high.
And arguments like "Bonds would deserve entry based on what he did pre-steroids" are irrelevant - he's not being left out because we doubt his achievements. He's a no-doubt, top-tier HoF caliber player statistically given any reasonable "steroid adjustment". But the very fact that we have to question what the "real" record is across a bunch of the sport means that his impact, specifically, has been quite negative.
How different was the schedule back before the shift? If 4-man rotations were the norm, then why does the percentage of short-rest starts drift between 20 and 40 in the 50s-early 70s? I mean, there's still a clear transition in how often teams start a guy on short rest, but I'd expect to see percentages closer to 70 or 80 if a team stuck to a 4-man rotation today. Or does the data include so many pitchers who made only a few spot starts? I wonder what it would look like if you restricted it to pitchers who made at least 10 starts in a season.
Also, the injury/fatigue point of view seems like an important one. If the consensus favored the idea that a pitcher who threw more than 200 innings (or whatever number) in a year was more likely to get hurt that year or the next, then that would be a good reason to sacrifice a win to keep your ace healthy. Same with fatigue - the stats on how pitchers did on 3 days rest based on how many previous starts with 3 days rest they'd had that season are interesting, but I have two questions:
1) How did you calculate pitcher success based on # of three-day-rest starts? I assume that it was relative to the pitcher's own baselines for the year in a similar way to the overall analysis (in other words it isn't simply an effect of better pitchers getting more starts). If so, it seems pretty meaningful that there would be enough pitchers who did better on short rest to create that trend.
2) What does it look like if you do the same analysis, but using # of total starts? In other words, do pitchers wear down after a certain point and thus it is actually worthwhile to "keep them fresh" for the post-season?
Even there, though, how do you account for players with multiple eligibility? Also, is it really useful to say that replacement level is the value of the best guy who doesn't play and calculate value off that?
Just for a simple example, let's say that you've got 2 teams playing one hitter at each of 2 positions (C and F) and there are 5 available hitters at each position. All you care about is steals, with projected numbers for the Cs of 5, 3, 2, 2, 2 and for the Fs of 9, 7, 5, 3, 2. If you call "replacement" level 2 for C and 5 for F (the best undrafted projection at each position), and each team has $10 to spend, then you are saying that there's $20 to spend on 10 "steals above replacement", so each one is worth $2, and thus the values of the players are $6, $2, $0, $0, $0 for Cs and $8, $4, $0, $0, $0 for Fs. If you discover that the 3rd-best F is really projected at 6, does it make sense for that to reduce the value of the best F?
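Worked out in Python (I'm taking replacement at each position to be the best undrafted projection - the 3rd-best, with 2 teams filling 1 slot each - since that's what reproduces the dollar values above):

```python
def surplus_steals(projections, drafted_slots=2):
    """Return (drafted projections, replacement level) for one position."""
    ranked = sorted(projections, reverse=True)
    return ranked[:drafted_slots], ranked[drafted_slots]

cs = [5, 3, 2, 2, 2]
fs = [9, 7, 5, 3, 2]
drafted_c, rep_c = surplus_steals(cs)   # [5, 3], replacement 2
drafted_f, rep_f = surplus_steals(fs)   # [9, 7], replacement 5

pool = 20  # 2 teams x $10
surplus = sum(s - rep_c for s in drafted_c) + sum(s - rep_f for s in drafted_f)
per_steal = pool / surplus              # $20 / 10 surplus steals = $2
print([(s - rep_c) * per_steal for s in drafted_c])  # [6.0, 2.0]
print([(s - rep_f) * per_steal for s in drafted_f])  # [8.0, 4.0]
```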
Now I know that's a contrived example, and the whole concept of replacement level is that it's the "readily available" talent level, but I think it's a fundamental flaw to assign so little value to the field. Almost no one drafted guys like Domonic Brown or Josh Donaldson, and projections had them solidly below replacement level entering the season, yet they put up huge numbers. I would argue, therefore, that they and any other players with a similar shot at a breakout (which is basically anyone near the top of the depth charts, in some sense) had value greater than 0 prior to the season. Their value if they were a bust was zero because no one would just toss them in the lineup and let them muddle through, but their value if they had a breakout was high, so the weighted average including the odds of those things was greater than 0.
PECOTA gives the weighted mean performance, but that isn't really helpful for fantasy - a player with values for 90/75/50/25/10 of $5/$0/-$2/-$4/-$8 isn't worth -$1.90, he's worth closer to $0.50 (though not exactly since you won't know in advance where his performance will end up, so there's some chance of missing upside or including downside when you end up playing him). The PFM should use the different percentile projections directly - at the very least, it could establish a replacement level based on the weighted means, and then evaluate each player relative to that level using his own percentiles (and then normalize so the cash values still add up).
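For example (the 10/20/40/20/10 percentile weighting here is my assumption about how the mean gets taken, not something from BP):

```python
# Hypothetical $ values at a player's 90/75/50/25/10 percentile projections
values = [5, 0, -2, -4, -8]
weights = [0.10, 0.20, 0.40, 0.20, 0.10]  # assumed percentile weights

weighted_mean = sum(v * w for v, w in zip(values, weights))
# If you bench him whenever he's hurting you, the downside is capped
# near $0, so his worth looks more like the weighted upside alone:
option_value = sum(max(v, 0) * w for v, w in zip(values, weights))

print(round(weighted_mean, 2))  # -1.9
print(round(option_value, 2))   # 0.5
```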
Why is there no mention of the actual contract length/value? I mean, I can infer 3 years/$30 million from the rest of the article, but it would still be nice to see the number somewhere.
Great article! The interesting next step, of course, would be to look not at how many wins you get from the $57.8 million versus spending nothing, but what the incremental value of extra expenditures might be.
For example, looking at the numbers above seems to indicate that teams spend ~$73.4 million on 7+ year player salaries to get about 12 wins. How much more could be spent usefully on cost-controlled players? If you evened out the budgets by pulling ~$8 million from older players and putting it into better scouting, better coaches, etc., would you add more than a win's worth of cost-controlled players? At some point the pool has to dry up, right? The draft forces the talent to be spread out, and there's got to be a limit to how much better you can be at predicting and developing talent than the rest of the teams.
I liked that quote a lot too. You have to have a lot of confidence in your abilities to turn down first-round money in order to go out and get 1-1 money...
...but if you had serious #rig, you'd have your eye on all-star money.
Getting a no-hitter helps the team because there is more to baseball than wins and losses. Fans want a good experience, a chance to participate in something special. A pennant run is worth a lot even if you don't make it (unless you have a big lead and blow it, in which case it's a negative thing). No-hitters, All-Star or Home Run Derby appearances, hitting streaks, come-from-behind or walk-off wins, and record-breaking or even near record-breaking performances all contribute to the value of the season for fans.
Sure it sucks when a star like Johan gets injured, but it's not like that performance changed the odds of injury from 0 to 1. The effect is relatively minor, statistically. The thing that would be interesting to look at is whether there's a measurable effect of these "irrelevant" events on things like ticket sales (that year and the next), TV viewership, team All-Star votes, etc.
Is it me, or did you guys re-run the same question/answer from last week?
Loved this turn of phrase: "The Yankees aren't the Cardinals, who have a matryoshka doll of perfect prospects waiting to fill in for fallen major leaguers."
Why is Harper always so high on these lists? Is it just the tools and the fact that Trout exploded so suddenly or is everyone convinced he'll hit 50 homers next year for a good reason? I feel like there are several guys who have tons of tools, have produced better, and are more physically mature.
I mean, McCutchen for sure, Kemp unless you think he'll continue to be perpetually injured, Braun probably though screw that guy I wouldn't actually draft him, but what about guys like Stanton, Jones, Gomez, even Choo and Pence who have some 4-category juice? How far above his current performance is Harper projected at next year that he leap-frogs all these guys?
He has only played 3rd this year, so he won't be eligible.
No mention of where Reyes fits at SS, but he has to be top-4, right? Or do you take Segura over him because of injury risk outweighing regression risk?
Hehe, well I think the thing about sabermetrics that makes it what it is is its inherently amateur nature. As soon as it becomes an endeavor for professionals who derive money from the game, it's just statistical work.
The thing I find odd about closing off the data is that it doesn't really help anyone - the teams would benefit from allowing the public to view and research the data, and MLB is strong enough to simply enforce an agreement between all teams to release the data (so there's no first-mover penalty where the last team to release its data gets an advantage against the others because it gets to see theirs first).
Take a look at scouting - there are plenty of scouts and scouting directors who all talk to each other and to press guys like Jason Parks because they get something out of it too - if they kept everything secret, everyone else would too and they would be worse off than the teams that were sharing info (though obviously not their hottest tips).
The one problem that amateurs will always be able to overcome in ways that pros can't is the problem of resources, implications, and associations. Amateurs can research what they want no matter how fruitless it seems, no matter what it might imply about the sport, and without regard for other associations. A team statistics department can't do that - it can't outright criticize other teams without risking a backlash that hurts the cooperation the team depends on for trading, it can't release research showing that its own players suck or that a player the team wants is good (or at least, it has to control how it reveals information of this nature), and it can't waste time on something "everyone knows" isn't true. All of these things hurt progress and open up blind spots.
No vetoes. The only time I would veto a trade is if both owners involved asked me to. If owners are colluding, they need to cut it out or leave, and while vetoing a specific trade might prevent a season from being ruined, it is useless without the agreement of the owners who made it that what they did was shady and should be avoided in the future. If they can't agree to that (or convince the league their motives are pure), the league is already ruined.
Unless this isn't the first time he aggravated it.

If you raise someone who just raised you, you could of course simply say you will now raise, but more commonly you would emphasize the number of raises by saying re-raise.
To play devil's advocate for a second, there are other considerations too - developmental resources, for one: how good can you be with each guy when you have too many struggling toolboxes? Teams are another factor - get a bunch of talented raw players on the same team and you might have a strong tidal effect, with the struggles of the many pulling back against the success of the few. I don't know that either of those things is certain to get in the way, but they might be considerations. Take a smart, driven player so he can set a good example and help along the raw guys.
I'd much rather have high-ceiling guys, plenty of whom will make it to the majors even if they are busts, than draft someone I'm confident will reach a lower ceiling. Let's be honest - how many guys even have a major league ceiling in any given year, let alone an ML floor? Sure, you can save a lot if your bench and pen are home-grown, but it takes a lot of Juan Lagareses to make up for a single Matt Kemp.
What was the distribution of guesses on each one?
I got 5/10. The hits I got right are described as:
- infield hit
- soft line drive to right field
- single between short and third
- double off base of wall
- line drive double over first base
The hits I got wrong were:
- infield hit, shortstop
- line drive single off glove of diving right fielder
- infield single dribbled down 3rd base line
- popped up single to right field
- line drive off Green Monster
So I'd say there was a slight bias towards me getting things right (I missed the harder-hit ball only twice), but the small sample leaves too much to luck. I don't think we can draw a solid conclusion from so few ABs, but it would be really cool if there were a way to automate the creation of these gifs and the collection of people's responses, to see how people did with a variety of batters and a large sample.
All stats are arbitrary to some extent - even BBs and Ks. My problem with the QS as a roto number is that it's still too coarse an all-or-nothing stat, especially in H2H leagues. It might be an improvement over wins (though your math here only adds up if all that extra value is being removed from somewhere - is it relievers, or do hitters lose value because there are now more consistently dominant starters?), but GS probably would be too. Why not just use IP?
I'll say I got 7 of 10 - there's at least a few where I have a really strong feeling one way or the other.
I think it would be cool if there was a mirror feature to this one called the Drop List or whatever, where you discuss players that might be sitting on your bench or in your lineup who have been performing poorly and are now worth dropping for better guys. Basically, the top 20-25 guys who are more than 25% owned whose slots would be better used with one of the guys on this list.
Just curious why Moran isn't on your board. The PG mock draft has him going to the Sox assuming Frazier, Stewart, and Shipley are off the board.
I'm sure there was a lot of thought behind your initial list of 14 players, but I wonder if
a) you do/should take a step back at various points in the process to see what others are saying and consider whether there are players you should add to your mix
b) how much performance since your initial evaluations does or should impact whether new names get added to the list or old ones removed.
I don't think this argues against the idea of stars and scrubs, honestly, it just suggests that "scrubs" needs to mean $1-4 players, not just $1. If you spent $130 on 4 hitters at an ROI of .77, $72 on 3 pitchers at an ROI of .9, and the remaining $58 on 20 players in the $2-4 range (ROI 2.07) and 2 $1 players (ROI 3.75), you'd end up with a total value of $288.32 for your $260. That's a better return than spending $252 on 21 players in the $10-14 range (ROI .96) and getting 8 $1 players (ROI 3.75), which only totals 271.92.
Of course, you could also do $167 on 10 hitters in the $15-19 range (ROI .9), $87 on 13 pitchers and hitters in the $5-9 range (ROI 1.15) and 6 $1 players (ROI 3.75) for a total of $272.85.
I'm sure there's a setup in there that's better optimized, but just throwing together some naive breakdowns seems to support stars-and-scrubs as a plan. The reason is basically that you get a lot more ROI on <$4 players, so the more of them you have, the better. But you still need to spend the whole $260, because the ROI on unspent dollars is 0, which means you're better off buying some star players despite their poor return just to spend the money, and making up for it with far better returns from your scrubs. It also gives you the roster flexibility to capitalize on the scrubs who pan out, instead of having great-ROI players sit on your bench because you spent $12 on the guys at their positions.
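The arithmetic behind those three plans, for anyone who wants to poke at it (the ROI figures are the article's; the roster splits are mine):

```python
def total_value(plan):
    """plan: list of (dollars_spent, roi) pairs; returns total value bought."""
    return sum(spent * roi for spent, roi in plan)

plans = {
    "stars and scrubs": [(130, 0.77), (72, 0.90), (56, 2.07), (2, 3.75)],
    "mid-priced":       [(252, 0.96), (8, 3.75)],
    "balanced":         [(167, 0.90), (87, 1.15), (6, 3.75)],
}

for name, plan in plans.items():
    assert sum(spent for spent, _ in plan) == 260  # every plan spends the budget
    print(name, round(total_value(plan), 2))
```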
Basically, what this data says to me is exactly what you'd expect: victory means finding the guys who outperform their expectations and only paying slightly more than the expected value in order to get them. It's obviously hard to find an undervalued $35 player, but the data says it's also hard to find undervalued guys that go for the average price.
I think the main reason is just salary inflation. PEDs made offense explode for a while, but the inflated salaries of the 90s convinced kids growing up at that time (and more importantly their parents) that spending the money and effort to make it is a good idea, and thus that having real professional training was a way to get ahead for a job that would pay huge dividends in the end.
Hmm, the Blue Jays draft makes me think they've decided that draftees are generally way underpaid and that dropping big money (relative to that underpaid-ness) for the best talent available is a good deal.
Might also be interesting to see if it has an effect on BABIP. In other words, does a pitcher who gets more swings and misses tend to get hitters to make weaker contact also?
Did you try to do a similar regression, but using vote share instead of just wins (or is that actually how you did it)?
The problem with doing away with the current revenue sharing in favor of a payroll-based subsidy is that payroll isn't the only thing teams need to spend money on. It's fine to argue that the team is only going to invest money when it believes that doing so will result in a positive return, but the argument implicitly assumes that the owners have a bottomless amount of money they are willing to risk on investments in their team. That just isn't necessarily true. A riskier or more marginally profitable bet becomes easier to take when there's more money to bet with.
Of course, that doesn't mean that's the most effective way to spend the revenue sharing money. It might be much better spent by using it to match investments in the team that include all the things the team should spend on to become competitive. If all those things receive the same amount of matching, then a team that isn't competitive will be encouraged to spend on the draft/player development, while a team that could be competitive is encouraged to spend on free agents. The problem is that in order to give people the whole picture we need to be showing the expenses as they actually are instead of just the major league payroll.
What I don't understand about the tools discussion is why not base your grades such that 80 means legendary, unheard-of tool regardless of which one? There's no reason to give 10-20 guys (or more) 80 "run" tools and not give anyone an 80 fielding or hit tool.
What if you thought about it like this: if a tool is an 80, then it is so good that the player can get to the majors with the rest of his tools maxing out at 40. That's kind of how it is with 80 power, for example - the guy doesn't need to run, doesn't need to play a position well, and doesn't need to get that many hits. Shouldn't that apply to the run tool also? A guy so fast that he maintains a .350+ BABIP can be a natural .240 hitter with sub-.100 ISO and below-average defense at, say, 2B, or even be a 4th OF, and still be a useful guy (as long as he can take some walks).
I don't know, maybe even with that as the idea there would be more guys that are just that fast, because it's a bell curve, but there's a point beyond which getting even faster doesn't make you better at baseball, and you have to put 80 at that spot on the curve.
Just to clarify: I suppose it's possible there were multiple teams interested in taking these guys early and somehow keeping that interest secret, but does that seem likely?
Regarding Culver and Simpson, I think your point is good (that the players are probably a lot better than they were given credit for in the blogosphere before the draft). However, if no one was talking about them, and no other teams were on them, isn't it still a mistake, simply because they could have had a better player and then taken Simpson/Culver in a later round?
Maybe it really requires two things - a star rating and a tier ranking. The star rating can be an indicator of absolute team value, in other words, a championship caliber team of 20 starters needs to have, say, 2-3 5-star guys, 5-6 4-star guys and only 2-3 2-star or lower guys (I'm just throwing those numbers out there). Thus it helps you decide which of the "best available at position" players to grab when you have multiple positions to fill.
For the tiers, though, I would divorce them from the star rating, and set them up based on a much finer-grained equivalence between players. For every pair of players at the same position, there are three options: "I would always take A over B," "I would always take B over A," and "Whether I took A or B would depend on other factors." You can display those options by sorting the players into tiers where every pair within a tier falls into the third category, and everyone in a higher tier would always be selected before someone in a lower tier.
If you went this route, there's probably a point at which pairwise comparisons become tedious, so the bottom tier might just be a "here are the other options, with projections" bucket.
If the point of the stars is that equal production results in equal stars and positional scarcity exists only as a function of the quantity of top-star players at a position, how is it possible for the exact same player to get a different star rating at different positions?
In other words - what does the star rating translate to? If it translates to draft order, then it should completely factor in positional scarcity. If it translates to absolute impact on the team, then it should completely ignore position and VMar should be 3 stars everywhere. Am I missing something?
I think the point is that many people think holding out for an extra million or two tacked onto a $15 million deal is superfluous or stupid compared to getting his career started and striving for greatness. Not greatness for the sake of an even larger payday, but for the sake of the game. You may think that's dumb, but it's the reason that salary negotiations always leave a bad taste in the mouth of fans.
Strasburg is set for life either way, even if he never signs another deal, so why should anyone, particularly a Nats fan, have sympathy for his need to squeeze out every dollar? Sure, the owners don't deserve that money any more (and perhaps less) than Strasburg does, but this is exactly why a slotting system would be better - it avoids a silly game of chicken that sometimes ends up with everyone worse off.
Also, I'm pretty sure that Zimmerman, regardless of what you think of the exact figure he named, was simply saying that Strasburg needs to sign, not that he shouldn't negotiate the best deal he can get before signing. And he's right - if Strasburg goes back in the draft, there's a very good chance he will not get as much next year as if he takes a reasonable offer from the Nats (assuming they make one).
If a guy like Micah Owings can be used as a pinch hitter on a regular basis, doesn't it make sense to push Kelly as both a hitter and a pitcher, or is there just no chance that he could develop both skills and still fulfill his potential as a pitcher?
Free wireless at the ballpark and a netbook would be a good solution for someone who planned to see a lot of games but really wanted to stay informed. You don't have to spend much time looking at the computer to stay up on happenings around the league - there's plenty of time between half-innings for that and grabbing food, etc.
There's a lot more strategy in the NL - proponents of the DH always argue that pinch-hitting for the pitcher is an automatic move, so it adds no strategy, but not only is that not true (there are tons of situations where you have to decide how important a slightly increased chance of a run or two is vs. the value of the pitcher going a couple more innings and the wear and tear on the bullpen that pulling him would cause), it also is a narrow view of the difference.
In the NL, teams have an additional roster slot, which allows for more pinch-hitting, pinch-running, defensive substitutions, etc. Double-switching is also not an automatic decision: you can't just take out whoever made the last out and replace him with a bench player, especially in a close game.
There is also a significant effect on the game from having the pitcher bat - bunting is much more frequent, and sustaining a rally is much more difficult, therefore runs are more important and tactical choices such as the hit and run, steal or pitchout have more impact. Obviously all of those things are done in the AL even with the DH, they just aren't done as often.
If you really think the game should put the best offensive players out there all the time, why don't you believe it should put the best defensive players out there? I guarantee that if you allow teams to have 9-10 more roster slots and play with an offensive team and a defensive one, the quality of play overall would be vastly superior - the best hitters in the world would be in lineups, regardless of their ability to play at a position, and the best fielders would play the positions. I'm sure 99% of people who just read that thought it sounded like a terrible idea that would make the game into something that isn't baseball. Which is why the argument that it is stupid to watch a pitcher bat is simply a bad one. Yes, they get out frequently, but the times that they succeed (like Johan's butcher-boy double a couple starts ago) are that much more exciting.