Contrasting fantasy to real life from a sample size perspective.
I face a dilemma almost daily as a fantasy analyst, given my background in sabermetrics. I know that sample size is important--without a proper sample size, it's hard to take a player's recent performance (for better or worse) seriously. But fantasy baseball is a game that requires quick decisions--you aren't the only person wondering if a player is for real or not, and there is always someone more desperate than you are for any help they can get. We know when statistics matter for baseball analysis of the real thing--you can thank a certain Baseball Prospectus author for that information--but oftentimes (okay, all of the time) you don't have enough time to wait for the information you need for an informed decision.
For example, I wrote about how Jeff Francoeur was worth keeping an eye on due to a potentially newfound understanding of the strike zone. He's taking more walks even if he isn't taking more pitches, and it may just have to do with recognizing which pitches he should and can drive and which ones he should sit on. As far as regular baseball analysis goes, I'm more cautious towards Frenchy because we don't have the sample size to know if this is random variation or not, but as a fantasy analyst (and owner) I scooped up Francoeur off of waivers in one league in the hopes that his performance is for real. When you can't afford to wait for the necessary information, you have to learn to read what is available to you to the best of your ability. You're going to swing and miss like our friend Francoeur, but when you do connect, it's going to go a long way.
First-month stats don't give analysts much to analyze.
April really is a great time to be a baseball fan. Even in the worst case (say, being a Cubs fan and watching Carlos Zambrano get lit up like a Christmas tree on Opening Day), having baseball is better than not having baseball. And April is truly a time when all baseball fans can have hope. Nobody’s been eliminated yet. Nobody’s even out of the race yet. Now, of course, some things are more likely than others, but that’s not what hope is about, is it?
So it’s a great time to be a fan. But it’s a horrible time to be a baseball analyst. (That's still a net win, really, as baseball analysts are generally also baseball fans.) Why do analysts suffer? Because there’s an expectation that since there’s baseball going on, and since one is a baseball analyst, one must, well, analyze that baseball. The thing is—there’s not really a whole lot one can say about a month’s worth of ballgames, at least in the way of useful analysis. There’s little we can know in April that we didn’t already know in March.
Which moundsmen has BP's projection tool missed with any regularity?
Several weeks ago in this space I took a look at batters that PECOTA has habitually overrated or underappreciated over a period of several seasons. Today I'll take a look at starting pitchers to see if we can identify those that continually flummox PECOTA by making a mockery of their pre-season forecasts year after year.
Who consistently surpasses or underwhelms their projections?
One of baseball's enduring charms is its ability to defy prediction. Each time we think we're absolutely sure of something (say, that the 2008 Tigers will score a bajillion runs, or that Juan Pierre will be a disaster filling in for Manny Ramirez), our forecasts are confounded by baseball's eternally fickle nature. Sophisticated projection tools, such as Nate Silver's PECOTA, are designed to take some of the guesswork out of predicting how teams and players will perform during a given season, and on the whole they often produce surprisingly accurate forecasts. But even PECOTA is prone to big misses, especially in individual player projections, and those misses help preserve the game's air of mystery.
Has the perceived decrease in foul territory brought by the new stadium boom contributed to the surge in home runs over the past two decades?
Last time around, after discussing how the baseball itself may have changed in a manner that helped to boost home run rates over the past two decades, I took a look at the myth of the shrinking ballpark. To recap, the notion that the stadium construction boom that's taken place over the past 20 years has left us with a game full of bandboxes is actually a false one, at least when it comes to fence distances:
Jim cleans up some old business, ponders the all-time greats at second base, and tries to avoid throwing things at the TV set.
Mathematically, leverage is based on the win expectancy work done by Keith Woolner in BP 2005. It is defined as the change in the probability of winning the game from scoring (or allowing) one additional run in the current game situation, divided by the change in probability from scoring (or allowing) one run at the start of the game.
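As a rough illustration of the leverage ratio, the sketch below takes four win probabilities as plain inputs. It does not say where they come from: looking them up in Woolner's win expectancy tables or any other model is assumed to happen elsewhere. The function name `leverage` and the sample probabilities are hypothetical.

```python
def leverage(wp_now: float, wp_now_plus_run: float,
             wp_start: float, wp_start_plus_run: float) -> float:
    """Win-probability swing of one extra run in the current situation,
    divided by the swing of one extra run at the start of the game.

    All four arguments are win probabilities in [0, 1], assumed to come
    from some win expectancy model (not implemented here).
    """
    swing_now = wp_now_plus_run - wp_now        # value of one more run right now
    swing_start = wp_start_plus_run - wp_start  # value of one more run at first pitch
    if swing_start == 0:
        raise ValueError("the start-of-game swing must be non-zero")
    return swing_now / swing_start


# Hypothetical numbers: a run is worth 0.20 of a win now versus 0.10 at first pitch.
lev = leverage(0.50, 0.70, 0.50, 0.60)  # roughly 2.0
```

For a run allowed rather than scored, both swings are negative, so the ratio still comes out positive.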
Adjusted Pitcher Wins. Thorn and Palmer's method for calculating a starter's value in wins. Included for comparison with SNVA. APW values here are calculated using runs instead of earned runs.
Support Neutral Lineup-adjusted Value Added (SNVA adjusted for the MLVr of batters faced) per game pitched.
The number of double play opportunities (defined as fewer than two outs with a runner on first; runners on first and second; or runners on first, second, and third).
The percentage of double play opportunities turned into actual double plays by a pitcher or hitter.
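The two double play definitions above translate directly into a predicate and a ratio. This sketch follows the glossary literally, so runners on first and third does not count as an opportunity. `BaseOutState`, `is_dp_opportunity`, and `dp_percentage` are hypothetical names, and returning 0.0 when there are no opportunities is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BaseOutState:
    outs: int
    on_first: bool
    on_second: bool
    on_third: bool


# Base configurations the glossary lists: runner on first; first and second;
# first, second, and third. (First and third is not listed, so it is excluded.)
_DP_BASES = {
    (True, False, False),
    (True, True, False),
    (True, True, True),
}


def is_dp_opportunity(state: BaseOutState) -> bool:
    """Fewer than two outs and one of the listed base configurations."""
    bases = (state.on_first, state.on_second, state.on_third)
    return state.outs < 2 and bases in _DP_BASES


def dp_percentage(states: Sequence[BaseOutState],
                  turned_into_dp: Sequence[bool]) -> float:
    """Percent of DP opportunities that became double plays (0.0 if none)."""
    opportunities = 0
    double_plays = 0
    for state, dp in zip(states, turned_into_dp):
        if is_dp_opportunity(state):
            opportunities += 1
            if dp:
                double_plays += 1
    return 100.0 * double_plays / opportunities if opportunities else 0.0
```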
Winning percentage. For teams, Win% is determined by dividing wins by games played. For pitchers, Win% is determined by dividing wins by total decisions.
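The two Win% definitions differ only in their denominators, which the minimal sketch below makes explicit. The function names are hypothetical. Raising an error for a pitcher with no decisions is an illustrative choice, not something the glossary specifies.

```python
def team_win_pct(wins: int, games: int) -> float:
    """Team Win%: wins divided by games played."""
    if games == 0:
        raise ValueError("no games played")
    return wins / games


def pitcher_win_pct(wins: int, losses: int) -> float:
    """Pitcher Win%: wins divided by decisions (no-decisions are ignored)."""
    decisions = wins + losses
    if decisions == 0:
        raise ValueError("no decisions")
    return wins / decisions
```

For example, a pitcher who goes 15-5 with ten no-decisions has a Win% of .750, because the no-decisions never enter the denominator.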
Expected winning percentage for the pitcher, based on how often a pitcher with the same innings pitched and runs allowed in each individual game earned a win or loss historically in the modern era (1972-present).
Attrition Rate is the percent chance that a hitter's plate appearances or a pitcher's opposing batters faced will decrease by at least 50% relative to his Baseline playing time forecast. Although it is generally a good indicator of the risk of injury, Attrition Rate will also capture seasons in which a player's playing time decreases due to poor performance or managerial decisions.
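The glossary fixes the threshold (a drop of at least 50% versus the Baseline) but does not say how the probability itself is estimated. The sketch below assumes you already have a sample of plausible playing-time outcomes, for instance from comparable players. It simply counts how often those outcomes fall to half the Baseline or below. The sampling approach and the names `attrited` and `attrition_rate` are assumptions made for illustration; this is not PECOTA's actual procedure.

```python
from typing import Sequence


def attrited(baseline: float, actual: float) -> bool:
    """True if playing time (PA or batters faced) fell by at least 50% vs. Baseline."""
    return actual <= 0.5 * baseline


def attrition_rate(baseline: float, outcomes: Sequence[float]) -> float:
    """Percent of sampled outcomes that count as attrition.

    `outcomes` is an assumed sample of playing-time results; how that
    sample is built is outside the scope of this sketch.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    count = sum(attrited(baseline, pt) for pt in outcomes)
    return 100.0 * count / len(outcomes)
```

With a 600 PA Baseline, the hypothetical outcomes 620, 550, 300, 150, and 480 include two at or below 300 PA, which gives an Attrition Rate of 40%.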
Batting average (hitters) or batting average allowed (pitchers).
Average number of pitches per start.
Average Pitcher Abuse Points per game started.
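Both per-start averages are simple means over a pitcher's starts. The glossary entry does not give the PAP formula. This sketch assumes the cubic form Baseball Prospectus has described: pitches above 100 in a start, cubed, and zero at or below 100. Treat that formula, the threshold parameter, and the function names as assumptions.

```python
from typing import Sequence


def pap(pitches: int, threshold: int = 100) -> int:
    """Pitcher Abuse Points for one start, assuming the cubic form:
    (pitches - threshold) ** 3 above the threshold, otherwise 0."""
    return max(0, pitches - threshold) ** 3


def avg_pitches_per_start(pitch_counts: Sequence[int]) -> float:
    """Mean pitch count across starts."""
    if not pitch_counts:
        raise ValueError("no starts")
    return sum(pitch_counts) / len(pitch_counts)


def avg_pap_per_start(pitch_counts: Sequence[int]) -> float:
    """Mean PAP across starts (starts at or under the threshold contribute 0)."""
    if not pitch_counts:
        raise ValueError("no starts")
    return sum(pap(n) for n in pitch_counts) / len(pitch_counts)
```

Because the penalty is cubed, one 120-pitch start (8,000 PAP) outweighs several starts at 110 pitches (1,000 PAP each). That asymmetry is the whole point of the measure.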
Singles or singles allowed.
Batting average; hits divided by at-bats.
Percentage of pitches thrown for balls.
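Batting average and ball percentage are both single ratios, so the sketch is short. The only real decisions are the denominators: at-bats rather than plate appearances for batting average, and total pitches for Ball%. The function names are hypothetical, and rejecting a zero denominator is an illustrative choice.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by at-bats (for pitchers, hits allowed over at-bats against)."""
    if at_bats == 0:
        raise ValueError("no at-bats")
    return hits / at_bats


def ball_percentage(balls: int, total_pitches: int) -> float:
    """Percentage of pitches thrown for balls."""
    if total_pitches == 0:
        raise ValueError("no pitches")
    return 100.0 * balls / total_pitches
```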
The Baseline forecast, although it does not appear here, is a crucial intermediate step in creating a player's forecast. The Baseline is developed from the player's previous three seasons of performance. Both major league and (translated) minor league performances are considered.