“You… can be a millionaire… and never pay taxes! You can be a millionaire… and never pay taxes! You say… ‘Steve… how can I be a millionaire… and never pay taxes?’ First… get a million dollars.”
–Steve Martin, Saturday Night Live, 1978
I can’t tell you how to be a millionaire without paying taxes, but I can tell you how to beat PECOTA without a computer model. First, get the PECOTA projections. Here, I will explain how you can beat PECOTA once you do.
Many of the people in your fantasy league have heard of PECOTA and use its projections while drafting. Certainly many of the people who win their leagues do. So if almost everyone has a mini version of Nate Silver with them on draft day, how can you get an edge?
The key is to stay one step ahead by knowing what PECOTA’s strengths and weaknesses are. But let’s make one thing clear: PECOTA is very smart. It knows a lot of things that the naked eye does not. It goes through a hundred years’ worth of baseball players, finds the guys who are most similar to the player it wants to project, and generates a projection. You and I don’t have that kind of memory. However, we can incorporate some information that PECOTA can’t, and that gives us an advantage over our competition.
WHAT PECOTA DOESN’T KNOW
A lot of my personal research has been on batted ball statistics, such as groundball, flyball, and line drive rates, as well as BABIP on each of these three types of batted balls. I have found that this information can be incredibly useful in projecting player performance, but here’s something you probably don’t know about PECOTA: as strong as it is, it does not use batted ball statistics. That’s not an error, but a sacrifice that had to be made. There are no batted ball records for the 1950s and no consistently measured ones for the 1980s, nor are there records of BABIP on groundballs, flyballs, and line drives. That type of information is only available from 2003 on. To use it, PECOTA would have to give up choosing comparables from hundreds of thousands of pre-2003 player-seasons, and it would fail miserably.
However, you do have that information at your disposal, and you can use it to your advantage. If you find the more recent players in PECOTA’s list of comparables for a particular player, you can check some of their basic statistics and compare them with the player in question. If you do this, you can improve on PECOTA, resulting in better player projections and a better chance at winning your fantasy league.
In my recent research, I developed a quick and dirty method to project BABIP and projected 277 players for 2009. Of those with sufficient plate appearances, my quick and dirty method has a .45 correlation with BABIP so far this year, and PECOTA has a .42 correlation with BABIP this year. Though this is clearly too small a sample size to judge conclusively, it’s worth noting that I have done as well on that component using a model that includes no comparables, doesn’t adjust for age, doesn’t adjust for position, doesn’t adjust for handedness, and only uses data starting in 2003.
This isn’t to say you should use my model in place of PECOTA for BABIP. Rather, what you should do is use a hybrid of the two by looking at batted ball statistics to see where PECOTA is being tricked and can be adjusted.
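The correlation comparison above is easy to reproduce for any two projection systems. Here is a minimal Python sketch, assuming you have each player’s plate appearances, both projected BABIPs, and his actual BABIP. The field names and the 200 PA cutoff are illustrative assumptions, not the settings behind the .45 and .42 figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def score_systems(players, min_pa=200):
    """Correlate two projection systems with actual BABIP.

    `players` is a list of dicts with hypothetical keys 'pa', 'proj_a',
    'proj_b', and 'actual'. Players under `min_pa` plate appearances are
    dropped (the cutoff is an assumption). Returns (r_a, r_b).
    """
    qualified = [p for p in players if p["pa"] >= min_pa]
    actual = [p["actual"] for p in qualified]
    r_a = pearson([p["proj_a"] for p in qualified], actual)
    r_b = pearson([p["proj_b"] for p in qualified], actual)
    return r_a, r_b
```

A higher correlation means a system ordered hitters’ BABIPs more accurately, but as noted above, a partial season is too small a sample to separate .45 from .42 conclusively.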
EXAMPLES
-
Geovany Soto and BABIP on line drives, groundballs, and flyballs:
PECOTA’s projected BABIP for Soto: .334
My projected BABIP for Soto: .306
Current 2009 BABIP for Soto: .270
On the heels of Soto’s .337 BABIP in 2008, PECOTA projected him to come close to repeating that number this year. However, what PECOTA doesn’t know is that last year Soto’s BABIP on line drives was .805, which is .087 points above the 2008 league-wide average. You might expect something like that from a monster power hitter, but not from Soto. In fact, line drive BABIP has a far lower year-to-year correlation than groundball BABIP and flyball BABIP (.11 for LD-BABIP, versus .32 for both GB-BABIP and FB-BABIP), so we should expect Soto’s inflated BABIP on line drives to regress to the mean. If Soto had a league-average LD-BABIP, he would have had a BABIP of .313. His most recent top comparable, Chris Shelton, has a career .334 BABIP, but Shelton’s comes from a high line drive rate of 23.1% (Soto’s is 20.5%), not from an unusually high BABIP on line drives: Shelton’s was .746 in the comparable year, while Soto’s was .805. PECOTA doesn’t know Soto and Shelton differ in this way, which explains why it thought Soto’s BABIP luck would stick when it was much more likely to disappear this year.
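The regression-to-the-mean argument above can be made concrete. One common shortcut, assumed here rather than taken from PECOTA or from the model described above, is to keep only the fraction of a player’s deviation from the league average that equals the statistic’s year-to-year correlation. A minimal Python sketch using Soto’s numbers:

```python
def regress_to_mean(observed, league_mean, year_to_year_r):
    """Expected next-season value under simple regression to the mean.

    Keeps `year_to_year_r` of the observed deviation from the league mean.
    This shortcut assumes the statistic has roughly the same spread in
    both seasons; it is an approximation, not a full projection.
    """
    return league_mean + year_to_year_r * (observed - league_mean)


# Soto's 2008 LD-BABIP was .805, .087 above the 2008 league average,
# and LD-BABIP has a year-to-year correlation of about .11.
league_ld_babip = 0.805 - 0.087
expected_ld_babip = regress_to_mean(0.805, league_ld_babip, 0.11)
print(round(expected_ld_babip, 3))  # 0.728
```

Under this shortcut almost all of Soto’s .087 edge on line drives disappears, while the same edge on groundballs or flyballs (correlation .32) would keep about a third of it.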
-
Jeff Francoeur and infield fly rate:
PECOTA’s projected BABIP for Francoeur: .307
My projected BABIP for Francoeur: .286
Current 2009 BABIP for Francoeur: .271
Francoeur’s most recent comparables (the ones for whom infield fly data exist) are Paul Konerko and Torii Hunter, with career infield fly rates of 11.9% and 13.9%, respectively. Francoeur’s is 15.4%, far higher than the MLB average of about 10% and higher than both Konerko’s and Hunter’s. PECOTA is right to compare him to players with above-average infield fly rates, but even those players’ rates are not as high as Francoeur’s. It is extremely difficult to post a .307 BABIP when you hit so many infield flies, since they are so easy to catch. This means you can scale back his projected batting average, as well as his runs and RBIs.
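The comparison in this example is simple arithmetic: average the comparables’ infield fly rates, then see how far the player sits above that average and above the league. A small Python sketch; the helper name is hypothetical, and the 10% league figure is the approximate MLB average quoted above.

```python
def rate_gaps(player_rate, comp_rates, league_avg):
    """Gap (in percentage points) between a player's rate and
    (a) the mean of his comparables' rates and (b) the league average."""
    comp_mean = sum(comp_rates) / len(comp_rates)
    return player_rate - comp_mean, player_rate - league_avg


# Francoeur's career IF/FB rate vs. Konerko's and Hunter's, and vs. ~10% league-wide.
vs_comps, vs_league = rate_gaps(15.4, [11.9, 13.9], 10.0)
print(f"{vs_comps:+.1f} points vs. comparables, {vs_league:+.1f} vs. league")
# prints "+2.5 points vs. comparables, +5.4 vs. league"
```

Even PECOTA’s closest recent comparables sit 2.5 points below him, a gap the system has no way to see.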
-
Luis Castillo and groundball rate:
PECOTA’s projected BABIP for Castillo: .300
My projected BABIP for Castillo: .324
Current 2009 BABIP for Castillo: .322
Castillo is an excellent example of the importance of groundball rate. For his career, Castillo’s groundball rate is 64%. That’s probably about 22 percentage points more grounders, and 22 points fewer flyballs, than the average player. Statistically, groundballs become hits about 10 percentage points more often than flyballs. This translates to roughly 2.2% more hits per ball in play, or .022 of BABIP. Castillo’s top two comparables are Mark McLemore and Tom Herr, who had career groundball rates of 47% and 49%, respectively, well short of Castillo’s 64%. Because his groundball rate is higher than his comparables’, Castillo should outperform his PECOTA BABIP projection this year.
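The back-of-the-envelope math above can be written out. This sketch assumes, as the paragraph does, that every extra grounder comes out of the flyball pool and that grounders carry a BABIP about .100 higher than flyballs; the function name is hypothetical.

```python
def babip_shift(share_moved, babip_gap):
    """Change in overall BABIP when `share_moved` of balls in play shift
    from one batted-ball type to another whose BABIP is `babip_gap` higher."""
    return share_moved * babip_gap


GB_OVER_FB = 0.10  # grounders become hits ~10 points more often than flyballs

# Castillo vs. the average player: ~22 points more grounders.
vs_average = babip_shift(0.22, GB_OVER_FB)
# Castillo (64%) vs. the mean of McLemore (47%) and Herr (49%).
vs_comps = babip_shift((64 - (47 + 49) / 2) / 100, GB_OVER_FB)
print(round(vs_average, 3), round(vs_comps, 3))  # 0.022 0.016
```

By this rough measure, the gap between Castillo and his own comparables is worth about .016 of BABIP.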
-
Michael Young and line drive rate:
PECOTA’s projected BABIP for Young: .323
My projected BABIP for Young: .342
Current 2009 BABIP for Young: .366
Michael Young has a career line drive rate of 25.1% (the MLB average is 20%). His comparables do not have similar line drive rates: Jeff Cirillo’s is 19.3%, Edgar Renteria’s is 22.7%, and Mark Grudzielanek’s is 23.6%. Holding everything else constant, hitting line drives 5 percentage points more often than the average hitter leads to a BABIP about .030 higher. Line drive rate has less persistence than most people think: its year-to-year correlation is only about .17, compared with .77 for groundball/flyball ratio. However, if certain players are line drive standouts every year, then unless their comparables are too, PECOTA will underestimate them, as it has with Young.
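The same rule of thumb gives a quick estimate of how much PECOTA’s comparables understate Young. This sketch treats the .030-per-5-points figure above as linear (about .006 of BABIP per percentage point of line drive rate), which is an assumption; the helper name is hypothetical.

```python
# Rule of thumb from the text: 5 points of LD rate ~ .030 of BABIP.
# Treating it as linear is an assumption.
BABIP_PER_LD_POINT = 0.030 / 5


def ld_babip_edge(player_ld, comp_lds):
    """Estimated BABIP edge from out-hitting comparables on line drives."""
    comp_mean = sum(comp_lds) / len(comp_lds)
    return (player_ld - comp_mean) * BABIP_PER_LD_POINT


# Young (25.1%) vs. Cirillo, Renteria, and Grudzielanek.
edge = ld_babip_edge(25.1, [19.3, 22.7, 23.6])
print(round(edge, 3))  # 0.019
```

Since line drive rate is only weakly persistent year to year, an edge like this is most believable for hitters such as Young who have posted high rates over a long career.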
TAKEAWAYS
PECOTA projections are some of the best tools fantasy baseball players can use. But you can combine PECOTA and a little bit of extra knowledge to beat competitors that rely on PECOTA alone. The key is to know how the system is created and how its deficiencies can be exploited.
For hitters, there are a few key things to look for that can help you identify overestimated and underestimated projections.
-
Check baseball-reference.com to see whether a young player’s high BABIP came from a high BABIP on line drives or from a high BABIP on groundballs or flyballs. If a player’s BABIP on line drives was significantly different from .720 (roughly the league average), he may be due for regression to the mean, and PECOTA may not know this.
-
PECOTA doesn’t know a player’s infield fly rate. Go to fangraphs.com and check it out. The average infield fly rate is about 10%. If his recent comparables have very different infield fly rates from his, that’s a sign his projection might be off.
-
PECOTA doesn’t know a player’s groundball rate. Check whether there is a major difference between his groundball rate and his comparables’ using fangraphs.com (or calculate it from baseball-reference.com).
-
PECOTA doesn’t know a player’s line drive rate. If a player consistently puts up high line drive rates and his comparables don’t, then PECOTA will probably underestimate him.
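The four checks can be rolled into one screening function. This is a minimal Python sketch under several assumptions: the dictionary keys are hypothetical, the .720 league LD-BABIP comes from the first check, and the tolerances (.040 of LD-BABIP and 3 percentage points for rates) are illustrative, not calibrated.

```python
LEAGUE_LD_BABIP = 0.720  # rough league-average BABIP on line drives (from the first check)


def pecota_flags(player, comps, ld_babip_tol=0.040, rate_tol=3.0):
    """Return a list of warnings from the four batted-ball checks.

    `player` and each dict in `comps` use hypothetical keys 'ld_pct',
    'gb_pct', and 'iffb_pct' (rates in percent); `player` may also
    carry 'ld_babip'. Both tolerances are illustrative assumptions.
    """
    flags = []
    ld_babip = player.get("ld_babip")
    if ld_babip is not None and abs(ld_babip - LEAGUE_LD_BABIP) > ld_babip_tol:
        flags.append("LD-BABIP far from league average: expect regression")
    for key, label in (("iffb_pct", "infield fly rate"),
                       ("gb_pct", "groundball rate"),
                       ("ld_pct", "line drive rate")):
        comp_mean = sum(c[key] for c in comps) / len(comps)
        gap = player[key] - comp_mean
        if abs(gap) > rate_tol:
            flags.append(f"{label} {gap:+.1f} points vs. comparables")
    return flags


# Hypothetical player and comparables, for illustration only.
player = {"ld_pct": 25.0, "gb_pct": 40.0, "iffb_pct": 10.0, "ld_babip": 0.800}
comps = [{"ld_pct": 20.0, "gb_pct": 41.0, "iffb_pct": 11.0},
         {"ld_pct": 21.0, "gb_pct": 39.0, "iffb_pct": 9.0}]
for flag in pecota_flags(player, comps):
    print(flag)
```

Run it over the handful of players you plan to target; any flag is a cue to look at his PECOTA comparables more closely, not an automatic adjustment.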
The Steve Martin quote at the beginning says you can be a millionaire and never pay taxes, but the catch is that you first need a million dollars. PECOTA is your million dollars in this case, and you already have it. But PECOTA has taxes of its own: shortcomings due to limited information. With the tips here, you won’t need to pay them.
Matt is becoming the first article I click on each week.
The use of the data available is a huge boon to fantasy players. Systems which don't use this data can be improved by using it. I thought this was a great piece, and I don't understand Will's visceral reaction to improving PECOTA.
Other factors I feel PECOTA is ignoring/undervaluing at its own peril:
1) Using pitch type/velocity/movement data to find player comps. more accurately.
2) Grading past performance as reliever vs. as starter on different scales (and doing the work to find out exactly what that adjustment should be).
3) Using strength of schedule. No credible team performance prediction system would ever consider making the assumption that all opponents are league-average. Why would PECOTA? Sure, it's more difficult; but it cannot really be that hard to look at the quality of opponents faced each season.
4) Dealing with platoon advantage more fairly. This ties in with #3, as players who are used abnormally often against beneficially-sided opponents (LHPs facing mostly LHBs, or LHBs facing only RHPs) will get inflated rate stats. This needs to be accounted for in the schedule difficulty calc., so that a situational lefty who posts a 3.00 eqERA will be penalized to reflect what he would have posted vs. a normal mix of batters.
Apologies if any of these topics have been fully addressed already in the latest PECOTA. But to my knowledge, they have not; and I am not getting the feeling that anyone at BP is committed to improving the algorithm as aggressively as it needs to be done.
Mine was on the pitch data. You can massively improve projections with it. I have, and I feel like I'm just scratching the surface.
--JRM
Still, I agree with you that Matt did a great job--for my money, the best article of the round.
If you find them, please point me to where ideas #2 and #3 were addressed -- as well as #4. And more generally, please be more transparent with what exactly PECOTA uses as factors and how it uses them. The secret's been out of the bag for a while now. It's time to actively solicit ideas and tweaks from your readership (after first explaining to them what you are currently doing).
The fact that two of your concerns were addressed three and five years ago, respectively, should serve as evidence that Nate has been driving to make PECOTA better pretty darn aggressively. Your other complaints are somewhat ironic, given that you're commenting on a BP Idol thread--a contest where we've "actively solicit[ed] ideas" from our readers--and to an article that shows that readers who have done their homework have great tweaks to offer.
I do not have the 2006 book in storage, so I cannot check the specific adjustments to PECOTA you are referring to -- again, would be nice to have a detailed explanation avail. to subscribers in which the improvements/revisions made over the years were documented.
When I say "strength of schedule," I mean the quality of opponent and environment (park) each player actually experienced on a per-AB basis -- not a generic "the player played in this division, so we adjust him by this amount." E.g., if a Padre miraculously managed to play all 30 of his season PAs at Coors Field, your system would not account for this (to my knowledge). Furthermore, the link you quote refers to incorporating SoS in team win projections, which is not the topic of this discussion. [BTW, Silver's article *manually* applied SoS factors to the team projections in this article].
I.e., you have the data to know exactly whom each player was facing and where it was for every plate appearance. Why not use this to normalize the difficulty (for both pitcher and batter) of each PA? I recognize it is not a trivial procedure; but certainly something you should at the very least be in the process of developing presently.
As for handedness adjustments, the "Platoon Splits" section you refer to from BP 2008 is quite vague. It says that PECOTA now tries to estimate the handedness mix of the opponents a player is "likely" to face for the upcoming season, and adjusts the raw projections accordingly. It does not say that it evaluates on an AB-by-AB basis what handedness mix the player has faced for the past three years and how that is used to adjust his historical eq-Stats, if at all. Again, it's a matter of looking at what the specific difficulty of each PA (including handedness) and averaging all PA to come up with an adjustment, rather than making division-wide or league-wide assumptions as to the difficulty a player has faced.
I am certain you will (try to) correct me if I am wrong. I am not a PECOTA historian, but rather someone who wants to see the most accurate baseball projection system possible. That is my sole motive. I am not sure what your motive is, and you appear more interested in playing defense than in looking for new ideas. And BP Idol clearly is *not* a solicitation of new ideas to improve PECOTA. It is a solicitation for articles and future employees. There is a big difference.
Also note that in this competition, the judges have looked favorably on what Matt did in regards to PECOTA and have praised other finalists for using non-BP metrics and concepts (assuming they presented them in a good fashion).
Sadly, your offer of "Apologies if any of these topics have been fully addressed" seems to have been disingenuous, as your increasingly belligerent replies have shown. I am heartened, however, that you finally admitted that there have been improvements to PECOTA every year, even if you're irked that those improvements weren't done exactly to your taste. The only thing I objected to was the implication that the good people in charge of PECOTA were not diligent in their work. As Matt showed above, it's possible to suggest improvements to a system without denigrating the efforts of those who have and continue to work hard to improve it.
But here's the thing. I've attended one SABR convention in my life, I mainly browse here, CNNSI and ESPN. I don't really know what other analysts are out there and what kind of work they are doing. But I read the Foreword of the BP Annual and the essays in back and see the changes or attempts at changes being made. If I, with my lack of a background, can read about changes to BP's metrics in their publication essays/articles about what changes they have tried to make, then how can you criticize them when you didn't do your "homework"? Now, we can debate their success or failure at modeling and properly implementing those changes, but you seem to act like BP is the medieval Catholic Church, resistant to input and change and inclined to dogmatic thinking. I'll say this, though... I've never had a complaint about the responsiveness of BP's authors (though Christina tends to get really busy and can take a while sometimes, but she's an editor so she probably does a lot of responding), even to my silliest questions. I'm sure if you did your homework and presented a well-founded innovation or change that PECOTA needed, they'd listen.
Hey, I'd love it if they published the minutiae of how PECOTA works, but it's their proprietary information and I'll have to live with it, just like I live with drinking Coca-Cola without knowing whether cocaine's still in their syrup recipe or not. And if I don't like the taste of Pepsi, it doesn't matter what's in the formula anyway. Moral of the story is if you don't like PECOTA, you can suggest changes or you can just not use it.
I enjoy reading articles from those who know what stats do tell and don't tell. I liked the article. More importantly, I now know that Matt is smart.
Thumbs up, with credibility in future rounds.
Recognizing FIP is important, but since it's not the best thing out there, BPro authors shouldn't feel a need to play to it, other than the fact that it's pretty damn popular. tRA, from statcorner, should probably get more play everywhere, and BPro's own Quick ERA deserves more mention.
wOBA is a bit more accurate than EqA, although we can quibble about more accurate vs. don't sweat it.
I'll admit I don't keep up with stats, but a couple of the initial entries talked about FIP, so I went to Fangraphs and tried to get a grasp on it. As with many stats, I was left with a feeling of "Umm, so tell me smart guy, what's the result?" since that part's what I care about.
A great no-numbers explanation is available here (among other places, I don't think it appears at their site): http://www.beyondtheboxscore.com/2008/11/9/657217/tra-explained-sans-numbers
The explanation WITH numbers appears here: http://statcorner.com/tRAabout.html (also a glossary at the site)
What do you mean, what's the result? Like, what does FIP tell you and where would you use it? Remember DIPS? FIP is DIPS but with a really quick and easy formula.
Maybe it's just because I'm also studying economics, but (despite the two game theory pieces not really containing anything ground-breaking for me) Matt seems to be at a level of creativity in his analysis that's far ahead of the others. Maybe when I finish reading the articles this week I'll disagree, but I've been largely disappointed by the other 4 articles I've read so far this week. This, though, was good.
This was one of the last ones I read and one of the only that has something useful to say. I would read - and be willing to pay for - more writing like this.
What I loved was that you didn't drone on and on, but took very specific examples and detailed the analysis. Each detail had a purpose, and by the end of the piece, each reader understands why batted ball data is important and how to make these determinations for themselves.
Perfect.
The takeaway section was golden! Regardless of whether I agree with Matt's assumptions and whether I thought he was cherry-picking players that happened to meet his projection system or not, he provided a great concise summary of how PECOTA tends to operate and how players tend to profile in PECOTA, then gives a short-hand list of indicators to look for that can affect the valuation of your players in a fantasy draft.
Matt, you started out good and keep getting better and better... thumbs up!
What's funny is that I was looking for an example of a player whose groundball rate was higher than his comparables', and settled on Luis Castillo despite the fact that his mid-week BABIP was closer to PECOTA's projection than to mine. It just was a good example. But then he had a good week and regressed towards what I suspect is his true BABIP skill. It's funny because I'm a huge Phillies fan, and naturally root hard against the Mets all the time, and I had this weird conflict of interest rooting for a player to help my individual goals but simultaneously against him to help my hometown team's goals...I guess that really DOES make this fantasy baseball week at BP Idol!
And yeah, you know you are a fantasy baseball player when you try to figure out how your real life team can win while also allowing your fantasy league team to rack up stats. :)
Although I will say that I live by the idea that "my Marlins are more important than my fantasy Marlins," if you know what I'm saying.
If not, I'd ALWAYS rather my real team succeed.
Is it scorer's error, or is it that the stadium has small foul ground and depresses flyballs as a result?
Responding to the BA article, I listed Rangers batters since 2003, and I think only one had more LDs on the road than at home. I still lean towards scorer bias, but it's probably a combination. I'm doing research on pitching at altitude, so I can see if LD rate has any correlation.
When I compared parks I used LD% = LD/(LD+OFF). Smaller or larger foul territory should affect line drives and outfield flies equally, so that the ratio won't change.
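The ratio described above is straightforward to compute for home and road splits. A minimal Python sketch; the home/road helper and the counts in the example are hypothetical, chosen only to show the arithmetic.

```python
def ld_share(line_drives, outfield_flies):
    """LD% = LD / (LD + OFF), where OFF counts outfield flies only.
    Per the reasoning above, foul territory should affect both terms
    alike, so the ratio should not move with it."""
    total = line_drives + outfield_flies
    if total == 0:
        raise ValueError("need at least one line drive or outfield fly")
    return line_drives / total


def home_road_ratio(home_ld, home_off, road_ld, road_off):
    """Home LD% divided by road LD%; values well above 1 point toward
    scorer bias or some other home-park effect."""
    return ld_share(home_ld, home_off) / ld_share(road_ld, road_off)


# Hypothetical counts for one hitter.
print(round(home_road_ratio(120, 200, 100, 220), 3))  # 1.2
```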
1. Explain the state of the art
2. Explain how newly-available data extends the state of the art
3. Explain what you give up when you need to be able to look at a big chunk of baseball history on a uniform basis
4. Give people something they can *use*, now, to understand better or school their co-workers (or both).
I'd give this one 7 thumbs up if they'd let me.
I wouldn't see a problem, however, if he had critiqued PECOTA more. He seemed to go out of his way to assure us that its weaknesses were not really the fault of the analysis, just limitations in the record keeping of 50 years ago.
I enjoy this site a lot, but a weakness is that the conclusions of tools such as PECOTA are seen as facts instead of theories or opinions. And any divergences of outcome from projections are too easily called "luck".
Nothing wrong with introspection, challenging conventional notions, and looking to improve.
This article takes what is a useful tool and tries to improve it.
Well done.
I published this:
http://www.thegoodphight.com/2009/1/16/726379/babip-projection-and-new-s
a little while later. I had been working for a while on how to do predictive BABIP by batted ball type: I had a formula for BABIP on line drives, groundballs, and flyballs individually. I introduced infield fly rate as the largest correlate for flyball BABIP, homerun rate as the largest correlate for line drive BABIP, and infield hit rate as the largest correlate for groundball BABIP.
A couple weeks later, this article came out with a predictive model using infield fly rate, homeruns per flyball, and an improved term to proxy for infield hit rate using handedness and groundball rate. The model was quite predictive, using only one year of data. However, it was a regression developed using coefficients directly aimed to match their dataset, so it was bound to be a little more predictive than Marcel, which it was.
I wrote a few subsequent articles
http://www.thegoodphight.com/2009/2/2/743228/improving-babip-estimation
and
http://statspeak.net/2009/02/babip-projection-batted-ball-types-and-interaction-terms.html
putting all the components together now that I felt I had a better grasp from my previous article, and then the article that I linked to in this article. By breaking it down using correlates of individual batted ball types' BABIP or their BABIP directly, I get a more predictive result with higher R^2.
Their article was definitely a strong one, and one I cited in several articles on BABIP, since it got a few things out there about BABIP. One statistic I didn't have access to when doing this article was what they call "spray," which measures how well a hitter spreads the ball around the field (as opposed to pulling the ball more often). That's a very useful statistic and one that would have improved the model for players without several recent years of data. When I manually recorded it for a smaller sample, it wasn't as useful for players with at least three years of major league experience at 300 PA or more, and I suspect it's more useful for second- and maybe third-year players.
http://statspeak.net/2009/03/skills-repeatability-and-peripherals.html
In that article, I explained that certain information obtained from more peripheral data can be useful for predicting more typical statistics. A leading example in that article is strikeout rate and "contact rate," which is the percentage of swings on which a hitter makes contact. If you have only the previous year's strikeout rate, you can do an okay job of predicting a player's strikeout rate the following year. However, if you also know his contact rate (and hence how often he misses when he swings), you can predict it a little more accurately: if he swings and misses more than other hitters with the same strikeout rate the year before, he is more likely to strike out more the subsequent year. But let's say you also know his strikeout rate from the year before that. That's actually more useful, and the previous year's contact rate is no longer very useful at all for predicting strikeout rate. Simply put, two years of direct strikeout rate data are better than one year of strikeout rate data combined with contact rate data from the same year. These scouting-type peripheral statistics (like contact rate) are useful for filling in the gaps when you would like to have more information.
Consider the following as an easy example. You're considering signing Mark Teixeira or trading for Matt Wieters. For which move is the sabermetrician more useful, and for which is the scout more useful (at least relatively)? Teixeira has years of data, and he's young. To know how well he'll age, what his true ability level is, and how he'll play in your stadium, you value the sabermetrician. To know how Wieters will adjust to major league pitching, what his vulnerabilities are as a hitter, etc., you value the scout. Obviously it's better to have both, but scouts are relatively more valuable when less statistical data is available, and sabermetricians are relatively more valuable when more is.
This leads into my discussion of spray rate. Knowing how well a hitter spreads the ball around the field is very useful for second-year players. Ryan Howard had a ridiculous BABIP in his first year and a half in the majors, but he pulls most of his balls in play (though, surprisingly, not most of his homeruns). Knowing that he does not spray the ball around the field well would have been useful for predicting his fall in BABIP during his second and third full seasons. However, that information is less valuable now: in three and a half years of data, the effect of his tendency to pull the ball is already reflected in his lower BABIP, specifically his lower BABIP on groundballs.
This effect seemed to hold true in analysis I ran when I had smaller samples than in this article. Using "spray" was useful in predicting BABIP for second year players, because there was already a lot of noise in their first year BABIP and a lot of adjustments that defenses would make as there was more information on the guy. Using "spray" was not useful in predicting BABIP for players with at least three previous years of BABIP because the effect of how frequently they pulled the ball was fully contained in the three years of BABIP data.
However, if you know K(T-2), that's more useful than C(T-1). In other words, if you try to predict K(T) using K(T-1) and K(T-2), you'll get more useful information than using K(T-1) and C(T-1). In fact, if you have K(T-2), K(T-1), and C(T-1), then you don't really even need C(T-1) at all.
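For anyone who wants to test this claim on their own data, here is a minimal Python sketch that fits both models by ordinary least squares and reports in-sample R^2. The function names are hypothetical, and in-sample R^2 flatters any model; a fair comparison would score predictions on seasons held out of the fit.

```python
def ols_r2(rows, y):
    """Fit y ~ intercept + predictors by ordinary least squares; return R^2.

    `rows` is a list of predictor tuples, e.g. (K(T-1), K(T-2)).
    Solves the normal equations by Gauss-Jordan elimination with partial
    pivoting, which is fine for a handful of non-collinear predictors.
    """
    X = [[1.0, *r] for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(k):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def compare_k_models(k_t, k_prev, k_prev2, contact_prev):
    """R^2 of K(T) ~ K(T-1) + K(T-2) versus K(T) ~ K(T-1) + C(T-1)."""
    two_k_years = ols_r2(list(zip(k_prev, k_prev2)), k_t)
    k_plus_contact = ols_r2(list(zip(k_prev, contact_prev)), k_t)
    return two_k_years, k_plus_contact
```

Feed it parallel lists of per-player rates (the same hitters in each list); if the claim above holds, the first number should come out higher on real data.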
To rephrase this a bit, I would guess that if C(T-1) shares some aspects of BABIP, and BABIP can fluctuate to an extent, then K/AB is a better predictor.
Fair statement?
*grumbles* I need some caffeine.
My thought was if strikeout rate is an indicator of future strikeout rate for batters, how well does strikeout rate correlate as a future indicator for pitchers? Also, if given only one year of a rookie pitcher's data (or, a rookie reliever), can something like BABIP, H/9 or some other form of "pitcher's contact rate" be constructive in projecting strikeout rates?
(Great article, Matt.)
But this was great. This is probably the class of the round. If Matt can write articles like this consistently, he needs a regular job somewhere.
Also, A+ for having the cojones to scrutinize PECOTA. Dangerous thin ice, but you ended up leaving us all with a warm fuzzy feeling after all. Well played.
What I am left wondering is: how significant are these nuances? We all have time constraints, and we want to be able to decide whether it is worth the considerable added effort of looking up BABIP on groundballs and line drives, etc., to achieve this higher degree of accuracy. Is it worth checking for a consistently high line drive rate, but not bothering with the rest of the stuff? Would it be worth it to take lessons in VLOOKUP for this?
How reliable are reported line drive rates in the first place? That is a very subjective statistic, is it not?
I couldn't find BABIP on line-drives, etc, in Baseball-Reference. Could we get a more specific link and/or description of where to look, please?
Re: comments:
Where do you get spray data?
So, in predicting strikeout rates two years of strikeout rates are more significant than the more recent year's strikeout rate plus contact rate. I assume a higher strikeout rate in year-2 means that year-1 may have been fluky, so his strikeout rate should be in-between instead of presenting a trend downward. Correct? I am interested in finding when a trend is real or not.
Thus, try to use past strikeout rate as much as possible.
But if there's only one year of data because the player is a rookie, there might be sample size issues, so then use strikeout rate in conjunction with contact rate to project future strikeout rates.
To find BABIP by batted ball type: go to a player's statistics on baseball-reference.com. Then you will see a tab for "splits [+]" above his statistics, and if you run your mouse over this, it will let you select the splits in a player's career or a specific year of your choosing. One of the last few splits (I think it's the one before splits by opponent and splits by stadium) is splits by "hit trajectory."
The reported line drives are measured with error, but so are the line drives used in my regression so it's mostly okay...(A little regression background now, but those who are not interested can skip over this: when any independent variables are measured with error, the regression process will bias the coefficients towards zero and away from the accurate values. This is called "attenuation bias" or occasionally "regression dilution." In a situation like this, it's a shame, but there is not much I know how to do to correct for it. There are some methods that are used, but often I've been told it's best to just understand that the effects are understated.)
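Attenuation bias is easy to see in a simulation. The sketch below is a generic illustration, not the BABIP regression itself: it draws a predictor, builds an outcome with a known slope of 2, then re-estimates the slope after adding measurement error to the predictor. With equal variance in the true predictor and in the error, classical measurement error should shrink the estimated slope by about half, the ratio var(x) / (var(x) + var(error)). The parameter values are arbitrary assumptions.

```python
import random


def slope(xs, ys):
    """OLS slope of ys on xs (covariance divided by variance of xs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def attenuation_demo(n=20000, true_slope=2.0, noise_sd=1.0, seed=7):
    """Return (slope on the true predictor, slope on the noisy predictor)."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_slope * x + rng.gauss(0, 1) for x in true_x]
    # The same predictor, recorded with independent measurement error.
    measured_x = [x + rng.gauss(0, noise_sd) for x in true_x]
    return slope(true_x, y), slope(measured_x, y)


clean, noisy = attenuation_demo()
print(f"true predictor: {clean:.2f}, noisy predictor: {noisy:.2f}")
```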
I can't really tell you how valuable it is to use this stuff. If you place a lot of value on your performance in fantasy baseball, I'm sure this will help. I think how intensely you work on this is a matter of preference. You can certainly sort players by line drive rate, groundball rate, and infield fly rate at fangraphs.com, and then check a few extreme outliers against their PECOTA comparables. Alternatively, if you're the type of person who picks out which players to target in advance of a draft, it's probably smart to check over the dozen or two dozen guys that you specifically want on your team. It's really up to you.
Of course, it is up to the individual how much time he spends on this. Sorting on those stats and looking for outliers as you suggest sounds worthwhile, but then, as you point out, one year of line drive BABIP isn't all that significant. I was trying to get a notion of how much added improvement these checks would bring to PECOTA.
Ideally, Nate will consider them and test how and whether he should incorporate your ideas.
And by definition, to succeed in the BPro Idol contest, you have to be popular with this audience. If you don't win, you weren't as popular with this audience (for whatever reason), and to a prospective employer, might not be seen as popular with certain similar audiences or niches... thus your ability to negotiate your own contract as well gets diminished somewhat.
Also, the American Idol comparison does not work as well either since the winner of American Idol signs an exclusive contract on content and those terms are not in this competition from what I've seen.
Using outside material is a strength.