June 25, 2009
Checking the Numbers
Much Ado About Liners
Line drives are the hardest-hit balls put in play, resulting in base knocks much more frequently than their fly-ball and ground-ball counterparts. In terms of exact figures, the liner/hit conversion rate averages out to 73 percent, roughly three times the rate for balls beaten into the ground and almost five times the rate for those lofted in the air. When trying to estimate BABIP, using the expected values of success (73 percent for line drives, 24 percent for grounders, and 15 percent for fly balls) proves to be a more accurate methodology than the popular formula developed by Dave Studenmund that adds .12 to a player's line-drive percentage.
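To make the comparison concrete, here is a minimal Python sketch of both estimators. It assumes only three batted-ball buckets (popups and bunts are not split out) and uses the 73/24/15 percent hit rates quoted above; the function names are hypothetical.

```python
# League-wide hit rates by batted-ball type, as quoted in the text.
EXPECTED_HIT_RATE = {"LD": 0.73, "GB": 0.24, "FB": 0.15}


def expected_babip(ld: int, gb: int, fb: int) -> float:
    """Weight each batted-ball count by its expected hit rate."""
    balls_in_play = ld + gb + fb
    expected_hits = (
        ld * EXPECTED_HIT_RATE["LD"]
        + gb * EXPECTED_HIT_RATE["GB"]
        + fb * EXPECTED_HIT_RATE["FB"]
    )
    return expected_hits / balls_in_play


def studenmund_babip(ld: int, gb: int, fb: int) -> float:
    """Rule of thumb: line-drive percentage plus .120."""
    return ld / (ld + gb + fb) + 0.12
```

For a hypothetical hitter with 20 liners, 45 grounders, and 35 fly balls, the expected-value method gives .3065, while LD% + .12 gives .320.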
Using the expected values helps to differentiate between, say, a bloop popup that falls in between fielders for a double and a scorched liner that a third baseman snares on a dive, dealing with the probability of success instead of the actual results. Clearly, the latter type of batted ball will add to hit totals at a higher rate than the former, something that should be accounted for in any sort of estimation or expectation formula. Several issues arise when discussing line drives, however, involving correlation to overall performance, classification and park factors, and plate discipline with regard to the batter/pitcher matchup; this last point was brought up in the comments section of our look at Pedro Feliz and other substantial jumps in on-base percentage over the years.
Relative to overall performance, it stands to reason that those belting liners at higher clips will post better triple-slash stats. After all, if a hitter hovers around the normal batting average and slugging marks of .733 and 1.013 on liners while scorching the ball more frequently than other players, he should experience more success on balls in play, potentially translating into better all-around performance. But are line-drive rates stable for players, staying relatively consistent from year to year? And, piggybacking off that question, are success rates on liners consistent?
I have discussed intraclass correlations in this space several times before; they work much like a year-to-year correlation, but incorporate multiple seasons for each player as opposed to just two. Running the ICC first on overall line-drive rates from 2003-08 produced a correlation of 0.33, suggesting that the rates at which batters hit frozen ropes are moderately stable from year to year. When batting averages on line drives (LD-BA) were placed under the ICC lens, a measly correlation of 0.14 surfaced, indicating that success rates on liners are not particularly consistent from season to season. With a mean of roughly .730 and a standard deviation of 0.043, roughly two-thirds of the players queried fell between .687 and .773, and about 95 percent would fall between LD-BAs of .644 and .816.
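The article does not say which ICC variant was used, so the sketch below assumes the common one-way random-effects ICC(1) on a balanced panel in which every player has the same number of seasons; real 2003-08 data would be unbalanced and need an adjusted k. The small `sd_band` helper (a hypothetical name) reproduces the ranges quoted above.

```python
from statistics import mean


def icc_oneway(seasons_by_player: list[list[float]]) -> float:
    """One-way random-effects ICC(1) for a balanced panel.

    Each inner list holds one player's seasonal values (e.g. LD% or LD-BA),
    and every player must contribute the same number of seasons, k >= 2.
    """
    n = len(seasons_by_player)
    if n < 2:
        raise ValueError("need at least two players")
    k = len(seasons_by_player[0])
    if k < 2 or any(len(p) != k for p in seasons_by_player):
        raise ValueError("every player needs the same number of seasons, k >= 2")

    grand_mean = mean(v for player in seasons_by_player for v in player)
    player_means = [mean(player) for player in seasons_by_player]

    # Between-player mean square: how far each player's average sits from the pack.
    ms_between = k * sum((m - grand_mean) ** 2 for m in player_means) / (n - 1)
    # Within-player mean square: season-to-season wobble around each player's average.
    ms_within = sum(
        (v - m) ** 2
        for player, m in zip(seasons_by_player, player_means)
        for v in player
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all values identical; ICC undefined")
    return (ms_between - ms_within) / denominator


def sd_band(mu: float, sd: float, z: float) -> tuple[float, float]:
    """Interval mu +/- z * sd; z=1 covers ~68% and z=2 ~95% of a roughly normal spread."""
    return mu - z * sd, mu + z * sd
```

Calling `sd_band(0.730, 0.043, 1)` returns roughly (.687, .773), and `z=2` returns roughly (.644, .816). An ICC near 1 means players keep their relative standing from season to season; a value near 0, like the 0.14 for LD-BA, means most of the variation is within players rather than between them.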
What happens when players deviate drastically from the mean? Many astute analysts are quick to sort players into lucky and unlucky camps based on their overall BABIP, but getting a little more granular can help explain the reasoning behind such shifts. Given the standard deviation of the data set, players soaring well past or falling well below their means in a specific ball-in-play category, rather than in overall BABIP, could justifiably be considered lucky or unlucky in that specific area. In other words, instead of commenting on a player's luck relative to his overall BABIP, dig deeper. Let me reiterate that I am in no way suggesting that success, or the lack thereof, on balls in play will always be the root cause of a player's struggles, but rather putting it out there as a potential cause worthy of investigation. For instance, take a look at Jimmy Rollins of the Phillies, who since 2003 has posted LD-BAs ranging from .723 to .766, equaling or besting the league average. Through Wednesday, Rollins had nosedived toward a .600 mark, well below the league average as well as his own expected threshold.
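A minimal sketch of that idea, assuming the league LD-BA distribution described above (mean about .730, standard deviation .043) and a hypothetical two-standard-deviation cutoff: compute a z-score and flag anything beyond the cutoff as worth investigating, not as proof of luck. The function names are illustrative.

```python
# Approximate league LD-BA distribution quoted in the text (2003-08).
LEAGUE_LD_BA_MEAN = 0.730
LEAGUE_LD_BA_SD = 0.043


def ld_ba_z(ld_ba: float, mu: float = LEAGUE_LD_BA_MEAN, sd: float = LEAGUE_LD_BA_SD) -> float:
    """How many standard deviations a player's LD-BA sits from the league mean."""
    return (ld_ba - mu) / sd


def ld_ba_luck_flag(ld_ba: float, threshold: float = 2.0) -> str:
    """Label an LD-BA worth a closer look once it clears +/- threshold SDs.

    The 2.0 default is an illustrative assumption, not a cutoff from the article.
    """
    z = ld_ba_z(ld_ba)
    if z <= -threshold:
        return "possibly unlucky on liners"
    if z >= threshold:
        return "possibly lucky on liners"
    return "within normal range"
```

Rollins' .600 works out to a z-score of about -3.0. The .043 spread presumably reflects season-level samples; a half-season's worth of liners is noisier, so a flag raised this early deserves extra caution.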
Since 2003, among those with at least 75 line drives, only three players (Jason Kendall in 2007, Scott Podsednik in 2004, and Placido Polanco in 2004) have posted LD-BAs less than or equal to Rollins' current mark. Rollins would seem bound for some kind of regression over the remaining three months, but you can now see how few contemporaries have actually ended up this low in their success rates on liners. Another cause for concern with Rollins involves his .188/.214 BA/SLG on grounders in a .242/.263 league. From 2003-07, Rollins bested the league with a .252/.284 line, but he plummeted last season to just .194/.211. Considering the virtually identical numbers from a year ago, the idea that he has been unlucky on grounders doesn't hold as much water.
Falling below league averages and personal levels of performance on balls-in-play statistics is certainly noteworthy from an anecdotal standpoint, but don't fool yourself into thinking that if Rollins' LD-BA were adjusted to the .730 range, all of his issues would magically subside. In fact, with such a small number of raw line drives this "early" in the season, if his .600 LD-BA rose to .730 and a few of the new hits were credited as doubles, Rollins would only see his line rise from .214/.257/.332 to something in the vicinity of .234/.281/.359, essentially the slash-line equivalent of putting lipstick on a pig. With regard to the line drives themselves, Rollins may very well be belting the ball right at fielders, but he also could be hitting weak or soft liners: balls classified as line drives based on their observed trajectory, but not limiting the fielder's reaction time as much as your standard frozen rope. This serves as the perfect segue into a discussion of the classification issues with line drives and balls in play.
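The what-if above is simple arithmetic: turn enough line-drive outs into hits to reach the target LD-BA, then recompute the slash line. The sketch below uses hypothetical counts (the article does not list Rollins' raw totals) and hypothetical names, treats new hits as singles unless some are credited as doubles, and leaves at-bats unchanged because each out simply becomes a hit.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int   # at-bats
    h: int    # hits
    tb: int   # total bases
    bb: int   # walks
    hbp: int  # hit by pitch
    sf: int   # sacrifice flies

    def slash(self) -> tuple[float, float, float]:
        """Return (BA, OBP, SLG)."""
        ba = self.h / self.ab
        obp = (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)
        slg = self.tb / self.ab
        return ba, obp, slg


def raise_ld_ba(line: BattingLine, ld: int, ld_hits: int,
                target_ld_ba: float, doubles: int = 0) -> BattingLine:
    """Convert line-drive outs into hits until LD-BA reaches target_ld_ba.

    New hits count as singles unless `doubles` of them are credited as doubles.
    """
    extra_hits = max(0, round(target_ld_ba * ld) - ld_hits)
    doubles = min(doubles, extra_hits)
    extra_tb = extra_hits + doubles  # each double adds one base beyond a single
    return BattingLine(line.ab, line.h + extra_hits, line.tb + extra_tb,
                       line.bb, line.hbp, line.sf)
```

With a hypothetical 300 AB, 60 H, 100 TB, 20 BB, 2 HBP, 3 SF, and 36 hits on 60 liners, raising LD-BA to .730 adds eight hits (two of them doubles) and moves the line from .200/.252/.333 to .227/.277/.367: a real but modest bump, in keeping with the point above.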
Balls put in play are classified by trained scorers, either present at a particular game or working from video. The one commonality among the various entities offering the data is that classifications are assigned by humans. Though batted-ball classifications are generally accurate, to err is human, and such designations are not without potential flaws. For instance, what do you do with "fliners"? Or scorched one-hoppers that seem to be on a line before briefly landing on the infield dirt en route to the outfield? Cases could easily be made for either side of the coin in each of these scenarios, an ambiguity that might not prove problematic were it not for the substantially different success rates and expected values. Calling a line drive a fly ball may seem relatively harmless, but it can hinder BABIP estimators, since liners go for hits 73 percent of the time while fly balls do so at just a 15 percent clip. Additionally, even if we eventually agree that one-hoppers are grounders, they cannot possibly carry the same expected value as dribblers or weakly hit grounders. Each type of ball put in play consists of a few different, more in-depth categories. Not differentiating would be akin to grouping together the bloop double that falls in and the scorched liner that also produces a two-bagger. Forgive me this sidetrack, but this is why the most accurate linear-weights systems, at least from a theoretical standpoint, would incorporate the type of ball put in play in addition to the end result.
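The cost of a single miscoding follows directly from the two hit rates: each true liner scored as a fly ball removes .73 - .15 = .58 expected hits. A minimal sketch, with a hypothetical function name and hypothetical counts:

```python
# Expected hit rates quoted in the text.
LD_HIT_RATE = 0.73
FB_HIT_RATE = 0.15


def babip_shift_from_miscoding(liners_coded_as_flies: int, balls_in_play: int) -> float:
    """Change in expected BABIP when some true liners are scored as fly balls.

    Each miscoded ball lowers the expected hit total by LD_HIT_RATE - FB_HIT_RATE
    (.58 hits), spread over all balls in play.
    """
    lost_hits = liners_coded_as_flies * (LD_HIT_RATE - FB_HIT_RATE)
    return -lost_hits / balls_in_play
```

Five miscoded liners out of 400 balls in play shave about seven points (.00725) off the expected BABIP, enough to change how "lucky" a hitter looks.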
Another potential flaw involves the many different scorers across the country, though the stadiums themselves may share some of the blame. Brian Cartwright penned a brilliant piece a while ago in which he researched park factors on line drives. In its simplest form, Cartwright investigated the frequency of liners as a share of all non-grounders at home and on the road via the matched-pair method: take all line drives and fly balls for the Astros and Pirates in each of their home parks, repeat for every matchup, and sum both sides, yielding the number of line drives for the Astros and their opponents in Houston as well as in all other parks. A startling conclusion was that a ball was 18 percent less likely to be classified as a line drive in Houston than on the road. This does not imply that Astros hitters themselves hit fewer line drives, but rather that their balls in play were less likely to be coded as liners at Minute Maid Park than on the road. At the Ballpark in Arlington, on the other hand, balls in play were 18 percent more likely to be classified as line drives.
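A simplified version of the matched-pair calculation might look like the sketch below. Cartwright's exact procedure may differ; the tuple layout and function name are assumptions. Only opponents seen in both venues are counted, so each park is compared against the same set of clubs, and a factor of about 0.82 corresponds to the "18 percent less likely" figure for Houston.

```python
from collections import defaultdict


def matched_pair_ld_factor(games, team):
    """Simplified matched-pair park factor for line-drive coding.

    games: iterable of (home_team, away_team, ld, fb) tuples, where ld and fb
    are the combined line drives and fly balls for BOTH clubs, as coded in
    home_team's park. Returns the LD share of non-grounders in `team`'s park
    divided by the same share in its opponents' parks.
    """
    at_home = defaultdict(lambda: [0, 0])  # opponent -> [ld, fb] in team's park
    on_road = defaultdict(lambda: [0, 0])  # opponent -> [ld, fb] in opponent's park
    for home_team, away_team, ld, fb in games:
        if home_team == team:
            at_home[away_team][0] += ld
            at_home[away_team][1] += fb
        elif away_team == team:
            on_road[home_team][0] += ld
            on_road[home_team][1] += fb

    # Keep only opponents seen in both venues so the same pairs of clubs are compared.
    paired = at_home.keys() & on_road.keys()
    if not paired:
        raise ValueError(f"no matched pairs found for {team}")

    def ld_share(totals):
        ld = sum(totals[opp][0] for opp in paired)
        fb = sum(totals[opp][1] for opp in paired)
        return ld / (ld + fb)

    return ld_share(at_home) / ld_share(on_road)
```

Games not involving the team in question, and opponents met in only one venue, are simply ignored in this sketch.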
These park factors could be attributed to those scoring the games, but we cannot ignore that certain ballparks may depress certain statistics. Weather, air pressure, field dimensions, and several other factors independent of the scoring may cause balls that would otherwise rope their way into the outfield to loft a bit more, looking more like fly balls. These are not guaranteed explanations, just ideas possibly capable of explaining the practically irrefutable park factors researched by Cartwright. With the advent of the Hit-f/x system, which I plan on exploring over the course of this season and learning more about at the Pitch-f/x Summit in July, coding errors like this could eventually subside, as the spray angle, launch angle, and speed off the bat could indicate the type of ball put in play.
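Measured trajectories would let a fixed rule replace scorer judgment. The sketch below buckets a batted ball by vertical launch angle alone; the 10/25/50-degree cutoffs, the separate popup bucket, and the function name are illustrative assumptions, not Hit-f/x definitions, and a fuller version would also use speed off the bat and spray angle.

```python
def classify_batted_ball(launch_angle_deg: float) -> str:
    """Bucket a batted ball by launch angle (hypothetical cutoffs, in degrees)."""
    if launch_angle_deg < 10:
        return "GB"  # ground ball
    if launch_angle_deg < 25:
        return "LD"  # line drive
    if launch_angle_deg <= 50:
        return "FB"  # fly ball
    return "PU"      # popup
```

The particular cutoffs matter less than the fact that the same rule applies identically in Houston and Arlington, so a park-level difference would reflect the batted balls themselves rather than who was scoring.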
Line drives and their assorted BIP cousins may also be contingent upon plate discipline. In the comments thread of our look at historical OBP spikes, commenter Sarah Gelles put forth the following idea:
I'm curious whether the BABIP improvement could also possibly stem from the BB% improvement-perhaps if the hitter is laying off of more bad pitches they're more likely to hit the pitches they do swing at hard. This, combined with the already-mentioned idea that pitchers might give the batter more pitches to hit when they realized he wasn't swinging at the bad stuff as much, could explain the sustained BABIP increase. Maybe you could look at the LD%? If the contact was harder because the ball was in the zone more, some GBs and FBs might become LDs, which would show up.
This very valid proposition proves quite difficult to investigate, especially given the unreliability of ball-in-play classifications before 2003 and the relatively recent implementation of the Pitch-f/x data set. Unfortunately, for the purposes of quantifying aspects of the above suggestion, the 16 players who experienced a sharp spike in OBP and sustained the new rate the following year, as shown last week, all did so prior to 1995, so no valid strike-zone or plate-discipline data is freely available, and the rates at which players hit liners in those seasons are not worth guessing at. Regardless, the theory holds plenty of validity, as the batter-pitcher matchup is one of constant adjustments. If a player has exhibited a strong propensity for swinging at pitches out of the zone, it would behoove the pitcher to deliver his offerings out of the zone. If our hypothetical hitter shows signs of improvement in terms of spitting on pitches "juuuuust a bit outside," then the pitcher must adjust and fire in the zone more often. Whereas contact made on poorly placed pitches may result in a weak grounder to the left side, contact made on pitches in the strike zone, perhaps in the hitter's wheelhouse, could lead to harder-hit liners.
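For players after 2003, one hypothetical way to start testing Gelles' idea is to correlate year-over-year changes in walk rate with changes in line-drive rate. The sketch below is not an analysis from this article: the input layout and function names are assumptions, and a plain Pearson correlation ignores sample size and regression to the mean.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def walk_vs_liner_change(player_seasons) -> float:
    """Correlate the change in BB% with the change in LD% across players.

    player_seasons: list of ((bb_pct_year1, ld_pct_year1), (bb_pct_year2, ld_pct_year2)).
    """
    d_bb = [y2[0] - y1[0] for y1, y2 in player_seasons]
    d_ld = [y2[1] - y1[1] for y1, y2 in player_seasons]
    return pearson_r(d_bb, d_ld)
```

A clearly positive coefficient would be consistent with the idea that more disciplined hitters see better pitches and square them up more often, though it would not by itself establish the pitcher-adjustment mechanism.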
The three major points of contention presented here can be summarized as follows: line drives, though moderately stable, do not always translate into better performance given the instability of LD-BA; classification errors are more costly than meets the eye, considering the substantial gap between the expected value of a liner and those of fly balls and grounders, as well as the park factors present; and other strategic aspects of the game can lead to fluctuating line-drive rates. Those who hit the ball harder, limiting the reaction time of defenders, will be more likely to succeed, but not all hard-hit balls are line drives, and not all line drives are hard-hit balls.
A more uniform approach to classifying batted balls, such as utilizing the data in the Hit-f/x system, will help reduce errors and increase the accuracy of expectation formulas by separating balls in play according to how hard they were hit rather than into preselected buckets. Using the new batted-ball data this way can also help eradicate luck-based analyses deeming Player X terribly lucky on the heels of a vastly above-average rate in certain areas. While someone like Jimmy Rollins may in fact be unlucky so far on his line drives, for all we know most of his liners may be hit rather weakly, reducing the likelihood that he should be near the league-average rate. The batted-ball data provided by Hit-f/x will allow analysts to further partition the LD/GB/FB buckets, determining expected values for weak vs. scorched liners, hard-hit vs. soft grounders, and other BIP contests on the card. Even more exciting is that this data may be available retroactively based on video archives used for Pitch-f/x purposes. The MLBAM systems are tremendous advances in the world of sabermetrics, and will open various new roads of exploration in addition to scheduling some much-needed road work on those trodden with great frequency. The Hit-f/x data set will take some time to get used to, just like its pitching sibling, but it will afford us the information necessary to conduct some incredibly important and in-depth analyses.
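Partitioning a bucket by contact quality could look something like the sketch below, which computes empirical hit rates for each (batted-ball type, speed tier) pair and then sums them over a player's batted balls to get his expected hits. The 95 mph cutoff, the tuple layout, and the function names are hypothetical; they are not Hit-f/x definitions.

```python
from collections import defaultdict

HARD_HIT_MPH = 95.0  # hypothetical cutoff separating "scorched" from "weak" contact


def speed_tier(speed_mph: float) -> str:
    return "hard" if speed_mph >= HARD_HIT_MPH else "soft"


def hit_rates_by_subbucket(balls) -> dict[tuple[str, str], float]:
    """Empirical hit rate for each (type, tier) sub-bucket.

    balls: iterable of (bb_type, speed_mph, was_hit) tuples, e.g. ("LD", 101.2, True).
    """
    counts = defaultdict(lambda: [0, 0])  # (type, tier) -> [hits, balls]
    for bb_type, speed, was_hit in balls:
        key = (bb_type, speed_tier(speed))
        counts[key][0] += int(was_hit)
        counts[key][1] += 1
    return {key: hits / n for key, (hits, n) in counts.items()}


def expected_hits(player_balls, rates) -> float:
    """Sum sub-bucket hit rates over a player's batted balls.

    Raises KeyError if a player's ball falls in a sub-bucket with no league data.
    """
    return sum(rates[(t, speed_tier(s))] for t, s, _ in player_balls)
```

Under this kind of split, a hitter whose liners land mostly in the soft tier would carry a lower expected LD-BA than the league-wide .730, which is exactly the caveat raised about Rollins.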