Checking the Numbers: Houdini, Meet Jorge

June 3, 2009

In an attempt to bolster a pitching staff relying on the injury-prone Mike Hampton and John Thomson and the question marks that were Horacio Ramirez and Kyle Davies, the Atlanta Braves took a flyer on Jorge Sosa prior to the 2005 season, acquiring the righty from the (then-)Devil Rays in exchange for Nick Green. The Braves’ brass hoped that Leo Mazzone could work his magic on the young flamethrower who, though lacking a proven track record, possessed the kind of raw ability that charms the pants off of pitching coaches. After a few months, the project seemed to be paying dividends; Sosa sustained a shiny, under-3.00 ERA. Problems lurked beneath the surface, however: Sosa allowed plenty of baserunners, and he was deriving most of his success from stranding an inordinate number of them. His teammates came to embrace the quirkiness of his success, nicknaming him “Houdini” for his ability to escape from situations with minuscule margins of error.

Sosa stranded 85.1 percent of his runners that season, a rate that ranks 28th since 1954 among pitchers with at least 100 innings in a season, and fourth since 1999. In 2006, Houdini became the pawn in a David Copperfield vanishing act; Sosa was barely able to pull his weight in a severely decimated Braves bullpen, the team tired of his performance, and he was shipped to the Cardinals at the end of July. Since then, Sosa has bounced through the Mets and Nationals organizations, serving as a cautionary tale: when a pitcher’s results hinge on luck-based indicators that stray far from the mean, and from his own prior seasonal trends, they call for due diligence and further investigation.

A pitcher’s Strand Rate, or LOB%, is one of several innovations from fantasy guru Ron Shandler that has since been beautified by The Hardball Times. The general formula of (H+BB+HBP-R)/(H+BB+HBP-(1.4×HR)) calculates the percentage of runners that fail to score based on events considered to be out of a pitcher’s control. Shandler found that the league average tended to hover around the 72 percent mark. Since 1954, the American League has averaged 71.2 percent with a standard deviation of just 1.6 percent. The National League has deviated even less: its yearly strand rate has a standard deviation under one percent around an average of 71.7 percent. Strand Rate, much like BABIP, adds another level of granularity capable of explaining ERA fluctuations, but is it stable? Can we count on a pitcher with a solid rate one year to hold steady over the next few seasons?
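The formula above is easy to compute directly. A minimal Python sketch; the function name and the sample pitching line are hypothetical, made up only to show the arithmetic, and the home-run factor of 1.4 is taken straight from the formula as given.

```python
def strand_rate(h, bb, hbp, r, hr):
    """Shandler-style strand rate (LOB%) as a fraction.

    (H + BB + HBP - R) / (H + BB + HBP - 1.4 * HR)
    """
    runners = h + bb + hbp
    denominator = runners - 1.4 * hr  # home-run adjustment from the formula
    if denominator <= 0:
        raise ValueError("too few baserunners relative to the home-run adjustment")
    return (runners - r) / denominator


# Hypothetical season line (made-up numbers): 180 H, 60 BB, 5 HBP, 80 R, 20 HR
lob = strand_rate(h=180, bb=60, hbp=5, r=80, hr=20)
print(f"LOB% = {lob:.1%}")  # LOB% = 76.0%
```

Here 245 runners minus 80 runs gives 165 stranded, divided by 245 - 28 = 217.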

In the same year that Sosa took the senior circuit by storm, Dave Studeman ran a regression analysis on strand rates and found a year-to-year correlation of 0.28, roughly the same level of stability found in home run rate back when DIPS hit the web. Conversely, Zach Fein of Bleacher Report found next to no year-to-year relationship in a study of his own. Two years of data is not a large enough sample for me, especially when dealing with a potentially volatile statistic. Instead of running a standard year-to-year correlation, my statistical instrument of choice is an AR(1) Intra-Class Correlation, which I’ve mentioned in this space before. The ICC reads like any other correlation coefficient, but rather than comparing just two seasons, it measures how consistently each pitcher’s rate holds up across every season in the sample. Running the test across all qualifying pitchers from 2004-08 resulted in a correlation of 0.25, suggesting a relationship of slightly below moderate strength.
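To show how an ICC pools many seasons at once, here is a sketch of a textbook one-way random-effects ICC(1) for a balanced panel (every pitcher with the same number of seasons). This is a swapped-in, simpler technique, not the AR(1) intra-class correlation used in the study, and the function name is hypothetical.

```python
from statistics import mean


def icc1(ratings):
    """One-way random-effects ICC(1) for a balanced table.

    ratings: one list per pitcher, each holding the same number (k >= 2)
    of seasonal strand rates. Not the AR(1) variant; a simpler stand-in.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 pitchers, each with the same k >= 2 seasons")
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-pitcher mean square: spread of each pitcher's average around the grand mean.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-pitcher mean square: season-to-season wobble around each pitcher's own average.
    msw = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("all values identical; ICC is undefined")
    return (msb - msw) / denom


# Hypothetical five-season strand rates for three pitchers (made-up numbers)
seasons = [
    [0.78, 0.76, 0.79, 0.77, 0.78],
    [0.70, 0.74, 0.69, 0.72, 0.71],
    [0.73, 0.66, 0.75, 0.70, 0.68],
]
print(round(icc1(seasons), 2))
```

Values near 1 mean pitchers hold their own level from season to season; values near 0 (or below) mean the seasonal rates are mostly noise around a common mean.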

Not all pitchers are created equal, however, so it stands to reason that the better performers may exhibit more control over stranding runners. After all, higher quality pitchers consistently post top-notch ERAs, a stat that happens to share a -0.78 correlation with strand rate; as one goes up, the other goes down. The R-squared also suggests that roughly 60 percent of the variability in ERA is accounted for by strand rate. Following this idea, I partitioned the sample of pitchers into three groups based on their average FIP over the five-year span. FIP correlates with LOB% at -0.39, a weaker relationship than ERA, but it serves as a better predictor of future ERA and therefore makes sense as the partitioning barometer.
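The R-squared for a simple correlation is just the coefficient squared, which is where the 60 percent figure comes from, and why FIP's weaker correlation explains far less:

```latex
R^2_{\mathrm{ERA}} = (-0.78)^2 \approx 0.61, \qquad R^2_{\mathrm{FIP}} = (-0.39)^2 \approx 0.15
```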

The average FIP of the group was 4.46, with a standard deviation of 0.63. Therefore, the three groupings are: Good, <= 3.83 (at least one standard deviation better than the mean); Medium, 3.84-4.46 (between that cutoff and the mean); and Bad, >= 4.47 (worse than the mean). The results:
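The grouping rule can be written as a short function. A minimal sketch; the function name is hypothetical, and it assumes five-year FIPs rounded to two decimals as in the text (an unrounded value such as 4.465 would land in the Bad group).

```python
def fip_group(fip, mean_fip=4.46, sd_fip=0.63):
    """Assign a pitcher's five-year average FIP to one of the three groups."""
    good_cutoff = round(mean_fip - sd_fip, 2)  # 3.83: one SD better than the mean
    if fip <= good_cutoff:
        return "Good"
    if fip <= mean_fip:
        return "Medium"
    return "Bad"


print(fip_group(3.50), fip_group(4.10), fip_group(4.90))  # Good Medium Bad
```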


Group (avg FIP)       ICC    R-Sq
Overall               .25    .063
Good (<= 3.83)        .44    .194
Medium (3.84-4.46)    .15    .023
Bad (>= 4.47)         .16    .025

As expected, the Halladays and Johans of the world exert more influence over their strand rates. This does not automatically indicate that ace pitchers post rates that are well above average, but rather that they tend to stay closer to their own true talent level in this area. So if a really solid pitcher puts up strand rates exceeding the league average, don’t be so quick to write off his numbers as superficial. I also ran correlations between strand rate and batted-ball types, but nothing surfaced. It made intuitive sense that higher line-drive percentages would hinder the ability to leave men on, given that liners result in hits 73 percent of the time, but the correlation came in below 0.20, as it did with both grounders and fly balls.

How about some of the best and worst rates? Of those with 100-plus innings from 1954-2008, Wes Stock produced the highest strand rate at 91.9 percent, when he posted a 2.30 ERA and 3.36 FIP for the 1964 Orioles. Among starting pitchers (Stock threw 113 2/3 frames in 64 relief outings that year), John Candelaria stranded 88.8 percent of his runners in 1977, less than two years after debuting in the big leagues. The top five rates for starters:


Name             Year  LOB%   ERA   FIP
John Candelaria  1977  88.8  2.34  3.99
Doc Gooden       1985  86.9  1.53  2.17
Pedro Martinez   2000  86.6  1.74  2.13
Billy Pierce     1955  86.6  1.97  2.93
Bob Gibson       1968  86.6  1.12  2.01

And some of the worst:


Name             Year  LOB%   ERA   FIP
Taylor Buchholz  2006  55.8  5.89  5.18
Nelson Briles    1970  56.3  6.24  4.60
Jim Abbott       1996  56.6  7.48  6.28
Billy Muffett    1961  56.7  5.67  4.76
Zane Smith       1995  56.9  5.61  3.82

Now, we have already established that dominant starting pitchers exert much more control over their rates than others, but what about elite relievers? The industry’s standard argument states that relievers like Mariano Rivera are able to sustain lower BABIPs and influence other luck-based indicators more than the average relief corps member. Running the ICC among all pitchers with at least 15 relief outings from 2004-08 produces a very weak correlation of 0.06. Ultimately, with relievers accruing so few innings each season, many numbers, LOB% included, fluctuate.

When the closers are separated from this group (I am loosely defining closers as anyone with at least 15 saves in a season), the correlation actually drops to -0.03, casting serious doubt on the conventional wisdom. I understand that the rather arbitrary cutoff of 15 saves could prove problematic, since some pitchers in the sample would be nothing more than makeshift closers, so to assuage those concerns we’ll raise the minimum to 25 saves. Increasing the number of saves also works as a selection-bias filter of sorts, since it tends to incorporate only the really solid closers; a pitcher is unlikely to close out games for an extended period of time if he remains ineffective. Unfortunately, even with the adjustment, the ICC barely rises to 0.04.

In case these numbers aren’t compelling enough, let’s try one more scenario: 15 or more saves with an ERA of 3.16 or lower (3.16 was the average ERA for this group). Running the ICC across all of these pitcher seasons from 2004-08 produced a correlation of -0.25, much stronger than before but still not akin to the level of strength evident in elite starters. None of this is meant to suggest that elite relievers and stud starters are immune to fluctuations, but rather that the grain of salt we take with their strand rates should be much smaller than for everyone else. On top of that, a mediocre pitcher can become an ace, as Cliff Lee seems to be doing, and someone once considered an ace, like Dontrelle Willis, can fall off the map.
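The reliever samples described above are a set of nested filters. A minimal Python sketch; the `RelieverSeason` record, its field names, and the dictionary keys are hypothetical stand-ins for whatever data source is used, and each resulting subset would then be fed to an ICC calculation.

```python
from dataclasses import dataclass


@dataclass
class RelieverSeason:
    # Hypothetical record type; field names are illustrative only.
    name: str
    year: int
    relief_games: int
    saves: int
    era: float
    lob_pct: float


def reliever_subsets(seasons):
    """Split reliever seasons into the samples described in the text."""
    relievers = [s for s in seasons if s.relief_games >= 15]  # base group
    saves15 = [s for s in relievers if s.saves >= 15]          # loose closer definition
    saves25 = [s for s in relievers if s.saves >= 25]          # stricter closer definition
    saves15_era316 = [s for s in saves15 if s.era <= 3.16]     # closers at or below the group's average ERA
    return {
        "relievers": relievers,
        "saves15": saves15,
        "saves25": saves25,
        "saves15_era316": saves15_era316,
    }
```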

The goal here was merely to get across that LOB%, just like any other luck-based indicator that might affect the performance marks held near and dear to our hearts, must be treated as a supporting piece of evidence, not as a deal breaker or as the basis of definitive claims. Most data comes with even more information beneath the surface, and understanding the roots helps to explain what shows up on stat lines. Perhaps adrenaline that would sadly dissipate the following year allowed Jorge Sosa to really amp up his stuff with runners on base during 2005. Perhaps his mechanics were off during his windup, and the flaws were automatically corrected when he pitched out of the stretch. Or maybe his short-lived success really was nothing more than dumb luck, the type of performance that is bound to fall back to Earth given his lack of a track record and perceived level of talent.

Regardless, the point remains that investigating causation is much more valuable than taking everything presented at face value. Different types of pitchers follow different sets of rules, and should therefore be treated… well, differently. Strand rates can be particularly stable for certain groups of hurlers, which is important to note given how strand rates and BABIP have evolved in web-based analyses. These stats seem to begin as ideas worthy of our skepticism, then evolve into arguable supports, and finally end up as context-free point-makers. Let’s all take a giant step back to the arguable-supports stage and try to actually understand what the metric explains, and how its regression to the mean might differ for certain types of pitchers.

Thank you for reading

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.


Eric Seidman


markpadden

6/03

Good stuff. I like the quick summary of existing work on the topic, and then your addition to it. A couple questions: did all three groups have roughly the same number of IP/season per pitcher? That is, did the "Good" pitchers have more innings/season in their sample than the "Bad" pitchers? I ask bc, if so, the larger sample could account for some of the decreased volatility (vs. other pitchers) seen in their strand rates. I.e., you would be using more information for Good pitchers than you would for Bad pitchers.

On a similar note, is strand rate consistency more influenced by pitcher quality (FIP) or by pitcher consistency? That is, I would be interested in creating three buckets of pitchers: inconsistent, average, and consistent, and running the same kind of test as was run on Good, Medium and Bad. The issue is how to define "consistent." I might try using stdev of season FIP over the sample, so those Ps with the highest FIP stdev over the past several years would be labeled "Inconsistent," etc. Just trying to assess if a consistently mediocre pitcher is more likely to maintain strand rate than a mediocre pitcher who swings wildly from good to bad each season.

Thanks.

EJSeidman

6/03

evo,

The IP/season is a good point, but I'm at work until 4 and cannot check that until around that time. I would venture a guess that yes, of course guys like Halladay, Johan, Carpenter, etc average more innings than Adam Eaton. But then it's a chicken/egg situation - is the decreased volatility because they log more innings, or do they log more innings because of the decreased volatility?

I will also look into the second paragraph, probably using stdev of FIP, finding the standard deviation of the whole group as well as the SD of the individual deviations; IE - if the SD of the whole group is something like +- 0.30, but the SD of individual SDs is +- 0.08, then we can partition the pitchers that way: SDs of 0.21 or lower would be consistent, 0.22-0.38 medium, and > 0.38 inconsistent.
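The consistency buckets proposed in the reply above could be sketched as follows. The 0.30 and 0.08 defaults are the illustrative figures from the reply ("something like"), not measured values, and the function name is hypothetical; per-pitcher SDs are rounded to two decimals to match the reply's cutoffs.

```python
from statistics import stdev


def consistency_buckets(fip_by_pitcher, center=0.30, spread=0.08):
    """Label pitchers by how much their seasonal FIP varies.

    fip_by_pitcher: dict of pitcher -> list of seasonal FIPs (2+ seasons each).
    center/spread default to the placeholder 0.30 and 0.08 from the reply.
    """
    low = round(center - spread, 2)   # 0.22: below this counts as consistent
    high = round(center + spread, 2)  # 0.38: above this counts as inconsistent
    labels = {}
    for pitcher, fips in fip_by_pitcher.items():
        sd = round(stdev(fips), 2)  # sample SD of this pitcher's seasonal FIPs
        if sd < low:
            labels[pitcher] = "consistent"
        elif sd <= high:
            labels[pitcher] = "medium"
        else:
            labels[pitcher] = "inconsistent"
    return labels
```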

doncoffin

6/03

Shouldn't your "within one S.D. of the mean" group be 3.83 - 5.09? 4.46+/-0.63? Or did I miss something?

EJSeidman

6/03

No, I actually phrased that wrong. Even if we adjust it, the results stay virtually the same, though.

molnar

6/03

Nice.

The "Halladays and Johans of the world" tend to miss a lot of bats. I wonder if you regressed year+1 Strand Rate on Strand Rate *and* K rate, whether the previous year's Strand Rate would still be significant.

a couple questions...
"roughly the same level of stability that was found to be inherent in home runs" - home runs per what?
"The National League deviated even less so, under one percent of the time" - time??

EJSeidman

6/03

Molnar,

When DIPS was first formulated, the walk rate, strikeout rate, and home run rate were found to be the only really stable components for a pitcher. Walks and strikeouts had a correlation around 0.50-0.65 if I recall correctly, and home runs were around the 0.30 mark.

As far as the NL deviating: the standard deviation of the yearly NL strand rates from 1954-2008 was under 1 percent. I can see where the confusion would arise from that. Didn't mean "of the time" per se, just that the SD for strand rate in the NL from 1954-2008 was under 1%, meaning it barely deviated from the mean.

Oleoay

6/04

Eric,

Can you run similar correlation numbers between strand rate and K/9? But besides an overall strand rate vs overall k/9, I'd like a bit of elaboration too...

The average major league pitcher, apparently, has little control over strand rate. There is an average k/9 for an average major league pitcher as well... thus, I wonder if there's a level of k/9 performance when it starts affecting a pitcher's strand rate. For example, maybe a k/9 of 5.0 doesn't correlate well to strand rate, but a k/9 of 7.0 starts to have an effect. The reason I think the gradation is important is that someone who has a K/9 of 9 would, all things being equal, strike out one batter in an inning and thus reduce the opportunities available for a batted ball to be put into play.

I'm not sure how sustainable inducing double plays is either, but that might be another avenue to run correlations on.

EJSeidman

6/04

I had better respond or else I might not get Richard's vote ;-)...

I took all pitchers with 15+ games from 2004-08, ran correlations on overall K/9-LOB, and then broke it down into the segments, >= 5.0, >= 6.0, >= 7.0, etc, since that seems like a more accurate way than 5.00-5.99, 6.00-6.99.

Overall: 0.280
>= 5.0: 0.274
>= 6.0: 0.249
>= 7.0: 0.235
>= 8.0: 0.180
>= 9.0: 0.118

So it actually goes in reverse, when the minimum gets higher and higher. When we include all pitchers from 5.0 K/9 and up the correlation to LOB is almost identical to the overall correlation, which makes sense given that most MLB pitchers fan over 5 per 9, but when we increase the minimum to higher strikeout rates, the relationship with strand rate dwindles.

Oleoay

6/04

Aw hey, you've gotten my vote since I first saw you writing here.

That's weird that it goes in reverse. It almost seems counter-intuitive, unless those pitchers with a low K rate were somehow forced to adapt their style to figure out some non-strikeout way to strand runners like inducing a double play.

I guess I mean that, if you had a low K rate and a low strand rate, you have a low chance of being a major league pitcher, so you have to figure out some way besides a strikeout to prevent runners on base from scoring.

Oleoay

6/04

Or maybe those with high k/9 rates tend to be flyball pitchers and so might give up more ISO?

EJSeidman

6/04

I agree with the counterintuitiveness but it really just goes to show there is more to stranding runners than strikeouts. For instance, as I wrote in the piece, FIP had a -0.39 correlation to LOB, whereas ERA was at -0.78, so the controllable skills seemingly have less to do with it. Then again, groundball rates also had little to do with it.

Oleoay

6/04

If there was little correlation across the board between K/9 and strand rate, then wouldn't each of those correlation breakdowns be the same? Shouldn't it be something like .200 (as an example) for a K/9 of 5+ and for a K/9 of 9+?

I mean, if there is a stronger correlation between a lower K/9 and strand rate than a higher K/9 and strand rate, is there some way to identify what factor that might be that is causing the stronger correlation at the lower K/9 rate?

EJSeidman

6/04

No, the correlation breakdowns wouldn't be the same. For instance, when I run the correlation of the entire group, and then switch to just K/9 >= 5.0, the program doesn't know it's being compared to the entire group. It merely tests the strength of the new dataset.

The overall correlation was 0.28, which isn't that strong, but it definitely suggests that, for this whole group, LOB% tends to increase as K/9 increases.

However, as the K/9 minimum gets incorporated and increases, the sample sizes become smaller, and we see that the relationship is more random... that guys with K/9s above 9.0 could have high strand rates but there isn't anything there to suggest they always do or don't.

As for the last point, we can run regressions to see what sticks out but I personally think it's pointless because correlation doesn't equal causation. What if we run a regression and find that foul flyouts are a big factor? We don't even know if that's sustainable and yet we're supposed to treat it as the sole determining factor for strand rates. Know what I mean?

I'm personally satisfied with the conclusions found above, that higher quality pitchers, simply put, have more control over their strand rate and can sustain it more than others. And that these higher quality pitchers get to where they're going in different fashions, without necessarily a few unifying factors.

fireorlime

6/04

"higher quality pitchers, simply put, have more control over their strand rate and can sustain it more than others"

I'm really over my head in this conversation so please forgive the question, but, are you saying that the higher quality pitchers will consistently have better strand rates, or that higher quality pitchers have a more consistent strand rate from year to year, or both?

Thanks.

EJSeidman

6/04

The second one... as the data I found showed, the higher quality pitchers had an ICC of 0.44, which is very significant and a much higher level of stability than anyone else. This DOES NOT mean, as I mentioned in the article, that they always post higher rates, but rather that they are more likely to sustain whatever rate they hover around. So if you see a guy like Halladay put up a 77% strand rate, he isn't an automatic lock to regress to the mean given that ace pitchers tend to be more stable year to year in their strand rate and therefore fluctuate/regress less.

fireorlime

6/04

I get it now! Thanks Eric, very interesting.

I wonder if looking the opposite way would reveal any insights. If we just looked at the pitchers with the highest ICC we could say that they are the highest quality, and if perhaps one of those had a very high ERA despite the stable strand rate perhaps that would point to unluckiness in some other factor?

Sounds a bit roundabout but might it be a small tool to use when deciding whether or not to sign a FA pitcher.

EJSeidman

6/04

Then we get into all sorts of discussions about whether or not consistency matters. I wrote something here in either Feb or March looking at whether or not staying consistent in various stats (IE - using standard deviations) mattered with regards to overall production, and it didn't. A guy can be flaky and still be just as productive, overall, as the most consistent pitcher out there.

Your idea is interesting, though, reverse engineering. Find the most consistent pitchers with regards to strand rates via the ICC, however we would have to incorporate some form of baserunners allowed. For instance, a pitcher with a 77% strand rate for four straight years who also has a 1.50 WHIP is going to have worse numbers than a 72% strand rate for four years with a 1.20 WHIP.
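The WHIP example in the reply above can be roughed out in a few lines. This is a crude sketch under loudly stated assumptions: it treats WHIP times nine as baserunners per nine innings and assumes each of them either scores or is stranded, ignoring the home-run adjustment in the LOB% formula, hit batsmen, and errors. The function name is hypothetical.

```python
def approx_runs_per_nine(whip, lob_pct):
    """Rough runs allowed per nine innings from WHIP and strand rate.

    Simplification: every walk/hit runner is either stranded or scores;
    the HR adjustment, HBP, and errors are ignored.
    """
    baserunners_per_nine = whip * 9
    return baserunners_per_nine * (1 - lob_pct)


a = approx_runs_per_nine(whip=1.50, lob_pct=0.77)  # about 3.1
b = approx_runs_per_nine(whip=1.20, lob_pct=0.72)  # about 3.0
```

Under these assumptions the 77%/1.50 WHIP pitcher allows slightly more runs than the 72%/1.20 WHIP pitcher, so the better strand rate does not fully offset the extra baserunners, though the gap is small.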

Oleoay

6/04

I just had a thought.. a silly stupid thought... but maybe strand rate should be analyzed differently. Is the issue regarding pitcher quality really whether a run scores or not, or whether a pitcher is more or less likely to give up a series of hits/walks/HBP in a row?

A pitcher can in effect, walk the bases loaded then get three popups in a row and would have a better strand rate than someone who gave up a triple, a sac fly and got the last two batters to pop out.

If the general formula is "(H+BB+HBP-R)/(H+BB+HBP-(1.4×HR))", then technically the pitcher does not even need to get an out to have a strand rate of 1.

The difference, in my mind, is that a pitcher who gives up a lot of hits/walks/HBP in a row is more likely to "stack the deck" against his luck while a higher quality pitcher will reduce the number of baserunners, thus reducing the chances of luck being a factor.

Oleoay

6/04

You could even take it a step deeper.. if a pitcher primarily has control over their BB/K/HR rates and little control over what happens to a ball when it is in play, then runs scored/stranded is not as important as the number of opportunities a pitcher provides to a batter, relative to another pitcher... on the theory that the more opportunities a pitcher provides a series of batters, there'd be an increased chance that someone would score.

fireorlime

6/04

When you say consistent in various stats, over how large of a time period are you talking about? Is your claim that if you have 2 pitchers with equal ERAs, but Pitcher A is great one start and poop the next (2006 Daniel Cabrera), and Pitcher B is consistently decent, they are equally productive? And when we say productive what exactly are we talking about? Giving his team the most/best opportunities to win games?

You know what ignore me, I just looked up the article you speak of, http://www.baseballprospectus.com/article.php?articleid=8579

After reading my only question that remains is how are you measuring production/success/value in a pitcher? ERA? FIP?

EJSeidman

6/04

I believe I used SNVA (Support Neutral Value Added) in that article as the barometer of pitching success, since it's one of those catch-all win-based statistics.

fireorlime

6/05

I'm new to BP, is there a place that explains how SNVA is calculated? Like the actual equation?

Feel free to respond, "Corey, it is really comprehensive and complicated and you wouldn't understand it anyway with your tiny brain, don't even try."

I can accept my limitations.

EJSeidman

6/05

Corey,

From Michael Wolverton, himself, the guy who created SNVA:

"The Support-Neutral pitching stats are designed to measure the value of a start in terms of how much it adds or subtracts from the team's chance of winning. Using situational scoring tables and some basic laws of probability, I calculate the probabilities that a pitcher's start will lead to a W or an L for him, as well as a win or a loss for his team. When totaled over all of a pitcher's starts, that gives us the three SN measures:

* Support-Neutral Wins and Losses (SNW/SNL) -- a starter's expected W/L record, given the way he pitched in each game and assuming that he had league-average support from his offense and his bullpen.
* Support-Neutral Value Added (SNVA) -- the number of games the starter is worth to an average team in the standings, over (or under) what a league average starter is worth."

Oleoay

6/04

Eric, I follow what you mean and I do know correlation does not equal causation. I'm not a wiz at correlation though and the sample size explanation with fewer pitchers with great k/9 ratios makes sense.

I guess the thing I was trying to figure out is "What makes a higher quality pitcher a higher quality pitcher?" If a quality pitcher is someone who allows few runners to score, then how does he go about doing that? WHIP-type stats only tell so much, where something like your drill-down on Matt Cain provides detail about how his pitches and even his overall stat lines change. Then again, there are other high quality pitchers that don't strike out players... as you said, those high quality pitchers tend to get there in different fashions. Why does one guy throwing 88 mph become Greg Maddux, another one a Russ Ortiz, and another one bounce back and forth between AAA and the majors? Granted these are all rhetorical, and few people have problems identifying great pitchers... but it seems the description of what makes a pitcher the best is often done in qualitative terms. It would be nice if there was some kind of unifying factor that separates them out.

I guess I'll have to dive into FIP more to find out how it defines a higher quality pitcher.

EJSeidman

6/04

This seems like something PITCHf/x could help with in some due time. What makes one 88 mph guy Maddux and the other Ortiz? Probably a lot to do with location and movement, as well as selection. Everyone gets there differently. One guy might have pinpoint location and average movement, yet he achieves similar results as a guy with poor location but ridiculous movement.

Oleoay

6/04

A month or so ago there was an article about Jamie Moyer's decline that I was saying begged for some PITCHf/x data to see if his pitches are acting any differently than they used to. It was also based on the theory that if Moyer did not have much "stuff" even when he was doing well, his case might automatically control for some elements like pitcher velocity and narrow down what attributes of a pitch actually relate to effectiveness.
