In 2009, the Atlanta Braves as a team stole 58 bases and were caught 26 times, for a total of 84 stolen-base attempts. By comparison, the Tampa Bay Rays stole a league-leading 194 bases (against 61 times caught stealing), meaning that the Rays successfully stole more than twice as many bases as the Braves attempted to steal. Because the manager is (generally) the one who gives the signal to steal or not to steal, we can by extension assume that Rays manager Joe Maddon is an aggressive manager who "likes to run," while eternal Braves manager Bobby Cox is a more conservative gent. Or can we?

In the next few weeks, I will seek to profile the mind of the manager. I’m not out to evaluate managers and to figure out how much they affect their teams (yet). Instead, I’d rather take a look inside the mind of the manager to see how he operates. The reason is simple. Baseball can be a game of brute force, but it’s at its best when it’s a game of move and counter-move. The manager is the driving force behind the strategy that a team employs. To know his leanings is to be better able to predict what’s coming next. To know what’s coming next is to have a strategic advantage. I propose that we can create an effective psychological profile of a major-league manager by looking closely at his behavior.

To my knowledge, there are very few manager metrics out there. Even the manager stats kept here at Baseball Prospectus focus mostly on the idea of abusing starters and having to deal with bad bullpens. But how does a manager think? Are there some managers who are more aggressive than others? Do some like to tinker more than others? Can we quantify these differences? I believe that the answer is yes.

Most readers of Baseball Prospectus can quickly pick out the inherent problem in evaluating a manager. He can only work with the talent he’s given. If a manager has a bunch of fast guys who are always on base, we might expect him to try for more stolen bases than a manager who has been given a bunch of slowpokes who are rarely on first base to begin with. If you had a roster filled with Lou Brock, Rickey Henderson, Davey Lopes, Vince Coleman, and Kenny Lofton, you’d probably call for a few stolen bases too, even if you were overall reluctant to push the "run" button.

How do you get past this problem? We need some basis for comparison. It’s hard to know what another manager would do in the same situation, but it is possible to generate a good guess.

Warning: Gory methodological detail alert

First, let’s isolate some situations that might call for a stolen base. I took all instances from 2003-2009 in which a team had a runner on first and second base was unoccupied. Yes, this eliminates any thefts of third (and home), and any double steals, but the majority of stolen-base attempts are from first to second. I coded each of these events as yes (1) or no (0) according to whether a stolen-base attempt was made. Whether the runner was safe was irrelevant (for now).
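As a rough sketch of this coding step (the play-by-play table and its column names here are entirely hypothetical, not the author's actual data), it might look like:

```python
import pandas as pd

# Hypothetical play-by-play data; column names and events are illustrative.
pbp = pd.DataFrame({
    "runner_on_1b": [1, 1, 0, 1],
    "runner_on_2b": [0, 1, 0, 0],
    "event":        ["SB", "single", "K", "flyout"],
})

# Keep only steal-eligible situations: runner on first, second base open.
eligible = pbp[(pbp["runner_on_1b"] == 1) & (pbp["runner_on_2b"] == 0)].copy()

# Code the outcome: 1 if a steal was attempted (SB or CS), else 0.
eligible["sb_attempt"] = eligible["event"].isin(["SB", "CS"]).astype(int)
```

Note that a caught stealing counts as an attempt here, matching the article's point that whether the runner was safe is irrelevant for now.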

Since I have a binary outcome, I used a binary logit regression to predict the odds that a situation would have a SB attempt made in it. For those unfamiliar with the technique, since the outcome is binary, rather than continuous, the statistical program attempts to fit an equation that predicts how the independent variables will affect the chances of the dependent variable being "yes" vs. "no." So, it might say that, given this set of circumstances, the model believes that there is a 10 percent chance of the manager sending the runner.

As predictors, I used the inning (SB attempts tend to happen less in the middle of a game) as a categorical variable, with everything in the ninth and beyond grouped together. I also input the score differential (up by two? down by three? tied?), with everything beyond six runs grouped together. I also included the number of outs. For technical reasons, I set these as categorical variables as well.
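As a sketch of this kind of model (using scikit-learn and made-up data; the author's actual software, column names, and coding scheme are not specified, so everything below is illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500

# Synthetic steal-eligible situations; all values are invented.
df = pd.DataFrame({
    "inning":     rng.integers(1, 10, n),   # 9 stands in for "9th and beyond"
    "score_diff": rng.integers(-6, 7, n),   # capped at +/- 6 runs
    "outs":       rng.integers(0, 3, n),
    "speed":      rng.normal(5.0, 1.5, n),  # continuous speed score
    "sb_attempt": rng.integers(0, 2, n),    # 1 = runner was sent
})

# Dummy-code the categorical predictors; keep speed continuous.
X = pd.get_dummies(df[["inning", "score_diff", "outs"]].astype("category"))
X["speed"] = df["speed"]

model = LogisticRegression(max_iter=1000).fit(X, df["sb_attempt"])

# Predicted probability that the average manager sends the runner
# in each situation.
p_attempt = model.predict_proba(X)[:, 1]
```

The dummy coding is what "set these as categorical variables" amounts to in practice: each inning, score differential, and out state gets its own indicator, so the model doesn't assume, say, that the seventh inning sits linearly between the sixth and eighth in steal tendency.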

For speed, I used my own home-brewed speed scores (I detailed my methodology for calculating those here), entered continuously. If you ever plan to do your own research, don’t use my speed scores. They’re a pain to calculate. I only used them because they’re mine, and I happened to have them handy (and because they’re slightly better than the classic Bill James formula).

I asked my trusty laptop to save the chances that the runner would go for each situation. The resulting model tells me, given this set of circumstances (game state, speed of the runner), what the average manager in this sample would have done. I can then compare what each manager actually did to what the league-average prediction would have been for him. And I did. I created a simple ratio of actual SB attempts to predicted SB attempts.
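The ratio step reduces to summing actual attempts and predicted attempt probabilities by manager. With toy numbers (the managers' situations and probabilities below are invented, not real data):

```python
import pandas as pd

# Toy per-situation data: whether a steal was actually attempted, and the
# model's predicted probability of an attempt, tagged by manager.
sit = pd.DataFrame({
    "manager":   ["Geren", "Geren", "Leyland", "Leyland"],
    "attempted": [1, 1, 0, 1],
    "predicted": [0.10, 0.30, 0.20, 0.30],
})

by_mgr = sit.groupby("manager").sum(numeric_only=True)

# A ratio above 1.0 means the manager sent runners more often than the
# league-average model expected, given the same situations.
by_mgr["ratio"] = by_mgr["attempted"] / by_mgr["predicted"]
```

Because the predictions already account for the runner's speed and the game state, the ratio isolates the manager's tendency from the quality of the runners he happened to have.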

The results

So, in 2009, who really was the most aggressive manager when it comes to stealing bases? Ladies and gentlemen… Bob Geren? Geren sent runners 166 percent as often as a league-average manager would have, outpacing Ozzie Guillen, who was in second place. What’s interesting to note is that Guillen, whose White Sox stole 113 bases (against 49 CS), was rated as more aggressive than Joe Maddon (third place), despite his team stealing 81 fewer bases than Maddon’s Rays. Guillen had a slower team to work with, while Maddon had Carl Crawford and B.J. Upton. The model corrects for this bias and shows Guillen to be the aggressive manager that his reputation suggests he is.

On the other side of the coin, Jim Leyland was the most reluctant to try to steal, followed by Don Wakamatsu and Fredi Gonzalez, again relative to what the league would be likely to do in the situations those men faced. What about Bobby Cox? Actually, Cox rated on the aggressive side, sending 108 percent of what the league average model would have expected of him. Cox’s Braves were one of the slower teams in MLB in 2009.

Five Most Aggressive Managers        Five Most Conservative Managers
Bob Geren      166% of expectation   Jim Leyland      70% of expectation
Ozzie Guillen  143%                  Don Wakamatsu    74%
Joe Maddon     136%                  Fredi Gonzalez   81%
Mike Scioscia  134%                  A.J. Hinch       82%
Clint Hurdle   126%                  Ken Macha        83%

I also looked at whether this ratio showed any year-to-year consistency. Do managers keep the same level of aggressiveness from year to year? To test this, I used one of my favorite techniques, the AR(1) intra-class correlation. It’s somewhat like the year-to-year correlation, but it enables the inclusion of more than just two time points. It can be read, however, like any old correlation. Over the seven years in the study, the ICC was a nifty .538. (Sounds like a website.) So, managers are moderately consistent over time in how aggressive they are in ordering the stolen base.
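The article's AR(1) ICC comes out of a longitudinal mixed model, which is more than a few lines of code. As a simplified stand-in for the idea (this is not the exact AR(1) version the author used), here is the plain one-way random-effects ICC(1) computed from a managers-by-seasons matrix:

```python
import numpy as np

def icc_oneway(scores: np.ndarray) -> float:
    """One-way random-effects ICC(1) for an (n_managers, n_years) matrix.

    Each row is one manager's aggressiveness ratio across seasons.
    Values near 1 mean managers are highly consistent year to year.
    """
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)
    # Between-manager and within-manager mean squares from one-way ANOVA.
    ms_between = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_within = np.sum((scores - row_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Three perfectly consistent (made-up) managers over seven seasons.
consistent = np.array([[1.6] * 7, [1.0] * 7, [0.7] * 7])
```

With zero within-manager variance, this ICC comes out to exactly 1.0; real data, like the .538 reported above, sit somewhere in between.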

Where to go from here

This work is part one of several. In the next few weeks, I’ll be looking at various things that managers actually do, and at whether their players successfully carry out their orders. Eventually, I’ll attempt to distill it down to a few dimensions of behavior on which we can rate the managers. Stay tuned. This ought to be fun.

Russell A. Carleton, the writer formerly known as 'Pizza Cutter,' is a contributor to Baseball Prospectus. He can be reached here.

Intriguing! It would be interesting to look at player/manager pairs, too. Over a player's career, you could look at how often a player ran with second base open and how that stat varied depending on who was managing the game. Aggregate that for each manager and you get a much more precise measure of managerial aggression.

Track the percentage of successful steals at the same time and you get a measure of how effective each manager was in managing baserunners. In fact, you'd end up with a great chart showing aggressive/passive on one axis and successful/unsuccessful on the other.
This comment is not just for this piece, but many on BP. I'd like to see the entire list of MLB managers, rather than only the most- and least-aggressive 5. Two reasons. First, I want to see how my team's manager falls, which I think is something most BP subscribers would like. I'm working off an assumption that the vast majority of readers are not only interested in advanced metrics, but also have a rooting interest in a team and share my disappointment when they can't put their team in context of a well thought-out piece such as this. Second, beyond rooting interest I think it would be nice to see what kind of clusters there are, and any league-wide tendencies that are not represented merely by saying: here's my method, here's the most- and least- or bottom- and top-anything.

Agreed, I'd like to see the full list too.
I usually cut the whole list for space because the extremes are the more interesting ones to talk and read about. However, since several of you asked:

Managers listed in order of aggressiveness, with percentage of model expectation.

B. Geren 1.66
O. Guillen 1.43
J. Maddon 1.36
M. Scioscia 1.34
C. Hurdle 1.26
T. Francona 1.21
C. Cooper 1.19
J. Riggleman 1.18
J. Tracy 1.15
E. Wedge 1.14
B. Melvin 1.12
J. Torre 1.10
B. Black 1.08
B. Cox 1.08
R. Washington 1.06
T. Hillman 1.04
D. Baker 1.04
J. Manuel 1.03
D. Trembley 1.03
J. Russell 1.02
M. Acta 1.01
J. Girardi .99
T. LaRussa .98
L. Piniella .97
B. Bochy .94
C. Manuel .94
R. Gardenhire .89
C. Gaston .86
K. Macha .83
A.J. Hinch .82
F. Gonzalez .81
D. Wakamatsu .74
J. Leyland .70
I agree that the extremes are the most interesting, but they can carry a bit more weight when we put them in the context of the whole league and how the managers' results are distributed. Awful lot of names fall within 10% +/- the norm of 1.00. That makes Geren's approach stand out even more in the context of a league where most managers are hovering near the norm of aggressiveness.
It's quite surprising to see a manager like Manny Acta, who has a deserved reputation for being forward-thinking, have nearly the same score as Dave Trembley, who seemed to be constantly running the Orioles out of innings. I can't help but feel like we're missing part of the story; I'd be interested to see this score combined with hit-and-run frequency to give a broader picture of aggressiveness (it seemed like Aubrey Huff was making desultory slides into second base after failed hit and runs for about two weeks straight).
Not directed just to you, but to all of BP like the original comment was: rather than "cut the whole list," why not use the power of the internet to make the whole list accessible a click away (either inline with some dynamic HTML scripting or as a separate data page linked from the main article)? That would allow the best of both worlds, with the data tables not overwhelming the piece or breaking up the prose too much, but still letting those of us who want to see where our manager (6th least aggressive Cito Gaston) fell in the mix when he isn't in the top or bottom 5.
I agree... just throw the whole list up there next time.
Great piece. I'd love to see how Gardenhire and the rest of the middle of the pack rate.
Great stuff, Russell! To mirror John's comment, I'd like to see how all the managers rank against each other. I'm particularly interested in Riggleman and Trembley.
Very promising work. I'd add a couple more independent variables having to do with how good (or bad) the opposing pitcher and catcher are at controlling the running game. Surely, a good manager takes these into account.
True in theory, hard to quantify. Not that I won't try.
Btw, I'm not surprised Bobby Cox ranks as above-average in aggressiveness. The Braves have just been a slow team in recent years. But Cox has always been willing to take some chances. Great manager, with the exception of bullpen management. But that's another topic.
I agree, people forget, but a quick thumb through the Bill James Handbook shows Cox actually led the league a couple of years in calling for SB. You're talking about a team that used Marcus Giles and Kelly Johnson at leadoff for three years.
How do you plan to deal with the "green light" players? I know that Terry Francona, for example, has given Jacoby Ellsbury the green light to run whenever he wants to. So a fairly large percentage of Boston's attempts are a result of Ellsbury's judgment rather than Francona's. (Well, obviously, it's Francona's judgment to give the green light, but the individual attempts are frequently not his situational decision.)
Actually, I'd say that's the manager's fault more than anything. Francona, instead of making the decision in the moment, makes the decision ahead of time. Either way, he's still giving the green light to steal.
FWIW, the distribution of outcomes is nearly, but not quite, normal. Managers seem to be bunched more tightly around the mean than in a normal distribution, but the distribution around the mean is extraordinarily symmetric. Also, the arithmetic mean of the manager tendencies is not 1.0 (almost certainly because different managers have different numbers of attempts)--it's 1.060606... The standard deviation is 0.20.

I'd upload the chart, but that doesn't seem to be an option here.

The standard of comparison is all managers from 2003-2009. The league was a little bit more aggressive last year than in previous years.
You seem to be missing the link to your speed score calculations. Would be helpful to determine how important steals are in that calculation, since using steals to predict steals dilutes the effectiveness of the measure.
1/18 It was in SABR's "By the Numbers" newsletter.
How confident are you that we can measure a player's speed through inferring it from statistics? That's always a difficult problem.
Great read! Looking forward to the next installment.
Any worries about entering outs as categorical rather than dummy variables? I'd worry that 2 outs, man on 1st is different than 0 & 1, so the assumptions behind categorical might fail.
My program actually takes variables like that and dummy codes them. But good catch. Someone was reading closely! Ten extra credit points.
Pizza, we may have discussed this before in another venue, but since "r" is always a function of (the underlying) sample size (not the number of pairs in the regression), in your intra-class correlations, how do we/you know the sample size associated with your "r"? For example, if I were working with the same data you are, and I regressed first half on second half, I might get an "r" of .4; if I regressed one whole year on another year, I might get an "r" of .5 or .6; if I regressed 5 years of manager data on another 5 years, I might get .8, etc. In this instance, you mention that the "r" was .538. Without knowing how many games (or steal opportunities or whatever the "unit" is) that represents, I have no idea whether .538 is "consistent" or not.
.538 doesn't refer to the r^2 of the logit regression, though. If I'm understanding the grouping decision correctly, that value (the ICC) is calculated as the ratio of the variance across managers to the sum of the variance across managers and the variance of managers over time. .538 means that (variance of managers) = .538*(variance of managers + variance over time), or that the variance between managers is equal to roughly 1.16 times the variance of a randomly selected manager over time, meaning managers are relatively more consistent over time than they are across individuals. I'm not as familiar with ICC as others (Eric and Russell both, for sure), but it seems that if sample size entered the equations for estimated variance it wouldn't have much of an effect.
Mr. Solow's response is mostly right. ICC is a measure of consistency across the years. I did toss out most of the interim managers who only had a few games at the helm when I ran that ICC, specifically for sample size reasons. (He had to call for at least 50 SB attempts.)

Think of ICC like year-to-year. If I only had five observations per year, then I'd probably get a lot of random variation and so not a lot of consistency within managers over the years. My choice of inclusion cutoff was somewhat arbitrary, but based more on the realities of what we're observing. We look at managers based on the season-to-season level, so I evaluated them as such.
"If I only had five observations per year, then I'd probably get a lot of random variation and so not a lot of consistency within managers over the years."

Do you mean managers with 5 SB opportunities or 5 managers per year? I am talking about the former, of course, when I am talking about sample size. The number of observations will NOT affect the correlations, only the standard error.

You always say, "Think of an ICC as like a y-t-y correlation." But, as I originally said, the magnitude of a y-t-y correlation specifically depends on the number of "opportunities" in each year and without knowing that number, it means nothing. If I regress OBP on OBP from one year to the next, and I only include players with 100 or less PA each year, I might get a correlation of .25. If I only include players with PA greater than 400, I might get .60. So just saying, "My y-t-y 'r' for OBP was .5" means nothing unless I know the number of PA per year in my sample. (It is also nice to know the number of players or "observations" as that will help me to figure my standard error around the correlation.)

So if I have bunch of players in a bunch of years, and you tell me the ICC for OBP, again, that means nothing to me unless I know the range or distribution of PA in the sample, right?

Maybe I have it wrong. Maybe the ICC is sort of a combination of "r," as when we do a y-t-y "r," and the underlying sample size. For example, if you have a bunch of players with samples of 400 PA and you do an ICC for OBP, and you have a bunch of players with samples of only 100 PA, will you come up with the same ICC?
The magnitude of a year to year correlation does NOT necessarily depend on the sample size either over time or within a given year. Your estimate of the population correlation may be more accurate, but the value of that estimate is not a function of sample size. There's some noise in these estimates, which means increasing sample size is always a good thing, but as long as there's enough sample that the law of large numbers holds, you're probably pretty safe.
I meant 5 SB opportunities as well. I think we're on the same page methodologically. You are correct in that the number of PA/BF/opportunities can affect ICC, much in the same way that it would affect yty. However, as Mr. Solow points out, so long as you set your inclusion criteria high enough, it's not going to make a big difference. In this case, I actually upped the criteria a bit and didn't get much improvement in ICC. It's something of an asymptotic relationship.

In this particular case, there are two different questions that one can ask. One is, "How reliable is this stat year to year?" (which I chose to ask, .538) The other is "How many PA/BF/opps does it take before this stat becomes reliable?" I haven't run that one yet.
Speed score link's not there.
Ah. See it now.
How does OBP correlate with all this? If a team has more baserunners than league average, are they more likely to try stealing a base? Similarly, how does SLG correlate with these numbers?
Not sure. Brian's comment below develops this same idea. I think you've both hit on something interesting.
Something that's not being measured but may have an important influence on the number of steal attempts is the run environment. How many runs can the team be expected to score without any steals? What is the skill level of the current batter at driving in runs?

Steals are best leveraged when you have hitters coming up who are good at driving in runners from scoring position, but not so good at driving them in from first - high BA, low ISO.

I did an article for the Idol competition that showed how league rates of steal attempts varied inversely with the overall performance of the league's batters.

Just another thing to complicate your model!
Ah, more variables. Actually, this is a good point. I've got a lot on my plate this week, but this one might be worth a look.
I understand that, for example, teams managed by Bob Geren historically have attempted 66% more steals than the 2003-09 average after controlling for the speed of the runner and other variables. However, can we do better than that in shaping our expectation going forward? He's got a smaller sample size than someone who managed through all of 2003-09, so we should expect he's got a greater chance to be an outlier in either direction than some of the others. Can you look at the persistence of the manager's tendency and then determine how to regress it to average, to better show our best estimates of each manager's true tendency?

Second comment: I've had the impression that first year managers call for more steals on average than other managers. Is this true?

Third comment: Is there a tendency for some managers to steal more but only with their fastest players, while other managers order steal attempts throughout the line-up? I've thought that Ozzie Guillen steals a lot with just certain players, for example, but that Mike Scioscia will steal with guys throughout the line-up. Can you identify (or disprove) that tendency with your statistics?

Fourth comment: Is there a point of diminishing returns where ordering too many steals leads to lower SB% success? I would guess maybe, but it'd be an awfully weak relationship.

Excellent research, by the way.
Chiming in late, but hopefully this is still being read. This is in line with some of the comments above asking for the full data, but I've always found it a little frustrating when an interesting new metric is discussed in an article on the site and then basically left behind.

I'd love to see Adjusted Green Light Rate (or whatever far more catchy name you wanted to give it) incorporated into BP's panoply of statistical reports as soon as it's invented. Having already put together the code to calculate the metric on your own computer, it shouldn't be so onerous to translate (or have a DBA translate?) it for use on the site, right?

To take another example, wouldn't it be great to see JAWS be a part of the site's statistical reports? It'd be fun to monitor a JAWS list to see exactly the date that Joe Mauer becomes a Hall of Famer. (JAWS is of course trivial to calculate on one's own, but who wants to check in every day to grab the most updated WARP scores and redo that calculation? That's what the statistical reports' update scripts are for, surely.)