May 31, 2006
From The Mailbag
Four-Man Rotation, Upton, Rust Belt, Prospectus Q&A: Bill Geivett
Do you think the Rockies could pull off the four-man rotation? I think most any other team could, but since pitchers recover so slowly at that altitude and it takes so many pitches to get through a game at Coors I think the Rockies might be the one team that actually benefits from the five-man. Who knows, though?
Who knows indeed. I wonder if the tandem starter system might be worth a shot in Colorado. There's also the idea of slapping a tighter leash on a Colorado 4-man than you would on a 4-man elsewhere, so that you'd end up with more 5-inning, 75-80-pitch starts than you would otherwise. You could also supplement it with something like the Sunday starter of old--a 5th man who steps in occasionally at the tail end of a series, when there have been few (if any) off days in the preceding couple of weeks.
I admit that the 4-man question has become a bit of an obsession of mine, and it's one I ask a lot in Q&As (Baseball Between the Numbers also addresses the question in one of its chapters). I guess I'm just a believer in the idea that if something's not working, a team should try something else, even if it's viewed as a riskier tack than the status quo.
I chose to ask you this question because your articles on Transactions Rules were so informative. Anyway, I was wondering how much playing time a player needs before the team 'starts the clock' on his arbitration status or whatever. I know the D-Rays don't want to start the clock on B.J. Upton...how much playing time can they give him this year?
Well, as soon as you're on a Major League Active Roster the clock is started (the Active Roster is the 25-man roster from April through September 1, when being in the majors on the expanded roster starts to count, too).
When the clock starts, you accrue Major League Service (MLS) for every day that you're in the majors. That means that arbitration is on its way down the road. So if you're trying to plan your budget for 2009, you want to think a little bit about how expensive your guys will be when you get there. Will they be making the minimum, or will they be in arbitration? And if they're in arbitration, which year of arb are they in? The further you get into arbitration, the more expensive it gets.
Ideally a team would look even further out into the future, to figure out when guys become free agents and such, but that's pretty rare. There is too much uncertainty in baseball to forecast more than a corner of your Major League roster more than 2-3 years down the road.
One thing that teams ought to be more careful of is starting a service clock early enough in the year to make someone a Super-Two Arbitration Eligible player. Most players enter their first arbitration year, and their first big pay raise, after 3 years of MLS. By virtue of the Collective Bargaining Agreement, some players get into arb early if they have 2+ years of service, 86 days of service in the previous year, and a total MLS figure that is in the top 17% of 2+ players.
Not only is a Super-Two player expensive for you a year early (see Dontrelle Willis), he also gets a fourth year of arbitration.
From 1990-2003 the cutoff for Super-Two players was somewhere between two years, 128 days and two years, 153 days. In eight of those 14 years it fell between 2.130 and 2.140. Upton already has 64 days of MLS, though, so bringing him up now would be too early if you're trying to avoid Super-Two status. To be safe you'd probably want to wait until something like the second week of August.
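The arithmetic above can be sketched out in a few lines. This is a minimal illustration, assuming the standard 172-days-equals-one-service-year convention; the function names are mine, and the only cutoff used is the 2.128 floor of the 1990-2003 range mentioned above.

```python
DAYS_PER_YEAR = 172  # 172 days of MLS counts as one full service year

def mls(years, days):
    """Express Major League Service as a single day count."""
    return years * DAYS_PER_YEAR + days

def risks_super_two(total_days, cutoff_years=2, cutoff_days=128):
    """Does this MLS total reach 2.128, the most generous
    historical cutoff from 1990-2003?"""
    return total_days >= mls(cutoff_years, cutoff_days)

# A player who finishes his third season at 2 years, 130 days would
# have cleared the cutoff in most of 1990-2003; one at 2.120 is safely
# below every historical cutoff.
print(risks_super_two(mls(2, 130)))  # True
print(risks_super_two(mls(2, 120)))  # False
```

Since Upton already has 64 days banked, the question is simply how many more days a call-up would add before his third full season ends.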
You had an interesting article on baserunning. There is one variable that was omitted, and it is important. While I agree that runner advancement is very important, there are many times that a baserunner should make it from 1st to 3rd and second to home, and in fact, the extra base is conceded. It would be helpful to know how many runs were lost due to conservative baserunning. It would also be intriguing to know how many times a throw was made on the runner or batter/runner, and how many times the team succeeded or failed for each MLB team.
Well, the baserunning framework takes into account your first point, since the comparison is made against the average advancement given the set of variables (handedness of the hitter, ballpark, hit type, and fielder fielding the ball). As a result, conservative baserunning will show up as a cost, as it does for the Red Sox, who weren't thrown out very many times (just seven), but were second-to-last in the league in incremental runs (-12.09). The assumption, of course, is that balls in play with all those variables accounted for tend to be similar over time.
As to your second point, the play-by-play data I have access to doesn't record attempted putouts as you describe; it records a throw only when it results in an out or an error. By itself, though, that wouldn't tell you too much, since faster runners will induce fewer throws.
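Here's a toy sketch of why station-to-station running shows up as a cost under this framework even when nobody gets thrown out: each opportunity is measured against the average advancement for its context. The context table and its numbers below are invented for illustration, not the actual values behind the Red Sox figure.

```python
# Hypothetical average advancement (in bases) for a runner on first,
# keyed by (hit type, fielder). Real values would also condition on
# hitter handedness and ballpark, per the framework described above.
avg_advance = {
    ("single", "RF"): 1.6,  # runners often go first-to-third on these
    ("single", "LF"): 1.2,
}

def incremental_bases(opportunities):
    """Sum of (bases actually taken - average bases) over all chances."""
    return sum(taken - avg_advance[ctx] for ctx, taken in opportunities)

# A runner who always stops at second on singles to right loses ground
# relative to average, despite never being thrown out:
chances = [(("single", "RF"), 1), (("single", "RF"), 1), (("single", "LF"), 1)]
print(round(incremental_bases(chances), 2))  # -1.4
```

Converting those incremental bases into incremental runs is then a matter of run-expectancy weighting, which is where a figure like the Red Sox's -12.09 comes from.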
Hey Jay, like your work. And I agree about much of the silliness of interleague play. But I've got two minor quibbles, which I've tried to boil down as much as possible.
1. Cleveland fans probably have more bad blood and ill will toward Pittsburgh fans than toward Cincinnati fans. The Browns have been under the Steelers' thumb for a while now, and Steelers fans love to point that out--even if it means breaking out the Joey Porter jerseys in late May or early June for a trip to the ol' ballyard.
2. A "Rust Belt Rumble" would be a more appropriate name for a series between Cleveland and Pittsburgh. Cincinnati's top PECOTA comps for cities, if such things existed, would probably be Louisville and St. Louis, while Cleveland's would be Pittsburgh, Detroit, and maybe Buffalo. Also, the Queen City is about twice as far from Cleveland as the 'Burgh is.
Anyway, keep up the good work. Am a big fan of the site. Looking forward to some good Marlins-Orioles-related snark in a couple weeks.
Your point about Cleveland/Pittsburgh vs. Cleveland/Cincy is well taken. I thought of tossing in some football reference to this past weekend's matchup, but my frame of NFL reference is closer to the original Browns than the expansion ones, and more Jack Lambert (or Jack Ham) than Joey Porter. The potential for a gaffe on that front kept me from forcing things. Wouldn't have been a bad idea to check an atlas though. D'oh!
I do like the idea of city PECOTAs, and I'll take that one up with Nate Silver.
Marlins-Orioles: it looks like the Jeff Conine Invitational to me. Loser has to start him at first base the rest of the year, perhaps.
I am an avid reader of your postseason standings report, and I believe it is the finest effort of its kind on the web. It should be required daily reading for all serious baseball fans.
Recently I moved back home to Pakistan. I have been trying to explain to my friends how baseball, more than any other sport, is conducive to mathematical analysis. Your work is an ideal platform for this.
I was trying to understand the logic behind your simulations in order to explain it to my friends here. After reading your published explanation, I am unclear how you derive the Expected Winning Percentages (EWP) used in each simulation.
(A) In each simulation, do you use a constant EWP for the remainder of the season? And is the EWP then normally distributed across various simulations? If so, how do you incorporate the drift that you mention to .500 (or to the PECOTA projection)?
(B) In each simulation, do you start with today's W3% and allow it to drift to .500 (or to the PECOTA projection)? If so, how do you capture variability between simulations? Is it by using a 'noise variable,' normally distributed and centered at zero?
Thanks for your time and congratulations once again on the keen insight you contribute to us baseball fans!
Thank you very much...or should I say, "bahut shukria"?
Each of the million simulations uses a different EWP, which remains constant within that particular simulation for the remainder of the season. The EWP used is determined by three things. Number 1 is the third-order winning pct; number 2 is the number of games played so far in the season, which is needed to determine how strongly to regress #1 back towards the mean of .500. These two factors are the same for every run. The third is what you might call a "noise" variable. We can define a (gaussian) normal distribution centered on the EWP derived so far, with a standard deviation that is dependent on games played. I use a random number to make a selection from that distribution, and that value becomes the EWP for that particular run. I do have to alter the distribution slightly for extreme events (EWPs outside the range of .250-.750), but those are rare. Even for the Royals.
The PECOTA-based version is similar, except that I regress towards the PECOTA projection instead of .500. Otherwise the procedure is identical.
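The procedure for a single run can be sketched like so. This is a minimal illustration of the three-step recipe, not the actual report code: the regression weight, the noise schedule, and the clamping of extreme EWPs are all placeholder assumptions of mine.

```python
import random

def draw_ewp(third_order_pct, games_played, mean=0.500,
             regression_games=70, base_sd=0.060, rng=random):
    """Draw one simulation run's Expected Winning Percentage."""
    # Steps 1-2: regress the third-order pct toward the mean; the more
    # games played, the less the regression. (70 games is a made-up weight.)
    weight = games_played / (games_played + regression_games)
    center = weight * third_order_pct + (1 - weight) * mean
    # Step 3: gaussian noise, with a spread that narrows as the season
    # progresses. (This schedule is likewise illustrative.)
    sd = base_sd * (162 / (games_played + regression_games)) ** 0.5
    ewp = rng.gauss(center, sd)
    # Extreme draws get special handling in the real report; here we
    # simply clamp to the .250-.750 range.
    return min(max(ewp, 0.250), 0.750)

# One simulated season's EWP for a team playing .580 third-order ball
# through 50 games; a fresh draw happens for each of the million runs.
print(0.250 <= draw_ewp(0.580, 50) <= 0.750)  # True
```

The PECOTA-based version would just pass the team's PECOTA projection as `mean` instead of .500.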
Thanks for your outstanding article regarding Albert Pujols and his place in history. The analysis and comparison over the various eras is remarkable.
I am curious how you'd extend and analyze the 1994 stats of Frank Thomas over a complete season ('94 was a strike year of course). As I recall he was among league leaders in just about everything but triples and stolen bases. How would his rankings compare to the best days of Willie Mays, Ken Griffey Jr., Pujols, Ted Williams, Joe DiMaggio, etc.?
Well, keep in mind that the '94 season did extend to Aug. 11 of that year before the season was cancelled. So you'd only be talking about adding another month and a half or so to Thomas's stats, which is not much when you're doing five-year comparisons. Also, since the players were ranked based on a rate statistic rather than a counting statistic, it's not a certainty that Thomas would have helped his ranking had that season been played in its entirety.
That said, I think your bigger point stands. Thomas was at the height of his powers then, and it was a real shame to see MLB wipe out the season that year. Take it from an Expos fan who remembers the best team in baseball in 1994--the wounds never fully heal.