Being in the audience myself, I understood James's point to be against LOOGYs and constant pitching changes that require a 12- or 13-man pitching staff. Have lefties, but leave them in for more than one batter at a time. I believe James also mentioned possibly limiting how many pitching changes can be made in a game as a way of enforcing this concept.
Back when I played Strat-O-Matic, I would look at the upcoming lineup for stretches of 3 to 5 batters for matchups where a lefty reliever might be more effective than a righty.
No consideration for Starling Marte? Very good to excellent in both range and arm.
My first read was similar to Richard's, that "glove" didn't consider range or arm, but re-reading, I see Jason's comment was that he asked for an overall rating, without it being broken down into the components of range and arm
If Tabata and Walker can put up 340-350 OBs I'd put them 1-2, regardless of lack of blazing speed, then Marte something like 6th
Marte's a very fast line drive hitter with developing power. It's just that strike zone judgement thing. I sat behind home plate for several games in Altoona in 2011 and in general he looked better than what I expected from his previous stats. Yes, there were some ugly PA's for the Bucs this past summer but Marte did lead the Dominican Winter League in wOBA for combined regular and post season, with his highest walk rate ever in the DWL (7.1% in 178 PA)
What drives me nuts, especially when listening to mainstream sports media, is the concept that everything has to "mean something". One of the most important concepts we learn in Sabermetrics is random variation. Sometimes things just happen and we can enjoy that as the unpredictability of baseball. I've heard Pirates announcers spend all of 2011 trying to analyze the fact that Kevin Correia had a better ERA in road games than at the supposedly more pitcher friendly PNC park. It may mean absolutely nothing but they do not want to concede that. Maybe they believe that as people paid to report on what happens in a game it's their duty to find meaning and failing that they have failed in their job.
I have a problem with the example of Eddie Rosario. As quoted above, in 2011 he had a raw batting line of .337/.397/.670, while in 2012 this dropped to .296/.345/.490 as Rosario advanced from the Rookie level Appalachian League in his age 19 season to the Class A Midwest League at age 20.
Normalizing Rosario's stats for his age, level of competition and the specific ball parks he played in:
Year Age BH HR BB SO
2010 18 .301 .055 .057 .151
2011 19 .303 .097 .063 .179
2012 20 .309 .050 .057 .180
Have you checked for a similar trend at other positions, and are you using the same baseline (replacement level) for all the seasons?
I have observed that offensive production among players at all positions has dropped off since 2009.
Could I also suggest frustration or anger? At that age I bowled in a couple leagues, and was fairly good (172 average). But if I had a bad throw which hurt my score I would frequently become unsettled, which would make things even worse. I needed to relax and get back to my routine. It did not happen to the same degree when I was pitching, but I was able to show a marked improvement in my control by slowing down, taking a few extra seconds between pitches to visualize my next pitch.
When I first read the article, one of the things I thought of was "does the player deserve the comment?" which didn't seem to be considered, but here I can connect that with Russell's comment about what might not be said. We might need to establish the player's reality, comparing that to how it's described (accurately, inaccurately, or ignored). Language might be an issue on how well the announcers get to know the players personally (not saying socially, but as in interviews or around the clubhouse or field).
I checked the photo I have of Russell, Eric Seidman and myself, and Russell is slightly taller than my 5'9", so I'll accept his claim of 5'11".
So Chris Perez is concerned (and informed) about how many saves he has and if he played in the All-Star Game. F&*% the standings (team wins).
Alan Roth, the Dodgers' team stat man for many years in the 50's & 60's, kept pitches in his own records which have since been made public.
and there was a game Ryan went 13 IP or so, 19 K, said to be abt 250 pitches.
I believe there is an effect, but it depends on the individual. On July 19, 1955, Vern Law of the Pirates went 18 innings, and 10 more 5 days later, showing no ill effects. In 24 G, 11 GS, 103 IP before the two marathons, 17 G, 11 GS, 69 IP after. His ERA went from 3.40 to 5.58, BB/9 from 2.9 to 3.3, SO/9 from 4.2 to 1.9 and ISO allowed from .072 to .136. Over the immediate next three weeks Law was at his worst: 7 G, 4 GS, 20 IP, 8.41 ERA, 398/455/531 batting allowed.
But he came back fine in 1956 and put up another 5 quality seasons. Starting at age 31 in 1961 he had 3 straight poor seasons with much fewer GS & IP, presumably injuries, but rebounded and pitched until age 37.
Law absolutely had short term effects, the first 3 weeks the worst, the last 6 weeks of the season so-so, the next year recovered.
I think it's part of the job description that announcers must find a story for everything. They do not seem to be able to accept that some things just happen and do not mean a thing. "Why does Kevin Correia pitch so much better at home?" They'll tell me that Ronny Cedeno is 6 for 15 against some pitcher, so expect a hit. I would ask, "If Ronny Cedeno starts the season 6 for 15, will they say that they expect him to have a much better season than last?" If the answer is no, it's only 15 at bats, then just shut up. I wish to never hear batter/pitcher matchups again.
But the 275 wOBA was a big drop from his 320 the year before. 2006 saw drops in BABIP, HRCON and BB%. Was Clement being overly aggressive and making weak contact? He also chased everything when promoted to the Pirates, despite a history of being a patient hitter. 2007 put him right back where he was in 2005, and that's AA/AAA compared to College/Low A. 2008 saw a jump in BABIP, with marginal increases in HRCON and BB%. Solid numbers, but a low BA. 2009 was right back to where he was in 2005 & 2007. Do a rolling three year mean, as virtually all projection systems do, and there's hardly any year to year change in Clement's projections.
When a prospect fails upon being called up, sometimes he's pressing, sometimes it's just random variation. Adam Lind stunk in Toronto and raked in Las Vegas this year. Add them together and it's almost exactly his numbers from last year. A lot of it was likely just being hot and cold.
But I agree that there is more, and guys like Peguero and Pena (and many pitchers) illustrate the point I was making in my original comment, trying to identify some part of a player's skill set that can predict success or failure better than only looking at the minor league numbers.
according to 2012 article by Jeff Passan, Morse was suspended three times for a single cycle in 2004, taken to help heal muscle tear
Here are my MLEs for Clement, adjusting for park and league
Year Org Level wOBA BA OB SA _BH _HR _BB _SO
2005 SEA Coll/A-/A .320 .244 .313 .420 .297 .058 .079 .253
2006 SEA AA/AAA .275 .229 .280 .340 .277 .029 .044 .218
2007 SEA AAA/MLB .335 .246 .329 .439 .279 .051 .091 .203
2008 SEA AAA/MLB .357 .263 .350 .471 .310 .058 .100 .219
2009 PIT AAA .320 .240 .310 .427 .286 .048 .081 .232
2010 PIT AAA/MLB .297 .237 .270 .427 .286 .062 .033 .275
2011 PIT AAA .278 .227 .288 .333 .313 .014 .073 .276
2012 PIT AAA .343 .265 .338 .451 .317 .042 .093 .215
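As a rough illustration of the smoothing effect of a multi-year average, here is a minimal sketch that takes a plain, unweighted three-year rolling mean of the wOBA column in the table above. This is an assumption made for illustration only: actual projection systems generally weight recent seasons and playing time differently, and the function name is hypothetical.

```python
# wOBA by year, copied from the MLE table above.
woba = {
    2005: 0.320, 2006: 0.275, 2007: 0.335, 2008: 0.357,
    2009: 0.320, 2010: 0.297, 2011: 0.278, 2012: 0.343,
}

def rolling_mean(values_by_year, window=3):
    """Unweighted mean over each run of `window` consecutive seasons."""
    years = sorted(values_by_year)
    result = []
    for i in range(window - 1, len(years)):
        span = years[i - window + 1 : i + 1]
        mean = sum(values_by_year[y] for y in span) / window
        result.append((span[-1], mean))
    return result

rolling = rolling_mean(woba)
for year, mean in rolling:
    print(f"{year - 2}-{year}: {mean:.3f}")

# Compare how far the single seasons and the averages swing.
single_spread = max(woba.values()) - min(woba.values())
rolling_spread = max(m for _, m in rolling) - min(m for _, m in rolling)
print(f"single-season spread: {single_spread:.3f}")
print(f"three-year spread:    {rolling_spread:.3f}")
```

The averaged values move within a narrower band than the single seasons do, which is the effect described for the projections.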
For Jeff Clement, there were unrealistic expectations partially fueled by not properly putting into context his performance in college and the PCL. He's still the same hitter he has always been.
Mike Morse has changed. His walk and strikeout projections have been constant, but he has made much more effective contact since joining the Nationals, making large increases in both his BABIP and HRCON.
What needs work in Sabermetrics is trying to identify, through whatever means (stats, scouting, etc) which players will beat the projections and which will fall short. Were there any signs that Morse would start hitting the ball harder? I haven't delved into his batted ball analysis, but for example, could it be that a new organization told him to pull the ball more? (Not that it worked for Adam Lind)
and then Chris Resop caught Ryan Ludwick, who had already homered twice, looking at strike three, and the game was over.
The B-Ref list is Defensive WINS above replacement, not defensive runs above average.
Each win is worth about 10 runs, so Vizquel's 28.2 wins would be about 282 runs, assuming defensive replacement is equal to average. Over 24 years, that's about 12 runs a year
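The conversion is simple arithmetic, sketched here using the 10 runs per win rule of thumb and the figures quoted above. Variable names are illustrative.

```python
RUNS_PER_WIN = 10      # rule of thumb stated above
defensive_wins = 28.2  # Vizquel's total quoted above
seasons = 24

total_runs = defensive_wins * RUNS_PER_WIN
runs_per_season = total_runs / seasons
print(f"{total_runs:.0f} runs in total, roughly {runs_per_season:.0f} per season")
```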
The Hardball Times Forecasts has Lawrie at +5.8, right in line with all the non-BIS ratings. Although there are differences in the details, everything Colin just said about FRAA applies to my defensive metrics.
I'm of the opinion that for this kind of shift, with the third baseman in short right field and everyone else in fairly normal positions (the Pirates have also been doing this) it might be better to not include these plays in the third base ratings, instead creating a new position. I don't want to ignore the shifts, but calling the guy in short right a third baseman is really making things difficult.
I met my wife while I was in college, when my summer job was working as the head scorer/statistician for our summer league that I was by then too old to play in any longer (a decent pitcher, an outfielder who never hit a lick). Three or four nights a week were spent at the downtown stadium taking in a pair of 7 inning games, but many of the players were guys both of us went to school with or hung out with. She knew if she wanted to see me, she'd have to watch baseball. She is a football fan, don't ever let her catch you saying anything bad about the Steelers. But baseball was something I did.
We married and moved away, and for many years most of my baseball life was play by mail APBA. Then I started getting involved with the online saber community, getting busier along the way. I heard complaints that I spend all my time at work in front of a computer, then come home and do the same thing.
She will not watch a game on TV, but does enjoy the ballpark. We get to Altoona up to ten times a year, plus some Pirates and Nats games. She has also enjoyed traveling to NYC, Boston & SF to attend conferences and such, and getting a chance to check out some of these guys I spend so much virtual time with.
But if the quality of pitching also rises due to an increase in the talent pool, there's an equilibrium between batting and pitching. The mean production will not change, but the variance between best and worst will shrink, as you have illustrated.
but he hadn't been in a competitive game in 10 months. After a few games he was timing the off speed pitches better.
and you don't know how much the costs were. Last time I got a ticket in Pa, the fine was $10 plus $87.50 in costs. Or $900 to the homeowners association (for not planting grass in December) plus $5000 lawyers fees.
Edge = Long Haired Tom Brady
The new ball cut a run off the league ERA and dropped HRs by a third, so I believe most pitchers were fine with it! Even so, I am a member of the Darvish bandwagon.
Got to a few Altoona games...Starling Marte is still aggressive at the plate, but has been cutting his strikeouts each year, never seemed to be lost at the plate or chasing pitches.
Haven't seen Robbie Grossman in person yet, but he also has greatly cut down on his strikeouts each year while maintaining or improving his walk rate. Including the AFL this year, he showed 20 HR potential, profile reminds me of Nick Swisher.
If the topic to be studied was reliever to starter conversions, why not restrict the group to pitchers making that same conversion in the past? Much of what you have measured could be the difference between starting and relieving in general (see Tango's rule of 17, referenced in the BP post on Felix last week), and might have nothing to do with increased workload.
But if the first team picking gets to spend $11m on their top 10 picks, and the 1st pick wants $10m, then there is no way they can sign a second player to a large bonus. Gerrit Cole might get his, but the Pirates would not be able to get Josh Bell, who would slide even now because he wanted a lot of money to skip college, and now nobody will be able to spend big money on more than one player. Bell would not get signed by any team.
If a player comes to camp as a NRI, stays until opening day, but then is assigned to AAA - he gets $100k and can become a free agent two months later on June 1. If he is released within 5 days of opening day he gets $100k. 6 or more days, nada.
Remember also what is being quoted everywhere is the summary MLB published, not the exact language in the CBA, which is hopefully more concise in situations such as these.
The Pirates had Jordy Mercer at shortstop for Triple-A Indianapolis. He projects very similarly to Ronny Cedeno, low BA and OB with some power to go with a +10 glove. But I think I would have spent the money on Cedeno.
But I think there's a rule against electronic devices in the dugout
Canzler has started 96 games at 3b the past two years, compared to 81 in the outfield and only 38 at first. A 345-355 wOBA is average for a corner outfielder, but pretty good at 3b, and he's showed basically average defensive numbers...and only 25. Some teams could use that at third.
a 213 BA since the end of May doesn't help any
If of the traded prospects only Carlos Santana and James McDonald are now starting in MLB, and their best two remaining were Dee Gordon and Trayvon Robinson, both IMO barely replacement level, I'd have to think the system has stunk for awhile, and been made worse.
That the Pirates have more than one pitching prospect of Wheeler's quality, without giving up one of Taillon/Allie/Heredia. Or am I valuing Wheeler too low? (of course, Beltran could have refused going to Pittsburgh, making it moot)
The swiping motion is because McKenry came across with the glove until he touched Lugo's leg, then moved the glove up in the air to show the umpire that he did not drop the ball.
I lost it on Emmanuel Burriss batting 5th...
There had been a lot of talk at BucsDugout about what it would take to get Beltran, and if the Pirates would be wise to pay that amount in prospect(s). After seeing the actual deal for Wheeler, my reaction was "we could have done that".
The Pirates just need to learn how to beat Milwaukee.
40 IP is about 170 batters faced. According to Derek Carty's research http://www.baseballprospectus.com/article.php?articleid=14215 that is long enough for GB, BB & SO to stabilize, so you can get a good idea of how well the pitcher is doing in the categories that a pitcher has the most control over.
The sample size on batted balls is incredibly small, thanks in part to his excessive whiff rate, but the pitch counts are at decent levels.
I don't know how much value a comparison to similar style hitters has, as if Rizzo wants to stay in the major leagues his competition is major league quality hitters, especially those who play his position. So yes, start with MLB average, then maybe MLB average first basemen.
15.48 seconds, and it looks like he took the first couple seconds to stand near home plate in disgust of hitting an easy out.
With Rizzo putting in play 10% of the 186 fastballs of 90+ mph, and 14% of the 73 less than 90 mph, that's a sample of only 19 and 10 batted balls with which to attempt a babip comparison. After regressing with about 1000 league average balls in play...
What I do see is that Rizzo's swing and miss rates at both groups of fastball speeds are twice the league average (16/8 of 90+, 12/6 of <90). He needs to make more contact on fastballs of any speed.
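To see how little those batted balls can say, here is a minimal sketch of the arithmetic: the ball-in-play counts implied by the percentages above, and the weight the observed sample keeps once about 1000 league-average balls in play are added. The observed hit count and the .300 league BABIP are hypothetical placeholders, not figures from the article.

```python
REGRESSION_BIP = 1000   # league-average balls in play added, per the comment above
LEAGUE_BABIP = 0.300    # assumed league rate, for illustration only

def balls_in_play(pitches, in_play_rate):
    """Approximate count of balls in play from a pitch count and an in-play rate."""
    return round(pitches * in_play_rate)

def regressed_babip(hits, bip):
    """Blend observed hits on balls in play with league-average balls in play."""
    return (hits + LEAGUE_BABIP * REGRESSION_BIP) / (bip + REGRESSION_BIP)

fast = balls_in_play(186, 0.10)   # fastballs of 90+ mph
slow = balls_in_play(73, 0.14)    # fastballs under 90 mph

for label, bip in (("90+ mph", fast), ("<90 mph", slow)):
    weight = bip / (bip + REGRESSION_BIP)
    print(f"{label}: {bip} BIP, observed sample carries {weight:.1%} of the estimate")

# Hypothetical: even 8 hits on those 90+ mph balls in play barely moves the estimate.
print(f"{regressed_babip(8, fast):.3f}")
```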
"a natural question is whether or not New Coke addresses the concerns I’ve listed above with old Coke. The simplest answer I can come up with is, they’re still both mostly sugar water with a bit of citrus and a few other flavorings, and in blind taste tests you wouldn’t be able to tell the difference."
I drink way too much Pepsi, but can't stand Coke. I've also got a bottle of Mello Yello sitting here, one of my favorite flavors, much more than Mountain Dew. But I have a hard time distinguishing between 7-Up and Sprite, they are interchangeable to me.
For me the HR-CON observations were the most interesting part of this article. I've always used HR-CON for batters and pitchers in Oliver, but had been considering switching to HR/(LD+FB), intuitively thinking that would be more accurate. After all, in order to hit a homer you have to first hit a ball in the air to the outfield, so why not make that the denominator? I'll be investigating this more closely.
studes has already told me he doesn't understand how Oliver works, but that's OK as long as he keeps paying me for it.
Thanks Steve, I accept your explanation. As for Fox, that's how he explained it to me.
Final comment, after having now read all of Colin's article, he does discuss some very important issues, especially with the prediction of home run rates allowed. Stuff to think about.
It's just that I (and apparently several others) found the slap at SIERA to be distasteful.
BTW, cola is flavored by cinnamon and vanilla, no citrus
Dan Fox left, took "Simple Fielding Runs" with him but left "Equivalent Baserunning Runs" to BP. Matt left and took SIERA with him.
We did independent testing of SIERA at "The Book" blog and it did well. Yes, it's complex, but the intent was to pick up on some of the nuances of BABIP.
I haven't finished reading Matt's recent series at FanGraphs, but there are some thought provoking issues raised. I'm doing some of my own research (off and on) of the relationship between a pitcher's true talent ground ball rate and his BABIP allowed (it's not linear) or his HR per LD+FB (also not linear). Another thing that's not linear is the relationship between runs created per PA (linear weights, woba) and runs per game (ERA).
I was a math major at one time 30 years ago, but today Matt has better math skills than I do to run the regression analyses that show how multiple terms, not just ground ball rate, interact to predict base hit and homerun rates. Matt's SIERA research published in these recent articles has discovered this non-linearity I spoke of, as well as realizing that one player's fly balls or line drives are not the same as another's.
Regardless of whether xFIP is a better estimator than SIERA, I believe these concepts will be important issues in the next few years as we try to improve our modeling of batted balls and the projections we build from them.
I know Kevin doesn't like MLEs, but Blanks is a good use of them, at levels near MLB (AAA & AA).
Yes, he's had a torrid 24 games at Tucson, but I can remember Jake Fox a couple years ago.
Combining Blanks' stats at San Antonio and Tucson, and adjusting for parks and leagues, this year comes out about 280/340/530, very similar to his 2008 & 2009 seasons, with some additional power. Petco will take back some of that power, but I agree that Blanks is the better option than Rizzo at this point. Rizzo might be up to that level when he reaches Blanks' current age (24) in another three years, but not now.
Or that this seemed to be the first time in a week or so that Kevin didn't say something about Altuve.
Eric, here's my logic.
You can't get a double until you're at first. You can't get a triple until you're at second. You need to follow this sequence, regardless of how fast each stabilizes.
So find (H-HR)/(BC-HR)
For batted balls, they are either on the ground or in the air. GB rate stabilizes quickest. Next look at the air balls, find the OF/IF fly mix, then whatever is left are LD.
2B+3B per 1B+2B+3B (pct of base hits that go for extra bases)
OF per OF+IF+LD
IF per OF+IF+LD (once the ball's in the air, OF or IF)
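The sequence above can be sketched as a chain of rates, each computed only on the events left over from the previous step. The stat line is made up, the field names are illustrative, and treating BC as batted balls in contact and GB rate as grounders per batted ball are assumptions about the notation.

```python
def component_rates(line):
    """Compute each rate on the denominator defined by the previous step."""
    hits = line["1B"] + line["2B"] + line["3B"] + line["HR"]
    non_hr_hits = hits - line["HR"]
    air = line["OF"] + line["IF"] + line["LD"]
    return {
        # base hits per batted ball, home runs removed from both sides
        "BH": non_hr_hits / (line["BC"] - line["HR"]),
        # share of non-HR hits that go for extra bases
        "XB": (line["2B"] + line["3B"]) / non_hr_hits,
        # ground balls first, as a share of all batted balls
        "GB": line["GB"] / (line["GB"] + air),
        # then split the balls in the air
        "OF": line["OF"] / air,
        "IF": line["IF"] / air,
        # whatever is left in the air is a line drive
        "LD": line["LD"] / air,
    }

# Hypothetical stat line, not a real player.
sample = {"BC": 400, "1B": 80, "2B": 25, "3B": 3, "HR": 15,
          "GB": 170, "OF": 120, "IF": 30, "LD": 80}

rates = component_rates(sample)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```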
Discussion of this article at "The Book Blog"
I'm well aware of Tango's daily decay weighting system, where days ago refers to calendar days. The 1.0/0.8/0.6 seasonal weights for batters, and 1.0/0.7/0.5 for pitchers conform to those numbers.
The problem is writing code to implement it. The formula for the weights isn't difficult, but each player's performance has to be analyzed game by game, instead of season by season. It can then only be used for leagues where play by play is available. This also bloats up the data tables, and presumably the processing time as well, creating and then analyzing those tables.
However, I am considering it for Oliver. On the good side, analyzing game by game would allow cutting off the sample in the middle of a past season, either those games beyond a certain number of days in the past, or after a sufficient sample size is reached (or both).
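A minimal sketch of what game-by-game weighting might look like, assuming a simple exponential decay by calendar days. The decay form, the 0.8-per-year calibration (chosen to match the second-year batter weight above), the cutoff parameters, and the game-log layout are all assumptions for illustration; this is not Tango's published formula or Oliver's code.

```python
YEARLY_WEIGHT = 0.8                      # assumed: a game 365 days old counts 0.8
DAILY_DECAY = YEARLY_WEIGHT ** (1 / 365)

def weight(days_ago):
    """Exponential decay by calendar days."""
    return DAILY_DECAY ** days_ago

def weighted_rate(games, max_days=None, max_weighted_pa=None):
    """Weighted on-base rate from (days_ago, pa, times_on_base) game logs.

    Games are read newest first; the sample is cut off beyond `max_days`
    or once `max_weighted_pa` weighted plate appearances are collected.
    """
    total_pa = total_ob = 0.0
    for days_ago, pa, on_base in sorted(games):
        if max_days is not None and days_ago > max_days:
            break
        if max_weighted_pa is not None and total_pa >= max_weighted_pa:
            break
        w = weight(days_ago)
        total_pa += w * pa
        total_ob += w * on_base
    return total_ob / total_pa if total_pa else None

# Hypothetical game log: (days ago, plate appearances, times on base)
games = [(3, 4, 2), (4, 5, 1), (200, 4, 1), (400, 4, 3), (800, 5, 0)]
print(round(weight(365), 3), round(weight(730), 3))
print(weighted_rate(games))
print(weighted_rate(games, max_days=365))
```

Under this assumed form a game two years old weighs 0.64 rather than 0.6, so the seasonal weights are only approximated. The sketch also shows the cost mentioned above: every game is a row to store and process.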
I was 12, and that was the first MLB game I ever watched.
I had the same reaction as deholm1 - if Fosse's shoulder was dislocated and fractured, and he couldn't lift his arm above his head, what the hell was he doing playing? If he had gone on the DL, even for the rest of the season, he might have healed properly and been able to return to his level of play.
OK, I'll agree that I wouldn't use the ZiPS method, for many of the reasons you discuss.
Can you give us more about PECOTA's method, to the level of detail that you examined ZiPS? This is what you wrote in the previous article:
"Instead, we are taking a player’s season-to-date numbers and, in effect, “regressing” them toward the pre-season PECOTA forecast. The weighting is determined by two things: (1) a player’s playing time so far this season and (2) the reliability of a player’s preseason forecast. The more a player plays this season, the more the rest-of-season forecast can move, but at the same time, the forecast for a rookie is more likely to move than that of an established veteran."
and what's wrong with a mustache? I've had mine since 8th grade.
Things we know as facts:
1. The night before, Harper was hit by a pitch on the thigh and left the game
2. Harper hits a home run the next day
3. Harper blows a kiss to the pitcher while rounding the bases
My conjecture as to what might have happened, but we couldn't hear
Harper hits homer, looks at it a bit
Tells pitcher to "Take that!"
Pitcher tells Harper to "Kiss my a**"
Harper blows kiss back at pitcher
Revere warned the British that the Colonists were a force that could not be conquered. "The bell's a'ringing! The town's alarmed, and you're all dead men!"
Kevin, you and I have spoken about Gary Brown. As a stat guy, the point to be learned there is not to rely on just one data point. Yes, Brown only walked 9 times in his final year of college, but had better rates in the Cape Cod League and his two previous college seasons. All those need to be considered in the analysis. He will likely be below average in his bb%, but around 6% is a reasonable expectation.
As a Pirates fan I'd be happy with Rendon, but I'm surprised KG doesn't have Bauer, a guy who's been growing on me, on their list of possible choices.
Just because someone gets waived doesn't mean you have to claim him.
Just because someone hits .300 in the PCL doesn't mean he can hit.
And, my fielding stats see him as one of the worst fielding CFers in pro ball
It just bugged me about his trying to make a distinction between 'African-Americans' and someone from Puerto Rico. One speaks English, the other Spanish. Both were American citizens at birth, both were of African ancestry, both were banned under the previous rules. I don't see the need for different labels.
Ben, I've always admired your writing skills, and you have been gracious enough to assist me at times in my own attempts. I have to say that it is still a joy to read TA. Well done.
Puerto Ricans are Americans too.
I'm still trying to pimp Brock Holt. Will be getting my ticket package for Altoona asap so I can see Holt, S Marte, Sanchez and some of the starting pitchers.
Hypothetical road trip? I was ready to check my schedule to see if I could join you in West Virginia!
The city's colors are Black and Gold.
Teams have sold rights to the local broadcasters. We need a method that does not cut out those broadcasters.
I recall that last year MLBAM did make Yankees and Padres games available if the customer was already a subscriber to the local broadcaster, although that was not mentioned in this article.
Can we follow the same model?
a. If a customer is in the coverage area of a broadcaster who possesses rights to a team, show that you are already a subscriber, or pay a surcharge, to see that team.
b. If you are not in a local coverage area, then no blackout because no broadcaster is being affected.
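The two rules above reduce to a small decision function, sketched here with hypothetical parameter names. It describes the proposal in this comment, not any actual MLBAM policy.

```python
def can_stream(in_local_area, is_local_subscriber=False, paid_surcharge=False):
    """Return True if the customer may stream the team's games under the proposal."""
    if not in_local_area:
        # Rule b: no local rights holder is affected, so no blackout.
        return True
    # Rule a: inside the coverage area, access requires proof of a local
    # subscription or payment of a surcharge.
    return is_local_subscriber or paid_surcharge

print(can_stream(in_local_area=False))
print(can_stream(in_local_area=True))
print(can_stream(in_local_area=True, paid_surcharge=True))
```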
I hadn't noticed about Poreda until you brought it up here. Interesting thing that the control seemed to go as soon as he was acquired by the Padres. In 2009 for the Sox affiliates, 46 BB in 85 IP, for Padres, 42 in 35, then the 64 in 54 this year. From 46/85 to 106/89.
That camera angle doesn't show the Terrible Towel hanging on the wall behind KG! I travel to NYC and end up in a Steelers Bar
As a Pirates fan, I'm looking at Cole and Rendon. Cole had some high walk rates his first two years in college, but it has come down each year, and this year has improved to only 5 BB and 37 SO in 30 IP.
It's always great to go to P&P, but Monday night I was still at home in western Pa, I don't go down to the DC area until Tuesday. Every time it's been on a Tuesday night, I've taken time off work and been there.
I make it there most every year but Mondays are a bad day for me, and I did get to see and talk to each of these guys last month in NYC...so maybe next year. Keep up the good work.
Oliver, published 2009 at FanGraphs had Wieters at 294/373/487 - looking back with the current engine 285/365/472
Maybin has not matched expectations, but maybe the expectations were incorrect. He is good enough to be a regular MLB CF'er.
He's hit 246/318/380 so far in MLB, 250/318/409 in 2009.
My MLEs for him the past four years
Year Age wOBA BA OB SA FR
2007 20 .318 .244 .315 .413 +3
2008 21 .320 .258 .326 .403 +4
2009 22 .329 .272 .338 .410 +5
2010 23 .316 .258 .320 .398 +10
Matt Lawson is much younger, so I thought perhaps he was Matt Lawton's son.
219 PA is probably too small of a sample size to say that Andrew McCutchen would in the future produce a higher OBP hitting 3rd vs leadoff. Assuming he and the other players perform the same regardless of their lineup slot, right now I might like Neil Walker better batting 3rd. McCutchen has more on base, speed and steals than Walker, while Walker has more extra base hit power, including combined doubles and triples. So I think the Pirates might score more letting Walker drive in Tabata and McCutchen than McCutchen driving in Tabata and Walker. Then in 2012 Anthony Rendon can hit 3rd.
Those young Buc pitchers
will they each have Tommy John
or will flags fly high?
I'll still take Wieters over Danny Moskos
Your description of Montero's defense sounds a lot like Ryan Doumit, and he's still catching for the most part, and Montero is a much better hitter. I'm just concerned that Montero won't have nearly as much marginal value at 1B or DH, unless he develops into a .400 wOBA guy.
One request - could you let us know which teams the players are on? Chipper Jones isn't a problem, but Eli Whiteside?
I used to spend most of my work days mapping landfills, and could use the same argument. Now I'm the airport guy at the company, collecting vertical obstructions for airports around the country, so that the pilot knows what to not run into. At least, compared to the air traffic controller, I can take extra time to get it right.
I finished the article, and was going back to the main page, still thinking "He titled the piece The Next Roberto" but then never said a word about Clemente...then it hit me, that Roberto, Alomar.
Colin will have to answer any questions about specific players, or how Pecota works, as it is his (adopted) baby, but for regression in general, you take the player's historical data and add a fixed amount of average performance of the group he's determined to be a member of.
Some players will have a sample of 1500+ weighted PAs. Adding 200 regression PAs means the regression is 11% of the total. For another player who had 200 PAs in his only season so far, 200 regression PAs is 50% of his total.
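A minimal sketch of that arithmetic, assuming the regression is done by adding a fixed number of group-average plate appearances to the player's own. The 200 PA figure comes from the comment above; 1600 is a stand-in for "1500+", and the rates are hypothetical.

```python
REGRESSION_PA = 200  # fixed amount of group-average performance added

def regression_share(weighted_pa):
    """Fraction of the combined sample that comes from the group average."""
    return REGRESSION_PA / (weighted_pa + REGRESSION_PA)

def regressed_rate(player_rate, weighted_pa, group_rate):
    """Blend the player's rate with the group rate, weighted by PA."""
    total = weighted_pa + REGRESSION_PA
    return (player_rate * weighted_pa + group_rate * REGRESSION_PA) / total

for pa in (1600, 200):
    print(f"{pa} PA: regression is {regression_share(pa):.0%} of the total")

# Hypothetical .360 wOBA against a .320 group average
print(f"{regressed_rate(0.360, 1600, 0.320):.3f}")
print(f"{regressed_rate(0.360, 200, 0.320):.3f}")
```

The same observed rate is pulled much further toward the group for the player with one short season than for the veteran.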
You are confused about what regression is. Including more than one season, with the most recent ones weighted more, is the method of looking at the relevant portions of a player's career. Regression is when we lack information about a player (even when he has played a lot, Colin has done articles on other sites about this). For example, if you have a 20 year old shortstop in AA, you might regress his stats to what all other 20 year old shortstops in AA have done. Or if he's a fat first baseman who hits a lot of homers. How does that group age?
I'll have to point out to Tango that Pecota projects Strasburg for a 2.42 ERA (if he would have pitched in 2011) while Oliver says 2.50
I live in Pennsylvania but work in Virginia. Fortunately, the states have a treaty that allows me to have Pa taxes deducted from my check.
There's baseball, then there's the Pittsburgh Steelers. The 29th will be the week between them playing in the AFC Championship and the Super Bowl.
I'll be there - thank you for putting it on non-football weekend.
I read that the Pirates were interested in signing Veal to a minor league deal, but didn't want him on the 40.
Christina - "but who are essentially fungible and not a whole lot better than what you might find on either the major- or minor-league free-agent market, or from within most farm systems" - that's almost exactly how us stat-heads describe a replacement level guy.
What? I got all the way to the bottom and no answers? I wanted to know how I scored!
Thanks for the work you did and best of luck in the future. It was an honor to meet you last year in Pittsburgh.
Cool, just wanted to make sure that possibility had been considered. I didn't recall that you had studied it already, hence my "it may be..."
and for BABIP - it may be that higher strikeout rates correlate with lower BABIP allowed not because one causes the other, but because both may be results of the same cause - high strikes. Low groundball pitchers have lower BABIPs allowed, and also have higher strikeout rates than high groundball pitchers.
I believe you have to look at contact rate and strike rate at the same time. A pitcher who throws more balls may have a higher whiff rate, but will walk a few more batters before getting the strikeout.
Two inputs (strike rate, contact rate), three outcomes (walk, strikeout, ball in play).
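As a toy illustration of two inputs producing three outcomes, here is a minimal sketch that walks through the ball-strike count. It assumes every pitch is independent, is a ball with probability 1 - strike_rate, and that a strike is put in play with probability contact_rate and otherwise adds a strike to the count; fouls, hit batters, and count effects are ignored. The model and its names are assumptions, not a method described in the comment.

```python
from functools import lru_cache

def outcome_rates(strike_rate, contact_rate):
    """Return (walk, strikeout, ball in play) probabilities for one plate appearance."""
    p_ball = 1 - strike_rate
    p_in_play = strike_rate * contact_rate
    p_strike = strike_rate * (1 - contact_rate)

    @lru_cache(maxsize=None)
    def from_count(balls, strikes):
        if balls == 4:
            return (1.0, 0.0, 0.0)
        if strikes == 3:
            return (0.0, 1.0, 0.0)
        after_ball = from_count(balls + 1, strikes)
        after_strike = from_count(balls, strikes + 1)
        walk = p_ball * after_ball[0] + p_strike * after_strike[0]
        strikeout = p_ball * after_ball[1] + p_strike * after_strike[1]
        in_play = p_in_play + p_ball * after_ball[2] + p_strike * after_strike[2]
        return (walk, strikeout, in_play)

    return from_count(0, 0)

for strike_rate in (0.66, 0.60):
    bb, so, bip = outcome_rates(strike_rate, contact_rate=0.40)
    print(f"strike rate {strike_rate:.2f}: BB {bb:.3f}, SO {so:.3f}, BIP {bip:.3f}")
```

Holding the contact rate fixed, the lower strike rate produces more walks, which is why the two inputs have to be read together.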
MLEs from my Oliver projections
Age BA OB SA wOBA BA OB SA wOBA
18 277 317 394 314
19 299 343 492 359 241 328 423 336
20 264 321 453 335 225 346 376 327
21 273 386 448 370
22 249 354 355 325
23 217 313 362 303
Crap, I paid $40.
Cox over Carpenter. Carpenter has shown a great glove in two seasons (+10,+21) but he only projects as a .250 hitter with under 10 HRs. Cox looks more like Belt - could hit 290-300, 35 doubles, 15 HRs.
Steve Pearce was originally reported out with a sprained ankle. After about two weeks he started a rehab assignment and played every day for almost three weeks, some games at 3b, hitting well. Then he's out for the season, and you report knee surgery. Was this a new injury?
I don't think Steven Jackson will be around for long. The Pirates dumped a starter and a reliever, then called up two relievers. As was done earlier in the season, Jackson is likely an 8th arm in the pen until it's time in the rotation for whomever replaces Brad Lincoln. If not Charlie Morton, the only other likely choice for starting pitcher is Rudy Owens in Double-A Altoona, whose regular turn comes around on Thursday.
I wrote about Derek Jeter's defense last year
I agree with kensai, Jeter is very sure handed (very few infield hits or errors) but lacks range. Each play has a run value, with hits to the outfield costing more than infield hits. One or two extra ground balls a week through the infield isn't something our brains catalog very well, but they can be counted on the scoresheet.
Excellent article, hard to find anything to disagree with.
As someone who produces a set of defensive stats, and who has studied this for almost 30 years (and likely doesn't have any better answers than Colin) a few thoughts...
I started off way back when reading Bill James' introduction of Defense Efficiency Rating for teams, and thinking of how it might be applied to individuals. James stated that with proper positioning any ball is fieldable, unless it's hit too high off the wall. So let's take the team total balls in play used for DER and assign each and every one to a fielder. There is a matter of opinion on which fielder had the best chance in a split zone, but likely much less so than calling ground balls hits or errors.
If we know that there's uncertainty in the source data, perhaps we should back off in how precisely we want to measure things. Colin gives an example of grading balls hit in the 3b/ss hole as difficult or easy. I might ask "Why bother?", let me know it was in the hole and which fielder presumably had the best chance, as described above. Peter Jensen called his Gameday derived system "Big Zone Metric" because he didn't attempt to bin the batted balls into zones, only assign them to responsible fielders.
We can be less precise in our reporting of the results. We might be used to seeing runs saved on the season reported to the nearest tenth, but our observational biases and errors probably won't let us see inside 5 runs. I can add the fielding runs to batting and baserunning to calculate a WAR value, but in grading the fielders it might be best to follow the example of Tom Tippett's "Pursue the Pennant" which assigned six grades, Ex,Vg,Gd,Av,Fr,Pr and I'll add Bd to make it three above and three below average.
I think Posey will be better than that Wieters guy, but I expect a 290-300 BA and 15-20 HRs. He did have those 26 HRs in college, but then only one in 111 AB as a pro the rest of the summer. It evens out. But a top 5, even maybe top 3 catcher
Thanks for the article, helps clear some things up.
One that confused me was on June 16 when the Pirates DFA'd Akinori Iwamura. With no further detail on the transaction, I assumed that he was being dropped from the 40 man roster. Seven days later, on June 23, it was reported that Iwamura was optioned to Triple-A. I was confused on whether he was optioned or outrighted. Apparently this was an optional waiver.
To OKGOJAY, there are three different seasons in which a player on the 40 man roster may be assigned (optioned) to a minor league team, but there is a time limit to use those options. However, I thought that clock started on the date of the player's major league debut. Hopefully Jeff & Eric can clarify this.
Ouch on the Pirates.
Wasn't Thome's original position 3B?
and does the President know the names of any of the White Sox players yet?
"Most major-league hurlers were good hitters at some level of competition"
Back in my days of high school/college summer leagues, Pete Vuckovich was the poster child for the above statement, at age 18 and 19 leading our league in BA, HR & RBI, also in pitcher wins & SO. He could hit Charlie Sheen in 'Major League' but not actual major league pitchers.
Shawn Hillegas, on the other hand, hit like a pitcher when he was 19 and under, a .170 career hitter in the same summer league.
Super 2 turns into a game of chicken this time of year - who wants to go first? Jason Heyward opened the year with Atlanta, then no debuts of note (that come immediately to mind) until Stephen Strasburg and Mike Stanton on 6-8 then Jose Tabata and Brad Lincoln, both of the Pirates on 6-9. I think the Pirates may have decided to lead with their lesser talents, trying to ensure that Pedro Alvarez doesn't get the Super 2 tag.
Plus, Neil Walker has been playing well this year, both in Triple-A and the majors, even if I don't think it will last. Promoting Alvarez would mean dumping either Andy LaRoche or Aki Iwamura, LaRoche moving to 2b if he survives the cut. With Walker now at 2b and doing well, both LaRoche and Iwamura are fighting for their roster spots. The Pirates might be giving Walker another week or two of playing every day to help them decide who the second baseman will be after Alvarez is promoted.
His best comp based on a high ground ball rate, strike rate and contact rate looks to be Kevin Brown
39 player links for the Mariners article? That's some serious reminiscing
Yeah but now that Matt Wieters is looking more like Ryan Doumit than Superman...I'm still not that impressed with Daniel Moskos, but what has struck me is how much he has cut his walks.
Throwing in the good word for Rudy Owens. Yesterday I was at BCB and after allowing a two run homer to Domonic Brown in the 1st, Owens retired 16 of the last 17 batters in a six inning stint.
Any comments on Brock Holt of Bradenton? He just keeps on hitting, and has shown a plus glove at SS. I expect him to follow Tony Sanchez to Altoona later this year.
I just don't see as much talent in Neil Walker. The Pirates realized he would never be a MLB starter, so they hoped he'd do well enough defensively to be a Delwyn Young type of bench guy - except even Young hits better than Walker. Despite his hot 40 games this year at Triple-A, Walker still projects a 240-250 BA, average power at best, and few walks.
It will likely take a few more weeks to call up Pedro Alvarez, move Andy LaRoche to second and cut Aki Iwamura.
Sad thing for Pirate fans, Steve Pearce still projects as the best hitter in their system, but at first base he's below MLB average.
I was at Rudy Owens' start on Thursday, second time I've seen him this year. He pounds the strike zone for 70-75% strikes (66 of 92 in this game) with a fastball that sits at 90-90 and touches 92. As Kevin said, likely a bottom of the order starter, but on a team like the Pirates that can get you a fast promotion. Easily better than Brian Burres, probably as good or better than Zach Duke or Paul Maholm.
That Mays guy can't walk!
Dustin Ackley is still good and Neil Walker still sucks.
Ackley's BB & SO so far have been in line with expectations, it's his BABIP that's under .200 and that will not last.
Walker is still a guy with average power and below average walks who projects to hit .240 in MLB. Knowing he is behind LaRoche and Alvarez in talent is why the Pirates are using him at multiple positions, to see if he can contribute at the major league level as a utility player such as Delwyn Young.
Or should I say that you have the two players merged. The top of the page gives the correct bio and team assignment info for the Blue Jays pitcher (554432), but gives him the 2009 stats of the Nationals pitcher (502808).
The auto generated link for Chad Jenkins went to the Nationals' minor league pitcher of the same name.
Speaking of position players taking the mound, you might have thought it was only something you'd see in HS or college, but not Triple-A...last night with Indianapolis up a run on Louisville in the top of the 15th, starting pitcher Jeremy Powell pinch hit for relief pitcher Jean Machi, then in the bottom of the 15th stayed in the game - in right field, as the first baseman (who started the game at catcher) Erik Kratz came in to pitch - AND GOT THE SAVE!
I attended Strasburg's start in Altoona, and I would say he was overpowering. The first two batters in the game popped up and grounded out, followed by a double, single and a walk, good for one run. At that point Strasburg did appear a little less than Superman, but to Josh Harrison he threw a couple 81 mph curves away followed by a 97 mph fastball near his head. Harrison K'd right afterward, then the next seven batters were also retired, four by strikeout. In the fourth he allowed a walk, a batter reached on error, and two ground ball singles up the middle.
The scoreboard reported 30 balls and 52 strikes for the game. He threw 10 balls 14 strikes in the rough 1st inning, but followed with 0 balls 8 strikes for 2 K's in the 2nd. Sitting at 97-98 (per the scoreboard) in the 2nd & 3rd, he did drop to 95-96 in the 4th & 5th.
Garrett Jones, trying to show that 2009 was not a fluke, going deep in his first two plate appearances
Does the runner steal on the pitcher or the catcher? I hear this question all the time. I figured the answer would come from examining the variances of the groups, but I was not familiar with this procedure. As you explained it, I might be able to do it myself.
I agree with Kevin, I think this year Heyward will likely put up roughly league average numbers, such as Upton and Griffey did in their first years, but look out after that.
Thing is, which is more valuable for Atlanta, Heyward's age 20 or 26 season? If he adds enough to the lineup this year to put the Braves in a run for the division, then it might be worth it.
Lemieux's situation was different in that the Penguins gave him a long term contract worth a ton of money. He then retired due to health issues (cancer, bad back), but the team still owed him most of that money. Then the team went bankrupt, and Lemieux was their largest creditor giving him the ability to take over the team as the managing partner. Then he decided to play again.
I'll be at Harrisburg's 3rd game of the season (at Altoona), but I think there's little chance he won't start one of the first two games. It would be neat to see him pitch his first pro game.
First, I had a hard time finding the 2010 projections. The 'Find a Player' box still takes you to 2009.
Jesus Montero's weighted mean OB of 315 is only 20 pts above his BA of 295, and lower than any of his percentile forecasts. Clearly a miscalculation. But the final nine years of the ten year forecast are built on that number.
Hey, it works!
I've heard that the Nats liked Storen very much and were thrilled to get him, and that by signing quickly after the draft he was able to put himself on a fast track to the majors.
Your fault for moving to Chicago. My fault for missing SABR in Cleveland and DC.
Plans are for my wife and I to spend a weekend in SF if there's another pfx Summit this year, but so far no verification. So maybe we can do Atlanta instead.
Let me restate for those automatic links
I project Daniel McCutchen about a quarter run per 9 better than Kevin Hart or Brad Lincoln (or Jeff Karstens), but there are several in A or AA that project better than McCutchen - Tim Alderson, Rudy Owens, Aaron Pribanic, Brett Loren, Matt McSwain - so Dan might only be keeping that seat warm until next year.
I project Daniel McCutchen about a quarter run per 9 better than Hart or Lincoln (or Karstens), but there are several in A or AA that project better than McCutchen - Alderson, Owens, Pribanic, Loren, McSwain - so Dan might only be keeping that seat warm until next year.
I look forward to attending Mar 9 in DC
xFIP looks at FB, BB, IBB, HP and SO per IP. Each FB is multiplied by the league average rate of HRs, so it's only the FB rate that really matters. Numbers of batters per IP varies from pitcher to pitcher.
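For reference, a minimal sketch of the xFIP calculation in its commonly published form, using the FIP weights of 13, 3 and 2. The league HR/FB rate, the additive constant, and the choice to subtract IBB from BB are assumptions for illustration; they vary by season and by source.

```python
def xfip(fb, bb, ibb, hbp, so, ip, lg_hr_per_fb=0.105, constant=3.20):
    """Expected FIP: actual home runs are replaced by fly balls times the
    league home-run-per-fly-ball rate.

    lg_hr_per_fb and constant are placeholder values (assumptions), not
    figures for any particular season.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    expected_hr = fb * lg_hr_per_fb
    unintentional_walks = bb - ibb  # assumption: IBB excluded from the walk term
    return (13 * expected_hr + 3 * (unintentional_walks + hbp) - 2 * so) / ip + constant


# Hypothetical pitcher line, invented for the example.
print(xfip(fb=200, bb=60, ibb=5, hbp=5, so=180, ip=200))
```

Because every fly ball is multiplied by the same league rate, two pitchers with identical FB, BB, HP and SO totals get the same xFIP no matter how many homers each actually allowed.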
SIERA looks at BB, SO and GB-FB-PU per PA. As the batted ball terms are divided by PA and not the total number of batted balls, their percent is affected by the number of BB & SO. Perhaps it can add HP and IBB, I'd have to reread the articles to see if Matt & Eric already considered and then decided to exclude them as insignificant.
They are about equally accurate at predicting a pitcher's ERA next year, with a slight edge to SIERA that widens in y2 and y3.
I have not yet had the time to compare this to my method of first generating a projection based on hits, homers, walks, strikeouts, etc, then estimating the pitcher's ERA from his projected wOBA allowed. In this method, BB and SO are regressed somewhat, HRs more, and base hits heavily.
To be fair in comparison, the tests of different formulas have to be conducted on the same data. Run SIERA on my data, or run my formula on the SIERA raw data set that Eric provided. My projections don't have the batted ball data in the final report. I'll either have to carry them through to the end, or run the projections based only on the SIERA raw data (which is what I'm leaning to.)
But right now I have to finish coding playing time estimates.
That looking at how many walks, strikeouts and groundballs a pitcher got last year will tell you more about his ERA next year than his ERA last year.
Some simple things broadcasters can do short of quoting new fangled acronyms. Starting with -
1. Stop equating batting average with hitting. Do not say anything along the lines of "He led the league in hitting last year" - if he led in batting average, just say he led in batting average. Don't infer anything beyond that.
2. Ignore anything based on small sample sizes. "Normally it's good to bring in a lefty to face a lefty, but this year Joe Blow is 'hitting' .333 against lefties" 2 for 6? Doesn't mean squat. Give the career totals, or last three years.
Eric provided me with the raw pitching data, and I ran my own tests which I posted at 'The Book' blog.
I confirmed that SIERA gave the lowest total errors in y1 (next year), although not by that much ahead of xFIP and FIP. I also looked at y2 and y3, figuring if this was really measuring the persistent skills it should still be good two and three years out, and again SIERA was slightly ahead.
My son has a 42" HDTV as the primary monitor for his desktop PC, sharing it with his XBox and Blu-Ray.
They didn't have these things when I was young!
I agree with nahtnJM - As you have the play by play, it is not very difficult to do park factors as a weighted mean, counting how many batters each pitcher faced in each park.
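A rough sketch of that weighted mean, assuming the play by play has already been reduced to a count of batters faced per park for each pitcher. The function name, park labels and factor values are all made up for the example.

```python
def weighted_park_factor(batters_faced_by_park, park_factors):
    """Park factor for one pitcher, weighted by batters faced in each park.

    batters_faced_by_park: dict of park -> batters faced there
    park_factors: dict of park -> run factor (1.00 = neutral)
    """
    total_bf = sum(batters_faced_by_park.values())
    if total_bf == 0:
        raise ValueError("pitcher faced no batters")
    weighted_sum = sum(
        bf * park_factors[park] for park, bf in batters_faced_by_park.items()
    )
    return weighted_sum / total_bf


# Hypothetical factors and workload, not real park data.
factors = {"home": 0.96, "road_hitters_park": 1.12}
workload = {"home": 300, "road_hitters_park": 100}
print(weighted_park_factor(workload, factors))
```

The point of the weighting is that a pitcher who happened to draw extra starts in a hitter's park gets a different adjustment than a teammate with the same home park but a different road schedule.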
How about the Rangers? They probably already have the best starting 9 in the AL West, with the mlb.com depth chart showing Borbon in CF and Josh Hamilton in LF. By signing Damon they could put Hamilton back in CF, possibly pick up 3 WAR in Damon over Borbon, in a bid to clinch a division title.
'real' data sets? Gameday comes from mlb.com.
The only hit f/x data released to the public was for April 2009. We are hoping for more.
Peter Jensen has studied Gameday's ball locations, compared them to BIS and Stats, and although Gameday is slightly less accurate than those, it's close, and all have an error of around 6 feet in x or y in the outfield.
A year ago I published park factors studies that showed recorded line drive rates varied +/- 20% from park to park in the majors. The 'hard' or 'soft' designation had an even larger variance, and I have chosen not to use those. So yes, it's difficult to tell how hard the ball was hit.
It was said that the PECOTA spreadsheet consisted of players expected to play in MLB in 2010, so it appears to be based on each team's depth chart. Therefore, players not on any team's depth chart (not expected to play in MLB or currently unsigned) were not included in the report.
I just re-downloaded, and Ortiz is +12 at 1B, not 3B.
However, the +12 isn't realistic. I suspect this comes from a small sample size (as he mostly DHed) projected over 150 games. First I would Marcel his last three seasons, then do runs as a rate (runs per chance) and regress it to league average before extrapolating over a season's worth of chances.
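The steps above can be sketched as follows. The 5/4/3 season weights are the ones Marcel uses for hitters and are only assumed to carry over to fielding; the ballast of 1500 league-average chances and the sample seasons are invented for illustration.

```python
def project_fielding_runs(seasons, full_season_chances,
                          weights=(5, 4, 3), ballast_chances=1500):
    """Project fielding runs above average over a full season of chances.

    seasons: list of (runs_above_average, chances), most recent season first.
    The weighted rate is regressed toward league average (0 runs per chance)
    by adding ballast_chances of average performance to the denominator.
    weights and ballast_chances are assumptions, not tuned values.
    """
    weighted_runs = sum(w * runs for w, (runs, _) in zip(weights, seasons))
    weighted_chances = sum(w * chances for w, (_, chances) in zip(weights, seasons))
    regressed_rate = weighted_runs / (weighted_chances + ballast_chances)
    return regressed_rate * full_season_chances


# Hypothetical part-time first baseman: few chances in each of three seasons.
small_sample = [(2.0, 50), (1.0, 40), (0.0, 30)]
print(project_fielding_runs(small_sample, full_season_chances=400))
print(project_fielding_runs(small_sample, 400, ballast_chances=0))  # no regression
```

With these made-up inputs the unregressed rate extrapolates to about +11 runs over 400 chances, while the regressed version lands under +3, which is the kind of shrinkage a small sample should get.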
and Lewis had an excellent 2007 in the PCL before going to Japan
Positioning is a key part of fielding. If a fielder is better than others in selecting an optimal starting position, then he should get a better rating.
Every once in a while when I click on BP, I get a front page loaded from some past season, but a click on 'refresh' brings me back to the present day. (One time all the Puck Prospectus blurbs were in Finnish!).
So when I went to mlb.com's transactions and saw the Rockies had signed Paul LoDuca and Jay Payton, I kept checking the dates on the page to make sure I wasn't in some similar temporal anomaly. But no, it was indeed 2010.
You look at how the player has performed so far, in which leagues, at which ages, and how did everyone else in the past develop. You try to keep the sample sizes large, and regress to the mean to mitigate when they're not large. The further into the future you look, the larger the uncertainty.
I wondered if players who already had power by age 19 (Stanton, Heyward) might not grow as much, already being closer to their peak. I grouped hitters by low, medium and high HR%, but there were no significant differences in the aging curves for HRs.
But, I think there's a very good chance that Freeman could hit for average and power, an average amount of walks, while making very good contact. He should be an above average MLB first baseman, but we have to see how he does in 2010 and then recalculate. And Heyward looks like an excellent chance to be a star, one of the five best hitters in the minor leagues right now.
Freeman and Heyward are both guys who at age 19 don't project so well in their current season, but still have at least five years of expected growth.
2009 277/335/458 344
2014 309/367/561 394
2009 296/363/503 373
2014 327/397/610 424
Freeman now is meh as a 1b, but could be a Top 5 at the position within 5 years.
Heyward would be well above average now, and in the future could put Matt Wieters to shame.
Sorry, it's so easy to forget Bobby Crosby. A SS who can't hit so he's moved to 1B?
Both Pearce and Clement are now barely above replacement level for 1B. Pearce might stick as the good glove backup who rests Jones against LHP, leaving Clement unemployed. I think Delwyn has enough versatility to stick, but Church basically replaces Moss on the roster, who's not going to get any better.
My say: Moss and Clement don't go north, and Vazquez or Young get cut when Alvarez gets his mid-season promotion.
Something that's not being measured but may have an important influence on the number of steal attempts is the run environment. How many runs can the team be expected to score with any steals? What is the skill level of the current batter at driving in runs?
Steals are best leveraged when you have hitters coming up who are good at driving in runners from scoring position, but not so good at driving them in from first - high BA, low ISO.
I did an article for the Idol competition that showed how league rates of steal attempts varied inversely with the overall performance of the league's batters.
Just another thing to complicate your model!
I'm with Colin and MGL here. By only looking at players with 10 year careers, I can't see how you are not biasing the sample. We could also be confusing cause and effect. Maybe these guys had long and successful careers because they peaked later than normal.
I don't think using MLEs for minor league data will work. My MLEs already have a fair amount of aging built in, and when I found that comparing each year's projection with the next year's actual performance worked well to zero out any mean errors in the projections, it was worthless as an aging curve.
It was a StatSpeak thing back in the day. I got to do them once a week with Colin, Eric & Russell.
I had a discussion on this with Matt back at Nats Park in September, but I'd been too busy coding Oliver to get into the research. I've got more free time now, and it is high on my list.
After looking at the hit f/x data for April '09 that was released, I visualized that BABIP likely relates to the vertical angle of the ball, or even better the velocity off the bat times the cosine of the vertical angle, giving the horizontal velocity, which determines the reaction time for the fielder.
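The horizontal component is simple trigonometry. A small sketch, with the function name and sample values chosen only for illustration:

```python
import math


def horizontal_speed(speed_off_bat, vertical_angle_deg):
    """Horizontal component of batted ball speed.

    speed_off_bat: speed off the bat in any unit (mph here)
    vertical_angle_deg: launch angle above the horizontal, in degrees
    """
    return speed_off_bat * math.cos(math.radians(vertical_angle_deg))


# The same 100 mph contact at three illustrative vertical angles.
for angle in (5, 25, 60):
    print(angle, round(horizontal_speed(100, angle), 1))
```

A ball hit at a low angle keeps nearly all of its speed in the horizontal direction, while a popup at 60 degrees keeps only half, so fielders have far longer to get under it.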
A cursory review of stats suggests that players with more GB & LD as opposed to FB and especially PU have higher BABIP. Those with higher mean vertical angles, who tend towards more FB & PU may have more HR, but a lower BABIP. This was a Derek Jeter vs Andruw Jones article a couple months ago which got me thinking about the subject.
You might try a regression to get coefficients for GB, LD, FB or PU, but I am going to look at comparable players - what is the composite BABIP of the x number of players most similar to a given player in their pct of GB, LD, FB & PU. That can be used as an expected value to regress the historical data towards.
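One way the comparable-player idea could be sketched, assuming plain Euclidean distance on the four batted ball percentages and a composite formed by pooling hits and balls in play. Both choices are assumptions, and the player records are invented.

```python
import math


def comp_babip(target_mix, players, k=3):
    """Composite BABIP of the k players closest to target_mix.

    target_mix: (GB%, LD%, FB%, PU%) for the player of interest
    players: list of dicts with "mix" (same 4-tuple), "hits", and "bip"
    Similarity is Euclidean distance on the mix; the composite pools
    hits and balls in play across the selected comps.
    """
    ranked = sorted(players, key=lambda p: math.dist(target_mix, p["mix"]))
    comps = ranked[:k]
    total_bip = sum(p["bip"] for p in comps)
    if total_bip == 0:
        raise ValueError("comps have no balls in play")
    return sum(p["hits"] for p in comps) / total_bip


# Hypothetical pool of players, not real data.
pool = [
    {"name": "A", "mix": (0.50, 0.20, 0.25, 0.05), "hits": 90, "bip": 300},
    {"name": "B", "mix": (0.48, 0.19, 0.26, 0.07), "hits": 120, "bip": 400},
    {"name": "C", "mix": (0.35, 0.17, 0.36, 0.12), "hits": 80, "bip": 300},
]
print(comp_babip((0.49, 0.20, 0.25, 0.06), pool, k=2))
```

Pooling rather than averaging each comp's BABIP gives players with more balls in play more say in the expected value.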
not by my children
So I can be both a geek and a nerd?
I'm somewhat skeptical of Ellsbury's low ratings. I use Gameday data, and my calculations had him much worse in 2009 than in 08, but going from good to avg, while at the same time I had Bay rise from bad to average. Gameday reports who retrieved the ball, and I have to use that as a proxy for who had the best chance to catch it. With Bay and Ellsbury playing adjacent and going in opposite directions, it makes me wonder if there were balls hit between them that might have really been Bay's plays to make, but Ellsbury got charged because he threw the ball back in. I would have to break down the numbers by vector to see if the performance shift was between the fielders or evenly distributed. UZR uses BIS and/or Stats, not Gameday, so this may not be an issue with their numbers, but I still suspect and want to investigate a split zone assignment issue.
Thanks for the mention.
I admit it's an unusual emotion to see all of my ex-colleagues from StatSpeak not only working here, but pretty much running the stats section.
As for myself, immediately after the Idol competition ended I was contacted by another prominent site about running their stats, and I have been coding a greatly expanded Oliver (batting, pitching, fielding, baserunning, etc) ever since. It's about 98% done, so hopefully the public rollout will be soon.
There are basically three things I've been trying to accomplish - contribute to the body of knowledge, become famous, and make some money! (ranked from easiest to hardest) I may be a bigger fish in a smaller pond by not being at BP, but I think it's a good deal for me and I believe I will provide content that baseball fandom can appreciate and learn from. Most every article I've written, from SeamHeads to StatSpeak to FanGraphs to BPro describes a concept or process that will be part of the new Oliver.
I like articles that make me think.
I would want to mark a manager lower who brings out his 5th best reliever to pitch a tied ninth because the closer only pitches in save situations.
I do think you are capturing managers who probably think right/left matchups are very important when using their setup men in the 7th and 8th, but then ignore matchups in the 9th when the closer is pitching.
1. Are you using end of the year stats to judge in-season decisions? It might not be fair to analyze a bullpen move in May based on the pitcher's performance in August and September.
2. Are you using each single season's worth of stats? One season, especially for relievers, especially looking at splits, makes for small sample sizes.
To try to solve both of these, you could try an in-season projection, looking at the reliever's 'true talent' estimate at that date (that's what the manager knows then) based on what the pitcher had done so far that season plus diminishing weights on the previous three seasons.
As to Richard's comment, my son and I (rabid Steelers fans) had immediate replies to that dumb statement - that with a 6-7 record going into the game, how about that all seven losses were by 7 or fewer points, three were lost in the last 15 seconds of the game, and two more in overtime. And how about 13 games being a small sample.
Obviously, the won-loss record is below their 'true talent level', caused by things like turnovers and fourth quarter defensive meltdowns. It's a small sample of games, but unfortunately they are in the book. Like Ken says, you just have to look a little deeper to understand 'why?'
I would gladly pay for an option where I could choose to listen to the radio commentators of either team integrated into the TV broadcast, rather than Joe Buck, Cris Collinsworth, etc.
Oliver projecting for 2010,
Wolf +13 PRAA, Hawkins +5
Looper -7, DiFelice +10
Gain of 15
While I was comfortable accepting Escobar’s .310/.338/.378 line as a reasonable guess for 2010
Huh? PECOTA'S weighted mean for Escobar is 263/301/350. Oliver says 277/304/353. That's -20 or -25 compared to the average MLB shortstop. PECOTA has Escobar +8 FRAA for 2010, Oliver +4.
And +25 FRAA for Gomez? His PECOTA page has +11, Oliver +3, UZR +7. He's good, but nowhere that good.
Overall, my projections see a 58 run decline going from Hardy/Cameron to Escobar/Gomez. That's a tall order for Wolf and Hawkins to make up.
I project both Escobar and Gomez to have the same .287 wOBA in the batter's box. Hopefully the Brewers will avoid the temptation to put two sub .310 OBP in the 1-2 spots because "they are fast and make contact", and instead hide their bats at the bottom of the order. Gomez did hit mostly 8th for the Twins in 2009, while in 2008 he had 90 starts in #1 and 54 in #9. Escobar hit mostly #2 and sometimes #3 in the minors, but only had one start in the top of the order for the Brewers, the other 32 at 7-8-9.
Unfortunately, you can only put one player at a time in the #9 spot in the lineup. How about Yovani Gallardo 7th, Alcides Escobar 8th and Carlos Gomez 9th?
While I was comfortable accepting Escobar’s .310/.338/.378 line as a reasonable guess for 2010
with the rates I projected for Kennedy, he'd have to pitch 230 innings to allow 20 HRs and 100 walks. If we go with 162 IP, it's 14 HR, 68 BB, 140 SO. Kennedy has been stingy on HRs, as he only allowed 1 in 53 IP in 2009. I project him at about 70-75% of mlb rate.
4.24 ERA is based on a projected wOBA allowed of .323 (mlb average is .332)
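The counting totals follow directly from per-nine rates. A quick sketch using Kennedy's projected rates of 0.8 HR, 3.8 BB and 7.8 SO per nine innings; the function name is made up for the example.

```python
def per_nine_to_totals(rates_per_nine, innings):
    """Convert per-nine-inning rates into expected totals over `innings`."""
    return {stat: rate * innings / 9 for stat, rate in rates_per_nine.items()}


kennedy = {"HR": 0.8, "BB": 3.8, "SO": 7.8}
for ip in (162, 230):
    totals = per_nine_to_totals(kennedy, ip)
    print(ip, {stat: round(value) for stat, value in totals.items()})
```

At 162 IP this rounds to 14 HR, 68 BB and 140 SO, and at 230 IP it reaches 20 HR and 97 BB, close to the 20 and 100 in question.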
"you can get guys like Kennedy in the minor-league free-agent market, or by sifting through the non-tenders, or by waiting until February for the last guys hanging around waiting to be signed for peanuts"
Yes, Kennedy had a historic meltdown in 9 starts at the beginning of 2008, and missed most of 2009, but he's a guy who put up 300 quality innings at USC over 3 seasons, then tore thru the minor leagues, with a career milb ERA of 1.96 in 248 IP. In 8 minor league starts to end 2009 (4 in the IL, 4 in the AFL) Kennedy allowed 1 HR, 12 bb 53 K in 51 IP. All this is ignored because of 50 innings in the spring of 2008?
Here's how I have the four pitchers projected:
Pitcher     ERA   WHIP  HR9  BB9   SO9
Scherzer    3.79  1.28  0.9  3.6   9.4
Schlereth   4.06  1.42  0.7  5.3  10.1
Kennedy     4.24  1.38  0.8  3.8   7.8
Jackson     4.56  1.17  1.1  3.6   6.5
Reminds me of way back in 1973, before anyone had heard of computers, but during a multi-year stretch when I scored every Pirates game, then spent the winter (classtime) divining some kind of stats from them.
In that year of 1973, Willie Stargell hit 30 come from behind homeruns out of 44. 30 times his homer came with the Pirates trailing, and either tied the score or put the Pirates ahead. And Pete Rose won the MVP.
You're referencing how many plays Bay made, but not how many he did not make as well. It's like saying that someone has 200 hits - if he had 500 at bats it's really impressive, but if he had 700 not nearly so. The problem is 'plays not made' is not always easy to determine, which leads to probabilistic models and thus variations between different models.
Looking at my Oliver projections list of park-adjusted pitching, leaders in runs saved for 2009 only
1. Lincecum    63.7    1. Greinke    58.9
2. Haren       52.5    2. Sabathia   35.0
3. Vazquez     50.6    3. Verlander  34.6
4. Carpenter   49.7    4. Halladay   32.9
5. Jimenez     48.5
6. Wainwright  42.3
Year  wOBA  HR%   BB%   SO%
2007  .280  .028  .101  .266
2008  .272  .021  .093  .303
2009  .251  .020  .076  .303
I'm sorry Matt, upon re-reading the first paragraph you certainly do call his 2008 lucky.
Richard, I don't have the positions at this time, but soon I will improve my code which determines that.
While pointing out that Hamels may have been 'unlucky' in 2009, I see no mention of how he may have been quite 'lucky' in the previous season, setting up a perceived decline of talent because of the loss of .057 on his babip.
Adjusted for park, but not for defense, Hamels allowed babips of .295 in 2006 and .282 in 2007, for a two year average of .290. In 2008, his babip allowed dropped even further to .261. We might detect a trend (two consecutive years of decrease) but we know that this is a stat that is not as well controlled by the pitcher himself. In 2009 Hamels allowed a .318 babip, severely bucking the perceived trend - but the two year average of 2008-2009 is .290, exactly the same as the two year average of 2006-2007. Larger sample sizes wash out any trends. Using a three year weighted average, Hamels' babip projection was .294, .289, .281, .293 the last four seasons, while his wOBA allowed projections were .309, .303, .296, .303 - very consistent.
Breaking down Hamels balls in play
Year  GB%   LD%   FB%   PU%   IFH%  GBH%  LDH%  FBH%  HR%   BABIP
2006  .425  .188  .318  .075  .089  .163  .730  .151  .109  .298
2007  .444  .160  .300  .096  .068  .169  .713  .187  .105  .282
2008  .412  .190  .303  .095  .062  .117  .672  .181  .086  .262
2009  .439  .194  .266  .102  .113  .156  .726  .225  .087  .322
MLB   .461  .194  .264  .080  .078  .175  .728  .174  .078  .299
2009 compared to previous 3 +9 +3 +4 +7 -3
2009 compared to 2008 +11 +10 +6 +6 0
Although Olivo is among the worst at blocking pitches, he does throw well, projected at +6 SB runs per year
You used Luis Castillo as an example, and this metric charges Castillo with errors for taking called strikes. However, Castillo has just about the best contact rate on swings, both at avoiding swings and misses and at putting the ball in play. I'm sure in his mind, he can afford to take a strike or even two, because with virtually every swing making contact he knows he won't strike out, even with two strikes. Therefore, the 'cost' of a called strike is very low for Castillo.
Guys like Chipper and Berkman swing and miss much more often, and thus can't afford to give the pitcher a strike. They have to swing more often to maximize the probability of putting the ball in play.
Try looking at results passing thru two strike counts - Castillo should have a much lower SO% starting at two strikes than do the lower contact guys.
Some players also vary their approach based on the count. My eyes tell me that Freddy Sanchez is reasonably disciplined with 0 or 1 strikes - he takes pitches out of the zone at a fairly normal rate, and probably swings at a higher than normal rate on those in zone. But, with 2 strikes he will swing at anything within 3 feet of the plate.
When I saw "Hillerich & Bradsby, the makers of Louisville Slugger brand bats, was found liable in the death of an American Legion player" my first reaction was that a player was hit with a wood splinter and bled to death. Now I read that a pitcher was hit with a batted ball struck with an aluminum bat. His mother thinks all bats should be made out of wood. Hasn't she seen all the pieces of wooden bats flying around the infield? The article reported that on average a pitcher has 400 milliseconds to react, but that this pitcher had 378. Did they have pitch f/x at this American Legion game? How can they claim 22 milliseconds fewer than average? They can assume from tests what the means are, but they would have no idea in this particular case whether it was faster or slower than normal. What they know is that the pitcher was killed.
Batted balls are dangerous, as are pitched ones. It's not H&B's fault, or that they didn't print warning labels. I recently read about a woman awarded over a million dollars after she put her Winnebago on cruise control and went to the back to make a sandwich. The manufacturer didn't put in the manual that the vehicle needed a driver when in motion!
A couple years ago a minor league first base coach was killed with a batted ball which struck him in the neck. Now all professional base coaches must wear helmets to avoid similar injury - only problem is the helmet does not protect the neck, which is where that coach was hit.
If you are studying overwork of young pitchers, why limit it only to those who appeared in the playoffs? I realize the post-season adds a workload that often is overlooked when examining the stats, but I'm sure there are other pitchers on non playoff teams who got ridden hard through 162 games, and would help in the understanding of this issue.
My first thought is that the veteran also has a distribution of projected performance, just that most of the time the uncertainty of the vet will be less than that of the rookie.
You really think Ohlendorf will still be here?
some pics I took
So sorry for misattributing the quotes - you and Neal were in the same direction from me, I probably heard the words more than I saw who said them. It makes sense that you would be funnier than the GM! (and I know you were spreading out the questions, not complaining).
Let's see what I can recall.
Will Carroll introducing Pizza Cutter to the crowd "is there something I can call you?" "well, my mother calls me Russell"
A loud crashing noise, almost like a dropped barbell, rocks the office building, Neal Huntington says "Pujols?"
Jinaz says something about the Pirates' losing record, and Neal Huntington says "this coming from a guy with a Reds cap on?"
then Shawn Hoffman to Huntington "you know we're not going to kiss your ass until you have a winning team!".
In response to a question from Will, Dan Fox says the metrics used by the team range from slightly to a lot better than those available in the public domain. After the Q&A, I ask Dan, "Is that because you can assimilate everything we do, but we never see what you do?" "Yep!"
Dan confirmed that teams don't care to share with each other either.
I've known Eric Seidman online for a long time, back to our days at StatSpeak together, and it was quite a treat to talk to him in person, along with our former colleague Pizza Cutter.
Thanks to Shawn, a true yinzer at heart, for setting up the event, and Will for running the show (even though when Neal was looking straight at me, Will wouldn't let me ask a second question).
Pizza Cutter has confirmed he will also be in attendance!
"But the scale doesn't exist to measure major league players, who are all above average baseball players, but minor leaguers"
That's why I think everyone should be rated on a major league scale. Why should I care if a guy is an above average Rookie League shortstop, I want to know how he compares to major leaguers.
Will, I understand that we have to go by what they setup, and people don't get in free...I thought that since I received two tickets for two events, it might be possible to order and pay for one ticket (the event), if you have already paid for the other (the game). But this is my first time attending one of these things, so I could very well be wrong. I just thought it was worth asking.
Will - when I got my tickets in the mail, I had two separate tickets, one for the game, and another for the event. If someone is a season ticket owner, it would seem to cut them out of being able to attend the event without paying for a seat they don't need. I guess Mickey could ask Kristen in the ticket office.
I say Alvarez is called up June 1. No need to start his arb and FA clocks too soon.
of the eight who didn't win the MVP, five were Ted Williams and two played in Coors Field?
Schaum is quoted as saying that Braun and Longoria adjusted, but Gordon didn't. I will concede that my projections of college stats need to be broken down by conference to account for strength of schedule (working on it), but the ones I have are still fairly good.
On just college stats, first line is projection, second is mlb record through 2008
Player    BA   OB   SA   BABIP  HR   BB   SO
Braun     274  332  480  326    062  074  239  proj
Braun     301  350  587  328    086  062  209  actual
Longoria  267  328  421  317    042  077  210  proj
Longoria  273  343  532  309    081  091  238  actual
Gordon    258  334  459  297    053  091  205  proj
Gordon    252  330  419  305    039  091  219  actual
Actually, you don't spell it, you just say it, but there are regional differences. In Johnstown we say 'yunz', in Indiana Co, where my mother grew up, it's 'you-ins'.
I have my ticket, and will see yunz there.
I'm assuming we are all sitting together - my ticket is in section 142, the right field bleachers. We could be fighting each other for a Garrett Jones homer.
I apologize for misreading my own formula - I did not see a popup term and it made me think it was not being considered - but PU%*0 = 0 so it can be left out, as long as things are defined so that GB+FB+LD = 1-PU. So strike that comment from my original post, but I do believe that fly ball hits and ground ball hits are separate, repeatable skills that may be able to be estimated by other measures such as isolated power or line drive rate - ways to see which batters hit the ball on average more sharply than others - the research continues.
In the last table, looking at batter performance, I assume that the PAs are the same as the previous table, for number of throws.
First, even the 3190 PAs with 2 throws is probably a questionable sample size - the 1219 for 3 throws most likely is, so 3+ or even 2+ should have been the last line.
Second, these may not be the same batters. It's possible the number two batter is more likely to get more multiple throws, with the leadoff batter on base, than the number four batter with the number three guy on base. You would need to control for batter ID, comparing how each did with various numbers of throws, and then seeing the changes for the group.
Overall, interesting and thought provoking study.
I use Gameday data, and their classifications of grounder, liner, fly or popup. I divide each by the sum of all four, so that the four groups percentages add to 1.00. This would be the same as BP's POP%. I describe foul pops, which are a subset of popups, as a percent of all popups, which is helpful for park factor calculations.
For HRs, I mainly use balls contacted or alternately outfield flies (FB+LD) as the denominator. BTW, there have been eight groundball homeruns in professional baseball in the last four seasons.
Thing is, I am losing faith in a BABIP estimator, as there are a range of hit values between players for grounders, flies and even liners (63% LDH% for Jason Kendall, 83% for Delmon Young). I'm working towards using weighted historical data for each player in each of the categories. Isolated power might be able to be used as a regression value for hits on grounders, flies and liners.
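The four-way split described above (each Gameday class divided by the sum of all four, with foul pops reported as a share of popups) can be sketched roughly as follows. This is only an illustration: the function name and the counts are made up, not taken from Gameday data.

```python
def batted_ball_shares(grounders, liners, flies, popups, foul_pops=0):
    """Return each batted-ball class as a share of all four classes.

    All counts are hypothetical inputs; foul_pops is a subset of popups.
    """
    total = grounders + liners + flies + popups
    if total <= 0:
        raise ValueError("need at least one batted ball")
    if foul_pops > popups:
        raise ValueError("foul pops cannot exceed popups")
    shares = {
        "GB%": grounders / total,
        "LD%": liners / total,
        "FB%": flies / total,
        "PU%": popups / total,
    }
    # Foul pops are a share of popups, not of all batted balls.
    shares["FoulPU%"] = foul_pops / popups if popups else 0.0
    return shares


# Made-up season line: 200 grounders, 90 liners, 150 flies, 60 popups (15 foul).
shares = batted_ball_shares(200, 90, 150, 60, foul_pops=15)
for name, value in shares.items():
    print(f"{name:9s}{value:.3f}")
```

Because the denominator is the sum of the four classes, GB% + LD% + FB% + PU% always comes to 1.00, which is the property the comment relies on.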
A very big thank you for plugging my article - however, I am doing further research on the BABIP formula you referenced.
First, popups should be broken out as they are outs 99% of the time. Popup rates (from Gameday data 2006-2009) range from Eric Byrnes at 16.9% to Derek Jeter at 1.5%. This has a high correlation to BABIP, better than LD%.
Second, I've split ground balls into those that stay in the infield, and those that go to the outfield as hits.
Fast players who hit the ball to the left side, especially left handed batters, get the most infield hits. Not including bunt attempts, these range from Ichiro at 17.3% to Sean Casey's 2.5%.
Guys who can hit the ball more sharply get more ground balls to the outfield, ranging from Mike Lamb at 24.6% (and Sean Casey 23.4) to Barry Bonds 10.8% (shifting the infield must do some good!). Besides left handed sluggers, the bottom of the list is also populated by guys like Alex Cintron, Angel Berroa and Jose Castillo.
Now to Mark Reynolds -
        mlb    <2009  2009
GB%     .460   .396   .410
PU%     .079   .099   .074
FB%     .265   .304   .283
LD%     .195   .200   .234
IFH%    .079   .102   .127
GBH%    .176   .217   .210
FBH%    .174   .221   .203
LDH%    .727   .794   .754
So all these top 15 guys get drafted on June 15th, never talk to the teams for 60 days, then do a deal in the last few hours. Meanwhile, they've lost a year of playing time.
Move the signing deadline up to June 30. Give them two weeks to negotiate and then get them on the playing field.
Switching to a football comparison...I was reading comments at steelerfury.com about Jon Gruden's announcing debut - mixed opinions, but people liked that he was willing to challenge Jaws on things like the 'Hines Ward Rule' and Michael Vick's signing with the Eagles.
The Steelers are mostly on CBS, which after nearly every play puts on-screen the cumulative stats for the RB or QB & receiver. I hate watching Fox, because they almost never show in game stats.
I guess then that Albert Pujols' batting average is more important than Luis Castillo's, because he gets more total bases on them. Oh wait, that's slugging average. These new stats are so confusing.
Sounds like the same reason I haven't worn a tie since 1991, I find it horribly uncomfortable having something wrapped around my neck. Beltre didn't think he could play baseball being uncomfortable somewhere else.
I read Beltre's comments where he said it was too uncomfortable to wear. He said the only time he did was when he was 17 and the Dodgers were fining him for not wearing one, but he was so insistent the team backed down and he's never worn one since. For Beltre, it is not a matter of money.
Carlos Gomez has no power, but a .346 obp isn't that bad - I'd rather have him than Cabrera (.312) in the #2 hole, giving the Twins two no power/better than avg obp guys in front of the three boppers. Then 6 thru 9 just make outs.
Richard - I love baseball, discovering new things, sharing with everyone, and hopefully making a little money. I have several options right now. I have been talking to Christina, and I hope BP is part of my future.
I'm a Pirates fan and I try to look at these trades objectively. I can say that trading LaRoche, Sanchez, Wilson, Morgan, Hinske, Snell, Gorzelanny and Grabow was not 'relinquishing good talent they cannot afford'. All of these players were no better than average, and were getting veteran's pay scales to be average. McLouth was the only player traded this season who was better than average for his position. Some of the immediate replacements are not as good, but it won't be hard to replace the production at a much lower price. Thing is, we as fans want better production than before, we want to actually cheer for a winning team. I believe the Pirates will quickly have a good starting rotation, but the offense may take a while, as I do not believe Milledge, Clement, Tabata or Gorkys Hernandez to be better than average for their positions either. Only Pedro Alvarez, if he can stay at 3b.
Roberto Clemente was from Puerto Rico, but fought the same racism when he played on the mainland. Yesterday I rewatched part of the documentary of the 1971 Pirates on FSP. In one part they discussed the 1 Sep 1971 game where the Pirates had a starting lineup of nine black players. The players themselves didn't realize it at first, as Al Oliver said in the interview "we always ran 5 or 6 brothers out there." That lineup included Roberto Clemente, Rennie Stennett, Manny Sanguillen and Jackie Hernandez, all of whom, because of their skin color, and not their national origin, would not have been permitted to play in the majors before Jackie Robinson.
Sanchez is a 'judy' hitter with no patience at the plate.
The only part of Frandsen's mlb batting stats that is out of line with his Triple-A record is his BABIP. Only 35 year old catchers who hit lots of fly balls maintain a .245 BABIP. If Frandsen gets it up to a mlb average .300, his batting average is in the .290's, and he's just about the same hitter as Sanchez.
I watch every Pirates game a day or two later from the mlb.tv archives. Some games have commercials between innings, some do not (lots of Geico geckoes). There is a 'Jump to Inning' button that can be used to skip over the break, but that is disabled during a commercial. They even cut the broadcast a second or two short to avoid quick-handed people jumping to the next half inning before the commercial comes on.
I just want to know when he was injured, which is fixed, and when he is expected back, which can be in flux.
Shouldn't the in-season projection be run the same as the pre-season, except that it's done now? Instead of doing two projections at different times and then blending the results, why not run the projection each day or each week and post the results?
The only issue is how to weight each season. What I was privately describing to you was a rolling or progressive weight, which as I understood your explanation of PECOTA should mathematically be the same. As we go from 0 to 100% of 2009, the weights for 2006, 2007 and 2008 should progressively decrease until the end of the current 2009 season, when 2006 will reach 0% and it effectively becomes the 2010 pre-season.
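One way to picture the rolling weights is the sketch below. It makes two assumptions that are mine and not PECOTA's: the pre-season weights are Marcel-style 5/4/3 for the three prior seasons, and each weight slides in a straight line toward its end-of-season value as the current season fills in. The function name is hypothetical.

```python
def rolling_weights(fraction_done, preseason=(5.0, 4.0, 3.0)):
    """Season weights as the current season goes from 0% to 100% complete.

    Returns (current, one_back, two_back, three_back), e.g. for 2009 that is
    (2009, 2008, 2007, 2006). Linear interpolation is an assumption made
    for illustration only.
    """
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0 and 1")
    start = (0.0,) + tuple(preseason)   # pre-season: current year has no weight
    end = tuple(preseason) + (0.0,)     # season over: oldest year drops out
    return tuple(s + fraction_done * (e - s) for s, e in zip(start, end))


for done in (0.0, 0.5, 1.0):
    print(done, rolling_weights(done))
```

At 0% the weights are the 2009 pre-season set, and at 100% they are the same 5/4/3 shifted one year forward, which is what the 2010 pre-season projection would use.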
I thought the current number was total amount of days expected to be out starting from the time he was injured. IIRC there's guys who have '60' that you say are coming back in a few days.
With DXL, I want to know "How much more time will he be out". If the injury changes and he will be out for more, bump up the number.
Now this makes me wonder why, over four seasons, PECOTA never projected an EQA below .275 when Young never produced one above .254
Neal Huntington came on the air during Wednesday afternoon's Pirate game to discuss the LaRoche trade and its aftermath. He spent a lot of time talking up Steve Pearce, and despite obvious SSS issues, highlighted how Pearce had listened to coaching and become a more patient and productive hitter since his demotion.
I have projected Pearce to be virtually identical to LaRoche in offensive production. Sadly, this makes him one of the best hitters on the Pirates, but in my opinion he has never been given a fair shot at playing time in order to establish himself. Garrett Jones was given the opportunity after Nyjer Morgan was traded, and has performed over any expectations. I would hope that Pearce would be given the first base job, put Jones in one of the outfield corners, and let Moss and Young split the other corner. At the end of the season, it will be clearer which of these four should continue with the team in 2010, and how Lastings Milledge fits into the mix.
Unless injured, Bobby Abreu will play for the Angels, and will probably produce at least $5m in value for his major league team. That pay is for only one year, and he can likely repeat that amount next year.
The 16 year old Dominican who may get a $5m bonus won't play in MLB for five or six years, if at all. That money is a once in a lifetime unless he establishes himself as a productive major league regular. Then, he can get that $5m every year, like Abreu.
Sometimes when I click 'reply' it still puts the post at the very bottom...the above was in reply to
"I was suggesting that the 107 for 400 (400 league average ABs) should be replaced with 400 ABs using his actual HR and K rates and a BABIP generated using his actual batted ball rates and multipliers."
But then you would be regressing Mauer's historical stats to a different set of Mauer's historical stats...and how do you know that the estimate of BA (or BABIP) from batted ball components is any less lucky than the weighted historical record?
I used Oliver as, after being tested vs other projections, I trust it the most for the lower minors.
Here's PECOTA on EY Jr
Year BA OB SA wOBA
2006 .271 .329 .368 .310
2007 .268 .325 .379 .311
2008 .263 .333 .370 .314
Again, very consistent, and just barely above what Oliver says (about 10 pts of BA, 5 of wOBA each year)
For my final day stump speech -
My main area of interest is player valuation - how to measure each player's batting, pitching, fielding, baserunning, etc. Part of that is being able to control for the ballparks and level of competition.
My thinking about these things over the years is largely grounded in two concepts - How good were the players I got to see play as amateurs, and can I design the best simulation game ever?
Much of what I write is about the process, giving the readers a peek into the black box of statistical procedures. I want to interact with other analysts to ensure that the methods are accurate and the best available, and to have the general fandom understand and trust the process.
Once the numbers are run, you have the players. Who's the best prospect? If two guys hit the same, can we separate them on their fielding and baserunning? Why does Derek Jeter have Gold Gloves? Which pitchers are toughest to steal on? etc, etc.
I have enjoyed the past year or so of writing on the internet, and especially the last few weeks here at BP. I know I still have many new things to learn, and I hope to be able to share them with you here.
I do periodically email an announcer for the Pirates. I have seriously considered a list of this sort "Please don't say these kinds of things.." although to this person's credit he's not the one making the most egregious comments.
I probably will do a letter about splits - lefty reliever brought in to face lefty batter, color guy says that although this is by the book, THIS YEAR it's not true for this batter and this pitcher - well that pitcher had killed lefty batters his whole career, but was backward this year in something like 50 batters faced, and the batter had all of 17 AB vs lhp this year to base the comment on. SMALL SAMPLE SIZING WARNING!!!! Please add phrases such as "so far this year" or just use two or three year samples...
We all pretty well know how good Freddy Sanchez is, and this is probably the last best chance to sell high with him.
I wanted to focus on maximizing the return, pointing out that Young's 'raw' stats look impressive enough, but not when translated.
I pointed out that I do not know Young's defense, and I do believe this is a necessary piece of information. I have minor league play by play data, and am working on the database code to extract defensive ratings. If I join BP, they likely have more tailor made data available, as Dan Fox did do a minor league Simple Fielding Runs in 2007.
Andrew McCutchen has an average bat for center field. He will add value through his speed and defense. Eric Young is a below average bat for second base. He obviously has speed, and a big I Don't Know about his defense. Even if Young is a gold-glover, how many of those types do you put in the lineup at one time? I think McCutchen's arrival was part of the reason for Nyjer Morgan's departure.
I've tried to switch from the professor at the blackboard to a couple of guys at a table in how I explain things. Wherever I've written I've had to learn what the audience expects.
I did a few articles at SeamHeads, but they don't seem to be available anymore :(
I'm also not sure about the StatSpeak archives, they may have lost some older articles when they switched software last year, and this link only seems to bring up the most recent article http://mvn.com/profile/Brian%20Cartwright
FanGraphs archives look good http://www.fangraphs.com/blogs/index.php?author=11
I also noticed that Maddux's strikeout rate (for 2001-2008 quoted above) was well above what would be expected given his str% and con% - which leads me to believe the guy was just smarter, able to cross up the batter with an unexpected pitch. Maddux also outperforms expectations on BABIP.
Both Perez and Snell like to get ahead with fastballs, then throw sliders low. Some hitters chase, some don't. Plate discipline does vary widely among hitters. There can be great frustration watching a pitcher get ahead 0-2 and then walk the batter. Whether these pitchers are effective or not can depend a great deal on who they are pitching against, and thus the inconsistency.
Going forward I am going to focus on using this as a tool to study batters and pitchers as they are, using str% and con% instead of bb% and so%, to see if it is a more accurate way to project minor leaguers.
I was introducing a new concept, looking for a good hook for the reader. Oliver Perez was only mentioned because I was looking for two opposite pitchers, who had average control but widely different contact rates, and he and Chang fit that description. Eric's article did a good job of comparing Perez to Daniel Cabrera, two pitchers whose results look similar, but when you dig down there are differences in the pitch data. I commented here that in general, pitchers with a low contact rate need to be in the strike zone, or they will be putting themselves on the edge, with wildly inconsistent results from small changes in strike percentage.
Yes, there's probably a relationship between str% and con%, as on average pitches in zone have a contact rate of .478, while out of zone of .299. However, my definition of strikes includes called strikes as they are an opportunity for the batter. Adding those to in-zone brings that contact rate down to .315, not very different from out of zone, so on average I'm not considering it. It does need to be looked at for individual players.
I deliberately made this metric simple enough to be derived from a box score, or even while keeping score of a game.
Without getting into all the physics of the flight of the baseball available in Pitchf/x (Enhanced Gameday), what it does offer for this type of analysis is pitch type (curve, fastball, slider, etc) and velocity. For example, Bob Walk has talked a lot this year on Pirates' broadcasts about pitching coach Joe Kerrigan's instruction of Zach Duke - how to use the inside fastball to 'speed up the bats', setting the hitters up for the off speed pitch. FanGraphs data has shown Duke's off-speed pitches to be much more effective than before.
Standard GameDay, in use in all of Triple-A and the Texas and Southern Leagues in Double-A, gives the location and outcome (called ball, swing and miss, etc) for each pitch. Whether standard from the minor leagues, or enhanced in the majors, I would like to classify pitches as in or out of the strike zone, the swing rates in and out of zone, and how these change as batters and pitchers advance to a different level of competition (Double-A, Triple-A, Majors).
Very true, although some pitchers nibble too much when they have good enough stuff. Pitchers with a low contact rate have to stay in the strike zone to avoid the walks.
My main interest will be in improving projections, seeing how these numbers change when a batter or pitcher is put in a new environment (which of course builds a better game engine)
That is the problem with Sanchez, he has to hit over .300 so that his doubles and singles combined with few homers or walks are productive. 2008 was a lost year with a lingering shoulder injury, but now may be the last chance to sell high. I would not expect a Matt LaPorta-like talent for Sanchez, but in exchange for helping another team in their pennant chase it would be nice to get someone who at least projects to be average.
8 with Indianapolis, 5 with the Pirates
I was at a game a couple weeks ago, KC @ PIT. Phil Cuzzi was behind the plate. Miguel Olivo checked his swing, and catcher Robinzon Diaz signalled to the 1b ump for an appeal. My eyes are on the 1b ump, with no call coming, when I see Olivo shouting at Cuzzi, and then getting ejected. Watching the replay, Cuzzi CALLED the pitch a strike, but only after the catcher had asked for an appeal. Cuzzi was so slow on the call that the batter, the catcher and most everyone in the stadium didn't see it.
On check swings, the Pirates' telecast uses a camera near the dugout looking at the batter. From that angle it is easy to see if the end of the bat swung past the line of vision, or was held back. The base ump would have a similar view, if he's paying attention. The announcers have claimed it's a coin toss on the appeal.
I only do the three hour commute once a week, from western Pa to the DC suburbs, but at least Matt is on a train and doesn't have to drive. I assume that gives him some laptop time.
Clement's bat is better than average for a catcher, but below average for first base. In his time in the majors, he has been horrible defensively at catcher, both at preventing wp and pb and also stolen bases.
Yeah, told you I didn't know who he was.
But Will Carroll liked it!
You know, I really didn't realize I said 'you know' that often. I will work on that. The 'ums' didn't surprise me.
Reading through the transcript, and seeing where I was dropping in the 'you knows', they were more in the answers where my mind was doing some other things (thinking, reading) while I was talking.
I'm familiar with Homer Bailey, but wanted to see some numbers on him. I have to confess I had no idea who Scott Thompson was (I've worked mainly with hitter projections so far) and started out by addressing Mike Ferrin's setup of the question. I thought I needed to get back to mentioning Thompson before I finished, but it might have worked better to just let it go. Politicians do that kind of answer all the time.
Platooning was a lot more common in the 1970's and 80's when most teams had a ten man pitching staff, leaving seven on the bench. The '79 Pirates used Nicosia vsL and Ott vsR at catcher, and Robinson vsL and Milner vsR in left field. It was a strict platoon on hand of the pitcher, everyone knew ahead of time which games they would start.
I did neglect to specifically mention that Marcel and projections in general use regression to the mean, based on the number of PAs in the sample.
If you run vsR and vsL separately, you are going to regress each of them, so twice for that player. Even though you would regress the overall batting line and also the split delta, the split only affects how much is allocated to vsR or vsL, not the overall total. The idea for this process comes from "The Book" that Eric references in the article, and we've had some discussions of it a few months ago at Inside The Book Blog.
In this method, you only need three pieces of data - the weighted mean of all batting, the weighted mean of the split delta, and the allocation of PAs between vsR and vsL to get the estimate against each hand.
Here's another way to look at it.
Marcel estimates a player's 'True Talent Level' by taking a weighted mean (5/4/3) of the last three seasons. Then apply the same weighting to the difference of how the hitter does vs rhp and vs lhp, producing the split delta.
Say a hitter has an overall .360 wOBA, a split delta of .060, and 2/3 of PA vs rhp, 1/3 vs lhp.
(2/3)*R + (1/3)*L = .360
R - L = .060
Solving that gives us that the player has a .380 wOBA vs rhp, .320 vs lhp.
Repeat for every hitter on the roster, and then find your best lineup vs right, and best vs left.
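The two equations above can be solved in closed form, so the per-hitter step is easy to script. A minimal sketch, with a made-up function name, using the example hitter's numbers (.360 overall, .060 delta, 2/3 of PA vs rhp):

```python
def split_estimates(overall, delta, share_vs_rhp):
    """Solve share*R + (1 - share)*L = overall and R - L = delta for (R, L).

    overall      : weighted-mean wOBA against all pitchers
    delta        : weighted-mean split, vs rhp minus vs lhp
    share_vs_rhp : fraction of the hitter's PA that come against rhp
    """
    share_vs_lhp = 1.0 - share_vs_rhp
    vs_rhp = overall + share_vs_lhp * delta
    vs_lhp = overall - share_vs_rhp * delta
    return vs_rhp, vs_lhp


vs_rhp, vs_lhp = split_estimates(0.360, 0.060, 2 / 3)
print(f"vs rhp {vs_rhp:.3f}, vs lhp {vs_lhp:.3f}")
```

Substituting back, the PA-weighted average of the two estimates returns the overall .360 and their difference returns the .060 delta, so the split only changes the allocation and never the total.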
We have nine days before the next deadline. How about we just do whatever we want, as we did in the auditions?
Despite Steve Pearce's 'letdown' the past two years after his monster 2007, I still project him as tied with Adam LaRoche as the best bat on the Pirates. The two are virtual mirrored clones, LaRoche left, Pearce right, in that they match so well in ba/ob/sa as well as hr%, bb% and so%. In the end it's sad though, as both are below the average of major league first basemen. In fact the only player on the major league roster who hits better than his position is Ryan Doumit at catcher. In the minors, only Pedro Alvarez, as long as he stays on third base.
When you are down two in the 6th, and on the road, playing for one run against the other team's ace isn't really a winning strategy either. It would be different if Ankiel was the tying run at 3rd with one out, but Carpenter made the 2nd out. When it came Carpenter's time to bat, the Cardinals only had 11 outs remaining, down two runs. It's hard for me to stomach just conceding that out. Mitchel Lichtman has often voiced his opinion that if trailing, the pitcher should never hit after the 5th.
The St Louis TV announcers did say in the 6th that perhaps it was better if Carpenter didn't reach base, as the last time that happened he had his worst inning afterwards. It is a subject for research, but is still speculation at this point.
On LaRue, I cut an earlier section that introduced him as Jason LaRue, but failed to then edit the second reference into what became the first
On the Adam LaRoche play I told Eric about, he nubbed a looper over the third baseman's head for a single. The description read "singles on a line drive to third base" - makes it sound like LaRoche knocked him over with a smash. I would have called it a 'soft fly ball'.
Here's a link to the second of two articles, which itself has a link back to the first.
In the California and Pacific Coast leagues you also get into altitude, which helps the ball travel further. Yes, the pitching isn't as good, but I was making a point about the parks. And the lesser hit balls do become homers in smaller parks, but if you graph how far guys hit the ball, Manny has more over the fence than he does just short. Lesser players have more of their balls short of the fence, and are more sensitive to the ballparks. Just saying he probably wouldn't be hitting 76 HRs...58 maybe!
Great question Richard, and something I had planned to look for. I also have the HITf/x data for April downloaded, but not yet plugged in to my database. I would look for balls that are hit with the same speed and launch angle, group by if they are caught or go for hits, and then see if there is any significant difference in the LD% for each bucket.
Well, the GameDay text descriptions of each play do include phrases such as 'grounder' 'soft grounder' or 'sharp grounder', which can help, but of course Eric and I might disagree on whether a ball was hit sharply or not. If we are told it was hit at 95 mph with a down angle of 5 degrees, then there's no longer any human interpretation needed to classify it.
Being serious for a moment, there's something to consider about home run park factors. If Manny hits the ball 400 feet, does it matter whether the fence is 335 or 400? It is going to be a homerun anywhere. The players who are affected the most on their homeruns are those who hit balls in that iffy zone, homeruns in some parks, in the park in others. Short story, different players are affected different amounts, and it's the lower HR hitters who get the biggest boost, while the big boppers like Manny are the least impacted, positively or negatively, by the ballpark.
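A toy illustration of that point, counting how many fly balls clear a fence at two different depths. The distances are invented, and the model ignores fence height, direction and weather, so treat it as a sketch of the idea and not as a park factor method.

```python
def homers(distances_ft, fence_ft):
    """Count fly balls that carry at least as far as the fence."""
    return sum(1 for d in distances_ft if d >= fence_ft)


# Invented fly-ball distances in feet, for illustration only.
slugger = [395, 405, 410, 415, 420, 430]
fringe = [330, 340, 350, 360, 370, 380]

for fence in (335, 400):
    print(f"fence {fence}: slugger {homers(slugger, fence)}, "
          f"fringe {homers(fringe, fence)}")
```

With these made-up numbers the slugger loses one homer going from the short fence to the deep one, while the fringe hitter loses all of his, which is the sensitivity the comment describes.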
this week's subject was deliberately delayed until today...find an angle from a game played on Thu, June 25, 2009
I thought it was going to talk about Michael Weiner, host of 'The Savage Nation'
those splits were for Pedro Alvarez, I did reply to the comment on his splits, but it got put at the bottom
vs LHP 104 PA 9 BB 32 SO
vs RHP 163 PA 23 BB 38 SO
was better his last year at Vanderbilt
vs LHP 74 PA 10 BB 11 SO
vs RHP 118 PA 16 BB 17 SO
I commented previously on how I would like to expand this research by looking at inning, outs and score to determine the importance of the steal, and runner, pitcher and catcher to determine the odds of success. The other item is run environment - you don't really know the value of the successful steal or caught stealing in a particular situation until you look at the pitcher and the next couple batters, and what park they are in. In the Astrodome, they knew runs were hard to come by, so the value of a steal was higher and the lost opportunity cost of the caught stealing was lower. There are probably very few times when it's good to run in Coors Field.
haha...thanks - I guess that's a compliment!
#1 starters would get 36-40, sometimes even 41 or 42 starts...they would go every 4 days when possible, not 4 games. 180 days/4 = 45. The starts would drop off quickly for many teams, with the #3 and #4 getting 20 some starts and also relieving. Today there's a more fixed rotation, so while 1 and 2 get fewer, 3, 4 and 5 get more.
Thanks. Unfortunately, with really only two days I couldn't do enough research for a longer article. I made my point with the numbers I was able to crunch, counted around 1100 words, and decided to stop there - I thought any additional words (without the benefit of more research) would just clutter up the readability of the article. This week the judges are deliberately giving us only 24 hours, but also put in a 900 word floor to avoid us being too brief.
The times on 1b counts how many different batters came up while the runner was there. For example, Ichiro leads off the inning with a single. The next three batters all fly out. That's three different batters where he was at 1b with an opportunity to steal. On some levels this is fine - there are different win expectancies as each out is made, which affects the decision on whether to attempt a steal. On the other hand, if he hypothetically steals with no one out, the second and third opportunities no longer exist. This complicates the programming of probabilities, but I think it can be accomplished.
Good point that not choosing your spots as wisely will lead to a higher CS%.
Unfortunately, this week's instructions were a day later than usual (Tues 1 pm) which pretty much gave two and a half days to research and write, then the first idea had to be abandoned because MySQL decided to run like a tortoise and eight hours later hadn't given any query results. Then on Thursday night I found out that Dan Turkenkopf had that day done a historical stolen base article at The Hardball Times.
Two main things should be considered when deciding to steal or not - how important is the run to winning the game, and what are the chances of being successful? I want to add a win expectancy table to my steals database. As Matt Swartz showed in his article this week, WE gives a weight on each event based on how important it is to winning the game, looking at inning, outs, bases occupied and score. Then look at the ability of each pitcher and catcher to prevent a successful steal. The combination of the two would give the expected win probability added with an attempt in that particular situation.
It would be a safe assumption that most steal attempts would be in the situations with the highest eWE (and this can be tested, looking for the average Att% at different levels across time). If the Mariners decide to run Ichiro at a 50% rate instead of 15%, we could look at the 50% of the times he was on 1st that offered the best eWE, and then sum the expected steals, caught stealing and pickoffs given those runner, pitcher and catcher matchups. Just not enough to do in a day or two, but it is now on my to-do list.
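The combination described above can be sketched as a small expected-value calculation. This is only a rough sketch, not the database implementation: the function names are made up for illustration, every win expectancy number below is hypothetical rather than taken from a real WE table, and pickoffs and throwing errors are ignored.

```python
def expected_wpa(p_success, we_now, we_if_safe, we_if_out):
    """Expected change in win expectancy from attempting a steal."""
    gain = we_if_safe - we_now   # change in WE after a successful steal
    loss = we_if_out - we_now    # change in WE after a caught stealing (negative)
    return p_success * gain + (1.0 - p_success) * loss


def break_even_rate(we_now, we_if_safe, we_if_out):
    """Success rate at which attempting and staying put are worth the same."""
    return (we_now - we_if_out) / (we_if_safe - we_if_out)


# Hypothetical situation; these three values are illustrative only.
we_now, we_if_safe, we_if_out = 0.550, 0.580, 0.490
print(round(expected_wpa(0.75, we_now, we_if_safe, we_if_out), 4))
print(round(break_even_rate(we_now, we_if_safe, we_if_out), 3))
```

In a sketch like this, the runner, pitcher and catcher matchup would feed `p_success`, while inning, outs, bases and score would feed the three WE values.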
Actually, in the Retro db it's START_BASES_CD. There is a field which gives the ID number of the event that allowed the runner to reach base. It's then possible to use the gameID and the EventID to link to the events table. EVENT_CD is a classification of play types, where there are separate codes for fielder's choice, hit by pitch, walks, singles and more. I'm not sure how much time it will take to process, but it sounds interesting, and it's not like I have a deadline on it now.
I did go through the play by play data in my RetroSheet database (1953-2008), extracting all plays with a runner on 1st only (BASE_OCCUPIED_CD=1) or 1st & 3rd (BASE_OCCUPIED_CD=5), storing the runner, batter and pitcher IDs. These are the opportunities. Also collected are whether there's a successful SB, a CS by the catcher, pickoff or pickoff CS by the pitcher or catcher, pickoff error by the pitcher or catcher, and overthrow errors by catchers.
Unfortunately, the way I have the data extracted I can't tell how they reached base. I can look at the database structure and see how easily I can add that info.
yes, 21 is likely faster than 22...speed might even peak before 21, small sample size problems...I didn't turn 22 until 4 months after I graduated
For most players, speed peaks at age 21 and then starts a steady decline.
After showing how much the elite batters' production was reduced by the elite pitching, I would have then repeated the process to show the delta of the pitchers vs all and pitchers vs elite hitters. I would guess that the average delta of the batters is just about the same as the average delta of the pitchers.
The practical effect of this is that the more elite a league is, in terms of both batting and pitching (shown here by subsets of data), the league total slash lines won't change that much - what will change is the variance of the performances. There will be fewer and fewer outlier performances, eventually converging to where all the batters and all the pitchers perform at league average, even though they are the ultimate elite.
What do you do? Try to get another inning or two out of your starter, at the likely cost of potential runs scored, or go for the big inning and go to the bullpen early?
It's the diversity of opinion on this question that makes it interesting, makes it strategy.
Last year I was listening to the Pirates at the Orioles. I came in about the third inning, missing the starting lineup. For the entire game, I had no idea when the Orioles flipped the lineup, who was the leadoff batter and who was batting ninth.
I agree, walking is complex, and then I watch my 13 month old granddaughter learn to do it simply by watching us. The human mind is a wonderful thing.
Somewhat related question - during the 2009 WBC I found http://www.beisbolcubano.cu, and was checking out league stats. Since then, it appears to be blocked. I can get a page with today's date from the Google cache, but not a direct link.
Just now, when checking the url, I see a different address http://beisbolcubano.cubasi.cu/ that I had not seen before, that has similar content.
I probably should have included a link in the original article, but I also don't want to look like I'm pimping myself with links to previous work on other sites
Oliver is a modified Marcel. Marcel, by Tom Tango, is designed for simplicity and uses only 3 years of data weighted 5/4/3, no minor leagues, no park factors and 150 PAs of league average performance. I built upon it by adding more years with similar weighting, minor leagues and park factors. This is a link to its introductory article http://statspeak.net/2008/08/turning-the-monkey-into-a-gorilla.html.
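The Marcel-style recipe described above (recent seasons weighted 5/4/3, plus 150 PAs of league average) can be sketched roughly as follows. This is not Marcel's or Oliver's actual code. The function name, the sample numbers, and in particular the choice to scale the weights so the newest season counts at full value are all assumptions made for illustration.

```python
def weighted_projection(seasons, league_rate, weights=(5, 4, 3), regress_pa=150):
    """Project a rate stat from recent seasons, listed most recent first.

    seasons: list of (events, plate_appearances) tuples.
    Assumption: weights are scaled so the newest season counts at full value,
    and regress_pa plate appearances of league-average play are added on top.
    """
    top = weights[0]
    numerator = regress_pa * league_rate   # league-average ballast
    denominator = float(regress_pa)
    for (events, pa), weight in zip(seasons, weights):
        scale = weight / top
        numerator += scale * events
        denominator += scale * pa
    return numerator / denominator


# Hypothetical on-base events and PA for three seasons, newest first.
sample = [(170, 500), (150, 450), (120, 400)]
print(round(weighted_projection(sample, league_rate=0.330), 4))
```

The league-average ballast matters most for small samples: with no seasons at all the function simply returns the league rate.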
Oliver is one of the projections hosted at FanGraphs, along with Tango's Marcel, Bill James, Dan Szymborski's ZiPS and Sean Smith's CHONE. I am working on an enhanced version, as I now have a complete set of minor league batting and pitching 1998-2008, college batting and pitching 2002-2008, and play by play for all minor league games 2005-2008.
"After normalizing for league and ballpark, McCutchen's yearly statistics..." The normalized stats are single season, but are adjusted for the ballparks played in and the level of pitching in that league.
"The projections use a three year weighted mean, which will smooth out year to year fluctuations..." Projections are multi-year averages of the normalized seasons, with more recent data given more weight, and then 150 PAs of league average performance added.
I was hoping to find some statistical story in McCutchen's numbers, but it just wasn't there, and there wasn't enough time to research a replacement. I was hoping to find something like him hitting flyballs while being a fast runner with below average power, but it turned out he did hit more grounders than average.
Despite being labeled the best prospect in the Pirates' system, ever since he's been an 18 year old in rookie ball, he's projected as a league average hitter. I used my own Oliver projections because I feel they best show a prospect's year to year progress across levels. As I studied in my audition article, projections which chain translations from one level to the next underestimate talent in the lower minors. PECOTA does do a better job than most, as its projected wOBAs for McCutchen the past 4 years have been 333, 326, 305, 322, while Oliver had 329, 331, 313, 314. They both show, on a major league scale, his talent level has been basically unchanged, and no more than average (330 for cf). The other two projections I have available, ZiPS has na, 302, 286, 317, CHONE has na, 290, 287, 318...below 300 not nearly MLB quality. Once at Triple-A, all four agree within 8 pts, but up to 40 pts apart at Double-A.
Most surprising thing -
1980, I believe, Three Rivers Stadium in Pittsburgh, Expos vs Bucs. I'm sitting a few rows behind home plate, close enough to hear the players on the field.
Bert Blyleven on the mound for the Pirates, two strikes on Larry Parrish, two outs in the inning...then Blyleven signals for catcher Ed Ott to come out to the mound. On arrival, Blyleven hands his glove to Ott and says "I have to go to the bathroom!" and then strolls down off the mound, disappearing into the dugout. About five minutes later, Blyleven emerges, tying the string that formed the belt for his uniform pants, to the cheers of the crowd. Once more atop the mound, Blyleven only needs one pitch to strike out Parrish and end the inning.
btw Jupiter is Stanton's home park.
I always thought Jeff's MLE calculator at minorleaguesplits was way too pessimistic. You can see my Oliver Normal Season is a very good match for Clay Davenport's Peak Projection. That's what Clay and I say that Stanton's 201 PA's this year translate to. My full blown projection, using multiple seasons, puts Stanton at 271/338/535
Thanks. Actually it's grandfather, but they're all in the house. I rearranged my work schedule last week so I could be home on Thursday and Friday and be of assistance.
Will - I checked with the email from Kevin announcing the topic for this week. It did not say that there was a limit, but it also did not say that there wasn't - it was silent on the subject. Not having it brought up led me to assume everything was the same. If there is no longer a word limit, please let us know.
"his MLE line falls at .228/.295/.414"
It would be better to give the source of your MLE. As I showed in my first article, different methods can produce widely varied results.
Jupiter is a very tough hitter's park, HR factor of 0.71, team HR factor 0.84.
.250 .298 .451 Davenport Regular
.315 .438 .690 Davenport Peak
.317 .398 .685 Oliver Normal Season
.271 .338 .535 Oliver In Season Projection
Stanton had an excellent two months in a very tough hitters park, and he's only 19! Combined with previous seasons, regressed to an average Class-A player, etc, and he looks a lot like Craig Wilson 270 BA, 30-35 HRs, 50 BB, 170 K
Regarding the decreasing percentage of 1st rounders who make the majors - I think it's dependent on the length of the round. We would normally think of the 1st round as allowing each team to have one pick, or 30 in total. Round 2 would be 31-60. With compensation picks this is not true. Players drafted as low as 50 have been classified as 1st round picks, diluting the mean talent level of the round. I would be curious to see if calling 1-30 round 1, 31-60 round 2, etc, regardless of the real life designation, would change the round 1 slope.
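The relabeling suggested above is simple to express. A minimal sketch, assuming 30 picks per nominal round; the function name is made up for illustration.

```python
def nominal_round(overall_pick, teams=30):
    """Round a pick would belong to if every round had exactly `teams` picks."""
    if overall_pick < 1:
        raise ValueError("overall_pick must be 1 or greater")
    return (overall_pick - 1) // teams + 1


# Pick 50 may be labeled a 1st rounder in real life, but falls in nominal round 2.
for pick in (1, 30, 31, 50, 60, 61):
    print(pick, nominal_round(pick))
```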
John Perrotto said on Facebook Monday that watching Snell pitch is like getting a root canal without the novocaine...Ohlendorf's pitching respectably, certainly not painfully. Meanwhile last night Morton threw a 7 inning shutout in his Indianapolis debut.
A salary dump when the Pirates are only committed to $12m for the next two years? The other day, before the trade, Pirates' announcer Bob Walk opined that the risk is not in giving $5m to a draft pick or to Sano, it's giving $120m to Mike Hampton. They should be able to give Sano $5m, their first pick $5m and still pay McLouth.
Morton has looked good in Triple-A the last two years. He had a 6 ERA in 15 starts in Atlanta last year, but the peripherals weren't as bad. Despite what Christina says about Snell's contract, he is clearly #5 right now on the starting pitcher depth chart. Morton can go into the rotation, forcing Snell to the bullpen where I think it's quite possible he could succeed as did Tom Gordon.
I have the stats and park factors
but how many MLE articles can you do? I'm working on one of about three ideas I had, none of them using numbers, giving the calculator a week off...baby coming on Friday so I need to wrap this up earlier if possible.
KG - re Jeremy Cleveland.
In each of his last three seasons at UNC, Cleveland improved his stats, coming out of college with a .345 projected wOBA, mlb average for a corner outfielder, and the only season he projected as mlb quality after factoring in past seasons and regression. I would worry if his Sr year was an outlier, but it's a positive that he improved two years in a row.
Wieters was very consistent in his last three years at Ga Tech, all projecting to a 340's wOBA. Although average for a corner outfielder such as Cleveland, this is well above average for a mlb catcher (312), in the range of Russell Martin or Ryan Doumit.
Cleveland went to Class-A the same season he was a Sr in college, and continued to rake, showing power and high BABIP. Ever since that one year, for whatever reason, his HR% tanked (.060 to .020, mlb avg .040) and stayed low the rest of his career. Was he injured, such as Jason Kendall's thumb?
Wieters, coming out of college with the same absolute numbers as Cleveland but much higher compared to his position, spent half a season in Class-A and half in Double-A, outhitting everyone at both stops. In 2009 Wieters was promoted to Triple-A and his stats appeared disappointing compared to his previous year, but after adjusting for park factors (home Harbor Park in Norfolk is the worst hitters' park in Triple-A) Wieters' 2009 line almost exactly matched the projections based on his college numbers combined with his 2008 minor league stats. As far as I know, PECOTA only used 2008 and relied on the outlier, giving a much higher projection.
I did look at how much each player over or underperformed his own established level of BABIP, and reported those in the two tables.
As for scouting reports, that would probably take another article. I didn't have time to do the necessary research for this piece. We can look at scouting reports for Hanley Ramirez and Miguel Cabrera that turned out to be prophetic, but how many scouts wrote glowing reports of guys who didn't make it? If you don't have a nearly complete list of what the scouts said, you're going to be cherry picking.
I don't know if FanGraphs normalizes their LD%. I did an article there in January about line drives, and Baseball Analysts later did an article on Young that linked back to mine. There was some discussion on 'The Book' blog as well.
Responding to the BA article, I listed Rangers batters since beginning in 2003, and I think only one had more LDs on the road than at home. I still lean towards scorer's bias, but it's probably a combination. I'm doing research on pitching at altitude, I can see if LD rate has any correlation.
When I compared parks I used LD% = LD/(LD+OFF). Smaller or larger foul territory should affect line drives and outfield flies equally, so that the ratio won't change.
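The ratio above is easy to check with made-up counts. A minimal sketch, taking OFF to mean outfield flies as in the sentence above; the function name and the counts are hypothetical.

```python
def ld_pct(line_drives, outfield_flies):
    """LD% = LD / (LD + OFF), with OFF meaning outfield flies."""
    return line_drives / (line_drives + outfield_flies)


# Hypothetical counts: 100 line drives and 300 outfield flies.
base = ld_pct(100, 300)

# If a larger foul territory removed 10% of both types equally,
# the ratio would stay where it was.
shrunk = ld_pct(100 * 0.9, 300 * 0.9)
print(round(base, 3), round(shrunk, 3))
```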
also remember Michael Young has the advantage of the highest line drive park factor in the majors
wOBA is widely accepted at other sites that publish this kind of material. I would have used EqA, but I don't have the formula.
I thought the rookie limit was 150 PA, and knowing Sandoval he could fit 146 AB into 150 PA
Nyjer Morgan has been playing towards left center for a couple weeks. The Pirates announcers thought it was an attempt to cover the large left-center at PNC Park, but obviously he is doing it on the road as well. Whether he is covering for McLouth or appearing out of position, he has the best outfield UZR in the majors.
After Theriot's first double of the night, he went back to second on a slow grounder hit to short that would have surely been behind Theriot. On the next play he must have decided to make up for his baserunning miscue, getting thrown out at third on a sharp grounder to short in front of him.
Chavez replacing Meek to face Z was more likely a case of wanting a pitcher capable of throwing strikes once the leverage went up.
By the way, it was Adam who ripped a double up the gap, while Andy had his clunk off Bradley's glove.
thanks, I will analyze this
Agreed. I do it the way you say. I just waited until later in the article to explain it.
In the more advanced version of park factors, yes of course you hold for the same opponent...this is how I do it, and how I explained it later on "compare the Pirates and Phillies stats in Pittsburgh with the same two teams stats in Philadelphia. Repeat for every combination of ballpark versions, then compare the total home to road stats for the entire range of years."
Whoops. That was a misstatement. You are absolutely correct. I'm surprised no one (including myself) saw that yet.
I'll label as 'old school' thinking (pre 1980s) in management that if your glove was far enough above average, the bat didn't matter, and if your bat was far enough above average, the glove didn't matter. I remember a national AP article in 1973 talking about how much the Pirates valued Dal Maxvill's ss defense in their pennant drive, even if he only hit .188. The next year, Maxvill was replaced by the immortal Mario Mendoza. Fans could chuckle about Willie Stargell's exploits in lf, as long as he hit 40 or more homers.
I believe you have to look at the total contribution, offense plus defense plus baserunning. Tampa Bay made huge improvements because they went from worst to best defense at ss while keeping the offense there the same, and then with Longoria upgraded both the glove and bat at 3b. Dunn for Dickerson? I'd say Dunn still contributes a better total package. Sacrificing the bat for the defense goes back to the 'old school'.
1) Valid point on acronyms. I will pay extra attention to that kind of thing.
2) Runs and home runs are probably the two most quoted factors, and they are the ones that vary the most from park to park. Home runs were the 'hook' for this article, and being a component it's a number you can turn around and use for another computation. It's hard to do that with a runs factor and that's mainly why I don't track them, but a runs factor is easy to comprehend.
3) I'm not really familiar with Olney, and I don't assume that he was intentionally exaggerating. Maybe so, but to my reading he showed all the wrong ways to use numbers.
I think you may be off base.
I stepped through this in Excel to make sure everything worked as I expected.
I created four teams, A, B in Division 1, C and D in Division 2. Each team plays the one other team in their division 36 times at home, 36 on the road, and play the two teams in the other division 18 games each at home and on the road, for a total of 72 home and 72 road.
Let's assume we have perfect knowledge. Teams A, B, and C have a home park home run rate of .040 while Team D's home park rate is .060. The mean of all four parks is .045. In real life, we do not know these numbers, all we know is how many home runs were hit by each batter off each pitcher in each ballpark. Traditional factors are expressed as home/road ratios. I am fairly alone in trying to determine 'normal' rates at each park, which is the rate at which a stat will occur if a league average selection of players played there over a long period of time. There's more math than I can ask you to wrap your head around right now.
In our test case, in round one of calculating factors, A and B are both .040 at home and .045 on the road for a factor of .89. C's home rate is .040, but it plays twice as many games at D, so its road rate is .050 for a factor of .80. D's home rate is .060 and its road rate is .040, so its factor is 1.50.
C has the exact same ballpark as A and B, so it should have the same factor (.89) not .80. In round 2, each team's expected road rates are calculated by multiplying the number of games against each opponent by the opponent's home park rate divided by their round 1 factor. A and B don't change, as they are in the other division. C's expected road rate is now .043 for a factor of .94, D's road rate is .048 for a factor of 1.26. In round 1 (raw), C's factor was too low and D's was too high. The new estimate is on the other side of the true value, but closer.
Round 3, C's road rate is .046 (true .045), factor .86 (true .89), D's road rate .044, factor 1.37 (true 1.33).
Round 4, C's road rate is .044, factor .90, D's road rate .046, factor 1.32.
One last time, Round 5, C's road rate is .045, factor .88, D's road rate .045, factor 1.34.
D has a home park rate of .060. If a batter there had an observed rate of .060, a raw (round 1) factor would normalize that batter to .040, but we know league average is .045. After round 2 the batter is rated at .048, round 3 .044, round 4 .046, round 5 .045. Three rounds gets the results to within .001, which is close enough, so let's not waste any more time waiting for the computer to do the extra calculations.
In the end, all four teams had an expected road park rate of .045, the same as the mean of the four home parks. You might ask, why not skip this exercise and just use this league average for the road rates? I assigned these values for this test, but in real life we do not know it. Two teams may have identical parks, but A has a lot of boppers while B has all slap hitters, which hides the truth of the park from us. This process is to strip out the players and show us the park.
This particular test shows that the iteration works. Team C played a disproportionate number of road games in ballpark D, causing it to have a different factor than A or B, when we knew that the true value should be the same. Each step of correcting for the road rates brought C's factor closer and closer to A and B.
Also, we may think that A, B and C are 'average', while D is the outlier, which is to say that league average is .040, not .045. A, B and C should then have a factor of 1.00, while D's is 1.50. In a real life case where there are many more teams, ballparks and seasons, I believe that the long term league average would approach .040 and A, B and C would come out close to 1.00.
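The four-team test above can be replayed in a few lines. This is a sketch of one reading of the steps as described, not the production code: the names are made up, each round divides every road park's rate by that park's factor from the previous round, and all four teams are updated at once from the previous round's factors. Under those assumptions it gives the same rounded factors as quoted for rounds 1 through 5.

```python
# Perfect-knowledge home run rates for each park (from the example above).
HOME_RATE = {"A": 0.040, "B": 0.040, "C": 0.040, "D": 0.060}

# Road games each team plays in each opponent's park (72 per team).
ROAD_GAMES = {
    "A": {"B": 36, "C": 18, "D": 18},
    "B": {"A": 36, "C": 18, "D": 18},
    "C": {"D": 36, "A": 18, "B": 18},
    "D": {"C": 36, "A": 18, "B": 18},
}


def next_factors(prev):
    """One round: home rate divided by the schedule-weighted road rate.

    With prev=None the raw road-park rates are used (round 1); otherwise
    each road park's rate is first divided by its previous-round factor.
    """
    factors = {}
    for team, schedule in ROAD_GAMES.items():
        total = 0.0
        for park, games in schedule.items():
            rate = HOME_RATE[park]
            if prev is not None:
                rate /= prev[park]
            total += games * rate
        road_rate = total / sum(schedule.values())
        factors[team] = HOME_RATE[team] / road_rate
    return factors


def run_rounds(n):
    """Return the list of factor dicts for rounds 1..n."""
    history, prev = [], None
    for _ in range(n):
        prev = next_factors(prev)
        history.append(prev)
    return history


for number, factors in enumerate(run_rounds(5), start=1):
    print(number, {team: round(value, 2) for team, value in factors.items()})
```

C and D swing back and forth around the true values (.040/.045 and .060/.045), with the swing halving each round, while A and B stay put.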
I started off with an example that showed bad use of numbers - don't extrapolate from a small sample, and do put numbers in context. Those are basics for any type of numerical analysis. As for context, I explained that the basic park factor is home divided by road, but that it's best to divide rates and not counting numbers, because you may play more or fewer games at home than on the road. Then I showed how that number is still an estimate, one that has less uncertainty as the sample size gets larger, which is another basic concept of any analysis. Then it goes a little higher, but I think it's important to understand that a park factor only applies to the games in that park, and if you want to use these types of numbers to adjust a player's stats, you have to use a weighted mean of all the parks on the schedule. I do not think that mine is any more advanced than Silver's "Science of Forecasting" or Sheehan's "Stolen Bases", two of the articles given to us as examples from the original 'Basics' series of five years ago. Please check them out.
It's a concept. With that, you define that you will measure a factor between changes in the home park. On the most basic level you measure with home/road. Of course, as I pointed out, if road changes, the ratio will change. You try to counteract that by going back and adjusting the road stats with each road park's factors and rerunning. Once you get the uncertainty of the result under .05, consider it stable. Even with a higher variance, using the factor to adjust a player's stats will be less precise but still fairly reasonable. You just want to make sure that adding another season isn't going to result in wild swings in the calculations. This leads in to adding some amount of league average performance (regression) to moderate, cancelling out extreme results from smaller samples.
"Brian Cartwright has one heck of a voice!"
I have sung backup to two different Grammy Award winners.
roughcarrigan - I was not aware of the '600 Club' - it's one of the things that doesn't show up when looking at a list of park dimensions.
To be fair and objective, I just went back to my database and created a new version for Fenway from 1988 on.
Base hits (babip) was unchanged at 1.07, but all the extra base hits dropped - it does appear that the ball did not carry as well. Doubles went from 1.30 to 1.22, triples from 1.03 to 0.97, and homers from 1.07 to 0.92.
I thought it was a good example, and although now struck down, the point stays the same - if the park hasn't changed, the park factor shouldn't. It's the team factor, the weighted mean of all the parks each team plays in, that can change from year to year as the schedule or any one park changes.
On Standard Deviation, I did not attempt to explain how it's calculated, but instead how it's used to show a range of possible answers, a measure of uncertainty in the calculation. 'SD' was on the second reference, as I chose to vary my wording.
"The chart below shows the standard deviation...If Yankee Stadium still has a homerun factor of 1.45 at the end of the year, with a SD of .149, that means there's a 70% chance the 'true' value is between 1.30 and 1.60, and a 95% chance of it being between 1.15 and 1.75"
'true' being the real underlying value we are estimating, the result we would get with an infinite amount of data.
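The ranges quoted above are the estimate plus or minus one and two standard deviations. A minimal sketch, assuming the estimate's error is roughly normal (the article's 70% and 95% are rounded versions of the usual one-SD and two-SD coverage); the function names are made up for illustration.

```python
import math


def interval(estimate, sd, n_sd):
    """Range of estimate +/- n_sd standard deviations."""
    return estimate - n_sd * sd, estimate + n_sd * sd


def normal_coverage(n_sd):
    """Probability a normal variable lies within n_sd SDs of its mean."""
    return math.erf(n_sd / math.sqrt(2.0))


# Yankee Stadium example from the quote: factor 1.45, SD .149.
for n_sd in (1, 2):
    low, high = interval(1.45, 0.149, n_sd)
    print(n_sd, round(low, 2), round(high, 2), round(normal_coverage(n_sd), 3))
```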
there should've been a mention about park factors changing possibly because of stadium renovations
"Three Rivers Stadium opened in Pittsburgh in 1970...In 1975, an inner wooden fence was constructed, about 6 feet shorter, creating version 2"
My personal longest streak was 36 batters without issuing a walk...I thought that was good, but Daniel Cabrera nearly doubled it - I'm embarrassed.
same email address as last time
It looks great after 555 words. Adding more, but trying to keep a logical progression of ideas.
1) I picked methods 3 and 4 in my comparison because of their similarity - (almost) all of the difference was in chaining or not. I don't understand what you mean (or what your evidence is) that chaining predicts better "for those that were promoted one class at a time".
2) Slight differences between MLB Chained and MLB Direct for AAA to MLB are because Direct did not have a requirement that the two seasons be within a year of each other.
3) When I said "is it correct to base the factors partly on the records of players who failed to advance?" I was proposing an alternate scenario, predicting an argument that could be used to back another method. I was simply trying to find several different methods to test.
There will always be some biases. When you get rid of one, you create another. By empirically testing the results I was trying to find the method that was least affected. This was not a case of trying to prove my method was right. Last year when I was designing my projections, I spent a lot of time looking for a method I felt was accurate enough, more so than the other projections available. This article describes my process of discovery.
MLB Chained and MLB Direct used the same sets of players, the same stats. If there are biases in the selection of players, it is cancelled out. The difference in these two methods is whether A+ is being compared to MLB, or A+ to AA to AAA to MLB. For the same players, chaining gives a result that has more error, an error that says the test players (as a group) are not good enough to even play in MLB, when in fact they performed at MLB average. If there's a bias that creates a result that's 95% of true value, by chaining the result is 90% in AA, 85% in A+, 81% in A, 74% in A-...it just keeps getting worse the more steps away from MLB.
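The compounding can be checked directly. A minimal sketch, assuming the same 95% retention at every link of the chain (the example figure used above); how many links separate a given level from MLB is left as an input, and the function name is made up for illustration.

```python
def chained_fraction(per_step, steps):
    """Fraction of true value left after `steps` chained translations."""
    return per_step ** steps


# Each extra link in the chain multiplies in another 95%.
for steps in range(1, 7):
    print(steps, round(chained_fraction(0.95, steps), 2))
```

A direct comparison is a single link no matter how far the level is from MLB, so whatever bias it carries is applied only once.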
The general point is that intuition may tell you that a particular method might be better because it avoids such and such bias, but you should model several different possible methods and then test them empirically to see which is actually more accurate.
A control group has to consist of people or objects that exist in both circumstances. In this case, there are more than two circumstances, so you have a choice between chaining and direct comparison. The tests show that chaining multiplies biases.
If there are two players who hit the same at a given level, and one got promoted and the other didn't, you can look at other things like speed, defense and defensive position.
Richard, these are 368 guys who have played at A ball sometime in the past 11 seasons (my minor league db covers 1998-2008) and have since made the majors. What about the guy in A ball in 2009? I can judge his chances by comparing him to the players who have come before him. This article was a technical look at only one part, although critical, of developing projections. Some of the comments and questions have ventured outside that into projections in general. Once we determine the best method of analyzing those previous players, we can apply the translation to the current ones. Next, look for similar players in the past. How many of those became stars/scrubs/never made the majors? Why did they? That fills out the projection, such as PECOTA does, but was not intended to be the subject here.
I realize there is a fine line on responding to comments. I don't like to go back and forth too many times on the same point. I'll clarify, but would like to move on to another question. Sometimes the reader doesn't get it - I'll look to see if it's my fault for not explaining it well enough.
I have a BA in Geography with a minor in Math & Computer Science. I have taken several college stats courses, along with Operations Research, but admittedly it's been a while ago, compared to some of you younger guys.
To check how well the means of the predicted and expected compared, sum(e-p)/sum(n)
To check root mean square (error is error, regardless of negative or positive) sqrt(sum((e-p)^2)/sum(n))
To use Pythagoras to calculate similarity score sqrt(sum((e1-p1)^2+(e2-p2)^2+(e3-p3)^2)/sum(n))
As I said in my last comment, the control group hit 270 in their first 2-3 seasons of MLB. Based on only High A or lower stats, MLB Direct projects them to hit 267. All Chained project 229. Even if MLB Direct has some bias, All Chained has other biases which clearly outweigh and make the results unusable. If a large group of players hits 270, and a projection says they are 229 hitters, I would say that projection is wrong.
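The three checks above translate almost literally into code. A minimal sketch, with assumptions: e and p are taken to be per-player expected and predicted values, n is a per-player weight (1 per player in the example), and the function names and sample numbers are made up for illustration.

```python
import math


def mean_error(e, p, n):
    """sum(e - p) / sum(n): signed errors can cancel, so this checks bias."""
    return sum(ei - pi for ei, pi in zip(e, p)) / sum(n)


def rms_error(e, p, n):
    """sqrt(sum((e - p)^2) / sum(n)): the sign of each miss no longer matters."""
    return math.sqrt(sum((ei - pi) ** 2 for ei, pi in zip(e, p)) / sum(n))


def similarity(e, p, n):
    """Three-stat version; e and p hold (stat1, stat2, stat3) per player."""
    total = sum(
        sum((ek - pk) ** 2 for ek, pk in zip(ei, pi))
        for ei, pi in zip(e, p)
    )
    return math.sqrt(total / sum(n))


# Two hypothetical players: one projected too low, one too high.
e = [0.280, 0.260]
p = [0.270, 0.270]
n = [1, 1]
print(round(mean_error(e, p, n), 4), round(rms_error(e, p, n), 4))
```

In the example the two misses cancel in the mean but not in the root mean square, which is why both checks are worth running.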
Comments and some things I've learned this week -
I've been a BP subscriber for about 10 years, and this is the fourth site where I have published articles. One of the problems when writing for a new audience is knowing the type of article they expect. When I first started reading BP, it was a stats site, but apparently is not so much anymore, although there are some (many?) of us who wish it still was. I am still primarily a stats guy, and would like to write mainly stats articles for BP, but wherever I write I will be learning how to tailor my message to the audience.
As someone who works with the stats virtually every day, the numbers quoted in the article were in my language, and had lots of meaning for me. When someone says "superior projection of a few hundredths of OBA+Slg" I realize I have not made it clear in terms the readers are familiar with.
Restating the test results:
 BA  OB  SA  Method
270 327 427  Marcel of MLB of 368 players in test sample
267 323 419  A+ using MLB Direct
242 295 368  A+ using MLB Chained
234 284 345  A+ using MinPA Chained
229 279 338  A+ using All Chained
Collectively the 368 players had a MLB line of 270/327/427 in their first 2-3 seasons. Using only batting data from High A and below, the MLB Direct approach would tells us that they project to 267/323/419, very close. A projection system that uses an All Chained method would project these same players, based on the same set of stats, to have a 229/279/338 line, which is clearly not good enough to even play in the majors, instead of MLB average, and therefor not a useful method.
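The gap between each method and the actual MLB line can be tallied from the restated results. A minimal sketch using only the group lines above (in points of BA/OB/SA); the function names are made up, and the straight-line distance is just one convenient summary of a group line, not the per-player similarity score.

```python
import math

ACTUAL = (270, 327, 427)  # Marcel of MLB for the 368 players, BA/OB/SA

PROJECTED = {
    "MLB Direct": (267, 323, 419),
    "MLB Chained": (242, 295, 368),
    "MinPA Chained": (234, 284, 345),
    "All Chained": (229, 279, 338),
}


def gaps(projected, actual=ACTUAL):
    """Projected minus actual for each of BA, OB and SA, in points."""
    return tuple(p - a for p, a in zip(projected, actual))


def distance(projected, actual=ACTUAL):
    """Straight-line distance between the projected and actual lines."""
    return math.sqrt(sum(g * g for g in gaps(projected, actual)))


for method, line in PROJECTED.items():
    print(method, gaps(line), round(distance(line), 1))
```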
It's a matter of comparng the accuracy of different approches. After he most accurate of several have been determined, the rest of the projection system can be built.
Not directly promoted, but the betters ones are eventually. You have a player hitting 300/350/500 in Class A. How does that project on the stars/scrubs spectrum? Take all the players who played in this players league, and also in MLB. See how their performances changed. Apply the same translations to all the current players to estimate their propect status. There are other details and nuances, but that's the basics.
I've realized that you are following the show as closely as possible, as best I understand it. It was just a comment. I will follow whatever instructions the judges send me each week. On the other hand, I'm a reader as well.
The method I used to test the results is a standard technique in industry.
I make my living creating digital maps from aerial photography. Surveyors go into the field and measure a small but sufficient number of points that we can see in the photography, reporting to us each point's east, north and elevation. When I put the cursor on each of these points in the photos on the screen, the software reports back the pixel location, then compares the ground coordinates to the photo coordinates. Once the translation is established, I can go to any point in the photography and the software will give me a calculated ground coordinate. That coordinate translation is the basis of the entire map compilation process, and it's the same concept I used here in the results test.
Mitchel Lichtman said recently on his blog that in calculating his unpublished projections, he only compares players in different levels in the same season. The reasoning is that if you compare 2007 to 2008 stats, the player's true talent may have changed during that time. That approach limits you to a chaining process to compare the lower minors to MLB. If I compare two stat samples that are four years apart, the odds that the player's talent has changed (the thing I'm NOT trying to measure at this step) are higher, but it gives me an opportunity to directly compare a player's performance in Class A to his performance in MLB. Which method gives the best results?
I thought I took a whole paragraph to explain "elapsed time"
How much elapsed time should be allowed between samples?
...By recording statistics for each player in each season, we are taking a sample, over a given period of time, which estimates the player's "true talent" in each of the various categories. As a player ages through his twenties, he will on average lose speed, but gain power, strike zone judgement and contact skills. If it takes two years of statistics to get a good measure of a player, at the end of the two years he is likely not exactly the same player he was before. It would then make sense not to let too much time elapse between the two sets of stats being compared.
With such a time constraint...
But that would be assigning 100% of the DER to the defense. Voros McCracken concluded this, but further research by others suggests it's in the range of 20% pitchers, 80% fielders. Maybe it wasn't the writer's intention, but I was looking for a determination of this pitcher/fielder split and didn't find it. And ballparks do also have an influence.
The factors were calculated in four different ways. The results of each were tested with the 368 rookies in MLB from 2003 to 2008.
I didn't do any rounding other than telling Access to display 3 decimals.
The purpose of the MLE factors is to translate minor league stats into 'equivalent' major league stats. How can I test the accuracy of the translation on players who did not perform in both situations?
Say I have a set of ten x,y pairs, each in two different coordinate systems. I can compare the two sets of data to derive a matrix to convert from one to the other. Then each of the points is run through this matrix to compare its predicted location to its observed location; the difference is the residual error. Once the residuals are sufficiently small, the conversion is deemed reliable, and any point from coordinate system one can be converted to system two. The complication in MLEs is that there are more than two systems, with six levels of minors below MLB.
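A minimal sketch of that fit-then-check-residuals procedure for the two-system case. It swaps in a four-parameter similarity transform (scale, rotation, two shifts) as a simple stand-in for the general conversion matrix, and the control point coordinates are made up.

```python
def fit_similarity(src, dst):
    """Least-squares fit of X = a*x - b*y + tx, Y = b*x + a*y + ty.

    src and dst are equal-length lists of (x, y) control points that are
    not all at the same location.
    """
    n = len(src)
    xc = sum(x for x, _ in src) / n
    yc = sum(y for _, y in src) / n
    Xc = sum(X for X, _ in dst) / n
    Yc = sum(Y for _, Y in dst) / n
    num_a = num_b = denom = 0.0
    for (x, y), (X, Y) in zip(src, dst):
        dx, dy, dX, dY = x - xc, y - yc, X - Xc, Y - Yc
        num_a += dx * dX + dy * dY
        num_b += dx * dY - dy * dX
        denom += dx * dx + dy * dy
    a, b = num_a / denom, num_b / denom
    tx = Xc - a * xc + b * yc
    ty = Yc - b * xc - a * yc
    return a, b, tx, ty


def transform(params, point):
    """Convert one point from system one to system two."""
    a, b, tx, ty = params
    x, y = point
    return a * x - b * y + tx, b * x + a * y + ty


def residuals(params, src, dst):
    """Distance between predicted and observed location of each control point."""
    out = []
    for point, (X, Y) in zip(src, dst):
        px, py = transform(params, point)
        out.append(((px - X) ** 2 + (py - Y) ** 2) ** 0.5)
    return out


if __name__ == "__main__":
    # Hypothetical control points: photo (pixel) and ground coordinates.
    photo = [(0, 0), (100, 0), (0, 100), (100, 100), (50, 20)]
    ground = [(1000.1, 5000.0), (1200.0, 5049.9), (950.0, 5200.1),
              (1150.1, 5250.0), (1090.0, 5065.0)]
    params = fit_similarity(photo, ground)
    print(max(residuals(params, photo, ground)))
```

The MLE version of the problem would replace pixel and ground coordinates with minor league and MLB stat lines, and the residual check is what the 368-player test sample provides.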
What is the method of modeling the talent levels of the various minor leagues that will create the smallest residual errors? I listed four different methods, and calculated the error rates for each. We need to avoid picking a method because intuitively it sounds good without first testing it against the alternatives.
We're not voting on this batch of articles; that will start next week. I too am concerned about having too many articles on one day (even if I am first alphabetically). I would suggest running two a day from Mon thru Fri, so we don't get burned out reading ten at one time.
Any evidence that TTO hitters tend to be poor fielders? The current crop, yes, but I'm not so sure it holds true over time.
Nice. You had me wondering "What the frak is TGF?". Like looking into Putin's eyes.
I liked the article a lot, nice read and I was able to pull some of my own conclusions from the tables. I'm just not sure I agree with your conclusions about the importance of Opening Day roster selections. There's not a lot of difference between players 21-25 and 26-30. It's the lack of quality in the top 20, determined in the off-season, that has the biggest effect on a team's won/loss record. I see the Playoff Teams and the contenders being very close in quality of their starting players, but the contenders lack bench depth. The contenders and the also-rans have similar bench strength, but the second division teams lack depth in their starting players.
It seems to me that the Nats simply don't have any pitchers worth spending significant amounts of money on. Not only based on their quality, but that many are pre-arbitration, and those types of players, regardless of team or position, are predominately paid well below their market value.
"Would it not make more sense adjust his statistics based on appropriate aging patterns?"
Yes, a finished projections system should include aging patterns, but these tests were to determine which selection criteria were best for a matched pairs comparison, which is the first of several steps before applying age corrections. In that test I was attempting to minimize other influences such as aging. Several projections that I am familiar with deliberately choose to limit their comparisons to adjacent seasons or only same seasons in order to avoid an aging bias. I wanted to find out if this is a wise approach, or if it creates more problems than it solves.
The Pirates made a 41 point jump with almost the exact same players on defense. Is it better fielding or pitching? I watch almost every game, and it does seem that the pitchers aren't giving up as many ropes.
and don't forget Indiana University of Pennsylvania and California University of Pennsylvania
I'm still not clear - the first measure was StS/Pitches, while the 2nd was StS/Strikes? Or StS/Swings (StS+Fouls+BIP). I'd do it as a pct of swings - you can't swing and miss until you swing.
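To make the three candidate denominators concrete, here is a small sketch. The pitch counts are invented, and the definition of swings as StS+Fouls+BIP follows the comment above.

```python
def swinging_strike_rates(sts, fouls, bip, pitches, strikes):
    """Swinging strikes (StS) expressed against three different denominators."""
    swings = sts + fouls + bip  # you can't swing and miss until you swing
    return {
        "per_pitch": sts / pitches,
        "per_strike": sts / strikes,
        "per_swing": sts / swings,
    }


if __name__ == "__main__":
    # Hypothetical outing: 100 pitches, 64 strikes, 10 swinging strikes,
    # 18 fouls and 17 balls in play.
    rates = swinging_strike_rates(sts=10, fouls=18, bip=17, pitches=100, strikes=64)
    for name, value in rates.items():
        print(f"{name:10s} {value:.3f}")
```

The same ten swinging strikes look quite different depending on the denominator, which is why it matters which one the article used.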
The inverse correlation with strike% might be an ability to get the batter to swing at pitches out of the zone, which would be easier to miss on, such as a two strike curve or slider.
My projections show Danks as a likely 250's hitter - .320 to .330 BABIP, BA brought down by .250 SO%, below avg (.040) on HRs (.032) BB% slightly above avg in .090's. A patient, line drive hitter who has contact issues.
On one hand, Will knew me before the contest, but on the other, if he'd read the postings from my Facebook he would have known that I've written before.
If my article doesn't make the Top 10 here, I will put it up at FanGraphs.
My submission was on a topic that I had informally worked out last year, and was part of a larger project I'm in the middle of. So it was on my shelf, but had not yet been published. I believe that's what Will was referring to. So I worked up a formal treatment of it, had a couple of friends read it and make comments, did some rewriting, and tried to fit it into the word restriction. This was a "hard-core research" piece, and I did have to take out several explanations of procedure (aka geek speak) to get down to 1454 words. It's easier to read, but I had originally put the extra words in so that those reading could be sure of my methods. However, a friend, who's a baseball fan but not a stats geek, got lost when reading it.
What, read the instructions?
The "PT" wrapped around to the next line, and my eyes apparently skipped over it
That's the first time I heard the deadline was Pacific Time. My entry went out at 11:40 pm Eastern.
Donald Drake Hogestyn, of "Days of Our Lives", 1976-77 in Class-A for the Yankees http://www.baseball-reference.com/minors/player.cgi?id=hogest001don
April 15th? I still have to do my taxes!
Very nice article Ben. Thanks for the link and glad to help out.
When I was rating catchers' throwing for FanGraphs, one of the things I looked at but didn't publish yet was how the CS% changed with age. I noticed that backup catchers didn't seem to suffer the same decline in CS% in their 30's that the starters did. I need to work this out to a full study, but I suspect that this effect probably correlates better with career innings caught rather than age, much as Ben has shown here with a day to day effect on the catcher's offense.
I'm a Pirates fan too.
Not by very much
Thing is, as Will has helped show with his list, size doesn't mean very much in baseball, so no one bothers to keep accurate records. It's just trivia. In football, a 30 or 40 pound discrepancy does translate into performance. Same thing with height in basketball. Baseball, who cares?
It was my third time at P&P, and definitely the largest crowd. I really enjoyed talking to you all, and I am glad you guys did also, especially after I put Clay on the spot about Matt Wieters' PECOTA.
McLouth lists several variables that are legitimate when measuring defense - however the first was how shallow he plays, which is a decision that can be either a good or bad one. He concedes Carlos Gomez does not give up as many extra base hits, but seems convinced that the fewer singles he allows outweighs it. Truth is, it doesn't. If management has told him to play shallow, then they are probably mistaken. BTW, McLouth has only been thrown out by a catcher ONCE in his mlb career, and that was a bad call by the ump. He's had 5 po/cs.
Moyer's attrition rate hits 100% in 2012...prove them wrong!
My third grandchild will arrive this spring. Moyer made his MLB debut before my daughter was born!
I really felt for the Dutch relief pitcher yesterday. He's basically a semi-pro, as he had to call home to get more time off work to play. And there he is, in a Major League stadium, pitching to Adam Dunn, Ryan Braun, Curtis Granderson and Jimmy Rollins. Never mind that they are lacing hits off him, he is out there pitching to an All-Star team of Americans. I'd gladly take his place, and it gave guys in countries like that hope that maybe one day they could be pitching to MLB stars, even if they don't make it to the Show themselves.
Both Ausmus and Varitek have remained very good at preventing wild pitches and balks, but both of their arms have become Piazza like in recent seasons.
Another thing to consider in the number of Dominicans and Venezuelans is that they have summer leagues which are run as affiliates of the mlb teams. Mexico, Japan, Korea and Taiwan have unaffiliated pro leagues, with restrictions on leaving, and now Puerto Rico is treated as part of the mainland. In the DR and Venezuela, mlb teams can sign the players at 16, and have them play at home for up to 6 years.
The 16 year old minimum came about in the 1980's to avoid teams signing YOUNGER players. The Phillies signed a 12 year old in 1980, and he played 4 or 5 years in the minors. Before the US draft, Bob Feller led the American League in strikeouts as a pitcher, in his 2nd season, between 11th and 12th grades. He was into his 3rd year in the majors before he graduated high school.
Games are also available on demand shortly after they are completed. It is really hard for me to listen to a game live, but no problem to avoid the linescores and the next day put on the headphones at work and listen there. Then I can also pause it when someone needs me.
I had been planning on attending at Politics and Prose...is there anything at the Georgetown event that would be worth attending both?
I've looked at Clay's minor league translations, and the weighted mean PECOTA beats both the regular translation and the peak projection by a long shot, and Clay leverages age heavily in the peak projections, so I have to assume that PECOTA isn't based on any of Clay's projections.
Oliver used three years of college data as well as the one year of pro for Wieters. Each of the college years translated to a wOBA in the 340's, MLB average but good for a catcher. 2008's translation was a 411 MLE wOBA, but beware of outlier seasons, only about a 30% chance of repeating. You wouldn't know it was an outlier without the college stats - the same thing happened with Alex Gordon, where he had a fine year in AA, but in MLB looks much more like he was in college.
I've been a BP subscriber for something like 8 years or so, and so far have attended two events in person, and got to meet Christina at the first four years ago. I enjoyed both, and I am looking forward to being there next month. I've been treated well as a customer and have had friendly email exchanges with several of the folks here. That said, I like research best, and haven't seen much here since Dan Fox left last spring. Eric Seidman looks to have picked up that ball lately, but it would be great to have more than one writer doing research. I've been considering whether or not to renew this spring.
Way back when I was a kid, the 5th guy in the bullpen was the mop-up man.
I agree that further tests would help determine if this was coincidence. Make a list of starting pitchers who had a relief appearance of say 2+ innings, on less than 4 days rest, then made the next start less than 4 days after the relief game. See if there's a pattern, or what percent of the time bad things happen later.
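A rough sketch of how that list could be pulled from game logs. The log format, the sample dates and the reading of "days rest" as calendar days between appearances are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical game log for one pitcher: (date, role, innings pitched).
SAMPLE_LOG = [
    (date(2008, 5, 1), "start", 6.0),
    (date(2008, 5, 4), "relief", 2.0),
    (date(2008, 5, 7), "start", 4.0),
    (date(2008, 5, 12), "start", 7.0),
    (date(2008, 5, 14), "relief", 1.0),
    (date(2008, 5, 17), "start", 6.0),
]


def find_squeezed_relief(log, min_relief_ip=2.0, max_gap_days=4):
    """Return relief outings of min_relief_ip or more innings that came fewer
    than max_gap_days after the previous appearance and were followed by a
    start fewer than max_gap_days later."""
    ordered = sorted(log)
    hits = []
    for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
        cur_date, cur_role, cur_ip = cur
        if cur_role != "relief" or cur_ip < min_relief_ip:
            continue
        short_rest_before = (cur_date - prev[0]).days < max_gap_days
        quick_start_after = nxt[1] == "start" and (nxt[0] - cur_date).days < max_gap_days
        if short_rest_before and quick_start_after:
            hits.append(cur)
    return hits


if __name__ == "__main__":
    for outing in find_squeezed_relief(SAMPLE_LOG):
        print(outing)
```

With the qualifying outings in hand, the next step would be to look at what happened to each pitcher afterward and compare it to a control group.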
If the Pirates would have dropped Davis and Chavez a couple weeks ago they could have protected Jamie Romak. Their 40 man roster had 5 catchers.
I don't understand McGehee over Dillon. Dillon has consistently posted translated wOBAs of 340s & 350s, with above avg power, while McGehee has never cracked 310.
As for the Padres 2b, I don't see much difference offensively between Antonelli, Gonzalez or Denker. I'd have to break the tie with youth and defense.
I've got Carp projected at 262/339/432 - walks at lg avg rate, might hit 20-25 HR - total package is well below avg for mlb 1b
I would agree that you cannot control your compulsions. However, you can control if you act on them. I have several compulsions that I won't describe here, but are potentially harmful to myself or others. I struggle, but I stay in control.
My number's not as low as Evan's, but lower than most
as compared to Weeks' .341, the guy who can't get on base
"the kind of thing you can expect from a 28-year-old in the high minors"
I would think that an MLE is an MLE, regardless of the age. How many times does an over age player put up a 340/380/602 line in the high minors?
The difference between 24 and 28 in AAA is that the 24 year old is expected to get better next year, while the 28 year old is at or past his peak, and can reasonably be expected to decline.
All 24 year olds in AAA in my database (not yet complete) hit 292/362/469, while 28 year olds in AAA hit 283/352/452. Not a great difference in performance, and most of it is upward attrition bias, as the better players are promoted before reaching 28.
Without age adjustment, I'll project Ludwick right now at 270/333/512
I believe that it was Kevin Goldstein himself, on XM Radio, who announced the Alvarez deal at 1 am, so the deadline was not extended for even "a couple hours", but perhaps for something like 30 or 40 minutes.