Thanks for the link. It looks like while Colin's graph shows fastball velocity relative to the average of all pitchers, Max's scales each pitch relative to the season average of the pitcher who threw it. That does take a step in the direction of a full-blown hierarchical model, but you're still left unable to talk about group-level parameters. Also, Max's study broke pitchers down into three categories. It's at least clear that Strasburg did not have a group 3 type of year (a pitcher who gains velocity as the year goes on), but by Max's study that's the minority pitcher type. It's just not clear to me that Strasburg's velocity profile was cause for concern. But then again, as Colin mentions, we may know a lot less than the Nationals do.

Can we really conclude that most pitchers gain velocity? Unless I've misunderstood how the line in the second graph was fit, it just shows that you're more likely to observe a faster fastball later in the year, not that any given pitcher gains velocity on their fastball as the year progresses. To answer that question, I think you'd need a model in which a line is fit to each pitcher (like the first graph), and then to investigate the group-level parameters (intercept, slope) using a hierarchical model. You can fit just such a model using JAGS (in R, if you like). John Kruschke hosts an R script called "SimpleLinearRegressionRepeatedJags" which will spit out a lot of nice graphs of the credible marginal posterior distributions for the group-level and subject-level parameters.
http://doingbayesiandataanalysis.blogspot.com/2012/01/complete-steps-for-installing-software.html
If you'd rather use Stan, a recently released free implementation of Hamiltonian Monte Carlo (JAGS uses Gibbs sampling instead), here's a way to do it:
http://www.sanjogmisra.com/blog/
http://mc-stan.org/
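To make the "fit a line to each pitcher, then look at the group" idea concrete, here's a minimal plain-Python sketch. The pitcher labels and velocities are made up for illustration, not real PITCHf/x data; a real hierarchical model in JAGS or Stan would partially pool these per-pitcher slopes rather than just averaging them.

```python
# Toy sketch (invented data, not PITCHf/x): fit a separate least-squares
# line to each pitcher's fastball velocity over the season, then summarize
# the per-pitcher slopes -- the quantity a hierarchical model would pool.
def fit_line(xs, ys):
    """Ordinary least squares: return (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# day of season -> average fastball velocity (mph), one entry per pitcher
pitchers = {
    "A": ([10, 60, 110, 160], [92.1, 92.6, 93.0, 93.4]),  # gains velocity
    "B": ([10, 60, 110, 160], [94.0, 93.8, 93.5, 93.1]),  # loses velocity
    "C": ([10, 60, 110, 160], [90.5, 90.6, 90.4, 90.5]),  # roughly flat
}

slopes = {name: fit_line(xs, ys)[1] for name, (xs, ys) in pitchers.items()}
mean_slope = sum(slopes.values()) / len(slopes)
print({k: round(v, 4) for k, v in slopes.items()})
print(round(mean_slope, 4))
```

The distribution of the per-pitcher slopes (not the slope of one pooled line) is what tells you whether "most pitchers gain velocity."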
-@iantist

I wonder if "all models are false," is really just another way of saying, "all models are models." A model is inherently an abstraction and necessarily a simplification. Thus labels of "true" or "false," "right" or "wrong," don't make sense. As you nicely put it, models can only be more or less useful for studying a given feature.

SPSS is fine of course, but for the budget-conscious, this can all be done in R (which is free!). Here's a nice intro from Jim Albert, who co-wrote "Curve Ball: Baseball, Statistics, and the Role of Chance in the Game":
http://www.amstat.org/publications/jse/v18n3/albert.pdf

Very nice article. I think this article may help answer a related question: "How quickly should we forget the past?"
If you'll bear with me, I like to think of player evaluation from a Bayesian standpoint. We could model past OBP performance as a beta distribution where alpha = the number of times a player reaches base and beta = the number of times they do not reach base over the same plate appearances. If the player reached base 30 times out of the past 90 plate appearances, we'd have a beta distribution with alpha=30 and beta=60, and our estimate of OBP would be 30/90 = 1/3.
Take a look: http://tinyurl.com/chcktx7
Let's say that same player reaches base 8 out of the next 10 times (the likelihood); our new estimate of the player's OBP (our posterior) would be a new beta distribution (alpha=38, beta=62), so theta (our OBP estimate) = 38/100 = .38.
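The update above is just the conjugate beta-binomial rule, which a few lines of Python can sketch (the counts are the same toy numbers as in the example):

```python
# Conjugate beta-binomial update for OBP, as described above.
# Prior: Beta(alpha=30, beta=60), i.e. 30 times on base in 90 PAs.
# New data: on base 8 of the next 10 plate appearances.
def update_beta(alpha, beta, successes, failures):
    """Conjugate update: add the observed counts to the beta parameters."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Point estimate of OBP: the mean of the beta distribution."""
    return alpha / (alpha + beta)

alpha, beta = update_beta(30, 60, 8, 2)
print(alpha, beta, round(beta_mean(alpha, beta), 2))  # 38 62 0.38
```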
The hard part here is knowing when to forget old performance i.e. how strong should the prior be? Do you think your estimation of proper sample size informs that question?
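One illustrative way to operationalize "forgetting" (my assumption, not something from the article) is to discount the prior's pseudo-counts by a decay factor before adding new data, which weakens the prior without discarding it outright:

```python
# Discounted beta update (an illustrative assumption, not from the article):
# shrink old pseudo-counts by a factor in (0, 1] before adding new data.
# decay=1.0 never forgets the past; smaller values weaken the prior.
def discounted_update(alpha, beta, successes, failures, decay):
    alpha = decay * alpha + successes
    beta = decay * beta + failures
    return alpha, beta

# Same toy player as above: prior Beta(30, 60), then 8-for-10.
for decay in (1.0, 0.5):
    a, b = discounted_update(30, 60, 8, 2, decay)
    print(decay, round(a / (a + b), 3))
```

With decay=1.0 this reproduces the .38 posterior above; with decay=0.5 the recent hot streak pulls the estimate higher, because the effective prior sample size has been halved. Choosing the decay is exactly the "how quickly should we forget?" question.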

Nice article; a lot to think about. In your first table listing the average speed of fastballs in different situations relative to a bases-empty, 0-out situation, should we assume that these mean values come with reasonable standard errors?
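For what it's worth, by "reasonable standard errors" I mean something like sd / sqrt(n) for each situation's sample of pitch speeds (the speeds below are invented for illustration):

```python
# Standard error of a mean: sample sd divided by sqrt(n).
# The pitch speeds here are made up, just to show the computation.
import math

def standard_error(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance
    return math.sqrt(var / n)

speeds = [91.2, 92.5, 90.8, 93.1, 92.0, 91.7]  # mph, hypothetical
print(round(standard_error(speeds), 3))
```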
