I got BP rolling many years ago in large part because of a forecasting system I had created called Vladimir. Vlad was basically a two-step system. The first step was categorization: What type of player is this? What is the shape of his performance? Is he a slow masher? A waterbug? A power-and-speed guy? How old is he? The second step was a neural-net system, which basically "walked" the player in question down his expected career path. I used Clay Davenport's DTs (Davenport Translations) as the inputs for the system, because they took care of removing park and league effects.
There are a lot of problems with a system like this.
First, you have to categorize different types of players. I messed around with SPSS (a statistical package I had) a lot, and assessed a lot of things like rates of singles, doubles, triples, stolen bases, missed games, walks, etc. You don't just look at those things individually, or during a single season. You track effects across different performance metrics, and across adjacent seasons. After a lot of equivocating, I basically decided that my best option was to create 26 different career paths, and use those as sort of the baseline for figuring out what a guy's development would be like.
Which, of course, leads me to the second problem. The really interesting guys, the guys whose likely development you'd like to know more about, are the atypical ones: the guy whose stat line is different from any you've seen. How do you forecast Rickey Henderson at age 20? Alex Rodriguez at age 19? Tony Phillips at age 35?
I've started to work–in fits and starts–on a new forecasting system. It's very difficult to do well.
One of the best features of the old Vlad system came about because of an error. When I asked it to try to forecast playing time, it kept trying to give crappy players zero or negative plate appearances, and kept trying to find a way to get players like Barry Bonds 1,200-1,500 at-bats. While this might be great if you're a Giants fan, it pretty much sucks if you're trying to use your forecasting tool to predict team outcomes.
But before I get down to the hard, cranky numbers that I no longer write about very often, I need to figure out exactly what a forecasting system should try to do. I don't want to just minimize the root mean square error (RMSE) of the BA, OBP, SLG and SB/CS for the players I'm forecasting. There are a lot of good systems that can do a reasonable job of that. Eyeball the previous three years' stats yourself, check the player's age and make a guess, and you'd be surprised how well you can do.
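For readers who haven't run into it, here is a minimal sketch of the RMSE calculation mentioned above. The OBP figures are made up purely for illustration; they are not real forecasts or real results.

```python
import math


def rmse(forecast, actual):
    """Root mean square error between paired forecast and actual values."""
    if len(forecast) != len(actual) or not forecast:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(f - a) ** 2 for f, a in zip(forecast, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical OBP forecasts and outcomes for three players (invented numbers).
forecast_obp = [0.350, 0.320, 0.400]
actual_obp = [0.340, 0.330, 0.380]

print(round(rmse(forecast_obp, actual_obp), 4))
```

A lower number means the forecasts sat closer to what actually happened, but it says nothing about whether the system spotted the surprising players.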
I want to be able to identify those guys who are going to improve, decline or fail to develop more than could be expected by a reasonable person eyeballing the stats, or a simple trend algorithm based on three to five years of performance and age data.
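To make the "simple trend algorithm" concrete, here is one rough sketch of what such a baseline might look like. Everything specific in it is an assumption made for illustration: the 3-2-1 weights, the peak age of 27, and the size of the age nudge are not taken from any actual system. The quantity of interest is the gap between what a player actually did and what this kind of naive baseline expected.

```python
def baseline_forecast(recent_seasons, age, weights=(3, 2, 1),
                      peak_age=27, age_step=0.002):
    """Naive projection: weighted mean of recent seasons plus a small age nudge.

    recent_seasons: a rate stat (e.g. OBP) for recent years, most recent first.
    weights, peak_age and age_step are illustrative assumptions, not fitted values.
    """
    paired = list(zip(recent_seasons, weights))
    if not paired:
        raise ValueError("need at least one season of data")
    weighted_mean = sum(v * w for v, w in paired) / sum(w for _, w in paired)
    # Players younger than peak_age are nudged up, older players down.
    return weighted_mean + (peak_age - age) * age_step


def surprise(actual, recent_seasons, age):
    """How far the actual result landed from the naive baseline."""
    return actual - baseline_forecast(recent_seasons, age)


# Hypothetical 24-year-old with OBPs of .360, .340, .330 (most recent first).
print(round(baseline_forecast([0.360, 0.340, 0.330], age=24), 4))
```

A large positive or negative `surprise` is the kind of player a better system would need to flag in advance.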
So I've started to work on it backwards, with a qualitative bent. I took a few minutes and thought about a few guys who haven't developed as well as I expected them to, or suffered a more rapid or noteworthy decline. Let's take a look at some of these players:
Let's focus on the first few years of DeShields' career. He demonstrated he could hit for average, had a year with a fair number of doubles, showed lots of speed–albeit not used optimally–and a batting eye in the neighborhood of 80-100 walks per season. In the 1990 and 1991 seasons, that's even a pretty reasonable amount of pop, especially for someone in the majors at age 21. It looked like a truly superlative start. Then there's the failure to develop, and the broad decline across all aspects of offense.
Here's another second baseman, and that makes sense. One of the factors I'm going to have better data on this time, in terms of creating the model, is my hunch that guys who play positions where they occasionally get ground into hamburger, and get a lot of nicks and cuts, will have a rough development or decline slope.
Again, we see someone who showed a lot of promise as a young player, with demonstrated abilities to hit for average and power. Walker's strikeout rate was pretty low, and if memory serves, his platoon splits were pretty much Tony Gwynn against righties and Stevie Wonder against lefties.
Could Walker's relationship with Tom Kelly, such as it was, be a major factor in his rather restrained development? Possibly. Is there any way to model that? Not without a lot more data, but those results would be far more interesting and valuable to potential clients. Find a hitting coach that makes a 20-point difference in OPS? Pay him a fortune. Those age-27 and age-28 years have a nice Coors Field bounce.
Like DeShields and Walker, Cruz was promising as a youngster. His walk rate was very high, up among the league leaders during his first three years in the league, and then you find him on The Jim Rome Show during the 2001 season making jokes about how he likes being a hacker. While his overall level of performance is pretty flat, Cruz has demonstrated almost all the offensive skills except hitting for average. He's even a very efficient basestealer. The performance curve, overall, shows little development.
Stewart's stagnation may be the most frustrating of all. He has significant defensive limitations because of his shoulder, which means he can really only play left field or DH–so he has to hit. His minor-league numbers indicated great plate discipline, both in terms of a high walk rate and a low strikeout rate, including 89 walks in just under 500 at-bats at Double-A Knoxville in 1995. If you have Stewart after his 1998 season, you have to be pretty happy. He looked like Rickey Henderson lite: does everything Rickey did, but not quite as well. That's an awfully good ballplayer. So what happened?
This isn't meant to be definitive. One of the worst things about statistics-based performance evaluation and forecasting is that you're at the mercy of the information. Think about all of the things that are outside the scope of the data collected on player performance in the majors and minors: injury information, coaching information, travel information, the pitchers a guy faced. Just as the run support for pitchers doesn't even out over the course of one season, skewing win totals, neither do these factors, and they're not trivial.
Yes, all this data exists, but it's fairly problematic to get the data, put it all together, then go through the necessary rigor to develop the forecasting models. Finally, if you do put all the data together, you start running into problems in terms of degrees of freedom. In short, if you're trying to slice the data into 900 different categories, you need to have a fair amount of data in each category in order to have any confidence in your model. (Then again, there is a fair amount of artificial turf involved with these guys, isn't there? Hmmmm…something to think about, anyway.)
With these limitations in mind, what's the best approach? Good question. One reason I'm writing this column is that I'd like to receive some wish lists from people. Some people want a range of forecasts, a summary of the probability distribution rather than a single line, like "Frank Thomas is 75% likely to hit between .284/.362/.448 and .320/.402/.513." That's probably a good approach, perhaps with information on whether or not a player is an outlier in terms of his likely breakout or collapse.
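One way a range like that could be produced, sketched under heavy assumptions: suppose you already have a set of plausible outcomes for a player (say, the next-season OBPs of comparable players, which is my assumption here, not a described method). The central 75% band is then just the span between the 12.5th and 87.5th percentiles. The outcome values below are invented.

```python
def quantile(ordered, q):
    """Linear-interpolated quantile of an already-sorted list, 0 <= q <= 1."""
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def central_interval(outcomes, coverage=0.75):
    """Return (low, high) bounding the central `coverage` share of outcomes."""
    if not outcomes or not 0 < coverage < 1:
        raise ValueError("need outcomes and a coverage strictly between 0 and 1")
    ordered = sorted(outcomes)
    tail = (1 - coverage) / 2
    return quantile(ordered, tail), quantile(ordered, 1 - tail)


# Hypothetical next-season OBPs for nine comparable players (invented numbers).
comparable_obps = [0.300 + 0.010 * i for i in range(9)]
low, high = central_interval(comparable_obps, coverage=0.75)
print(round(low, 3), round(high, 3))
```

The same idea would apply separately to BA and SLG; flagging breakout or collapse candidates would mean looking at how lopsided the tails are, which this sketch does not attempt.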
Baseball is exciting and gets into your blood in large part because it's a combination of the highly predictable and highly unpredictable. Over 162 games, you get a much lower chance of a fluke team clawing its way to the top on the back of a few odd bounces or bad calls; there is some semblance of meritocracy here. But you still get to see things like Freddie Patek McGwiring the snot out of the ball, Doug Dascenzo coming in to shut down the opposition for an inning, Jose Canseco turning F9 into HR, or even Brady Anderson hitting 50 bombs.
In many ways, forecasting is my personal windmill. I know it's impossible to do it exceptionally well, but I also believe that putting some parameters around expected performance is crucial to the successful management and operation of a major-league club. We do forecasting implicitly when we comment on whether we like a trade or a free-agent signing; we're making assumptions about the likely future performance of those involved in the deal. The nature of the game–and the universe–makes forecasting a task that is both somewhat unattainable, and even more irresistible because of that. When you learn the answer to a question, you end up with two questions that follow the answer, so you're compelled to keep following the string.