September 16, 2011
Between The Numbers
BP and the Palmer Database
Baseball fans may occasionally (he said drolly) disagree over the relative importance of various statistics, but all baseball fans agree about the importance of keeping a record of how players have performed throughout history. Whether you care about a player’s batting average or his True Average, the raw numbers are a vital part of how baseball fans engage with the game.
Unfortunately, keeping those records throughout baseball history has been an arduous and incomplete task—records have been lost, mistabulated, or rendered illegible. Reconstructing an accurate history of the game through its numbers is a laborious process.
Very few people have done as much to tabulate an accurate statistical record of the entirety of professional baseball as Pete Palmer. Palmer, along with Bill James, is one of the founding fathers of sabermetrics—he introduced OPS and linear weights, among dozens of other new metrics. But he’s also done invaluable work correcting baseball’s historical record. Joe Hamrahi and Pete had some discussions about licensing the Palmer dataset, and we’re proud to announce that we’ve partnered with Palmer and Gary Gillette to bring you Palmer’s painstakingly compiled database of player statistics.
This is widely considered to be the most authoritative set of player statistics available; these are the same stats used by publications like the ESPN Baseball Encyclopedia and available on sites like Baseball-Reference.
How much of a difference does it make? Let’s compare batting lines from Joe Tinker (of the famous trio of Tinker, Evers, and Chance) during his 1902 season:
Palmer uncovered three games that were missing from the stats previously reported on our player cards; with the corrected data, Tinker picks up a handful of extra hits and another RBI. More interesting is the total number of strikeouts, a number previously omitted altogether from our records. So we’re not just filling in a few of the blanks here and there; we’re adding some significant new information to our evaluation of players.
And for the first time, the entirety of the statistics at our disposal is available through our sortable reports, including the custom sortables. Unfortunately, seasons where we lack play-by-play accounts will not have all the data that is available for recent years; we’re never going to know how many doubles each and every deadball-era pitcher allowed, as much as we may want to. But now we truly have unified access to the entirety of baseball history through the cards and sortables.
We’ve also worked to improve the accuracy of our biographical data for players. Invaluable help in this effort came from the Society for American Baseball Research, particularly Data Czar Ted Turocy and Geoff Harcourt, who provided their master player register as well as assistance with integrating it into our master player tables. Some biographical details, chiefly names, are even trickier than player stats: figuring out whether a player went by William, Will, or Bill can be difficult at best, and some players were inconsistent, making the task even tougher. Nobody is more devoted to chronicling the history of baseball than the people at SABR, though, and having their records available to us has dramatically improved the scope and the accuracy of our biographical data for historical players.
Our analysis can only be as good as the data that underlies it, so having this data available to us is a great boost to our efforts here at Baseball Prospectus. But having good data doesn’t guarantee good analysis, and BP was founded on the belief that we could provide insight by analyzing the statistical record of baseball and connecting it to how a player provides value to a team. This is just a building block; we’re gearing up to provide revised formulas for such mainstays as WARP, TAv, and Fair RA that will help illuminate the raw stats we’re showing here by adding context. Consider this a teaser, if you will.