Thanks largely to the smart and forward-thinking people in its Advanced Media wing, Major League Baseball has moved toward providing more and more information to its fans. One exception is the subject of my column today: All-Star voting. It used to be that baseball provided a rundown of every player’s vote total, including write-ins that received a material number of votes, and all the way down to the player ranked at the bottom of his position pool. But now all we get is the top five at each position. How many Nationals loyalists voted for Nick Johnson–even though he’s yet to play a game this season? Is Trot Nixon or Emil Brown the lowest-ranked outfielder in the American League? How close is Curtis Granderson to cracking the top 15 (and how in the hell did Craig Monroe get in there)? Perhaps this is done for political reasons; nobody wants to hurt Nick Punto’s feelings. But inquiring minds want to know these things.

In fact, not only should MLBAM be providing vote totals for every player listed on the ballot, but it should be breaking those vote totals down in as many ways as possible. Who has received the most votes in the past week? How much of Prince Fielder’s support comes from Wisconsin? Are there significant differences between Internet and ballpark ballots? Who is winning the foreign vote?

This sort of thing would increase enthusiasm for and participation in the balloting process, especially if tied in with community-oriented elements. Create a Facebook page for J.J. Hardy’s candidacy where fans can write in with their arguments on his behalf. Send text messages to fans when their favorite player rises or falls in the standings. The All-Star Game is the one time each season when baseball just bucks up and has some fun, and yet the balloting counts are treated like some sort of trade secret.

I’m complaining about this because I’ve long wanted to write a column on team-by-team biases in All-Star balloting. How much does it help to be on the Yankees, or hurt to be on the Pirates? Unlike in every other year, I’ve remembered that idea two weeks before the All-Star Game is played, rather than two weeks after. This analysis is made more difficult, of course, by the lack of comprehensive voting results. Nevertheless, we will press forward.

The idea is to create a simple model of the voting results based on a player’s performance. For example, the model might conclude that David Wright should have about 1.0 million votes based solely on his performance. If instead David Wright has 1.4 million votes, that tells us something about the popularity of David Wright, or of the team that he plays for.

More specifically, the inputs for the model are a player’s VORP thus far in 2007 and his VORP in 2006. There are certainly other variables that could be considered–do fans like batting average guys more than home run guys?–but since this column is meant to be on the whimsical side, we’ll just keep things simple.

In addition, we need a player’s vote total. We have this information for the top five qualifiers at each position, but for everyone else, we have to guess. It looks like there have been about six million ballots cast thus far–the cumulative totals for the top five players at each position generally sum up to between four million and five million, and you have to make some allowance for the downballot guys. Thus, we start with a baseline of six million votes, subtract out the voting totals for the top five, and divide the remaining votes evenly among the rest of the nominees.
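
The fill-in rule for downballot players can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the 14-nominee pool, and the example vote totals are hypothetical placeholders rather than real 2007 figures, and the six-million baseline is treated as applying to a single position.

```python
def impute_downballot_votes(top_five_votes, nominees_at_position,
                            baseline_total=6_000_000):
    """Estimate the vote total for each nominee outside the published top five.

    Takes the assumed baseline, subtracts the published totals, and splits
    what is left evenly among the remaining nominees at that position.
    """
    remaining_nominees = nominees_at_position - len(top_five_votes)
    if remaining_nominees <= 0:
        raise ValueError("need at least one nominee outside the published list")
    leftover = baseline_total - sum(top_five_votes)
    if leftover < 0:
        raise ValueError("published totals exceed the assumed baseline")
    return leftover / remaining_nominees


# Hypothetical published totals for one position (NOT real 2007 data).
example_top_five = [1_800_000, 1_100_000, 700_000, 500_000, 400_000]
# Hypothetical pool of 14 nominees at the position, so 9 are unpublished.
example_estimate = impute_downballot_votes(example_top_five,
                                           nominees_at_position=14)
```

Every unpublished player at a position ends up with the same estimate, which is one reason the resulting data set is crude.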

By using 2007 and 2006 VORP alone (technically, the square of 2007 and 2006 VORP, which tracks the results more closely than any sort of linear estimate), it turns out that the model is able to explain about 50 percent of the variance in the vote counts. That’s actually not a bad total, considering how crude both the model and the underlying data set are. We gain a few more points of predictive power if we tweak the results to ensure that the proper number of votes is allocated at each position. Without this adjustment, there are too few predicted votes for National League catchers, for example, since there aren’t many high VORPs at that position this year, but there are too many at first base in the American League, since the league has cherry-picked some of the stronger DHs and listed them on the ballot as first basemen.
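
For readers who want to try something similar, here is a rough sketch of a squared-VORP fit followed by a per-position adjustment. Several things here are assumptions, since the column does not spell them out: the use of ordinary least squares, the intercept term, proportional rescaling within each position, and squaring VORP directly without special handling for negative values. All function names and the demonstration numbers are made up.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_vote_model(vorp_2007, vorp_2006, votes):
    """Least-squares fit of votes ~ b0 + b1 * VORP07^2 + b2 * VORP06^2.

    Returns (coefficients, predicted votes, R-squared).
    """
    rows = [[1.0, v7 ** 2, v6 ** 2] for v7, v6 in zip(vorp_2007, vorp_2006)]
    k = 3
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, votes)) for i in range(k)]
    coefs = solve_linear_system(xtx, xty)
    predicted = [sum(c * x for c, x in zip(coefs, r)) for r in rows]
    mean_votes = sum(votes) / len(votes)
    ss_res = sum((y - p) ** 2 for y, p in zip(votes, predicted))
    ss_tot = sum((y - mean_votes) ** 2 for y in votes)
    return coefs, predicted, 1.0 - ss_res / ss_tot


def rescale_by_position(predicted, actual, positions):
    """Scale predictions so each position's predicted total equals its actual total."""
    pred_total, actual_total = {}, {}
    for p, a, pos in zip(predicted, actual, positions):
        pred_total[pos] = pred_total.get(pos, 0.0) + p
        actual_total[pos] = actual_total.get(pos, 0.0) + a
    return [p * actual_total[pos] / pred_total[pos]
            for p, pos in zip(predicted, positions)]


# Synthetic demonstration data (NOT real players): the votes are generated
# exactly from the squared terms, so the fit should recover them.
demo_vorp_2007 = [5, 10, 20, 25, 30, 15]
demo_vorp_2006 = [10, 40, 20, 60, 30, 50]
demo_votes = [100_000 + 1_000 * v7 ** 2 + 250 * v6 ** 2
              for v7, v6 in zip(demo_vorp_2007, demo_vorp_2006)]
demo_coefs, demo_predicted, demo_r2 = fit_vote_model(
    demo_vorp_2007, demo_vorp_2006, demo_votes)
```

On real ballot data the fit would be far noisier than this synthetic example; the column reports roughly half the variance explained before the position adjustment.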

One interesting result is that 2007 performance is much more predictive of a player’s position on the ballot than 2006 performance. Specifically, each point of VORP accumulated in 2007 is worth about four times as much as a point of VORP accumulated in 2006. This is a slightly misleading result, since we’re comparing partial-season and full-season numbers. Nevertheless, it’s safe to conclude that fans are treating the All-Star Game as a reward for having a hot six, eight or ten weeks to start the season, rather than necessarily going with the guys they’d pick if their lives depended on winning a ballgame tomorrow.

This is a somewhat ironic result, because for years and years the argument was that the fans were too slow to recognize changes in performance, and kept electing the same veterans year after year. In fact, the process has almost completely reversed itself, to the point where the fans barely consider a player’s pedigree prior to 2007 at all. No doubt this has a lot to do with the introduction of Internet balloting, which puts the current year’s statistics just a mouse click away, and perhaps more specifically the influence of fantasy baseball, since no crowd of fans is more firmly in the what-have-you-done-for-me-lately camp than avid fantasy gamers.

Ultimately, this process speaks pretty well to the influence of the information revolution in baseball, although I side with Joe Sheehan in not liking the result: I tend to prefer picking my All-Stars based on who I think the best players are at any given time, rather than who has been the hottest over a three-month stretch.

As alluded to earlier, the “trick” is to figure out the differences between a player’s predicted vote total and his actual vote total, and to see how those differences play out by team. The model predicts, for example, that Alfonso Soriano should have about 0.8 million votes; he actually has 1.3 million, so we take the difference of 500,000 votes and place it in the Cubs’ column. We then repeat this process for each of the eight players a team has listed on the ballot to come up with an estimate of the residual number of votes that a player picks up based on playing for a particular team; those results follow below.

Mets          435,030
Red Sox       423,489
Brewers       395,426
Yankees       386,261
Dodgers       379,235
Tigers        338,581
Cardinals     117,479
Twins          98,612
Cubs           28,921
Astros          (100)
Giants       (20,353)
Reds         (27,129)
Braves       (54,679)
A's          (70,231)
Angels       (76,663)
Devil Rays   (77,627)
Mariners     (90,137)
Royals      (104,549)
White Sox   (105,308)
Nationals   (128,867)
Rangers     (135,562)
Pirates     (137,710)
Padres      (149,781)
D'Backs     (171,242)
Orioles     (179,958)
Blue Jays   (194,114)
Phillies    (205,883)
Indians     (211,400)
Rockies     (224,608)
Marlins     (237,113)

The way to read this is that a typical player would pick up about 435,000 votes simply by virtue of playing for the Mets, or lose about 150,000 votes by playing for the Padres. The teams toward the top of the list are pretty much those teams that you’d expect to see. It was obvious that the Mets were going to do well when I saw Jose Valentin’s name in the top five at his position.
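
In outline, the per-team figures come from grouping each player's actual-minus-predicted gap by club. The sketch below assumes the team number is the simple average of its players' residuals, which fits the "typical player" reading but is not stated outright. Apart from Soriano's rounded figures, which are quoted in the column, the rows are hypothetical.

```python
from collections import defaultdict


def team_residuals(players):
    """Average (actual - predicted) votes per team.

    `players` is an iterable of (team, predicted_votes, actual_votes).
    Averaging is an assumption; the column does not say how the eight
    player residuals are combined.
    """
    gaps = defaultdict(list)
    for team, predicted, actual in players:
        gaps[team].append(actual - predicted)
    return {team: sum(values) / len(values) for team, values in gaps.items()}


ballot = [
    ("Cubs", 800_000, 1_300_000),   # Soriano, rounded figures from the column
    ("Cubs", 400_000, 300_000),     # hypothetical teammate
    ("Padres", 500_000, 350_000),   # hypothetical player
]
residuals = team_residuals(ballot)
```

One well-liked star can pull a team's average up on his own, so a figure like this mixes team popularity with individual popularity.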

For the most part, those teams that get the biggest boost in the All-Star balloting are those that are doing the best at the box office; the correlation between the All-Star residuals and per-game attendance thus far in 2007 is .64. I do not know how much of this has to do with “ballot stuffing”–the relationship between attendance and All-Star voting is no stronger if you account for the number of home dates thus far in 2007–as opposed to attendance serving as a good proxy for a team’s popularity overall.
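
A correlation like the one quoted here can be checked with a few lines of code once the two series are lined up by team. The column does not say which coefficient was used; the sketch assumes the ordinary Pearson correlation, and the toy numbers are placeholders rather than actual residuals or attendance figures.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for four imaginary teams (NOT real data):
toy_vote_residuals = [400_000, 100_000, -50_000, -200_000]
toy_attendance_per_game = [45_000, 35_000, 30_000, 20_000]
toy_r = pearson_r(toy_vote_residuals, toy_attendance_per_game)
```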

There are also different degrees of “homerism” between different sets of fans. Fans in the northeast are very loyal to their clubs, with the notable exception of Philadelphia, a contrarian city where fans will find any excuse to rag on their own players. Fans in the upper Midwest are the next most loyal, especially in Milwaukee and Detroit, where the ballclubs are generating a ton of buzz right now. West Coasters are a lot more equivocal in their voting patterns.

We can subtract out the residual factor for each club to come up with “context-neutral” balloting results. I would not advocate doing this for picking the actual All-Star clubs; it is a popularity contest, after all, and the last thing I’d want to do is punish the Brewers because their fans are excited about them right now. Nevertheless, here is how the top vote-getters at each position would change:

Player          TEAM     LG        POS       Actual      Adjusted

Ivan Rodriguez  DET      AL        C         1363K       1024K
Joe Mauer       MIN      AL        C         952K        853K
Victor Martinez CLE      AL        C         567K        778K

David Ortiz     BOS      AL        1B        1810K       1387K
Justin Morneau  MIN      AL        1B        1063K       964K
Travis Hafner   CLE      AL        1B        503K        714K

Placido Polanco DET      AL        2B        1270K       931K
Robinson Cano   NYA      AL        2B        966K        580K
B.J. Upton      TBA      AL        2B        490K        568K

Alex Rodriguez  NYA      AL        3B        2543K       2157K
Mike Lowell     BOS      AL        3B        892K        469K
Adrian Beltre   SEA      AL        3B        321K        411K

Derek Jeter     NYA      AL        SS        2127K       1741K
Miguel Tejada   BAL      AL        SS        624K        804K
Orlando Cabrera ANA      AL        SS        512K        589K

Vlad Guerrero   ANA      AL        OF        2044K       2121K
Ichiro Suzuki   SEA      AL        OF        1410K       1500K
Magglio Ordonez DET      AL        OF        1446K       1107K
Grady Sizemore  CLE      AL        OF        803K        1014K
Torii Hunter    MIN      AL        OF        1085K       986K
Manny Ramirez   BOS      AL        OF        1387K       964K
Sammy Sosa      TEX      AL        OF        515K        651K
Gary Sheffield  DET      AL        OF        958K        619K

Albert Pujols   SLN      NL        1B        1198K       1081K
Prince Fielder  MIL      NL        1B        1454K       1059K
Nomar Garciap.  LAN      NL        1B        1011K       632K

Russell Martin  LAN      NL        C         1291K       912K
Brian McCann    ATL      NL        C         716K        771K
Bengie Molina   SFN      NL        C         688K        708K

Chase Utley     PHI      NL        2B        1289K       1495K
Craig Biggio    HOU      NL        2B        747K        747K
Jeff Kent       LAN      NL        2B        862K        483K

Miguel Cabrera  FLO      NL        3B        1142K       1379K
David Wright    NYN      NL        3B        1425K       990K
Chipper Jones   ATL      NL        3B        773K        828K

Jose Reyes      NYN      NL        SS        1365K       930K
Jimmy Rollins   PHI      NL        SS        595K        801K
J.J. Hardy      MIL      NL        SS        1152K       756K

Ken Griffey Jr. CIN      NL        OF        1641K       1668K
Alfonso Soriano CHN      NL        OF        1333K       1304K
Carlos Beltran  NYN      NL        OF        1698K       1263K
Barry Bonds     SFN      NL        OF        1213K       1233K
Matt Holliday   COL      NL        OF        866K        1091K
Andruw Jones    ATL      NL        OF        916K        971K
Carlos Lee      HOU      NL        OF        809K        809K
Jim Edmonds     SLN      NL        OF        560K        443K
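
The adjusted column is simply the actual total minus the team residual from the earlier list. As a quick check, the sketch below redoes that subtraction for the two National League corner spots, using the rounded vote totals and team residuals printed above. The variable and function names are my own, and the team codes are matched to the club names by hand.

```python
# Team residuals from the list above (Cardinals, Brewers, Mets, Marlins).
TEAM_RESIDUAL = {"SLN": 117_479, "MIL": 395_426, "NYN": 435_030, "FLO": -237_113}

# (team, position, actual votes rounded to the nearest thousand)
PLAYERS = {
    "Albert Pujols": ("SLN", "NL 1B", 1_198_000),
    "Prince Fielder": ("MIL", "NL 1B", 1_454_000),
    "David Wright": ("NYN", "NL 3B", 1_425_000),
    "Miguel Cabrera": ("FLO", "NL 3B", 1_142_000),
}


def context_neutral(players, team_residual):
    """Subtract each player's team residual from his actual vote total."""
    return {name: votes - team_residual[team]
            for name, (team, _pos, votes) in players.items()}


def leader(totals, players, position):
    """Name of the player with the most votes at the given position."""
    candidates = [name for name, (_team, pos, _votes) in players.items()
                  if pos == position]
    return max(candidates, key=lambda name: totals[name])


actual = {name: votes for name, (_team, _pos, votes) in PLAYERS.items()}
adjusted = context_neutral(PLAYERS, TEAM_RESIDUAL)

for position in ("NL 1B", "NL 3B"):
    print(position, leader(actual, PLAYERS, position), "->",
          leader(adjusted, PLAYERS, position))
```

Because the inputs are rounded to the nearest thousand, the results can differ from the table by a vote or two in the last displayed digit.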

There are only two positions at which the projected starter would change if everyone played for the American Neutrals, and those are the two infield corners in the National League, where Albert Pujols pulls back just ahead of Prince Fielder, and Miguel Cabrera moves way ahead of David Wright. If picking Wright over Cabrera is the most we have to criticize in this year’s All-Star balloting, then the fans have come an awfully long way.