February 28, 2006
The World Baseball Classic
Davenport Translations, Part One
Thursday night, at 9:30 Eastern, Korea and Taiwan will face off in the first game of the World Baseball Classic. There's a good chance that it will be one of the most important games of the round, since the winner is likely to advance and the loser isn't. For those who are getting desperate for a baseball fix, it will be televised--live on ESPN Deportes, and on tape delay (at 1:30 in the morning) on ESPN2. I, for one, will be setting my TiVo to record it.
If you have been in a box all winter, the WBC (and, just for the record, sharing acronyms with organizations that govern boxing cannot be considered a good thing) is a 16-team tournament that will be played like a baseball version of soccer's World Cup. Players will compete for their "home" country, with some latitude in determining just what "home" is--although anyone who follows the World Cup, or even the Olympics, already knows how national ties can be established. The Italian and Dutch teams, in particular, stand to benefit from American-born and -raised players of the appropriate ancestry supplementing their rosters.
Provisional rosters for all teams were released last month; final rosters won't be needed until the teams are actually ready to play. We have good, reliable statistics from the past few years for the vast majority of the players in this tournament, and the need to cover the rest has pushed me into researching the leagues they play in. I think we are now in a position to make a reliable estimate of the relative strength of each team.
I don't really want to get involved in discussions of how much pride, prestige, or other nationalistic ambitions of various countries are riding on the results of the tournament, except to say that, in a real baseball sense, it won't prove anything. One of BP's catch phrases has always been that small sample sizes are essentially meaningless; the appropriate phrasing for current purposes is that "anything can happen in a short series." The WBC as a whole is a short series--teams will play between three and eight games, depending on how they fare, and it is further broken into even shorter series. The first round of the tournament consists of four four-team round robins, each essentially a three-game tournament. Half the field will be eliminated, and the second round will have two more four-team round robins, a second three-game tournament. The four teams that survive will move on to a two-round single-elimination tournament (semifinals and a final).
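The three-to-eight range follows from simple arithmetic on the format. A minimal Python sketch, assuming every pool has four teams and sends its top two forward as described; the helper names are hypothetical:

```python
from __future__ import annotations

POOL_SIZE = 4                   # teams per round-robin pool
ADVANCE_PER_POOL = 2            # top two in each pool move on
GAMES_PER_POOL = POOL_SIZE - 1  # each team plays every pool rival once


def field_after_pools(teams: int) -> int:
    """Teams still alive after one round of four-team round robins."""
    return (teams // POOL_SIZE) * ADVANCE_PER_POOL


def games_per_team(field: int = 16) -> tuple[int, int]:
    """Fewest and most games a team can play in the whole tournament."""
    after_round_one = field_after_pools(field)            # 16 -> 8
    after_round_two = field_after_pools(after_round_one)  # 8 -> 4
    # Single elimination over a power-of-two field takes log2(field) rounds.
    bracket_rounds = after_round_two.bit_length() - 1     # 4 -> 2
    fewest = GAMES_PER_POOL                               # eliminated in round one
    most = 2 * GAMES_PER_POOL + bracket_rounds            # plays in the final
    return fewest, most


print(games_per_team())  # (3, 8)
```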
With only three games per team in the round robins, the question of who advances is frequently going to come down to tiebreakers. Before considering what kind of tiebreakers will be used, it is useful to realize that in a four-team round robin, there are only four ways for the standings to turn out. Yes, you can have different teams filling each slot, producing 38 permutations of the standings (64 if you count the different who-beat-whom forms of ties), but they reduce to just four basic forms.
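The 64 and 38 figures can be checked by brute force. A short Python sketch (the team labels A through D are placeholders) that plays out every possible result of the six pool games and groups the resulting standings by shape:

```python
from collections import Counter
from itertools import combinations, product

TEAMS = ("A", "B", "C", "D")
GAMES = list(combinations(TEAMS, 2))  # the six games of a four-team round robin


def all_outcomes():
    """Yield a team -> wins mapping for each of the 2**6 ways the games can go."""
    for results in product((0, 1), repeat=len(GAMES)):
        wins = dict.fromkeys(TEAMS, 0)
        for (first, second), result in zip(GAMES, results):
            wins[first if result == 0 else second] += 1
        yield wins


outcomes = list(all_outcomes())
# A "standing" records how many games each specific team won.
standings = {tuple(w[t] for t in TEAMS) for w in outcomes}
# A "form" ignores which team is which: just the sorted win totals.
forms = Counter(tuple(sorted(s, reverse=True)) for s in standings)

print(len(outcomes))   # 64
print(len(standings))  # 38
for form, count in sorted(forms.items(), reverse=True):
    print(form, count)
```

The four forms come out as (3, 2, 1, 0) with 24 standings, (3, 1, 1, 1) and (2, 2, 2, 0) with 4 each, and (2, 2, 1, 1) with 6.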
The most orderly one involves one team going 3-0, one going 2-1, one going 1-2, and one going 0-3. This is the only one of the four that does not require a tiebreaker. The 3-0 team is the winner of the bracket (meaning that they will get home field advantage in the next round), and the 2-1 team finishes second and advances.
Then there is the 2-1, 2-1, 1-2, 1-2 possibility. This one resolves rather easily; whichever of the 2-1 teams won the head-to-head matchup advances as the winner of the bracket, and the other one advances as the second place team.
Then there are the two "circle of death" outcomes, where A beats B, B beats C, and C beats A, and those three either all beat D or all lose to D. If they all lose to D, then the standings look like 3-0, 1-2, 1-2, 1-2. In this case, the 3-0 team advances as the winner, and the second-place team will be whichever of the three 1-2 teams allowed the fewest runs per inning during this round robin. The second tiebreaker uses earned runs instead of all runs. If there is still a tie, the third tiebreaker is--I kid you not--batting average. If that doesn't do it, they draw lots.
The fourth case is the circle of death where the team outside the circle loses all its games, so the standings go 2-1, 2-1, 2-1, 0-3. This is really the worst case of all, because now the tiebreakers not only decide which two teams advance, but also decide which one is the "winner" and which is the "runner-up."
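The tiebreaker sequence for a three-way tie amounts to a sort with a random fallback. A minimal Python sketch, assuming each team's round-robin totals are available; the `PoolRecord` fields and the sample numbers are hypothetical, and innings are tracked as outs so partial innings stay exact:

```python
import random
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class PoolRecord:
    team: str
    runs_allowed: int
    earned_runs_allowed: int
    outs_recorded: int  # defensive outs in this round robin; 3 outs = 1 inning
    hits: int
    at_bats: int


def tiebreak_key(r: PoolRecord):
    """Lower is better: runs per inning, then earned runs per inning, then -batting average."""
    innings = Fraction(r.outs_recorded, 3)
    return (
        r.runs_allowed / innings,
        r.earned_runs_allowed / innings,
        -Fraction(r.hits, r.at_bats),
    )


def break_tie(tied, rng=random):
    """Order tied teams by the tiebreakers; identical keys fall back to drawing lots."""
    drawn = list(tied)
    rng.shuffle(drawn)  # the "draw lots" step
    # sorted() is stable, so teams with identical keys keep their drawn order.
    return sorted(drawn, key=tiebreak_key)


# Hypothetical three-team circle of death.
circle = [
    PoolRecord("A", 10, 9, 81, 25, 100),
    PoolRecord("B", 8, 8, 78, 22, 95),
    PoolRecord("C", 8, 6, 81, 20, 98),
]
order = [r.team for r in break_tie(circle)]
print(order)  # ['C', 'B', 'A']
```

In the 3-0, 1-2, 1-2, 1-2 case only the first team in `order` advances (as runner-up); in the 2-1, 2-1, 2-1, 0-3 case the first two advance, as winner and runner-up.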
Knowing the structure of the tournament and the provisional rosters, I was able to "play" the tournament a million times, using more or less the same program I use to play out the regular season. I built up a profile of team strength using, roughly, their ten best hitters (with allowances for position) and their ten best pitchers. I expressed the strength rating in terms of a 162-game schedule; a "100" rating means that, if this team played a 162-game balanced schedule against this field (and yes, I know you can't have a balanced 162-game schedule with 16 teams), they would win 100 games. This is the number that I use in place of the team's third-order wins for establishing the head-to-head win percentages.
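The article doesn't spell out how ratings become head-to-head odds. One common choice, and the assumption in this sketch, is Bill James's log5 formula applied to each team's rating divided by 162. Pool ranking follows the rules described above, except that three-way ties are settled by a random draw, a simplification made because this sketch doesn't simulate runs, so its numbers won't exactly match the author's. The Pool A ratings are taken from the article; the function names are hypothetical:

```python
import random
from itertools import combinations, groupby


def log5(p_a: float, p_b: float) -> float:
    """Bill James's log5: chance that team A beats team B, given their win pcts."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def rank_pool(wins, beat, rng):
    """Most wins first; two-way ties by head-to-head; three-way ties by a
    random draw (a stand-in for the runs-based tiebreakers)."""
    order = []
    by_wins = sorted(wins, key=lambda t: -wins[t])
    for _, group in groupby(by_wins, key=lambda t: wins[t]):
        tied = list(group)
        if len(tied) == 2:
            a, b = tied
            if (b, a) in beat:  # b won the head-to-head game
                tied = [b, a]
        elif len(tied) > 2:
            rng.shuffle(tied)
        order.extend(tied)
    return order


def simulate_pool(ratings, trials=100_000, seed=2006):
    """Estimate each team's chance of finishing first or second in its pool."""
    rng = random.Random(seed)
    pct = {team: rating / 162 for team, rating in ratings.items()}
    first = dict.fromkeys(ratings, 0)
    second = dict.fromkeys(ratings, 0)
    for _ in range(trials):
        wins = dict.fromkeys(ratings, 0)
        beat = set()  # (winner, loser) pairs
        for a, b in combinations(ratings, 2):
            if rng.random() < log5(pct[a], pct[b]):
                winner, loser = a, b
            else:
                winner, loser = b, a
            wins[winner] += 1
            beat.add((winner, loser))
        order = rank_pool(wins, beat, rng)
        first[order[0]] += 1
        second[order[1]] += 1
    return {t: (first[t] / trials, second[t] / trials) for t in ratings}


pool_a = {"Japan": 108, "Korea": 94, "Taiwan": 67, "China": 22}
for team, (win, runner_up) in simulate_pool(pool_a).items():
    print(f"{team:7s} win {win:.3f}  second {runner_up:.3f}  advance {win + runner_up:.3f}")
```

log5 is only one way to turn two overall win percentages into a head-to-head probability; it has the convenient property that the two sides' chances always sum to one.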
So what do we have?
Pool A - Japan, Korea, Taiwan, China
The Asian teams were grouped together as a concession to geography and time--they will play their first-round games almost a week before everybody else, giving the two teams that advance a chance to fly to the States and recover from jet lag before playing any games that count. This was the most difficult group to handicap: it had (by far) the fewest major league-affiliated players of any group, as most of its players play in their own national leagues. Even when I could find a web page with the statistics, it was in a language and character set I don't know how to read, and even when I could find the batting and pitching statistics, I couldn't figure out what position the players played.
The word is that Japan is taking the WBC very seriously, and wants to take the title home more than most of the other teams. Unfortunately for them, a few of their top players, who all happen to be playing in the US now--Hideki Matsui, Tadahito Iguchi, and Kenji Jojima--opted out of the tournament. I think the decisions by Iguchi and Jojima will hurt more, because the next-best players at second base and catcher are so far behind them. Even without those three, they are still a very strong team, scoring a 108 for me, fourth-best in the tourney, and pitching is their real strength. When I first ran the program two weeks ago, I rated them as the most likely team in the entire tournament to advance to the second round. Since then, I've upgraded the ratings for Korea and Taiwan, and that's no longer true, though they're still the favorite to win the pool. (See translations for all players on the Japanese provisional roster at wbc.JPN.hit and wbc.JPN.pit. And don't be put off by the mysterious file extensions--the files can be viewed either in your browser as plain text or in a text editor like Notepad.)
When I first ran the numbers for Korea, I rated them at about a 75. Since then, I have found the website for the Korean Baseball Organization, which has player stats, and I've learned to read the Korean alphabet, which allows me to at least match up players' names--even those of the foreign players. The WBC is the first time that I've been able to set up everything I needed to make a DT for the Korean league, and they came out a little better than I expected. As with the Japanese leagues, I found it necessary to make an allowance for home runs, above and beyond the difficulty rating. After that, their difficulty rated as comparable to American Double-A leagues (compared with Japan, which rates at Triple-A after the adjustment). Their pitchers look better than their hitters--particularly a relief pitcher by the name of Seung-Hwan Oh, who was their Rookie of the Year. He's 22, with translated SO/BB numbers of 8.5 and 2.1, and I'd sign him up as readily as I would, say, Huston Street. I don't know yet whether it is a flaw in the system, but I have the Korean pitchers in the US--like Chan Ho Park and Sun-Woo Kim--rated below most of their homegrown counterparts, like Oh and Min Han Son. The hitting looks a bit weak, although I like Tae-Kyun Kim and Jin-Young Lee. Overall, I have them rated at 94. (Korea's Batting DT and Pitching DT are also available.)
Taiwan was an even bigger challenge, and not as well met. They're also known as "Chinese Taipei," the name they have to use in all international sports competitions (but since I'm not officially anything, that will be the last time I refer to them as such). I have to admit that I don't have anything on Taiwan's players beyond what I see in the BA Almanac. There weren't many players with American experience, and there weren't enough stats listed to do even a complete EqA. From the parts of the EqAs I could construct, the Taiwanese leagues look like they should be around South Atlantic League level…maybe Carolina League on a good day. Given the players from their roster that I could find, and guessing that the ones I couldn't find would be somewhat worse, I wound up rating the Taiwanese team at 67. (Taiwan's DTs: Batting DT, Pitching DT.)
The Chinese team was even harder to evaluate, since there don't appear to be any players who have moved between the Chinese league and any other league. I have two pieces of evidence I can use to evaluate the Chinese team. One is their performance at the 2005 Baseball World Cup, an event held in Holland last November and attended by 18 countries, with Cuba (who takes it very seriously) running away with the title. The Chinese World Cup team looks like it had a lot of the same players who are on their WBC team; they went 3-5 at that tournament and were outscored 58-42. Comparing that to the Cuban team (whose players in the World Cup were all regulars in the Cuban Serie Nacional, which I can translate) suggests that the Chinese team would rate at about a 22 on the scale I'm using. If that were true, it would imply that the Chinese league, based on the stats I see in that BA Almanac, would have a difficulty level well below that of the Appalachian League. My second piece of evidence confirms that. Dividing a league's top batting average by its lowest ERA gives a very rough idea of the league's difficulty level (although season length also figures into the equation). The higher the ratio, the worse the league is. For most O.B. leagues, the ratio is around .13. By the time you get down to the Northwest League, it reaches .21; the Appalachian gets to .26. The Dominican Summer League is up to .44. The Japanese leagues and Korea are at .14; Taiwan is slightly weaker at .20. Cuba's score of .19 is consistent with their rating as roughly equal to our short-season leagues. China's score, with a .397 league-leading hitter and a 0.62 ERA leader, is .64, the highest of any league I checked, including the independent leagues, the Italian league (.41), and the Dutch (.52).
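The rule of thumb above, top batting average divided by lowest ERA, is easy to compute. A minimal Python sketch using the figures quoted in the text; the `REFERENCE` labels are shorthand for the leagues named above, and the season-length adjustment the text mentions is deliberately left out:

```python
def difficulty_ratio(top_batting_avg: float, lowest_era: float) -> float:
    """Top batting average divided by lowest ERA; higher suggests a weaker league.
    Ignores season length, which the text notes also matters."""
    return top_batting_avg / lowest_era


# Ratios quoted in the text, for comparison.
REFERENCE = {
    "most O.B. leagues": 0.13,
    "Japan, Korea": 0.14,
    "Cuba": 0.19,
    "Taiwan": 0.20,
    "Northwest League": 0.21,
    "Appalachian League": 0.26,
    "Italy": 0.41,
    "Dominican Summer League": 0.44,
    "Netherlands": 0.52,
}

china = difficulty_ratio(0.397, 0.62)  # China's batting leader and ERA leader
print(f"China: {china:.2f}")  # China: 0.64
print(all(ratio < china for ratio in REFERENCE.values()))  # True
```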
There's little doubt in my mind that their home league is, in fact, that weak, and that the team as a whole deserves their 22 rating.
Simulation results (probability of winning the pool, finishing second, and advancing from the first round):

Team     Rating    Win     Second   Advance
Japan      108    .5046    .3148    .8195
Korea       94    .3357    .3780    .7137
Taiwan      67    .1481    .2614    .4095
China       22    .0116    .0458    .0574

Taiwan's best chance to advance comes from the 3-0, 2-1, 1-2, 0-3 scenario, where Japan wins out, China loses out, and Taiwan upsets Korea in their head-to-head match…which is why the first game of the tournament will be so important.
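To see how much that best-case path leans on the opener, here is a small sketch of its probability. It assumes, as a hypothetical stand-in for the author's unpublished head-to-head method, the log5 formula applied to each rating divided by 162; the function names are illustrative:

```python
from math import prod

RATINGS = {"Japan": 108, "Korea": 94, "Taiwan": 67, "China": 22}


def log5(p_a: float, p_b: float) -> float:
    """Chance that team A beats team B, given their overall win percentages."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def p_beats(winner: str, loser: str) -> float:
    """Head-to-head chance under the assumed rating/162 + log5 model."""
    return log5(RATINGS[winner] / 162, RATINGS[loser] / 162)


# (winner, loser) for all six games in Taiwan's best-case scenario:
# Japan sweeps, China is swept, Taiwan upsets Korea.
BEST_CASE = [
    ("Japan", "Korea"), ("Japan", "Taiwan"), ("Japan", "China"),
    ("Korea", "China"), ("Taiwan", "China"), ("Taiwan", "Korea"),
]

p_upset = p_beats("Taiwan", "Korea")
p_path = prod(p_beats(w, l) for w, l in BEST_CASE)
print(f"Taiwan beats Korea:   {p_upset:.3f}")
print(f"Whole best-case path: {p_path:.3f}")
```

This is just one of the 64 possible pool outcomes, so it is not Taiwan's total chance of advancing; it only shows that the Korea game is a factor in Taiwan's likeliest route through.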
This is a good place to stop, so next time we'll take a look at the DTs of Pools B, C and D.