Lies, Damned Lies: Defining a Market, Part One

Are you ready for some geography?

One of my favorite sabermetric pieces is Mike Jones’ study on market sizes. By making a couple of common-sense adjustments to standard assessments of market size based on city-level or metropolitan-level population, Mike was able to go a long way toward creating truer and more intuitive assessments of the relative market sizes of different baseball clubs. I liked his data so much that I used it as the basis for my attendance study in Baseball Between the Numbers.

This past weekend, however, I had a bright idea. Actually, it wasn’t such a bright idea, because I wound up spending the better part of three days on it. Forget about how we define metropolitan areas and how we dole out secondary markets–let’s try and account for the MLB affiliation of every person in the country!

Clearly, definitions of city boundaries are arbitrary. Both Northern California and Southern California are densely populated, but San Francisco occupies 47 square miles, while Los Angeles occupies 470 square miles. Definitions of metropolitan boundaries are arbitrary to a certain extent too. The areas may cross state boundaries, or run up against other cities. In some cases, a metropolitan statistical area (MSA) may be drawn too tightly from a baseball team’s perspective, excluding people in the exurbs who could reasonably attend a baseball game. In other cases, it might be drawn too broadly, including people who might have some lukewarm affiliation with the central city, but are probably too far away to attend major-league baseball games on a regular basis.

Breaking things down to the county or city level would resolve this problem. Indeed, it turns out that there is quite a wealth of population data available for free at the Census Bureau home page. My base unit of analysis was the 2006 population estimates on a county-by-county basis for the 48 contiguous states, plus Alaska, Hawaii, Puerto Rico and the District of Columbia. In some cases, where a county had a population of two million or more, I drilled down further to the city or census-tract level. In addition, I was able to track down population data at the metropolitan level for Canada. (Although this excludes rural Canada, Canada is highly urbanized, and if we wind up excluding a few farmers in Northern Saskatchewan, I think I can live with that).

Building this database provides several advantages, above and beyond the benefits of not having to rely on someone else’s definition of what constitutes a city or MSA. For one thing, we can very naturally account for a team’s potential secondary markets. For another, the area in between cities has a different character in different parts of the country. It’s often fairly dense along the Eastern Seaboard, and reasonably dense in the Midwest and the South, but generally completely barren in the Mountain West.

The other interesting piece of data available at the Census Bureau–something that required some digging to find–is their estimate for the latitude and longitude of the geographic center of each county. By using something called the Haversine formula, we are able to estimate how far the center of each county is from each major league ballpark.
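For readers who want to see the distance step, here is a minimal sketch of the Haversine formula in Python. The function name and the Earth-radius constant are assumptions for illustration; the article does not say which radius was used.

```python
import math

# Mean Earth radius in miles (an assumed constant, not taken from the article).
EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

As a sanity check, one degree of longitude along the equator comes out to roughly 69 miles with this radius. In the model, the first point would be a county's geographic center and the second a ballpark.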

So far, so good. Everything sounds very precise and very scientific. You can see that what I’m going to do is to build some sort of sliding scale for market size based on the distance between a given person and a given ballpark. But in order to get the model to really sing, we have to account for a couple of other wrinkles that straddle the boundary between the subjective and objective.

For one thing, we need some notion of a team’s sphere of influence. If you viewed things strictly in terms of geography, you’d find that the Cubs and the White Sox have nearly identical market sizes, when in fact the Cubs have quite a bit more influence, particularly once you get outside of Chicago proper and into suburbs and cornfields. A team’s sphere of influence can penetrate outward much farther if it has a strong brand.

There is no one perfect way to define the strength of a team’s brand, so what I did instead was to combine six or seven imperfect metrics in the hopes of coming up with a tasty sausage. In particular, the measures that I looked at were as follows:

The ranking of each team in the ESPN Ultimate Standings in two categories that are closely associated with brand: perception of ownership and fan relations. Data was averaged over the past five years of the ESPN survey.
Baseball avidity in each area, as measured by a 2002 Scarborough Research study.
The number of "hits" for each team using Google Blogsearch. Alternate team names ("Oakland A’s," "Oakland Athletics") were accounted for.
The number of regular-season wins for each team since 1901, provided continuous tenure in its current market. The Giants, for example, start counting upward from when they arrived in San Francisco, and do not get credit for what they did in New York.
The amount of postseason success for each team, again provided continuous tenure in its market. One point was given for each playoff appearance, and a two-point bonus for each World Series championship.
The value of the brand intangible for each club, as estimated by Forbes.
The Forbes data again, this time transformed to a logarithmic scale.

As you can see, I’m trying to house a lot of different definitions of brand under one roof. A team’s "likability" plays a role, as reflected in the ESPN survey, but so too does its history of success, the amount of buzz that it generates in media circles, and so forth. The Forbes data is intentionally given double weight because it’s probably the most reliable data among our metrics in this exercise.

Each team was assigned a rating in each category, ranging from 50 for the lowest team to 100 for the highest team, with the rest of the data linearly interpolated in between. The ratings across the seven categories were then averaged to produce the final result.
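The scaling step can be sketched as follows. This is an illustration only: the function names and the three-team inputs are hypothetical, and it assumes a higher raw value is better (a ranking, where lower is better, would need to be flipped first).

```python
def scale_50_to_100(raw):
    """Linearly map raw values so the lowest team gets 50 and the highest 100."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # Degenerate case: every team is identical, so use the midpoint.
        return {team: 75.0 for team in raw}
    return {team: 50 + 50 * (value - lo) / (hi - lo)
            for team, value in raw.items()}

def average_ratings(categories):
    """Average each team's scaled rating across a list of category dicts."""
    teams = categories[0].keys()
    return {team: sum(cat[team] for cat in categories) / len(categories)
            for team in teams}

# Hypothetical raw values for two categories and three made-up teams.
wins = scale_50_to_100({"A": 10, "B": 20, "C": 30})
buzz = scale_50_to_100({"A": 400, "B": 100, "C": 100})
overall = average_ratings([wins, buzz])
```

Counting a metric twice in the list of categories, as with the two Forbes entries, is one simple way to give it extra weight in the average.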

Influence Ratings


Team                     Absolute Score      Relative Score
 1. New York Yankees         94.8                 1.44
 2. St. Louis Cardinals      82.6                 1.26
 3. Boston Red Sox           80.1                 1.22
 4. Chicago Cubs             72.8                 1.11
 5. New York Mets            72.2                 1.10
 6. Cleveland Indians        71.7                 1.09
 7. Atlanta Braves           70.3                 1.07
 8. Chicago White Sox        68.2                 1.04
 9. Houston Astros           67.9                 1.03
10. Detroit Tigers           67.1                 1.02
11. Los Angeles Angels       67.0                 1.02
12. San Francisco Giants     66.6                 1.01
13. Los Angeles Dodgers      66.3                 1.01
14. Philadelphia Phillies    66.2                 1.01
15. Seattle Mariners         65.1                 0.99
16. San Diego Padres         64.3                 0.98
17. Cincinnati Reds          63.7                 0.97
18. Baltimore Orioles        63.5                 0.97
19. Pittsburgh Pirates       62.0                 0.94
20. Texas Rangers            61.8                 0.94
21. Oakland A's              61.5                 0.94
22. Arizona Diamondbacks     61.4                 0.93
23. Minnesota Twins          59.4                 0.90
24. Toronto Blue Jays        59.2                 0.90
25. Washington Nationals     58.7                 0.89
26. Milwaukee Brewers        57.7                 0.88
27. Florida Marlins          56.4                 0.86
28. Colorado Rockies         56.3                 0.86
29. Kansas City Royals       55.3                 0.84
30. Tampa Bay Devil Rays     52.2                 0.79

You’ll see two sets of ratings reflected in the chart. The first is the “raw” rating on a 50-100 scale, while the second is the score relative to the league average, which is the number that we’re going to use to tweak our market-size estimates. We could probably devote a column or two to the accuracy or lack thereof of these brand ratings–I think they seem pretty darn good–but we have a lot of other things to look at, so let’s move forward.
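The relative score appears to be nothing more than each team's absolute score divided by the average of all 30 absolute scores. A short Python check, using the figures from the chart (the variable names are mine), reproduces the published column:

```python
# Absolute scores copied from the Influence Ratings chart.
absolute = {
    "Yankees": 94.8, "Cardinals": 82.6, "Red Sox": 80.1, "Cubs": 72.8,
    "Mets": 72.2, "Indians": 71.7, "Braves": 70.3, "White Sox": 68.2,
    "Astros": 67.9, "Tigers": 67.1, "Angels": 67.0, "Giants": 66.6,
    "Dodgers": 66.3, "Phillies": 66.2, "Mariners": 65.1, "Padres": 64.3,
    "Reds": 63.7, "Orioles": 63.5, "Pirates": 62.0, "Rangers": 61.8,
    "A's": 61.5, "Diamondbacks": 61.4, "Twins": 59.4, "Blue Jays": 59.2,
    "Nationals": 58.7, "Brewers": 57.7, "Marlins": 56.4, "Rockies": 56.3,
    "Royals": 55.3, "Devil Rays": 52.2,
}

# League average of the absolute scores (about 65.7).
league_average = sum(absolute.values()) / len(absolute)

# Relative score: absolute score divided by the league average.
relative = {team: round(score / league_average, 2)
            for team, score in absolute.items()}
```

A relative score above 1.00 stretches a team's reach and a score below 1.00 shrinks it.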

It became clear to me in thinking about market size that there are two ways to define a team’s market. On the one hand, you have a team’s market for attendance, which is going to involve a smaller geographic radius, since people need to be able to commute to the ballpark to attend a baseball game. On the other hand, you have a team’s media market, which is less subject to geographic constraints, but tends to be more of a winner-take-all affair. In general, the media market will be larger than the attendance market, but the smaller is not necessarily a subset of the larger. A fan in northeast Pennsylvania is probably going to get the Phillies but not the Mets on TV, even though he could commute to New York about as easily as he could commute to Philadelphia.

We’ll concentrate on the attendance side of the coin first. My process for determining each team’s potential attendance market was as follows:

The distance in miles between each county and each major league ballpark was determined using the Haversine formula. Before you ask, I was able to identify the exact geographic coordinates of each major league stadium.
This raw distance was adjusted for out-of-state commuters. When I was running some gut-checks of the model, I found that many of the counter-intuitive results involved travel across state lines. The Indians were getting too much credit for southern Michigan, for example. Therefore, each team was assigned to its home state(s); the Royals were given both Kansas and Missouri, and the Nationals were given both Virginia and Maryland in addition to the District. The Blue Jays were assigned to the "state" of Canada. A 10 percent penalty was applied to an out-of-state commuter in a state without a home team; for example, a fan in South Carolina is assigned a 10 percent mileage penalty with respect to his distance to Turner Field. If the commuter comes from a state that does have a home team, a much harsher 50 percent mileage penalty is applied. For example, a fan in Western Massachusetts has a 50 percent penalty assigned to all teams but the Red Sox. I provided for a grace period of 10 miles before any penalties were applied, so that immediate border cities (such as Covington, Kentucky for the Reds) were not affected.
The raw distance was further adjusted based on a team’s influence, by dividing the mileage by a team’s relative influence rating. What this does, effectively, is to expand a team’s geographic radius if it has a stronger brand. For example, the Red Sox get to draw attendance from a radius of 244 miles rather than the standard 200, while the Devil Rays are confined to 158 miles.
A team’s 'Claim Percentage' for a given county is assigned based on the following formula (my apologies if this is starting to sound like Win Shares):

Claim Percentage = ((200 – Adjusted Distance) / 200) ^ 2.41

The "200" number you see in the formula corresponds to a maximum radius of 200 miles from which a team might draw attendance. The 2.41 exponent was chosen because it means that a fan 50 miles away from the ballpark is worth about half as much as a fan right next door to the ballpark. Both of these constants are arbitrary, since I am not aware of any empirical research that relates distance from the ballpark to the likelihood of attendance at a baseball game. However, I believe my choices produce results that are fairly intuitive, as reflected in this chart:
```
Adjusted Distance   Claim Percentage
      0                  100.0%
      1                   98.8%
      5                   94.1%
     10                   88.4%
     25                   72.5%
     50                   50.0%
     75                   32.2%
    100                   18.8%
    150                    3.5%
    200                    0.0%
```
The Claim Percentage is multiplied by the county’s population to produce a raw attendance estimate.
The raw attendance estimate is adjusted for dominance. Typically, baseball allegiance in any given area involves a tipping point of one kind or another; the more popular team or teams tend to crowd out all others, since fans of a secondary team will find that they can’t find their team’s games on TV, will have nobody to talk about the team with at the water cooler, and so forth. The mathematics of the dominance adjustment are a bit convoluted, but the basic idea is to reassign fans from one team to another by squaring the raw attendance estimates. So a team with a natural 3:2 advantage based on geography alone instead winds up at a 9:4 advantage.
Finally, we check to see whether the raw attendance estimates between all teams in any given county add up to more than 150 percent of that county’s population. If so, the estimates are prorated downward to the 150 percent cap. Effectively, this means that in a market with two identical clubs, each team is assigned a maximum of 75 percent of its potential fan base. Once again, the selection of this constant is somewhat arbitrary.
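The attendance steps above can be sketched in Python as follows. The three constants come from the text, but the function names are mine and two details are assumptions rather than the documented method: trips of 10 miles or less are treated as exempt from the penalty while longer trips are penalized over their full length, and the dominance adjustment keeps the county total fixed while splitting it in proportion to the squared estimates.

```python
MAX_RADIUS = 200.0   # miles, from the article
EXPONENT = 2.41      # from the article
GRACE_MILES = 10.0   # from the article

def adjusted_distance(raw_miles, relative_influence, penalty=0.0):
    """Apply the out-of-state penalty, then divide by relative influence.

    penalty: 0.0 in-state, 0.10 from a state with no team, 0.50 from a
    state with its own team. Assumption: trips within the grace distance
    are exempt; longer trips are penalized over their full length.
    """
    if raw_miles > GRACE_MILES:
        raw_miles *= 1.0 + penalty
    return raw_miles / relative_influence

def claim_percentage(adj_miles):
    """((200 - d) / 200) ** 2.41, held at zero beyond the maximum radius."""
    if adj_miles >= MAX_RADIUS:
        return 0.0
    return ((MAX_RADIUS - adj_miles) / MAX_RADIUS) ** EXPONENT

def apply_dominance(raw):
    """Assumed form: keep the county total, split it in proportion to squares."""
    total = sum(raw.values())
    squares = {team: value ** 2 for team, value in raw.items()}
    square_total = sum(squares.values())
    if square_total == 0:
        return dict(raw)
    return {team: total * sq / square_total for team, sq in squares.items()}

def apply_cap(estimates, population, cap=1.5):
    """Prorate the estimates downward if they exceed cap * population."""
    total = sum(estimates.values())
    limit = cap * population
    if total <= limit:
        return dict(estimates)
    return {team: value * limit / total for team, value in estimates.items()}

def county_estimates(population, teams):
    """teams maps a name to (raw_miles, relative_influence, penalty)."""
    raw = {name: population * claim_percentage(adjusted_distance(*args))
           for name, args in teams.items()}
    return apply_cap(apply_dominance(raw), population)
```

Under these assumptions, a 3:2 split becomes 9:4 after `apply_dominance`, and two identical clubs sitting on top of a county each end up with 75 percent of its population after `apply_cap`, which matches the behavior described above.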

Yep, this really is like Win Shares, what with its combination of superfluous precision and extreme subjectivity. Nevertheless, I think the model produces some fairly reliable results. For example, in Lake County in extreme Northeast Illinois, we come up with the following estimates:

Lake County, Illinois (Population 713,076)

Cubs                        500,357         70.2%
White Sox                   384,211         53.9%
Brewers                      90,947         12.8%
Total                       975,515        136.8%

I’d guess that those numbers are just about right. The Cubs have roughly a 4:3 edge over the White Sox, while the Brewers are penalized for being out-of-state, even though Milwaukee isn’t much farther from Lake County than Chicago is. For a somewhat more dramatic example, here is how the Northeast gets divvied up between the Red Sox and the Yankees:

The map is an oversimplification, since it does not account for the Mets, Phillies, Orioles, and so forth, but all of that is accounted for by the model. It’s fairly obvious what’s going on there, so I’ll let the pretty picture speak for itself.

We’ll break everything down on a team-by-team basis Friday, but first let me briefly describe my alternate method for calculating a team’s TV audience. There are several adjustments from the attendance version of the model, most of which are designed to reflect the winner-take-all nature of media coverage:

The radius around each ballpark is expanded from a maximum of 200 miles to a maximum of 400 miles.
The out-of-state penalty is increased to 100 percent for out-of-state markets with a natural home team; it remains at 10 percent for states with no home team.
Teams are only given credit for their TV audience if they either have the highest Claim Percentage in the county, or have a Claim Percentage of at least 50 percent. That means that in most markets, there is only one TV team assigned, unless there are two or more that “obviously” deserve credit.
The most popular team in the market is given a bonus, which is determined by taking the square root of its Claim Percentage. For example, a team with a natural 20 percent Claim Percentage sees this percentage boosted to about 45 percent, provided that it is the most influential team in the market.
In addition, the most influential team in each market is guaranteed a minimum 10 percent share of the TV audience, even if the market lies beyond its 400-mile radius. This mostly applies to extreme rural areas; for example, the Mariners get assigned 10 percent of Alaska’s population.
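The TV-specific rules can be sketched like this. It is a rough illustration, not the model itself: the function name and the `leader` argument are hypothetical, the Claim Percentages are assumed to have already been computed with the 400-mile radius and the harsher penalty, and the "most influential" team is assumed to be the one with the highest Claim Percentage unless the caller names it.

```python
import math

def tv_shares(claims, leader=None, floor=0.10, threshold=0.50):
    """Return each credited team's share of a county's TV audience.

    claims maps team -> Claim Percentage as a fraction between 0 and 1.
    leader names the most influential team; by default (an assumption)
    it is the team with the highest Claim Percentage.
    """
    if not claims:
        return {}
    if leader is None:
        leader = max(claims, key=claims.get)
    shares = {}
    for team, claim in claims.items():
        if team == leader:
            # Square-root bonus, with a guaranteed 10 percent minimum.
            shares[team] = max(math.sqrt(claim), floor)
        elif claim >= threshold:
            # Other teams only get credit with a claim of 50 percent or more.
            shares[team] = claim
        # Everyone else gets no TV credit in this county.
    return shares
```

For instance, `tv_shares({"A": 0.20, "B": 0.05})` credits only team A, at the square root of 0.20, or roughly 0.447. When every claim is zero, as in a remote county beyond every team's radius, the default choice of leader is arbitrary, so the caller would need to pass `leader` explicitly.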

Here, then, is what the ping-pong balls say: the attendance and TV markets for each major-league club.


Team     Attendance     Rank     Rel      TV/Media     Rank      Rel
NYA     17,851,140         1     304    21,933,814        1      247
NYN     14,283,315         2     244    15,510,522        3      174
LAN     11,869,232         3     202    13,908,965        4      156
LAA     11,149,730         4     190    13,775,861        5      155
PHI      7,669,007         5     131    11,266,405        6      127
CHN      7,558,066         6     129    10,296,326        8      116
CHA      7,387,544         7     126     8,184,670       16       92
BOS      6,788,847         8     116    10,138,743        9      114
TOR      6,579,560         9     112     9,252,530       12      104
OAK      6,105,012        10     104     7,727,977       18       87
SFN      5,903,008        11     101     9,678,663       11      109
WAS      5,894,698        12     101    10,317,452        7      116
ATL      5,459,976        13      93    15,623,999        2      176
BAL      5,267,088        14      90     6,996,070       21       79
DET      5,206,887        15      89     8,288,697       15       93
HOU      4,990,053        16      85     9,757,806       10      110
TEX      4,922,605        17      84     9,065,761       14      102
FLO      4,226,982        18      72     5,737,102       23       65
CLE      3,854,535        19      66     5,760,772       22       65
ARI      3,730,833        20      64     5,371,744       24       60
CIN      3,681,420        21      63     9,067,568       13      102
SEA      3,470,303        22      59     8,145,144       17       92
SDN      3,433,886        23      59     4,713,421       26       53
SLN      3,003,085        24      51     7,502,913       19       84
MIN      3,001,789        25      51     5,239,711       25       59
TBA      2,999,411        26      51     7,329,379       20       82
COL      2,779,034        27      47     4,532,396       27       51
PIT      2,486,991        28      42     3,593,959       30       40
MIL      2,431,916        29      41     3,868,097       29       44
KCA      1,917,808        30      33     4,164,140       28       47

Total  175,903,763                     266,750,607
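The "Rel" columns are not defined in the text, but they appear to index each market to the league average, with 100 meaning an average-sized market. A small Python check against the column totals (the function name is mine) agrees with the rows I tried:

```python
TEAMS = 30
ATTENDANCE_TOTAL = 175_903_763  # bottom row of the table
TV_TOTAL = 266_750_607          # bottom row of the table

def relative_index(market, league_total, teams=TEAMS):
    """Market size as a percentage of the league-average market."""
    league_average = league_total / teams
    return round(100 * market / league_average)

yankees_attendance = relative_index(17_851_140, ATTENDANCE_TOTAL)
yankees_tv = relative_index(21_933_814, TV_TOTAL)
royals_attendance = relative_index(1_917_808, ATTENDANCE_TOTAL)
```

Read this way, the Yankees' attendance market is about three times the size of an average club's, while the Royals' is about a third.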

I hope I haven’t lost you guys, because now the (comparatively) fun part is up next: our team-by-team breakdown. In addition to the attendance and TV estimates from my model, I have provided a comparison to Mike Jones' figures, and the raw census data from each team’s primary MSA. All of this runs in a Friday edition of LDL.

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.

Nate Silver
