
People love to talk about the mood of a franchise, or the collective feeling of its fanbase. Are they dispirited, optimistic? Ecstatic following a World Series win, or broken after an agonizing walkoff loss? For the most part, we leave it to the beat writers to gauge mood (which is not necessarily a bad thing), without any kind of backing for their proclamations (which might be a bad thing).

Hypothetically, fans are a reservoir of great wisdom (collectively, although perhaps not individually). So tapping into the mood of a fanbase could be more than interesting; it could be useful. But, beyond asking potentially biased observers, there has been little we could do to measure a fanbase’s mood objectively or quantitatively.

In this article, I’m going to present one way to gauge the happiness of a fanbase, using a text analysis of the website Reddit. Reddit is an aggregation engine to which individual users can submit links to other websites or original content, which are then upvoted, downvoted, and commented upon. Importantly, Reddit self-organizes into communities of like-minded individuals, one category of which is fans of a sports team. As a result, there is one team-specific subreddit (community) for each MLB team’s fans, along with a huge body of text from that team’s fans.

I used a freely available program[1] to harvest Reddit comments and posts en masse over a month-long period (roughly Jan. 5-Feb. 5). The program spits out a list of words, along with the number of times each word occurs. So, for example, the Yankees subreddit used the word “money” 25 times in that month. The small-market Rays’ subreddit, on the other hand, used the same word merely five times.
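To make the word-count step concrete, here is a minimal sketch of how a list of words and their frequencies could be built from a pile of comments. It is not the script credited in footnote [1]; the function name `word_counts`, the tokenizing rule, and the sample comments are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_counts(comments):
    """Count how often each lowercase word appears across a list of comments."""
    counts = Counter()
    for text in comments:
        # Treat any run of letters or apostrophes as a word (an assumed, simple rule).
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Invented sample comments, standing in for a month of subreddit text.
comments = [
    "Money can't buy a ring.",
    "They spent the money, and more money.",
]
counts = word_counts(comments)
print(counts["money"])  # 3
```

A real harvest would also need to fetch the comments from Reddit, which this sketch leaves out.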

To figure out how happy each team’s fanbase is, I did what’s called ‘sentiment analysis’ on each list of words. The idea is like this: Some words tend to be used in positive situations, and indicate that the writer is happier, while others are more negative in connotation, and suggestive of despair. For example, ‘excellence’ is a very positive word, and ‘deception’ an unpleasant one. If a team’s comments are filled with words like excellence, and bereft of words like deception, they are probably happy, and vice versa.

To do the sentiment analysis, I used a list of words (called AFINN-111[2]) which had been manually assigned levels of positivity from -5 to 5. To give you an idea of how it works, the word ‘excellence’ is rated a +3 on this list, while ‘deception’ is rated -3. Then I matched up words from the Reddit analysis with the sentiment list, multiplied each word’s rating by the number of times it was used in each subreddit, and added up the results. The higher the total score, which I called the total affect rating, the happier the fanbase[3].
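The scoring described above can be sketched in a few lines. This is an illustration under stated assumptions, not the exact code behind the numbers in this article: the two lexicon entries are the examples given in the text, the word counts are invented, and the affect ratio follows one reading of footnote [3] (summed positive scores divided by the magnitude of summed negative scores).

```python
# Word -> rating on a -5..5 scale, in the style of AFINN-111.
# Only the two example words from the article are included here.
LEXICON = {"excellence": 3, "deception": -3}


def total_affect(word_counts, lexicon):
    """Sum of (rating x times used) over every word found in the lexicon."""
    return sum(lexicon[w] * n for w, n in word_counts.items() if w in lexicon)


def affect_ratio(word_counts, lexicon):
    """Positive affect divided by the magnitude of negative affect (assumed definition)."""
    scores = [lexicon[w] * n for w, n in word_counts.items() if w in lexicon]
    positive = sum(s for s in scores if s > 0)
    negative = -sum(s for s in scores if s < 0)
    return positive / negative if negative else float("inf")


# Invented counts for one subreddit; 'money' has no rating, so it is ignored.
counts = {"excellence": 10, "deception": 4, "money": 25}
print(total_affect(counts, LEXICON))  # 18
print(affect_ratio(counts, LEXICON))  # 2.5
```

Because the total grows with the amount of text while the ratio does not, the two numbers can rank the same subreddits quite differently.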

Here’s what I found, for all 30 teams, sorted by total affect rating, our proxy for fanbase happiness.

| Name | Total Affect Rating | Projected Wins | Last Year's Wins | Affect Ratio |
|---|---|---|---|---|
| San Francisco Giants | 12082 | 84 | 88 | 1.983636 |
| New York Mets | 8087 | 81 | 79 | 1.823188 |
| St. Louis Cardinals | 7185 | 89 | 90 | 1.868383 |
| Atlanta Braves | 6967 | 75 | 79 | 2.008833 |
| Los Angeles Dodgers | 5214 | 97 | 94 | 1.574419 |
| Toronto Blue Jays | 4263 | 83 | 83 | 1.88444 |
| Seattle Mariners | 4172 | 87 | 87 | 2.596021 |
| Chicago Cubs | 4096 | 81 | 73 | 2.007131 |
| Boston Red Sox | 3914 | 88 | 71 | 2.100056 |
| Washington Nationals | 3706 | 91 | 96 | 1.988267 |
| Oakland Athletics | 2816 | 85 | 88 | 2.222753 |
| Baltimore Orioles | 2623 | 78 | 96 | 1.852454 |
| Chicago White Sox | 2214 | 79 | 73 | 2.410191 |
| Detroit Tigers | 2163 | 83 | 90 | 1.916525 |
| Milwaukee Brewers | 1984 | 80 | 82 | 2.242329 |
| Texas Rangers | 1849 | 79 | 67 | 1.912185 |
| Cincinnati Reds | 1618 | 79 | 76 | 2.091032 |
| Pittsburgh Pirates | 1574 | 81 | 88 | 2.569292 |
| San Diego Padres | 1540 | 85 | 77 | 2.295206 |
| Philadelphia Phillies | 1475 | 70 | 73 | 1.866627 |
| Houston Astros | 1289 | 77 | 70 | 2.141718 |
| Miami Marlins | 1184 | 80 | 77 | 2.624143 |
| Kansas City Royals | 1000 | 71 | 89 | 1.996016 |
| Minnesota Twins | 791 | 70 | 70 | 1.873068 |
| Cleveland Indians | 771 | 80 | 85 | 2.164653 |
| New York Yankees | 684 | 80 | 84 | 2.055556 |
| Arizona D-backs | 624 | 73 | 64 | 1.794904 |
| Colorado Rockies | 504 | 71 | 66 | 1.760181 |
| Los Angeles Angels | 433 | 91 | 98 | 1.80334 |
| Tampa Bay Rays | 320 | 86 | 77 | 2.5311 |

It’s Always Sunny in {Insert City Here}
First of all, let’s get this out of the way: Fanbases are all, without exception, pretty optimistic compared to other subreddits. On average, every fanbase maintains a substantially positive total affect. This finding makes a lot of sense, when you take into account the powerful selection bias involved in contributing to a team-specific subreddit—you probably aren’t going to do it unless you have some positive feelings (or at least hope) for the team of interest.

But perhaps these fanbases aren’t any happier than the rest of the internet. To check that, I looked at a few other subreddits, and calculated their levels of positive affect. For example, I scrutinized a collection of texts from city-based subreddits (for example, /r/Chicago, /r/Miami, etc.). No city subreddit I looked at had an affect ratio higher than the lowest affect ratio among the team-specific subreddits. All in all, this makes a lot of sense: baseball is an optional hobby, so if someone doesn’t like participating in it, they probably won’t.

The Causes of Fan Happiness
Next, I was curious about what factors correlate with the happiness of the redditors. The first and most obvious factor that might influence the happiness of a fanbase is its past performance. The Tigers, for example, are perennial contenders and finished last year with 90 wins. They’ve been to a World Series recently, and are known as a great organization. How much does that contribute to their mood? As a rough proxy for past success, I used last year’s number of wins.

Previous-year wins contribute surprisingly little to total happiness. The correlation is there (r=.3[4]), but it falls just short of statistical significance.

Another possibility is that the fanbase is less concerned about the past performance, and more with the future. It’s possible that fans are already over the results of last season, and have moved on in their mood to thinking about next season. We can check this by going to PECOTA, which objectively projects the performance of every team for the next year. PECOTA stands in here for the conventional wisdom, reflecting what we think we know about next year’s likely performance.

Here, there is a slightly stronger (r=.39) and statistically significant (p=.032) relationship. So it seems, on the surface at least, that Reddit fanbases are more concerned with the future than with dwelling on their past success.
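For readers who want to see what a rank-order correlation involves, the sketch below computes a Spearman coefficient (the type named in footnote [4]) in plain Python: rank both variables, giving tied values their average rank, then take the ordinary Pearson correlation of the ranks. The helper names are assumptions, and the sample uses only the first five rows of the table above, so its result will not match the full 30-team figures quoted in the text.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# First five rows of the table only (Giants, Mets, Cardinals, Braves, Dodgers).
projected_wins = [84, 81, 89, 75, 97]
total_affect = [12082, 8087, 7185, 6967, 5214]
print(spearman(projected_wins, total_affect))
```

Working on ranks means one extreme value, such as the Giants' total, cannot dominate the coefficient the way it would in a linear correlation.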

Individually, past performance and future projections contribute relatively little to explaining a fanbase’s mood. But perhaps together, there are some synergistic effects that can explain more of the variation. I put both predictors into a combined regression, and checked to see how well I could predict the resulting total affect rating.

Surprisingly, when combining the variables together[5], a very substantial improvement is possible. Using the complete model[6], I can predict the total affect rating astoundingly well (r=.7). So maybe fan happiness is, in aggregate and to a first approximation, a simple function of past success and future expectations.

Irrational Exuberance
Doing the predictions in this way allows us to also look at fanbases that are irrationally happy or sad. Here are the top five fanbases that are happier than their performances suggest that they should be:

| Name | Total Affect Rating | Predicted Affect Rating | Difference |
|---|---|---|---|
| San Francisco Giants | 12082 | 8008 | 4074 |
| Seattle Mariners | 4172 | 3338 | 834 |
| Atlanta Braves | 6967 | 6522 | 445 |
| Chicago White Sox | 2214 | 1846 | 368 |
| New York Mets | 8087 | 7814 | 273 |

There’s no surprise in number one. The Giants’ total happiness is off the charts, which I think must be the result of winning the World Series (again and again and again, in all even-numbered years since 2010). The magnitude of the effect is kind of incredible: Giants fans have a total affect number about 50 percent higher than the next happiest fanbase.

The other teams are a bit more surprising. The Seattle Mariners were significant to the playoff picture last year for the first time in a few seasons, and they project to be above average this year as well. Maybe this excess happiness is the side effect of that return to relevancy. A similar argument could be made for the White Sox, whose shrewd offseason has seen their postseason odds increase substantially. The Braves confuse me, both at the organizational and fanbase levels. The team is not projected to be competitive, nor were they last year, and yet their hopes spring eternally enough to invest $44 million in the dubious defense of Nick Markakis. On top of that, the team is embroiled in a gruesome controversy over its publicly funded stadium, with allegations of political corruption. How the fans remain so optimistic is anybody’s guess.

And the reverse, the fanbases that are most groundlessly unhappy:

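| Name | Total Affect Rating | Predicted Affect Rating | Difference |
|---|---|---|---|
| San Diego Padres | 1540 | 1813.261696 | -273.262 |
| New York Yankees | 684 | 962.1412489 | -278.141 |
| Los Angeles Angels | 433 | 1162.718562 | -729.719 |
| Tampa Bay Rays | 320 | 1183.282363 | -863.282 |
| Toronto Blue Jays | 4263 | 5183.414512 | -920.415 |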
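The Difference column in both lists is just the observed total affect rating minus the model's predicted rating, sorted from most surprisingly happy to most surprisingly unhappy. A short sketch, using a few rows copied from the two lists (the function name `residuals` is an assumption, and the predicted values are taken as given rather than recomputed):

```python
# (team, observed total affect rating, predicted affect rating), copied from the lists above.
rows = [
    ("San Francisco Giants", 12082, 8008.0),
    ("Seattle Mariners", 4172, 3338.0),
    ("New York Mets", 8087, 7814.0),
    ("San Diego Padres", 1540, 1813.261696),
    ("Tampa Bay Rays", 320, 1183.282363),
    ("Toronto Blue Jays", 4263, 5183.414512),
]


def residuals(rows):
    """Return (team, observed - predicted) pairs, largest surplus first."""
    diffs = [(name, observed - predicted) for name, observed, predicted in rows]
    return sorted(diffs, key=lambda pair: pair[1], reverse=True)


for name, diff in residuals(rows):
    print(f"{name}: {diff:+.3f}")
```

A positive value means the fanbase is happier than the model expects; a negative value means it is gloomier.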

Three of the top five are in the AL East, and that might be more than coincidence. It must be frustrating to see your team regularly compete with great teams outside of the division, only to contend for division titles and wild cards with two of the richest teams in baseball, along with three less wealthy but exceedingly well-run teams (one of whom possesses occult powers). Beyond them, we have the Angels, who are as puzzling as the Braves above. They are good, young, and projected to win 91 games after pacing all of baseball with 98 wins last year. Their continuing despair is mysterious.

There could be a variety of reasons for these deviations from expected happiness, some of which I’ve explained above. I have a faint and probably baseless hope that some of the deviations are the result of the fanbases being able to weigh and take into account factors beyond PECOTA’s considerable purview, like changes in coaching staff (the Rays and the Cubs) or other positive or negative indications from their organization. If that’s the case, then maybe the teams with exceptionally happy or sad redditors (relative to expectations) might be able to tell us something about the accuracy of the projections.

To that end, as the season goes on, I’m hoping to continue tracking the mood of the redditors, checking back in a few times during the year to see how their sentiment scores have changed. It would be fun to see when each fanbase gives up on a team, or if they simply don’t until the very last gasp; or how they react to winning or losing streaks, injuries to their core players, and so on. On top of that, although it’s a very long shot, maybe the mood of the fans will be able to tell us something PECOTA doesn’t know.



[1] Thanks to GitHub user rhiever for making this script.

[2] Check out this paper for some details about the word sentiment list.

[3] Fanbases also differed in terms of their levels of Reddit participation, so in addition to the total affect rating, I calculated the ratio of positive to negative affect scores, which I term the affect ratio. The latter statistic corrects for the variation in participation, and could be used as another measure of fanbase ‘happiness’. Surprisingly, however, affect ratio was not correlated with the total number of words in a subreddit, indicating that participation and happiness are somewhat decoupled. The other results also mostly hold if I look at affect ratio instead of total, although some of the surprisingly happy/unhappy teams change.

[4] For these correlations, I am using the Spearman, i.e. rank-order, correlation coefficient, because the relationships don’t look linear to me.

[5] Along with the total number of words on each subreddit, to account for the level of participation.

[6] To guard against overfitting, I built a support-vector machine model with 2-fold cross-validation, because that’s all this small sample of data could bear. However, there still exists the possibility of overfitting, with so few datapoints. I would like to have more data than just the 30 teams, but unfortunately I am not yet able to harvest subreddit information from earlier than a year ago.

Skuggs
2/18
I think you're completely underselling the selection bias here, if you're trying to say "fanbase" is the whole collection of people who go to or watch a team's games. If you're going to define "fanbase" as the smarter subset of those fans, then this is more believable. However, the fact is that 80% of the actual (total) fanbase sees their team in a much more black and white sense...aka Mets fans feel perpetually hopeless, Yankee fans have an irrational cockiness, etc., and this study seems to miss out on that.
nada012
2/18
Fair enough. The selection bias is, in Rumsfeldian parlance, a known unknown, and you may believe that it is larger than I do, which is completely plausible. I certainly didn't mean to undersell it; perhaps I should have been more careful to clarify that this method measures a section of the fanbase, not the whole thing. I also think that the stereotypical aspects of fanbases you mentioned might begin to emerge over a longer-term study like this. In the near term, I think it's plausible that each fanbase is mostly concerned with how their team did recently and how it will do going forward. But in the long term, we might start to see consistent patterns, like Mets fans always being a little less happy than their W-L would suggest, and so on.
Deadheadbrewer
2/18
This kind of clever and engaging article is why I keep my BP subscription going. Nice work!
nada012
2/18
Thank you!
jfranco77
2/18
Really interesting stuff. I think the White Sox fans' exuberance is higher than expected because PECOTA hates the White Sox (see other article posted today) and their fans probably have more reason to be optimistic than you would project.
nada012
2/18
That's exactly what I'm looking for. Maybe the White Sox fans know something PECOTA doesn't, and they will do better than we thought. Based on these results, we'd expect the Mariners, Braves, White Sox, and Mets to do better than PECOTA says, and the Rays, Yanks, Padres, Jays, and Angels to do worse (exempting the Giants for obvious reasons).
Skuggs
2/18
Wait I hope you're not suggesting that fans on Reddit can outpredict PECOTA...that's foolish.
nada012
2/18
Individually? No, certainly not. In aggregate? Yes, I think it's possible. Bear in mind that the fans have access to PECOTA *and* all sorts of additional information, for example about the coaching, the ownership, the training staff, and so on. The wisdom of crowds is a real effect and has been demonstrated in many contexts. I don't think it's probable, certainly. But I think it's possible, and worth checking, because why not?
mitchiapet
2/18
Mining Reddit data is interesting. I can't help but think some fanbases (and thus some subreddits) are more active than others, and this issue is probably affecting your analysis as well. Some control for the number of users or comments in a subreddit might be worth exploring.
rweiler
2/18
Goes to show you though that there is some merit in making sure every kid gets a trophy. Sure it may be irrational to be happy about a last place finish, but you won something and there is always next year.
newsense
2/18
I think you're exaggerating fan pessimism about the Rays because they have a low Affect Rating but a high Affect Ratio.
MrPizzacoli
2/18
"The Braves confuse me, both at the organizational and fanbase levels" As a lifelong Braves fan I am not surprised by this, and I am often confused by the attitudes held by the majority of the fan base as well as the organization. There are two very distinct camps that exist within the Braves fan community.
The first camp is made up of the fans who I believe make up the majority of the fanbase. These fans tend to look at the Braves through rose-colored glasses, for the most part. These are the fans who I believe make up a solid share of the posts on the Braves subreddit. Sure, not everyone who posts there sees the glass half full, but the only way for them to get their point across without being ambushed is to post their criticisms in a less critical, sarcastic manner. For instance, the fans who post there were the same ones who a mere nine months to a year ago would get defensive when someone posted a negative comment regarding Frank Wren. They tend to acknowledge only the most egregious errors, such as the signings of BJ Upton, Derek Lowe, and Kenshin Kawakami and the extension for Uggla, well after it has become apparent that those moves were indeed terrible.
The other camp often comes across as extremely pessimistic in how they view the organization. These are the individuals who tend to post comments on sites like sbnation, fangraphs, etc. Instead of just being happy when things inexplicably work out (Aaron Harang last season), they are quick to say something along the lines of "Awesome, glad Harang is doing well, but the only reason we are in this situation is because Wren signed BJ Upton to that awful deal." They seem to be jaded by the general incompetence that the organization has displayed in recent years, which makes it extremely difficult for them to acknowledge positive events while they are happening. I try to fall somewhere in the middle because both sides can be equally obnoxious.
However, I often tend to side with the group of fans who tend to be more pessimistic. I really do try to keep an open mind to avoid clouded judgement, but that is easier said than done when the team you grew up loving signs Nick Markakis to a $44 million deal and still chooses to employ Fredi Gonzalez.
swarmee
2/19
Yeah, the Braves fans I know on Facebook (mostly college friends/Atlanta residents) have been way down on the Braves this offseason.
gtgator
2/20
Every fan base has these classes of fans. The Braves are no different. The difference, IMO, is that the "casual observers" who watch the Braves this off-season and see the moves made and wonder what the team is doing simply do not understand how dysfunctional the team was in 2014 or the reasons why. They see players like Heyward, Gattis and Upton leave and think "Oh, the team will suck worse now." I (another life-long Braves' fan) could write a treatise on why the team sucked so bad last year and how the moves made this off-season helped, not hurt, my view of the team. For example, I understand the Markakis signing even if others seem to miss it. I'm not the only close fan I know who could do this. But no amount of writing is going to get the casual observer to understand all this. So I'm not going to bother. Instead, I will state that most fans I know were looking at 2015 as the end. Because Heyward and Upton were gone after the year with no viable internal options to replace them and no budget to retain them. They had no SP to replace Santana and Harang and only Sims as a possible option in 2016. The future was bleak, even if they had a chance at the WC in 2015 (though as 2014 showed, this team wasn't a given to make that either). But now, 2015 isn't the end . . . it is the beginning. There's optimism. And with optimism, even if the casual observer doesn't see it or understand it, comes positive comments. Maybe it is delusional. We'll know more in a few years. But I can definitely understand why the Braves are showing as a positive among the fans (especially as stadium controversies and political corruption are pretty much the norm and irrelevant to most fans).
MrPizzacoli
2/25
In your opinion, what was the purpose of the Markakis signing? I am not trying to be confrontational by any means, but I am curious as to why you believe the Braves signed him. I have nothing against Markakis as a player and believe he could put up solid numbers for the club as long as he is able to stay on the field. However, I find it odd that they would give him a four year deal when they are not expecting to seriously compete for the majority of that time period.
gtgator
3/02
Well, first, they are expecting to compete in 2016. The 2017 goal everyone references is having a WS-caliber team (Hart's words, not mine). Even if one believes that is too optimistic, 2017 is still intended to be a contending year. So half the contract, at worst, would be for a "non-contender" (though, again, the Braves don't see 2016 that way).
Next, the Braves in the 90s and 2000s had a strong veteran presence in the clubhouse. If one doesn't believe in chemistry, then this is all moot. But the team does think it matters (and I agree). One of the biggest reasons (IMO) for the drop-off between 2013 and 2014 (despite the majority of the team being the same) was the loss of Hudson and McCann. Several players from last year's team have noted as much. Markakis is well-regarded as a clubhouse leader and helps to fix that deficiency. They will also need this in 2016 and 2017.
Now, while other players could also help here, Markakis 1) fills a hole in RF; 2) is from Georgia; 3) offers a skill set that hopefully ages well; and 4) isn't going to be bumped by anyone in the minors, since the team had no OF of note outside of low-A at the time of signing (arguably Mallex Smith might apply now - and he's a CF).
So, sum it up and Markakis provides something (leadership) the team felt was lacking and will be needed as this team (hopefully) becomes a contender over the next 4 years. He plays a position that 1) they needed to fill and 2) has no one in the minors ready to fill. If he can be a 2 WAR player over the life of the contract, it isn't as if he'd be overpaid. And considering he plays solid defense, draws walks and doesn't suffer horrible platoon splits, that should be attainable with a modicum of health. Again, you have to believe in the value of clubhouse leadership. If you do, and based on the Braves' current roster and expected goals, Markakis fits very well.
shadiabu
2/18
As someone who is subscribed to the padres subreddit, there's a lot of excitement over there for the most part, along with a large sense of slightly-guarded optimism. But on the opposite hand, there was a post submitted last week in which someone used a meme to call out the influx of bandwagon fans. After the last few years of underwhelming results on the field, coupled with the disaster that was the season up until the All-Star break last year, I completely understand why the happiness score would be low. I wonder if the results were run from December to January (during A.J. Preller's busiest moments of activity) if the Padres' fans happiness score would be higher. Regardless, very interesting article!
belewfripp
2/19
Very interesting stuff. One thing I would also look at isn't just which words are used, but what other words are they used with? If fans are saying, "I expect excellence, but am not seeing it" or "Our farm system lacks excellence" those wouldn't be positive sentiments at all. I have used in the past a free program called KHCoder that uses nodal concepts to do text analysis. It will generate a lot of info not just of commonly-used words but also what other words they are used with.
BrewersTT
2/20
I wonder though whether it wouldn't balance out. There will also be comments like "signing Kablotnik isn't as terrible a move as people think" and "why do all of you hate Schmutzler?"
lipitorkid
2/19
I ran my own research. I used Twitter's Advanced Search feature with the following search criteria:
1. Tweets since Feb 1st of 2015
2. That mentioned their Team Twitter account
3. That were considered positive by Twitter :)
The Los Angeles Angels of Anaheim Near Disneyland = 40 Tweets
The San Francisco Giants = 219 Tweets
If you add up the number of angry Tweets sent to @thejoshhamilton in the same time period, the Angels also have 219 Tweets.
playballtexas
2/20
I think you are onto something with the "future" factor of happiness, especially since your selection was in the middle-end of the offseason, as everyone is looking forward to the new season. However, I think that another indication of how the team will fare in the future is its farm system. You could use farm system rankings or number of prospects in the top 100 (or 101) lists to find a correlation. This might give a better picture as to how long a fanbase will stay happy, as these prospects contribute to the big league team either through their own performance or through a trade.