People love to talk about the mood of a franchise, or the collective feeling of its fanbase. Are they dispirited, optimistic? Ecstatic following a World Series win, or broken after an agonizing walkoff loss? For the most part, we leave it to the beat writers to gauge mood (which is not necessarily a bad thing), without any kind of backing for their proclamations (which might be a bad thing).
Hypothetically, fans are a reservoir of great wisdom (collectively, although perhaps not individually). So tapping into the mood of a fanbase could be more than interesting; it could be useful. But, beyond asking potentially biased observers, there has been little we could do to measure a fanbase’s mood objectively or quantitatively.
In this article, I’m going to present one way to gauge the happiness of a fanbase, using a text analysis of the website Reddit. Reddit is an aggregation engine, to which individual users can submit links to other websites or original content, which is then upvoted, downvoted, and commented upon. Importantly, Reddit self-organizes into communities of like-minded individuals, one category of which is fans of a sports team. As a result, there is one team-specific subreddit (community) for each MLB team’s fans, along with a huge body of text from that team’s fans.
I used a freely available program[1] to harvest Reddit comments and posts en masse, over a month-long time period (roughly Jan. 5-Feb. 5). The program spits out a list of words, along with the number of times each word occurs. So, for example, the Yankees subreddit used the word “money” 25 times in the past month. The small-market Rays subreddit, on the other hand, used the same word merely five times.
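To make that output format concrete, here is a minimal sketch of per-word counting. It is not the script credited in footnote [1]; the `count_words` helper, the sample comments, and the tokenizing rule (lowercase, letters and apostrophes only) are all assumptions made for illustration.

```python
import re
from collections import Counter


def count_words(comments):
    """Return a Counter mapping each lowercase word to how often it occurs."""
    counts = Counter()
    for text in comments:
        # Assumed tokenizer: runs of letters and apostrophes, lowercased.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Hypothetical comments, not real Reddit data.
sample_comments = [
    "Money can't buy a title.",
    "They have the money, so spend the money!",
]

word_counts = count_words(sample_comments)
for word, n in word_counts.most_common(3):
    print(word, n)
```

A real harvest would feed in a month of posts and comments per subreddit, but the result has the same shape: one word, one count.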
To figure out how happy each team’s fanbase is, I did what’s called ‘sentiment analysis’ on each list of words. The idea is like this: Some words tend to be used in positive situations, and indicate that the writer is happier, while others are more negative in connotation, and suggestive of despair. For example, ‘excellence’ is a very positive word, and ‘deception’ an unpleasant one. If a team’s comments are filled with words like excellence, and bereft of words like deception, they are probably happy, and vice versa.
To do the sentiment analysis, I used a list of words (called AFINN-111[2]) that had been manually assigned levels of positivity from -5 to 5. To give you an idea of how it works, the word ‘excellence’ is rated a +3 on this list, while ‘deception’ is rated -3. I then matched up words from the Reddit analysis with the sentiment list, multiplied each word’s rating by the number of times it was used in each subreddit, and summed the results. The higher the total score, which I called the total affect rating, the happier the fanbase[3].
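The scoring step can be sketched in a few lines. This is an illustration under stated assumptions rather than my exact code: the lexicon holds only the two ratings quoted above (a real run would load the full AFINN-111 list), the word counts are invented, and `affect_ratio` assumes that the ratio in footnote [3] means summed positive affect divided by the magnitude of summed negative affect.

```python
def affect_scores(word_counts, lexicon):
    """Return (total, positive, negative) affect for one subreddit.

    Each word in the lexicon contributes rating * count; words that are
    not in the lexicon contribute nothing.
    """
    positive = 0
    negative = 0
    for word, count in word_counts.items():
        rating = lexicon.get(word)
        if rating is None:
            continue
        if rating > 0:
            positive += rating * count
        else:
            negative += rating * count
    return positive + negative, positive, negative


def affect_ratio(positive, negative):
    # Assumed definition: positive affect over the size of negative affect.
    return positive / abs(negative) if negative else float("inf")


# Only the two ratings quoted in the text; not the full word list.
lexicon = {"excellence": 3, "deception": -3}
# Hypothetical counts for one subreddit.
word_counts = {"excellence": 10, "deception": 4, "money": 25}

total, pos, neg = affect_scores(word_counts, lexicon)
print(total, pos, neg, affect_ratio(pos, neg))
```

Note that ‘money’ is ignored here only because it is missing from this two-word lexicon; with the full list, every rated word would count.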
Here’s what I found, for all 30 teams, sorted by total affect rating, our proxy for fanbase happiness.
| Name | Total Affect Rating | Projected Wins | Last Year's Wins | Affect Ratio |
|---|---|---|---|---|
| San Francisco Giants | 12082 | 84 | 88 | 1.983636 |
| New York Mets | 8087 | 81 | 79 | 1.823188 |
| | 7185 | 89 | 90 | 1.868383 |
| Atlanta Braves | 6967 | 75 | 79 | 2.008833 |
| | 5214 | 97 | 94 | 1.574419 |
| Toronto Blue Jays | 4263 | 83 | 83 | 1.88444 |
| Seattle Mariners | 4172 | 87 | 87 | 2.596021 |
| | 4096 | 81 | 73 | 2.007131 |
| | 3914 | 88 | 71 | 2.100056 |
| | 3706 | 91 | 96 | 1.988267 |
| | 2816 | 85 | 88 | 2.222753 |
| | 2623 | 78 | 96 | 1.852454 |
| Chicago White Sox | 2214 | 79 | 73 | 2.410191 |
| | 2163 | 83 | 90 | 1.916525 |
| | 1984 | 80 | 82 | 2.242329 |
| | 1849 | 79 | 67 | 1.912185 |
| | 1618 | 79 | 76 | 2.091032 |
| | 1574 | 81 | 88 | 2.569292 |
| San Diego Padres | 1540 | 85 | 77 | 2.295206 |
| | 1475 | 70 | 73 | 1.866627 |
| | 1289 | 77 | 70 | 2.141718 |
| | 1184 | 80 | 77 | 2.624143 |
| | 1000 | 71 | 89 | 1.996016 |
| | 791 | 70 | 70 | 1.873068 |
| | 771 | 80 | 85 | 2.164653 |
| New York Yankees | 684 | 80 | 84 | 2.055556 |
| Arizona D-backs | 624 | 73 | 64 | 1.794904 |
| | 504 | 71 | 66 | 1.760181 |
| Los Angeles Angels | 433 | 91 | 98 | 1.80334 |
| Tampa Bay Rays | 320 | 86 | 77 | 2.5311 |
It’s Always Sunny in {Insert City Here}
First of all, let’s get this out of the way: Fanbases are all, without exception, pretty optimistic compared to other subreddits. Every fanbase maintains a substantially positive total affect. This finding makes a lot of sense when you take into account the powerful selection bias involved in contributing to a team-specific subreddit: you probably aren’t going to do it unless you have some positive feelings (or at least hope) for the team of interest.
To rule out the possibility that these fanbases are no happier than the rest of the internet, I looked at a few other subreddits and calculated their levels of positive affect. For example, I scrutinized a collection of texts from city-based subreddits (/r/Chicago, /r/Miami, etc.). No city I looked at had an affect ratio higher than the lowest ratio among the team-specific subreddits. All in all, this makes a lot of sense: following baseball is an optional hobby, so if someone doesn’t enjoy it, they probably won’t participate.
The Causes of Fan Happiness
Next, I was curious about what factors correlate with the happiness of the redditors. The first and most obvious factor that might influence the happiness of a fanbase is its past performance. The Tigers, for example, are perennial contenders and finished last year with 90 wins. They’ve been to a World Series recently, and are known as a great organization. How much does that contribute to their mood? As a rough proxy for past success, I used last year’s number of wins.
Previous-year wins contribute surprisingly little to total happiness. The correlation is there (r=.3[4]), but it is not quite significant.
Another possibility is that the fanbase is less concerned about the past performance, and more with the future. It’s possible that fans are already over the results of last season, and have moved on in their mood to thinking about next season. We can check this by going to PECOTA, which objectively projects the performance of every team for the next year. PECOTA stands in here for the conventional wisdom, reflecting what we think we know about next year’s likely performance.
Here, there is a slightly stronger (r=.39) and also significant (p=.032) relationship. So it seems, on the surface at least, that Reddit fanbases are more concerned with the future than with dwelling on their past success.
Individually, past performance and future projections contribute relatively little to explaining a fanbase’s mood. But perhaps together, there are some synergistic effects that can explain more of the variation. I put both predictors into a combined regression, and checked to see how well I could predict the resulting total affect rating.
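As footnote [4] notes, these are Spearman (rank-order) correlations: rank both variables, then take the ordinary Pearson correlation of the ranks. Here is a dependency-free sketch of that calculation. The helper names are my own, and the five data points are just the top five rows of the results table, far too few to reproduce the figures reported above; the significance test is not included.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Top five rows of the results table: total affect rating, projected wins.
affect = [12082, 8087, 7185, 6967, 5214]
projected_wins = [84, 81, 89, 75, 97]
print(round(spearman(affect, projected_wins), 3))
```

Working on ranks means a single runaway value (the top fanbase's total, for instance) cannot dominate the coefficient the way it would in a linear correlation.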
Surprisingly, when combining the variables together[5], a very substantial improvement is possible. Using the complete model[6], I can predict the total affect rating astoundingly well (r=.7). So maybe fan happiness is, in aggregate and to a first approximation, a simple function of past success and future expectations.
Irrational Exuberance
Doing the predictions in this way allows us to also look at fanbases that are irrationally happy or sad. Here are the top five fanbases that are happier than their performances suggest that they should be:
| Name | Total Affect Rating | Predicted Affect Rating | Difference |
|---|---|---|---|
| San Francisco Giants | 12082 | 8008 | 4074 |
| Seattle Mariners | 4172 | 3338 | 834 |
| Atlanta Braves | 6967 | 6522 | 445 |
| Chicago White Sox | 2214 | 1846 | 368 |
| New York Mets | 8087 | 7814 | 273 |
There’s no surprise in number one. The Giants’ total happiness is off the charts, which I think must be the result of winning the World Series (again and again and again, in every even-numbered year since 2010). The magnitude of the effect is kind of incredible: Giants fans have a total affect number about 50 percent higher than the next happiest fanbase.
The other teams are a bit more surprising. The Seattle Mariners were significant to the playoff picture last year for the first time in a few seasons, and they project to be above average this year as well. Maybe this excess happiness is the side effect of that return to relevancy. A similar argument could be made for the White Sox, whose shrewd offseason has seen their postseason odds increase substantially. The Braves confuse me, both at the organizational and fanbase levels. The team is not projected to be competitive, nor was it last year, and yet its hopes spring eternally enough to invest $44 million in the dubious defense of Nick Markakis. On top of that, the team is undergoing a gruesome publicly funded stadium controversy, with allegations of political corruption. How the fans remain so optimistic is anybody’s guess.
And the reverse, the fanbases that are most groundlessly unhappy:
| Name | Total Affect Rating | Predicted Affect Rating | Difference |
|---|---|---|---|
| San Diego Padres | 1540 | 1813.261696 | -273.262 |
| New York Yankees | 684 | 962.1412489 | -278.141 |
| Los Angeles Angels | 433 | 1162.718562 | -729.719 |
| Tampa Bay Rays | 320 | 1183.282363 | -863.282 |
| Toronto Blue Jays | 4263 | 5183.414512 | -920.415 |
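The Difference column in both tables is simply the observed total affect rating minus the model's prediction. A small sketch, using four rows copied from the tables (the `residuals` helper is a name I made up for illustration):

```python
# (name, total affect rating, predicted affect rating), copied from the tables.
rows = [
    ("San Francisco Giants", 12082, 8008),
    ("Seattle Mariners", 4172, 3338),
    ("Tampa Bay Rays", 320, 1183.282363),
    ("Toronto Blue Jays", 4263, 5183.414512),
]


def residuals(rows):
    """Return (name, actual - predicted) pairs, largest surplus first."""
    diffs = [(name, actual - predicted) for name, actual, predicted in rows]
    return sorted(diffs, key=lambda pair: pair[1], reverse=True)


for name, diff in residuals(rows):
    print(f"{name}: {diff:+.3f}")
```

A positive difference marks a fanbase that is happier than the model expects, and a negative one marks a fanbase that is gloomier.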
Three of the top five are in the AL East, and that might be more than coincidence. It must be frustrating to see your team regularly compete with great teams outside of the division, only to contend for division titles and wild cards with two of the richest teams in baseball, along with three less wealthy but exceedingly well-run teams (one of whom possesses occult powers). Beyond them, we have the Angels, who are as puzzling as the Braves above. They are good, young, and projected to win 91 games after pacing all of baseball with 98 wins last year. Their continuing despair is mysterious.
There could be a variety of reasons that explain deviations from expected behavior, some of which I’ve explained above. I have a faint and probably baseless hope that some of the deviations in expected happiness are the result of the fanbases being able to weigh and take into account factors beyond PECOTA’s considerable purview, like changes in coaching staff (the Rays and the Cubs) or other positive or negative indications from their organization. If that’s the case, then maybe the teams with exceptionally happy or sad redditors (relative to expectations) might be able to tell us something about the accuracy of the projections.
To that end, as the season goes on, I’m hoping to continue tracking the mood of the redditors, checking back in a few times during the year to see how their sentiment scores have changed. It would be fun to see when each fanbase gives up on a team, or if they simply don’t until the very last gasp; or how they react to winning or losing streaks, injuries to their core players, and so on. On top of that, although it’s a very long shot, maybe the mood of the fans will be able to tell us something PECOTA doesn’t know.
[1] Thanks to github user rhiever for making this script.
[2] Check out this paper for some details about the word sentiment list.
[3] Fanbases also differed in terms of their levels of Reddit participation, so in addition to the total affect rating, I calculated the ratio of positive to negative affect scores, which I term the affect ratio. The latter statistic corrects for the variation in participation, and could be used as another measure of fanbase ‘happiness’. Surprisingly, however, affect ratio was not correlated with the total number of words in a subreddit, indicating that participation and happiness are somewhat decoupled. The other results also mostly hold if I look at affect ratio instead of total, although some of the surprisingly happy/unhappy teams change.
[4] For these correlations, I am using the Spearman, i.e. rank-order, correlation coefficient, because the relationships don’t look linear to me.
[5] Along with the total number of words on each subreddit, to account for the level of participation.
[6] To guard against overfitting, I built a support-vector machine model with 2-fold cross-validation, because that’s all this small sample of data could bear. However, there still exists the possibility of overfitting, with so few datapoints. I would like to have more data than just the 30 teams, but unfortunately I am not yet able to harvest subreddit information from earlier than a year ago.
I also think that the stereotypical aspects of fanbases you mentioned might begin to emerge over a longer term study like this. In the near term, I think it's plausible that each fanbase is mostly concerned with how their team did recently and how it will do going forward. But in the long term, we might start to see consistent patterns, like Mets fans always being a little less happy than their W-L would suggest, and so on.
I don't think it's probable, certainly. But I think it's possible, and worth checking, because why not?
As a lifelong Braves fan I am not surprised by this and am often confused by the attitudes held by the majority of the fan base as well as the organization. There are two very distinct camps that exist within the Braves fan community.
The first camp is made up of the fans who I believe make up the majority of the fanbase. These fans tend to look at the Braves through rose-colored glasses, for the most part. These are the fans who I believe tend to make up a solid amount of posts on the Braves subreddit. Sure, not everyone who posts there sees the glass half full, but the only way for them to get their point across without being ambushed is to post their criticisms in a less critical, sarcastic manner. For instance, the fans who post there were the same ones who a mere year 9 months ago would get defensive when someone would post a negative comment regarding Frank Wren. They tend to only acknowledge the most egregious errors, such as the signings of BJ Upton, Derek Lowe, Kenshin Kawakami, and extending Uggla, well after it has become apparent that those moves were indeed terrible.
The other camp often comes across as extremely pessimistic in regard to how they view the organization. These are the individuals who tend to post comments on sites like SB Nation, FanGraphs, etc. Instead of just being happy when things inexplicably work out (Aaron Harang last season), they are quick to say something along the lines of "Awesome, glad Harang is doing well, but the only reason we are in this situation is because Wren signed BJ Upton to that awful deal." They seem to be jaded by the general incompetence that the organization has displayed in recent years, which makes it extremely difficult for them to acknowledge positive events while they are happening.
I try to fall somewhere in the middle because both sides can be equally obnoxious. However, I often tend to side with the group of fans who tend to be more pessimistic. I really do try to keep an open mind to avoid clouded judgement, but that is easier said than done when the team you grew up loving signs Nick Markakis to a $44 million deal and still chooses to employ Fredi Gonzalez.
The difference, IMO, is that the "casual observers" who watch the Braves this off-season and see the moves made and wonder what the team is doing simply do not understand how dysfunctional the team was in 2014 or the reasons why. They see players like Heyward, Gattis and Upton leave and think "Oh, the team will suck worse now."
I (another life-long Braves' fan) could write a treatise on why the team sucked so bad last year and how the moves made this off-season helped, not hurt, my view of the team. For example, I understand the Markakis signing even if others seem to miss it. I'm not the only close fan I know who could do this. But no amount of writing is going to get the casual observer to understand all this. So I'm not going to bother.
Instead, I will state that most fans I know were looking at 2015 as the end. Because Heyward and Upton were gone after the year with no viable internal options to replace them and no budget to retain them. They had no SP to replace Santana and Harang and only Sims as a possible option in 2016. The future was bleak, even if they had a chance at the WC in 2015 (though as 2014 showed, this team wasn't a given to make that either).
But now, 2015 isn't the end . . . it is the beginning. There's optimism. And with optimism, even if the casual observer doesn't see it or understand it, comes positive comments. Maybe it is delusional. We'll know more in a few years. But I can definitely understand why the Braves are showing as a positive among the fans (especially as stadium controversies and political corruption are pretty much the norm and irrelevant to most fans).
Next, the Braves in the 90s and 2000s had strong veteran presence in the clubhouse. If one doesn't believe in chemistry, then this is all moot. But the team does think it matters (and I agree). One of the biggest reasons (IMO) for the drop-off between 2013 and 2014 (despite the majority of the team being the same) was the loss of Hudson and McCann. Several players from last year's team have noted as such. Markakis is well-regarded as a clubhouse leader and helps to fix that deficiency. They will also need this in 2016 and 2017.
Now, while other players could also help here, Markakis 1) fills a hole in RF; 2) is from Georgia; 3) offers a skill set that hopefully ages well; and 4) isn't going to be bumped by anyone in the minors, since the team had no OF of note outside of low-A at the time of signing (arguably Mallex Smith might apply now - and he's a CF).
So, sum it up and Markakis provides something (leadership) the team felt was lacking and will be needed as this team (hopefully) becomes a contender over the next 4 years. He plays a position that 1) they needed to fill and 2) has no one in the minors ready to fill. If he can be a 2 WAR player over the life of the contract, it isn't as if he'd be overpaid. And considering he plays solid defense, draws walks and doesn't suffer horrible platoon splits, that should be attainable with a modicum of health.
Again, you have to believe in the value of clubhouse leadership. If you do, and based on the Braves current roster and expected goals, Markakis fits very well.
After the last few years of underwhelming results on the field, coupled with the disaster that was the season up until the All-Star break last year, I completely understand why the happiness score would be low. I wonder if the results were run from December to January (during A.J. Preller's busiest moments of activity) if the Padres' fans happiness score would be higher.
Regardless, very interesting article!
If fans are saying, "I expect excellence, but am not seeing it" or "Our farm system lacks excellence" those wouldn't be positive sentiments at all.
I have used in the past a free program called KHCoder that uses nodal concepts to do text analysis. It will generate a lot of info not just of commonly-used words but also what other words they are used with.
I used Twitter's Advanced Search feature with the following search criteria.
1. Tweets since Feb 1st of 2015
2. That mentioned their Team Twitter account.
3. That were considered positive by Twitter :)
The Los Angeles Angels of Anaheim Near Disneyland= 40 Tweets
The San Francisco Giants= 219 Tweets
If you add up the number of angry Tweets sent to @thejoshhamilton, in the same time period, the Angels also have 219 Tweets.
However, I think that another indication of how the team will fare in the future is its farm system. You could use farm system rankings or number of prospects in the top 100 (or 101) lists to find a correlation. This might give a better picture as to how long a fanbase will stay happy, as these prospects contribute to the big league team either through their own performance or through a trade.