Kierankirk

Global Happiness and Mobility in Twitter

The relatively new field of big data analytics uses social media sites as hubs from which to gather data. Twitter is arguably the most commonly used of these sites, because its widespread use across the globe makes it one of the most geographically differentiated and data-rich sources available. Twitter data can be applied to virtually any field of study that uses data analytics. This page focuses on three applications of Twitter data: classifying human sentiment, ^[1] analyzing global mobility patterns, ^[2] and examining how geographic location affects societal happiness. ^[3]

Classifying Human Sentiment
Global Mobility Patterns
The Geography of Happiness

Other studies have used Twitter data to examine where obesity is most common in the U.S. and what users who tweet about obesity reference most often. ^[4] Another study used Twitter data to investigate how diurnal (daily) and seasonal mood vary with work, sleep, and day length. ^[5] As a final example of the range of analyses Twitter data makes possible, one study used geo-located tweets from within the U.S. from 2011 to 2014 to improve the accuracy of disease tracking, specifically for influenza. ^[6]

Classifying Human Sentiment

Human moods and affective states can be analyzed on social media. Predicting and classifying affect in social media matters because sites such as Twitter are increasingly recognized as platforms where people express sentiment and affect, and analyzing that expression is useful in many ways. Marketing campaigns, monitoring responses to local and global events, and identifying geographic and temporal mood trends are all scenarios in which sentiment and affect analysis can help. Affect analysis also lets researchers build new information-seeking approaches on social media, such as identifying search features associated with a given affect attribute.

A classifier trained on Twitter posts to recognize several human affective states can derive moods more accurately and effectively. This approach is efficient because it does not rely on any hand-built list of features or words; in one study, its only supervision came from approximately 200 mood hashtags used as a ground truth signal. These hashtags served as ground truth because human annotators had assigned each of them to one of 11 affective states, so they were treated as reliable labels. Tweets containing mood hashtags were then fed into the classifier, which used a maximum entropy classification framework to predict the affective state of a given post. Initial results showed wide variation in classifier performance across affects. The affects that performed better were those associated with a large number of moods: because they spanned a greater variety of topical and linguistic contexts in Twitter posts, their feature space was less sparse. The affects that performed poorly had fewer corresponding moods and were typically used in more limited contexts on Twitter, so their feature space was sparse. Overall, the results showed that different affective states have a wide range of usage patterns and are shared in diverse linguistic contexts. ^[1]
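The labeling and classification steps above can be sketched as follows. This is a minimal illustration, not the study's code: the six hashtags and their affect assignments are invented stand-ins for the roughly 200 hashtags and 11 states, and the maximum entropy model is written in its standard equivalent form, multinomial logistic regression over binary bag-of-words features, trained by plain stochastic gradient ascent. Names such as `label_tweet` and `train_maxent` are hypothetical.

```python
import math
import re
from collections import defaultdict

# Hypothetical hashtag-to-affect mapping. The study used about 200 mood
# hashtags grouped into 11 affective states; these six entries are invented.
MOOD_TO_AFFECT = {
    "happy": "joy", "excited": "joy",
    "nervous": "fear", "scared": "fear",
    "surprised": "surprise", "shocked": "surprise",
}
HASHTAG = re.compile(r"#(\w+)")
BIAS = "__bias__"


def label_tweet(text):
    """Return (tokens, affect) if the tweet's mood hashtags point to exactly one
    affect, else None. Mood hashtags are stripped so the label is not a feature."""
    tags = [t.lower() for t in HASHTAG.findall(text)]
    affects = {MOOD_TO_AFFECT[t] for t in tags if t in MOOD_TO_AFFECT}
    if len(affects) != 1:
        return None
    cleaned = HASHTAG.sub(
        lambda m: "" if m.group(1).lower() in MOOD_TO_AFFECT else m.group(0), text)
    return re.findall(r"[a-z']+", cleaned.lower()), affects.pop()


def features(tokens):
    # Binary bag-of-words features plus a bias feature.
    return set(tokens) | {BIAS}


def predict_proba(weights, feats):
    # Softmax over per-class linear scores (the maximum entropy model form).
    scores = {c: sum(w.get(f, 0.0) for f in feats) for c, w in weights.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def predict(weights, feats):
    probs = predict_proba(weights, feats)
    return max(probs, key=probs.get)


def train_maxent(examples, labels, epochs=300, lr=0.1, l2=0.01):
    """Stochastic gradient ascent on the L2-regularized log-likelihood.
    The penalty is applied only to features active in the current example."""
    weights = {c: defaultdict(float) for c in labels}
    for _ in range(epochs):
        for feats, y in examples:
            probs = predict_proba(weights, feats)
            for c in labels:
                grad = (1.0 if c == y else 0.0) - probs[c]
                for f in feats:
                    weights[c][f] += lr * (grad - l2 * weights[c][f])
    return weights


# Invented example tweets, labeled only through their mood hashtags.
tweets = [
    "Got the job today #happy",
    "Concert tonight with friends #excited",
    "Exam tomorrow and I have not studied #nervous",
    "Heard a noise downstairs at night #scared",
    "Did not expect that ending at all #surprised",
    "Cannot believe the final score #shocked",
]
labeled = [r for r in map(label_tweet, tweets) if r is not None]
examples = [(features(tokens), affect) for tokens, affect in labeled]
model = train_maxent(examples, sorted({y for _, y in examples}))
print(predict(model, features(["exam", "tomorrow"])))
```

Stripping the mood hashtag before extracting features matters: if it stayed in the text, the classifier could learn to read the label directly instead of the surrounding language.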

Global Mobility Patterns

Twitter messages can also be used to analyze global mobility patterns. Monitoring worldwide mobility matters because it allows researchers to study migration flows, monitor tourist activity, and model how diseases spread during epidemics. Studies of this kind rely on geo-located tweets to determine where users are. Some studies filter out certain geo-located tweets, such as those posted by travelers on planes, web advertisements, web gaming, and web reporting. To establish when users travel, researchers first assigned each user a country of origin. In one study, researchers measured the Twitter penetration rate and the mobility rate of countries around the world, that is, how heavily people in each country use Twitter and how much those users travel. The researchers also calculated each user's radius of gyration, the root-mean-square distance of a user's locations from their center of mass, which distinguishes users who mostly travel locally from those who travel long distances. They also built a mobility profile for each country, determining how often it was the origin or the destination of travel in order to calculate the inflow and outflow of visitors. Beyond these measures, geo-located tweets can reveal how many Twitter users are active outside their country of residence each day, and they can be analyzed algorithmically to determine whether users travel more often within their own continental region or outside it. The same study compared its results with worldwide tourism statistics and concluded that geo-located Twitter data is an effective proxy for studying human mobility patterns.
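The per-user and per-country measures described above can be sketched as follows, under stated assumptions. The radius of gyration is computed as the root-mean-square great-circle distance of a user's tweet locations from their center of mass. The rule for assigning a country of residence (the country with the most tweets) and the visit-counting scheme for inflow and outflow are simple hypothetical choices for illustration, not necessarily those used in the study; all function names are invented.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def center_of_mass(points):
    """Sum the points as 3-D unit vectors and project the direction back to
    (lat, lon); this avoids naively averaging longitudes across the date line."""
    x = y = z = 0.0
    for lat, lon in points:
        phi, lmb = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lmb)
        y += math.cos(phi) * math.sin(lmb)
        z += math.sin(phi)
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


def radius_of_gyration_km(points):
    """Root-mean-square distance of a user's tweet locations from their center of mass."""
    clat, clon = center_of_mass(points)
    return math.sqrt(sum(haversine_km(lat, lon, clat, clon) ** 2
                         for lat, lon in points) / len(points))


def residence_country(countries):
    # Assumed heuristic: the country where the user posted the most tweets.
    return Counter(countries).most_common(1)[0][0]


def country_flows(user_countries):
    """user_countries maps a user id to the country code of each of their
    geo-located tweets. Each foreign country a user tweets from counts as one
    visit: outflow for the residence country, inflow for the destination."""
    outflow, inflow = Counter(), Counter()
    for countries in user_countries.values():
        home = residence_country(countries)
        for dest in set(countries) - {home}:
            outflow[home] += 1
            inflow[dest] += 1
    return outflow, inflow
```

A user who only ever tweets from one neighborhood gets a radius of gyration near zero, while a user who tweets from several continents gets a radius of thousands of kilometers, which is the local-versus-long-distance distinction the study draws.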

Long distance travelers tend to come from developed countries.

Their research also showed that users from Western European and other developed countries are more mobile: they were more likely to travel, traveled to more diverse places, and covered a wider geographical range of locations around the world. The results matched intuitive expectations; for example, users in geographically isolated countries such as Australia and New Zealand tended to travel farther. Mobility increased globally at the end of the year, and in many countries it also increased during the summer months. Finally, special events and cultural factors in certain countries also influenced mobility. ^[2]

The Geography of Happiness

Finally, Twitter-based analyses of sentiment and mobility can be combined to examine how geographic place correlates with, and influences, societal levels of happiness. Twitter data also lets researchers study how happiness correlates with a wide range of emotional, demographic, and health characteristics. Most such studies gather their data from cities. More than half of the world's population lives in urban areas, and that share is growing, so cities are both central to human society and rich in data. Because so much of the world's population lives in or is moving to cities, many researchers seek to answer a question of immense importance: how does living in an urban area relate to well-being?

City residents enjoying a day at a park in Houston, Texas.

One study combined geo-tagged tweets, U.S. census data, annual survey characteristics, and a set of words independently scored for happiness by Amazon Mechanical Turk users to answer the question of how living in urban areas affects well-being. The researchers measured the happiness of U.S. states and cities, identified the happiest and saddest states and cities, compared their city-level results with census data, and correlated word usage with common social and economic measures. The five happiest states, in order, were Hawaii, Maine, Nevada, Utah, and Vermont. The five saddest, in order, were Louisiana, Mississippi, Maryland, Delaware, and Georgia. Hawaii's tweets contained many happy words such as ‘beach’ and food-related words, and came partly from users who traveled often and generally tweeted more happily than the average user. Louisiana ranked as the saddest state mainly because its Twitter users used more profanity than users in other states.
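The average word happiness of a state or city can be sketched as a frequency-weighted mean of per-word scores. The scores below are invented placeholders, not the crowd-sourced values from the study, and the exclusion of neutral words scored strictly between 4 and 6 (a "lens") is an assumption modeled on related work; adjust or drop it as needed. `average_happiness` and `rank_regions` are hypothetical names.

```python
import re
from collections import Counter

# Invented scores on the 1 (saddest) to 9 (happiest) scale, for illustration
# only; they are not the crowd-sourced values used in the study.
WORD_SCORES = {
    "beach": 8.0, "love": 8.4, "haha": 7.6, "food": 7.4,
    "the": 5.0, "no": 3.5, "never": 3.3, "wrong": 3.1,
}


def word_counts(texts):
    return Counter(w for t in texts for w in re.findall(r"[a-z']+", t.lower()))


def average_happiness(texts, scores=WORD_SCORES, lens=(4.0, 6.0)):
    """Frequency-weighted mean score of scored words, skipping words whose score
    falls strictly inside the neutral lens. Returns None if no words remain."""
    lo, hi = lens
    counts = {w: f for w, f in word_counts(texts).items()
              if w in scores and not (lo < scores[w] < hi)}
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[w] * f for w, f in counts.items()) / total


def rank_regions(tweets_by_region):
    """Sort regions (e.g. states or cities) from happiest to saddest,
    dropping regions with no usable scored words."""
    averages = {r: average_happiness(ts) for r, ts in tweets_by_region.items()}
    scored = {r: h for r, h in averages.items() if h is not None}
    return sorted(scored, key=scored.get, reverse=True)
```

Because the measure is a weighted mean over words, a region's score moves whenever the mix of words changes; heavy use of low-scoring words such as profanity pulls a state down even if its other vocabulary is unremarkable.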

Cluster analysis of cross-correlations between the word frequency distributions of all states showed that much of the similarity in word usage between states could be attributed to geographic proximity. Within New York City, Harlem and Washington Heights, as well as the nearby New Jersey waterfront, were less happy than the downtown and midtown areas. Across the entire U.S., cities in the Southeast were generally less happy than others, while the Florida peninsula and the coasts of North and South Carolina were significantly happier than the regions immediately inland of them. The distribution of average happiness across the cities in the census data set showed more cities scoring above the average than below it, which suggests that, overall, living in a city promotes well-being. Areas with a higher density of tweets tended to be less happy, and there was a slight negative correlation between population and happiness: the larger the population, the lower the happiness. Focusing on individual cities, the researchers ranked all cities by average word happiness to find the overall happiest and saddest, examining the 15 happiest and the 15 saddest in detail. Napa, California was the happiest city and Beaumont, Texas the saddest. Differences in average word happiness between urban areas were driven most strongly by a few key words: positive words such as ‘lol,’ ‘haha,’ and ‘love,’ and negative words such as ‘no,’ ‘don’t,’ ‘never,’ ‘wrong,’ and profanity. In later analysis, the researchers examined in greater detail how happiness and word usage relate to underlying social factors.
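One way to see which words drive the difference between two places is a per-word decomposition in the spirit of the word shift graphs used in this line of research. The sketch below assumes the simplest form: a word's contribution is the distance of its score from the reference average times the change in its relative frequency, and the contributions sum exactly to the difference in average happiness. The function names and test data are hypothetical.

```python
def normalized(counts, scores):
    """Relative frequencies over the words that have a score."""
    total = sum(f for w, f in counts.items() if w in scores)
    return {w: f / total for w, f in counts.items() if w in scores}


def word_shift(ref_counts, comp_counts, scores):
    """Per-word contributions (h_w - h_ref) * (p_comp[w] - p_ref[w]), sorted by
    magnitude. Because both frequency vectors sum to 1, the contributions sum
    exactly to h_comp - h_ref. Apply any neutral-word lens to `scores` first."""
    p_ref = normalized(ref_counts, scores)
    p_comp = normalized(comp_counts, scores)
    h_ref = sum(scores[w] * p for w, p in p_ref.items())
    contributions = {
        w: (scores[w] - h_ref) * (p_comp.get(w, 0.0) - p_ref.get(w, 0.0))
        for w in set(p_ref) | set(p_comp)
    }
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

A positive contribution means the word makes the compared place happier than the reference, either because a happier-than-average word is used more or because a sadder-than-average word is used less; a negative contribution means the opposite.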

Consistent with popular belief, the study found that higher socioeconomic status was associated with higher happiness scores and lower socioeconomic status with lower happiness scores. Food-related words and the presence of poverty were each found to correlate both positively and negatively with obesity, so word use alone did not show a significant overall correlation with obesity. However, the researchers did find that the happier the city, the lower its obesity rate. Boulder, Colorado, the third-happiest city, had the lowest obesity rate, and Beaumont, Texas, the saddest city, had the fifth-highest.
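Relationships such as the one between city happiness and obesity are typically quantified with a correlation coefficient. The sketch below computes Spearman's rank correlation, one reasonable choice for comparing rankings of cities; it is not claimed to be the study's exact procedure, and the function names are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))
```

Given per-city lists in the same city order (for example, hypothetical `city_happiness` and `city_obesity`), `spearman(city_happiness, city_obesity)` returning a negative value would match the reported pattern that happier cities have lower obesity rates. Rank correlation only asks whether the two orderings agree, so it is less sensitive to a few extreme cities than Pearson correlation on raw values.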

Twitter data has thus allowed researchers to identify the happiest and saddest states and cities, establish a positive correlation between socioeconomic status and happiness, relate particular words to demographic attributes, and link happier cities to lower obesity rates and sadder cities to higher ones. Twitter has also shown potential as a tool for estimating real-time levels of, and changes in, population-scale measures. ^[3]

References

[1] De Choudhury, Munmun, Michael Gamon, and Scott Counts. "Happy, Nervous or Surprised? Classification of Human Affective States in Social Media." ICWSM. 2012.

[2] Hawelka, Bartosz, et al. "Geo-located Twitter as proxy for global mobility patterns." Cartography and Geographic Information Science 41.3 (2014): 260-271.

[3] Mitchell, Lewis, et al. "The geography of happiness: Connecting twitter sentiment and expression, demographics, and objective characteristics of place." PLoS ONE 8.5 (2013): e64417.

[4] Ghosh, Debarchana, and Rajarshi Guha. "What are we ‘tweeting’ about obesity? Mapping tweets with topic modeling and Geographic Information System." Cartography and Geographic Information Science 40.2 (2013): 90-102.

[5] Golder, Scott A., and Michael W. Macy. "Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures." Science 333.6051 (2011): 1878-1881.

[6] Paul, Michael J., Mark Dredze, and David Broniatowski. "Twitter improves influenza forecasting." PLOS Currents Outbreaks (2014).
