World Happiness Report 2023

Gen 1 using Level 2 machine learning methods

More advanced language analysis approaches, including Level 2 (machine learning) and Level 3 (large language models), have been applied to random Twitter feeds. For example, random tweets aggregated to the U.S. county level were used to predict life satisfaction (r = .31; 1,293 counties)82 and heart disease mortality rates (r = .42, 95% CI = [.38, .45]; 1,347 counties; Gen 1, Level 1–2)83; in these studies, machine learning models were applied to open-vocabulary words, phrases, and topics (see supplementary material for social media estimates with a spatial resolution below the county level).

In addition, researchers have used text data from discussion forums at a large online newspaper (Der Standard) and Twitter language to capture the temporal dynamics of individuals’ moods.84 Readers of the newspaper (N = 268,128 responses) were asked to rate their mood of the preceding day (response format: “good,” “somewhat good,” “somewhat bad,” or “bad”), and these ratings were aggregated to the national level (Gen 1, Level 1 and 3).85 Language analyses based on a combination of Level 1 (German adaptation of LIWC 2001)86 and Level 3 (German Sentiment, based on contextual embeddings, BERT) yielded high agreement with the aggregated Der Standard self-reports across 20 days (r = .93 [.82, .97]). Similarly, in a preregistered replication, estimates from Twitter language (more than 500,000 tweets by Austrian Twitter users) correlated with the same daily-aggregated self-reported mood at r = .63 [.26, .84].

Gen 1: Random post aggregation - Summary

Aggregating random tweets directly into geographic estimates is intuitively straightforward and relatively easy to implement, and the approach has been used for over a decade (2010+).
However, it is susceptible to many types of noise, such as changing sample composition over time, inconsistent posting patterns, and the disproportionate impact of super-posting accounts (e.g., bots, see Box 5.1), which may decrease measurement accuracy.

                                                                      Gallup surveys
Language-based estimate                                     Life Satisfaction   Happiness   Sadness
Level 1: Dictionaries
  LIWC 2015: Positive Emotion                                     -.21            -.13        .25
  LIWC 2015: Negative Emotion                                     -.32            -.27        .22
  LabMT: Happiness                                                -.27            -.07        .19
Level 2: Machine-Learning Models
  Swiss Chocolate: Positive Sentiment                              .24             .24       -.20
  Swiss Chocolate: Negative Sentiment                             -.29            -.30        .33
  World Well-Being Project: Life Satisfaction Model                .39             .23       -.23
  World Well-Being Project: Direct County-Level Prediction         .62             .51        .64

Figure 5.5. Using different kinds (“levels”) of language models in the prediction for Gallup-reported county-level Life Satisfaction, Happiness, and Sadness (using a Gen 2: User-level-aggregated 2009-2015 10% Twitter dataset) across 1,208 US counties. Level 2-based estimates, such as those based on Swiss Chocolate – a modern Sentiment system derived through machine learning – yield consistent results.80 However, estimates derived through the Level 1 Linguistic Inquiry and Word Count (LIWC 2015) Positive Emotions dictionary or the word-level annotation-based Language Assessment by Mechanical Turk (labMT) dictionary anti-correlate with the county-level Gallup-reported survey measure for Life Satisfaction.81
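The super-poster concern raised in the Gen 1 summary can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the method used in the studies cited here: the word list `POSITIVE_WORDS`, the scoring function `post_score`, and the sample posts are all hypothetical, and the user-level variant is only one plausible reading of the "user-level-aggregated" approach named in the Figure 5.5 caption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy word list; NOT the LIWC or labMT dictionary.
POSITIVE_WORDS = {"happy", "great", "love"}


def post_score(text):
    """Share of a post's tokens that appear in the toy positive-word list."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(token in POSITIVE_WORDS for token in tokens) / len(tokens)


def aggregate_posts(posts):
    """Random post aggregation: average all post scores directly per county."""
    by_county = defaultdict(list)
    for county, _user, text in posts:
        by_county[county].append(post_score(text))
    return {county: mean(scores) for county, scores in by_county.items()}


def aggregate_users(posts):
    """User-level aggregation (assumed form): average each user's posts first,
    then average the per-user means within each county."""
    by_user = defaultdict(list)
    for county, user, text in posts:
        by_user[(county, user)].append(post_score(text))
    by_county = defaultdict(list)
    for (county, _user), scores in by_user.items():
        by_county[county].append(mean(scores))
    return {county: mean(scores) for county, scores in by_county.items()}


# Hypothetical data: one super-posting account and two ordinary users in county "A".
posts = (
    [("A", "bot", "great great great great")] * 8
    + [("A", "u1", "rainy commute again"), ("A", "u2", "long day today")]
)

post_level = aggregate_posts(posts)   # dominated by the 8 bot posts
user_level = aggregate_users(posts)   # each account counts once
```

In this toy data the post-level estimate for county "A" is driven almost entirely by the single high-volume account, while the user-level estimate gives each of the three accounts equal weight. Real pipelines would also need to handle bot filtering, geolocation, and minimum post counts, none of which is sketched here.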
