World Happiness Report 2023

machine learning; this includes using modern sentiment systems or predicting county-level Gallup well-being survey outcomes directly using machine learning cross-sectionally.

Gen 1 with Level 1 dictionary/annotation-based methods

In the U.S.

In 2010, Kramer analyzed 100 million Facebook users' posts using word counts based on the Linguistic Inquiry and Word Count (LIWC) 2007 positive and negative emotion dictionaries (Gen 1, Level 1).63 The well-being index was computed as the difference between the standardized (z-scored) relative frequencies of the LIWC 2007 positive and negative emotion dictionaries. However, this well-being index was only weakly correlated with users' responses to the Satisfaction With Life Scale (SWLS),64 a finding that was replicated in later work65 in a sample of more than 24,000 Facebook users. Surprisingly, SWLS scores and negative emotion dictionary frequencies correlated positively across days (r = .13), weeks (r = .37), and months (r = .72), whereas the positive emotion dictionary showed no significant correlation. This provided early evidence that Level 1 closed-vocabulary methods (here, in the form of the LIWC 2007 dictionaries) can yield unreliable and implausible results.

Moving from LIWC dictionaries to crowdsourced annotations of single words, the Hedonometer project (ongoing; https://hedonometer.org/, Fig. 5.4A)66 aims to assess the happiness of Americans on a large scale by analyzing language expressions from Twitter (Gen 1, Level 1; Fig. 5.4B).67 Words are assigned a happiness score (ranging from 1 = sad to 9 = happy) from a crowdsourced dictionary of 10,000 common words called LabMT ("Language Assessment by Mechanical Turk").68 The LabMT dictionary has been used to show temporal variation in happiness over timescales ranging from hours to years69 and spatial variation across states, cities,70 and neighborhoods71 based on random feeds of tweets.
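As a rough illustration of LabMT-style scoring, the sketch below computes a frequency-weighted mean word happiness for a text, excluding words in a neutral band (the Hedonometer project similarly filters near-neutral words with a "lens"). The dictionary values here are toy numbers for illustration only, not the real crowdsourced LabMT scores.

```python
from collections import Counter

# Toy stand-in for the LabMT dictionary (1 = sad, 9 = happy).
# These values are illustrative, NOT the actual LabMT scores.
LEXICON = {"happy": 8.30, "love": 8.42, "sad": 2.38, "war": 1.80, "the": 4.98}

def happiness(text, lexicon=LEXICON, lens=(4.0, 6.0)):
    """Frequency-weighted mean happiness of the scored words in `text`.

    Words whose score falls strictly inside the neutral `lens` band are
    excluded, so emotionally loaded words drive the estimate. Returns
    None when no scored words remain.
    """
    counts = Counter(w for w in text.lower().split() if w in lexicon)
    lo, hi = lens
    scored = {w: n for w, n in counts.items() if not (lo < lexicon[w] < hi)}
    total = sum(scored.values())
    if total == 0:
        return None
    return sum(lexicon[w] * n for w, n in scored.items()) / total
```

Note that such a score is purely lexical: word order, negation, and context are ignored, which is one root of the implausible results discussed next.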
However, applying the LabMT dictionary to geographically aggregated Twitter language can yield unreliable and implausible results. One study produced spatially high-resolution well-being estimates for San Diego neighborhoods using the LabMT dictionary72 (see Fig. 5.4C). These estimates were, however, negatively associated with self-reported mental health at the level of census tracts (and not associated at all when controlling for neighborhood factors such as demographic variables). Other researchers found additional implausible results: using person-to-county-aggregated Twitter data73 (Gen 2), LabMT estimates for 1,208 US counties have been observed to anti-correlate with Gallup-reported county life satisfaction, which is discussed further below (see Fig. 5.5).

Outside the U.S.

To date, Gen 1 approaches have been applied broadly, in different countries, with different languages. In China, it has been

Figure 5.3. Example of a Gen 1 Twitter pipeline: a random collection of raw tweets (1.29 billion) is aggregated directly to county-level language (1,208 counties).
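The direct county-level aggregation of a Gen 1 pipeline, as depicted in Figure 5.3, might be sketched as follows. The county IDs, tweets, and dictionary scores are hypothetical (the real pipeline processed 1.29 billion tweets across 1,208 counties); the key point is that tweets are pooled straight into a county's bag of words with no person-level step, so prolific users can dominate a county's estimate.

```python
from collections import Counter, defaultdict

# Toy stand-in dictionary (illustrative scores, not real LabMT values).
LEXICON = {"happy": 8.30, "love": 8.42, "sad": 2.38, "war": 1.80}

def county_happiness(tweets, lexicon=LEXICON):
    """tweets: iterable of (county_id, text) pairs.

    Gen 1 aggregation: every tweet's words are pooled directly into the
    county's word counts, ignoring who wrote them. Returns a mapping of
    county_id -> frequency-weighted mean word happiness.
    """
    bags = defaultdict(Counter)
    for county, text in tweets:
        bags[county].update(w for w in text.lower().split() if w in lexicon)
    return {
        county: sum(lexicon[w] * n for w, n in bag.items()) / sum(bag.values())
        for county, bag in bags.items()
        if bag  # skip counties with no scored words
    }
```

A Gen 2 pipeline would instead first average within each person, then average people within a county, which changes the estimates whenever posting volume varies across users.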
