World Happiness Report 2023

Summary

Abstract

Social media data has become the largest cross-sectional and longitudinal dataset on emotions, cognitions, and behaviors in human history. Using social media data, such as Twitter, to assess well-being on a large scale promises to be cost-effective, available in near real time, and of high spatial resolution (for example, down to town, county, or zip code levels). The methods for assessment have improved substantially over the last decade. For example, the cross-sectional prediction of U.S. county life satisfaction from Twitter has improved from r = .37 to r = .54 (when training and comparing against CDC surveys, out-of-sample),[1] which exceeds the predictive power of log income (r = .35).[2] Using Gallup phone surveys, Twitter-based estimation reaches accuracies of r = .62.[3] Beyond the cost-effectiveness of this unobtrusive measurement, these “big data” approaches are flexible in that they can operate at different levels of geographic aggregation (nations, states, cities, and counties) and cover a wide range of well-being constructs spanning life satisfaction, positive/negative affect, and the relative expression of positive traits, such as empathy and trust.[4] Perhaps most promising, the size of the social media datasets allows for measurement in space and time down to county–month, a granularity well suited to testing hypotheses about the determinants and consequences of well-being with quasi-experimental designs.

In this chapter, we propose that the methods to measure the psychological states of populations have evolved along two main axes reflecting (1) how social media data are collected, aggregated, and weighted and (2) how psychological estimates are derived from the unstructured language. For organizational purposes, we argue that (1) the methods to aggregate data have evolved roughly over three generations.
In the first generation (Gen 1), random samples of tweets (such as those obtained through Twitter’s random data feed) were aggregated and then analyzed. In the second generation (Gen 2), Twitter data are aggregated to the person level, so geographic or temporal language samples are analyzed as a sample of individuals rather than a collection of tweets. More advanced Gen 2 approaches also introduce person-level weights through post-stratification techniques, similar to those used in representative phone surveys, to decrease selection biases and increase the external validity of the measurements. We suggest that we are at the beginning of the third generation of methods (Gen 3), which leverage within-person longitudinal designs (i.e., they model individuals over time) in addition to the Gen 2 advances to achieve increased assessment accuracy and enable quasi-experimental research designs. Early results indicate that these newer generations of person-level methods enable digital cohort studies and may yield the greatest longitudinal stability and external validity. Regarding (2) how psychological states and traits are estimated from language, we briefly discuss the evolution of methods in terms of three levels (for organizational purposes), which have been discussed in prior work.[5] These are the use of dictionaries and annotated word lists (Level 1), machine-learning-based models, such as modern sentiment systems (Level 2), and large language models (Level 3). These methods have iteratively addressed most of the prominent concerns about using noisy social media data for population estimation.
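The difference between Gen 1 and Gen 2 aggregation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the toy tweets, the per-tweet `tweet_score` (a stand-in for any language-based estimate), the two age groups, the `population_share` values, and all function names. The post-stratification shown is the simplest single-variable form (weight = population share divided by sample share); real applications may weight on several demographic variables at once.

```python
from collections import defaultdict

# Hypothetical toy data for one county: (user_id, age_group, tweet_score).
# User "u1" posts far more often than the others.
tweets = [
    ("u1", "young", 0.9), ("u1", "young", 0.9), ("u1", "young", 0.9),
    ("u1", "young", 0.9), ("u1", "young", 0.9), ("u1", "young", 0.9),
    ("u2", "young", 0.5), ("u2", "young", 0.7),
    ("u3", "old", 0.2), ("u3", "old", 0.4),
]

# Assumed population shares per group (hypothetical, not real census data).
population_share = {"young": 0.5, "old": 0.5}


def gen1_estimate(tweets):
    """Gen 1 style: pool all tweets and average them, ignoring authorship."""
    scores = [score for _, _, score in tweets]
    return sum(scores) / len(scores)


def user_means(tweets):
    """Aggregate to the person level: one (group, mean score) pair per user."""
    scores_by_user = defaultdict(list)
    group_of = {}
    for user_id, group, score in tweets:
        scores_by_user[user_id].append(score)
        group_of[user_id] = group
    return {
        user_id: (group_of[user_id], sum(scores) / len(scores))
        for user_id, scores in scores_by_user.items()
    }


def gen2_estimate(tweets):
    """Gen 2 style: average the user means, so every user counts equally."""
    means = [mean for _, mean in user_means(tweets).values()]
    return sum(means) / len(means)


def gen2_poststratified(tweets, population_share):
    """Gen 2 with person-level weights = population share / sample share."""
    users = user_means(tweets)
    users_per_group = defaultdict(int)
    for group, _ in users.values():
        users_per_group[group] += 1
    n_users = len(users)
    weighted_sum = 0.0
    weight_total = 0.0
    for group, mean in users.values():
        weight = population_share[group] / (users_per_group[group] / n_users)
        weighted_sum += weight * mean
        weight_total += weight
    return weighted_sum / weight_total


print(gen1_estimate(tweets))
print(gen2_estimate(tweets))
print(gen2_poststratified(tweets, population_share))
```

With these toy numbers, the pooled tweet average is about 0.72 because the prolific user dominates, the equal-weight user average is about 0.60, and the post-stratified estimate is about 0.525 once the under-sampled group is weighted up.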
Specifically, machine-learning prediction models applied to open-vocabulary features (Level 2) and trained on relatively reliable population estimates (such as random phone surveys) allow the language signal to be fit to the “ground truth.” This implicitly addresses (a) self-presentation and social desirability biases (by fitting only on the signal that generalizes), as evidenced by high out-of-sample prediction accuracies. The user-level aggregation and resultant equal weighting of users in Gen 2 reduce the error due to (b) bots.
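As a rough illustration of what "out-of-sample prediction accuracy" means here, the sketch below fits a model on some regions, predicts survey scores for held-out regions, and reports the Pearson r between predictions and survey values. It is a simplified stand-in and not the chapter's method: it uses a single hypothetical language feature and ordinary least squares, whereas Level 2 models use many open-vocabulary features. The data values and all function names are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    return slope, mean_y - slope * mean_x


def out_of_sample_r(features, survey, k):
    """k-fold cross-validation: each region is predicted by a model
    that never saw it, then predictions are correlated with the survey."""
    n = len(features)
    predictions = [0.0] * n
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train_x = [features[i] for i in range(n) if i not in held_out]
        train_y = [survey[i] for i in range(n) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        for i in held_out:
            predictions[i] = slope * features[i] + intercept
    return pearson_r(predictions, survey)


# Hypothetical toy data: one language feature and one survey score per region.
language_feature = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9, 0.2, 0.7]
survey_score = [5.9, 6.4, 6.1, 7.0, 6.9, 7.2, 6.2, 6.6]

print(out_of_sample_r(language_feature, survey_score, k=4))
```

Because every prediction comes from a model fit without the held-out region, the resulting r reflects only the part of the language signal that generalizes to unseen regions.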