World Happiness Report 2023

[Figure 5.7: Twitter Prediction of U.S. County Life Satisfaction. Bar chart of Pearson r (out-of-sample prediction accuracies), axis from .30 to .60. Panels: Gallup Life Sat. (2009-2016) and CDC BRFSS Life Sat. (2009-2010). Bars: log. Income; Gen 1: Tweets to county; Gen 2a: User-level to county; Gen 2b: Post-stratified user-level.]

Figure 5.7. Cross-sectional Twitter-based county-level cross-validated prediction performances using (Gen 1) direct aggregation of tweets to counties, (Gen 2a) person-level aggregation before county aggregation, and (Gen 2b) robust post-stratification based on age, gender, income, and education.⁹⁹ Life satisfaction values were obtained from: Top, the CDC’s Behavioral Risk Factor Surveillance System (BRFSS) estimates (2009-2010, N = 1,951 counties)¹⁰⁰; Bottom, the Gallup-Sharecare Well-Being Index (2009-2016, N = 1,208 counties).¹⁰¹ Twitter data was the same in both cases, spanning a random 10% sample of Twitter collected from 2009-2015.¹⁰² Publicly released at https://github.com/wwbp/county_tweet_lexical_bank.

[Figure 5.6: Pipeline diagram. Stages: Raw Tweets; User-level Aggregation (Giorgi et al. 2018); Post-stratification, which adjusts the Twitter sample towards the US census (Giorgi et al. 2018); Final County Language.]

Figure 5.6. Example of a Gen 2 Twitter pipeline: Person-level aggregation and post-stratification.
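The difference between the three aggregation strategies named in the captions above (Gen 1, Gen 2a, Gen 2b) can be illustrated with a minimal Python sketch. It is not the implementation of Giorgi et al. or of the released county lexical bank: the toy tweets, the single demographic cell per user, the `CENSUS` shares, and all function names are hypothetical, and the simple cell-share weighting stands in for the robust post-stratification on age, gender, income, and education described in the caption.

```python
from collections import Counter, defaultdict

# Toy data (hypothetical): (county, user, demographic cell, tweet text)
TWEETS = [
    ("A", "u1", "young", "happy happy happy happy"),
    ("A", "u1", "young", "happy happy day game"),
    ("A", "u3", "young", "happy day game fun"),
    ("A", "u2", "old", "quiet day"),
]
# Hypothetical census share of each demographic cell in each county
CENSUS = {"A": {"young": 0.5, "old": 0.5}}


def rel_freq(tokens):
    """Relative frequency of each word in a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def gen1(tweets):
    """Gen 1: pool every tweet in a county, ignoring who wrote it."""
    pooled = defaultdict(list)
    for county, _user, _cell, text in tweets:
        pooled[county].extend(text.split())
    return {county: rel_freq(tokens) for county, tokens in pooled.items()}


def user_profiles(tweets):
    """Aggregate tweets to one word-frequency profile per (county, user)."""
    tokens = defaultdict(list)
    cells = {}
    for county, user, cell, text in tweets:
        tokens[(county, user)].extend(text.split())
        cells[(county, user)] = cell
    return {key: (cells[key], rel_freq(toks)) for key, toks in tokens.items()}


def gen2(tweets, census=None):
    """Gen 2a (census=None): unweighted mean of user profiles per county.
    Gen 2b (census given): mean weighted by census share / sample share."""
    by_county = defaultdict(list)
    for (county, _user), profile in user_profiles(tweets).items():
        by_county[county].append(profile)
    result = {}
    for county, users in by_county.items():
        cell_counts = Counter(cell for cell, _ in users)
        features = defaultdict(float)
        total_weight = 0.0
        for cell, freqs in users:
            if census is None:
                weight = 1.0
            else:
                sample_share = cell_counts[cell] / len(users)
                weight = census[county][cell] / sample_share
            total_weight += weight
            for word, freq in freqs.items():
                features[word] += weight * freq
        result[county] = {w: v / total_weight for w, v in features.items()}
    return result


if __name__ == "__main__":
    print("Gen 1 :", round(gen1(TWEETS)["A"]["happy"], 2))
    print("Gen 2a:", round(gen2(TWEETS)["A"]["happy"], 2))
    print("Gen 2b:", round(gen2(TWEETS, CENSUS)["A"]["happy"], 2))
```

In this toy county, the prolific user u1 dominates the Gen 1 estimate of "happy" (0.5). Averaging per-user profiles first lowers it to about 0.33, and reweighting the over-sampled "young" cell towards the assumed census shares lowers it further to 0.25.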