World Happiness Report 2023

Table 5.2: Advances in data sampling and aggregation methods

| Generation | Data sampling and aggregation method | Typical examples | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Gen 1: Past (2010–) | Aggregation of Random Sampling of Posts | Aggregate posts geographically, extract language features, use machine learning to predict outcomes (cross-sectionally) | Relatively easy to implement (e.g., random Twitter API + sentiment model). | Suffers from the disproportionate impact of super-posting accounts (e.g., bots). For longitudinal applications: A new random sample of individuals in every temporal period. |
| Gen 2: Present (2018–) | Person-Level Aggregation and Sampling (some with sample bias correction) | Person-level aggregation⁵¹ and poststratification to adjust the sample towards a more representative sample (e.g., U.S. Census).⁵² | Addresses the impact of super-posting social media users (e.g., bots). With post-stratification: known sample demographics and correction for sample biases. Increases measurement reliability and external validity. | For longitudinal applications: A new random sample of individuals in every temporal period. |
| Gen 3: Near future | Digital Cohort Sampling (following the same individuals over time) | Robust mental health assessments in time and space through social media language analyses.⁵³ | All of Gen 2 + Increases the temporal stability of estimates. Defined resolution across time and space (e.g., county-months), enables quasi-experimental designs. | Higher complexity in collecting person-level time series data (security, data warehousing). Difficult to collect enough data for higher spatiotemporal resolutions (e.g., county-day). |
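To make the contrast between the first two generations in Table 5.2 concrete, the following is a minimal sketch, not the report's implementation. It compares post-level aggregation (Gen 1) with person-level aggregation (Gen 2) and then applies a simple post-stratification weighting. All data, user IDs, county labels, age groups, population shares, and function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (user_id, county, sentiment score of one post).
# User "u1" is a super-poster with four posts; the others post once.
posts = [
    ("u1", "A", 0.9), ("u1", "A", 0.9), ("u1", "A", 0.9), ("u1", "A", 0.9),
    ("u2", "A", 0.1),
    ("u3", "A", 0.2),
]

def post_level_mean(posts):
    """Gen 1 style: average over all posts in each county."""
    by_county = defaultdict(list)
    for _user, county, score in posts:
        by_county[county].append(score)
    return {c: sum(v) / len(v) for c, v in by_county.items()}

def person_level_scores(posts):
    """Average each user's posts first, so every user counts once."""
    by_user = defaultdict(list)
    for user, county, score in posts:
        by_user[(user, county)].append(score)
    return {key: sum(v) / len(v) for key, v in by_user.items()}

def person_level_mean(posts):
    """Gen 2 style: average the per-user means within each county."""
    by_county = defaultdict(list)
    for (_user, county), score in person_level_scores(posts).items():
        by_county[county].append(score)
    return {c: sum(v) / len(v) for c, v in by_county.items()}

def poststratified_mean(user_scores, user_group, population_share):
    """Weight each demographic group's mean by its (assumed) population share.

    Shares are renormalized over the groups actually observed in the sample.
    """
    by_group = defaultdict(list)
    for user, score in user_scores.items():
        by_group[user_group[user]].append(score)
    total_share = sum(population_share[g] for g in by_group)
    return sum(
        (population_share[g] / total_share) * (sum(v) / len(v))
        for g, v in by_group.items()
    )

# Hypothetical demographics and census-like shares for county "A".
user_scores = {u: s for (u, _c), s in person_level_scores(posts).items()}
user_group = {"u1": "18-29", "u2": "30+", "u3": "30+"}
population_share = {"18-29": 0.2, "30+": 0.8}

print(post_level_mean(posts)["A"])    # dominated by the super-poster
print(person_level_mean(posts)["A"])  # each user counts once
print(poststratified_mean(user_scores, user_group, population_share))
```

In this toy example the post-level estimate for county A is about 0.65, because one account contributes four of the six posts, while the person-level estimate is about 0.40. Reweighting the three users toward the assumed population shares moves the estimate further, to about 0.30. A digital cohort design (Gen 3) would apply the same person-level steps to a fixed set of users in every time period rather than to a fresh sample.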