World Happiness Report 2023

and unfamiliar audiences sometimes dismiss social-media-based measurement on these grounds. We discuss them in relation to selection, sampling, and presentation biases.

Selection biases include demographic and sampling biases. Demographic biases, i.e., that individuals on social media platforms are not representative of the larger population (refer to Figure 5.2),41 raise concerns that assessments do not generalize to a population with a different demographic structure. Generally, the user bases of social media platforms differ from the general population; Twitter users, for example, tend to be younger and more educated than the general U.S. population.42 These biases can be addressed in several ways; for example, demographic biases can be reduced by applying post-stratification weights to better match the target population on important demographic variables.43 Sampling biases involve concerns that a few accounts generate the majority of content,44 including super-posting social bots and organizational accounts, which in turn have a disproportionate influence on the estimates. Robust techniques to address these sampling biases, such as person-level aggregation, largely remove the disproportionate impact of super-posting accounts.45 It is also possible to identify and remove social bots with high accuracy (see Box 5.1).46

Presentation biases include self-presentation (or impression management) and social desirability biases. They involve concerns that individuals “put on a face” and present only curated aspects of themselves and their lives to evoke a positive perception of themselves.47 However, empirical studies indicate that these biases have a limited effect on machine learning algorithms that take the whole vocabulary into account (rather than merely counting keywords).
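The two corrections for selection biases mentioned above, person-level aggregation and post-stratification weights, can be illustrated with a minimal sketch. It is not the procedure used in the cited studies: the function names, the two age strata, the population shares, and all scores are hypothetical values chosen only to show the arithmetic.

```python
from collections import defaultdict


def person_level_means(posts):
    """Average post scores within each account first, so an account with
    many posts counts exactly as much as an account with a single post."""
    by_user = defaultdict(list)
    for user_id, score in posts:
        by_user[user_id].append(score)
    return {user: sum(s) / len(s) for user, s in by_user.items()}


def post_stratified_mean(user_scores, user_stratum, population_share):
    """Weight each stratum's mean by that stratum's share of the target
    population instead of its share of the sample."""
    by_stratum = defaultdict(list)
    for user, score in user_scores.items():
        by_stratum[user_stratum[user]].append(score)
    # Renormalize over the strata that actually appear in the sample.
    covered = sum(population_share[s] for s in by_stratum)
    return sum(
        population_share[s] / covered * (sum(v) / len(v))
        for s, v in by_stratum.items()
    )


# Hypothetical toy data: account "a" is a super-poster with eight posts.
posts = [("a", 0.2)] * 8 + [("b", 0.8), ("c", 0.6)]
user_stratum = {"a": "young", "b": "old", "c": "young"}
# Hypothetical target-population shares for the two strata.
population_share = {"young": 0.4, "old": 0.6}

# Naive estimate: every post counts equally, so "a" dominates.
post_mean = sum(score for _, score in posts) / len(posts)

# Person-level estimate: every account counts equally.
user_scores = person_level_means(posts)
person_mean = sum(user_scores.values()) / len(user_scores)

# Post-stratified estimate: strata re-weighted to the target population.
weighted_mean = post_stratified_mean(user_scores, user_stratum, population_share)

print(post_mean, person_mean, weighted_mean)
```

In this toy example the post-level mean is pulled toward the super-poster's score, the person-level mean removes that pull, and the post-stratified mean additionally shifts weight toward the stratum that is under-represented in the sample.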
As discussed below, machine-learning-based estimates (Level 2) reliably converge with non-social-media assessments, such as aggregated survey responses (out-of-sample convergence above Pearson r of .60).48 These estimates thus provide an empirical upper limit on the extent to which these biases can influence machine learning algorithms. Taken together, despite the widespread prima facie concern about selection, sampling, and presentation biases, the out-of-sample prediction accuracies of the machine learning models demonstrate empirically that these biases can be handled,49 as we discuss below.
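As a rough illustration of what an out-of-sample convergence check involves, the sketch below predicts each region's survey mean from a model fitted on all other regions, then correlates those held-out predictions with the observed survey means. Everything here is an assumption made for illustration: the single language feature, the one-variable least-squares model, the leave-one-out scheme, and the toy numbers. The models discussed in this chapter use far richer language features.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def leave_one_out_predictions(features, targets):
    """Predict each region from a model that never saw that region."""
    preds = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * features[i] + intercept)
    return preds


# Hypothetical region-level data: one language-based feature per region
# and the corresponding aggregated survey well-being score.
language_feature = [0.10, 0.35, 0.20, 0.60, 0.45, 0.80, 0.70, 0.25]
survey_mean = [5.1, 5.9, 5.6, 6.4, 6.0, 7.1, 6.6, 5.3]

held_out = leave_one_out_predictions(language_feature, survey_mean)
r = pearson_r(held_out, survey_mean)
print(f"out-of-sample Pearson r = {r:.2f}")
```

Because each prediction comes from a model that was not fitted on the region being predicted, the resulting correlation reflects generalization rather than in-sample fit, which is the sense in which the text uses out-of-sample convergence.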
