World Happiness Report 2023

The first axis of development – data collection and aggregation strategies – can be categorized into three generations, which have produced stepwise increases in prediction accuracy and reductions in the impact of sources of error, such as bots (detailed in Table 5.2):

Gen 1: Aggregation of random posts (i.e., treating each community's posts as unstructured "bags of posts").

Gen 2: Person-level sampling and aggregation of posts, with the potential to correct for sample biases (i.e., aggregation across persons).

Gen 3: Aggregation across a longitudinal cohort design (i.e., creating digital cohorts in which users are followed over time and temporal trends are described by extrapolating from the changes observed within users).

The second axis of development – language models – describes how language is analyzed; that is, how numerical well-being estimates are derived from language. We argue that these, too, have advanced stepwise, in stages we refer to as Levels for organizational purposes. Each Level improves the accuracy with which the distribution of language use is mapped onto estimates of well-being (see Table 5.3 for a detailed overview). The Levels have advanced from closed-vocabulary (dictionary-based) methods to machine learning and large language model methods that ingest the whole vocabulary.34 We propose the following three levels of developmental stages in language models:

Level 1: Closed-vocabulary approaches use word-frequency counts derived from curated or crowdsourced (annotation-based) dictionaries, such as for sentiment (e.g., ANEW)35 or word categories (e.g., Linguistic Inquiry and Word Count 2015 or 2022).36

Level 2: Open-vocabulary approaches use data-driven machine learning predictions. Here, words, phrases, or topic features (e.g., LDA)37 are extracted and used as inputs to supervised machine learning models, in which language patterns are automatically detected.

Level 3: Contextual word embedding approaches use large language models to represent words in their context; so, for example, "down" is represented differently in "I'm feeling down" as compared with "I'm down for it." Pre-trained models include BERT,38 RoBERTa,39 and BLOOM.40

Generations and Levels increase the complexity with which data is processed and analyzed – and typically also, as we detail below, the accuracy of the resultant well-being estimates. To make these distinctions concrete, minimal code sketches illustrating the aggregation Generations and each of the three Levels follow Table 5.1.

Table 5.1: Overview of generations of aggregation methods and levels of language models

Sampling and data aggregation methods                  | Language models
Gen 1: Aggregation of random posts                     | Level 1: Closed vocabulary (curated or word-annotation-based dictionaries)
Gen 2: Aggregation across persons                      | Level 2: Open vocabulary (data-driven AI, ML predictions)
Gen 3: Aggregation across a longitudinal cohort design | Level 3: Contextual representations (large pre-trained language models)

Note: AI = Artificial Intelligence, ML = Machine Learning. See Table 5.2 for more information about the three generations of data aggregation methods and Table 5.3 for the three levels of language models.
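To make the contrast between Gen 1 and Gen 2 aggregation concrete, here is a minimal Python sketch. The user names and per-post sentiment scores are invented for illustration and are not data from the report.

```python
# Hypothetical per-post sentiment scores, keyed by user; one
# prolific user contributes most of the posts.
scores_by_user = {
    "user_a": [0.9, 0.8, 0.9, 0.7, 0.8, 0.9],  # heavy poster
    "user_b": [-0.2],
    "user_c": [0.1],
}

# Gen 1: an unstructured "bag of posts" -- prolific users dominate.
all_posts = [s for posts in scores_by_user.values() for s in posts]
gen1 = sum(all_posts) / len(all_posts)

# Gen 2: aggregate within each person first, then across persons,
# so every user counts once regardless of posting volume.
user_means = [sum(p) / len(p) for p in scores_by_user.values()]
gen2 = sum(user_means) / len(user_means)

print(f"Gen 1 (post-level):   {gen1:.2f}")  # 0.61
print(f"Gen 2 (person-level): {gen2:.2f}")  # 0.24
```

The Gen 1 estimate is pulled toward the prolific user, while person-level aggregation weights each user equally – the property that makes corrections for sample biases possible. A Gen 3 design would additionally follow each user's scores over time.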
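As an illustrative sketch of the Level 1 closed-vocabulary approach, the following scores a post by counting dictionary words. The tiny word lists are stand-ins for real resources such as ANEW or LIWC, which are distributed separately.

```python
import re

# Hypothetical mini-dictionaries; real Level 1 work uses curated
# resources such as ANEW or LIWC rather than these stand-ins.
POSITIVE = {"happy", "joy", "love", "great", "wonderful"}
NEGATIVE = {"sad", "angry", "awful", "down", "stressful"}

def dictionary_score(post: str) -> float:
    """Positive-minus-negative word frequency per token."""
    tokens = re.findall(r"[a-z']+", post.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

print(dictionary_score("I love this great sunny day"))  #  0.33
print(dictionary_score("feeling sad and down today"))   # -0.40
```

Note that this scoring treats "down" identically in every context, which is exactly the limitation that Level 3 addresses.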
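A minimal sketch of a Level 2 open-vocabulary pipeline, assuming scikit-learn is available: words and phrases become data-driven features for a supervised regressor. The posts and well-being labels below are toy values, and real pipelines would also add topic features such as LDA.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical posts paired with hypothetical survey-based
# life-satisfaction labels (0-10 scale).
posts = [
    "had a wonderful weekend hiking with friends",
    "work has been exhausting and stressful lately",
    "quiet day, nothing special to report",
]
life_satisfaction = [8.0, 3.0, 5.5]

# Unigram and bigram features feed a supervised model that detects
# language patterns automatically -- the "open vocabulary" idea.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
model.fit(posts, life_satisfaction)

print(model.predict(["stressful week, but hiking with friends helped"]))
```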
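A minimal sketch of a Level 3 contextual embedding, using the "down" example from the text. It assumes the Hugging Face transformers and PyTorch libraries and the publicly available bert-base-uncased checkpoint (the report cites BERT, RoBERTa, and BLOOM); the token-lookup logic is our illustrative choice.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_word(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual vector for a single-token word."""
    inputs = tokenizer(sentence, return_tensors="pt")
    # Find the word's position among the input token ids.
    idx = inputs["input_ids"][0].tolist().index(
        tokenizer.convert_tokens_to_ids(word)
    )
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden[0, idx]

a = embed_word("i'm feeling down", "down")
b = embed_word("i'm down for it", "down")
# The two "down" vectors differ because each reflects its context.
print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())
```

The cosine similarity between the two "down" vectors falls well below 1, reflecting the context sensitivity that dictionary counts cannot capture.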

Addressing social media biases

The language samples from social media are noisy and can suffer from a variety of biases,