World Happiness Report 2023

Gen 3: Digital Cohort Sampling – the Future of Longitudinal Measurement

Most of the work discussed thus far has been constrained to cross-sectional, between-community analysis, but social media offers high-resolution measurement over time at a level that is not practically feasible with survey-based methods (e.g., the potential for daily measurement at the community level). This abundance of time-specific psychological signals has motivated much prior work. In fact, much of the early work using social media text datasets focused heavily on longitudinal analyses, ranging from predicting stock market indices using sentiment and mood lexicons (Gen 1, Level 1)103 to evaluating the diurnal variation of positive and negative affect within individuals as expressed in Twitter feeds (Gen 1, Level 1).104 For example, some analyses showed that individuals tend to wake up with a positive mood that decreases over the day.105

This early work on longitudinal measurement seemed to fade after one of the most iconic projects, Google Flu Trends (Gen 1, Level 1),106 began to produce strikingly erroneous results.107 Google Flu Trends monitored search queries for keywords associated with the flu; this approach could detect a flu outbreak up to a week ahead of the Centers for Disease Control and Prevention’s (CDC’s) reports. Whereas the CDC traditionally detected flu outbreaks from healthcare provider intake counts, Google sought to detect the flu from something people often do much earlier when they fall sick – google their symptoms. However, Google Flu Trends had a critical flaw – it could not fully consider the context of language;108 for example, it could not distinguish discussions of symptoms prompted by concerns about the bird flu from descriptions of one’s own symptoms.
This came to a head in 2013 when its estimates turned out to be nearly double those from the health systems.109 In short, this approach was susceptible to these kinds of noisy influences partly because it relied on random time series analyzed primarily with dictionary-based (keyword) approaches (Gen 1 and Level 1).

After the errors of Google Flu Trends were revealed, interest at large subsided, but research within Natural Language Processing began to address this flaw, drawing on machine learning methods (Level 2 and 3). For infectious diseases, researchers have shown that topic modeling techniques could distinguish mentions of one’s symptoms from other medical discussions.110 For well-being, as previously discussed, techniques have moved beyond using lists of words assumed to signify well-being (by experts or annotators; Level 1) to estimates relying on machine learning techniques to empirically link words to accepted well-being outcomes (often cross-validated out-of-sample; Level 2).111 Most recently, large language models (contextualized word embeddings such as RoBERTa) have been used to distinguish the context of words (Level 3).112

Here, we discuss what we believe will be the third generation of methods, which takes the person-level sampling and selection bias correction of Gen 2 and combines them with longitudinal sampling and study designs.

Pioneering digital cohort samples

Preliminary results from ongoing research demonstrate the potential of longitudinal digital cohort sampling (Fig. 5.8). This approach goes a step beyond user-level sampling and enables tracking variance in well-being outcomes across time: changes in well-being are estimated as the aggregate of the within-person changes observed in the sample. Digital cohort sampling presents several new opportunities. Changes in well-being and mental health can be assessed at both the individual and (surrounding) group level, opening the door to studying their interaction.
Further, short-term (weekly) and long-term patterns (changes on multi-year time scales) can be discovered. Finally, the longitudinal design unlocks quasi-experimental designs, such as difference-in-differences, instrumental variable, or regression discontinuity designs.
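To make the estimation idea concrete, the following is a minimal Python sketch, not the procedure used in the research described here. It assumes each person in the cohort already has a language-based well-being score per week. The function names `cohort_change` and `diff_in_diff`, the tuple layout, and all numbers are hypothetical and chosen only for illustration. The first function averages within-person changes between consecutive weeks; the second shows the basic difference-in-differences contrast that such panel data would allow.

```python
from collections import defaultdict
from statistics import mean


def cohort_change(scores):
    """Aggregate within-person changes per period.

    scores: iterable of (person_id, period, score) tuples, where period
    is an integer index (e.g., week number). Hypothetical layout.
    Returns {period: mean change from the previous period}, using only
    people observed in both consecutive periods.
    """
    by_person = defaultdict(dict)
    for person, period, score in scores:
        by_person[person][period] = score

    changes = defaultdict(list)
    for series in by_person.values():
        periods = sorted(series)
        for prev, curr in zip(periods, periods[1:]):
            if curr - prev == 1:  # skip gaps: compare consecutive periods only
                changes[curr].append(series[curr] - series[prev])
    return {period: mean(deltas) for period, deltas in sorted(changes.items())}


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Basic difference-in-differences estimate from four groups of scores."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Invented example data: three people, three weeks, uneven coverage.
example = [
    ("a", 1, 5.0), ("a", 2, 5.5), ("a", 3, 5.25),
    ("b", 1, 3.0), ("b", 2, 3.5),
    ("c", 2, 7.0), ("c", 3, 6.5),
]
print(cohort_change(example))
print(diff_in_diff([5.0, 6.0], [6.0, 7.0], [5.0, 5.0], [5.5, 5.5]))
```

In this invented example, the cross-sectional mean rises from 4.0 in week 1 to about 5.33 in week 2 largely because person "c" enters the sample, whereas the within-person estimate for week 2 reflects only the changes of the people observed in both weeks. A real analysis would also need the selection bias corrections of Gen 2 and appropriate uncertainty estimates, which this sketch omits.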