Strategies to Minimize Selection Bias in Digital Population-Based Studies in Sport and Health Sciences: Methodological and Empirical Insights from the COMO Study
Summary
Objectives: Population-based surveys increasingly face declining response rates and selective participation, particularly since the COVID-19 pandemic. This article presents the design and sampling approach of the COMO study and discusses strategies to minimize selection bias in digital health surveys, focusing on participation enhancement, response rates, and sample composition in the first survey wave of the COMO study.
Methods: The COMO study is a nationwide, prospective panel survey with three annual online waves (2023–2025). In the first wave (COMO1), a probability-based sample of 35,157 families with children aged 4–17 was recruited using a two-stage register-based sampling procedure. Participation was encouraged via a multimodal communication strategy including postal invitations, email reminders, non-monetary incentives, and social media outreach. Weighting procedures were developed based on design weights and calibration to national Microcensus benchmarks.
Results: A total of 6,097 families submitted at least one complete questionnaire, corresponding to an overall participation rate of 17.3%. The final analytical sample with complete parent and child data comprised 5,240 families (15.0%). Response behavior varied by age, gender, and parental education (one proxy of socioeconomic status, SES), with adolescents, boys, and households with lower parental education underrepresented. Reminders boosted participation, particularly early in the field period. A weighting procedure was developed to correct key demographic imbalances. A qualitative analysis of approximately 50 inquiries revealed technical, communicative, and emotional barriers, including privacy concerns and pandemic-related distress.
Conclusions: The COMO study demonstrates that targeted reminders, inclusive materials, and post-survey weighting can reduce, but not fully eliminate, selection bias. The findings underscore the need for participatory, equity-oriented study designs and incentive strategies to improve representativeness in health research, especially among underserved populations.
Key Words: Response Behavior, Sampling Bias, Panel Attrition, Inclusive Recruitment, Health Equity
Introduction
Population-based panel studies face the persistent challenge of ensuring sample representativeness. Selective participation and cumulative dropouts can lead to systematic distortions that reduce external validity and limit the generalizability of study findings (11). This issue is particularly critical in sport and health sciences, where empirical evidence is frequently translated into preventive and health-promoting interventions (12). The COVID-19 pandemic has further emphasized the urgent need for timely and continuous data on the health of the general population and specific subgroups. At the same time, empirical evidence indicates that response behavior in population surveys has changed substantially since the pandemic began.
Recent health surveys report comparatively low and stagnating or declining response rates, increasing the risk of nonresponse bias (11). An analysis of the U.S. Current Population Survey (CPS) found that, during the first four months of the pandemic (March–June 2020), the average monthly nonresponse rate rose by 58%, while the inflow of new participants declined by 37% compared to the previous 15 months (18).
These developments place greater methodological demands on population-based studies, which need to integrate systematic strategies to minimize selection bias. Ding and Ekelund (7) emphasize that future research in the field of physical activity must prioritize “diversity, inclusion, equity, and equality” as foundational principles.
As a probability-based online panel, the COMO study offers a methodological framework for testing, documenting, and evaluating practical strategies to reduce selection bias in digital population surveys. The study aims to investigate long-term changes in physical and mental health and health behavior among children and adolescents in Germany in the post-pandemic context, taking socioecological factors into account. A probability-based sample of children aged 4 to 17, representative at the national level with respect to selected variables, was recruited to participate in three annual online surveys in 2023 (baseline), 2024, and 2025.
This article presents the design of the COMO study, its sampling strategy, the strategies implemented to enhance participation, achieved response rates, and the composition of the net sample from the first COMO wave of online data collection. It addresses the following research questions:
- At the point of recruitment: What effect did reminder strategies have on participation behavior in the COMO study?
- During data processing: How were design and calibration weights developed to minimize selection bias?
- Following the first data collection: What qualitative reasons for nonresponse were identified to inform future fieldwork strategies?
Methods
The following section describes the methodology in accordance with the STROBE guidelines (17).
Study Design
The COMO study (13) is designed as a multicenter, interdisciplinary panel study based on a prospective longitudinal design with three annual waves of data collection (fall 2023, 2024, and 2025). Its aim is to investigate changes in physical and mental health as well as health behaviors among children and adolescents aged 4 to 17 years in Germany in the aftermath of the COVID-19 pandemic, taking into account socioecological contexts. The first survey wave (COMO 1) was conducted between October 2023 and February 2024. The study was funded by the German Federal Ministry of Education and Research (BMBF) as part of the funding initiative “Societal Impacts of the COVID-19 Pandemic – Research for Integration, Participation, and Renewal” (funding reference: 01UP2222A).
Participants
The target population comprised children and adolescents aged 4 to 17 years living in private households in Germany. Sampling followed a two-stage, register-based procedure. In the first stage, 185 (later 177) municipalities were selected as Sample Points (SPs) by the GESIS – Leibniz Institute for the Social Sciences (3, 15), stratified by federal state and proportional to the number of 3- to 17-year-olds living there, using the controlled rounding method (6). In the second stage, 200 addresses per SP were randomly drawn from local population registries in cooperation with municipal registration authorities. To ensure feasibility of fieldwork, structurally similar substitute municipalities were identified using a k-nearest neighbor algorithm (9). In total, 35,157 households were invited by post to participate (16).
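The two first-stage steps described above, proportional allocation of sample points to strata and the search for structurally similar substitute municipalities, can be illustrated with a minimal sketch. It rests on stated assumptions: allocation uses largest-remainder (Hamilton) rounding, which is a simpler stand-in for the controlled rounding procedure cited above (6) and not that procedure, and substitutes are found by plain Euclidean nearest-neighbour search on hypothetical standardized structural indicators. The function names, strata, population counts, and indicator values are invented for illustration and do not come from the COMO sampling frame.

```python
import math


def allocate_sample_points(pop_by_state: dict[str, int], total_points: int) -> dict[str, int]:
    """Allocate sample points to strata in proportion to population size.

    Largest-remainder (Hamilton) rounding: take the integer part of each
    quota, then give the leftover points to the strata with the largest
    fractional parts. A simpler stand-in for controlled rounding.
    """
    total_pop = sum(pop_by_state.values())
    quotas = {s: total_points * n / total_pop for s, n in pop_by_state.items()}
    alloc = {s: math.floor(q) for s, q in quotas.items()}
    leftover = total_points - sum(alloc.values())
    # Strata sorted by the fractional part of their quota, largest first
    by_fraction = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_fraction[:leftover]:
        alloc[s] += 1
    return alloc


def nearest_substitutes(target: tuple[float, ...],
                        candidates: dict[str, tuple[float, ...]],
                        k: int = 1) -> list[str]:
    """Return the k candidate municipalities closest to `target`.

    Vectors are assumed to hold standardized structural indicators
    (hypothetical, e.g. population density, age structure); distance is Euclidean.
    """
    distances = {name: math.dist(target, vec) for name, vec in candidates.items()}
    return sorted(distances, key=distances.get)[:k]


# Hypothetical example: three strata sharing seven sample points
print(allocate_sample_points({"A": 500, "B": 300, "C": 200}, 7))  # {'A': 4, 'B': 2, 'C': 1}
print(nearest_substitutes((0.0, 0.0), {"X": (1.0, 1.0), "Y": (0.2, -0.1), "Z": (3.0, 0.0)}, k=2))  # ['Y', 'X']
```

Largest-remainder rounding guarantees that the allocations add up to the intended number of sample points, but it handles only a single margin; controlled rounding can additionally keep the totals of a cross-classified table consistent.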
Children and adolescents were eligible if they were aged between 4 and 17 years on the reference date (June 1, 2023) and lived in a private household in Germany. Individuals with a registry disclosure restriction or incomplete registry data were excluded.
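The age criterion can be written as a small function. This sketch assumes that age is counted in completed years on the reference date of June 1, 2023. The flags `disclosure_restricted` and `registry_complete` are hypothetical field names, and the private-household criterion is not modelled.

```python
from datetime import date

REFERENCE_DATE = date(2023, 6, 1)  # reference date for eligibility (from the text)


def age_on(birth: date, ref: date = REFERENCE_DATE) -> int:
    """Age in completed years on the reference date."""
    birthday_passed = (ref.month, ref.day) >= (birth.month, birth.day)
    return ref.year - birth.year - (0 if birthday_passed else 1)


def is_eligible(birth: date, disclosure_restricted: bool, registry_complete: bool) -> bool:
    """Eligibility as described above; both flags are hypothetical registry fields."""
    if disclosure_restricted or not registry_complete:
        return False
    return 4 <= age_on(birth) <= 17


print(age_on(date(2019, 6, 1)), age_on(date(2019, 6, 2)))  # 4 3
print(is_eligible(date(2005, 6, 2), False, True))  # True (still 17 on June 1, 2023)
```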
Exposures, Outcomes, and Covariates
Key exposures included indicators of physical activity (e.g., meeting WHO guidelines, participation in club sports or informal activity), assessed using the MoMo physical activity questionnaire. Further variables included physical health measures (e.g., physical fitness, pain), psychological difficulties (Strengths and Difficulties Questionnaire, SDQ), and health-related quality of life (KIDSCREEN-10). Covariates included age, sex, and socioeconomic status, operationalized using a multidimensional SES index based on parental education, occupation, and income (10).
Data Collection
Data were collected online using three questionnaires: (a) a parent questionnaire, (b) a self-report questionnaire for children and adolescents aged 11 and older, and (c) a parent-child questionnaire for children aged 4 to 10. Access was provided via individual login codes. Questions and support requests were handled via a central support email address and a telephone hotline.
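The routing of families to questionnaires by child age can be sketched as follows. The sketch assumes that every family receives the parent questionnaire plus exactly one child-related questionnaire, with the cut-off at 11 years as stated above. The text does not specify whether routing used age on the reference date or at log-in, so the age is simply passed in; the function name and labels are hypothetical.

```python
def questionnaires_for(child_age: int) -> list[str]:
    """Questionnaires a family receives for a child of the given age (hypothetical labels)."""
    if not 4 <= child_age <= 17:
        raise ValueError("child is outside the 4-17 target age range")
    # Children aged 11+ answer themselves; for ages 4-10 a parent answers together with the child
    child_part = "self-report" if child_age >= 11 else "parent-child"
    return ["parent", child_part]


print(questionnaires_for(7))   # ['parent', 'parent-child']
print(questionnaires_for(14))  # ['parent', 'self-report']
```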
Communication and Reminder Strategies
To maximize participation in the voluntary online survey, a multi-stage communication and reminder strategy was employed. Key principles included enhancing perceived benefit, reducing burden, and fostering trust through clear, audience-appropriate messaging. The initial postal invitation included a personalized access code, a QR code for quick access to the survey via smartphone, and detailed information on study objectives, data protection, and contact options. All materials (e.g., a study flyer, a paper airplane template, and a fold-out leaflet) were developed collaboratively by the Karlsruhe Institute of Technology (KIT) and the SOKO Institute, with attention to inclusion and diversity. Up to five postal and electronic reminders were sent, supported by social media (Instagram, X/Twitter), a project website, and a service hotline. Incentives included a COMO-branded notepad and digital materials; their use was based on evidence that such strategies, especially in extensive surveys, can significantly boost response rates (1). Fieldwork was scheduled to avoid school holidays so that the data would reflect everyday conditions.
Statistical Analysis
This paper focuses on descriptive analyses and qualitative evaluations. The aim is to systematically describe the distributions and characteristic features of the collected data and to gain a deeper understanding through interpretative methods. The Results section also presents the calculation of design weights and calibration weights, which are used to adjust the sample to known characteristics of the target population. To capture the socioeconomic status of respondents, a three-dimensional index was created from questions on parents’ education, occupation, and income; only information provided by the parents was used. Three household groups are distinguished: “low” (lowest 20% of index values), “medium” (middle 60%), and “high” (highest 20% of index values).
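The 20/60/20 grouping can be illustrated with a short sketch. It assumes that each of the three dimensions has already been converted to a numeric score and that the index is their sum, an approach modelled on the KiGGS SES index (10) rather than taken from the COMO documentation. Cut points are unweighted sample percentiles, and the household scores are invented.

```python
import numpy as np


def ses_groups(education, occupation, income):
    """Sum three parent-reported dimension scores into an index and split it 20/60/20.

    Returns the index and an array of 'low' / 'medium' / 'high' labels.
    Cut points are unweighted sample percentiles; with tied index values the
    group shares can deviate from exactly 20/60/20.
    """
    index = (np.asarray(education, dtype=float)
             + np.asarray(occupation, dtype=float)
             + np.asarray(income, dtype=float))
    p20, p80 = np.percentile(index, [20, 80])
    labels = np.where(index <= p20, "low", np.where(index > p80, "high", "medium"))
    return index, labels


# Hypothetical dimension scores for ten households
edu = [1, 2, 3, 3, 4, 5, 5, 6, 7, 7]
occ = [1, 1, 2, 4, 3, 4, 6, 5, 6, 7]
inc = [2, 1, 3, 2, 4, 4, 5, 6, 6, 7]
index, labels = ses_groups(edu, occ, inc)
print(list(zip(index.tolist(), labels.tolist())))
```

Because the dimension scores are discrete, many households can share the same index value; in that case the realized group sizes depend on how ties at the cut points are handled.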
Ethical Considerations
The study received ethical approval from the Ethics Committee of the Karlsruhe Institute of Technology (KIT), reference number A2023-078. Prior to participation, informed digital consent was obtained from legal guardians. Data protection complied with the EU General Data Protection Regulation (GDPR; German: DSGVO); personal data were stored separately from survey data and will be deleted after project completion. The study is registered with the Open Science Framework (13).
Results
Effect of Reminder Communications on Willingness to Participate
The development of response rates in the first wave of the COMO study (COMO 1) highlights the importance of a systematically planned contact management strategy. As illustrated in Figure 1, the initial postal invitation and each subsequent reminder (sent by post or by email) were followed by a noticeable, temporary spike in log-ins to the survey, suggesting that targeted reminder communication was effective.
Figure 1 shows the number of families who logged in to the child questionnaires each day. The x-axis displays the survey period from October 2023 to February 2024. Envelope icons indicate the timing of contacts: envelopes 1–3 represent the initial postal invitations, 4–5 the postal reminders, and 6–8 the email reminders sent to individuals who had logged in but had not started or completed the questionnaire.
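One simple way to put a number on the spikes visible in Figure 1 is to compare mean daily log-ins shortly after a mailing with the days just before it. This is an illustrative sketch, not the analysis reported here. The function name, window length, optional delivery lag, and log-in counts are all hypothetical. Because mailings overlap and log-ins also follow weekday and seasonal patterns, such a difference is descriptive and not a causal effect estimate.

```python
from datetime import date, timedelta


def mailing_uplift(daily_logins: dict[date, int], mailing_day: date,
                   window: int = 3, lag: int = 0) -> float:
    """Difference in mean daily log-ins after vs. before a mailing.

    'After' covers `window` days starting `lag` days after the mailing
    (e.g. to allow for postal delivery); 'before' covers the `window` days
    preceding the mailing day. Days missing from the series count as zero.
    """
    before = [daily_logins.get(mailing_day - timedelta(days=d), 0)
              for d in range(1, window + 1)]
    after = [daily_logins.get(mailing_day + timedelta(days=lag + d), 0)
             for d in range(window)]
    return sum(after) / window - sum(before) / window


# Hypothetical daily log-in counts for October 1-10, 2023, with a mailing on October 5
counts = [2, 2, 2, 2, 10, 8, 6, 3, 2, 2]
logins = {date(2023, 10, day): c for day, c in enumerate(counts, start=1)}
print(mailing_uplift(logins, date(2023, 10, 5)))  # 6.0
```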
Response Rates and Participant Characteristics
As part of the first wave of the COMO study (COMO 1), a total of 35,157 families from 177 sample points were invited by letter to participate. The field period extended from October 2023 to February 4, 2024. In total, 6,524 families (18.5% of the gross sample) logged in to the online survey at least once. Of these, 5,596 families completed the child or parent-child questionnaire, and 5,741 families fully completed the parent questionnaire (i.e., ≥80% of relevant items answered). Complete data for both questionnaires are available for 5,240 families, corresponding to a response rate of 15.0% according to the Response Rate 1 (RR1) definition of the American Association for Public Opinion Research (AAPOR) (2). The final dataset includes 6,097 families who completed at least one of the two questionnaires (Table 1).
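The case counts reported above are internally consistent, as a short calculation using only the numbers in the text shows. By inclusion-exclusion, the families with at least one complete questionnaire equal the child completes plus the parent completes minus the families with both. The crude share of both-complete families in the full gross sample, about 14.9%, is marginally below the reported 15.0%. AAPOR’s RR1 excludes cases known to be ineligible from the denominator, which may account for the difference; the detailed disposition codes are not reported here.

```python
GROSS_SAMPLE = 35_157     # invited families
CHILD_COMPLETE = 5_596    # child or parent-child questionnaire complete
PARENT_COMPLETE = 5_741   # parent questionnaire complete
BOTH_COMPLETE = 5_240     # both questionnaires complete

# Inclusion-exclusion: families with at least one complete questionnaire
at_least_one = CHILD_COMPLETE + PARENT_COMPLETE - BOTH_COMPLETE


def pct(count: int, base: int = GROSS_SAMPLE) -> float:
    """Share of the full gross sample in percent, rounded to one decimal."""
    return round(100 * count / base, 1)


print(at_least_one, pct(at_least_one), pct(BOTH_COMPLETE))  # 6097 17.3 14.9
```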
Response rates decreased with increasing age of the child (range: 9% to 17%). In addition, the distribution of parental education in the net sample of the COMO study (i.e., both questionnaires completed, N = 5,240) differs slightly from official population statistics for Germany, specifically the 2022 Microcensus: children whose parents have a higher level of education are overrepresented in the net sample relative to the Microcensus. Sample characteristics of the families who completed both questionnaires are presented in Table 2. On average, the children and adolescents were 9.89 years old (SD = 3.64).
Implementation of a Weighting Procedure
To improve national representativeness, the COMO study employed a multistage weighting approach addressing both the complex sampling design and the observed selection bias. The procedure followed a two-step approach comprising design weights and calibration weights. Design weights were computed as inverse inclusion probabilities, following the Horvitz-Thompson estimator (5), to correct for unequal inclusion probabilities arising from the two-stage sampling design (6).
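For a two-stage design like the one described under Participants, a person’s overall inclusion probability is the product of the first-stage probability of the municipality and the second-stage probability within it, and the design weight is its inverse. The sketch below assumes probability-proportional-to-size selection of municipalities within a stratum and simple random sampling from the register. The function names and all numbers are hypothetical and are not the COMO frame or the exact COMO procedure.

```python
def inclusion_probability(m_h: int, size_j: float, size_h: float,
                          n_j: int, frame_j: int) -> float:
    """Overall inclusion probability of a person in a two-stage design.

    Stage 1: municipality j is drawn with probability proportional to size
             within stratum h, pi_j = m_h * size_j / size_h (must not exceed 1).
    Stage 2: n_j persons are drawn by simple random sampling from the
             frame_j persons in j's register, pi_i|j = n_j / frame_j.
    """
    pi_j = m_h * size_j / size_h
    if pi_j > 1:
        raise ValueError("municipality would be a certainty selection; handle separately")
    return pi_j * (n_j / frame_j)


def design_weight(pi: float) -> float:
    """Design weight = inverse inclusion probability."""
    return 1.0 / pi


def horvitz_thompson_total(y: list[float], pi: list[float]) -> float:
    """Horvitz-Thompson estimator of a population total: sum of y_i / pi_i."""
    return sum(yi / p for yi, p in zip(y, pi))


# Hypothetical stratum: 100,000 children, 10 sample points, 200 draws per point
p_small = inclusion_probability(10, 5_000, 100_000, 200, 5_000)
p_large = inclusion_probability(10, 8_000, 100_000, 200, 8_000)
# Register count at the time of drawing differs from the stage-1 size measure
p_drift = inclusion_probability(10, 5_000, 100_000, 200, 6_000)
print(design_weight(p_small), design_weight(p_large), design_weight(p_drift))
```

When the size measure equals the register count and the number of draws per sample point is fixed, every person in a stratum has the same probability (here weight 50), so the design is self-weighting within the stratum. Weights start to vary once the register counts deviate from the size measure (here weight 60).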
Calibration weights were computed by iterative proportional fitting (raking) to marginal distributions from the 2022 German Microcensus. Calibration variables included age, gender, federal state, urbanicity (BIK regional classification (3)), and parental education. The weighting procedure was jointly developed and implemented by KIT, the SOKO Institute, and GESIS.
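Iterative proportional fitting adjusts the weights one variable at a time, so that the weighted totals match each benchmark margin in turn, and repeats the cycle until all margins agree. The following is a generic sketch of raking starting from design weights. It is not the COMO implementation: variable names, categories, targets, tolerance, and the absence of weight trimming are assumptions chosen for illustration.

```python
import numpy as np


def rake(base_weights, categories, targets, max_iter=100, tol=1e-8):
    """Iterative proportional fitting (raking) of weights to marginal totals.

    base_weights: starting weights (e.g. design weights), shape (n,)
    categories:   dict variable -> sequence of category labels per respondent
    targets:      dict variable -> dict category -> population total
    """
    w = np.asarray(base_weights, dtype=float).copy()
    for _ in range(max_iter):
        max_change = 0.0
        for var, labels in categories.items():
            lab = np.asarray(labels)
            for cat, total in targets[var].items():
                mask = lab == cat
                current = w[mask].sum()
                if current == 0:
                    raise ValueError(f"no respondents in {var}={cat}")
                factor = total / current
                w[mask] *= factor  # scale this category to its benchmark total
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:  # all margins already matched in this cycle
            return w
    raise RuntimeError("raking did not converge")


# Hypothetical respondents with design weights and two calibration variables
age = ["young", "young", "old", "old"]
edu = ["high", "low", "high", "low"]
weights = rake([1, 2, 2, 1], {"age": age, "edu": edu},
               {"age": {"young": 60, "old": 40}, "edu": {"high": 30, "low": 70}})
print(weights.round(2))
```

Raking only matches the one-way margins of the calibration variables; joint distributions (e.g., education within federal states) are matched only if they are included as combined calibration cells.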
The weighting strategy aimed to minimize bias from selective participation and to improve generalizability to the target population. Applying both design and calibration weights brought key distributions, particularly parental education, closer to the 2022 Microcensus benchmarks. While this should improve population-level estimates, residual nonresponse bias cannot be ruled out, especially with respect to characteristics not included in the calibration.
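Weighting typically reduces bias at some cost in precision, because unequal weights inflate variances. A common diagnostic for this trade-off is Kish’s approximate design effect due to weighting, together with the corresponding effective sample size. The sketch below shows this standard formula. The diagnostic is not reported for COMO in this article, and the weights are invented.

```python
def kish_design_effect(weights: list[float]) -> float:
    """Kish's approximate design effect from unequal weights: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / total ** 2


def effective_sample_size(weights: list[float]) -> float:
    """Effective sample size n / deff = (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


print(kish_design_effect([1, 1, 1, 1]), kish_design_effect([1, 1, 1, 5]))  # 1.0 1.75
print(effective_sample_size([1, 1, 1, 5]))
```

Equal weights give a design effect of 1. A single large weight among four respondents already reduces the effective sample size from 4 to about 2.3, which is one reason why extreme weights are often trimmed in practice.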
Findings from the Qualitative Nonresponse Analysis
To explore potential reasons for nonresponse, approximately 50 documented telephone and written inquiries received during the COMO 1 survey were systematically analyzed. The content of these communications allowed for categorization of frequently mentioned barriers and motivations for nonparticipation. The findings revealed a range of relevant factors, including technical, informational-communicative, and socio-emotional aspects.
Reported technical issues included incorrect access codes, difficulties submitting the questionnaire, and temporary website outages. Address-related problems were also noted (e.g., households without eligible children, relocation, or inaccurate contact data), as well as miscommunication, such as reminder letters sent after participation had already occurred. In addition, some invitees explicitly refused to take part, citing lack of interest, data privacy concerns, or skepticism toward the study’s relevance or its association with the COVID-19 pandemic.
Other inquiries highlighted language barriers, limited accessibility due to disability, or emotional distress linked to the pandemic. Of particular note were messages expressing a desire to be heard, as well as critical comments about pandemic-related policies – reflecting a broader need for participation and recognition. A summary of the identified nonresponse reasons is presented in Table 3.
Discussion
The first wave of the COMO study shows that differentiated communication strategies, incentives, and systematic weighting can partially reduce selection bias in population-based panel studies. In particular, the staggered postal and electronic contacts were each followed by measurable increases in survey log-ins. Nevertheless, the response rate remained low at 15%, possibly reflecting a decreased willingness to participate in surveys since the COVID-19 pandemic.
Internationally, the COMO response rate is not exceptional. Ward and Edwards (18) reported that the nonresponse rate in the U.S. Current Population Survey rose by 58% after the switch from face-to-face to telephone interviews during the pandemic, particularly among lower-educated and non-white groups, leading to systematic sample bias. In Germany, Lemcke et al. (11) note that response rates below 20% have become a structural feature of health surveys despite intensive recruitment efforts. Still, they emphasize that low response rates do not necessarily reduce data quality if appropriate weighting and nonresponse analyses are applied.
The findings of the COMO study can be contextualized alongside those of the COPSY study, which reported a higher response rate of 45.8% among parents of 7- to 17-year-olds in its first wave (14). However, that study also found selective nonresponse among younger children and disadvantaged families, partly mirroring the patterns seen in COMO, where male, older, and socioeconomically disadvantaged participants were particularly underrepresented.
Strengths and Limitations
The COMO study has several methodological strengths: a two-stage stratified register-based sampling design, a systematically implemented multimodal contact and reminder system, and a weighting procedure calibrated to Microcensus data (2022) provide a solid basis for population-level analyses. Qualitative nonresponse analyses further offer valuable insights into structural and subjective barriers, such as technical issues, perceived irrelevance, or political disengagement – crucial for making future survey waves more inclusive and participatory.
Nonetheless, limitations remain. Despite diversity-sensitive materials, language barriers persisted. The underrepresentation of specific groups, especially older adolescents and families with lower parental education, must be considered when interpreting the data. These groups showed lower participation, and repeated contact attempts could not fully offset exclusion mechanisms. Previous research shows that higher monetary incentives can significantly increase response rates in large-scale surveys (1). COMO relied on non-monetary or only low-value monetary incentives, which may have limited their effect on participation. Moreover, the qualitative nonresponse analysis was based on a limited dataset, leaving the reasons for nonparticipation unknown for the majority of non-respondents.
Conclusion and Outlook
The COMO study underscores the importance of a methodologically robust and socially sensitive approach to population-based research in sport and health sciences. Combining digital and analog outreach, such as community-based contact, low-threshold communication (e.g., email), and online platforms (e.g., websites, social media), is likely to be essential for reaching underrepresented groups. These findings provide practical methodological guidance for future health studies, particularly for combining traditional survey methods with participatory and accessible formats.
Results from COMO and similar epidemiological studies must always be interpreted in light of the realized sample. Despite careful recruitment, weighting, and qualitative follow-up, some subgroups, especially socioeconomically disadvantaged families, families with lower parental education, and families facing language barriers, remain underrepresented. This can lead to systematic over- or underestimation of outcomes such as mental health, physical activity, or quality of life. Ding and Ekelund (7) highlight a structural tendency in physical activity research to overestimate health effects, especially when studies are based on healthier, selectively recruited cohorts. This bias is reinforced by the “healthy volunteer effect” and by reliance on single time-point data that fail to capture temporal confounding. Lesser et al. (12) similarly note the overrepresentation of privileged groups (white, well-educated, health-conscious individuals) in physical activity studies. This challenges external validity and raises ethical concerns, as those most in need of health promotion are often excluded; these authors argue that recruitment bias must be addressed as a central methodological issue.
Epidemiological studies in sport and health science must therefore continually ask: for whom are these findings valid, and where do blind spots persist? Incorporating structural variables into weighting, engaging participants through ongoing feedback, and using participatory recruitment can help reduce bias and improve representativeness. Future waves of the COMO study offer the opportunity to refine and test these strategies further in support of health equity, including at the methodological level.
Conflict of Interest
The authors have no conflict of interest.
Acknowledgements
The authors thank all participating families for their involvement, as well as the staff of the SOKO Institute and GESIS for their support in the sampling process, and the COMO Study Group for their collaboration.
Disclosure
The content of this paper reflects only the authors’ views; the other members of the COMO study group are not responsible for it. The COMO study group was responsible for the design of the study and the acquisition of the data.
Ethical Approval
The study received ethical approval from the Ethics Committee of the Karlsruhe Institute of Technology (KIT), reference number A2023-078. Prior to participation, informed digital consent was obtained from legal guardians. Data protection complied with the EU General Data Protection Regulation (GDPR; German: DSGVO); personal data were stored separately from survey data and will be deleted after project completion. The study is registered (13).
Funding
This work was developed in the research project “COMO Study” (funding period 2023–2026). This national study, which investigates the impact of the COVID-19 pandemic on the physical and mental health and health behavior of children and adolescents against the background of socioecological contexts, is funded by the German Federal Ministry of Education and Research (BMBF) under grant number 01UP2222 in the funding line “Research on social consequences of the Corona pandemic”.
Summary Box
This article addresses a key methodological challenge in contemporary population-based health research: declining and selective participation in digital surveys. It highlights the growing importance of reducing selection bias in longitudinal studies among youth, particularly in post-pandemic contexts. The COMO study contributes a novel framework for inclusive, equity-oriented recruitment and contact strategies in digital panel designs.
The study employs a probability-based, register-sampled online panel with a rigorous two-stage sampling design and calibrated weighting. It combines structured reminder protocols, inclusive materials, and non-monetary incentives to foster participation. This methodological approach advances current standards by integrating diverse engagement and weighting strategies to reduce bias in youth health surveys.
The first wave of the COMO study demonstrates the partial effectiveness of multimodal reminders and inclusive outreach in increasing participation rates. While selection bias persisted, especially among adolescents and lower-SES groups, the application of design and calibration weights improved representativeness. Qualitative insights further illuminate social, technical, and emotional barriers to participation.
The COMO study provides practical, evidence-based guidance on reducing selection bias in digital population surveys. It underscores the necessity of integrating participatory and inclusive elements in study design to ensure external validity. These findings contribute to the methodological discourse on health equity and the validity of population-level research outcomes.
References
1. The effectiveness of incentives for research participation: A systematic review and meta-analysis of randomized controlled trials. PLoS One. 2022; 17: e0267534.
2. Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys. 10th ed. AAPOR; 2023: 1-90. https://aapor.org/wp-content/uploads/2024/03/Standards-Definitions-10th-edition.pdf [7 May 2025].
3. BIK Regionen. Ballungsräume, Stadtregionen, Mittel-/Unterzentrengebiete. Methodenbeschreibung zur Aktualisierung 2000 [BIK regions: Agglomerations, urban regions, middle- and lower-order centre areas. Methodological description of the 2000 update]. Hamburg; 2001.
4. The CASMIN educational classification in international comparative research. In: Hoffmeyer-Zlotnik JH, Wolf C, eds. Advances in Cross-National Comparison. A European Working Book for Demographic and Socio-Economic Variables. New York: Kluwer Academic/Plenum Publishers; 2003: 221-244.
5. Foundations of Inference in Survey Sampling. New York: John Wiley & Sons; 1977.
6. A constructive procedure for unbiased controlled rounding. J Am Stat Assoc. 1987; 82: 520-524.
7. From London buses to activity trackers: A reflection of 70 years of physical activity research. J Sport Health Sci. 2024; 13: 736-738.
8. Internet, Phone, Mail, and Mixed-Mode Surveys: The Tailored Design Method. 4th ed. Hoboken, NJ: Wiley; 2014.
9. The distance-weighted k-nearest-neighbor rule. IEEE Trans Syst Man Cybern. 1976; SMC-6: 325-327.
10. Socioeconomic status and subjective social status measurement in KiGGS Wave 2. J Health Monit. Berlin: Robert Koch Institute; 2018; 3.
11. Health in Germany: Establishment of a population-based health panel. J Health Monit S2/2024. 2024; 9: 2-22.
12. Participant bias in community-based physical activity research: A consistent limitation? J Phys Act Health. 2024; 21: 109-112.
13. COMO Study – The impact of the COVID-19 pandemic on the physical and mental health and health behavior of children and adolescents against the background of socioecological contexts of Germany. Open Science Framework. Published 2023. https://osf.io/68fdn [30 April 2025].
14. Three years into the pandemic: results of the longitudinal German COPSY study on youth mental health and health-related quality of life. Front Public Health. 2023; 11: 1129073.
15. Gewichtung in der Praxis [Weighting in practice] (Version 1.0). GESIS Survey Guidelines. Mannheim: GESIS – Leibniz-Institut für Sozialwissenschaften; 2020.
16. KIT, Institute for Sport and Sport Science. Methodenbericht für die COMO-Studie [Methods report for the COMO study]. Unpublished; 2025.
17. STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol. 2008; 61: 344-349.
18. Assessing the link between survey interview method and survey outcomes: Evidence from the CPS and the COVID-19 pandemic. Labour Econ. 2021; 72: 102060.
Dr. Claudia Niessner
Institute for Sport and Sport Science
Karlsruhe Institute of Technology
Engler-Bunte Ring 15, 76131 Karlsruhe, Germany
claudia.niessner@kit.edu