
Why Diversity in Genomic Research Matters for Precision Medicine
Introduction
Precision medicine seeks to tailor prevention strategies, diagnostic tools, and therapeutic interventions to the biological characteristics of individual patients. Central to this approach is the use of genomic data to understand disease susceptibility, identify therapeutic targets, and predict treatment responses. Over the past two decades, advances in sequencing technologies and large-scale genomic initiatives have generated extensive datasets that have accelerated discoveries in human genetics and disease biology.
However, the effectiveness of precision medicine depends heavily on the diversity and representativeness of the genomic datasets on which clinical knowledge is based. Many genetic associations, risk models, and variant interpretations are derived from studies involving large populations of participants. If these populations do not reflect global genetic diversity, the resulting knowledge may not be equally applicable to all individuals.
Recent analyses of genomic research cohorts have demonstrated a substantial imbalance in participant representation. Individuals of European ancestry account for a disproportionate share of participants in genome-wide association studies (GWAS) and other genomic datasets. As a result, genomic discoveries and predictive models derived from these datasets may not perform as accurately for individuals from other ancestral backgrounds.
The implications of this imbalance extend beyond research. Clinical tools such as polygenic risk scores, variant interpretation frameworks, and genomic diagnostics rely on reference datasets that may not adequately represent global populations. Addressing this disparity is therefore essential for ensuring that precision medicine benefits diverse patient populations.
The Current Problem: Limited Diversity in Genomic Research
Historical Patterns of Representation
Genome-wide association studies have been instrumental in identifying genetic variants associated with disease risk. Since the early 2000s, thousands of GWAS have linked genomic variants to conditions such as cardiovascular disease, diabetes, autoimmune disorders, and cancer. These discoveries have contributed significantly to the development of precision medicine.
However, analyses of GWAS datasets have consistently shown a strong overrepresentation of individuals of European ancestry. A widely cited analysis published in Nature Genetics reported that roughly 79% of GWAS participants were of European descent, with much smaller representation from African, Asian, Latin American, and Indigenous populations (Martin et al., 2019).
Several factors have contributed to this imbalance. Many large genomic studies were initially conducted in North America and Europe, where research infrastructure and funding for large-scale genetic studies were more readily available. Recruitment efforts often relied on established biobanks and clinical cohorts that predominantly included participants from these regions.
In addition, logistical and ethical considerations have historically limited the inclusion of diverse populations in genomic research. These include disparities in research funding, differences in healthcare infrastructure, regulatory barriers, and historical mistrust between certain communities and biomedical institutions.
Implications for Genomic Databases
The lack of diversity in genomic datasets has implications for the reference databases used in variant interpretation. Clinical genetic testing frequently relies on reference population databases to determine whether a variant is common or rare within a population. Variants that are rare in one population may be more common in another.
If reference datasets disproportionately represent certain populations, clinicians may misinterpret genetic variants in individuals from underrepresented groups. This can lead to uncertainty in diagnostic interpretation or incorrect classification of variants.
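As a rough illustration of how this can happen, the Python sketch below compares a pooled allele frequency with the highest frequency observed in any single population. The population labels, allele counts, and the 1% cutoff are invented for illustration and are not taken from any real database or classification guideline; the helper names are hypothetical.

```python
from typing import Dict, Tuple

# population label -> (alternate allele count, total alleles observed)
Counts = Dict[str, Tuple[int, int]]

def pooled_frequency(counts: Counts) -> float:
    """Allele frequency across all reference samples combined."""
    alt = sum(ac for ac, _ in counts.values())
    total = sum(an for _, an in counts.values())
    return alt / total if total else 0.0

def max_population_frequency(counts: Counts) -> float:
    """Highest allele frequency seen in any single population."""
    return max((ac / an for ac, an in counts.values() if an > 0), default=0.0)

# Illustrative cutoff only; real frameworks use gene- and disease-specific thresholds.
RARE_CUTOFF = 0.01

# Hypothetical reference panel dominated by one population.
reference = {
    "pop_X": (5, 100_000),  # heavily sampled, variant almost absent
    "pop_Y": (60, 2_000),   # lightly sampled, variant fairly common
}

pooled = pooled_frequency(reference)
highest = max_population_frequency(reference)

looks_rare_pooled = pooled < RARE_CUTOFF
looks_rare_by_population = highest < RARE_CUTOFF

print(f"pooled frequency: {pooled:.5f}, rare? {looks_rare_pooled}")
print(f"max population frequency: {highest:.5f}, rare? {looks_rare_by_population}")
```

In this made-up panel the variant looks rare when all samples are pooled, yet it is common in the smaller population. If that population were missing from the reference entirely, nothing in the data would signal that the variant is common anywhere.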
Clinical Consequences of Limited Diversity
Miscalibrated Polygenic Risk Scores
One of the most prominent consequences of limited diversity in genomic research is the reduced accuracy of polygenic risk scores (PRS) in underrepresented populations. Polygenic risk scores combine the effects of many genetic variants to estimate an individual's risk of developing a particular disease.
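A minimal sketch of that combination, assuming the common additive form in which each variant contributes its effect size multiplied by the number of risk alleles carried (0, 1, or 2). The variant names, effect sizes, and genotype are invented, and `polygenic_score` is a hypothetical helper rather than part of any published PRS tool.

```python
from typing import Dict

def polygenic_score(effect_sizes: Dict[str, float], dosages: Dict[str, int]) -> float:
    """Additive score: sum of (effect size x risk-allele dosage) over model variants.

    Variants absent from the genotype are skipped in this sketch; real
    pipelines typically impute them or handle missingness explicitly.
    """
    total = 0.0
    for variant, beta in effect_sizes.items():
        if variant not in dosages:
            continue
        dosage = dosages[variant]
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {variant} must be 0, 1, or 2")
        total += beta * dosage
    return total

# Hypothetical per-variant effect sizes, as might be estimated by a GWAS.
weights = {"var_A": 0.12, "var_B": -0.05, "var_C": 0.30}

# Hypothetical individual: number of risk alleles carried at each variant.
genotype = {"var_A": 2, "var_B": 1, "var_C": 0}

score = polygenic_score(weights, genotype)
print(f"raw polygenic score: {score:.2f}")
```

The raw number has no meaning on its own. It is interpreted relative to the distribution of scores in a reference group, which is why the composition of that group matters.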
PRS models are typically developed using datasets derived from genome-wide association studies. When these datasets predominantly include individuals of European ancestry, the resulting models may not perform well in individuals from other populations.
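Studies have shown that polygenic risk scores developed in European populations often lose predictive accuracy when applied to individuals of African, East Asian, or Latin American ancestry. This reduction in accuracy occurs because allele frequencies and linkage disequilibrium patterns differ among populations. The variants flagged in a GWAS are often markers correlated with an underlying causal variant rather than the causal variant itself, and that correlation may be weaker or absent in a population with a different linkage disequilibrium structure.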
For example, a polygenic risk score designed to predict cardiovascular disease risk in European populations may not provide the same predictive value in individuals of African ancestry. As a result, relying on such models without proper calibration could contribute to disparities in disease risk assessment and prevention strategies.
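The sketch below illustrates one part of this problem, the effect of allele frequency differences, under simplifying assumptions: Hardy-Weinberg proportions, statistically independent variants, and invented effect sizes and frequencies. Under those assumptions the expected score in a population is the sum of 2 x frequency x effect size, so the same raw score sits at a different position in each population's distribution. Standardizing against an ancestry-matched reference is shown here as one simple form of calibration, not as the established method, and it does not address differences in linkage disequilibrium or effect sizes.

```python
import math
from typing import Dict, Tuple

def expected_score_distribution(
    effect_sizes: Dict[str, float], allele_freqs: Dict[str, float]
) -> Tuple[float, float]:
    """Mean and standard deviation of an additive score in one population.

    Assumes Hardy-Weinberg proportions and independent variants, so each
    dosage has mean 2p and variance 2p(1 - p).
    """
    mean = 0.0
    variance = 0.0
    for variant, beta in effect_sizes.items():
        p = allele_freqs[variant]
        mean += beta * 2 * p
        variance += beta ** 2 * 2 * p * (1 - p)
    return mean, math.sqrt(variance)

def standardized_score(
    raw: float, effect_sizes: Dict[str, float], allele_freqs: Dict[str, float]
) -> float:
    """Position of a raw score within a population's expected distribution."""
    mean, sd = expected_score_distribution(effect_sizes, allele_freqs)
    return (raw - mean) / sd

# Hypothetical effect sizes and risk-allele frequencies in two populations.
weights = {"var_A": 0.12, "var_B": -0.05, "var_C": 0.30}
freqs_pop1 = {"var_A": 0.40, "var_B": 0.20, "var_C": 0.10}
freqs_pop2 = {"var_A": 0.10, "var_B": 0.50, "var_C": 0.35}

raw = 0.19  # hypothetical raw score for one individual

mean1, sd1 = expected_score_distribution(weights, freqs_pop1)
mean2, sd2 = expected_score_distribution(weights, freqs_pop2)
z1 = standardized_score(raw, weights, freqs_pop1)
z2 = standardized_score(raw, weights, freqs_pop2)

print(f"population 1: mean={mean1:.3f}, sd={sd1:.3f}, z={z1:.2f}")
print(f"population 2: mean={mean2:.3f}, sd={sd2:.3f}, z={z2:.2f}")
```

With these made-up numbers, an identical raw score looks noticeably elevated against the first population's distribution and close to average against the second. A risk threshold tuned on one population would therefore label the same individual differently depending on which reference is used.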
Inaccurate Variant Interpretation
Another clinical concern relates to the interpretation of genetic variants identified through clinical sequencing. Genetic testing laboratories classify variants according to their potential clinical significance, often using categories such as pathogenic, likely pathogenic, variant of uncertain significance (VUS), or benign.
Variant interpretation relies heavily on population frequency data derived from genomic reference databases. When these databases lack sufficient representation of diverse populations, clinicians may encounter greater uncertainty when interpreting variants identified in patients from underrepresented groups.
Research has demonstrated that individuals of African ancestry are more likely to receive results categorized as variants of uncertain significance during genetic testing. This occurs in part because fewer individuals from these populations have been included in genomic reference datasets.
Increased rates of uncertain variant classification can complicate clinical decision-making and may limit the utility of genomic testing in certain populations.
Implications for Precision Medicine
The goal of precision medicine is to deliver more accurate and individualized healthcare. However, if genomic datasets fail to represent global genetic diversity, precision medicine may inadvertently reinforce existing health disparities.
For clinicians, this means that genomic tools and predictive models must be interpreted within the context of population diversity. For researchers, it highlights the importance of developing inclusive genomic studies that reflect the full spectrum of human genetic variation.
Solutions: Expanding Diversity in Genomic Research
Expanding Genomic Datasets
One of the most direct strategies for addressing disparities in genomic representation is expanding the diversity of participants in genomic studies. Increasing representation of individuals from historically underrepresented populations will improve the accuracy of genomic databases, variant interpretation frameworks, and disease risk models.
Several large-scale initiatives have been launched to increase diversity in genomic research. Programs such as the All of Us Research Program in the United States aim to recruit participants from a wide range of racial, ethnic, and socioeconomic backgrounds. Similarly, international initiatives are working to build genomic datasets that reflect global populations.
Increasing diversity in genomic studies requires careful attention to community engagement, ethical considerations, and equitable access to research participation. Building trust with communities that have historically been excluded from biomedical research is essential for successful recruitment and long-term collaboration.
International Research Collaborations
Global collaboration is another critical component of improving diversity in genomic research. Many countries and regions have unique genetic backgrounds shaped by migration patterns, population history, and environmental factors. International partnerships can help integrate genomic data from diverse populations and improve the global applicability of genomic discoveries.
Examples of international genomic initiatives include efforts to sequence large cohorts in Africa, Latin America, and Asia. These projects aim to generate genomic datasets that better reflect global genetic diversity and support research into population-specific disease risk factors.
Collaborative frameworks that share data, methodologies, and computational resources can accelerate progress in this area while ensuring that participating communities benefit from the research outcomes.
Ethical and Policy Considerations
Expanding diversity in genomic research also requires attention to ethical issues such as informed consent, data privacy, and equitable access to genomic technologies. Policies governing genomic data sharing must balance scientific progress with the protection of participant rights and cultural considerations.
Many experts emphasize the importance of community engagement and transparent governance in genomic research initiatives. Collaborative approaches that involve local researchers and institutions can help ensure that genomic research benefits participating populations.
Future Research Directions
Improving diversity in genomic datasets will not only address current limitations but also create opportunities for new discoveries. Studies involving diverse populations have already revealed previously unidentified genetic variants associated with disease risk and therapeutic response.
Future research may focus on developing population-specific genomic reference panels, improving computational methods for cross-population risk prediction, and integrating genomic data with environmental and social determinants of health.
Advances in sequencing technologies are also making it increasingly feasible to conduct large-scale genomic studies in diverse populations. As sequencing costs continue to decline and bioinformatics tools improve, researchers will be able to generate more comprehensive and representative genomic datasets.
Ultimately, improving diversity in genomic research will enhance the scientific foundation of precision medicine and help ensure that genomic discoveries translate into equitable clinical benefits.
Conclusion
Precision medicine relies on genomic data to guide diagnosis, prevention, and treatment strategies. However, the effectiveness of these approaches depends on the diversity of the datasets used to generate clinical knowledge. Current genomic research has been shaped by historical imbalances in participant representation, with individuals of European ancestry comprising the majority of participants in many genomic studies.
This imbalance has important clinical implications, including reduced accuracy of polygenic risk scores and increased uncertainty in variant interpretation for individuals from underrepresented populations. Addressing these challenges requires expanding the diversity of genomic research cohorts, strengthening international collaborations, and developing ethical frameworks that support inclusive research practices.
By improving representation in genomic datasets, the scientific and medical communities can ensure that precision medicine evolves in a way that benefits all populations. As genomic technologies continue to advance, promoting diversity in research will remain a critical priority for achieving equitable and effective healthcare.
References
Martin, A. R., et al. (2019). Clinical use of current polygenic risk scores may exacerbate health disparities. Nature Genetics. https://www.nature.com/articles/s41588-019-0373-8
Popejoy, A. B., & Fullerton, S. M. (2016). Genomics is failing on diversity. Nature.
Sirugo, G., Williams, S. M., & Tishkoff, S. A. (2019). The missing diversity in human genetic studies. Cell.