The Genomic Data Gap: Why Some Populations Are Missing from Precision Medicine

June 24, 2026 · 8 min read

Introduction

Precision medicine aims to tailor disease prevention, diagnosis, and treatment according to the biological characteristics of individual patients, including genetic variation, environmental exposures, and lifestyle factors. Advances in genomic sequencing and large-scale biomedical datasets have made it increasingly possible to identify genetic variants associated with disease risk, drug response, and therapeutic outcomes. These developments have accelerated the integration of genomic information into clinical decision-making across fields such as oncology, cardiology, and pharmacogenomics.

However, the success of precision medicine depends heavily on how representative the genomic data used in biomedical research are. If genetic studies primarily include individuals from a narrow set of populations, the insights derived from them may not apply to the broader global population. This issue, often described as the genomic data gap, refers to the underrepresentation of many ancestral populations in genomic research and biomedical datasets.

Historically, biomedical research has disproportionately focused on populations of European ancestry. As a result, large genomic databases and genome-wide association studies (GWAS) contain far fewer samples from African, Latin American, Indigenous, Middle Eastern, and many Asian populations. The consequences of this imbalance extend beyond research findings: they can influence the accuracy of genetic tests, the development of predictive models, and ultimately the equity of precision medicine.

Understanding the causes and implications of this genomic data gap is essential for clinicians, researchers, and healthcare leaders working to ensure that precision medicine benefits diverse populations worldwide.


Historical Underrepresentation in Biomedical Research

The lack of diversity in genomic research is well documented. Analyses of genome-wide association studies have shown that the large majority of participants are of European ancestry. In some assessments, 85–90% or more of GWAS samples come from European-ancestry populations, even though these populations represent a minority of the world’s population.

This imbalance has historical roots in the geographic distribution of biomedical research infrastructure. Many early genomic initiatives were established in North America and Europe, where large biobanks and sequencing centers were developed. These projects generated extensive genomic datasets but often lacked participation from populations outside these regions.

The early stages of genomic research also prioritized diseases prevalent in high-income countries and populations with access to large academic medical centers. As sequencing technologies became more accessible, the global diversity of research participants gradually increased. Nevertheless, substantial disparities remain in the representation of many populations.

This imbalance is also a scientific problem. Because genetic variation differs across populations due to evolutionary history and demographic patterns, underrepresentation limits researchers’ ability to understand the genetic basis of disease across humanity as a whole.


Contributing Factors

Several structural and historical factors contribute to the genomic data gap. These factors often intersect and reinforce each other, shaping which populations are included in biomedical research and which remain underrepresented.

Access to Research Infrastructure

One of the most significant contributors to the genomic data gap is the uneven distribution of research infrastructure. Conducting large-scale genomic studies requires specialized laboratories, sequencing platforms, data storage systems, and computational resources.

High-income countries typically possess well-established biomedical research ecosystems, including national biobanks and large clinical research networks. In contrast, many low- and middle-income countries face challenges related to funding, infrastructure, and technical capacity.

These disparities limit opportunities for researchers in underrepresented regions to participate in large genomic studies or establish population-specific genomic databases. As a result, global genomic datasets may reflect the populations where research infrastructure is concentrated rather than the diversity of the world’s population.


Funding Disparities

Research funding is another critical factor shaping participation in genomic studies. Funding for large-scale sequencing initiatives often comes from national research agencies, philanthropic organizations, or private sector investments.

Countries with well-funded biomedical research programs are better positioned to conduct population-scale genomic studies. In contrast, researchers in lower-resource settings may face challenges obtaining funding for genomic projects, particularly when competing with institutions that already possess extensive infrastructure.

Funding disparities can also affect international collaborations. While global research partnerships can expand genomic diversity, they sometimes rely on data collection in underrepresented regions without equal investment in local research capacity. Ensuring equitable partnerships is therefore essential for addressing the genomic data gap.


Trust and Historical Injustices

Beyond logistical challenges, social and historical factors also influence participation in biomedical research. In many communities, historical injustices in medical research have contributed to mistrust toward scientific institutions.

Past abuses, including unethical clinical studies such as the Tuskegee syphilis study in the United States, research conducted without informed consent, and the exploitation of vulnerable populations, have shaped perceptions of biomedical research in certain communities. These experiences can affect willingness to participate in genetic studies, particularly when people have concerns about privacy, data ownership, or potential misuse of genetic information.

Building trust requires transparent governance structures, community engagement, and culturally sensitive research practices. Involving communities as partners in research design and decision-making can help address concerns and encourage participation in genomic studies.


Impact on Healthcare

The genomic data gap has significant implications for clinical practice and healthcare equity. Because many genomic tools are developed using datasets dominated by specific populations, their clinical performance may vary across different patient groups.

Limited Clinical Applicability of Genomic Tests

Genomic tests, including risk prediction models, pharmacogenomic markers, and polygenic risk scores, are often developed using genetic associations identified in GWAS datasets. When these datasets lack diversity, the resulting models may perform less accurately for populations that were not well represented in the original research.
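To make the dependence on GWAS concrete, the sketch below shows the basic arithmetic behind a polygenic risk score: each variant's effect-allele count (0, 1, or 2) multiplied by a per-allele weight estimated in a GWAS, then summed across variants. The variant IDs and weights are hypothetical placeholders, not published estimates, and real scoring pipelines add steps (variant selection, quality control, imputation, ancestry-aware calibration) that are omitted here.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2) using GWAS weights.

    Variants in `weights` that are missing from `dosages` are treated as
    dosage 0 here, a simplifying assumption for this sketch.
    """
    score = 0.0
    for variant_id, beta in weights.items():
        dosage = dosages.get(variant_id, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {variant_id} must be 0, 1 or 2")
        score += dosage * beta
    return score


# Hypothetical per-allele weights (e.g. log odds ratios); not real GWAS results.
weights = {"snp_a": 0.12, "snp_b": -0.05, "snp_c": 0.30}
patient = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(round(polygenic_risk_score(patient, weights), 2))  # 0.19
```

Because the weights come entirely from the discovery GWAS, any bias in who was sampled there is carried into every score computed with them.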

For example, genetic variants associated with disease risk in one population may not have the same predictive value in another population due to differences in allele frequencies and linkage disequilibrium patterns. This can lead to reduced accuracy in risk prediction models when applied across diverse populations.
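A back-of-the-envelope calculation illustrates both effects. Under a simple additive model with Hardy-Weinberg equilibrium, a causal variant with allele frequency p and per-allele effect β explains a trait variance of 2p(1 - p)β², and a GWAS tag SNP in linkage disequilibrium r² with that variant recovers roughly r² of it. For illustration only, the sketch assumes β is identical in two hypothetical populations that differ only in allele frequency and LD; all numbers are invented.

```python
def variance_explained_by_tag(beta: float, causal_freq: float, r2: float) -> float:
    """Approximate trait variance captured by a GWAS tag SNP.

    Assumes an additive model, Hardy-Weinberg equilibrium, and a tag SNP in
    LD r2 with the causal variant: the causal variant explains
    2 * p * (1 - p) * beta**2, and the tag recovers about r2 of that.
    """
    if not (0.0 <= causal_freq <= 1.0 and 0.0 <= r2 <= 1.0):
        raise ValueError("causal_freq and r2 must lie in [0, 1]")
    return r2 * 2 * causal_freq * (1 - causal_freq) * beta ** 2


# Hypothetical inputs: the same causal effect (beta = 0.2) in both populations.
discovery = variance_explained_by_tag(beta=0.2, causal_freq=0.30, r2=0.9)
other = variance_explained_by_tag(beta=0.2, causal_freq=0.05, r2=0.3)
print(f"discovery population: {discovery:.4f}")  # 0.0151
print(f"other population:     {other:.4f}")      # 0.0011
print(f"relative signal:      {other / discovery:.2f}")
```

With these invented inputs, the same tag SNP carries less than a tenth of the signal in the second population, even though the underlying effect (β) is unchanged.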

Similarly, pharmacogenomic tests that guide medication selection or dosing may not capture relevant genetic variants present in underrepresented populations. This limitation could influence treatment effectiveness or increase the risk of adverse drug reactions.
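The coverage problem can also be sketched numerically. Suppose a test panel was built from variants that are common in the discovery population. The hypothetical functions below estimate what fraction of people carrying at least one actionable allele would actually be flagged by that panel, assuming independent variants, Hardy-Weinberg equilibrium, and two allele copies per person. The variant names and frequencies are made up for illustration.

```python
from math import prod
from typing import Dict, Iterable, Set


def carrier_probability(freqs: Iterable[float]) -> float:
    """P(a person carries >= 1 copy of any listed allele).

    Assumes independent variants and Hardy-Weinberg equilibrium, so a person
    carries no copy of an allele with frequency f with probability (1 - f)**2.
    """
    return 1 - prod((1 - f) ** 2 for f in freqs)


def panel_sensitivity(allele_freqs: Dict[str, float], panel: Set[str]) -> float:
    """Share of carriers of any actionable allele who carry a tested one."""
    any_carrier = carrier_probability(allele_freqs.values())
    tested_carrier = carrier_probability(
        f for variant, f in allele_freqs.items() if variant in panel
    )
    return tested_carrier / any_carrier if any_carrier > 0 else 0.0


# Hypothetical actionable alleles and their frequencies in two populations.
panel = {"var1", "var2"}  # chosen from discovery-population data
pop_a = {"var1": 0.10, "var2": 0.01, "var3": 0.00}
pop_b = {"var1": 0.01, "var2": 0.01, "var3": 0.15}
print(f"population A: {panel_sensitivity(pop_a, panel):.2f}")
print(f"population B: {panel_sensitivity(pop_b, panel):.2f}")
```

With these made-up frequencies, the panel flags every carrier in population A but only about 13% of carriers in population B, because B's most common actionable allele was never added to the panel.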

Reviews of the field have emphasized that these disparities in genomic data directly affect how clinically useful precision medicine tools are for different populations.


Risk of Widening Health Disparities

If genomic research continues to rely primarily on datasets from a limited number of populations, there is a risk that precision medicine could inadvertently reinforce existing health disparities.

Patients from populations that are well represented in genomic datasets may benefit more from advances in genetic diagnostics and targeted therapies. In contrast, patients from underrepresented populations may receive less accurate risk assessments or have fewer opportunities to benefit from genomic-guided therapies.

This potential imbalance underscores the importance of expanding genomic diversity to ensure that the benefits of precision medicine are distributed equitably.


Solutions: Toward Inclusive Precision Medicine

Addressing the genomic data gap requires coordinated efforts across multiple sectors of biomedical research and healthcare systems. Several strategies are emerging to increase diversity in genomic datasets and ensure that precision medicine benefits a broader range of populations.

Inclusive Research Initiatives

Large-scale initiatives, such as the US National Institutes of Health’s All of Us Research Program, are recruiting participants from historically underrepresented populations. These initiatives aim to build more representative genomic databases by including diverse populations in sequencing projects and clinical research studies.

Programs that focus on global genomic diversity are particularly important for expanding representation in genomic datasets. Such initiatives often combine genomic sequencing with clinical data collection to better understand disease risk across populations.

Expanding participation in genomic research not only improves the generalizability of research findings but also enables the discovery of genetic variants that may be unique to specific populations.
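Discovery of population-specific variants depends on how many people from that population are sequenced. As a rough sketch, assuming the 2n allele copies carried by n sequenced individuals are independent draws (as under Hardy-Weinberg equilibrium), the probability that a variant with allele frequency f appears at least once is 1 - (1 - f)^(2n). The frequency and sample sizes below are illustrative only.

```python
def prob_variant_observed(allele_freq: float, n_individuals: int) -> float:
    """Chance that >= 1 copy of an allele is seen among n diploid people.

    Assumes the 2n allele copies are independent draws from the population.
    """
    if not 0.0 <= allele_freq <= 1.0 or n_individuals < 0:
        raise ValueError("allele_freq must be in [0, 1] and n_individuals >= 0")
    return 1 - (1 - allele_freq) ** (2 * n_individuals)


# Hypothetical variant at 0.1% frequency in one population and absent elsewhere.
for n in (100, 1_000, 5_000):
    print(n, f"{prob_variant_observed(0.001, n):.2f}")

# Sequencing many people from populations where the variant is absent
# (frequency 0) never reveals it, however large the sample.
print(prob_variant_observed(0.0, 500_000))
```

Under these assumptions, 100 participants from the relevant population give less than a 20% chance of observing the variant at all, while 1,000 participants raise that to about 86%.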


Building Local Research Capacity

Investing in research infrastructure and training programs in underrepresented regions is another critical strategy. Capacity-building initiatives can support the development of local sequencing facilities, bioinformatics expertise, and clinical genomics programs.

Strengthening local research ecosystems enables scientists from diverse regions to lead studies relevant to their populations. It also promotes more equitable research collaborations and ensures that genomic data are generated and interpreted within appropriate cultural and clinical contexts.


Community Engagement and Ethical Governance

Ethical frameworks and community engagement strategies are essential for building trust in genomic research. Transparent policies regarding data ownership, consent, and benefit sharing can help address concerns about the use of genetic information.

Community-based participatory research models emphasize collaboration between researchers and community stakeholders throughout the research process. This approach ensures that research priorities align with community needs and that participants understand how their data will be used.

Such practices are increasingly recognized as essential components of responsible genomic research.


Future Directions in Precision Medicine

Efforts to close the genomic data gap are gaining momentum as the biomedical community recognizes the importance of diversity in genomic datasets. Future research directions may include:

  • Expanding global biobank initiatives that include diverse populations

  • Developing statistical methods that improve the transferability of genetic risk models across populations

  • Integrating genomic data with environmental and social determinants of health

  • Strengthening international collaborations focused on genomic diversity
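As one small illustration of the statistical side of this work, multi-ancestry GWAS often combine ancestry-specific effect estimates with a fixed-effect, inverse-variance-weighted meta-analysis, in which each estimate is weighted by 1/SE². The sketch below implements only that generic technique with made-up numbers. It is not a transferability method in itself, and it assumes the true effect is the same in every population, which is precisely the assumption more specialized multi-ancestry methods aim to relax.

```python
import math
from typing import List, Tuple


def ivw_fixed_effect(estimates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance-weighted meta-analysis.

    estimates: (beta, standard_error) pairs, one per study.
    Returns (pooled_beta, pooled_standard_error).
    """
    if not estimates or any(se <= 0 for _, se in estimates):
        raise ValueError("need at least one estimate with a positive standard error")
    weights = [1.0 / se ** 2 for _, se in estimates]  # precision of each study
    total_weight = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


# Hypothetical per-study estimates for one variant: one large study and two
# smaller studies from other ancestry groups.
studies = [(0.10, 0.01), (0.14, 0.04), (0.12, 0.03)]
beta, se = ivw_fixed_effect(studies)
print(f"pooled beta = {beta:.3f}, SE = {se:.4f}")  # pooled beta = 0.104, SE = 0.0092
```

Because weights scale with precision, the large study dominates the pooled estimate; adding well-powered studies from underrepresented populations is what shifts that balance.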

Advances in sequencing technologies and data-sharing frameworks may also make it easier to collect and analyze genomic data from diverse populations while maintaining strong privacy protections.


Conclusion

Precision medicine holds significant promise for improving healthcare by tailoring prevention and treatment strategies to individual biological characteristics. However, realizing this potential requires genomic datasets that reflect the diversity of the global population.

The genomic data gap, driven by historical underrepresentation, disparities in research infrastructure and funding, and challenges related to trust and community engagement, remains a major barrier to equitable precision medicine.

Addressing this gap will require sustained efforts to expand diversity in genomic research, invest in global research capacity, and develop ethical frameworks that foster trust and collaboration. By including diverse populations in genomic research, the biomedical community can help make the benefits of precision medicine accessible to patients worldwide.


References

  1. Sirugo G, Williams SM, Tishkoff SA. The Missing Diversity in Human Genetic Studies. Cell.

  2. Hindorff LA et al. Prioritizing diversity in human genomics research. Nature Reviews Genetics.

  3. Popejoy AB, Fullerton SM. Genomics is failing on diversity. Nature.

  4. Corpas M et al. Bridging genomics’ diversity gap. Cell Genomics.

  5. Bustamante CD et al. Genomics for the world. Nature.

