Polygenic Risk Scores and the Genomic Data Gap: Promise, Limitations, and Equity in Precision Medicine

May 26, 2026

Introduction

Precision medicine aims to tailor disease prevention, diagnosis, and treatment based on individual biological characteristics, including genomic variation. Among the emerging tools in this field are polygenic risk scores (PRS), which estimate an individual’s genetic predisposition to complex diseases by aggregating the effects of multiple genetic variants across the genome. PRS models have demonstrated potential in predicting risk for conditions such as coronary artery disease, type 2 diabetes, breast cancer, and psychiatric disorders.

However, as PRS methodologies move closer to clinical implementation, significant scientific and ethical questions remain. One of the most prominent concerns is the uneven performance of PRS across populations, largely due to the historical underrepresentation of diverse populations in genomic research. This disparity reflects a broader issue in precision medicine often referred to as the genomic data gap, which limits the equitable application of genomic tools in healthcare.

This article examines the development and promise of polygenic risk scores, the challenges posed by population bias in genomic datasets, and ongoing efforts to improve the inclusivity and clinical utility of precision medicine.


Polygenic Risk Scores: Promise and Limitations Across Populations

What Are Polygenic Risk Scores?

Polygenic risk scores quantify the cumulative impact of many genetic variants, typically single nucleotide polymorphisms (SNPs), associated with disease risk. Unlike Mendelian conditions, which are caused by mutations in a single gene, many common diseases arise from the combined influence of numerous variants, each contributing a small effect.

A polygenic risk score is calculated by summing, across variants, the number of risk alleles an individual carries (0, 1, or 2 at each variant), with each count weighted by the variant's estimated effect size from large-scale genetic studies. The resulting score is a statistical estimate of an individual’s genetic susceptibility relative to a reference population.
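The weighted sum described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the variant IDs, effect sizes, and the function name polygenic_risk_score are hypothetical, and the sketch assumes the weights are per-allele effects such as log odds ratios.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Sum of risk-allele counts (0, 1, or 2) times per-variant effect sizes."""
    # Only variants present in both the genotype and the weight table contribute.
    return sum(weights[v] * dosages[v] for v in weights if v in dosages)


# Hypothetical variant IDs and effect sizes, for illustration only.
weights = {"rs_a": 0.10, "rs_b": 0.05, "rs_c": -0.02}
dosages = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

score = polygenic_risk_score(dosages, weights)
print(f"PRS = {score:.2f}")
```

On its own the raw sum has no natural scale; it only becomes interpretable when compared with the scores of other people.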

In research settings, PRS have shown promise in identifying individuals who may benefit from early screening or preventive interventions. For example, individuals in the highest percentiles of a PRS for coronary artery disease can have a risk comparable to that conferred by certain monogenic mutations, suggesting potential clinical relevance in population risk stratification.
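To make "relative to a reference population" and "highest percentiles" concrete, the sketch below places one person's raw score within a set of reference scores. The reference values and function names are hypothetical. A real reference distribution would come from a large cohort, ideally one matched to the individual's ancestry.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def standardize(score, reference_scores):
    """Express a raw score in standard deviations from the reference mean."""
    return (score - mean(reference_scores)) / pstdev(reference_scores)


def percentile(score, reference_scores):
    """Percentage of reference scores less than or equal to the given score."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# Hypothetical raw scores from a tiny reference cohort, for illustration only.
reference_scores = [0.0, 0.1, 0.2, 0.3, 0.4]

z = standardize(0.4, reference_scores)
pct = percentile(0.3, reference_scores)
print(f"z-score: {z:.2f}, percentile: {pct:.0f}")
```

Because both quantities depend entirely on the chosen reference cohort, the same raw score can land at different percentiles in different populations.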


How Polygenic Risk Scores Are Developed

Most PRS models are derived from genome-wide association studies (GWAS). These studies examine millions of genetic variants across the genomes of large cohorts to identify associations between specific variants and disease outcomes.

GWAS identify variants that occur more frequently in individuals with a particular disease compared with controls. Each associated variant is assigned an effect size that reflects the strength of the association. Polygenic risk scores aggregate these variants using statistical models that combine their weighted contributions.
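As a rough illustration of how an effect size might be estimated for a single variant, the sketch below computes an allelic odds ratio from counts of risk and non-risk alleles in cases and controls, along with the standard error of the log odds ratio (Woolf's formula). Real GWAS typically fit regression models that adjust for covariates such as ancestry; the counts and the function name here are hypothetical.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio, log odds ratio, and its standard error from a 2x2 allele table."""
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    log_or = math.log(odds_ratio)
    # Woolf's approximation for the standard error of the log odds ratio.
    se = math.sqrt(1 / case_risk + 1 / case_other
                   + 1 / control_risk + 1 / control_other)
    return odds_ratio, log_or, se


# Hypothetical allele counts, for illustration only.
odds_ratio, log_or, se = allelic_odds_ratio(
    case_risk=600, case_other=1400, control_risk=500, control_other=1500
)
print(f"OR = {odds_ratio:.3f}, log OR = {log_or:.3f}, SE = {se:.3f}")
```

The log odds ratio is the kind of per-variant weight that a PRS would then aggregate.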

The development of PRS typically involves several steps:

  1. Discovery phase – GWAS identifies variants associated with disease in large cohorts.

  2. Variant selection – Researchers determine which variants to include in the score.

  3. Weighting – Each variant is assigned a coefficient based on its estimated effect size.

  4. Validation – The model is tested in independent cohorts to evaluate predictive performance.
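The four steps above can be sketched end to end on toy data. This is a deliberately simplified illustration using a plain p-value threshold for variant selection. Practical pipelines also account for correlation between nearby variants, which is omitted here. All variant IDs, summary statistics, genotypes, and phenotypes are hypothetical.

```python
from math import sqrt

# Step 1 (discovery) output: hypothetical summary statistics,
# variant ID -> (effect size, p-value).
summary_stats = {
    "rs_a": (0.12, 1e-9),
    "rs_b": (0.08, 3e-6),
    "rs_c": (0.01, 0.40),
}


def select_and_weight(stats, p_threshold):
    """Steps 2-3: keep variants below the threshold, weighted by effect size."""
    return {variant: beta for variant, (beta, p) in stats.items()
            if p < p_threshold}


def prs(dosages, weights):
    """Weighted sum of risk-allele counts; missing variants count as zero."""
    return sum(w * dosages.get(variant, 0) for variant, w in weights.items())


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Step 4 (validation): hypothetical independent cohort of (dosages, phenotype).
validation_cohort = [
    ({"rs_a": 0, "rs_b": 0, "rs_c": 2}, 0.0),
    ({"rs_a": 1, "rs_b": 0, "rs_c": 1}, 1.0),
    ({"rs_a": 1, "rs_b": 2, "rs_c": 0}, 1.5),
    ({"rs_a": 2, "rs_b": 2, "rs_c": 1}, 3.0),
]

weights = select_and_weight(summary_stats, p_threshold=1e-5)
scores = [prs(dosages, weights) for dosages, _ in validation_cohort]
phenotypes = [phenotype for _, phenotype in validation_cohort]
r = pearson(scores, phenotypes)
print(f"variants kept: {sorted(weights)}, validation correlation: {r:.2f}")
```

The key design point is that the validation cohort is separate from the discovery data, so the reported correlation is not inflated by reusing the same individuals.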

Advances in computational genomics and the increasing availability of biobank-scale datasets have accelerated the development of PRS models. Several national and international biobanks now contain genetic and phenotypic data from hundreds of thousands of participants, enabling large-scale risk prediction research.

Despite these advances, the ancestral composition of these datasets remains highly uneven, creating significant limitations for global clinical implementation.


Challenges: Reduced Predictive Accuracy in Non-European Populations

A major limitation of current PRS models is that their predictive accuracy varies substantially across populations. The majority of GWAS participants to date have been of European ancestry, meaning that the variants and effect sizes used in PRS models are often derived from European populations.

As a result, PRS developed from these datasets frequently perform less accurately in individuals from other ancestral backgrounds. This reduced performance is driven by several biological and methodological factors:

Differences in Genetic Architecture

Patterns of genetic variation differ across populations due to evolutionary history, migration, and demographic events. Variants that are common in one population may be rare in another, and the linkage patterns between variants can differ significantly.

Because GWAS findings rely on these patterns, variants associated with disease in one population may not capture the same risk information in another.

Linkage Disequilibrium Variation

GWAS signals often identify markers that are correlated with causal variants through linkage disequilibrium (LD) rather than the causal variants themselves. LD patterns vary between populations, meaning that the same marker may not accurately represent the causal variant in different ancestries.
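A small calculation shows why this matters. The sketch below computes the standard LD measure r² between a genotyped marker and a causal variant from haplotype and allele frequencies, for two populations. The frequencies are hypothetical and chosen only to contrast strong and weak tagging.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying both alleles of interest
    p_a, p_b: frequencies of each allele on its own
    """
    d = p_ab - p_a * p_b  # LD coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical frequencies for a marker allele (a) and a causal allele (b).
r2_population_1 = ld_r_squared(p_ab=0.28, p_a=0.30, p_b=0.30)
r2_population_2 = ld_r_squared(p_ab=0.20, p_a=0.30, p_b=0.50)

print(f"population 1: r^2 = {r2_population_1:.2f}")
print(f"population 2: r^2 = {r2_population_2:.2f}")
```

In this toy example the marker is a good proxy for the causal variant in the first population and a poor one in the second, so a weight learned on the marker in population 1 would carry much less information in population 2.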

Sample Size Disparities

The statistical power of GWAS depends heavily on sample size. European ancestry cohorts remain disproportionately large relative to other populations, resulting in more precise effect size estimates for those populations.

Consequently, PRS models trained on these datasets can lose predictive performance when applied to underrepresented populations.
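The link between sample size and precision can be illustrated with a common approximation: for a quantitative trait scaled to unit variance and a variant with a small additive effect, the standard error of the estimated effect is roughly 1 / sqrt(2 N p (1 - p)), where N is the sample size and p the allele frequency. The cohort sizes below are hypothetical.

```python
import math


def approx_standard_error(n, allele_freq):
    """Approximate SE of a per-allele effect on a unit-variance trait."""
    return 1.0 / math.sqrt(2.0 * n * allele_freq * (1.0 - allele_freq))


# Hypothetical cohort sizes, same allele frequency in both.
se_large = approx_standard_error(n=400_000, allele_freq=0.2)
se_small = approx_standard_error(n=10_000, allele_freq=0.2)

print(f"large cohort SE: {se_large:.4f}")
print(f"small cohort SE: {se_small:.4f}")
print(f"ratio: {se_small / se_large:.1f}x")
```

Under this approximation, a cohort 40 times smaller yields standard errors about 6.3 times larger (the square root of 40), which makes small effects much harder to estimate reliably.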

Empirical studies have shown that the predictive accuracy of PRS can decrease substantially when scores are applied across populations. In some cases, accuracy declines by more than 50% when European-derived scores are applied to African ancestry populations, highlighting the importance of expanding diversity in genomic research.
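A decline of this kind is usually reported as a relative change in some accuracy metric. The sketch below assumes the metric is variance explained (R²), which is one common choice but is an assumption here. The numbers are hypothetical and serve only to show the arithmetic.

```python
def relative_decline(accuracy_source, accuracy_target):
    """Fractional drop in accuracy from the source to the target population."""
    return (accuracy_source - accuracy_target) / accuracy_source


# Hypothetical R^2 values, for illustration only.
r2_discovery_ancestry = 0.10
r2_other_ancestry = 0.04

decline = relative_decline(r2_discovery_ancestry, r2_other_ancestry)
print(f"relative decline: {decline:.0%}")
```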


Improving PRS Models Through Multi-Ancestry Genomic Datasets

Recognizing these limitations, researchers are developing strategies to improve the generalizability of polygenic risk scores.

Increasing Diversity in Genomic Cohorts

One of the most direct approaches is to expand genomic studies to include participants from diverse ancestral backgrounds. Several international initiatives are working to address this gap by building large-scale genomic resources representing populations that have historically been underrepresented in biomedical research.

Examples include national precision medicine programs and international collaborations focused on genomic diversity.

Multi-Ancestry GWAS

Another strategy involves conducting multi-ancestry genome-wide association studies that combine data from multiple populations. These studies can identify variants that are shared across populations as well as population-specific risk loci.

Incorporating diverse populations into discovery cohorts improves the robustness of effect size estimates and increases the likelihood that PRS models will perform consistently across populations.
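One standard way to combine per-cohort estimates for a single variant is fixed-effect, inverse-variance weighted meta-analysis, sketched below. It assumes the variant has the same true effect in every cohort, which may not hold across ancestries. Multi-ancestry studies often use more flexible models. The effect sizes and standard errors are hypothetical.

```python
import math


def inverse_variance_meta(estimates):
    """Combine (effect, standard_error) pairs under a fixed-effect model."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return beta, se


# Hypothetical per-cohort estimates for one variant: (effect, standard error).
cohort_estimates = [
    (0.10, 0.01),  # larger cohort, more precise
    (0.06, 0.02),  # smaller cohort, less precise
]

pooled_beta, pooled_se = inverse_variance_meta(cohort_estimates)
print(f"pooled effect = {pooled_beta:.3f}, SE = {pooled_se:.4f}")
```

The pooled standard error is smaller than either cohort's own, which is the sense in which adding cohorts improves the robustness of effect size estimates.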

Improved Statistical Methods

Researchers are also developing computational approaches that explicitly model population structure and ancestry-specific genetic architecture. These methods aim to optimize PRS performance by integrating information from multiple datasets while accounting for differences in linkage disequilibrium patterns.
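As one very simple illustration of integrating information from multiple datasets, the sketch below blends two scores for the same individuals, each built from a different discovery cohort, and picks the mixing weight that best tracks outcomes in a tuning set. This is a toy stand-in, not any specific published method, and it does not model linkage disequilibrium. All names and data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def best_mixing_weight(scores_a, scores_b, outcomes, grid):
    """Grid-search alpha in alpha * A + (1 - alpha) * B for highest correlation."""
    def fit(alpha):
        blended = [alpha * a + (1 - alpha) * b
                   for a, b in zip(scores_a, scores_b)]
        return pearson(blended, outcomes)
    return max(grid, key=fit)


# Hypothetical tuning data: two scores per person and an observed outcome.
scores_a = [0.0, 1.0, 0.0, 1.0]
scores_b = [0.0, 0.0, 1.0, 1.0]
outcomes = [0.0, 0.5, 0.5, 1.0]

alpha = best_mixing_weight(scores_a, scores_b, outcomes,
                           grid=[0.0, 0.25, 0.5, 0.75, 1.0])
print(f"selected mixing weight: {alpha}")
```

In practice the chosen weight would need to be checked in a further independent cohort, since tuning and evaluating on the same individuals overstates performance.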

Although these strategies show promise, they depend on continued investment in global genomic data infrastructure and equitable participation in biomedical research.


Future Applications: Population Screening Programs

If these challenges can be addressed, polygenic risk scores may eventually support several clinical applications.

Risk Stratification

PRS could complement traditional clinical risk factors such as age, family history, and lifestyle in identifying individuals at elevated risk for complex diseases. This information could inform earlier screening or preventive interventions.
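A minimal sketch of what "complementing" clinical risk factors might look like: a logistic model in which a standardized PRS enters alongside age and family history. The coefficients below are invented for illustration and have no clinical meaning. A real model would be fitted to and validated on cohort data.

```python
import math


def logistic(x):
    """Map a linear predictor to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def combined_risk(age_decades, family_history, prs_z, coef):
    """Predicted risk from clinical factors plus a standardized PRS."""
    linear = (coef["intercept"]
              + coef["age"] * age_decades
              + coef["family_history"] * family_history
              + coef["prs"] * prs_z)
    return logistic(linear)


# Hypothetical coefficients (log odds per unit), for illustration only.
coef = {"intercept": -5.0, "age": 0.5, "family_history": 0.7, "prs": 0.4}

risk_average_prs = combined_risk(5.0, 1, prs_z=0.0, coef=coef)
risk_high_prs = combined_risk(5.0, 1, prs_z=2.0, coef=coef)
print(f"average PRS: {risk_average_prs:.3f}, high PRS: {risk_high_prs:.3f}")
```

Two people with identical clinical profiles receive different predicted risks here solely because of the PRS term, which is the stratification idea in its simplest form.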

Preventive Medicine

Genetic risk information may enable clinicians to identify individuals who might benefit from intensified lifestyle interventions, pharmacologic prevention, or more frequent monitoring.

Population Screening

In the future, PRS could potentially be incorporated into population-level screening programs. For example, individuals with high genetic risk for cardiovascular disease might receive earlier lipid screening or targeted preventive therapy.

However, the clinical implementation of PRS requires careful evaluation of clinical validity, utility, and equity. Without representative genomic datasets, there is a risk that these tools could inadvertently widen existing health disparities.


The Genomic Data Gap: Why Some Populations Are Missing from Precision Medicine

Historical Underrepresentation in Biomedical Research

The limitations of polygenic risk scores reflect a broader structural issue in biomedical research: the underrepresentation of many populations in genomic studies.

Historically, most large-scale genetic studies have been conducted in Europe and North America. As a result, individuals of European ancestry make up the majority of participants in genomic databases and biobanks.

Analyses of GWAS datasets have shown that more than three-quarters of participants are of European ancestry, even though people of European ancestry make up a minority of the global population. This imbalance has significant implications for the clinical translation of genomic research.


Contributing Factors

Several factors contribute to the uneven representation of populations in genomic research.

Access to Research Infrastructure

Large-scale genomic research requires advanced laboratory infrastructure, computational resources, and biobanking capabilities. These resources are unevenly distributed globally, limiting the ability of some regions to participate fully in genomic studies.

Funding Disparities

Research funding is concentrated in high-income countries. As a result, investigators in lower-resource settings often face barriers in launching large-scale genomic research initiatives.

Funding disparities can also influence which populations are prioritized in international research collaborations.

Trust and Historical Injustices

Historical abuses in biomedical research have created understandable mistrust in some communities. Ethical concerns surrounding data ownership, consent, and the potential misuse of genetic information can influence participation in genomic studies.

Building trust requires transparent governance, community engagement, and equitable partnerships between researchers and participating populations.


Impact on Healthcare

The genomic data gap has direct consequences for clinical practice and healthcare systems.

Limited Clinical Applicability of Genomic Tests

Many genomic tests, including PRS models and pharmacogenomic markers, have been validated primarily in populations of European ancestry. This can limit their predictive accuracy or clinical relevance in other populations.

For clinicians, this means that the performance of genomic tests may vary depending on patient ancestry, complicating clinical decision-making.

Risk of Widening Health Disparities

If precision medicine tools are developed using non-representative datasets, they may disproportionately benefit populations already well represented in research. This could exacerbate existing health disparities rather than reducing them.

Addressing this challenge requires a deliberate focus on equity in both research design and clinical implementation.


Solutions: Toward Inclusive Precision Medicine

Efforts to close the genomic data gap are underway across multiple sectors of biomedical research.

Inclusive Research Initiatives

Several global initiatives aim to increase the diversity of genomic datasets by recruiting participants from underrepresented populations. These programs emphasize community engagement, ethical governance, and equitable benefit sharing.

Capacity Building

Building genomic research capacity in low- and middle-income countries is another key strategy. Investments in infrastructure, training programs, and collaborative networks can enable local researchers to lead genomic studies relevant to their populations.

Policy and Ethical Frameworks

Policymakers and research institutions are also developing frameworks to ensure that genomic research is conducted responsibly. These frameworks address issues such as data sovereignty, informed consent, and equitable access to research benefits.

Comprehensive reviews have emphasized that improving diversity in genomic research is essential for both scientific validity and health equity.


Conclusion

Polygenic risk scores represent a promising tool in the evolving landscape of precision medicine. By integrating genetic information across the genome, PRS models offer new opportunities for disease risk prediction and preventive healthcare.

However, the scientific and clinical utility of these tools is closely tied to the representativeness of the genomic data used to develop them. The historical underrepresentation of many populations in genomic research has created significant limitations in PRS performance and broader challenges for the equitable implementation of precision medicine.

Addressing these issues will require sustained efforts to diversify genomic datasets, strengthen global research infrastructure, and foster trust with communities historically excluded from biomedical research.

As precision medicine continues to advance, ensuring that genomic innovations benefit diverse populations will be essential to achieving its central goal: delivering more accurate, equitable, and personalized healthcare.
