Methodology

How DNA Explore analyzes your genetic data

Polygenic Risk Score (PRS) Calculation

Polygenic risk scores combine the effects of many genetic variants to estimate predisposition to a specific condition. For each PRS model, we weight each SNP's contribution by its published effect size (beta coefficient) from genome-wide association studies (GWAS).

The calculation follows these steps:

  1. For each SNP in the model, we count the number of risk alleles (0, 1, or 2) in your genotype
  2. Each count is multiplied by the GWAS-derived beta coefficient for that SNP
  3. The weighted contributions are summed to produce a raw polygenic score
  4. Your raw score is compared against a simulated population distribution to determine a percentile
  5. Percentiles are mapped to risk tiers: Low (<25th), Average (25th–75th), Elevated (75th–90th), High (>90th)
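The steps above can be sketched in a few lines of Python. This is a minimal illustration, not DNA Explore's actual implementation: the rsIDs and beta values in `MODEL` are made up, the function names are hypothetical, the reference scores are assumed to be supplied by whatever population simulation is in use, and the handling of values that fall exactly on a tier boundary (25th, 75th, 90th) is an assumption.

```python
from bisect import bisect_right

# Hypothetical PRS model: rsID -> (risk allele, GWAS beta). Illustrative values only.
MODEL = {
    "rs0000001": ("A", 0.12),
    "rs0000002": ("G", 0.05),
    "rs0000003": ("T", -0.08),
}


def raw_score(genotypes, model):
    """Steps 1-3: sum (risk-allele count x beta) over model SNPs found in the data."""
    total = 0.0
    for rsid, (risk_allele, beta) in model.items():
        genotype = genotypes.get(rsid)
        if genotype is None:
            continue  # SNP not genotyped on this chip, so it contributes nothing
        total += genotype.count(risk_allele) * beta  # count is 0, 1, or 2
    return total


def percentile(score, reference_scores):
    """Step 4: share of reference scores at or below the individual's score."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def risk_tier(pct):
    """Step 5: map a percentile to a tier (boundary handling is an assumption)."""
    if pct < 25:
        return "Low"
    if pct <= 75:
        return "Average"
    if pct <= 90:
        return "Elevated"
    return "High"
```

For example, a genotype of `{"rs0000001": "AA", "rs0000002": "AG", "rs0000003": "CC"}` carries two, one, and zero risk alleles respectively, so its raw score is 2 × 0.12 + 1 × 0.05 + 0 × (−0.08) = 0.29.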

GWAS Reference Populations

Most of the GWAS behind our models were conducted in European-descent cohorts. This means:

  • Effect sizes and allele frequencies may differ in non-European populations
  • Risk percentiles may be less accurate for individuals of non-European ancestry
  • Population-specific linkage disequilibrium patterns can affect score transferability

We display this limitation prominently alongside PRS results. As more diverse GWAS data becomes available, we will update our models accordingly.

Genotyping Chip Coverage

Consumer genotyping chips (such as 23andMe's Illumina-based arrays) test a subset of the genome — typically 600,000–700,000 SNPs out of approximately 10 million known common variants. This means:

  • Not all SNPs in a PRS model may be present on your chip — we report coverage percentage for each model
  • Rare variants (<1% minor allele frequency) are often missing from consumer chips
  • Structural variants, copy number variants, and insertions/deletions may not be captured
  • Imputation is not performed — we only use directly genotyped variants
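As a rough sketch of how a per-model coverage percentage could be computed from directly genotyped variants only: the function name is hypothetical, and treating `"--"` as the no-call marker in the raw data is an assumption about the input format.

```python
NO_CALL = "--"  # assumed marker for a SNP that is on the chip but was not called


def model_coverage(model_rsids, genotypes):
    """Percentage of a model's SNPs with a usable, directly genotyped call."""
    model = set(model_rsids)
    if not model:
        return 0.0
    # No imputation: a SNP counts only if it is present and has a real call.
    called = {rsid for rsid in model if genotypes.get(rsid, NO_CALL) != NO_CALL}
    return 100.0 * len(called) / len(model)
```

With a four-SNP model where two SNPs are called, one is a no-call, and one is absent from the chip, this returns 50.0.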

Pharmacogenomics Inference

Drug metabolism profiles are inferred using star allele nomenclature based on Clinical Pharmacogenetics Implementation Consortium (CPIC) guidelines. The process:

  1. Key SNPs for each pharmacogene (CYP2C19, CYP2D6, CYP2C9, DPYD, TPMT, SLCO1B1) are looked up in your genotype data
  2. Detected variants are matched to known star alleles (e.g., *2, *3, *17)
  3. Star allele combinations form a diplotype (e.g., *1/*2)
  4. Diplotypes are mapped to phenotypes (poor, intermediate, normal, rapid, ultrarapid metabolizer)
  5. Affected drugs and clinical recommendations are sourced from CPIC guidelines
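Steps 1 through 4 can be illustrated with a small sketch for a single gene. The tables below are a deliberately tiny, illustrative subset for CYP2C19 and are not a complete or clinical allele definition; the variant allele letters, the function names, and the simplified phenotype rules are assumptions made for the example. The sketch also assumes that two different variants sit on opposite chromosomes, because unphased chip data cannot confirm this, and that the absence of a tested variant implies *1.

```python
# Illustrative subset: rsID -> (variant allele, star allele). Not a clinical table.
STAR_ALLELE_SNPS = {
    "rs4244285": ("A", "*2"),
    "rs4986893": ("A", "*3"),
    "rs12248560": ("T", "*17"),
}

# Simplified allele function assignments used only for this example.
FUNCTION = {"*1": "normal", "*2": "none", "*3": "none", "*17": "increased"}


def call_diplotype(genotypes):
    """Steps 1-3: look up key SNPs, match star alleles, and form a diplotype."""
    alleles = []
    for rsid, (variant_allele, star) in STAR_ALLELE_SNPS.items():
        copies = genotypes.get(rsid, "").count(variant_allele)
        alleles.extend([star] * copies)
    if len(alleles) > 2:
        return None  # more than two variant alleles: not resolvable without phasing
    alleles += ["*1"] * (2 - len(alleles))  # untested or absent variants default to *1
    return tuple(sorted(alleles, key=lambda a: int(a[1:])))


def phenotype(diplotype):
    """Step 4: map a diplotype to a metabolizer phenotype (simplified rules)."""
    funcs = [FUNCTION[a] for a in diplotype]
    if funcs.count("none") == 2:
        return "poor"
    if "none" in funcs:
        return "intermediate"
    if funcs.count("increased") == 2:
        return "ultrarapid"
    if "increased" in funcs:
        return "rapid"
    return "normal"
```

A genotype that is heterozygous for the *2 variant and reference at the other two sites yields the diplotype `("*1", "*2")`, which these rules map to "intermediate". Defaulting to *1 when a variant is not observed is exactly where chip coverage gaps can produce an incorrect call.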

Limitations: Consumer genotyping chips may not capture all star alleles for a given gene. CYP2D6 is particularly complex due to gene deletions, duplications, and hybrid alleles that cannot be reliably detected from microarray data alone. Clinical pharmacogenomic testing uses sequencing and copy number analysis for more comprehensive results.

Nutrigenomics Rules

Nutrition recommendations are based on a curated rule set that maps specific genotype combinations to nutrient metabolism effects. Each rule specifies:

  • The gene(s) and SNP(s) involved (e.g., MTHFR rs1801133)
  • The triggering genotype pattern (e.g., homozygous TT)
  • The affected nutrient and metabolic pathway
  • General dietary patterns to discuss with a healthcare provider

Recommendations are framed qualitatively (e.g., “consider increasing folate-rich foods”) rather than with specific dosages, which should be determined in consultation with a registered dietitian or healthcare provider.
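One way such a rule could be represented and matched is sketched below. The `NutrigenomicsRule` class, its field names, and the `matching_rules` helper are hypothetical; the MTHFR example reuses the SNP, genotype, and wording mentioned above, and the pathway label is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NutrigenomicsRule:
    gene: str
    rsid: str
    trigger_genotype: str  # e.g. "TT" for homozygous T
    nutrient: str
    pathway: str
    suggestion: str  # qualitative wording only, never a dosage


RULES = [
    NutrigenomicsRule(
        gene="MTHFR",
        rsid="rs1801133",
        trigger_genotype="TT",
        nutrient="folate",
        pathway="folate metabolism",
        suggestion="consider increasing folate-rich foods",
    ),
]


def matching_rules(genotypes, rules):
    """Return rules whose trigger genotype matches, ignoring allele order."""
    return [
        rule
        for rule in rules
        if sorted(genotypes.get(rule.rsid, "")) == sorted(rule.trigger_genotype)
    ]
```

Comparing sorted alleles means a heterozygous trigger written as "CT" would also match data reported as "TC"; a SNP missing from the genotype data never triggers a rule.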

Trait Interpretation

Individual SNP interpretations in the Trait Explorer are based on published research associations. Each variant in our curated catalog includes the source study, effect allele, and observed genotype associations. Interpretations use probabilistic language to reflect the statistical nature of genetic associations.

AI-Generated Content

The AI chat, health report, and variant explanation features use Anthropic's Claude (Sonnet 4.5) to generate natural language summaries. These features:

  • Receive variant summaries (not raw files) as context
  • Are instructed to use probabilistic language and never provide specific dosages
  • May contain errors or inaccuracies inherent to large language model outputs
  • Should be treated as educational starting points, not clinical guidance

Data Sources

Our SNP catalog, PRS models, and pharmacogenomics rules are derived from:

  • Published genome-wide association studies (GWAS Catalog)
  • Clinical Pharmacogenetics Implementation Consortium (CPIC) guidelines
  • PharmGKB pharmacogenomics knowledge base
  • ClinVar variant database (for monogenic variants)
  • Peer-reviewed nutrigenomics literature