Methodology
How DNA Explore analyzes your genetic data
Polygenic Risk Score (PRS) Calculation
Polygenic risk scores combine the effects of many genetic variants to estimate predisposition to a specific condition. For each PRS model, we use published effect sizes (beta coefficients) from genome-wide association studies (GWAS) to weight individual SNP contributions.
The calculation follows these steps:
- For each SNP in the model, we count the number of risk alleles (0, 1, or 2) in your genotype
- Each count is multiplied by the GWAS-derived beta coefficient for that SNP
- The weighted contributions are summed to produce a raw polygenic score
- Your raw score is compared against a simulated population distribution to determine a percentile
- Percentiles are mapped to risk tiers: Low (<25th), Average (25th–75th), Elevated (75th–90th), High (>90th)
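The steps above can be sketched in Python as follows. This is a minimal illustration, not our production code. The model entries (`snp1` to `snp3`, with their betas and risk-allele frequencies) are hypothetical. Genotypes are assumed to be reported on the same strand as each model's risk allele. The simulated population assumes Hardy-Weinberg proportions with independent SNPs, so linkage disequilibrium is ignored. How exact ties at the 75th and 90th percentiles are assigned is also an assumption.

```python
import random
from bisect import bisect_left

# Hypothetical PRS model: SNP id -> (risk allele, GWAS beta, risk-allele frequency).
# All values are illustrative placeholders, not a real published model.
MODEL = {
    "snp1": ("A", 0.12, 0.30),
    "snp2": ("T", -0.05, 0.50),
    "snp3": ("G", 0.30, 0.20),
}


def raw_score(genotypes, model):
    """Steps 1-3: count risk alleles, weight each count by its beta, and sum.

    SNPs missing from the file or reported as a no-call ("--") are skipped.
    Returns the score and the list of SNPs that contributed to it.
    """
    total = 0.0
    used = []
    for snp, (risk_allele, beta, _freq) in model.items():
        genotype = genotypes.get(snp, "--")
        if genotype == "--":
            continue
        copies = genotype.count(risk_allele)  # 0, 1, or 2
        total += copies * beta
        used.append(snp)
    return total, used


def simulate_population(model, snps, n=10_000, seed=0):
    """Step 4 (assumed method): simulate raw scores for n synthetic people.

    Each person's risk-allele count per SNP is drawn as Binomial(2, freq),
    i.e. Hardy-Weinberg proportions with SNPs treated as independent.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        score = 0.0
        for snp in snps:
            _risk, beta, freq = model[snp]
            copies = (rng.random() < freq) + (rng.random() < freq)
            score += copies * beta
        scores.append(score)
    scores.sort()
    return scores


def percentile(score, sorted_scores):
    """Percentage of simulated scores strictly below the user's score."""
    return 100.0 * bisect_left(sorted_scores, score) / len(sorted_scores)


def risk_tier(pct):
    """Step 5: map a percentile to a tier (boundary handling is an assumption)."""
    if pct < 25:
        return "Low"
    if pct <= 75:
        return "Average"
    if pct <= 90:
        return "Elevated"
    return "High"


genotypes = {"snp1": "AG", "snp2": "TT", "snp3": "--"}
score, used = raw_score(genotypes, MODEL)
population = simulate_population(MODEL, used)
pct = percentile(score, population)
print(f"raw score {score:.3f}, percentile {pct:.1f}, tier {risk_tier(pct)}")
```

The sketch simulates the reference distribution over only the SNPs the user was actually genotyped for. This keeps the comparison like-for-like when chip coverage is incomplete. It is one reasonable design choice, not the only one.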
GWAS Reference Populations
Most of the GWAS underlying our models were conducted in cohorts of European descent. This means:
- Effect sizes and allele frequencies may differ in non-European populations
- Risk percentiles may be less accurate for individuals of non-European ancestry
- Population-specific linkage disequilibrium patterns can affect score transferability
We display this limitation prominently alongside PRS results. As more diverse GWAS data becomes available, we will update our models accordingly.
Genotyping Chip Coverage
Consumer genotyping chips (such as 23andMe's Illumina-based arrays) test a subset of the genome — typically 600,000–700,000 SNPs out of approximately 10 million known common variants. This means:
- Not all SNPs in a PRS model may be present on your chip — we report coverage percentage for each model
- Rare variants (<1% minor allele frequency) are often missing from consumer chips
- Structural variants, copy number variants, and insertions/deletions may not be captured
- Imputation is not performed — we only use directly genotyped variants
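Coverage can be illustrated with a short sketch. It reads a raw export and counts how many of a model's SNPs carry a valid direct call. The sketch assumes a 23andMe-style layout: comment lines start with `#`, data lines are tab-separated (rsid, chromosome, position, genotype), and `--` marks a no-call. The function names and the `snpA` to `snpD` identifiers are hypothetical placeholders. Because no imputation is performed, any SNP absent from the file or left uncalled simply counts as missing.

```python
def parse_raw_genotypes(lines):
    """Parse a 23andMe-style raw export into {rsid: genotype}.

    Assumed layout: '#' comment lines, then tab-separated
    rsid, chromosome, position, genotype. No imputation is done:
    only what the chip reported is returned.
    """
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, _chromosome, _position, genotype = line.split("\t")
        genotypes[rsid] = genotype
    return genotypes


def model_coverage(genotypes, model_snps):
    """Percentage of a model's SNPs with a valid direct call ('--' counts as missing)."""
    if not model_snps:
        return 0.0
    called = sum(1 for snp in model_snps if genotypes.get(snp, "--") != "--")
    return 100.0 * called / len(model_snps)


example = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "snpA\t1\t100\tAG\n"
    "snpB\t1\t200\t--\n"
    "snpC\t2\t300\tCC\n"
)
genotypes = parse_raw_genotypes(example.splitlines())
print(model_coverage(genotypes, ["snpA", "snpB", "snpC", "snpD"]))  # 50.0
```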
Pharmacogenomics Inference
Drug metabolism profiles are inferred using star allele nomenclature based on Clinical Pharmacogenetics Implementation Consortium (CPIC) guidelines. The process:
- Key SNPs for each pharmacogene (CYP2C19, CYP2D6, CYP2C9, DPYD, TPMT, SLCO1B1) are looked up in your genotype data
- Detected variants are matched to known star alleles (e.g., *2, *3, *17)
- Star allele combinations form a diplotype (e.g., *1/*2)
- Diplotypes are mapped to phenotypes (poor, intermediate, normal, rapid, ultrarapid metabolizer)
- Affected drugs and clinical recommendations are sourced from CPIC guidelines
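The sketch below is a simplified illustration of the chain from SNP to star allele to diplotype to phenotype. It handles only three CYP2C19 alleles: *2 (rs4244285), *3 (rs4986893), and *17 (rs12248560). The phenotype table follows the CPIC CYP2C19 function-based assignments as we understand them. Several details are assumptions: the function names, the plus-strand orientation of the variant bases, and the treatment of unphased data. A real caller should use the full CPIC/PharmVar allele definition tables.

```python
# Simplified CYP2C19 example: only three defining SNPs are modeled.
# rsid -> (variant base, assumed plus-strand orientation; star allele it defines)
DEFINING_SNPS = {
    "rs4244285": ("A", "*2"),
    "rs4986893": ("A", "*3"),
    "rs12248560": ("T", "*17"),
}
ALLELE_FUNCTION = {"*1": "normal", "*2": "no", "*3": "no", "*17": "increased"}

# Pairs of allele functions (sorted alphabetically) -> metabolizer phenotype.
PHENOTYPES = {
    ("increased", "increased"): "Ultrarapid metabolizer",
    ("increased", "normal"): "Rapid metabolizer",
    ("normal", "normal"): "Normal metabolizer",
    ("no", "normal"): "Intermediate metabolizer",
    ("increased", "no"): "Intermediate metabolizer",
    ("no", "no"): "Poor metabolizer",
}


def call_diplotype(genotypes):
    """Return a diplotype such as '*1/*2', or None if it cannot be called."""
    alleles = []
    for rsid, (variant, star) in DEFINING_SNPS.items():
        genotype = genotypes.get(rsid, "--")
        if genotype == "--":
            return None  # a defining SNP is missing or uncalled
        # Unphased data: each variant copy is assumed to sit on a different chromosome.
        alleles.extend([star] * genotype.count(variant))
    if len(alleles) > 2:
        return None  # more than two variant alleles: needs manual review
    alleles.extend(["*1"] * (2 - len(alleles)))  # default when nothing was detected
    alleles.sort(key=lambda a: int(a[1:]))
    return "/".join(alleles)


def phenotype(diplotype):
    functions = tuple(sorted(ALLELE_FUNCTION[a] for a in diplotype.split("/")))
    return PHENOTYPES[functions]


diplotype = call_diplotype({"rs4244285": "AG", "rs4986893": "GG", "rs12248560": "CC"})
print(diplotype, phenotype(diplotype))  # *1/*2 Intermediate metabolizer
```

In this sketch, a *1 call only means that none of the tested defining variants was found. An allele the chip does not assay would be missed entirely.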
Limitations: Consumer genotyping chips may not capture all star alleles for a given gene. CYP2D6 is particularly complex due to gene deletions, duplications, and hybrid alleles that cannot be reliably detected from microarray data alone. Clinical pharmacogenomic testing uses sequencing and copy number analysis for more comprehensive results.
Nutrigenomics Rules
Nutrition recommendations are based on a curated rule set that maps specific genotype combinations to nutrient metabolism effects. Each rule specifies:
- The gene(s) and SNP(s) involved (e.g., MTHFR rs1801133)
- The triggering genotype pattern (e.g., homozygous TT)
- The affected nutrient and metabolic pathway
- General dietary patterns to discuss with a healthcare provider
Recommendations are framed qualitatively (e.g., “consider increasing folate-rich foods”) rather than with specific dosages, which should be determined in consultation with a registered dietitian or healthcare provider.
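A rule of this shape can be represented as a small data structure plus a matcher. In the sketch below, the class and field names are hypothetical. The single example rule encodes the MTHFR rs1801133 TT case described above. Genotypes are compared without regard to allele order, so `CT` and `TC` are treated as the same genotype.

```python
from dataclasses import dataclass


@dataclass
class NutrientRule:
    genes: tuple[str, ...]      # gene symbols involved, e.g. ("MTHFR",)
    triggers: dict[str, str]    # rsid -> triggering genotype; all must match
    nutrient: str
    pathway: str
    discussion_point: str       # qualitative suggestion, never a dosage


def normalize(genotype):
    """Make genotype comparison order-insensitive ('CT' == 'TC')."""
    return "".join(sorted(genotype))


def rule_matches(rule, genotypes):
    for rsid, wanted in rule.triggers.items():
        observed = genotypes.get(rsid, "--")
        if observed == "--" or normalize(observed) != normalize(wanted):
            return False
    return True


RULES = [
    # Illustrative entry based on the MTHFR example above.
    NutrientRule(
        genes=("MTHFR",),
        triggers={"rs1801133": "TT"},
        nutrient="Folate",
        pathway="Folate metabolism",
        discussion_point="Consider increasing folate-rich foods",
    ),
]


def matching_rules(genotypes, rules=RULES):
    return [rule for rule in rules if rule_matches(rule, genotypes)]


for rule in matching_rules({"rs1801133": "TT"}):
    print(f"{rule.nutrient}: {rule.discussion_point} (discuss with a healthcare provider)")
```

A rule requiring a combination of genotypes would simply list several rsid entries in `triggers`.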
Trait Interpretation
Individual SNP interpretations in the Trait Explorer are based on published research associations. Each variant in our curated catalog includes the source study, effect allele, and observed genotype associations. Interpretations use probabilistic language to reflect the statistical nature of genetic associations.
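The sketch below shows one way to model a catalog entry and render a probabilistic interpretation from it. The `TraitVariant` fields mirror the catalog contents listed above: source study, effect allele, and genotype associations. The class, the wording template, and the example values (including `snpX` and the placeholder citation) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TraitVariant:
    rsid: str
    trait: str
    effect_allele: str
    source_study: str              # citation taken from the curated catalog
    associations: dict[int, str]   # effect-allele copies (0-2) -> observed association


def interpret(variant, genotype):
    """Render a hedged, probabilistic sentence; returns None if no interpretation applies."""
    if genotype == "--":
        return None
    copies = genotype.count(variant.effect_allele)
    association = variant.associations.get(copies)
    if association is None:
        return None
    noun = "copy" if copies == 1 else "copies"
    return (
        f"You carry {copies} {noun} of the {variant.effect_allele} allele at "
        f"{variant.rsid}. In {variant.source_study}, this genotype was associated "
        f"with {association} for {variant.trait}; this describes a population-level "
        "tendency and may not apply to you."
    )


variant = TraitVariant(
    rsid="snpX",  # hypothetical identifier
    trait="example trait",
    effect_allele="A",
    source_study="<source study from catalog>",
    associations={0: "typical odds", 1: "slightly higher odds", 2: "higher odds"},
)
print(interpret(variant, "AG"))
```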
AI-Generated Content
The AI chat, health report, and variant explanation features use Anthropic's Claude (Sonnet 4.5) to generate natural language summaries. These features:
- Receive variant summaries (not raw files) as context
- Are instructed to use probabilistic language and never provide specific dosages
- May contain errors or inaccuracies inherent to large language model outputs
- Should be treated as educational starting points, not clinical guidance
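The "summaries, not raw files" boundary can be sketched as a function that condenses already-computed results into a compact JSON context before anything is sent to the model. The field names and instruction text here are hypothetical. The sketch deliberately stops short of calling any model API.

```python
import json

# Hypothetical instruction text reflecting the constraints listed above.
SYSTEM_INSTRUCTIONS = (
    "Use probabilistic language (e.g. 'may be associated with'). "
    "Never provide specific dosages. "
    "Present results as educational information, not clinical guidance."
)


def build_context(prs_results, pgx_results, max_items=50):
    """Condense computed results into a summary; raw genotype data is never copied.

    Only whitelisted fields are kept, so any extra keys on the input dicts
    (such as raw genotypes) are dropped.
    """
    summary = {
        "prs": [
            {
                "condition": r["condition"],
                "percentile": round(r["percentile"]),
                "coverage_pct": round(r["coverage_pct"]),
            }
            for r in prs_results[:max_items]
        ],
        "pharmacogenomics": [
            {"gene": r["gene"], "diplotype": r["diplotype"], "phenotype": r["phenotype"]}
            for r in pgx_results[:max_items]
        ],
    }
    return SYSTEM_INSTRUCTIONS, json.dumps(summary, indent=2)


system, context = build_context(
    [{"condition": "Example condition", "percentile": 81.6, "coverage_pct": 92.4}],
    [{"gene": "CYP2C19", "diplotype": "*1/*2", "phenotype": "Intermediate metabolizer"}],
)
print(context)
```

Whitelisting fields, rather than deleting known sensitive ones, means new keys added upstream never leak into the model context by default.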
Data Sources
Our SNP catalog, PRS models, and pharmacogenomics rules are derived from:
- Published genome-wide association studies (GWAS Catalog)
- Clinical Pharmacogenetics Implementation Consortium (CPIC) guidelines
- PharmGKB pharmacogenomics knowledge base
- ClinVar variant database (for monogenic variants)
- Peer-reviewed nutrigenomics literature