
What's in Your 23andMe Raw Data File?
Your 23andMe raw data file contains roughly 600,000 genetic markers that hold information about your health risks, drug metabolism, ancestry, traits, and more. Here's exactly what's inside and how to put it to use.
Key Takeaways
- Your 23andMe raw data file is a plain text file containing roughly 600,000 SNPs — specific positions where your DNA differs from the reference genome
- The file includes health-relevant variants (BRCA, APOE, Factor V Leiden), pharmacogenomic markers (CYP2D6, CYP2C19), carrier status variants, ancestry markers, and trait-related SNPs
- 23andMe's built-in reports only analyze a fraction of the variants in your raw file — third-party tools can extract far more health insights from the same data
- DNA Explore analyzes your raw data entirely in your browser for $9.99, covering polygenic risk scores, pharmacogenomics, nutrigenomics, and more — with no data upload required
“I signed up for 23andMe in 2017 because I was fascinated by what my DNA could tell me. Six years later, my data was compromised in their breach — I'm a confirmed class member in the litigation. I didn't want to hand my genetic data to another company, so I built a tool where everything stays on your device. Then I thought: why not give people what I was actually searching for when I got my DNA tested in the first place — actionable health insights, drug metabolism analysis, risk scores — things you can actually do something with.”
What Is the 23andMe Raw Data File?
From Spit Kit to Data File
When you spit into a tube and send it to 23andMe, their lab scans your DNA using a genotyping chip, a silicon wafer designed to read specific positions across your genome. The results of that scan are stored in what's called your raw data file, a plain text file you can download directly from your 23andMe account under Settings > 23andMe Data.
File Format and Structure
The file itself is surprisingly simple. It's a tab-separated text file (typically with a .txt extension) that weighs in at around 20-30 MB when unzipped. Each row represents a single genetic marker: a specific position on one of your 23 chromosome pairs (22 pairs of autosomes plus one pair of sex chromosomes) or in your mitochondrial DNA. For each marker, the file records four things:
- An rsID identifier (like rs12345)
- The chromosome number
- The base-pair position on that chromosome
- Your genotype — the two letters representing which nucleotide bases you carry at that position
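As a rough illustration of this layout, the sketch below parses the four fields from a few tab-separated lines. The sample rows, the `Marker` type, and the `parse_raw_data` helper are made up for this example and are not part of any 23andMe tooling. Lines starting with `#` are treated as header comments and skipped.

```python
from typing import Iterable, Iterator, NamedTuple


class Marker(NamedTuple):
    rsid: str
    chromosome: str  # kept as text because it can also be "X", "Y", or "MT"
    position: int
    genotype: str


def parse_raw_data(lines: Iterable[str]) -> Iterator[Marker]:
    """Yield one Marker per data line, skipping blank lines and '#' header lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        yield Marker(rsid, chromosome, int(position), genotype)


# Hypothetical sample rows (positions and genotypes are invented for illustration).
sample = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs12345\t1\t1000\tAG",
    "rs67890\tX\t2000\tC",
]

markers = list(parse_raw_data(sample))
for m in markers:
    print(m)
```

To run this against a real download, the same function could be given an open file handle instead of the `sample` list, since it only needs an iterable of lines.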
Chip Versions and Marker Counts
Depending on which version of the 23andMe genotyping chip was used, your file will contain somewhere between 550,000 and 700,000 markers. Newer chip versions (v5) cover slightly different positions than older ones (v3, v4), but they all follow the same basic text format. This file is your data: you own it, and you're free to download it and use it with any compatible analysis tool.
What Are SNPs and Why Do They Matter?
What Is a SNP?
The genetic markers in your 23andMe raw data are called SNPs (single nucleotide polymorphisms, pronounced "snips"). A SNP is a position in the human genome where people commonly differ by a single DNA letter. Your DNA is written in a four-letter alphabet: A (adenine), C (cytosine), G (guanine), and T (thymine). At most positions, virtually all humans carry the same letter. But at SNP positions, a meaningful percentage of the population carries a different variant.
A Real-World Example: MTHFR
For example, at the SNP rs1801133 on chromosome 1, the most common genotype is CC. But roughly 10-15% of people of European descent carry a TT genotype at this position. This particular SNP sits in the MTHFR gene and affects how efficiently your body processes folate, a B vitamin critical for methylation, DNA repair, and dozens of other biochemical processes.
600,000 Data Points in One File
Your 23andMe file contains roughly 600,000 of these SNPs. Each one has been studied to varying degrees. Some are well-characterized with strong research backing, like APOE variants linked to Alzheimer's risk or CYP2D6 variants that determine how you metabolize certain medications. Others have smaller effect sizes or less research behind them. Together, these SNPs form a partial but highly informative snapshot of your genetic makeup.
Health and Disease Risk Information
High-Impact Single-Gene Variants
A significant portion of the SNPs in your 23andMe raw data relate to health conditions and disease risk. These range from single-gene variants with strong effects to common variants that each contribute a small amount to overall risk. Some of the most impactful health-related markers in a typical raw data file include:
- BRCA1 and BRCA2 variants associated with hereditary breast and ovarian cancer risk
- APOE variants (rs429358, rs7412) linked to Alzheimer's disease risk
- Factor V Leiden (rs6025) associated with blood clotting disorders
- HFE gene variants tied to hereditary hemochromatosis
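To check which of the markers listed above appear in a given file, one could filter the rows by rsID. This is a minimal sketch: the `find_markers` helper and the sample rows are hypothetical, the positions and genotypes are invented, and BRCA and HFE are left out because the list above does not name specific rsIDs for them.

```python
# rsIDs named in the list above; the labels are only used for display.
MARKERS_OF_INTEREST = {
    "rs429358": "APOE",
    "rs7412": "APOE",
    "rs6025": "Factor V Leiden",
}


def find_markers(lines, wanted):
    """Return {rsid: genotype} for wanted rsIDs found in tab-separated data lines."""
    found = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        rsid, _chromosome, _position, genotype = line.rstrip("\r\n").split("\t")
        if rsid in wanted:
            found[rsid] = genotype
    return found


# Invented sample rows; a real file has hundreds of thousands of lines.
sample = [
    "rs429358\t19\t100\tTT",
    "rs7412\t19\t200\tCC",
    "rs99999\t2\t300\tAG",
]

found = find_markers(sample, MARKERS_OF_INTEREST)
for rsid, label in MARKERS_OF_INTEREST.items():
    print(f"{rsid} ({label}): {found.get(rsid, 'not found in this file')}")
```

A marker that is missing from the output only means it was not in the file; it says nothing about your genotype at that position.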
Polygenic Risk: Many Small Effects Add Up
Beyond these high-impact single-gene variants, your raw data contains hundreds of thousands of common variants that each nudge your risk for conditions like type 2 diabetes, coronary artery disease, atrial fibrillation, and various cancers by a small amount. Individually, each variant has a tiny effect. But when combined into what's called a polygenic risk score (a weighted sum across many variants), they can provide meaningful risk stratification.
Beyond 23andMe's Built-In Reports
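The weighted sum behind a polygenic risk score, as described above, can be sketched in a few lines. Everything concrete here is an assumption for illustration: the rsIDs, effect alleles, and weights are invented, and the `polygenic_score` helper is hypothetical. Real scores take their per-variant weights from published genome-wide association studies and are usually compared against a reference population before being interpreted.

```python
# Hypothetical weights: rsID -> (effect allele, per-allele weight). Not real data.
WEIGHTS = {
    "rs0000001": ("A", 0.12),
    "rs0000002": ("T", 0.05),
    "rs0000003": ("G", -0.08),
}


def polygenic_score(genotypes, weights):
    """Sum weight * (copies of the effect allele) over the variants that were read.

    Variants that are absent from the file or reported as no-calls ("--")
    are skipped, and the number of variants actually used is returned too.
    """
    score = 0.0
    used = 0
    for rsid, (effect_allele, weight) in weights.items():
        genotype = genotypes.get(rsid)
        if genotype is None or "-" in genotype:
            continue
        score += weight * genotype.count(effect_allele)
        used += 1
    return score, used


# Invented genotypes keyed by rsID.
genotypes = {"rs0000001": "AG", "rs0000002": "TT", "rs0000003": "--"}
score, used = polygenic_score(genotypes, WEIGHTS)
print(f"score={score:.2f} from {used} of {len(WEIGHTS)} variants")
```

Returning the count of variants used alongside the score is a design choice for this sketch: a score built from only part of the weight table is not directly comparable to one built from all of it.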
23andMe's own reports only cover a fraction of what's in your raw data. Their FDA-authorized health reports analyze a small subset of markers. But the raw data file contains far more variants that have been linked to health outcomes in peer-reviewed genome-wide association studies. Third-party tools like DNA Explore can analyze these additional markers to generate polygenic risk scores and more comprehensive health insights.
Pharmacogenomics: How You Process Medications
What Is Pharmacogenomics?
One of the most immediately actionable categories of information in your 23andMe raw data is pharmacogenomics, the study of how your genes affect your response to medications. Your raw data file contains variants in key drug metabolism genes that determine whether you process certain medications faster or slower than average.
Key Drug Metabolism Genes in Your File
The most important pharmacogenes covered by the 23andMe chip include:
- CYP2D6: metabolizes roughly 25% of all prescription drugs, including codeine, tramadol, tamoxifen, and many antidepressants
- CYP2C19: affects how you process clopidogrel (Plavix), certain proton pump inhibitors, and some antidepressants like citalopram and escitalopram
- CYP2C9: influences your response to warfarin, certain NSAIDs, and some oral diabetes medications
- CYP3A4/5: metabolizes about half of all drugs on the market
Why Metabolizer Status Matters
Knowing your metabolizer status for these enzymes can be genuinely important for your healthcare. A poor metabolizer of CYP2D6, for instance, may get no pain relief from codeine because their body cannot convert it to its active form (morphine). An ultrarapid CYP2D6 metabolizer, on the other hand, may convert codeine too quickly and experience dangerous side effects. DNA Explore analyzes these pharmacogenomic markers directly from your raw data file for $9.99, giving you a report you can share with your doctor before starting a new medication.
Ancestry and Trait Information
How Ancestry Estimation Works
Your 23andMe raw data also contains the markers that power ancestry estimation and trait predictions. Ancestry analysis works by comparing your pattern of SNPs against reference populations from around the world. Certain allele frequencies vary predictably between continental populations and even between regional groups within continents, allowing algorithms to estimate your ancestral composition.
Lineage Markers: mtDNA and Y Chromosome
The raw data includes markers across all 22 autosomes plus the X chromosome, mitochondrial DNA (mtDNA) markers that trace your maternal lineage, and, for biological males, Y chromosome markers that trace the paternal lineage. These can be used by third-party tools to determine your mtDNA haplogroup and Y-DNA haplogroup, and to estimate ethnicity percentages.
Trait-Related SNPs
Trait-related SNPs are also present throughout the file. These include variants associated with:
- Eye color, hair color, and hair texture
- Freckling and skin pigmentation
- Bitter taste perception (TAS2R38)
- Lactose tolerance (MCM6/LCT)
- Caffeine metabolism (CYP1A2)
- Alcohol flush reaction (ALDH2)
- Asparagus metabolite detection
- Earwax type
Carrier Status for Inherited Conditions
What Does Carrier Status Mean?
Your raw data file contains variants relevant to carrier status for numerous recessive genetic conditions. Being a carrier means you have one copy of a variant associated with a condition but typically don't show symptoms yourself. However, if your partner is also a carrier for the same condition, each of your children has a 25% chance of being affected, because each parent passes on the variant half the time (1/2 × 1/2 = 1/4).
Conditions Covered in the Raw Data
23andMe's own carrier status reports test for conditions like cystic fibrosis, sickle cell anemia, Tay-Sachs disease, and hereditary hearing loss, among others. But the raw data file contains additional variants in these and other genes that 23andMe doesn't include in their consumer-facing reports. Some of these are well-studied pathogenic variants cataloged in databases like ClinVar.
Limitations of Genotyping vs. Sequencing
It is worth noting that genotyping chips like the one 23andMe uses are not the same as full gene sequencing. The chip tests specific known positions, which means it can miss variants that exist between the tested positions. A negative result on a genotyping chip does not guarantee you are not a carrier; it only means you don't carry the specific variants that were tested. For definitive carrier screening, clinical-grade sequencing is recommended. That said, the carrier information in your raw data can still be a useful starting point for understanding your genetic profile and for conversations with a genetic counselor.
How to Read Your Raw Data File
File Header and Layout
If you open your 23andMe raw data file in a text editor, you'll see a header section at the top (lines beginning with #) that includes metadata about when the file was generated and which chip version was used. Below the header, each line contains four tab-separated columns: rsid, chromosome, position, and genotype.
Understanding the Four Columns
The rsid column contains the reference SNP identifier, a standardized name like rs53576 that you can look up in databases like dbSNP, SNPedia, or ClinVar to learn what that variant is associated with. The chromosome column is a number from 1 to 22, or X, Y, or MT (mitochondrial). The position column tells you the exact base-pair location on that chromosome. The genotype column shows your two alleles; for example, AG means you have one A and one G at that position.
No-Calls and Haploid Markers
Some genotypes will show as "--", which means the chip could not reliably read that position (a no-call). This is normal and typically affects 1-3% of markers. You might also see single-letter genotypes for markers on the X or Y chromosome in males, or for mitochondrial markers, since these are haploid (only one copy). While you can manually look up individual SNPs, doing this for 600,000 markers is impractical. That's where analysis tools come in: they automate the process of matching your variants against research databases and computing meaningful results.
What You Can Do With Your Raw Data
Unlocking More Value From Your DNA
Downloading your 23andMe raw data file unlocks far more value than the reports 23andMe provides on their platform. Third-party analysis tools can extract additional health insights, pharmacogenomic reports, nutrition recommendations, and ancestry details that 23andMe either doesn't report on or charges extra for.
DNA Explore: Privacy-First Analysis
DNA Explore is one such tool, and it takes a privacy-first approach that's especially relevant given the 23andMe bankruptcy in 2025 that put 15 million customers' data at risk. When you open your raw data file in DNA Explore, the file is analyzed entirely in your browser using JavaScript. Your genetic data never leaves your device: there's no server upload, no database storage, and no account required. For a one-time payment of $9.99, you get a comprehensive report covering:
- Polygenic risk scores for dozens of conditions
- Pharmacogenomics for major drug metabolism genes
- Nutrigenomics recommendations
- Gene-gene interaction analysis
- An AI-powered chat that explains your results in plain language
Other Analysis Options
Other options include tools like Promethease (SNP lookup against SNPedia, $14), GEDmatch (genealogy-focused, free), and various clinical platforms. Whichever tool you choose, downloading your raw data is the essential first step. Go to your 23andMe account, navigate to Settings > 23andMe Data > Download Raw Data, and save the zip file. It's your data, so make the most of it.
Frequently Asked Questions
How do I download my 23andMe raw data file?
Is my 23andMe raw data file safe to share with third-party tools?
What's the difference between 23andMe raw data and whole genome sequencing?
Can I use my 23andMe raw data for medical decisions?
Does 23andMe raw data expire or change over time?
Disclaimer: The information provided in this article is for general educational and informational purposes only and does not constitute medical, legal, or financial advice. Genetic information should not be used as a substitute for professional medical advice, diagnosis, or treatment. Always consult a qualified healthcare provider before making any health decisions based on genetic data.
Prices, features, and availability of third-party products and services mentioned in this article are based on publicly available information as of the publication date and may have changed. We make reasonable efforts to ensure accuracy but cannot guarantee that all pricing, feature descriptions, or company information is current or complete. Trademarks and brand names referenced are the property of their respective owners and are used solely for identification and comparison purposes.
Genetic risk assessments, polygenic risk scores, and pharmacogenomic reports generated by any consumer tool — including DNA Explore — are based on currently published research and known associations. They are not diagnostic. Genetic predisposition does not guarantee the development or absence of any condition.