What's in Your 23andMe Raw Data File?

Your 23andMe raw data file contains roughly 600,000 genetic markers that hold information about your health risks, drug metabolism, ancestry, traits, and more. Here's exactly what's inside and how to put it to use.

By Peter Hollens · 10 min read

Key Takeaways

  • Your 23andMe raw data file is a plain text file containing roughly 600,000 SNPs: specific positions in the genome where DNA commonly varies between people, along with the genotype you carry at each one
  • The file includes health-relevant variants (BRCA, APOE, Factor V Leiden), pharmacogenomic markers (CYP2D6, CYP2C19), carrier status variants, ancestry markers, and trait-related SNPs
  • 23andMe's built-in reports only analyze a fraction of the variants in your raw file — third-party tools can extract far more health insights from the same data
  • DNA Explore analyzes your raw data entirely in your browser for $9.99, covering polygenic risk scores, pharmacogenomics, nutrigenomics, and more — with no data upload required
“I signed up for 23andMe in 2017 because I was fascinated by what my DNA could tell me. Six years later, my data was compromised in their breach — I'm a confirmed class member in the litigation. I didn't want to hand my genetic data to another company, so I built a tool where everything stays on your device. Then I thought: why not give people what I was actually searching for when I got my DNA tested in the first place — actionable health insights, drug metabolism analysis, risk scores — things you can actually do something with.”

Peter Hollens

Founder, DNA Explore · Wikipedia

What Is the 23andMe Raw Data File?

From Spit Kit to Data File

When you spit into a tube and send it to 23andMe, their lab scans your DNA using a genotyping chip, a microarray designed to read a fixed set of positions across your genome. The results of that scan are stored in your raw data file, a plain text file you can download directly from your 23andMe account under Settings > 23andMe Data.

File Format and Structure

The file itself is surprisingly simple. It's a tab-separated text file (typically with a .txt extension) that weighs in at around 20-30 MB when unzipped. Each row represents a single genetic marker: a specific position on one of your chromosomes (the 22 autosomes plus the X and Y sex chromosomes) or in your mitochondrial DNA. For each marker, the file records four things:
  • An rsID identifier (like rs12345)
  • The chromosome number
  • The base-pair position on that chromosome
  • Your genotype — the two letters representing which nucleotide bases you carry at that position
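As a rough illustration of that four-field layout, the sketch below parses a single row into a small record. The `Marker` class, the `parse_row` function, and the example row are all made up for this article; they are not part of any 23andMe tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    rsid: str        # identifier such as "rs12345"
    chromosome: str  # "1" to "22", "X", "Y" or "MT"
    position: int    # base-pair position on that chromosome
    genotype: str    # e.g. "AG"; "--" marks a no-call


def parse_row(line: str) -> Marker:
    """Split one tab-separated data row into its four fields."""
    rsid, chromosome, position, genotype = line.rstrip("\r\n").split("\t")
    return Marker(rsid, chromosome, int(position), genotype)


# Made-up example row, not taken from a real file.
example = parse_row("rs12345\t1\t1234567\tAG\n")
print(example)
```

The chromosome is kept as a string because the column is not always numeric (X, Y, and MT also appear there).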

Chip Versions and Marker Counts

Depending on which version of the 23andMe genotyping chip was used, your file will typically contain somewhere between 550,000 and 700,000 markers, although the older v3 chip covered more, closer to a million. Newer chip versions (v5) cover somewhat different positions than older ones (v3, v4), but they all follow the same basic text format. This file is your data: you own it, and you're free to download it and use it with any compatible analysis tool.

What Are SNPs and Why Do They Matter?

What Is a SNP?

The genetic markers in your 23andMe raw data are called SNPs — single nucleotide polymorphisms, pronounced "snips." A SNP is a position in the human genome where people commonly differ by a single DNA letter. Your DNA is written in a four-letter alphabet: A (adenine), C (cytosine), G (guanine), and T (thymine). At most positions, virtually all humans carry the same letter. But at SNP positions, a meaningful percentage of the population carries a different variant.

A Real-World Example: MTHFR

For example, at the SNP rs1801133 on chromosome 1, many people carry a CC genotype, while roughly 10-15% of people of European descent carry a TT genotype. This particular SNP sits in the MTHFR gene and affects how efficiently your body processes folate, a B vitamin critical for methylation, DNA repair, and dozens of other biochemical processes. Because the raw data file reports alleles on one fixed strand of the reference genome, these genotypes may appear in your file as the complementary letters (GG and AA) rather than CC and TT.
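Genotypes like these are easiest to compare as a count of how many copies of one allele you carry. A minimal sketch, assuming the genotype is a two-letter string as in the raw data file; the function name `allele_copies` is hypothetical, and the letters you pass in must match the strand your file uses.

```python
from typing import Optional


def allele_copies(genotype: str, allele: str) -> Optional[int]:
    """Count copies of `allele` in a genotype string.

    Returns None for a no-call (a genotype containing "-").
    """
    if "-" in genotype:
        return None
    return genotype.count(allele)


for genotype in ("CC", "CT", "TT", "--"):
    print(genotype, allele_copies(genotype, "T"))
```

Returning None rather than 0 for a no-call keeps "not measured" distinct from "zero copies".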

600,000 Data Points in One File

Your 23andMe file contains roughly 600,000 of these SNPs. Each one has been studied to varying degrees. Some are well-characterized with strong research backing — like APOE variants linked to Alzheimer's risk or CYP2D6 variants that determine how you metabolize certain medications. Others have smaller effect sizes or less research behind them. Together, these SNPs form a partial but highly informative snapshot of your genetic makeup.

Health and Disease Risk Information

High-Impact Single-Gene Variants

A significant portion of the SNPs in your 23andMe raw data relate to health conditions and disease risk. These range from single-gene variants with strong effects to common variants that each contribute a small amount to overall risk. Some of the most impactful health-related markers in a typical raw data file include:
  • BRCA1 and BRCA2 variants associated with hereditary breast and ovarian cancer risk
  • APOE variants (rs429358, rs7412) linked to Alzheimer's disease risk
  • Factor V Leiden (rs6025) associated with blood clotting disorders
  • HFE gene variants tied to hereditary hemochromatosis

Polygenic Risk: Many Small Effects Add Up

Beyond these high-impact single-gene variants, your raw data contains many common variants that each nudge your risk for conditions like type 2 diabetes, coronary artery disease, atrial fibrillation, and various cancers by a small amount. Individually, each variant has a tiny effect. But when combined into what's called a polygenic risk score (a weighted sum across many variants), they can provide meaningful risk stratification.
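The "weighted sum" idea can be sketched in a few lines: for each variant, multiply the number of effect alleles you carry by that variant's weight, then add everything up. The marker IDs, effect alleles, and weights below are invented for illustration. Real scores take their weights from published studies and usually compare the raw sum against a reference population, which this sketch does not attempt.

```python
def polygenic_score(genotypes, weights):
    """Weighted sum of effect-allele counts.

    genotypes: dict of rsid -> genotype string such as "AG"
    weights:   dict of rsid -> (effect_allele, weight)
    Markers that are missing or no-calls are skipped and reported back.
    """
    score = 0.0
    skipped = []
    for rsid, (effect_allele, weight) in weights.items():
        genotype = genotypes.get(rsid)
        if genotype is None or "-" in genotype:
            skipped.append(rsid)
            continue
        score += weight * genotype.count(effect_allele)
    return score, skipped


# Hypothetical marker IDs and weights, for illustration only.
weights = {"rsA": ("A", 0.10), "rsB": ("T", 0.05), "rsC": ("G", -0.02)}
genotypes = {"rsA": "AG", "rsB": "TT", "rsC": "--"}

score, skipped = polygenic_score(genotypes, weights)
print(round(score, 2), skipped)
```

Reporting the skipped markers matters in practice, since a score built from only part of its intended variants is not comparable to one built from all of them.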

Beyond 23andMe's Built-In Reports

23andMe's own reports only cover a fraction of what's in your raw data. Their FDA-authorized health reports analyze a small subset of markers. But the raw data file contains far more variants that have been linked to health outcomes in peer-reviewed genome-wide association studies. Third-party tools like DNA Explore can analyze these additional markers to generate polygenic risk scores and more comprehensive health insights.

Pharmacogenomics: How You Process Medications

What Is Pharmacogenomics?

One of the most immediately actionable categories of information in your 23andMe raw data is pharmacogenomics — the study of how your genes affect your response to medications. Your raw data file contains variants in key drug metabolism genes that determine whether you process certain medications faster or slower than average.

Key Drug Metabolism Genes in Your File

The most important pharmacogenes covered by the 23andMe chip include:
  • CYP2D6 — metabolizes roughly 25% of all prescription drugs including codeine, tramadol, tamoxifen, and many antidepressants
  • CYP2C19 — affects how you process clopidogrel (Plavix), certain proton pump inhibitors, and some antidepressants like citalopram and escitalopram
  • CYP2C9 — influences your response to warfarin, certain NSAIDs, and some oral diabetes medications
  • CYP3A4/5 — metabolizes about half of all drugs on the market

Why Metabolizer Status Matters

Knowing your metabolizer status for these enzymes can be genuinely important for your healthcare. A poor metabolizer of CYP2D6, for instance, may get no pain relief from codeine because their body cannot convert it to its active form (morphine). An ultrarapid CYP2D6 metabolizer, on the other hand, may convert codeine too quickly and experience dangerous side effects. DNA Explore analyzes these pharmacogenomic markers directly from your raw data file for $9.99, giving you a report you can share with your doctor before starting a new medication.
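Metabolizer status is usually reported by translating a pair of gene versions (star alleles) into a phenotype label. The sketch below shows that last step for CYP2C19, assuming only three alleles (*1 normal function, *2 no function, *17 increased function) and the commonly published phenotype categories. It is a simplified illustration, not clinical guidance: real reports cover many more alleles, and inferring star alleles from raw genotypes is a separate step not shown here.

```python
# Assumed function class for each star allele (simplified, three alleles only).
ALLELE_FUNCTION = {"*1": "normal", "*2": "none", "*17": "increased"}


def cyp2c19_phenotype(allele_a: str, allele_b: str) -> str:
    """Map a pair of CYP2C19 star alleles to a metabolizer label."""
    functions = [ALLELE_FUNCTION[allele_a], ALLELE_FUNCTION[allele_b]]
    no_function = functions.count("none")
    increased = functions.count("increased")
    if no_function == 2:
        return "poor metabolizer"
    if no_function == 1:
        return "intermediate metabolizer"
    if increased == 2:
        return "ultrarapid metabolizer"
    if increased == 1:
        return "rapid metabolizer"
    return "normal metabolizer"


for pair in [("*1", "*1"), ("*1", "*2"), ("*2", "*2"), ("*1", "*17")]:
    print(pair, cyp2c19_phenotype(*pair))
```

An allele outside the three listed raises a KeyError, which is deliberate: guessing a phenotype for an unknown allele would be worse than failing.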

Ancestry and Trait Information

How Ancestry Estimation Works

Your 23andMe raw data also contains the markers that power ancestry estimation and trait predictions. Ancestry analysis works by comparing your pattern of SNPs against reference populations from around the world. Certain allele frequencies vary predictably between continental populations and even between regional groups within continents, allowing algorithms to estimate your ancestral composition.
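To make the "compare against reference populations" idea concrete, here is a toy version: given allele frequencies for a few markers in two populations, compute how likely your genotypes would be under each one. The markers, populations, and frequencies are hypothetical, and the Hardy-Weinberg assumption used here is a textbook simplification. Production ancestry algorithms are considerably more elaborate, and this is not a description of 23andMe's method.

```python
import math


def genotype_probability(copies: int, freq: float) -> float:
    """Hardy-Weinberg probability of carrying `copies` (0, 1 or 2) of an
    allele whose population frequency is `freq`."""
    return math.comb(2, copies) * freq**copies * (1 - freq) ** (2 - copies)


def log_likelihoods(allele_counts, population_freqs):
    """Sum log genotype probabilities per population across all markers.

    Frequencies must be strictly between 0 and 1 to avoid log(0).
    """
    return {
        population: sum(
            math.log(genotype_probability(allele_counts[marker], freqs[marker]))
            for marker in allele_counts
        )
        for population, freqs in population_freqs.items()
    }


# Hypothetical markers, populations and allele frequencies.
allele_counts = {"m1": 2, "m2": 0, "m3": 1}
population_freqs = {
    "pop_A": {"m1": 0.8, "m2": 0.1, "m3": 0.5},
    "pop_B": {"m1": 0.2, "m2": 0.6, "m3": 0.5},
}

scores = log_likelihoods(allele_counts, population_freqs)
best = max(scores, key=scores.get)
print(best, scores)
```

Marker m3 has the same frequency in both populations, so it contributes nothing to telling them apart; only markers whose frequencies differ are informative.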

Lineage Markers: mtDNA and Y Chromosome

The raw data includes markers across all 22 autosomes plus the X chromosome, mitochondrial DNA (mtDNA) markers that trace your maternal lineage, and — for biological males — Y chromosome markers that trace the paternal lineage. These can be used by third-party tools to determine your mtDNA haplogroup, Y-DNA haplogroup, and to estimate ethnicity percentages.

Trait-Related SNPs

Trait-related SNPs are also present throughout the file. These include variants associated with:
  • Eye color, hair color, and hair texture
  • Freckling and skin pigmentation
  • Bitter taste perception (TAS2R38)
  • Lactose tolerance (MCM6/LCT)
  • Caffeine metabolism (CYP1A2)
  • Alcohol flush reaction (ALDH2)
  • Asparagus metabolite detection
  • Earwax type

While traits are often influenced by many genes and environmental factors, some have strong single-gene effects, such as the rs4988235 variant near the LCT gene, which largely determines whether you can digest lactose as an adult.

Carrier Status for Inherited Conditions

What Does Carrier Status Mean?

Your raw data file contains variants relevant to carrier status for numerous recessive genetic conditions. Being a carrier means you have one copy of a variant associated with a condition but typically don't show symptoms yourself — however, if your partner is also a carrier for the same condition, each of your children has a 25% chance of being affected.
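The 25% figure comes from enumerating the four equally likely ways two carrier parents can each pass on one of their two copies. A small sketch of that arithmetic, where "A" stands for the typical copy and "a" for the recessive variant (the notation and the function name are illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent_1: str, parent_2: str):
    """Enumerate equally likely allele pairings from two parents.

    Each parent is a two-letter genotype such as "Aa".
    Returns a dict of child genotype -> probability.
    """
    outcomes = Counter(
        "".join(sorted(a + b)) for a, b in product(parent_1, parent_2)
    )
    total = sum(outcomes.values())
    return {genotype: Fraction(n, total) for genotype, n in outcomes.items()}


dist = offspring_distribution("Aa", "Aa")
for genotype, probability in dist.items():
    print(genotype, probability)
```

For two carriers this gives one quarter "AA" (not a carrier), one half "Aa" (carrier), and one quarter "aa" (affected), and the odds apply independently to each child.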

Conditions Covered in the Raw Data

23andMe's own carrier status reports test for conditions like cystic fibrosis, sickle cell anemia, Tay-Sachs disease, and hereditary hearing loss, among others. But the raw data file contains additional variants in these and other genes that 23andMe doesn't include in their consumer-facing reports. Some of these are well-studied pathogenic variants cataloged in databases like ClinVar.

Limitations of Genotyping vs. Sequencing

It is worth noting that genotyping chips like the one 23andMe uses are not the same as full gene sequencing. The chip tests specific known positions, which means it can miss variants that exist between the tested positions. A negative result on a genotyping chip does not guarantee you are not a carrier — it only means you don't carry the specific variants that were tested. For definitive carrier screening, clinical-grade sequencing is recommended. That said, the carrier information in your raw data can still be a useful starting point for understanding your genetic profile and for conversations with a genetic counselor.

How to Read Your Raw Data File

File Header and Layout

If you open your 23andMe raw data file in a text editor, you'll see a header section at the top (lines beginning with #) that includes metadata about when the file was generated and which chip version was used. Below the header, each line contains four tab-separated columns: rsid, chromosome, position, and genotype.
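A short sketch of reading that layout: lines starting with # go into a header list, and everything else is split into the four columns. The function name and the tiny in-memory "file" are made up for this example; with a real download you would pass an open file handle instead.

```python
import io


def read_raw_data(handle):
    """Split a raw data stream into header comments and a genotype dict."""
    header = []
    genotypes = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith("#"):
            header.append(line.lstrip("# ").strip())
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        genotypes[rsid] = (chromosome, int(position), genotype)
    return header, genotypes


# A tiny made-up stand-in for a real file.
fake_file = io.StringIO(
    "# Example header comment\n"
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\tMT\t2000\tG\n"
)

header, genotypes = read_raw_data(fake_file)
print(len(header), genotypes["rs0000002"])
```

Keying the result by rsid makes single-marker lookups cheap, which is the most common thing people want to do with the file.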

Understanding the Four Columns

The rsid column contains the reference SNP identifier — a standardized name like rs53576 that you can look up in databases like dbSNP, SNPedia, or ClinVar to learn what that variant is associated with. The chromosome column is a number from 1 to 22, or X, Y, or MT (mitochondrial). The position column tells you the exact base-pair location on that chromosome. The genotype column shows your two alleles — for example, AG means you have one A and one G at that position.

No-Calls and Haploid Markers

Some genotypes will show as "--" which means the chip could not reliably read that position (a no-call). This is normal and typically affects 1-3% of markers. You might also see single-letter genotypes for markers on the X or Y chromosome in males, or for mitochondrial markers, since these are haploid (only one copy). While you can manually look up individual SNPs, doing this for 600,000 markers is impractical. That's where analysis tools come in — they automate the process of matching your variants against research databases and computing meaningful results.
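If you are curious how many no-calls and single-letter genotypes your own file contains, a tally like the following would do it. The sample list is invented, and `call_summary` is a hypothetical helper name.

```python
def call_summary(genotypes):
    """Count no-calls, single-letter (haploid) calls and two-letter calls."""
    summary = {"no_call": 0, "haploid": 0, "diploid": 0}
    for genotype in genotypes:
        if "-" in genotype:
            summary["no_call"] += 1
        elif len(genotype) == 1:
            summary["haploid"] += 1
        else:
            summary["diploid"] += 1
    return summary


# Invented sample: one no-call and two haploid calls among ten markers.
sample = ["AG", "CC", "--", "T", "GG", "AA", "A", "CT", "TT", "AC"]

summary = call_summary(sample)
no_call_rate = summary["no_call"] / len(sample)
print(summary, f"{no_call_rate:.0%}")
```

The no-call check comes first so that a single "-" is never miscounted as a haploid call.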

What You Can Do With Your Raw Data

Unlocking More Value From Your DNA

Downloading your 23andMe raw data file unlocks far more value than the reports 23andMe provides on their platform. Third-party analysis tools can extract additional health insights, pharmacogenomic reports, nutrition recommendations, and ancestry details that 23andMe either doesn't report on or charges extra for.

DNA Explore: Privacy-First Analysis

DNA Explore is one such tool, and it takes a privacy-first approach that's especially relevant given the 23andMe bankruptcy in 2025 that put 15 million customers' data at risk. When you load your raw data file into DNA Explore, it is analyzed entirely in your browser using JavaScript. Your genetic data never leaves your device: there's no server upload, no database storage, and no account required. For a one-time payment of $9.99, you get a comprehensive report covering:
  • Polygenic risk scores for dozens of conditions
  • Pharmacogenomics for major drug metabolism genes
  • Nutrigenomics recommendations
  • Gene-gene interaction analysis
  • An AI-powered chat that explains your results in plain language

Other Analysis Options

Other options include tools like Promethease (SNP lookup against SNPedia, $14), GEDmatch (genealogy-focused, free), and various clinical platforms. Whichever tool you choose, downloading your raw data is the essential first step. Go to your 23andMe account, navigate to Settings > 23andMe Data > Download Raw Data, and save the zip file. It's your data — make the most of it.

Frequently Asked Questions

How do I download my 23andMe raw data file?
Log in to your 23andMe account, go to Settings, then 23andMe Data, and click Download Raw Data. You'll need to verify your identity and wait a few minutes for the file to be prepared. It downloads as a zip file containing a text file with all your genotype data. If 23andMe is no longer operational, check whether your account data is still accessible through whatever entity manages their assets post-bankruptcy.
Is my 23andMe raw data file safe to share with third-party tools?
It depends on the tool. Any service that requires you to upload your raw data to their servers carries inherent risk — your genetic data could be exposed in a data breach, sold, or subpoenaed. Look for tools that process your data locally in the browser without uploading it to a server. DNA Explore, for example, analyzes your file entirely in your browser for $9.99 — your raw data never leaves your device.
What's the difference between 23andMe raw data and whole genome sequencing?
23andMe uses a genotyping chip that reads roughly 600,000 specific positions (SNPs) in your genome. Whole genome sequencing reads all 3 billion base pairs. Genotyping is much cheaper but only covers pre-selected positions, so it can miss variants between those positions. For most consumer health and ancestry purposes, genotyping data provides substantial value, but it's not as comprehensive as full sequencing.
Can I use my 23andMe raw data for medical decisions?
Your 23andMe raw data can provide useful health insights, but it should not be used as the sole basis for medical decisions. Genotyping chips have a small but real error rate, and not all variants are covered. Treat raw data analysis as a starting point for conversations with your doctor or a genetic counselor, not as a clinical diagnosis. Tools like DNA Explore include disclaimers and encourage professional follow-up for any concerning findings.
Does 23andMe raw data expire or change over time?
Your raw data file itself does not expire or change — your DNA is fixed. However, 23andMe has used different chip versions over the years (v3, v4, v5), each covering slightly different sets of SNPs. A file downloaded in 2024 from a v5 chip will differ in which markers are included compared to a v3 file from 2014. The genotype values at shared positions remain consistent. Download and save your raw data file now, as access may not be guaranteed indefinitely given company changes.
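If you have files from two chip versions, a comparison along these lines would show which markers are shared and whether the shared genotypes agree. The marker IDs and genotypes below are invented, and the function assumes each file has already been reduced to a dict of rsid to genotype.

```python
def compare_files(older, newer):
    """Compare two rsid -> genotype dicts from different chip versions."""
    shared = older.keys() & newer.keys()
    mismatched = sorted(
        rsid
        for rsid in shared
        # Ignore no-calls, and treat "AG" and "GA" as the same genotype.
        if "-" not in older[rsid]
        and "-" not in newer[rsid]
        and sorted(older[rsid]) != sorted(newer[rsid])
    )
    return {
        "only_older": len(older.keys() - newer.keys()),
        "only_newer": len(newer.keys() - older.keys()),
        "shared": len(shared),
        "mismatched": mismatched,
    }


# Invented example data.
older = {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs5": "GG"}
newer = {"rs1": "GA", "rs2": "CC", "rs3": "--", "rs4": "AA"}

report = compare_files(older, newer)
print(report)
```

A non-empty mismatch list on real data would point to a genotyping error or a strand difference between files, both worth checking before drawing conclusions.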

Disclaimer: The information provided in this article is for general educational and informational purposes only and does not constitute medical, legal, or financial advice. Genetic information should not be used as a substitute for professional medical advice, diagnosis, or treatment. Always consult a qualified healthcare provider before making any health decisions based on genetic data.

Prices, features, and availability of third-party products and services mentioned in this article are based on publicly available information as of the publication date and may have changed. We make reasonable efforts to ensure accuracy but cannot guarantee that all pricing, feature descriptions, or company information is current or complete. Trademarks and brand names referenced are the property of their respective owners and are used solely for identification and comparison purposes.

Genetic risk assessments, polygenic risk scores, and pharmacogenomic reports generated by any consumer tool — including DNA Explore — are based on currently published research and known associations. They are not diagnostic. Genetic predisposition does not guarantee the development or absence of any condition.
