A Researcher Needed Three Hours to Identify Me From My DNA
(Bloomberg Businessweek) -- Using just my DNA, a genealogist was able to identify me in three and a half hours.
It wasn’t hard. I’d previously sent a DNA sample to the genetic testing company 23andMe Inc. and then uploaded my data anonymously to a genealogy website. Researcher Michelle Trostler accessed my data on that site and spent an afternoon looking for connections that would help her put a name to it. The task was so easy that she rewatched a season of Game of Thrones while she worked.
Law enforcement officials are increasingly using similar tactics to find and catch suspected criminals. Crime scene DNA gets uploaded to the popular genealogy websites GEDmatch or FamilyTreeDNA, both of which have given officials access to their databases. Genetic genealogists then look for DNA relatives of anonymous suspects, scouring public records and social media until they arrive at a likely name. That’s how California investigators found the suspected Golden State Killer last April.
The boom in consumer DNA testing has been a boon for criminal investigations. Over the past year investigators worked with genetic genealogists to identify suspects in more than 60 cases, most of them long cold. The suspects didn’t volunteer any information. Instead, they were tracked down using data shared by family members. Law enforcement can take a DNA sample, analyze it, and use relatives who match that DNA on one of the genealogy sites to develop an anonymous person’s family tree and identify them.
These cases highlight the intimate nature of DNA data. Anyone can be exposed, whether or not they’ve made their own DNA public. One family member sharing such personal information can expose multiple generations on their family tree.
I’ve never uploaded my DNA to a public database like GEDmatch, and I wanted to see whether I could still be identified from my anonymous DNA data. I reached out to CeCe Moore, a genetic genealogist who’s solved many criminal cases, and she guided me through the experiment. I would upload my 23andMe data anonymously to GEDmatch. Moore would then send that anonymized file to Trostler, who would get to work trying to identify me. Trostler, whom I had not previously met, is also a genetic genealogist; she works primarily with adoptees, helping them solve such mysteries as the identity of their birth parents.
Before sending Trostler my GEDmatch file, Moore learned that I’m pretty exposed when it comes to genetic information. My Aunt Catherine, an amateur genealogist, has uploaded her own DNA to GEDmatch, as well as data from my mother, grandmother, and cousin. “With your mom in there, it is a slam dunk,” Moore told me. She suggested we make everyone’s DNA except my cousin Jenn’s private to better mimic what a genealogist might more typically encounter.
Moore sent Trostler my data on April 5. The next day, I received the following note in my Gmail inbox: “CeCe asked me to figure out who Gedmatch FU2936683 was. I found you!”
Using GEDmatch’s tools, Trostler started by pulling up a list of my DNA matches. The closest connection was my cousin Jenn. She then ran Jenn’s DNA matches, too, and quickly worked out which branch of Jenn’s family tree to look at. Jenn’s dad, my uncle, also had DNA in the GEDmatch database, but I don’t share any DNA with him, which meant I was related to Jenn through her mother.
Trostler hit a roadblock when she searched the email address associated with Jenn’s account (actually my aunt’s email) on BeenVerified, a background check website. That briefly sent her barking up the wrong family tree; it turned out the email address was also listed in connection with a totally unrelated person. Googling the address helped Trostler figure out that it belonged to my Aunt Catherine, and that she was Jenn’s mother. Trostler then identified the names of my aunt’s parents. From there it was easy: She found an obituary for my grandfather, Raul Alcala, in the Orange County Register, which also listed the names of Aunt Catherine’s two siblings.
Next, Trostler turned to Facebook, landing on the page of my uncle, Catherine’s brother. She guessed he didn’t have any children (he has one daughter). So the anonymous DNA, she figured, must belong to a child of Catherine’s sister, Nancy Alcala Brown. Trostler looked my mom up in the California Birth Index and found just one girl born to a woman whose maiden name was Alcala and whose last name was Brown. It was me. Trostler Googled my name. When she saw my Bloomberg profile, along with the trove of articles I’ve written about DNA testing, she was sure she’d found the right person. “It’s not always that easy,” she says.
There were a few things that made my case simpler than most, she says. For starters, not only did I have a first cousin in GEDmatch, but Jenn’s dad’s DNA also let the genealogist quickly narrow her search. Also, my grandfather’s obituary dutifully listed the names of all his children. And my mom’s maiden name, Alcala, is a lot more unusual than Brown.
Trostler says typical cases take her at least 10 hours; others take as many as 30 hours or can’t be solved at all. But the growing popularity of DNA testing is speeding up her work. GEDmatch has about 1 million users, and that number will increase as some of the more than 15 million people who’ve sent their saliva to 23andMe and Ancestry upload their data there.
The majority of the human genome is the same from person to person; variation accounts for only about 0.1 percent of it. It’s that variation that acts as a fingerprint, revealing not only who we are but also how closely related we are to another person. Shared DNA is measured in centimorgans, and my next closest match in GEDmatch shared far less of my DNA than Jenn: 69 centimorgans, compared with the 830 centimorgans I share with her.
As DNA databases grow, so too does the potential for abuse. “Misuse of surreptitious DNA is potentially a big problem,” says Debbie Kennett, a genealogist and author. “You can imagine celebrities and politicians being stalked to get illicit DNA samples for paternity testing without consent.”
Today, anyone interested in identifying someone from their DNA would need to somehow collect a vial of that person’s saliva and send it to a company like 23andMe. But in the future, technology now used by law enforcement may become more widely accessible. Perhaps in the not-so-distant future, you’ll be able to swab your half-eaten ham sandwich in the work refrigerator and unmask the co-worker who ate your lunch. (It sounds ridiculous, but there’s already a company sequencing dog poop at apartment complexes to expose owners who don’t pick up after their pooches.)
Kennett also worries about people being charged for offenses they didn’t commit because their DNA was found at a crime scene, or about innocent people being subjected to intrusive social media and public-record searches as police try to identify a suspect. Recently, a woman in Washington state learned that her DNA had led to a distant relative’s arrest in an Iowa murder case when her name was disclosed in a search warrant.
Government DNA databases have rules governing access, but consumer DNA databases operate only according to a company’s terms of service. James Hazel, a researcher at the Center for Genetic Privacy and Identity in Community Settings at Vanderbilt University Medical Center in Nashville, recently suggested that a universal DNA database containing everyone’s information may do a better job of protecting people’s privacy. With regulated access to a more limited set of genetic information than the data that can be culled from consumer testing reports, investigators could solve crimes without intruding as much on people who happen to be related to a suspect. “People who aren’t genealogists don’t realize how much information is out there about them,” Kennett says. “Even if they haven’t shared the information themselves, it’s likely that one of the relatives has shared private information.”
To contact the editor responsible for this story: Dimitra Kessenides at email@example.com
©2019 Bloomberg L.P.