Let People Share DNA With a Click

(Bloomberg Opinion) -- The medical world is obsessed with privacy, and this is often a good thing: Patients don’t want personal information about their health to be shared without their permission. The world of research, in contrast, likes sharing: Scientists want everyone — especially other scientists — to know about their discoveries, through publications and other means.

These two worlds collide when scientists study human health. For decades, the National Institutes of Health has enforced strict specifications for how and when human-derived data can be shared. Every bit and byte is treated as if it’s part of someone’s personal medical record, and shared only after it’s been scrubbed so thoroughly that the person it came from cannot be identified.

When it comes to DNA, however, this is a high bar, because DNA is an identifier. If I sequence your DNA to use it in a study, how can I guarantee that no one will ever figure out the data is yours? What if your cousin makes his DNA public — can I still protect your identity? Probably not.

Nonetheless, the NIH is trying to guarantee total privacy as it recruits 1 million U.S. residents for its $1.5 billion “All of Us” program. This is not a medical study or clinical trial, where patient privacy is essential. Rather, it is an effort to gain a broad understanding of human health and disease by sequencing the DNA of a vast population.

The only way such a project can succeed is if all data gathered are shared among thousands of scientific labs, not only in the U.S. but around the world. This would seem to make it near-impossible to guarantee privacy.

The NIH has similar projects already underway, including its TCGA cancer-gene sequencing project, and another called TOPMed. Both of these are sequencing more than 100,000 human samples.

The NIH is using two approaches to protect the privacy of this data, both of which present significant roadblocks to scientists. First, researchers who want to use the data must submit an application laying out their scientific plans and promising to keep the data confidential. The committee that reviews the applications can approve them or not, and if it says no, there’s no mechanism for appeal. Approval allows limited access to the data, only for the scientists’ stated goals.

Second, the NIH itself and all the scientists who use the data are required to keep it on secure servers, for authorized users only. This will be increasingly difficult to control as tens of thousands of users — including students, lab personnel and others — gain legitimate access to those servers.

There’s a better way to handle these troves of DNA data: Simply allow free, unrestricted access. This was the NIH’s original policy: When genome sequencing was just getting started, and the race was on to sequence the first human genome, Francis Collins, then the director of the Human Genome Project and now the head of the NIH, argued repeatedly that no “roadblocks” should be put in the way of scientists’ access to the data. In debates with Craig Venter, who was leading a private effort to sequence the genome, Collins warned that a private effort might restrict access.

When the human genome was published in two simultaneous papers, in early 2001, the public effort made its data available freely — and the private effort followed suit, in large part because of pressure from Collins and the NIH.

Since then, the NIH seems to have forgotten how powerful and valuable a fully free and shared genome can be. Most of the many thousands of genomes that have been sequenced in the past 18 years have been hidden behind firewalls. Today, the largest and most diverse set of publicly available human genomes is one created by the private Simons Foundation.

Free, unrestricted access is the best way to accelerate scientific discovery. Genome data collected for the benefit of all should be shared by all, just as Francis Collins argued 18 years ago.

This raises an obvious question: Would anyone be willing to share something as personal as their DNA? The answer, it turns out, is yes.

In 2005, Harvard professor George Church created the Personal Genome Project to demonstrate the power of open science in genomics. The PGP’s approach is “to invite willing participants to publicly share their personal data for the greater good.” Rather than lock the data behind firewalls, the PGP makes all its data freely available.

And many people are happy to sign on under these terms. The PGP has more than 6,000 participants in the U.S. alone, and it has launched efforts in the U.K., Canada, Austria and China, with more to come. As soon as genomes are sequenced, anyone — scientists, teachers, amateur geneticists, reporters or the merely curious — can download them.

This broad willingness to share shouldn’t be surprising. Consider that Facebook has persuaded more than 2 billion people to make huge amounts of personal data publicly available in return for a nice app and little more. And there is not much risk in sharing one’s DNA sequence, now that federal law bars health insurers from discriminating on the basis of genetic risk.

Rather than promise to keep all data secret forever — a promise that will almost inevitably end up being broken — the NIH should first try recruiting people to sign up for “All of Us” and other genome projects using the open model of the Personal Genome Project. Open genomes will make science move faster, and there’s no time, or good reason, to wait.

This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.

Steven Salzberg is the Bloomberg Distinguished Professor of Biomedical Engineering, Computer Science and Biostatistics, and the director of the Center for Computational Biology at Johns Hopkins University.

©2019 Bloomberg L.P.