Reading and Writing a Book With DNA

16 August 2012

Emily Waltz

Harvard University researchers converted a 53 000-word book into DNA and then read the DNA-encoded book using gene-sequencing technology, they report this week in Science. The project is by far the largest demonstration of digital information storage in DNA and the densest consolidation of data in any medium, the authors say.

“There is a clear need for improved long-term storage of massively large data,” says George Church, a geneticist at Harvard’s Wyss Institute and one of the leaders of the research. “There is data that we are throwing away or don’t collect because we can’t afford to store it, such as video surveillance of public spaces and large research projects,” he says. “Someday that won’t be necessary. The question is, What will get us there first: electronic or molecular memory?”

DNA offers advantages over electronic storage, but whether it will ever make sense practically or economically is unclear. DNA can store more digital information per cubic millimeter than flash memory or even cutting-edge experimental memories such as quantum holography. Data stored in DNA is also recoverable for millennia (consider the 7000-year-old DNA archaeologists have extracted from human remains). And given DNA’s biological importance, we can safely assume it’s going to remain a readable standard for a long time. “If you look at the size per bit of stored memory as DNA, it’s unlikely that we’ll ever get better than that,” says Joseph Jacobson, a synthetic biologist at MIT who was not involved in the project.

But making and reading DNA isn’t yet practical. Synthesizing and sequencing DNA are expensive, although the costs of the two technologies have been dropping fivefold and twelvefold per year, respectively. What’s more, unlike electronic bits, most DNA data cannot be changed once it’s written. And with today’s technology, information in DNA usually has to be accessed as a whole, not in parts. (There is no way to make random-access DNA memory.)

Church and his colleagues set out to demonstrate a simple way to densely store data in DNA. They converted an HTML draft of a book comprising 53 426 words, 11 JPG images, and one JavaScript program into a 5.27-megabit set of zeros and ones. Using software they wrote, they assigned each zero the letter A or C, for the DNA bases adenine and cytosine, and each one the letter G or T, for the bases guanine and thymine. A lowercase f from the book, for example, was represented in binary as “01100110” and encoded in DNA as “ATGAATTC.”
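
The one-bit-per-base mapping described above can be sketched in a few lines of Python. This is an illustration, not the researchers' software: the function names are hypothetical, and because the article does not say how the software picked between the two bases allowed for each bit, the sketch simply picks at random. Decoding needs no such choice, since A and C always read as 0 and G and T always read as 1.

```python
import random

ZERO_BASES = "AC"  # a 0 bit may be written as adenine or cytosine
ONE_BASES = "GT"   # a 1 bit may be written as guanine or thymine


def text_to_bits(text: str) -> str:
    """Turn ASCII text into a string of '0' and '1', eight bits per character."""
    return "".join(f"{byte:08b}" for byte in text.encode("ascii"))


def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits; expects a multiple of eight bits."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("ascii")


def encode_bits(bits: str, rng: random.Random) -> str:
    """Write one base per bit. Assumption: the base is chosen at random."""
    bases = []
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a bit: {bit!r}")
        bases.append(rng.choice(ZERO_BASES if bit == "0" else ONE_BASES))
    return "".join(bases)


def decode_bases(bases: str) -> str:
    """Read one bit per base: A or C gives 0, G or T gives 1."""
    bits = []
    for base in bases:
        if base in ZERO_BASES:
            bits.append("0")
        elif base in ONE_BASES:
            bits.append("1")
        else:
            raise ValueError(f"not a DNA base: {base!r}")
    return "".join(bits)
```

Decoding the article's example, decode_bases("ATGAATTC") gives "01100110", which bits_to_text turns back into a lowercase f. Because each bit has two possible bases, many different DNA strings decode to the same text.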

Synthesizing that string of bits would yield a stretch of DNA that was 5.27 million bases long. Such long stretches of DNA are particularly expensive to work with, so Church and his colleagues split the DNA sequence into short chunks that were each 96 bases long. Each chunk included a 19-bit bar code, or address, to show where that chunk belonged in the whole of the book. The DNA was synthesized, inkjet-printed on a glass DNA microchip, and then cleaved off and dried to form a 50-nanogram clump smaller than a speck of pollen.
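
The chunking step can be illustrated the same way. The sketch below assumes that each chunk carries 96 bits of book data with the 19-bit address placed in front of it; the article does not spell out the exact layout, so that arrangement, the zero padding of the last chunk, and the function names are all hypothetical.

```python
ADDRESS_BITS = 19   # bar code length given in the article
PAYLOAD_BITS = 96   # assumption: 96 bits of book data per chunk


def chunks_needed(total_bits: int) -> int:
    """Number of chunks required, rounding up for a partial final chunk."""
    return -(-total_bits // PAYLOAD_BITS)  # ceiling division


def split_into_chunks(bits: str) -> list[str]:
    """Cut a bit string into addressed chunks: 19 address bits, then 96 data bits."""
    if chunks_needed(len(bits)) > 2 ** ADDRESS_BITS:
        raise ValueError("too much data for a 19-bit address space")
    chunks = []
    for index, start in enumerate(range(0, len(bits), PAYLOAD_BITS)):
        # Pad the final chunk with zeros so every chunk has the same length.
        payload = bits[start:start + PAYLOAD_BITS].ljust(PAYLOAD_BITS, "0")
        address = format(index, f"0{ADDRESS_BITS}b")
        chunks.append(address + payload)
    return chunks


if __name__ == "__main__":
    # Roughly 5.27 million bits at 96 bits per chunk.
    print(chunks_needed(5_270_000))
    # Number of distinct addresses a 19-bit bar code can express.
    print(2 ** ADDRESS_BITS)
```

Under these assumptions the book needs on the order of 55 000 chunks, well within the 524 288 addresses that 19 bits can express. A reader of the chunks would also need to know the original bit count in order to discard the padding.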

To convert the DNA back to a book, Church and his colleagues read out the bases using commercially available sequencing technology. They then arranged the sequence, decoded it back to zeros and ones, and converted those back to an HTML book. The researchers were able to complete the project with errors in only 10 bits out of 5.27 million—on par with the raw error rate of other storage media, says Sriram Kosuri, a staff scientist at the Wyss Institute who also worked on the project.
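
Putting the chunks back in order and scoring the result might look like the sketch below. It assumes each decoded chunk is a bit string that begins with a 19-bit address, which is one plausible layout rather than the one confirmed by the article, and the function names are hypothetical. It ignores practical matters such as reading each chunk many times.

```python
ADDRESS_BITS = 19  # bar code length given in the article


def reassemble(chunks: list[str], total_bits: int) -> str:
    """Sort chunks by their leading address, join the data, and trim padding."""
    ordered = sorted(chunks, key=lambda chunk: int(chunk[:ADDRESS_BITS], 2))
    bits = "".join(chunk[ADDRESS_BITS:] for chunk in ordered)
    return bits[:total_bits]


def count_bit_errors(original: str, recovered: str) -> int:
    """Count positions where the recovered bits differ from the original."""
    if len(original) != len(recovered):
        raise ValueError("bit strings must have the same length")
    return sum(a != b for a, b in zip(original, recovered))


def error_rate(errors: int, total_bits: int) -> float:
    """Fraction of bits read back incorrectly."""
    return errors / total_bits


if __name__ == "__main__":
    # The article's figures: 10 wrong bits out of about 5.27 million.
    print(f"{error_rate(10, 5_270_000):.2e}")
```

With the article's figures, 10 errors in about 5.27 million bits works out to roughly two wrong bits per million.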

The tome that got the honor of becoming the world’s first biological book is the forthcoming Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves. The book, coauthored by Church, will be published in more conventional forms this fall.

Similar approaches have been demonstrated before, but on a smaller scale. In 2001, Carter Bancroft and his colleagues at the Mount Sinai School of Medicine encoded in DNA the opening lines of Charles Dickens’s A Tale of Two Cities. A 2010 project from the J. Craig Venter Institute encoded a 7920-bit watermark in a bacterium genome sequence. Church’s paper, however, takes us “from a few bits to many megabits,” says Jacobson. “If you have a big enough quantitative advance, at some point there’s a qualitative shift, and I’d say that’s the case here.”

But another researcher, who studies the intersection of biology and technology and asked to remain anonymous, calls Church’s paper “a silly vanity project” with little value. “It’s like showing you could painstakingly use an abacus to solve a Hamiltonian path problem that would take the average computer a microsecond,” he says. Other than maybe military intelligence, finding real-world applications for DNA storage technology “under no conceivable set of circumstances is even remotely likely,” he says.

Jacobson disagrees and says it’s easy to dismiss the technology at first glance but that upon further consideration, it’s clear there are near-term practical uses, like storing data that requires millions of copies. “DNA is expensive to write the first time,” he says, but making copies of it is cheap. “If you want to replicate it a billion times, I don’t know of a cheaper way to do it.” The food and consumer products industries, for example, could insert in every product they produce a DNA identification tag to indicate the country of origin and other information consumers might want (or not want) to know.

Church says he hadn’t thought of the food application but thinks it would work. His book’s DNA is biocompatible and biodegradable. “You could eat it,” he says.

IEEE Spectrum