Last month, Twist Bioscience announced that, for the first time, an episode of a Netflix Original Series had been stored in Twist’s synthetic DNA. At first glance, it seemed like an advertisement for a new Black Mirror-type series about futuristic technologies. However, after reading the announcement in its entirety, I was dumbfounded. The idea of storage on ‘the cloud’ is complex enough in itself, and now DNA is being used to store data!
Surprisingly, the idea of using DNA to store data dates back to a 1959 lecture by Richard P. Feynman titled ‘There’s Plenty of Room at the Bottom’. In this lecture, Feynman stated:
“Consider the possibility that we too can make a thing very small which does what we want.”
Decades later, researchers from the European Bioinformatics Institute (EBI) submitted a paper detailing the storage, retrieval and reproduction of over five million bits of data. The DNA files all reproduced the information with 99.99% to 100% accuracy. The estimated cost per megabyte was $12,400 for encoding the data and $220 for retrieving it. In 2013, Manish K. Gupta and co-workers developed a software tool called DNACloud. This software encodes computer files into their DNA representation, aiming to make storing data on DNA easier.
In 2015, Nick Goldman from EBI announced the Davos Bitcoin Challenge. During his presentation at the World Economic Forum annual meeting in Davos, tubes of DNA were distributed to the audience with the message that each tube contained the private key of exactly one bitcoin. The first person to sequence and decode the DNA could claim the bitcoin and win the challenge. Three years later, in 2018, Belgian PhD student Sander Wuyts became the first to complete the challenge. Retrieved from the DNA were instructions on how to claim the bitcoin, the logo of EBI, the logo of the company that printed the DNA and a sketch of James Joyce.
Other relevant developments include research by Church and Technicolor Research and Innovation, who stored a compressed movie sequence in DNA and recovered it with no errors. In addition, in 2019, the start-up Catalog reported that it had encoded all of Wikipedia into synthetic DNA. More recently, in 2020, researchers described a mechanism known as DNA punch cards, which stores data in the form of nicks in the backbone of DNA. These researchers were the first to describe data storage on native DNA sequences, and their results pave the way towards future low-cost storage solutions.
Prior to the 2020 paper, all existing DNA-based data stores were based on synthetic oligos. To store data in DNA, the data file is first converted into a binary sequence, which is then mapped onto the four nucleotides, for example 00 = A, 01 = C, 10 = G and 11 = T. The resulting DNA sequence is then split into short segments (typically 200 to 300 bases long) that can be synthesised and stored. Each segment contains an index to identify its place within the overall data file. This allows scientists to pick out part of the file (random access) before sequencing. To retrieve the data, scientists sequence the segments using high-throughput next-generation sequencing technologies and then decode them back into the original file.
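To make the encoding steps more concrete, here is a minimal Python sketch of the idea. It uses the example mapping from the text (00 = A, 01 = C, 10 = G, 11 = T) and splits the result into indexed segments. The function names and the segment layout (an 8-base index followed by up to 192 bases of data) are my own assumptions for illustration, not the scheme used by Twist, EBI or anyone else, and real systems also add error correction, which is left out here.

```python
from typing import List

# Example mapping from the text; real schemes may use different encodings.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

# Hypothetical layout: 8 index bases (16 bits) + 192 data bases = 200 bases.
INDEX_BASES = 8
PAYLOAD_BASES = 192


def bytes_to_bases(data: bytes) -> str:
    """Write each byte as 8 bits, then map every 2 bits to one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def bases_to_bytes(bases: str) -> bytes:
    """Reverse of bytes_to_bases: 4 bases give back one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in bases)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def encode(data: bytes) -> List[str]:
    """Split the base sequence into segments, each prefixed with its index."""
    strand = bytes_to_bases(data)
    segments = []
    for number, start in enumerate(range(0, len(strand), PAYLOAD_BASES)):
        # A 2-byte index limits this sketch to 65,536 segments.
        index = bytes_to_bases(number.to_bytes(INDEX_BASES // 4, "big"))
        segments.append(index + strand[start:start + PAYLOAD_BASES])
    return segments


def decode(segments: List[str]) -> bytes:
    """Sort segments by their index, drop the index, and rebuild the file."""
    def segment_number(segment: str) -> int:
        return int.from_bytes(bases_to_bytes(segment[:INDEX_BASES]), "big")

    ordered = sorted(segments, key=segment_number)
    return bases_to_bytes("".join(seg[INDEX_BASES:] for seg in ordered))


if __name__ == "__main__":
    message = b"Hello, DNA!"
    segments = encode(message)
    # Segments come back from sequencing in no particular order.
    print(decode(list(reversed(segments))) == message)
```

In this sketch the letter H (binary 01001000) becomes the four bases CAGA, so every byte of the file costs four nucleotides. Because each segment carries its own index, the segments can be read back in any order and still be reassembled correctly.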
Why use DNA?
As digital information continues to accumulate, the need for higher-density and longer-term storage solutions is apparent. Currently, the global demand for data storage is outpacing the world’s storage capabilities. DNA has many advantages as a potential storage medium. For example, it can encode two bits per nucleotide, and it often remains readable despite degradation in non-ideal conditions. Importantly, DNA’s essential biological role means that natural reading and writing enzymes exist for it, which should ensure that DNA remains a readable standard for the foreseeable future. DNA is also more compact and durable than current storage media, such as hard drives.
Although DNA information storage has huge potential, several problems need to be addressed before it can be implemented more broadly. Firstly, the cost of writing and reading information is still high and the efficiency of storing data is too low. However, as the costs of nucleic acid synthesis and sequencing decrease, the technology should become cost-effective in the future. In addition, researchers have yet to develop techniques to erase and rewrite stored information. Nevertheless, ongoing developments in synthetic biology, such as artificial gene circuits, suggest that this problem could be solved.
From floppy disks to USB sticks to ‘the cloud’, storage of digital information is continually changing. With the increasing amount of information worldwide, traditional storage methods face daunting challenges. Current storage media are limited in the density they can reach. Furthermore, the high costs of maintaining and transferring data call for novel solutions for information storage. Who would have thought that something that has been on Earth since the very beginning could help us solve this problem?
Image credit: By Image Team – canva.com