The storage community faces two linked challenges, rapidly growing data volumes and the need to conserve energy, which call for high-capacity, non-volatile, and durable recording media. Traditional storage technologies such as tape, HDDs, and NAND flash continue to progress, but innovative approaches are also required. One promising approach is to use macromolecules for ultra-dense storage, with DNA emerging as a prime candidate due to its exceptional information density, stability, and robustness. Current technologies for synthesizing and sequencing artificial DNA are highly efficient and accurate, and given DNA’s central role in medicine and the life sciences, it will remain relevant, which makes it a reliable long-term storage medium.
DNA-based data storage is gaining traction, supported by proof-of-concept demonstrations, though the cost of synthesis and sequencing remains a hurdle. Costs can be reduced through improved synthesis methods, coding techniques suited to those methods, and reconstruction algorithms tailored to them. More efficient sequencing further improves practicality. DiDAX addresses these development directions to support long-term DNA-based data storage applications. A typical DNA-based data storage system comprises DNA synthesis to produce the data-encoding molecules, a storage container, and a DNA sequencing device for reading the data. Encoding converts binary data into DNA sequences, and decoding recovers the data from sequencing reads; both must tolerate errors introduced during synthesis, storage, and sequencing.
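To make the encoding and decoding step concrete, the following is a minimal sketch that maps binary data to a nucleotide string and back. The two-bits-per-base table and the function names `encode_bytes` and `decode_bases` are assumptions chosen for illustration; they are not the DiDAX coding scheme, and the sketch performs no error correction.

```python
# Illustrative mapping only: 2 bits per nucleotide (an assumption, not the DiDAX code).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_bytes(data: bytes) -> str:
    """Convert bytes to a DNA string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_bases(sequence: str) -> bytes:
    """Invert encode_bytes.

    Assumes an error-free sequence whose length is a multiple of 4 bases.
    """
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    message = b"Hi"
    strand = encode_bytes(message)
    print(strand)
    # Round trip: decoding the encoded strand returns the original bytes.
    assert decode_bases(strand) == message
```

A practical scheme would layer error-correcting codes and sequence constraints on top of such a mapping, since real reads contain errors that this round trip ignores.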
DNA-based data storage systems differ from other storage systems in that the oligos in the stored mixture are unordered; the order is recovered by adding an index or barcode to each oligo. Errors in DNA typically take the form of substitutions, insertions, and deletions. To compete with existing storage technologies, DNA synthesis costs must be reduced. Encoding more data per oligo and using composite DNA letters can both increase information capacity. A composite DNA letter represents a position by a mixture of nucleotides in a specified ratio, so a single position can take more than four distinguishable values and carry more than the 2 bits of a standard base.
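The role of the index can be sketched as follows: a long sequence is split into short payloads, each payload is prefixed with its position written in base 4, and the original order is restored from a shuffled pool by sorting on that prefix. The index length `INDEX_LEN`, the payload length, and all function names are hypothetical choices for illustration, and the sketch assumes error-free reads with exactly one copy of each oligo.

```python
import random

BASES = "ACGT"
INDEX_LEN = 4  # hypothetical: 4 bases can address up to 4**4 = 256 oligos


def index_to_bases(index: int) -> str:
    """Write an integer as a fixed-length base-4 string over A, C, G, T."""
    digits = []
    for _ in range(INDEX_LEN):
        index, digit = divmod(index, 4)
        digits.append(BASES[digit])
    return "".join(reversed(digits))


def bases_to_index(prefix: str) -> int:
    """Read a base-4 index prefix back into an integer."""
    value = 0
    for base in prefix:
        value = value * 4 + BASES.index(base)
    return value


def split_into_oligos(sequence: str, payload_len: int) -> list[str]:
    """Cut the sequence into payloads and prepend an index to each."""
    chunks = [sequence[i:i + payload_len]
              for i in range(0, len(sequence), payload_len)]
    if len(chunks) > 4 ** INDEX_LEN:
        raise ValueError("sequence needs more oligos than the index can address")
    return [index_to_bases(i) + chunk for i, chunk in enumerate(chunks)]


def reassemble(pool: list[str]) -> str:
    """Sort oligos by decoded index and concatenate their payloads."""
    ordered = sorted(pool, key=lambda oligo: bases_to_index(oligo[:INDEX_LEN]))
    return "".join(oligo[INDEX_LEN:] for oligo in ordered)


if __name__ == "__main__":
    original = "CAGACGGCTTAGGACCATGA"
    pool = split_into_oligos(original, payload_len=6)
    random.shuffle(pool)  # the stored mixture has no inherent order
    assert reassemble(pool) == original
```

In a real system the index itself is exposed to substitutions, insertions, and deletions, so it would need protection from the error-correcting code as well.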
DiDAX aims to demonstrate a 100 MB DNA-based data storage system using low-cost synthesis, and to develop encoding schemes that support random access and robust error correction. By optimizing the synthesis and sequencing processes, DiDAX seeks to increase the information capacity of DNA-based data storage and reduce its cost, making it a viable alternative to traditional storage technologies. Improved synthesis codes can significantly decrease synthesis time and cost; the project’s primary goal is to maximize the information encoded within a given synthesis budget. Enhanced sequencing depth and optimized coding schemes will further improve the efficiency and cost-effectiveness of DNA-based data storage systems.