The capacity of DNA for information encoding


Information encoding and processing in DNA has proved to be an important problem for biomolecular computing, one that includes the well-studied codeword design problem. A lower bound is established for the capacity of DNA to encode information using a combinatorial model of DNA homology given by the so-called h-distance. This bound decreases exponentially with a parameter τ that roughly encodes the stringency of the reaction conditions. We further introduce a new family of near-optimal codeword sets, so-called shuffle codes. This construction, which is optimal in terms of efficiency, can also be used to produce sets of codewords with a given constant GC-content. These codes yield estimates of the capacity of DNA oligonucleotides to store abiotic information in DNA arrays as defined in [11]. Finally, we discuss the sensitivity of the corresponding DNA chip encodings to store and discriminate inputs, including the regions of maximum discrimination and uncertainty. © Springer-Verlag Berlin Heidelberg 2005.
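
As a minimal illustration of the constant GC-content constraint mentioned above, the following Python sketch computes the GC-content of an oligonucleotide (the fraction of its bases that are G or C) and keeps only the candidate codewords that match a target value. It does not reproduce the shuffle-code construction or the h-distance model. The function names `gc_content` and `filter_by_gc`, the tolerance parameter, and the candidate sequences are hypothetical and chosen only for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def filter_by_gc(codewords, target, tol=1e-9):
    """Keep the codewords whose GC-content is within tol of target."""
    return [w for w in codewords if abs(gc_content(w) - target) <= tol]


# Hypothetical candidate codewords of length 4 (illustrative only).
candidates = ["ACGT", "AATT", "GGCC", "AGCT", "ATGC"]

# Select the candidates with exactly 50% GC-content.
balanced = filter_by_gc(candidates, 0.5)
print(balanced)
```

Filtering a pre-generated pool is the simplest way to enforce the constraint; a construction that produces codewords with the desired GC-content directly, as the abstract describes, avoids discarding candidates.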

Publication Title

Lecture Notes in Computer Science