July 23, 2006
Similar papers 5
May 8, 2008
A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D, G, A, U, C}, where the letter D represents one or more hypothetical bases with unspecific pairing. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvements of a primitive DNA repair system could make possible the transition from the ancient to the m...
March 1, 2020
The genetic code is the function from the set of codons to the set of amino acids by which a DNA sequence encodes proteins. Since the codons also influence the shape of the DNA molecule itself, the same sequence that encodes a protein also has a separate geometric interpretation. A question then arises: How well-duplexed are these two "codes"? In other words, in choosing a genetic sequence to encode a particular protein, how much freedom does one still have to vary the geomet...
September 22, 2020
Genomes may be analyzed from an information viewpoint as very long strings, containing functional elements of variable length, which have been assembled by evolution. In this work, an innovative information-theory-based algorithm is proposed to extract significant (relatively small) dictionaries of genomic words. Namely, conceptual analyses are here combined with empirical studies, to open up a methodology for the extraction of variable-length dictionaries from genomic sequen...
May 2, 2019
Most living systems rely on double-stranded DNA (dsDNA) to store their genetic information and perpetuate themselves. This biological information has been considered the main target of evolution. However, here we show that symmetries and patterns in the dsDNA sequence can emerge from the physical peculiarities of the dsDNA molecule itself and the maximum entropy principle alone, rather than from biological or environmental evolutionary pressure. The randomness justifies the h...
January 21, 2011
Genetic regulatory networks enable cells to respond to the changes in internal and external conditions by dynamically coordinating their gene expression profiles. Our ability to make quantitative measurements in these biochemical circuits has deepened our understanding of what kinds of computations genetic regulatory networks can perform and with what reliability. These advances have motivated researchers to look for connections between the architecture and function of geneti...
December 19, 2006
The relation of genome size to organism complexity is still described rather equivocally. Neither the number of genes (G-value) nor the total amount of DNA (C-value) correlates consistently with phenotype complexity. Using information-theoretic considerations, we developed a model that allows a quantitative estimate of the amount of functional information in a genomic sequence. This model easily answers the long-standing question of why GC content is increased in functional regions...
January 30, 2012
Designing short DNA words is a problem of constructing a set (i.e., code) of n DNA strings (i.e., words) with the minimum length such that the Hamming distance between each pair of words is at least k and the n words satisfy a set of additional constraints. This problem has applications in, e.g., DNA self-assembly and DNA arrays. Previous works include those that extended results from coding theory to obtain bounds on code and word sizes for biologically motivated constraints...
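The Hamming-distance condition stated in the abstract above can be illustrated with a minimal sketch. It only checks whether a given set of equal-length DNA words has pairwise Hamming distance at least k; it does not construct a minimum-length code and does not model the additional biologically motivated constraints the abstract mentions. The function names and the example words are hypothetical and not taken from the paper.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_valid_code(words, k: int) -> bool:
    """True if every pair of words is at Hamming distance at least k."""
    return all(hamming(a, b) >= k for a, b in combinations(words, 2))


# Hypothetical example set of n = 3 words of length 4.
words = ["ACGT", "TGCA", "CATG"]
print(is_valid_code(words, 4))
print(is_valid_code(words, 5))
```

In this example every pair of words differs in all four positions, so the set passes for k = 4 and fails for k = 5.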
February 14, 2001
The evolution in coding DNA sequences brings new flexibility and freedom to the codon words, even as the underlying nucleotides get significantly ordered. These curious contra-rules of gene organisation are observed from the distribution of words and the second moments of the nucleotide letters. These statistical data give us the physics behind the classification of bacteria.
October 15, 2010
The problem of differentiating the informational content of coding (exons) and non-coding (introns) regions of a DNA sequence is one of the central problems of genomics. The introns are estimated to be nearly 95% of the DNA and since they do not seem to participate in the process of transcription of amino-acids, they have been termed "junk DNA." Although it is believed that the non-coding regions in genomes have no role in cell growth and evolution, demonstration that these r...
February 2, 2015
We consider the problem of storing and retrieving information from synthetic DNA media. The mathematical basis of the problem is the construction and design of sequences that may be discriminated based on their collection of substrings observed through a noisy channel. This problem of reconstructing sequences from traces was first investigated in the noiseless setting under the name of "Markov type" analysis. Here, we explain the connection between the reconstruction problem ...