July 8, 2021
How information is encoded in bio-molecular sequences is difficult to quantify since such an analysis usually requires sampling an exponentially large genetic space. Here we show how information theory reveals both robust and compressed encodings in the largest complete genotype-phenotype map (over 5 trillion sequences) obtained to date.
October 30, 2020
In computer science, we can, in theory, neatly separate transmission and processing of information, hardware and software, and programs and their inputs. This is much more intricate in biology. Nevertheless, I argue that Shannon's concept of information is useful in biology, although its application is not as straightforward as many people think. In fact, the recently developed theory of information decomposition can shed much light on the complementarity between coding and...
July 4, 2014
In this paper we consider selected linguistic features of genomes in order to study the statistics of the gene codes. We present the information theory from which it follows that if the system is described by distributions of hyperbolic type, it leads to the possibility of entropy loss and stability. We show that the histograms of gene lengths are similar to those of language words. We show the correspondence between the presented theory and results for the number of replicated genes and ...
November 10, 2022
Due to its longevity and enormous information density, DNA is an attractive medium for archival data storage. Thanks to rapid technological advances, DNA storage is becoming practically feasible, as demonstrated by a number of experimental storage systems, making it a promising solution for our society's increasing need for data storage. While in living things DNA molecules can consist of millions of nucleotides, due to technological constraints, in practice, data is stored o...
January 11, 2009
We have presented the basic knowledge on the structure of molecules coding the genetic information, mechanisms of transfer of this information from DNA to proteins and phenomena connected with replication of DNA. In particular, we have described the differences of mutational pressure connected with replication of the leading and lagging DNA strands. We have shown how the asymmetric replication of DNA affects the structure of genomes, positions of genes, their function and ami...
February 4, 2002
It is a fascinating subject to explore how well we can understand the processes of life on the basis of fundamental laws of physics. It is emphasised that viewing biological processes as manipulation of information extracts their essential features. This information processing can be analysed using well-known methods of computer science. The lowest level of biological information processing, involving DNA and proteins, is the easiest one to link to physical properties. Physic...
May 22, 2006
A dynamical theory for the evolution of the genetic code is presented, which accounts for its universality and optimality. The central concept is that a variety of collective, but non-Darwinian, mechanisms likely to be present in early communal life generically lead to refinement and selection of innovation-sharing protocols, such as the genetic code. Our proposal is illustrated using a simplified computer model, and placed within the context of a sequence of transitions that...
January 19, 2006
We consider the problem of efficiently designing sets (codes) of equal-length DNA strings (words) that satisfy certain combinatorial constraints. This problem has numerous motivations including DNA computing and DNA self-assembly. Previous work has extended results from coding theory to obtain bounds on code size for new biologically motivated constraints and has applied heuristic local search and genetic algorithm techniques for code design. This paper proposes a natural opt...
February 27, 2004
We introduce a novel method to analyse complete genomes and recognise some distinctive features by means of an adaptive compression algorithm, which is not DNA-oriented. We study the Information Content as a function of the number of symbols encoded by the algorithm. Preliminary results are shown concerning regions having a sublinear type of information growth, which is strictly connected to the presence of highly repetitive subregions that might be supposed to have a regulato...
July 11, 2008
There is an intrinsic relationship between molecular evolution in the primordial period and the properties of genomes and proteomes of contemporary species. The genomic data may help us understand the driving force of the evolution of life at the molecular level. In the absence of evidence, numerous problems in molecular evolution had to fall into a twilight zone of speculation and controversy in the past. Here we show that delicate structures of variations of genomic base compositions a...