December 30, 2016
The probability distribution of sequences with maximum entropy that satisfies a given amino acid composition at each site and a given pairwise amino acid frequency at each site pair is a Boltzmann distribution with $\exp(-\psi_N)$, where the total interaction $\psi_N$ is represented as the sum of one body and pairwise interactions. A protein folding theory based on the random energy model (REM) indicates that the equilibrium ensemble of natural protein sequences is a canonica...
September 24, 2005
Recent work has shown that expression level is the main predictor of a gene’s evolutionary rate, and that more highly expressed genes evolve slower. A possible explanation for this observation is selection for proteins which fold properly despite mistranslation, in short selection for translational robustness. Translational robustness leads to the somewhat paradoxical prediction that highly expressed genes are extremely tolerant to missense substitutions but nevertheles...
September 27, 2005
The frequencies of A, C, G and T in mitochondrial DNA vary among species due to unequal rates of mutation between the bases. The frequencies of bases at four-fold degenerate sites respond directly to mutation pressure. At 1st and 2nd positions, selection reduces the degree of frequency variation. Using a simple evolutionary model, we show that 1st position sites are less constrained by selection than 2nd position sites, and therefore that the frequencies of bases at 1st posit...
July 11, 2008
There is an intrinsic relationship between the molecular evolution in primordial period and the properties of genomes and proteomes of contemporary species. The genomic data may help us understand the driving force of evolution of life at molecular level. In absence of evidence, numerous problems in molecular evolution had to fall into a twilight zone of speculation and controversy in the past. Here we show that delicate structures of variations of genomic base compositions a...
April 30, 2010
Empirical substitution matrices represent the average tendencies of substitutions over various protein families by sacrificing gene-level resolution. We develop a codon-based model, in which mutational tendencies of codon, a genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino...
August 14, 2007
Across all kingdoms of biological life, protein-coding genes exhibit unequal usage of synonmous codons. Although alternative theories abound, translational selection has been accepted as an important mechanism that shapes the patterns of codon usage in prokaryotes and simple eukaryotes. Here we analyze patterns of codon usage across 74 diverse bacteriophages that infect E. coli, P. aeruginosa and L. lactis as their primary host. We introduce the concept of a `genome landscape...
January 15, 2003
We describe the evolution of macromolecules as an information transmission process and apply tools from Shannon information theory to it. This allows us to isolate three independent, competing selective pressures that we term compression, transmission, and neutrality selection. The first two affect genome length: the pressure to conserve resources by compressing the code, and the pressure to acquire additional information that improves the channel, increasing the rate of info...
August 25, 2013
It is likely that the strength of selection acting upon a mutation varies through time due to changes in the environment. However, most population genetic theory assumes that the strength of selection remains constant. Here we investigate the consequences of fluctuating selection pressures on the quantification of adaptive evolution using McDonald-Kreitman (MK) style approaches. In agreement with previous work, we show that fluctuating selection can generate evidence of adapt...
September 15, 2017
Models of codon evolution are commonly used to identify positive selection. Positive selection is typically a heterogeneous process, i.e., it acts on some branches of the evolutionary tree and not others. Previous work on DNA models showed that when evolution occurs under a heterogeneous process it is important to consider the property of model closure, because non-closed models can give biased estimates of evolutionary processes. The existing codon models that account for th...
November 10, 2003
It is important to understand how protein folding and evolution influences each other. Several studies based on entropy calculation correlating experimental measurement of residue participation in folding nucleus and sequence conservation have reached different conclusions. Here we report analysis of conservation of folding nucleus using an evolutionary model alternative to entropy based approaches. We employ a continuous time Markov model of codon substitution to distinguish...