June 1, 2004
To study the relationship between protein sequences and their native structures, we adopt vectorial representations for both sequence and structure. The structural representation is based on the Principal Eigenvector (PE) of the fold's contact matrix. As recently shown, the PE encodes sufficient information to reconstruct the whole contact matrix. The sequence is represented through a Hydrophobicity Profile (HP), using a generalized hydrophobicity scale that we obtain from the principal eigenvector of a residue-residue interaction matrix and call the interactivity scale. Using this novel scale, we define the optimal HP of a protein fold and predict, by means of stability arguments, that it is strongly correlated with the PE of the fold's contact matrix. This prediction is confirmed through an evolutionary analysis, which shows that the PE correlates with the HP of each individual sequence adopting the same fold and, even more strongly, with the average HP of this set of sequences. Thus, protein sequences evolve in such a way that their average HP is close to the optimal one, implying that neutral evolution can be viewed as a kind of motion in sequence space around the optimal HP. Our results indicate that the correlation coefficient between N-dimensional vectors constitutes a natural metric in the vectorial space in which we represent both protein sequences and protein structures, which we call Vectorial Protein Space. In this way, we define a unified framework for sequence-to-sequence, sequence-to-structure, and structure-to-structure alignments. We show that the interactivity scale is nearly optimal for comparing both sequences with sequences and sequences with structures.
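The two quantities the abstract relies on, the principal eigenvector of a contact matrix and the correlation coefficient between two N-dimensional profiles, can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration: the six-residue contact list, the profile values in `toy_hp`, and the function names are hypothetical and are not taken from the paper. The power iteration with an identity shift is a generic numerical choice, not necessarily the authors' procedure.

```python
import math

def principal_eigenvector(matrix, iterations=1000, tol=1e-12):
    """Approximate the principal eigenvector of a symmetric, non-negative
    matrix by power iteration on (matrix + identity).

    The identity shift leaves the eigenvectors unchanged and avoids the
    oscillation that plain power iteration shows on bipartite contact graphs.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # Multiply by (A + I): w_i = sum_j A_ij v_j + v_i
        w = [sum(matrix[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical contacts for a toy six-residue "fold" (illustration only).
contacts = [(0, 2), (0, 3), (1, 3), (1, 4), (2, 4), (2, 5), (3, 5)]
N = 6
A = [[0.0] * N for _ in range(N)]
for i, j in contacts:
    A[i][j] = A[j][i] = 1.0  # symmetric binary contact matrix

pe = principal_eigenvector(A)

# Made-up per-residue profile standing in for an HP (not a real scale).
toy_hp = [0.2, 0.1, 0.9, 0.8, 0.3, 0.4]

print("PE:", [round(c, 3) for c in pe])
print("corr(PE, toy HP):", round(pearson(pe, toy_hp), 3))
```

For a connected contact graph, all components of the resulting PE share the same sign, so each component can be read as a per-site weight that is larger for more effectively connected sites. The correlation printed at the end only shows how two such profiles would be compared; its value carries no biological meaning for this toy input.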
Similar papers 1
December 2, 2004
We review and further develop an analytical model that describes how thermodynamic constraints on the stability of the native state influence protein evolution in a site-specific manner. To this end, we represent both protein sequences and protein structures as vectors: Structures are represented by the principal eigenvector (PE) of the protein contact matrix, a quantity that resembles closely the effective connectivity of each site; Sequences are represented through the ``in...
April 13, 2004
We show that the contact map of the native structure of globular proteins can be reconstructed starting from the sole knowledge of the contact map's principal eigenvector, and present an exact algorithm for this purpose. Our algorithm yields a unique contact map for all 221 globular structures of PDBselect25 of length $N \le 120$. We also show that the reconstructed contact maps allow in turn for the accurate reconstruction of the three-dimensional structure. These results in...
December 13, 2012
Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics o...
September 6, 1997
Protein structures are a very special class among all possible structures. It was suggested that a ``designability principle'' plays a crucial role in nature's selection of protein sequences and structures. Here we provide a theoretical base for such a selection principle, using a novel formulation of the protein folding problem based on hydrophobic interactions. A structure is reduced to a string of 0's and 1's which represent the surface and core sites, respectively, as the...
July 10, 2012
We present a sequence-based probabilistic formalism that directly addresses co-operative effects in networks of interacting positions in proteins, providing significantly improved contact prediction, as well as accurate quantitative prediction of free energy changes due to non-additive effects of multiple mutations. In addition to these practical considerations, the agreement of our sequence-based calculations with experimental data for both structure and stability demonstrat...
September 23, 2017
Predicting three dimensional residue-residue contacts from evolutionary information in protein sequences was attempted already in the early 1990s. However, contact prediction accuracies of methods evaluated in CASP experiments before CASP11 remained quite low, typically with $<20$% true positives. Recently, contact prediction has been significantly improved to the level that an accurate three dimensional model of a large protein can be generated on the basis of predicted cont...
October 23, 2011
The evolutionary trajectory of a protein through sequence space is constrained by function and three-dimensional (3D) structure. Residues in spatial proximity tend to co-evolve, yet attempts to invert the evolutionary record to identify these constraints and use them to computationally fold proteins have so far been unsuccessful. Here, we show that co-variation of residue pairs, observed in a large protein family, provides sufficient information to determine 3D protein struct...
January 2, 2008
The total conformational energy is assumed to consist of pairwise interaction energies between atoms or residues, each of which is expressed as a product of a conformation-dependent function (an element of a contact matrix, C-matrix) and a sequence-dependent energy parameter (an element of a contact energy matrix, E-matrix). Such pairwise interactions in proteins force native C-matrices to be in a relationship as if the interactions are a Go-like potential [N. Go, Annu. Rev. ...
November 3, 2009
Just as physicists strive to develop a TOE (theory of everything), which explains and unifies the physical laws of the universe, the life-scientist wishes to uncover the TOE as it relates to cellular systems. This can only be achieved with a quantitative platform that can comprehensively deduce and relate protein structure, function, and evolution of genomes and proteomes in a comparative fashion. Were this perfected, proper analyses would start to uncover the underlying ph...
October 4, 2013
Mapping between sequence and structure is currently an open problem in structural biology. Despite many experimental and computational efforts it is not clear yet how the structure is encoded in the sequence. Answering this question may pave the way for predicting a protein fold given its sequence. My doctoral studies have focused on a particular phenomenon relevant to the protein sequence-structure relationship. It has been observed that many proteins having apparently dis...