Research Overview

Objectives

Databases derived from the activities of molecular biologists are expanding rapidly and contain a diverse range of data, encompassing everything from DNA and protein sequences to information on protein-protein complex formation. Our quest, as computational molecular biologists, is to assist in unravelling the complexities of intra- and inter-cellular functions through the construction and application of tools for the assimilation of these very large, and often very complex, data sets. Ideally, such tools should be easy to use, universally accessible via the Internet, and dynamic in both the range of questions that can be asked of the data and the information content displayed. In this laboratory we are particularly interested in understanding data generated to describe the structure and function of proteins - which are usually the major components of cellular systems - and mapping how they interact with one another in the context of the systems they support. Such studies involve not only the analysis of the more than thirty thousand known protein structures but also the study, with the aid of computer simulations, of the way in which protein flexibility, mobility and specificity can affect protein interactions.

 

Cluster analysis of protein interaction networks

A key interest of the laboratory is the analysis of protein-protein networks. First, however, reliable networks must be constructed by careful assignment of which protein-protein interactions are likely to occur. Several experimental methods are currently in use to detect protein interactions, with two-hybrid screens being, to date, the approach that has yielded the greatest volume of data. However, the accuracy of two-hybrid screening is not particularly high, and its results typically must be supported by additional evidence - for example, tandem-affinity purification, mass spectrometry or X-ray crystallography. Nevertheless, there are now millions of protein sequences available from a wide variety of organisms, including human, that can be used to establish potential interactions. This is possible because not only does similarity between two protein sequences suggest a similar protein fold, but if two proteins are known to interact, then homologues of those proteins are also likely to interact in a similar way.

We have utilised a protein sequence homology-based method for inferring interacting proteins in the S. pombe, rat and human genomes and developed a scoring function based upon both sequence similarity and the number of experiments relating interacting homologues. We have created a web-based server called PIP (Potential Interactions of Proteins) that enables the user to type in a gene name from any of the above three eukaryotic genomes and graphically display its potential partners.
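As a rough illustration of the homology-based idea described above, the following Python sketch transfers experimentally observed interactions from a source organism to homologous proteins in a target genome. It is a minimal sketch under stated assumptions, not the PIP implementation: the function names, the input layout and in particular the scoring formula (the weaker of the two sequence similarities, scaled by a term that grows with the number of supporting experiments) are hypothetical choices made only for this example.

```python
from itertools import product


def infer_interactions(known, homologs):
    """Transfer known interactions onto homologous proteins.

    known:    dict mapping a pair (a, b) of source proteins to the number
              of experiments reporting that interaction.
    homologs: dict mapping a source protein to a list of
              (target_protein, similarity) tuples, similarity in [0, 1].
    Returns a dict mapping a sorted target pair to its best score.
    """
    scores = {}
    for (a, b), n_experiments in known.items():
        # Hypothetical evidence term: approaches 1 as experiments accumulate.
        evidence = 1.0 - 0.5 ** n_experiments
        for (ta, sim_a), (tb, sim_b) in product(homologs.get(a, []),
                                                homologs.get(b, [])):
            # Hypothetical score: limited by the weaker of the two homologies.
            score = min(sim_a, sim_b) * evidence
            pair = tuple(sorted((ta, tb)))
            scores[pair] = max(score, scores.get(pair, 0.0))
    return scores


def partners_of(scores, gene):
    """List (partner, score) for one gene, highest score first."""
    hits = []
    for (p, q), score in scores.items():
        if gene in (p, q):
            partner = q if p == gene else p
            hits.append((partner, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up example data: protein names and similarity values are illustrative.
known = {("Y1", "Y2"): 2, ("Y1", "Y3"): 1}
homologs = {
    "Y1": [("H1", 0.8)],
    "Y2": [("H2", 0.6)],
    "Y3": [("H3", 0.9)],
}
for partner, score in partners_of(infer_interactions(known, homologs), "H1"):
    print(partner, round(score, 3))
```

Taking the minimum of the two similarities reflects the intuition that an inferred interaction is only as reliable as its weaker homology; any real scoring function would need to be calibrated against experimental data.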

We believe that one of the major uses of these computer-generated 'interactomes' is to assist in the interpretation of microarray expression data.

 

Modelling protein structure

Few cellular systems have as yet been fully characterised by experimental methods, such as X-ray crystallography and NMR spectroscopy, and so are not described in detail at the atomic level. Therefore, it is very important to try to complete the repertoire of protein structures by modelling, utilising all the data available and, again most importantly, the vast number of protein sequences. By far the most successful computational technique for achieving this, and one that has been under continual development within this laboratory over a number of years, is called Comparative Modelling (CM) - modelling proteins from experimentally observed protein folding space. Our particular interest in this field is to use the principles of protein evolution to construct models. We have developed a set of computational tools based on a genetic algorithm, which borrows the natural mechanisms of chromosomal mutation and recombination that are used throughout evolution to generate diversity in populations. These algorithms are computationally very expensive; nevertheless, over the coming years, these developments will be incorporated into our online CM tool 3D-JIGSAW.
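The mutation and recombination mechanisms mentioned above can be sketched with a generic genetic algorithm in Python. This is a toy example under stated assumptions, not the laboratory's modelling code: the chromosome here is simply a list of bits and the fitness function is supplied by the caller, whereas the text does not specify what a chromosome encodes in the actual tools. The function name `evolve` and all parameter values are hypothetical.

```python
import random


def evolve(fitness, length, pop_size=30, generations=60,
           mutation_rate=0.05, seed=0):
    """Maximise `fitness` over bit lists of the given length (length >= 2)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0][:]]           # elitism: keep the best unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # single-point recombination
            child = mum[:cut] + dad[cut:]
            # Point mutation: flip each bit with a small probability.
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Toy fitness: the number of 1s in the chromosome.
best = evolve(sum, length=20)
print(best, sum(best))
```

Every generation evaluates the fitness of the whole population, which is why approaches of this kind become expensive when a single fitness evaluation involves building and scoring a protein model.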

It is very important that we rigorously benchmark our protein modelling algorithms; therefore, the laboratory enthusiastically participates in the biennial CASP (Critical Assessment of Techniques for Protein Structure Prediction) blind trials.

 

Modelling protein interactions

To reach our ultimate goal of understanding complete molecular systems we must first be able to understand, and predict, how proteins interact with each other. Protein binding is often thought of as a "lock-and-key" process; however, we now know that this is an oversimplification. Just how much rearrangement occurs when two proteins form a complex, and whether it can be predicted, is one of this laboratory's central interests. We are developing computer-based algorithms for docking proteins together that take account of the proteins' natural flexibility. This is achieved by using a combination of tools that include molecular dynamics, sequence conservation analysis, and conserved coordinate geometry between key, highly stabilising, residues at protein-protein interfaces.

Here too, our algorithm development schemes are regularly tested in international blind trials: CAPRI (Critical Assessment of PRediction of Interactions).

For further details on our work see the Laboratory publications.