

Equipe de Génomique Analytique
Université Pierre et Marie Curie, INSERM U511
Head: Alessandra Carbone
Alessandra DOT Carbone AT lip6 DOT fr





Mathematical and Algorithmic Approaches to Systems Biology

Our group is working on various problems connected with the functioning and evolution of biological systems. In particular, we are interested in the systemic description of genetic and biochemical networks, such as those responsible for gene regulation. Our approach is based on quantitative experimental measurements and theoretical modeling. We use mathematical tools (mainly from statistics and combinatorics) and algorithmic tools to study basic principles of cellular functioning starting from genomic data.

1. Sequence evolution in microbial organisms: essential genes, synthetic biology and genome evolution.

Information about environmental organisation, the essentiality of metabolic networks and gene essentiality can be extracted from codon bias analysis. Codon bias also explains the evolutionary tendencies of phages whose hosts are translationally biased bacteria. We are interested in finding ways to exploit this information for metabolic network reconstruction from metagenomic samples, genome synthesis, and modeling the evolvability of gene expression under changing environmental conditions. Findings in this direction would provide a foundation for realistic models of regulatory evolution.
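As an illustration of the kind of quantity a codon bias analysis starts from, here is a minimal sketch that computes relative synonymous codon usage (RSCU) for a coding sequence. RSCU is a standard, generic measure and is not necessarily the statistic used in the papers listed below; the function name `rscu` and the toy sequence are assumptions made for this example.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds):
    """Return RSCU values for every codon of each amino acid present in `cds`.

    RSCU = observed codon count / mean count over its synonymous codons.
    A value above 1 means the codon is used more often than expected
    under uniform synonymous usage.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids absent from the sequence
        expected = total / len(family)
        for codon in family:
            result[codon] = counts[codon] / expected
    return result


# Toy example (hypothetical sequence): four leucine codons, three of them CTG.
values = rscu("CTGCTGCTGCTA")
print(values["CTG"], values["CTA"], values["TTA"])
```

In a real analysis the counts would be pooled over a reference set of genes rather than a single short sequence.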

A.Carbone, R.Madden, Insights on the evolution of metabolic networks of unicellular translationally biased organisms from transcriptomic data and sequence analysis, Journal of Molecular Evolution, 61:456-469, 2005.
A.Carbone, Computational prediction of genomic functional cores specific to different microbes. Journal of Molecular Evolution, 63(6):733-746, 2006.
J.Breton, E.Bart-Delabesse, S.Biligui, A.Carbone, X.Sellier, M.Okome-Nkoumou, C.Nzamba, M.Kombila, I.Accoceberry, M.Thellier, Genotypic analysis of Enterocytozoon bieneusi isolates from Gabon and Cameroon: reporting a new highly divergent sequence and a wide distribution of genotypes, Journal of Clinical Microbiology, 45(8):2580-2589, 2007.
A.Carbone, Codon bias is a major factor explaining phage evolution in translationally biased hosts, Journal of Molecular Evolution, 66(3):210-23, 2008.
A.Carbone and A.Mathelier, Environmental and physiological insights from microbial genome sequences. In Elements of Computational Systems Biology, Huma Lodhi and Stephen Muggleton (eds.), Wiley Book Series in Bioinformatics, 2008.

A.Mathelier, A.Carbone, Chromosomal periodicity and positional networks of genes in Escherichia coli, in preparation, 2009.

  2. Sequence evolution in eukaryotic organisms: detection of regulatory signals in sequences.

miRNAs have been shown to be of crucial importance in cell regulation. Predicting miRNAs remains a computational challenge, even in its most simplified forms. We search the Plasmodium genome for known miRNAs (for example, from Arabidopsis thaliana and Oryza sativa) and, beyond similar hairpins, aim to predict new precursor RNA structures. A reliable answer requires fast pattern-matching algorithms for the exhaustive screening of a possibly very large genome, as well as suitable physical, chemical and combinatorial criteria for detecting acceptable miRNA structures.
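To make the screening problem concrete, the following is a deliberately naive sketch of a hairpin scan: it reports positions where a stretch of DNA is followed, after a short loop, by its exact reverse complement. The function names, the default stem and loop lengths, and the toy sequence are all assumptions for illustration. Real precursor detection must tolerate mismatches and bulges and apply the energetic and combinatorial criteria mentioned above, none of which is modeled here.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_hairpin_candidates(genome, stem_len=6, min_loop=3, max_loop=8):
    """Return (left_arm_start, right_arm_start) pairs for perfect stem-loops.

    A hit means genome[i:i+stem_len] can base-pair exactly with
    genome[j:j+stem_len], with a loop of min_loop..max_loop bases between.
    """
    genome = genome.upper()
    n = len(genome)
    hits = []
    for i in range(n - 2 * stem_len - min_loop + 1):
        partner = reverse_complement(genome[i:i + stem_len])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop
            if j + stem_len > n:
                break  # right arm would run past the end of the sequence
            if genome[j:j + stem_len] == partner:
                hits.append((i, j))
    return hits


# Toy sequence (hypothetical): stem ACGTGC, loop TTTT, complementary arm GCACGT.
print(find_hairpin_candidates("AAACGTGCTTTTGCACGTAA"))
```

This scan costs time proportional to genome length times the number of loop sizes tried, which is why indexed or otherwise faster pattern matching matters at genome scale.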

We are also interested in searching for other regulatory motifs in the intergenic regions of several genomes of the Plasmodium family. The statistical and combinatorial analysis that we develop is based on a comparative analysis of genomic sequences. More precisely, we study the 3'UTR, 5'UTR and promoter regions of P. falciparum. The recent discovery of specific motifs characterizing these three types of regions in mammalian genomes suggests comparing the four available Plasmodium genomes to detect potential specific signals.

  3. Protein evolution: detection of distantly related proteins

We propose to develop two novel computational approaches to annotation, based on sequence and structural homology search. We have recently developed a new tool, PHYBAL, for the optimal alignment of pairs of distantly related proteins, together with numerical criteria for selecting, within large databases, pairs of proteins likely to share the same structure. The method allows us to align protein pairs with very weak sequence identity (10-15%). Extending the alignment tool to multiple alignment and integrating suitable selection criteria into it will make possible a systematic, large-scale search for homologous proteins in genomes that have proved difficult to annotate.

The second bioinformatics approach takes advantage of the growing amount of available protein structure information and of the structure-function relationship. We propose to adapt one of the existing threading methods, FROST, to local structural information coming from specific protein families. This information is expected to refine the results currently attainable with FROST. Affinity between a sequence and its fold will be established through adequate scores, and selection criteria will be developed. We want to apply the methods to the detection of transcription factors in Plasmodium and of human glycosylase proteins. The predictions will be checked by two experimental biologists who collaborate with us. Both methodologies are general in scope and have many potential applications beyond the one we propose. Better alignments may also make the reconstruction of phylogenetic trees more reliable.
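For readers unfamiliar with the 10-15% sequence identity quoted for PHYBAL above, here is a minimal sketch of how percent identity can be computed from a gapped pairwise alignment. It is not part of PHYBAL. The choice of denominator (columns where neither sequence has a gap) is an assumption, since several conventions exist, and the function name and toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are excluded from the denominator
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy alignment (hypothetical): 6 ungapped columns, 4 of them identical.
print(round(percent_identity("ACDE-FGH", "ACQEWF-K"), 1))
```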

J.Baussand, C.Deremble, A.Carbone, Periodic distributions of hydrophobic amino acids allows to define fundamental building blocks to align distantly related proteins, Proteins: Structure, Function and Bioinformatics, 67(3):695-708, 2007.
J.Baussand, A.Carbone, Inconsistent distances in substitution matrices can be avoided by properly handling hydrophobic residues, Evolutionary Bioinformatics, 1-6, 2008. In press.

3. Protein evolution: detection of functional sites on protein complexes and detection of potential protein partners.

The Joint Evolutionary Trees (JET) method detects protein interfaces, a core of residues involved in the folding process, and residues likely to be relevant to site-directed mutagenesis and to molecular recognition. JET is a fully automated system that we recently developed in the lab, and it can be applied to the large-scale analysis of protein interfaces. This research is now part of the DECRYPTHON project and will be coupled with a docking algorithm for the large-scale detection of potential protein partners. We will focus in particular on protein partners involved in neuromuscular diseases. We aim to construct a new database of information on functionally interacting proteins. Further extensions will include studies of protein binding sites involved in interactions with DNA or ligands (such as drugs). This will be of significant medical interest since, while it is now feasible to design a small molecule to inhibit or enhance the binding of a given molecule to a given partner, it is much more difficult to understand how that same small molecule could directly or indirectly influence other existing interactions. The approach proposed in this project combines evolutionary information (how evolution modified proteins to enhance their function) and molecular modeling (computational determination of the relative position of two interacting protein partners) to identify potential interactions.

The current project is a pilot project designed for an eighteen-month period. It is well adapted to grid calculations since it consists of a series of independent calculations with limited data transfers. In addition to being a robust support for the development of a grid computing platform, it will provide the basis for real-case predictions of functionally significant partners that will be carried out during subsequent stages of the project.

Calculations are done on the World Community Grid:
Phase I, with cross-docking of about 150 proteins, has been completed.
Phase II, with targeted cross-docking of about 4000 proteins, will be launched at the end of the year.
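The sketch below illustrates why cross-docking suits a grid: the protein pairs can be enumerated up front and split into independent work units. It assumes, for illustration only, that every ordered (receptor, ligand) pair is docked, including each protein with itself, and that work units have a fixed size. The protein identifiers, function names and unit size are hypothetical and do not describe the actual World Community Grid setup.

```python
from itertools import product


def cross_docking_jobs(proteins):
    """All ordered (receptor, ligand) pairs, self-pairs included (an assumption)."""
    return list(product(proteins, repeat=2))


def work_units(jobs, size):
    """Split the job list into independent chunks of at most `size` jobs."""
    return [jobs[i:i + size] for i in range(0, len(jobs), size)]


# Hypothetical identifiers standing in for a Phase I sized set of 150 proteins.
proteins = [f"P{i:03d}" for i in range(150)]
jobs = cross_docking_jobs(proteins)
units = work_units(jobs, size=100)
print(len(jobs), len(units))
```

Under these assumptions the number of docking runs grows quadratically with the number of proteins, which is one reason a larger set calls for a targeted selection of pairs rather than an exhaustive one.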

A description of the project with an update on the current status can be found here.

S.Engelen, L.A.Trojan, S.Sacquin-Mora, R.Lavery, A.Carbone, Joint Evolutionary Trees: a large scale method to predict protein interfaces based on sequence sampling, PLoS Computational Biology, 5(1): e1000267, 1-17, 2009. doi:10.1371/journal.pcbi.1000267
S.Sacquin-Mora, A.Carbone, R.Lavery, Identification of protein interaction partners and protein-protein interaction sites via cross-docking simulations, Journal of Molecular Biology, 382:1276-1289, 2008.

  4. Protein evolution: detection of networks of co-evolved residues.

It has been demonstrated that evolutionarily conserved networks of residues mediate allosteric communication in proteins involved in cellular signaling, the process by which signals originating at one site in a protein propagate reliably to affect distant functional sites. The general principles of protein structure that underlie this process remain unknown. In a seminal paper, Ranganathan described a sequence-based statistical method for quantitatively mapping the global network of amino acid interactions in a protein. The method reveals a surprisingly simple architecture for amino acid interactions in each protein family: a small subset of residues forms physically connected networks that link distant functional sites in the tertiary structure. These evolutionarily conserved, sparse networks of amino acid interactions are proposed as representative structural motifs for allosteric communication in proteins. To investigate Ranganathan's approach further, we developed a new method that reconstructs networks of co-evolved residues from sequence analysis, based on a fine combinatorial analysis of the phylogenetic trees associated with a protein family. The approach will be used to detect motifs of co-evolved residues, which will in turn be used to detect distantly related protein pairs.
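As a point of reference for what "co-evolved positions" means at the sequence level, here is a minimal sketch that scores pairs of alignment columns by mutual information. This is a generic covariation measure, swapped in for illustration: it is neither Ranganathan's statistical method nor the tree-based combinatorial method described above, and unlike the latter it ignores the phylogenetic tree. The function names and the toy alignment are assumptions.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Residues found at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(alignment, i, j):
    """MI between columns i and j: H(i) + H(j) - H(i, j)."""
    a, b = column(alignment, i), column(alignment, j)
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


# Toy alignment (hypothetical): columns 0 and 1 vary together, column 2 is invariant.
alignment = ["AKL", "AKL", "GEL", "GEL"]
print(mutual_information(alignment, 0, 1), mutual_information(alignment, 0, 2))
```

Because sequences in a family share ancestry, raw mutual information tends to overestimate coupling, which is part of the motivation for using the tree explicitly.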

J.Baussand, A.Carbone, A combinatorial approach to detect co-evolved amino-acid networks in protein families with variable divergence, 2008. Submitted.

  7. DNA nanotechnologies: nanoscopic aperiodic tiling in 3 dimensions.

Periodic structures self-assembling in 3 dimensions have been conceived and realised experimentally with DNA molecules. We study DNA molecules that can be used to realise molecular assemblies that grow as three-dimensional fractals. On the theoretical side, the interaction between geometry and tile coding plays a key role.

A.Carbone, N.C.Seeman, Circuits and Programmable Self-Assembling DNA Structures, Proceedings of the National Academy of Sciences USA, 99:12577-12582, 2002.
A.Carbone, N.C.Seeman, A Route to Fractal DNA Assembly, Natural Computing, 1:469-480, 2002.
A.Carbone, N.C.Seeman, Coding and Geometrical Shapes in Nanostructures: fractal DNA assemblies, Natural Computing, 2:133-151, 2003.
A.Carbone, N.C.Seeman. Molecular Tiling and DNA self-assembly, in "Aspects of Molecular Computing", N.Jonoska, G.Paun, G.Rozenberg (Eds), Lecture Notes in Computer Science 2950, Springer, 2003.
A.Carbone, C.Mao, P.E.Constantinou, B.Ding, J.Kopatsch, W.B.Sherman, N.C.Seeman, 3D Fractal DNA Assembly from Coding, Geometry and Protection, Natural Computing, 3:235-252, 2004.

Detection and classification of protein interaction sites

Richard Lavery
Laboratoire de Biochimie Théorique, CNRS UPR 9080
Detection of protein partners and protein interaction sites

Pascale Guicheney
Proteins involved in human muscular dystrophy

Sophie Sacquin-Mora
Detection of protein partners and protein interaction sites

Biological networks

François Képès
ATelier de Génomique Cognitive, CNRS et Génopole d'Evry
Macromolecular networks, regulation mechanisms: protein-protein and protein-DNA interaction.

Richard Madden
Institut des Hautes Etudes Scientifiques, Bures-sur-Yvette
Metabolic networks and genome comparison.

Comparative genomics, sequence analysis and codon bias

François Képès
ATelier de Génomique Cognitive, CNRS et Génopole d'Evry
Codon bias, formal spaces of gene sequences, evolution and codon bias.

Jacques van Helden
Université Libre de Bruxelles, Belgium
Detection of binding sites in prokaryotic promoter regions and comparison of statistical signals across organisms.

Catherine Vaquero
INSERM U511, Immunologie cellulaire et moléculaire des infections parasitaires
Detection of new transcription factors for Plasmodium falciparum.

Thierry Grange
CNRS, Institut Jacques Monod
Detection of new glycosylase proteins.

DNA Nanotechnology

Ned Seeman
Department of Chemistry, New York University, New York, USA
DNA self-assembly, nanostructures, DNA computing.