Erik van Nimwegen Group
Bioinformatics and Systems Biology

PhyloGibbs

  1. Overview

    PhyloGibbs is an algorithm for discovering regulatory sites in a collection of DNA sequences, including multiple alignments of orthologous sequences from related organisms. Many existing approaches search either for sequence motifs that are overrepresented in the input data or for sequence segments that are more conserved evolutionarily than expected. PhyloGibbs combines these two approaches and identifies significant sequence motifs by taking both the over-representation and the conservation signal into account.

    PhyloGibbs runs on arbitrary collections of multiple local sequence alignments of orthologous sequences. The algorithm searches over all ways in which an arbitrary number of binding sites for an arbitrary number of transcription factors can be assigned to the multiple sequence alignments. These binding-site configurations are scored by a Bayesian probabilistic model that treats aligned sequences with an explicit model for the evolution of binding sites and 'background' intergenic DNA, taking the phylogenetic relationships between the species in the alignment into account. The algorithm uses simulated annealing and Markov chain Monte Carlo sampling to rigorously assign posterior probabilities to all the binding sites that it reports.
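
    The combination of simulated annealing and Monte Carlo sampling described above can be illustrated with a small toy sketch. It is not the PhyloGibbs implementation: the sequence, the target pattern, the scoring function, and all names (log_score, metropolis_step, anneal_and_track) are assumptions made for illustration. A "configuration" here is only the start position of a single site, whereas PhyloGibbs samples full assignments of many sites to many motifs.

```python
import math
import random
from collections import Counter

# Hypothetical toy problem: a configuration is the start position of one site,
# and log_score() stands in for the log posterior a real motif finder computes.
SEQUENCE = "ACGTTGACGTATGACGTCA"
MOTIF = "TGACGT"  # assumed target pattern, for illustration only
WIDTH = len(MOTIF)
POSITIONS = range(len(SEQUENCE) - WIDTH + 1)


def log_score(pos):
    """Toy log score: 2.0 for every base that matches MOTIF at this position."""
    window = SEQUENCE[pos:pos + WIDTH]
    return 2.0 * sum(a == b for a, b in zip(window, MOTIF))


def metropolis_step(pos, beta, rng):
    """Propose a uniformly random position; accept it with the Metropolis rule."""
    proposal = rng.choice(POSITIONS)
    delta = beta * (log_score(proposal) - log_score(pos))
    if delta >= 0 or rng.random() < math.exp(delta):
        return proposal
    return pos


def anneal_and_track(n_anneal=2000, n_track=5000, beta_max=5.0, seed=0):
    """Return (annealed reference position, estimated posterior per position)."""
    rng = random.Random(seed)
    pos = rng.choice(POSITIONS)
    # Anneal: raise the inverse temperature so the chain settles on a
    # high-scoring configuration.
    for step in range(n_anneal):
        beta = 1.0 + (beta_max - 1.0) * step / (n_anneal - 1)
        pos = metropolis_step(pos, beta, rng)
    reference = pos
    # Track: sample at beta = 1 and record how often each position is visited;
    # the visit frequencies estimate the posterior probabilities.
    visits = Counter()
    for _ in range(n_track):
        pos = metropolis_step(pos, 1.0, rng)
        visits[pos] += 1
    posterior = {p: visits[p] / n_track for p in POSITIONS}
    return reference, posterior


reference, posterior = anneal_and_track()
print("annealed position:", reference)
print("positions with posterior > 0.05:",
      {p: round(q, 2) for p, q in posterior.items() if q > 0.05})
```

    In this toy sequence the pattern occurs exactly at two positions, so the tracking stage splits the posterior mass between them instead of reporting a single answer with unwarranted certainty.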

  2. List of the most important features:

    • The algorithm can search for an arbitrary number of sites for an arbitrary number of different regulatory motifs. The user can either specify the total number of sites and motifs that PhyloGibbs needs to search for, or supply PhyloGibbs with a guess for the total number of sites and motifs in the data.
    • The algorithm rigorously takes into account the phylogenetic relationships of the species from which the input data derive. This allows PhyloGibbs to distinguish conservation that is due to the occurrence of functional sites from spurious conservation that is due to the evolutionary proximity of the species. Example phylogenetic trees for commonly used species can be downloaded from the download page.
    • PhyloGibbs uses an anneal+track strategy that rigorously assigns posterior probabilities to the sites it reports. In the anneal stage, the set of binding sites with the globally maximal posterior probability is identified; in the tracking stage, the posterior probabilities of these sites are calculated.
    • The program can also be used to calculate the statistical significance of a pre-specified set of putative binding sites.
    • Background probabilities for nonfunctional sequences are implemented as Markov models of arbitrary order (to be specified by the user). Background models can be calibrated from externally supplied files with background sequences.
    • Users can specify informative priors for the motifs by supplying an external file with weight matrices. This allows the algorithm to automatically identify new binding sites for motifs for which one or more binding sites are already known.
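
    The last two features can be sketched together: a background Markov model of user-chosen order, calibrated from background sequences, and a weight matrix used to score a candidate site against that background. This is a minimal sketch under stated assumptions, not PhyloGibbs code or its file formats: all function names, the pseudocount, the uniform fallback for unseen contexts, and the example sequences and matrix are hypothetical, and input is assumed to be uppercase A, C, G, T only.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_background(sequences, order, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from background sequences."""
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, pseudocount))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    model = {}
    for context, base_counts in counts.items():
        total = sum(base_counts.values())
        model[context] = {b: c / total for b, c in base_counts.items()}
    return model


def background_log_prob(seq, start, width, model, order):
    """Log probability of seq[start:start + width] under the background model."""
    uniform = {b: 1.0 / len(ALPHABET) for b in ALPHABET}
    logp = 0.0
    for i in range(start, start + width):
        # Assumption: bases without a full context, and contexts never seen
        # during training, fall back to a uniform distribution.
        context = seq[i - order:i] if i >= order else None
        logp += math.log(model.get(context, uniform)[seq[i]])
    return logp


def wm_log_prob(window, wm):
    """Log probability of a window under a weight matrix (one dict per column)."""
    return sum(math.log(column[base]) for column, base in zip(wm, window))


def log_odds(seq, start, wm, model, order):
    """Log ratio of motif to background probability for one candidate site."""
    width = len(wm)
    window = seq[start:start + width]
    return wm_log_prob(window, wm) - background_log_prob(seq, start, width, model, order)


def peaked_column(base, p=0.85):
    """Hypothetical weight matrix column that strongly prefers one base."""
    rest = (1.0 - p) / (len(ALPHABET) - 1)
    return {b: (p if b == base else rest) for b in ALPHABET}


# Made-up background sequences and a made-up weight matrix preferring "TGAC".
background = ["ACGTACGTTAGCATCGATCGGATTACA", "TTGACCATGCAAGTCCGATAGCTA"]
model = train_background(background, order=1)
wm = [peaked_column(b) for b in "TGAC"]

seq = "CCTGACGG"
scores = [log_odds(seq, s, wm, model, order=1) for s in range(len(seq) - len(wm) + 1)]
best = max(range(len(scores)), key=scores.__getitem__)
print("best start:", best, "log-odds:", round(scores[best], 2))
```

    A positive log-odds value means the window is better explained by the weight matrix than by the background model. A sampler can use such ratios when deciding whether to place a site, and a weight matrix supplied as a prior biases the search toward windows resembling known sites.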

  3. Citing

    PhyloGibbs should be cited as:

  4. Download

    Please supply your name, institution, and email address when you download the code. The code is under active development, and this lets us keep you up to date with the latest versions, including bug fixes and newly implemented features. The source code is freely available under the GNU General Public License.

  5. Online Tools

    You can try our web interface to PhyloGibbs. The online tool supports all options available in the standalone version and offers a convenient way of displaying results.

    There is another online implementation of PhyloGibbs by the group of Dr. Bertie Goettgens. This tool is designed to search for transcription factor binding sites (TFBSs) specifically in the human and mouse genomes.

  6. Feedback

    Please report bugs and problems to us (erik.vannimwegen@unibas.ch). We welcome all feedback on the program. If there is a feature you would particularly like to see, please let us know. If you have successfully compiled the program on another platform, please let us know, and we will distribute the executable here with an acknowledgment.

  7. Acknowledgments

    The PhyloGibbs algorithm was developed by:

    The code was written by Rahul Siddharthan and Erik van Nimwegen.