PhyloGibbs
-
Overview
PhyloGibbs is an algorithm for discovering regulatory sites in a
collection of DNA sequences, including multiple alignments of
orthologous sequences from related organisms. Many existing approaches
search either for sequence motifs that are overrepresented in the
input data or for sequence segments that are more conserved
evolutionarily than expected. PhyloGibbs combines these two approaches
and identifies significant sequence motifs by taking both
over-representation and conservation signals into account.
PhyloGibbs runs on arbitrary collections of multiple local sequence
alignments of orthologous sequences. The algorithm searches over all
ways in which an arbitrary number of binding sites for an arbitrary
number of transcription factors can be assigned to the multiple
sequence alignments. These binding site configurations are scored by
a Bayesian probabilistic model that treats aligned sequences using an
explicit model for the evolution of binding sites and 'background'
intergenic DNA, which takes the phylogenetic relationship between the
species in the alignment into account. The algorithm uses simulated
annealing and Markov chain Monte Carlo sampling to rigorously assign
posterior probabilities to all the binding sites that it reports.
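The combination of annealing and sampling described above can be illustrated with a toy sketch in Python. This is not PhyloGibbs code: the scoring function, the single-window flip move, the cooling schedule, and all names (`log_score`, `anneal`, `track`, the `log_odds` values) are assumptions made for illustration. Each candidate window is given an independent, made-up log-odds score. The configuration is first annealed towards the highest-scoring set of windows, and the sampler is then run at unit temperature to count how often each window of that set is present.

```python
import math
import random

def log_score(config, log_odds):
    """Toy log-posterior: sum of the log-odds of all windows marked as sites."""
    return sum(lo for on, lo in zip(config, log_odds) if on)

def metropolis_step(config, log_odds, beta, rng):
    """Propose flipping one window; accept by the Metropolis rule at inverse temperature beta."""
    i = rng.randrange(len(config))
    proposal = list(config)
    proposal[i] = not proposal[i]
    delta = log_score(proposal, log_odds) - log_score(config, log_odds)
    if delta >= 0 or rng.random() < math.exp(beta * delta):
        return proposal
    return config

def anneal(log_odds, rng, steps=5000, beta_max=30.0):
    """Raise beta linearly from 1 to beta_max so the chain settles on a high-scoring configuration."""
    config = [False] * len(log_odds)
    for t in range(steps):
        beta = 1.0 + (beta_max - 1.0) * t / (steps - 1)
        config = metropolis_step(config, log_odds, beta, rng)
    return config

def track(reference, log_odds, rng, steps=50000):
    """Sample at beta = 1 and record how often each window of the reference set is present."""
    config = list(reference)
    counts = [0] * len(reference)
    for _ in range(steps):
        config = metropolis_step(config, log_odds, 1.0, rng)
        for i, on in enumerate(config):
            if on and reference[i]:
                counts[i] += 1
    return [c / steps for c in counts]

rng = random.Random(0)
log_odds = [3.0, -2.0, 1.0, -0.5, 2.0]    # hypothetical per-window log-odds scores
best = anneal(log_odds, rng)               # annealed reference configuration
posteriors = track(best, log_odds, rng)    # sampled frequency of each reference window

for i, (on, p) in enumerate(zip(best, posteriors)):
    if on:
        print(f"window {i}: estimated posterior {p:.2f}")
```

Because the toy windows are independent, each sampled frequency should approach the logistic function of that window's log-odds, which gives a simple check on the sampler. In a real motif finder the windows interact, so no such closed form is available and sampling is the practical way to obtain these probabilities.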
-
List of the most important features:
-
The algorithm can search for an arbitrary number of sites for
an arbitrary number of different regulatory motifs. The user can
either specify the total number of sites and motifs that PhyloGibbs
needs to search for, or supply PhyloGibbs with a guess for the
total number of sites and motifs in the data.
-
The algorithm rigorously takes into account the phylogenetic
relationships of the species from which the input data derive. This allows
PhyloGibbs to distinguish conservation that is due to the occurrence
of functional sites from spurious conservation that is due to the
evolutionary proximity of the species. Example phylogenetic trees for
commonly used species can be downloaded from the
download page.
-
PhyloGibbs uses an anneal+track strategy that rigorously
assigns posterior probabilities to the sites it reports. In the anneal
stage, the set of binding sites with the globally maximal posterior
probability is identified. In the tracking stage, the posterior
probabilities of these sites are calculated.
-
The program can also be used to calculate the statistical
significance of a pre-specified set of putative binding sites.
-
Background probabilities for nonfunctional sequences are
implemented as Markov models of arbitrary order (to be specified by
the user). Background models can be calibrated from externally
supplied files with background sequences.
-
Users can specify informative priors for the motifs by
supplying an external file with weight matrices. This allows the
algorithm to automatically identify new binding sites for motifs for
which one or more binding sites are already known.
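The last two features above, Markov background models and weight-matrix priors, can be illustrated together with a short Python sketch. It is not taken from the PhyloGibbs source and does not use its file formats: the function names, the pseudocount smoothing, the toy sequences, and the weight matrix are all assumptions. The sketch estimates an order-k Markov background model from example sequences and then scores each window of a query sequence by its log-odds under a weight matrix versus the background.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_background(sequences, order, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from background sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1.0

    def prob(context, base):
        # Unseen contexts fall back to a uniform distribution via the pseudocounts.
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob

def background_log_prob(seq, start, length, prob, order):
    """Log-probability of seq[start:start+length] under the background model."""
    total = 0.0
    for i in range(start, start + length):
        # Near the start of the sequence the context is shorter than `order`,
        # so it is never found in the counts and the base is scored as uniform.
        context = seq[max(0, i - order):i]
        total += math.log(prob(context, seq[i]))
    return total

def site_log_odds(seq, start, wm, prob, order):
    """Log-odds of a window being drawn from the weight matrix rather than the background."""
    motif = sum(math.log(col[seq[start + j]]) for j, col in enumerate(wm))
    return motif - background_log_prob(seq, start, len(wm), prob, order)

def best_site(seq, wm, prob, order):
    """Return the start and score of the highest-scoring window."""
    scores = [site_log_odds(seq, s, wm, prob, order)
              for s in range(len(seq) - len(wm) + 1)]
    start = max(range(len(scores)), key=scores.__getitem__)
    return start, scores[start]

# Hypothetical toy data, for illustration only.
background_seqs = ["ACGTACGTTGCAAGCTTAGC", "TTGACCATGGCATCGATCGA"]
order = 1
prob = train_background(background_seqs, order)

# Weight matrix with consensus TGAC: 0.85 for the consensus base, 0.05 otherwise.
wm = [{b: (0.85 if b == c else 0.05) for b in ALPHABET} for c in "TGAC"]

best_start, best_score = best_site("ACGTTTGACGCA", wm, prob, order)
print(best_start, round(best_score, 2))
```

Raising `order` makes the background model sensitive to longer sequence contexts, at the cost of needing more background sequence to estimate the larger number of parameters reliably.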
-
Citing
PhyloGibbs should be cited as:
-
Download
Please supply your name, institution, and email address when you
download the code. The code is actively being developed and this way
we can keep you up to date with the latest versions including bug
fixes and newly implemented features. The source code is freely
available under the
GNU General Public License.
-
Linux executable with static libraries
tar-archive containing the binary, man pages, and examples of usage with
example output. Should run as is on most Linux systems.
-
Linux executable with dynamic libraries
tar-archive containing the binary, man pages, and examples of usage with
example output. Requires the glib and GSL libraries to be installed.
-
Windows executable
zip archive containing the phylogibbs.exe executable (compiled under
Cygwin on Windows XP) and the required DLLs from Cygwin. Also contains
man pages and usage examples.
-
Mac OS X executable
archive with the executable, man pages, and usage examples.
-
Source code
tar-archive, including instructions on compiling and usage
(start with README), and example output. Requires the
GNU Scientific Library (GSL) and glib libraries and headers to be
installed (standard on most Linux systems). Should compile on most
Unix-like systems, and on Microsoft Windows in the Cygwin environment.
You can also try our web interface to PhyloGibbs.
-
Online Tools
You can try our web interface to PhyloGibbs. The online tool supports
all options available in the standalone version and offers a convenient
way of displaying results.
There is another online implementation of PhyloGibbs by the group of
Dr. Bertie Goettgens. This tool is designed to search for transcription
factor binding sites (TFBS) specifically in the human and mouse genomes.
-
Feedback
Please report bugs and problems to us
(erik.vannimwegen@unibas.ch). We welcome all
feedback on the program. If there is a feature you would particularly
like to see, please let us know. If you have successfully compiled the
program on another platform, please let us know and we will distribute
the executable here with an acknowledgment.
-
Acknowledgments
The PhyloGibbs algorithm was developed by:
The code was written by Rahul Siddharthan and Erik van Nimwegen.