Research group Erik van Nimwegen
Structure, function, and evolution of genome-wide regulatory networks
Most projects that we pursue concern the functioning and evolution of genome-wide regulatory systems in organisms ranging from bacteria to humans. The type of large-scale questions that we aim to address include understanding how regulatory systems are integrated on a genome-wide scale, how regulatory networks are structured, how these systems control and potentially exploit the inherent noise in gene regulatory processes, how stable cell types are defined and maintained, how gene regulation evolves, and understanding under what conditions regulatory complexity can be expected to increase in evolution. Another major topic of interest in our group is the discovery and analysis of quantitative laws of genome evolution. We are particularly interested in using high-throughput genomic data to put theories of genome evolution on a strictly empirical footing.
Our group pursues both theoretical/computational and experimental approaches and our projects can be roughly divided into'dry lab' projects in which we develop regulatory genomics tools for studying gene regulatory networks in higher eukaryotes, and 'wet lab' projects in which we study gene regulation at the single-cell level in E. coli.
Regulatory genomics tools: from binding site constellations to genome-wide regulatory programs
In spite of the large volumes of data gathered by high-throughput measurements technologies in recent years, our knowledge and understanding of the genome-wide regulatory networks that control gene expression remains extremely fragmentary. We are still very far from being able to do meaningful quantitative modeling of these gene regulatory networks. Consequently, and understandably, the focus in most of the regulatory genomics community has remained firmly on data gathering efforts. In our opinion, this continuing primacy of experimental data gathering has led to a fairly careless attitude toward data analysis methods in many studies. We feel that the lack of rigorous and robust methodologies for interpreting high-throughput data is currently a major stumbling block in regulatory genomics research, and our group aims to help remedy this situation by developing analysis tools that use methods of the highest rigour, that can be automatically applied to raw data-sets, and that provide robust predictions with concrete biological interpretations. We typically strive to make our tools available through automated interactive web-services, thereby empowering experimental researchers to perform cutting-edge analysis methods on their own data.
For over a decade we have been developing Bayesian probabilistic methods that combine information from high-throughput experiments (e.g. RNA-seq, ChIP-seq) with comparative genomic sequence analysis. Very roughly speaking, our projects concern identifying regulatory sites genome-wide in DNA and RNA, understanding how constellations of regulatory sites determine binding patterns of transcription factors and, ultimately, gene expression and chromatin state patterns. Finally, we aim to develop quantitative and predictive models that describe how dynamic interactions between transcriptional and post-transcriptional regulators implement gene regulatory programs that define cellular states and the transitions between them.
With regards to the predictions of regulatory sites in DNA and RNA, we have been working on extending the well-known position-specific weight matrix models of transcription factor (TF) binding specificity into dinucleotide weight tensor models that take arbitrary dependencies between pairs of positions into account. We have also been developing a completely automated procedure, called CRUNCH for analysis of ChIP-seq data, including comprehensive downstream motif analysis of binding regions. Using CRUNCH we have analyzed large-collections of ChIP-seq datasets in humans, mouse, and Fly, and have been using these to curate comprehensive sets of regulatory motifs in these model organisms. All our genome-wide regulatory site predictions are available in various formats through our SwissRegulon database and genome browser.
To take a first step toward modeling how constellations of regulatory sites determine genome-wide expression patterns we developed an approach, called Motif Activity Response Analysis (MARA), that models the expression of each gene as a linear function of the binding sites that occur in its promoter, and unknown 'motif activities' that represent the condition-dependent activities of the regulators binding to these sites. Since the original presentation of this approach, in the FANTOM4 collaboration with the RIKEN Institute in Yokohama, Japan, we have been working both on completely automating the MARA approach and on extending it in a number of ways, including using MARA to model chromatin dynamics in terms of local constellations of regulatory sites. MARA is now available through a fully automated webserver, called ISMARA (integrated system for motif activity response analysis), available at ismara.unibas.ch , where users can perform automated motif activity analysis of their micro-array, RNA-seq, or ChIP-seq data, simply by uploading raw data (Fig. 1). The system has already been successfully used to predict key regulatory interactions in substantial number of studies, and we are working on various further improvements and extensions of the system. This includes extension to additional model organisms such as Drosophila and E. coli, significantly extending the set of regulatory motifs that it uses, and incorporation of distal cis-regulatory modules. In addition, several of ISMARA's recent applications involve systems that are highly medically relevant and we plan to adapt ISMARA in ways that aim to increase its medical relevance. In particular, we want to extend ISMARA to allow it to infer the effects of single nucleotide polymorphisms on gene expression and regulatory programs genome-wide. Finally, we are currently working to adapting ISMARA to be able to analyze data from single-cell RNA-seq experiments.
Gene regulatory dynamics at the single-cell level in bacteria
Since 2010 our group also includes a wet lab component where we study gene regulation at the single-cell level in E. coli. We are particularly interested in stochastic aspects of gene regulation at the single-cell level, how fluctuations in the physiological state of the cell couple to gene expression fluctuations, how stochastic fluctuations propagate through the regulatory network, and the role of stochasticity in the evolution of gene regulation.
In the first major wet lab project of our lab, we set out to study how natural selection has shaped the noise characteristics of E. coli promoters. In particular, using FACS selection in combination with error-prone PCR, and starting from a large collection of random sequences, we evolved a collection of synthetic promoter sequences in the lab. Surprisingly, we found that these synthetic promoters generally exhibit noise levels that are lower than those of most native E. coli promoters. In particular, native promoters that are known to be regulated by one or more TFs tend to exhibit elevated noise levels. Since our synthetic promoters were selected solely on their mean expression and not on their noise properties, this allowed us to conclude that native promoters of regulated genes must have experienced selection pressures that caused their noise levels to increase.
To explain these observations we developed a new theory for the evolution of gene regulation that calculates the ‘fitness’ of a promoter as a function of its coupling to transcriptional regulators and the noise levels of these regulators (Fig. 2). This analysis shows that noise propagation from regulators to their target genes can often be functional, acting as a rudimentary form of regulation. In particular, whenever regulation has limited accuracy in implementing a promoter's desired expression levels, selection favors noisy gene regulation. The theory provides a novel framework for understanding when and how gene regulation will evolve, and shows that expression noise generally facilitates the evolution of gene regulatory interactions.
The results of this project suggest that gene expression noise to a significant extent results from propagation of noise from regulators to their target genes. This in turn implies that noise levels should be condition-dependent and we are currently investigating how noise levels change as growth conditions change.
A microfluidic framework for studying single-cell gene expression and growth dynamics
Beyond studying single-cell gene expression using flow cytometry, we aim to directly track both the growth and gene expression dynamics of single-cells as they are experiencing changing growth conditions. To this end we have worked on establishing a micro-fluidic setup in our lab that allows us to track growth of single cells and expression of fluorescent reporters in these cells using time-lapse microscopy. Our novel design is an extension of the so-called Mother Machine design (Fig. 3), which allows us to dynamically mix different growth media, thereby allowing us to arbitrarily vary the growth conditions that the cells experience. In addition, through a successful collaboration with the group of Gene Myers (MPI, Dresden), we have developed image analysis procedures that automatically segment and track the cells, allowing us to obtain accurate tracking of size and gene expression in a large number of single cell lineages.
Using this microfluidic setup we are currently studying the stochastic response of single E. coli cells to switches in growth conditions.
Integrated Genotype/Phenotype evolution in E. coli
The availability of large numbers of complete genome sequences has led, over the last 15 years, to a revolution in our understanding of genome evolution and the identification of a number of surprising ‘quantitative laws’ of genome evolution. However, whereas the insights gained from analysis of genomic data have been impressive, they have taught us surprisingly little about what selective pressures in the wild are driving genotype dynamics. In this project we aim to learn about selection pressures that are acting in the wild by combining information on genotype evolution in closely related bacterial strains with extensive quantitative characterization of their phenotypes. In particular, using next-generation sequencing we have determined complete genomes of 91 wild E. coli isolates that were all obtained from a common location at the shore of lake Superior (Minnesota, USA). In parallel we have been characterizing the phenotypes of these strains by assessing their growth in a wide variety of conditions using automated image-analysis of cultures on agar plates. We have started developing theoretical models to describe the joint evolution of genotypes and phenotypes of the strains. We are aiming in particular to develop rigorous quantitative measures of the extent to which different phenotypic traits have been under natural selection in the history of these strains, and to infer how this has impacted their genomes. As part of this project we have also recently developed a new method, called REALPHY, for automatically inferring phylogenies from raw next-generation sequencing data.