Epi-MARA results and supplementary materials

Arnold, Schöler et al. "Modeling of epigenome dynamics identifies transcription factors that mediate Polycomb targeting"

Epi-MARA models the dynamics of chromatin marks (e.g. H3K27me3, H3K27ac, ...) across different samples in terms of predicted transcription factor binding sites (TFBSs) and infers the activities of TFs in driving chromatin dynamics.

We make Epi-MARA available through an easy-to-use web interface where users can upload chromatin mark data, have the Epi-MARA analysis performed automatically, and browse the results through a graphical interface. Here we provide step-by-step instructions on the use of the Epi-MARA web interface. First, we show how the different Epi-MARA analyses in "Modeling of epigenome dynamics during differentiation reveals REST as a mediator of Polycomb targeting in neuronal progenitors" can be reproduced using our web interface. Second, we demonstrate Epi-MARA's application to another epigenome data-set, in which H3K27ac marks were measured across adipocyte differentiation. All data-sets analyzed here derive from mouse.

The Epi-MARA web interface can be used in two ways:

  1. Automated mode (tab ChIP-seq): Allows you to upload ChIP-seq data (bed files) for a particular chromatin mark across different samples, which are then used to run Epi-MARA in a fully automated fashion. In this mode the ChIP-seq data are used to estimate the levels of the chromatin mark at all promoter regions genome-wide [1] in each input sample, and Epi-MARA models these chromatin dynamics in terms of predicted binding sites for a large set of mammalian regulatory motifs [2].
  2. Expert mode (tab Expert): In expert mode Epi-MARA can be run on the chromatin dynamics across an arbitrary set of regions using an arbitrary set of predicted transcription factor binding sites. The user provides both a file with chromatin levels across regions and samples and a file of transcription factor binding sites across the regions. For a detailed description of the file formats, see the section "Epi-MARA Expert mode: File format" at the bottom of this page.

A detailed description of how the Epi-MARA results are presented on the web page can be found here.

Epi-MARA analysis on promoters for H3K27me3 in ES, NP, and TN

The initial analysis of H3K27me3 dynamics in our neurogenesis system was performed using ChIP-chip measurements. H3K27me3 levels were measured at mouse promoter regions in embryonic stem cells (ES), neuronal progenitors (NP), and terminal neurons (TN). We processed the raw ChIP-chip data to obtain a table of H3K27me3 levels (log intensities) at promoter regions across the 3 differentiation stages which is contained in the first file below. The table of predicted transcription factor binding sites (number of sites for each region and each motif) for these regions is provided in the second file. These two files can be directly uploaded to the Epi-MARA web interface using the expert mode to obtain the analysis on the H3K27me3 data at promoters.

Epi-MARA's results on this data-set can be found here.

Genome-wide Epi-MARA analysis on H3K27me3-enriched regions in ES, NP, and TN

This section describes the Epi-MARA analysis done on genome-wide regions that are enriched for H3K27me3 (combining data from the ES, NP, and TN stages). Regions were separated into high-CpG and low-CpG classes and we allowed a given motif to have different activities at high-CpG and low-CpG regions. That is, we consider both a high-CpG and low-CpG version of each motif so that, in the table of predicted binding site counts, high-CpG regions have nonzero counts only for high-CpG motifs, and low-CpG regions have nonzero counts only for low-CpG motifs. The two files below contain the H3K27me3 levels in the H3K27me3-enriched regions across the three stages and the corresponding predicted binding sites for all TFs in these regions.

These two files can be directly uploaded to Epi-MARA using the expert mode to obtain the analysis on the genome-wide H3K27me3 data.

The results of Epi-MARA's analysis for this data can be found here. Each motif appears twice, once for its occurrences in high-CpG H3K27me3-enriched regions (motifname__high) and once for its occurrences in low-CpG H3K27me3-enriched regions (motifname__low).
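The high-CpG/low-CpG split of the sitecount table described above can be sketched as follows. This is a minimal illustration and not the code used for the published analysis: the function name, the input dictionaries, the region IDs, and the motif names motifA and motifB are all hypothetical. Only the column suffixes __high and __low are taken from the text above.

```python
def split_by_cpg_class(site_counts, cpg_class):
    """Duplicate every motif into a __high and a __low column.

    site_counts: dict mapping region ID -> dict of motif name -> site count
    cpg_class:   dict mapping region ID -> "high" or "low"
    (Both input structures are assumptions made for this sketch.)

    A region keeps its counts only in the columns of its own CpG class;
    the columns of the other class are set to zero.
    """
    classes = ("high", "low")
    motifs = sorted({m for counts in site_counts.values() for m in counts})
    columns = [f"{motif}__{cls}" for motif in motifs for cls in classes]

    rows = {}
    for region, counts in site_counts.items():
        region_class = cpg_class[region]
        rows[region] = [
            counts.get(motif, 0) if cls == region_class else 0
            for motif in motifs
            for cls in classes
        ]
    return columns, rows


# Hypothetical example: one high-CpG and one low-CpG region.
site_counts = {
    "region1": {"motifA": 2, "motifB": 1},
    "region2": {"motifA": 3},
}
cpg_class = {"region1": "high", "region2": "low"}

columns, rows = split_by_cpg_class(site_counts, cpg_class)
print("\t".join(columns))
for region, counts in rows.items():
    print(region + "\t" + "\t".join(str(c) for c in counts))
```

Because each region contributes only to the columns of its own class, a model fitted to this table can assign a separate activity to the high-CpG and the low-CpG version of every motif.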

Epi-MARA analysis on promoters for H3K27ac in proliferating cells (day −2), confluent (day 0) preadipocytes, immature adipocytes (day 2), mature adipocytes (day 7)

We next demonstrate Epi-MARA's analysis of another epigenome data-set. In particular, we analyzed genome-wide H3K27ac data across adipocyte differentiation from [3]. The authors provided files with aligned reads. Using a simple Perl script, we converted these into bed files (click here to download). These bed files can be directly uploaded to our Epi-MARA webpage and run in automated mode to obtain the analysis of the H3K27ac dynamics at promoters. The results of Epi-MARA's analysis on this data-set can be found here.

Genome-wide Epi-MARA analysis on H3K27ac-enriched regions in proliferating cells (day −2), confluent (day 0) preadipocytes, immature adipocytes (day 2), mature adipocytes (day 7)

As the H3K27ac mark is generally associated with distal enhancers, it is probably more relevant for this data-set to analyze the dynamics of H3K27ac levels across regions genome-wide. For the genome-wide H3K27ac analysis, we took all H3K27ac-enriched regions reported by [3]. For these regions, we predicted transcription factor binding sites and calculated the corresponding H3K27ac levels across the stages. We again summarized the H3K27ac levels for all regions and all stages, and the transcription factor binding site predictions for all regions and all motifs, in two tables, which are provided below. These files can be directly uploaded to the Epi-MARA webpage (expert mode).

The results of Epi-MARA's analysis of this data-set can be found here.

Epi-MARA Expert mode: File format

To use the "Expert mode" two files have to be provided: a table containing the epigenomic signal across samples and a table containing the predicted transcription factor binding sites. The two tables must have the following format:

Signal table format:

  • The first line contains a tab-separated list of names of the samples.
  • Each following line corresponds to a genomic region. The first column on the line corresponds to the name/ID of the region.
  • All other (tab-separated) columns correspond to the epigenomic signal levels (typically log-intensities) across the samples.

Sitecount table format:

  • The first line contains a tab-separated list of motif names.
  • Each following line corresponds to a genomic region. The first column on the line corresponds to the name/ID of the region.
  • All other (tab-separated) columns correspond to the total number of predicted binding sites for each of the motifs.

Note that the number of regions and their names/IDs must match between the two uploaded files, and the order of the names/IDs should be the same in both files.
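Before uploading, the two tables can be checked against the format rules above with a short script. The following is a sketch under stated assumptions and not part of Epi-MARA: the function names are hypothetical, the example values are made up, and the header line is assumed to list only the sample or motif names, so that it has one field fewer than the data lines.

```python
def parse_table(text):
    """Parse a tab-separated table into (header, region IDs, value rows).

    Assumption: the header names only the value columns, while each
    data line starts with the region name/ID.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    regions, values = [], []
    for line in lines[1:]:
        fields = line.split("\t")
        regions.append(fields[0])
        values.append([float(x) for x in fields[1:]])
    return header, regions, values


def check_pair(signal_text, sitecount_text):
    """Return a list of format problems (an empty list means no problem found)."""
    problems = []
    region_lists = {}
    for name, text in (("signal", signal_text), ("sitecount", sitecount_text)):
        header, regions, values = parse_table(text)
        for region, row in zip(regions, values):
            if len(row) != len(header):
                problems.append(
                    f"{name}: region {region} has {len(row)} values, "
                    f"expected {len(header)}"
                )
        region_lists[name] = regions
    # Same regions, same number, same order in both files.
    if region_lists["signal"] != region_lists["sitecount"]:
        problems.append("region IDs differ or are in a different order")
    return problems


# Hypothetical example tables (sample names ES, NP, TN as in the text).
signal = "ES\tNP\tTN\nregion1\t1.2\t0.8\t0.1\nregion2\t0.3\t0.9\t1.5\n"
sitecount = "motifA\tmotifB\nregion1\t2\t0\nregion2\t0\t1\n"
swapped = "motifA\tmotifB\nregion2\t0\t1\nregion1\t2\t0\n"

print(check_pair(signal, sitecount))
print(check_pair(signal, swapped))
```

The second call reports a problem because the regions in the sitecount table appear in a different order than in the signal table.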

See the section “Epi-MARA analysis on promoters for H3K27me3 in ES, NP, and TN” for example files.

Please provide only tab-separated, ASCII-encoded files. The files may be compressed with gzip or bzip2.
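To inspect a compressed table locally before upload, it can be opened as ASCII text with the Python standard library. This is a minimal sketch: choosing the decompressor from the file extension is an assumption made here for illustration (how the web interface itself detects compression is not described), and the helper name and file name are hypothetical.

```python
import bz2
import gzip
import os
import tempfile


def open_table(path):
    """Open a plain, gzip- or bzip2-compressed table as ASCII text.

    Reading a file that contains non-ASCII bytes raises UnicodeDecodeError,
    which makes encoding problems visible before upload.
    """
    if path.endswith(".gz"):
        opener = gzip.open
    elif path.endswith(".bz2"):
        opener = bz2.open
    else:
        opener = open
    return opener(path, "rt", encoding="ascii")


# Round trip with a small, made-up signal table.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "signal.txt.gz")
    with gzip.open(path, "wt", encoding="ascii") as handle:
        handle.write("ES\tNP\tTN\nregion1\t1.2\t0.8\t0.1\n")
    with open_table(path) as handle:
        first_line = handle.readline().rstrip("\n")

print(first_line.split("\t"))
```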

References:

1: Methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepCAGE data. Balwierz PJ, Carninci P, Daub CO, Kawai J, Hayashizaki Y, Van Belle W, Beisel C, van Nimwegen EJ. Genome Biol. 2009 Jul 22;10(7):R79

2: The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line. FANTOM Consortium, Riken Omics Science Center. Nat Genet. 2009 May;41(5):553-62

3: Comparative epigenomic analysis of murine and human adipogenesis. Mikkelsen TS, Xu Z, Zhang X, Wang L, Gimble JM, Lander ES, Rosen ED. 2010