Abstract

In this vignette, we will be discovering more advanced functions and possibilities of this package. You may want to read the basics functions first in order to understand the principles of enrhciment analysis.

Loading a catalogue

You have a choice of either loading the ReMap catalogue, or loading your own catalogue of peaks/regions.

Download the ReMap catalogue

There is 4 different versions of the ReMap catalogue. The current version of ReMap 2018 [1] and the previous ReMap 2015 release [2]. Both are declined in hg19 or hg38 assemblies. This function will download the files locally.

# Create a local directory for the tutorial
demo.dir <- "~/ReMapEnrich_demo"
dir.create(demo.dir, showWarnings = FALSE, recursive = TRUE)

# Use the function DowloadRemapCatalog
remapCatalog2018hg38 <- downloadRemapCatalog(demo.dir)

# Or download other versions
remapCatalog2015hg19 <- downloadRemapCatalog(demo.dir, version = "2015", assembly = "hg19")

The genomic ranges object remapCatalog now contains the full ReMap 2015 catalogue[2]. A file has also been downloaded at “~/ReMapEnrich_demo/remap1_hg38_nrPeaks.bed” (0.544 GB when not compressed).

Load the ReMap catalogue

# Load the ReMap catalogue and convert it to Genomic Ranges
remapCatalog <- bedToGranges(remapCatalog2018hg38)

Loading a custom catalogue

Another funtionnality of this package is the ability to load a custom catalogue instead of ReMap. This is quite simple starting from a BED file as your catalogue. In this example, we load a custom catalogue present in extdata/, this is just a BED file for chr22 as dummy example.

# Load the ReMapEnrich library
library(ReMapEnrich) 

# Load the example catalogue as a BED file.
catalogFile <- system.file("extdata",
                                    "ReMap_nrPeaks_public_chr22.bed",
                                    package = "ReMapEnrich")
catalog <- bedToGranges(catalogFile)

Download ENCODE peaks

We have added a fucntionnality to direclty fetch ENCODE peaks. With ReMapEnrich it is possible to download genomic regions of any ENCODE experiments using an ENCODE ID for a bed file (eg. ENCFF001VCU). In this example we download a BED file for the H3K27 histone mark in MCF-7 cell line.

# Downloading the ENCFF001VCU regions.
ENCFF001VCU <- bedToGranges(downloadEncodePeaks("ENCFF001VCU", demo.dir))

The ENCFF001VCU variable now contains all the regions of the given experiment a GRange object. It is possible to use it in future enrichment analysis.

Compute enrichment

The basic way to compute an enrichment is to run with default parameters. - no universe - single core - Default shuffling - defautl overlaps

enrichment.df <- enrichment(ENCFF001VCU, remapCatalog)
head(enrichment.df)

Using a universe

Enrichment analysis often surestimate the p-values. Using a universe is setting constraints on the shuffling function, resulting in more reasonable probabilities. The universe is simply another set of genomic regions, in a GeomicRanges format, that will prevent shuffles to take place outside of it. Using a universe will reduce the analysis to certain portions of the genome. However the ReMapEnrich package can also work without a universe, in this case it will use the entire genome.

Choosing an universe

One of the most diffcult task in using a universe is choosing one that is relevant for you data. No methods are perfect, once you understand what is the purpose of using a universe, it is be up to you to specify which one to use in order to accomplish a specific analysis. But in general, the universe could be understood as the set of regions that could have been used as query.

Enrichment using a universe

# Download a universe.
universe <- bedToGranges(downloadEncodePeaks("ENCFF718QVA", demo.dir))
# Convert ReMap to GRanges
remapCatalog <- bedToGranges(remapCatalog2018hg38)
# Create the enrichment with the universe.
enrichment.df <- enrichment(ENCFF001VCU, remapCatalog, universe, nCores=2)

The data frame enrichment.df now contains the enrichment informations between ENCFF001VCU and the ReMap catalogue given ENCFF718QVA as a universe.

For more permissive universe you can use the parameter included as the fraction of shuffled regions that must be at least contained in universe regions (1 by default). For example included = 0.1 would allow at least 10% of shuffled regions to be within the universe.

# Create the enrichment with a less restrictive universe.
enrichment.df <- enrichment(ENCFF001VCU, remapCatalog, universe, included = 0.1, nCores=2)
# 90% of the shuffled regions can now be outside of the universe regions.

Shuffling regions occurs in the whole genome but it is possible to restrict shuffles to occur only in the chromosome they originate with the parameter byChrom (FALSE by default).

# Create the enrichment with a less restrictive universe.
enrichment.df <- enrichment(ENCFF001VCU, remapCatalog, universe, included = 0.1, byChrom = TRUE, nCores=2)
# 90% of the shuffled regions can now be outside of the universe regions.
# The shuffled regions are still in the same chromosome where they came from.

Shuffling and random generation

As the most basic enrichment function uses random generations in order to estimate the p-values, shuffling functions and random generations are available in this package. In this two functions, it is possible to use all the parameters already mentioned (universe, included, byChrom).

Shuffling

Shuffling regions consists in randomly reordering the positions of the query regions within a genome.

# Shuffling ENCFF001VCU
shuffledENCFF001VCU <- shuffle(ENCFF001VCU, universe = universe, byChrom = TRUE)

Random generations

It is sometimes useful to generate purely random regions, for negatives controls for example.

# Generate 100 random regions with a size of 1000 bases pair.
randomRegions <- genRegions(100, 1000)

Other assemblies

For now, only the hg38 assembly has been used by default. It is important to know how to make enrichment analysis with other genomes. All functions in the ReMapEnrich package that uses shuffles or random regions generation must ackowledge the sizes of the chromosomes of the species in consideration.

It is possible to load the chromosomes sizes of hg38 in one function.

hg38ChromSizes <- loadChromSizes("hg38")

But you may want to download other assemblies from the UCSC database.

# Example with rn5
rn5ChromSizes <- downloadUcscChromSizes("rn5")

# Creation of random regions in the rattus norvegicus genome.
randomRegions <- genRegions(100, 1000, rn5ChromSizes)

# Shuffling of regions in the rattus norvegicus genome.
shuffledRegions <- shuffle(randomRegions, rn5ChromSizes)

# Species relevant to current and future ReMap releases
hg38ChromSizes <- downloadUcscChromSizes("hg38")
hg19ChromSizes <- downloadUcscChromSizes("hg19")
mm10ChromSizes <- downloadUcscChromSizes("mm10")
dm6ChromSizes <- downloadUcscChromSizes("dm6")

# Maybe one day
ce11ChromSizes <- downloadUcscChromSizes("ce11")
rn5ChromSizes <- downloadUcscChromSizes("rn5")

References

1. Chèneby J, Gheorghe M, Artufel M, Mathelier A, Ballester B. ReMap 2018: an updated atlas of regulatory regions from an integrative analysis of DNA-binding ChIP-seq experiments. Nucleic acids research. 2018;46:D267–75.

2. Griffon A, Barbier Q, Dalino J, Helden J van, Spicuglia S, Ballester B. Integrative analysis of public ChIP-seq experiments reveals a complex multi-cell regulatory landscape. Nucleic acids research. 2015;43:e27–7.

ReMapEnrich: Advanced usage

Martin Mestdagh, Zacharie Ménétrier

2019-05-03