I have been trying to use rtracklayer and GenomicRanges to find the overlaps between BED files that document different ChIP-seq data, e.g. nucleosome coverage, CTCF coverage, etc.
I then want a CSV file that documents the overlapping regions and their occupancy levels, e.g.:
"","seqnames","start","end","width","strand","ctcf","H3k27ac","H3k36me3","H3k4me1","H3k79me2"
"1","chr21",9419882,9419981,100,"+", 1.04, 3.32, 1, 0.16, 6.76
"2","chr21",9424072,9424171,100,"+", 1.84, 2, 4.64, 4, 4.52
"3","chr21",9426516,9426560,45,"+", 1.44, 2.92, 1, 1.16, 2.4
The script, based on a previous post (https://support.bioconductor.org/p/89510/#89521), goes as follows:
library(bigmemory)  # note: not actually used anywhere below
library(rtracklayer)
library(GenomicRanges)
Nuc_hESC<-import('Nucleosome_coverage.bed') # file too big for processing
ctcfe<-import('CtcfStdSig.bed')
H3k27ace<-import('H3k27acStdSig.bed')
H3k36me3e<-import('H3k36me3StdSig.bed')
H3k4me1e<-import('H3k4me1StdSig.bed')
H3k79me2e<-import('H3k79me2StdSig.bed')
dj <- disjoin(c(Nuc_hESC, ctcfe, H3k27ace, H3k36me3e, H3k4me1e, H3k79me2e))
# keep only the disjoint pieces covered by every track
djhESC <- dj[(dj %over% Nuc_hESC) & (dj %over% ctcfe) & (dj %over% H3k27ace) &
             (dj %over% H3k36me3e) & (dj %over% H3k4me1e) & (dj %over% H3k79me2e)]
mcols(djhESC) <- DataFrame(Nuc=NA, ctcf=NA, H3k27ac=NA, H3k36me3=NA, H3k4me1=NA, H3k79me2=NA)
annotate <- function(dj, gr, column) {
  # copy the 'score' of each interval in 'gr' into the given metadata
  # column of 'dj', for every range of 'dj' that it overlaps
  hits <- findOverlaps(dj, gr)
  mcols(dj)[queryHits(hits), column] <- mcols(gr)[subjectHits(hits), "score"]
  dj
}
dj_annotatedhESC<-annotate(djhESC, Nuc_hESC, "Nuc")
Error: cannot allocate vector of size 5.8 Gb  # debilitating error
dj_annotatedhESC<-annotate(dj_annotatedhESC, ctcfe, "ctcf")
dj_annotatedhESC<-annotate(dj_annotatedhESC, H3k27ace, "H3k27ac")
dj_annotatedhESC<-annotate(dj_annotatedhESC, H3k36me3e, "H3k36me3")
dj_annotatedhESC<-annotate(dj_annotatedhESC, H3k4me1e, "H3k4me1")
dj_annotatedhESC<-annotate(dj_annotatedhESC, H3k79me2e, "H3k79me2")
write.csv(as.data.frame(dj_annotatedhESC), "occup_embryo.csv")
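As an aside, I realise findOverlaps() can return more than one hit per disjoint range, in which case the assignment inside annotate() just keeps whichever hit comes last. If averaging the overlapping scores is actually wanted, I imagine a variant along these lines would do it (untested, and annotate_mean is my own name):

annotate_mean <- function(dj, gr, column) {
  hits <- findOverlaps(dj, gr)
  # average the scores of all intervals in 'gr' hitting each range of 'dj'
  avg <- tapply(mcols(gr)[subjectHits(hits), "score"], queryHits(hits), mean)
  mcols(dj)[as.integer(names(avg)), column] <- as.numeric(avg)
  dj
}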
The allocation error above, in the first annotate() call, is due to the sheer size of the nucleosome coverage object. I tried to get around it with the command memory.limit(.....), requesting different amounts of memory, but this does not work.
I did come up with a temporary workaround: shrinking the nucleosome coverage file by keeping only the best-scoring ChIP-seq reads (roughly as in the first sketch below). But is there a proper way around this, i.e. can I somehow request more memory for R, bearing in mind that I am working on a Linux cluster?
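For reference, the stopgap filtering I mention looks roughly like this (the 0.9 quantile is an arbitrary cutoff, purely for illustration):

# keep only the top 10% of nucleosome intervals by score (arbitrary cutoff)
Nuc_hESC_top <- Nuc_hESC[score(Nuc_hESC) >= quantile(score(Nuc_hESC), 0.9)]

Alternatively, would annotating one chromosome at a time, to keep each Hits object small, be a sensible workaround? Something like this untested sketch (annotate_by_chrom is my own name, not an existing function):

annotate_by_chrom <- function(dj, gr, column) {
  for (chr in seqlevels(dj)) {
    idx <- which(seqnames(dj) == chr)   # positions of this chromosome within dj
    gr_chr <- gr[seqnames(gr) == chr]   # restrict the big track to one chromosome
    hits <- findOverlaps(dj[idx], gr_chr)
    mcols(dj)[idx[queryHits(hits)], column] <-
      mcols(gr_chr)[subjectHits(hits), "score"]
  }
  dj
}

Or is the only real fix to ask the cluster's job scheduler for more memory when submitting the R job?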