I am trying to get the read counts over a number of annotated and newly created features from S, cerevisiae so that I can detect differentially expressed regions.
The features are in a dozen or so txdbs GRanges I created and I have tens of bam files to deal with on a 8Gb RAM laptop. To accomplish this, I did the following:
- I created a BamFileList with all my bam files that are accessed in e^5 chunks
fls <- BamFileList(dir(pattern = ".bam$"), yieldSize=100000)
- I created a GRangesList containing all my GRanges
All<-GRangesList(c(gr1, gr2....., gr12)
- Passed them to summarizeOverlaps()
se=summarizeOverlaps(All, fls, mode="IntersectionNotEmpty", singleEnd=FALSE, ignore.strand = FALSE) saveRDS(se, file = "Allcounts.rds")
To my surprise, I got a RangeSummerizedExperiment Object that contains only one row with an aggregated number of counts for each bam file, which doesn't seem very useful...
sample1 sample2 ......sample85 counts counts counts
I expected a table-like format with counts/feature/GRanges
The question is whether I can de-convolute somehow the counts /feature/ GRanges.
A test in which I used a GRanges object containing 2 GRanges - GR=c(gr1, gr2) and a couple of bam files in a BamFileList seems to do the trick. The assay(se) object shows the expected counts/feature for each bam file, so expanding this approach to a BamFileList that contains all my files would be a solution to my problem. I am testing it right now...
> assay(se) file1.bam file2.bam [1,] 671 256 [2,] 250 178 [3,] 64 39
Thanks in advance for any suggestions/positive comments/tips.
Can you point to where you see the example
GRangesList(c(gr1, gr2))
? It doesn't look correct.http://www.bioconductor.org/packages//2.11/bioc/vignettes/GenomicRanges/inst/doc/summarizeOverlaps.pdf
See page 4
Thanks; note the '2.11' in the URL and the 'compiled' date on the document; this is from 2013 so very out of date. Current vignettes are available from your own session with
Or from the web site by browsing to, e.g., https://bioconductor.org/packages/SummarizedExperiment and looking for the 'Documentation' section.
Thanks for pointing out the new resources.
I have tried GRangesList(gr1, ..gr12) and the result is a (12x nr of samples) object. So each component of the list yields an aggregated count. Not what I want. I want to have the ranges and the feature info preserved and it doesn't seem that GRangesList option allows for that as far as I tried.
So far, the only option I found that does what I want is
new_gr<- c(gr1, gr2)
summarizeOverlaps(new_gr, fls)
I tried it with two so far and it took a while...