Hi,
I want to build a table of counts using 'summarizeOverlaps()' from a PE RNA-seq data set generated with a strand-specific library preparation protocol from NewEngland Biolabs called "NEBNext Ultra Directional RNA Library Prep Kit for Illumina". The web page from that protocol says that it "utilizes dUTP-based methodology". The help page from the 'GAlignmentPairs' class explains that when PE data is generated using a stranded protocol such as 'dUTP' one should build such an object using the non-default argument 'strandMode=2'. I want to use 'summarizeOverlaps()' over the set of BAM files, so doing something like:
bfl <- BamFileList(bamfiles, asMates=TRUE) se <- summarizeOverlaps(features, bfl, mode="Union", singleEnd=FALSE, ignore.strand=FALSE, fragments=TRUE)
I've tried to set 'strandMode=2' in the call to 'BamFileList()' and 'summarizeOverlaps()' and within 'ScanBamParam()' through the 'param' argument to 'summarizeOverlaps()', but the argument 'strandMode' does not belong to any those methods. I've googled about it without luck, so my question is, how can I ensure that 'strandMode=2' is being used under the hood through this approach to ensure I'm informing the counting algorithm about the correct library preparation protocol?
thanks!
robert.
Hi Jim, I had already tried setting that argument through summarizeOverlaps(), BamFileList() and ScanBamParam() and I get an error. Here's an example reproducing the error:
There should be a way I guess but it's not obvious, at least to me. Session information below.
robert.
I think I set you wrong. You want both
pairedEnd
ANDfragments
to beFALSE
, otherwise you getreadGAlignmentsList
, which doesn't have astrandMode
argument. I still wouldn't expect an error, however, as that argument is being passed down via the ellipsis argument, and my understanding is that a function with an ellipsis argument will 'accept' any ellipsis arguments that make sense and ignore those that don't. Like this:So it's weird to me that somewhere or another the
strandMode
argument is getting picked up and complained about. But regardless, if you still get errors you can just make a GAlignmentPairs object usingreadGAlignmentPairs(bfl, strandMode =2)
and feed that tosummarizeOverlaps
directly.I guess you refer to 'singleEnd' when you say 'pairedEnd', but anyway, now I understand that if I want to set 'strandMode' I need to set 'fragments=FALSE' to force the call to 'readGAlignmentPairs()'. Unfortunately, that does not work either:
I've looked at the source code and i think the problem is that the ellipsis argument passes down ultimately to the function given in the 'preprocess.reads' argument of 'summarizeOverlaps()' and not to 'readGAlignmentPairs()'.
As you suggest, I can use 'readGAlignmentPairs()' directly but because I have multiple BAM files, each of them with several Gb of size, I'd have to end up doing quite some programming to stream over them and put the results together. Before I undertake that task, is there any trick to use 'summarizeOverlaps()' with 'strandMode=1'? Shouldn't in fact this functionality be provided through the 'summarizeOverlaps()' interface?
Ah, you're right. In
.countWithYieldSize
, the specified function is called usingwith no ellipsis argument, so
strandMode
is ignored. But do note that all.countWithYieldSize
is doing (that you will have to do yourself) is callingbplapply
, using the chosen function. So the 'by hand' version isWhich doesn't require any programming at all.
Thanks for the directions and your sample code. Unfortunately, I can't use it because it assumes that the whole thing fits into main memory. I'll paste my solution for the record, just in case becomes useful for somebody else interested in using the non-default 'strandMode=2':
I would call this programming :-D
GenomicFiles::reduceByYield()
might be a wrapper around some of this. Is there a bug in SummarizedExperiment, along the lines Jim indicated of not passing strandMode correctly? If so, perhaps an issue and / or pull request on https://github.com/Bioconductor/SummarizedExperiment would be appropriate.I sent a patch to Val. After adding a couple of ellipsis args, I get