I encountered problems when dealing with paired-end read using readGAlignmentPairs function. I tested on the data the package comes with (i.e. extdata), no problem, this function returns the paired end data as expected. However when I applied to my own paired-end data (aligned from tophat), I got no records for some reason. I tested it using testPairedEndBam function, it returned TRUE. I also manually examined the SAM file, all reads appeared in pairs. Anyone has any hints on this?
> library(Rsamtools)
> library("GenomicAlignments")
> setwd("~/degnorm")
#read in paired-end bam file
> bamfile="SRR873822.bam"
> ##testing whether single or paired-end
> testPairedEndBam(bamfile)
[1] TRUE
>
> ##subsetting the bam file for test
> t <- scanBamHeader(bamfile)[[1]][["targets"]]
> which <- GRanges(names(t), IRanges(1, unname(t)))
>
> ##only look at chrI
> param <- ScanBamParam(which=which[1], what=character(), flag=scanBamFlag(isProperPair=TRUE,
+ isDuplicate=FALSE,
+ isSecondaryAlignment=FALSE))
> galp2 <- readGAlignmentPairs(bamfile, use.names=TRUE, param=param)
> galp2
GAlignmentPairs object with 0 pairs, strandMode=1, and 0 metadata columns:
seqnames strand : ranges -- ranges
<Rle> <Rle> : <IRanges> -- <IRanges>
-------
seqinfo: 25 sequences from an unspecified genome
I would start by getting the relevant GRanges with
It might also be instructive to look at the number of mapped reads on each seqname
I would then suggest working through your filtering commands, starting with no filters
and adding filters / etc until you identify the problem. It's not clear how the BAM file was created, so it's hard to provide specific advice. You could make a subset of it (e.g., using
filterBam()
and share that via dropbox or similar...Hi Martin,
Thanks so much for help. My data is paired-end RNA-seq data, aligned using tophat. I followed your recommendation, here is what got:
A subset of the bam file (only chr1) can be downloaded from here: https://www.dropbox.com/s/qgjh74lmucp13r6/SRR873822_chr1.bam?dl=0
A next step might be
and interpret the output to ensure that your alignment is actually paired end.
If so, you might try
readGAlignments(bf, param = param)
andreadGAlignmentsList(bf, param = param)
.I do not quite understand the output of quckBamFlagSummary. What does it tell about the paired-end or single-end?
It says that most reads are paired end, and most are 'primary' alignments. This all suggests that they should be input by
readGAlignmentPairs()
.It looks like in my code I should have suggested
Does it 'work' to
and then to summarize map quality
Thanks again! Still I cannot get the paired end counts.
Thanks for your patience; this shows that there are actually reads there, and that they have high map quality. They are simply not being paired. It could be that the flags are somehow excluding proper pairing, so
might be informative. Another possibility is that the read names have been mangled, e.g., by adding suffix .1 / .2; the following
will I think generate a short table where most of the entries should be '2'.
Thanks, Martin! This is very helpful. I think we are close to the answer why I couldn't get the paired-end count. Indeed the qname of the two mates have the extension .1 and .2. For example, the following are two paired ends. I thought this is the standard for naming the two mates and the R package should automatically detect it since this is alignment output from tophat, which is still widely used. Following your instruction, here is the output. What would you recommend to do next? thanks.