I am trying to use pcoverageByTranscript function to calculate coverage score on each genes. I have a paired-end RNA-seq data. I read in with readGAlignments function as gal1 and readGAlignmentPairs function as gal2 separately.
> which <- GRanges(seqnames = c("chr1"))
> gr = as(seqinfo(bf), "GRanges")
> param = ScanBamParam(which = gr["chr1"])
> gal1=readGAlignments("SRR873822.bam",param=param)
> bf = BamFile("SRR873822.bam", asMates = TRUE, qnameSuffixStart = ".")
> gal2=readGAlignmentPairs(bf, param = param)
I have parsed transcript file dm3_transcripts from .gtf file. Following the sample codes from the tutorial of coverageByTranscript page, I got the results I want for gal1.
However for gal2, the extractList function fails to relist the alignmentPairs object according to the groups (i.e. under each gene name no compatible reads paired were listed as should be). For the paired-end data, how should I relist the compatible reads for coverage calculation? Thanks for help!!
> compat_hits <- findCompatibleOverlaps(gal1, dm3_transcripts)
> compat_hits
Hits object with 4263975 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 455
[2] 2 455
[3] 3 455
[4] 4 455
[5] 5 455
... ... ...
[4263974] 6020051 1755
[4263975] 6020052 1755
-------
queryLength: 6020257 / subjectLength: 2570
> tx2reads <- setNames(as(t(compat_hits), "List"), names(dm3_transcripts))
> compat_reads_by_tx <- extractList(gal1, tx2reads)
> compat_reads_by_tx
GAlignmentsList object of length 2570:
$A3GALT2
GAlignments object with 0 alignments and 0 metadata columns:
seqnames strand cigar qwidth start end width njunc
$AADACL3
GAlignments object with 0 alignments and 0 metadata columns:
seqnames strand cigar qwidth start end width njunc
...
<2567 more elements>
-------
seqinfo: 25 sequences from an unspecified genome
>
> ## find compatible overlap between reads and genes for gal2
>
> compat_hits <- findCompatibleOverlaps(gal2, dm3_transcripts)
> compat_hits
Hits object with 1815917 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 455
[2] 2 455
[3] 3 455
[4] 4 455
[5] 5 455
... ... ...
[1815916] 2787062 1755
[1815917] 2787064 1755
-------
queryLength: 2787118 / subjectLength: 2570
> tx2reads <- setNames(as(t(compat_hits), "List"), names(dm3_transcripts))
> compat_reads_by_tx <- extractList(gal2, tx2reads)
> compat_reads_by_tx
List of length 2570
names(2570): A3GALT2 AADACL3 AADACL4 ABCA4 ABCB10 ABCD3 ... ZSCAN20 ZSWIM5 ZYG11A ZYG11B ZZZ3