Question

Why extractList fails to relist paired-end RNA-seq data when using pcoverageByTranscript?

0

Entering edit mode

Jiping Wang ▴ 90

@jiping-wang-6687

Last seen 12 weeks ago

United States

I am trying to use pcoverageByTranscript function to calculate coverage score on each genes. I have a paired-end RNA-seq data. I read in with readGAlignments function as gal1 and readGAlignmentPairs function as gal2 separately.

> which <- GRanges(seqnames = c("chr1"))
> gr = as(seqinfo(bf), "GRanges")
> param = ScanBamParam(which = gr["chr1"])
> gal1=readGAlignments("SRR873822.bam",param=param)
> bf = BamFile("SRR873822.bam", asMates = TRUE, qnameSuffixStart = ".")
> gal2=readGAlignmentPairs(bf, param = param)

I have parsed transcript file dm3_transcripts from .gtf file. Following the sample codes from the tutorial of coverageByTranscript page, I got the results I want for gal1.

However for gal2, the extractList function fails to relist the alignmentPairs object according to the groups (i.e. under each gene name no compatible reads paired were listed as should be). For the paired-end data, how should I relist the compatible reads for coverage calculation? Thanks for help!!

> compat_hits <- findCompatibleOverlaps(gal1, dm3_transcripts)
> compat_hits
Hits object with 4263975 hits and 0 metadata columns:
            queryHits subjectHits
            <integer>   <integer>
        [1]         1         455
        [2]         2         455
        [3]         3         455
        [4]         4         455
        [5]         5         455
        ...       ...         ...
  [4263974]   6020051        1755
  [4263975]   6020052        1755
  -------
  queryLength: 6020257 / subjectLength: 2570
> tx2reads <- setNames(as(t(compat_hits), "List"), names(dm3_transcripts))
> compat_reads_by_tx <- extractList(gal1, tx2reads)
> compat_reads_by_tx
GAlignmentsList object of length 2570:
$A3GALT2 
GAlignments object with 0 alignments and 0 metadata columns:
     seqnames strand cigar qwidth start end width njunc

$AADACL3 
GAlignments object with 0 alignments and 0 metadata columns:
     seqnames strand cigar qwidth start end width njunc

...
<2567 more elements>
-------
seqinfo: 25 sequences from an unspecified genome
> 
> ## find compatible overlap between reads and genes for gal2
> 
> compat_hits <- findCompatibleOverlaps(gal2, dm3_transcripts)
> compat_hits
Hits object with 1815917 hits and 0 metadata columns:
            queryHits subjectHits
            <integer>   <integer>
        [1]         1         455
        [2]         2         455
        [3]         3         455
        [4]         4         455
        [5]         5         455
        ...       ...         ...
  [1815916]   2787062        1755
  [1815917]   2787064        1755
  -------
  queryLength: 2787118 / subjectLength: 2570
> tx2reads <- setNames(as(t(compat_hits), "List"), names(dm3_transcripts))
> compat_reads_by_tx <- extractList(gal2, tx2reads)
> compat_reads_by_tx
List of length 2570
names(2570): A3GALT2 AADACL3 AADACL4 ABCA4 ABCB10 ABCD3 ... ZSCAN20 ZSWIM5 ZYG11A ZYG11B ZZZ3

GenomicAlignments pakcage extractList • 650 views

ADD COMMENT • link 5.8 years ago • updated 5.7 years ago Jiping Wang ▴ 90