Enter the body of text here
Users of DegNorm
bioconductor package encountered an error when loading in very large bam file to compute the read coverage score on transcript. The errors were shown as in the post title. When the bam file size is relatively smaller, such error doesn't happen. I noticed a similar discussion on this was IRanges::findOverlapPairs cannot take inputs if they are too large. But I couldn't figure out an easy fix in my problem. I would greatly appreciate help.
We have RNA-seq data from bam file, single-end or paired-end. DegNorm tries to calculate the read coverage on the total transcript-- concatenation of all exons from a gene. DegNorm follows the HTSEQ to define a valid read to be one that completely sit "within" the exon regions. In DegNorm package, defined a function to calculate the read coverage curve as follows:
.paired_end_cov_by_ch=function(bam,all_genes,ch){
##function to calculate the coverage score and return in Rle objects
bf = BamFile(bam, asMates = TRUE, qnameSuffixStart = ".")
tx_compat_cvg=RleList()
genes=all_genes[unlist(runValue(seqnames(all_genes)))==ch]
gr = as(seqinfo(bf), "GRanges")
if(ch %in% runValue(seqnames(gr))){
param = ScanBamParam(which = gr[ch],flag=scanBamFlag(isPaired=TRUE,
isSecondaryAlignment=FALSE))
gal2=readGAlignmentPairs(bf, param = param)
gal2=gal2[strand(gal2)!="*"]
gal2=gal2[is.na(seqnames(gal2))==FALSE]
grl <- as(gal2, "GRangesList")
gal3=reduce(grl)
#only keep genes on the current chromosome
results=.IntersectionStrict2(genes,gal2)
tx2reads <- setNames(as(t(results), "List"), names(genes))
## calculate read pair
read_counts=data.frame(count=lengths(tx2reads))
rownames(read_counts)=names(tx2reads)
#relist gal3 based on compatibility
temp0=data.table(group=as(gal3,"data.frame")$group,
index=seq_len(sum(lengths(gal3))))
#group2indexMap=unique(setDT(temp0))[, .(values = index), by =group ]
idmatch=function(tx2reads,group2indexMap){
return(group2indexMap[group2indexMap$group %in% tx2reads]$index)
}
#tx2reads1=lapply(tx2reads,idmatch,group2indexMap=group2indexMap)
tx2reads1=lapply(tx2reads,idmatch,group2indexMap=temp0)
gal4=unlist(gal3)
rm(gal3)
compat_reads_by_tx <- extractList(gal4, tx2reads1)
tx_compat_cvg <- pcoverageByTranscript(compat_reads_by_tx,
genes,
ignore.strand=TRUE)
}else{
tx2reads1=IntegerList(vector("list", length(genes)))
left_gal=GAlignments() #empty alignment file
names(tx2reads1)=names(genes)
read_counts=data.frame(count=lengths(tx2reads1))
rownames(read_counts)=names(tx2reads1)
compat_reads_by_tx <- extractList(left_gal, tx2reads1)
tx_compat_cvg <- pcoverageByTranscript(compat_reads_by_tx,
genes,
ignore.strand=FALSE)
}
return(list(tx_compat_cvg,read_counts))
}
I was able to narrow down that the error was from the pcoverageByTranscript
function.
tx_compat_cvg <- pcoverageByTranscript(compat_reads_by_tx,
genes,
ignore.strand=FALSE)
Thanks for help.
Any help on why
pcoverageByTranscript
generate such errors? thanks.