I am trying to follow the tutorial found here. The tutorial uses a subset of the sample data (chromosome 1) to make the demo run faster, but I am trying to shift my BAM file for the whole mouse genome, so I replaced "seqlev" from the tutorial code with "mmstdchr" in my own code. Here is my code:
library(ATACseqQC)
library(BSgenome.Mmusculus.UCSC.mm10)
mouse <- BSgenome.Mmusculus.UCSC.mm10
# keep only the names of the "standard" sequences of the mouse genome assembly, not the alternates
mmstdchr <- standardChromosomes(mouse)
tags <- c("AS", "XN", "XM", "XO", "XG", "NM", "MD", "YS", "YT")
outPath <- "splited"
dir.create(outPath)
bamfile <- "D:/All open chromatin bam files for counting/Sorted_3-AL1_filtered.bam"
bamfile.labels <- sub("\\.bam$", "", basename(bamfile)) # escape the dot so only a trailing ".bam" is removed
which <- as(seqinfo(Mmusculus)[mmstdchr], "GRanges")
gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE)
gal1 <- shiftGAlignmentsList(gal)
The issue is that when shiftGAlignmentsList(gal) completes, gal1 is a large GAlignments object 54 GB in size, whereas the BAM file I'm shifting (Sorted_3-AL1_filtered.bam) is only 4.6 GB. This seems very wrong, and I can't even export gal1 because it's too large. Can anyone tell me what I'm doing wrong, and how to properly generate a shifted BAM file?
Hi,
Thank you for trying ATACseqQC. Could you let me know the version of ATACseqQC you are using? And did you try to set bigFile=TRUE when you call readBamFile?
Jianhong.
Thanks for your response. The version of ATACseqQC I'm using is 1.26.0. I tried to set bigFile=TRUE. The resulting object gal1 still ends up the same huge size. Here is what happens when I try to export this file:
I'm not sure I fully understand, but I think R is running out of memory at this stage?
Try:
gal1 <- shiftGAlignmentsList(gal, outbam='3-AL1_Tn5_shift.bam')
I will try to figure out why the memory usage is so high.
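For reference, the two suggestions made in this thread (bigFile=TRUE in readBamFile and the outbam argument of shiftGAlignmentsList) can be combined into one call sequence. The following is only a minimal sketch: the helper name shift_bam_to_disk is hypothetical and not part of ATACseqQC, and the thread does not confirm that this resolves the memory problem for this particular file.

```r
# Hypothetical helper (not part of ATACseqQC): read a BAM file with
# bigFile = TRUE and write the shifted reads directly to `outbam`,
# instead of exporting the shifted object in a separate step.
shift_bam_to_disk <- function(bamfile, outbam, genome, tags) {
  # Names of the standard chromosomes (no alternates or unplaced contigs)
  stdchr <- GenomeInfoDb::standardChromosomes(genome)
  # Genomic ranges covering those chromosomes, used to restrict the read-in
  which <- as(GenomeInfoDb::seqinfo(genome)[stdchr], "GRanges")
  # bigFile = TRUE is the option suggested above for large BAM files
  gal <- ATACseqQC::readBamFile(bamfile, tag = tags, which = which,
                                asMates = TRUE, bigFile = TRUE)
  # outbam makes shiftGAlignmentsList write the shifted reads to a BAM file
  ATACseqQC::shiftGAlignmentsList(gal, outbam = outbam)
}

# Example call, using the objects defined in the question
# (requires ATACseqQC and BSgenome.Mmusculus.UCSC.mm10 to be installed):
# library(ATACseqQC)
# library(BSgenome.Mmusculus.UCSC.mm10)
# shift_bam_to_disk(bamfile, file.path(outPath, "3-AL1_Tn5_shift.bam"),
#                   genome = BSgenome.Mmusculus.UCSC.mm10, tags = tags)
```

Passing the genome object as an argument keeps the sketch usable for other assemblies; only the BSgenome package would need to change.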