Hi Everyone
I have two questions related to Rsamtools :
1. I am trying to filter my bam file by scanning the genome in 1kb bins, and removing reads that match certain criteria. But I will have to bin the genome chromosome by chromosome and run the filtering per chromosome..
I created the following example using the example BAM file , although it's a bit buggy due to NA's.
Suppose I want to only keep reads with highest mapQ per bin.
library(Rsamtools) bamFile <- system.file("extdata", "ex1.bam", package="Rsamtools") sparam <- ScanBamParam(what = c("qname", "flag", "rname", "pos", "mapq"), flag = scanBamFlag(isFirstMateRead = TRUE, isUnmappedQuery = FALSE) ) bam <- scanBam(bamFile, param = sparam)[[1]] bam <- as.data.frame(bam) filterfunc <- function(bam) { # remove NA positions bam <- na.omit(bam) # split df by chromosome chroms <- as.character(unique(bam$rname)) bam2 <- split(bam, bam$rname) # for each chrom, filter highest mapq in bins filterHiQ <- function(bam){ chromEnd <- round(max(bam$pos), -2) end <- ifelse(chromEnd < max(bam$pos), chromEnd + 100, chromEnd) bins <- .bincode(bam$pos, seq(0,end, 100)) tokeep <- lapply(split(bam$mapq, bins), function(x) { keep <- x == max(x) return(keep) }) return(unlist(tokeep)) } # get filtered bam per chrom keep <- unlist(lapply(bam2, filterHiQ)) return(keep) }
This function works on bamFile that I read (returns logical vector), but when I try to create the FilterRule using this function, this returns errors due to NAs removed.
filterBam(bamFile, "/tmp/test.bam" , filter = FilterRules(list(filterfunc))) Error in seq.default(0, end, 100) : 'to' cannot be NA, NaN or infinite
I got this problem while trying to create an example. However, my question was different.! I wanted to know if there is a way to apply a function (like the filterHiQ
i wrote above) under filterRules in such a way that It does it chromosome by chromosome, so I don't have to split the file internally?
2. My second question was related to the ScanBamParam
above. I wanted to actually include R1 reads + R2 reads in case R1 is unmapped. So something like isFirstMateRead = TRUE | hasUnmappedMate = TRUE
, but can't find how to do that..
Thanks for your help.
Vivek
P.S. (29th Aug 2016) I just edited the example for clarity. this works if I read the bam file as data frame and run the command. But doesn't work as filterRules