Entering edit mode
Hi,
I mapped fastq to 3 different reference genome. Now I want to find out reads which were mapped to the first two genome, then filter them out in the 3rd bam file.
I will then process the 3rd bam file for obtaining UMI count.
Is there any simple way to achieve this?
Many thanks!
Thank you, actually, there are two species. The 3rd genome is the combination of them (1st: h. 2nd: m and 3rd: hm). I want to make sure that reads mapped to all the three genomes were removed in the 3rd bam file. Below is my code:
''' p1 <- ScanBamParam(what=c('qname','mapq'),tagFilter =list(xf=c(25)) ,tag=c('xf'))
in 10X's bam file, xf=25 means that read is uniquely mapped to a genome, and was used for counting UMI. So I only look at reads with xf=25 among the 3 bam files.
filter <- FilterRules(list(fun = function(hm_reads) {
anab_m <- scanBam(BAM_m, param=p1)
anab_h <- scanBam(BAM_h, param=p1)
df_m <- do.call(DataFrame, anab_m[[1]])
df_h <- do.call(DataFrame, anab_h[[1]])
Find qnames: which are mapped to the 3 genome at the same time
internames<-Reduce(intersect,list(hm_reads$qname,df_h$qname,df_m$qname))
pickl<-rep(TRUE,nrow(hm_reads))
names(pickl)<-hm_reads$qname
If there are multiple mapped reads, then remove it in hm bam file
if(length(internames)>0){ pickl[internames]<-FALSE }
return(pickl) }))
print('now, subset BAM files')
outbam<-paste(OUTPATH,Sample,'-',IND_SECTION,'.bam',sep='')
open(BAM_h)
open(BAM_m)
filterBam(BAM_hm, destination=outbam, filter = filter, param = p1)
close(BAM_h)
close(BAM_m) '''
May I ask if this is correct? Many thanks!