I am trying to get basepair and read counts per position for my aligned reads. I have a BAM file with as many as 7500 amplicon reads aligned to a reference sequence. I am using Rsamtools and the pileup function to do this. However, I am running into a problem that it doesn't seem to count more than 290 reads. I have tried to reset the max_depth and even the yieldSize to be larger, but nothing seems to change with my output. Here's some example script that I have been using.
infile <- c("C:/R_Files/Data.bam")
p_params<- PileupParam(max_depth=2e5, min_base_quality= 10, min_mapq= 1,
min_nucleotide_depth=1, min_minor_allele_depth= 0,
distinguish_strands= FALSE, distinguish_nucleotides= TRUE,
ignore_query_Ns= FALSE, include_deletions= TRUE, include_insertions=
FALSE)
bf <- BamFile(infile, yieldSize=2e5)
data <- pileup(bf, PileupParam=p_params)
head(data)
seqnames pos strand nucleotide count
1 base 1 + G 251
2 base 2 + G 251
3 base 2 + T 1
4 base 3 + G 1
5 base 3 + T 252
6 base 4 + T 254
I have even set the depth and yieldSize to a small number and still no change. What am I missing? length(scanBam(infile)[[1]][[1]]) does give me the correct 7500 read count so I know the data is in the file. Any help is appreciated.
Thank you very much. It worked. I'm still unsure what I was doing wrong but I'm happy its working. So my question then is, is it best to define the parameters values as an object first, then pass them to PileupParam and then put that into the pileup command. That is the slight nuance that I can see I was doing differently.
Also we are trying to apply this to a set of files. Does the pileup command need the associated BAI file? Our initial command was the following but this does not produce the right output. If it needs both, then we would need to use mapply here correct or does the applyPileup command do this. We want the analyse of each file done separately.
files <- list.files(path="C:/RFiles/BAMFiles", pattern="*.bam", full.names=TRUE, recursive=FALSE) maxdepth <- 50000 p = PileupParam(maxdepth = max_depth)
lapply(files, function(x) { bamfile <- c(x) bf <- BamFile(x) data <- pileup(bf, PileupParams=p) data$IDs <- paste(data$seqnames, ":", data$pos, data$strand, data$nucleotide, sep="") data2 <- data[,c(6,1,2,3,4,5)]
data2 <- cbind(data2, basename(bamfile))
output <- paste(x, ".csv", sep="") write.table(data2, output, sep=",", row.names=FALSE) })
Ah, actually, looking at your code carefully, the problem is that you call the argument as
PileupParam=p_params
but it should bepileupParam=p_params
with a lower-casepileupParam
. The function is written in such a way that mis-spelled arguments are silently ignored. I'll try to update that.Thank you for the correction.
Also we are trying to apply this to a set of files. Does the pileup command need the associated BAI file? Our initial command was the following but this does not produce the right output. If it needs both, then we would need to use mapply here correct or does the applyPileup command do this. We want to analyse each file separately.
Bam files will discover their indexes if they follow standard naming conventions. When you say 'does not produce the right output' you are not providing enough information on the nature of your problem for others to be helpful. In addition it's challenging for others to reproduce, since they do not have access to your bam files. Consider writing a 'reproducible example' that uses the bam files from help page examples, for instance...
My apologies for not providing a better explanation. I did actually get it working. I think I still had the typo of the pileupParam in the lapply function. Thank you, you were very helpful and I appreciate the fast response.