data2 <- cbind(data2, basename(bamfile))

Question

Rsamtools pileup function

1

Entering edit mode

mgray ▴ 10

@mgray-22913

Last seen 5.1 years ago

Madison

I am trying to get basepair and read counts per position for my aligned reads. I have a BAM file with as many as 7500 amplicon reads aligned to a reference sequence. I am using Rsamtools and the pileup function to do this. However, I am running into a problem that it doesn't seem to count more than 290 reads. I have tried to reset the max_depth and even the yieldSize to be larger, but nothing seems to change with my output. Here's some example script that I have been using.

infile <- c("C:/R_Files/Data.bam")
p_params<- PileupParam(max_depth=2e5, min_base_quality= 10, min_mapq= 1,
  min_nucleotide_depth=1, min_minor_allele_depth= 0,
  distinguish_strands= FALSE, distinguish_nucleotides= TRUE,
  ignore_query_Ns= FALSE, include_deletions= TRUE, include_insertions=
  FALSE)
bf <- BamFile(infile, yieldSize=2e5)
data <- pileup(bf, PileupParam=p_params)
head(data)
  seqnames pos strand nucleotide count
1  base   1      +          G   251
2  base   2      +          G   251
3  base   2      +          T     1
4  base   3      +          G     1
5  base   3      +          T   252
6  base   4      +          T   254

I have even set the depth and yieldSize to a small number and still no change. What am I missing? length(scanBam(infile)[[1]][[1]]) does give me the correct 7500 read count so I know the data is in the file. Any help is appreciated.

Rsamtools pileup number of reads • 2.9k views

ADD COMMENT • link updated 5.1 years ago by Martin Morgan 25k • written 5.1 years ago by mgray ▴ 10

James W. MacDonald · Answer 1 · 2020-02-13

0

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 10 weeks ago

United States

Hmm, generated as sample bam file with 500 (identical) reads, and max_depth seems to be obeyed


max_depth <- 500

sam <- c(
    list(
        c("@HD", "VN:1.3", "SO:coordinate"),
        c("@SQ", "SN:chr4", "LN:2000")
    ),
    rep(list(
        c("r1",   "81", "chr4", "155", "99", "5M", "=", "200", "0", "TATAT", "*")
    ), max_depth)
)

toBam <- function(sam) {
    tmp <- tempfile(fileext=".sam")
    writeLines(sapply(sam, paste, collapse="\t"), tmp)
    asBam(tmp)
}

bam <- toBam(sam)

and then

> pileup(bam) # default depth 251 (hmm, not sure about the '1'!
  seqnames pos strand nucleotide count
1     chr4 155      -          T   251
2     chr4 156      -          A   251
3     chr4 157      -          T   251
4     chr4 158      -          A   251
5     chr4 159      -          T   251
> p = PileupParam(max_depth = max_depth)
> pileup(bam, pileupParam = p) # depth 500
  seqnames pos strand nucleotide count
1     chr4 155      -          T   500
2     chr4 156      -          A   500
3     chr4 157      -          T   500
4     chr4 158      -          A   500
5     chr4 159      -          T   500

What happens for you? This is with

> sessionInfo()
R version 3.6.1 Patched (2019-12-01 r77489)
Platform: x86_64-apple-darwin17.7.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS:   /Users/ma38727/bin/R-3-6-branch/lib/libRblas.dylib
LAPACK: /Users/ma38727/bin/R-3-6-branch/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] Rsamtools_2.3.3      Biostrings_2.54.0    XVector_0.26.0
[4] GenomicRanges_1.38.0 GenomeInfoDb_1.22.0  IRanges_2.20.2
[7] S4Vectors_0.24.3     BiocGenerics_0.32.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.32.0        compiler_3.6.1         GenomeInfoDbData_1.2.2
[4] RCurl_1.98-1.1         BiocParallel_1.20.1    bitops_1.0-6

what about your sessionInfo()?

ADD COMMENT • link 5.1 years ago Martin Morgan 25k

0

Entering edit mode

Thank you very much. It worked. I'm still unsure what I was doing wrong but I'm happy its working. So my question then is, is it best to define the parameters values as an object first, then pass them to PileupParam and then put that into the pileup command. That is the slight nuance that I can see I was doing differently.

Also we are trying to apply this to a set of files. Does the pileup command need the associated BAI file? Our initial command was the following but this does not produce the right output. If it needs both, then we would need to use mapply here correct or does the applyPileup command do this. We want the analyse of each file done separately.

files <- list.files(path="C:/RFiles/BAMFiles", pattern="*.bam", full.names=TRUE, recursive=FALSE) maxdepth <- 50000 p = PileupParam(maxdepth = max_depth)

lapply(files, function(x) { bamfile <- c(x) bf <- BamFile(x) data <- pileup(bf, PileupParams=p) data$IDs <- paste(data$seqnames, ":", data$pos, data$strand, data$nucleotide, sep="") data2 <- data[,c(6,1,2,3,4,5)]

data2 <- cbind(data2, basename(bamfile))

output <- paste(x, ".csv", sep="") write.table(data2, output, sep=",", row.names=FALSE) })

ADD REPLY • link 5.1 years ago mgray ▴ 10

0

Entering edit mode

Ah, actually, looking at your code carefully, the problem is that you call the argument as PileupParam=p_params but it should be pileupParam=p_params with a lower-case pileupParam. The function is written in such a way that mis-spelled arguments are silently ignored. I'll try to update that.

ADD REPLY • link 5.1 years ago Martin Morgan 25k

0

Entering edit mode

Thank you for the correction.

Also we are trying to apply this to a set of files. Does the pileup command need the associated BAI file? Our initial command was the following but this does not produce the right output. If it needs both, then we would need to use mapply here correct or does the applyPileup command do this. We want to analyse each file separately.

files <- list.files(path="C:/RFiles/BAMFiles", pattern="*.bam", full.names=TRUE, recursive=FALSE)
max_depth <- 50000 
p = PileupParam(max_depth = max_depth)

lapply(files, function(x) { 
bamfile <- c(x) 
bf <- BamFile(x) 
data <- pileup(bf, pileupParams=p) 
data$IDs <- paste(data$seqnames, ":", data$pos, data$strand, data$nucleotide, sep="") 
data2 <- data[,c(6,1,2,3,4,5)]
data2 <- cbind(data2, basename(bamfile))
output <- paste(x, ".csv", sep="") 
write.table(data2, output, sep=",", row.names=FALSE) })

ADD REPLY • link updated 5.1 years ago by James W. MacDonald 68k • written 5.1 years ago by mgray ▴ 10

0

Entering edit mode

Bam files will discover their indexes if they follow standard naming conventions. When you say 'does not produce the right output' you are not providing enough information on the nature of your problem for others to be helpful. In addition it's challenging for others to reproduce, since they do not have access to your bam files. Consider writing a 'reproducible example' that uses the bam files from help page examples, for instance...

ADD REPLY • link 5.1 years ago Martin Morgan 25k

0

Entering edit mode

My apologies for not providing a better explanation. I did actually get it working. I think I still had the typo of the pileupParam in the lapply function. Thank you, you were very helpful and I appreciate the fast response.

ADD REPLY • link 5.1 years ago mgray ▴ 10