Question

parameter "refSeqName" in function "getReadCountsFromBAM"

0

Entering edit mode

Mohammad Alkhamis • 0

@mohammad-alkhamis-10862

Last seen 7.9 years ago

Canada/ Victoria, BC / University of Vi…

Hello everyone,

I think that the argument "refSeqName" in the function "getReadCountsFromBAM" is a bit misleading. While looking at the source code in "getReadCountsFromBAM.R", I noticed the following code line:

for (i in 1:length(refSeqName))

It turned out that I am able to use the function with multiple reference sequence names like this, for example:

bamDataRanges <- getReadCountsFromBAM(BAMFiles,
                                      sampleNames=paste("Sample",1:2),
                                      refSeqName=c("1", "2"),
                                      WL=100,
                                      mode="paired", 
                                      parallel=12)

which will give an output in the console as following:

Identified the following reference sequences:  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,MT,GL000207.1,GL000226.1 [...]
Using 1, 2 as reference.
Using indexed BAM files.

Though, the documentation (quoted below) suggests that the function is only capable of handing a single reference sequence name. This is also the case with the examples that come with the cn.mops package.

"refSeqName: Name (Names?) of the reference sequence (sequences?) that should be analyzed. The name (names?) must appear in the header of the BAM file. If it is not given the function will select the first reference sequence that appears in the header of the BAM files."

Thanks for accepting my comment.

cn.mops • 1.5k views

ADD COMMENT • link updated 7.8 years ago by Günter Klambauer ▴ 540 • written 8.2 years ago by Mohammad Alkhamis • 0

score 2 · Answer 1 · 2016-08-27

Hello Malkhamis, Thanks for bringing this to my attention. Indeed users are often struggling with that parameter. I will make changes to the documentation as suggested. Regards, Günter On 2016-08-26 21:31, malkhamis [bioc] wrote: > Activity on a post you are following on support.bioconductor.org > <https: support.bioconductor.org=""> > > User malkhamis <https: support.bioconductor.org="" u="" 10862=""/> wrote > Question: parameter "refSeqName" in function "getReadCountsFromBAM" > <https: support.bioconductor.org="" p="" 86536=""/>: > > Hello everyone, > > I think that the argument "|refSeqName|" in the function > "|getReadCountsFromBAM|" is a bit misleading. While looking at the > source code in "getReadCountsFromBAM.R", I noticed the following code > line: > > for (i in 1:length(refSeqName)) > > It turned out that I am able to use the function with multiple > reference sequence names like this, for example: > > bamDataRanges <- getReadCountsFromBAM(BAMFiles, > sampleNames=paste("Sample",1:2), > *refSeqName=c("1", "2"),* > WL=100, > mode="paired", > parallel=12) > > which will give an output in the console as following: > > /Identified the following reference sequences: > 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,MT,GL000207.1,GL000226.1 > [...] Using 1, 2 as reference. Using indexed BAM files./ > > Though, the documentation (quoted below) suggests that the function is > only capable of handing a single reference sequence name. This is also > the case with the examples that come with the cn.mops package. > > "refSeqName: Name *(Names?)* of the reference sequence > *(sequences?)* that should be analyzed. The name *(names?)* must > appear in the header of the BAM file. If it is not given the function > will select the first reference sequence that appears in the header of > the BAM files." > > Thanks for accepting my comment. > > ------------------------------------------------------------------------ > > Post tags: cn.mops > > You may reply via email or visit > parameter "refSeqName" in function "getReadCountsFromBAM" > -- Günter Klambauer, PhD Institute of Bioinformatics Johannes Kepler University, Linz, Austria http://www.bioinf.jku.at/people/klambauer/

score 2 · Answer 2 · 2017-02-10

2

Entering edit mode

Günter Klambauer ▴ 540

@gunter-klambauer-5426

Last seen 3.8 years ago

Austria

Just to make it visible also here... Something like the following typically works for human genomes and runs the function for chr1-22:

X <- getReadCountsFromBAM(BAMFiles,refSeqName=as.character(1:22),WL=1000, mode="paired")

If your sequence names are "chr1", "chr2" rather than "1", "2", then:

X <- getReadCountsFromBAM(BAMFiles,refSeqName=paste0("chr",1:22),WL=1000, mode="paired")

should work.

ADD COMMENT • link 7.8 years ago Günter Klambauer ▴ 540

0

Entering edit mode

Thanks a lot Gunter, that helps tremendously ! especially when running cnMOPS on our computer cluster ;)

ADD REPLY • link 7.8 years ago Bogdan ▴ 670