How to get GenomicAlignments to read all reads?
0
0
Entering edit mode
@mikkoarvas-12227
Last seen 7.6 years ago

Hi,

when looking at a bam with a genome viewer I see reads of:

-Mapping quality  over 60
-Insert size 170 kbp
-Cigar 100M
-Paired
-Primary alingment TRUE

When trying to read them to R:

> sbf <- scanBamFlag(isPaired = NA, isProperPair = NA, isUnmappedQuery = NA,
+             hasUnmappedMate = NA, isMinusStrand = NA, isMateMinusStrand = NA,
+             isFirstMateRead = NA, isSecondMateRead = NA, isSecondaryAlignment  = NA, isNotPassingQualityControls = NA,
+             isDuplicate = NA)
> sbf2 <- ScanBamParam(flag = sbf, mapqFilter=0)
> bam  = readGAlignmentPairs(file="file.bam",param = sbf2)
> bam
GAlignmentPairs object with 0 pairs, strandMode=1, and 0 metadata columns:
   seqnames strand :    ranges --    ranges
      <Rle>  <Rle> : <IRanges> -- <IRanges>
  -------
  seqinfo: 84 sequences from an unspecified genome

I see no reads. What am I missing in here?

With scanBamFlag above I try set up things so that everything would be read in. Is there some further parameter I should look into?

Best,

Mikko

 

 

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=fi_FI.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=fi_FI.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=fi_FI.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GenomicAlignments_1.10.0   Rsamtools_1.26.1           Biostrings_2.42.1          XVector_0.14.0             SummarizedExperiment_1.4.0
 [6] Biobase_2.34.0             GenomicRanges_1.26.2       GenomeInfoDb_1.10.2        IRanges_2.8.1              S4Vectors_0.12.1          
[11] BiocGenerics_0.20.0       

loaded via a namespace (and not attached):
[1] lattice_0.20-34    bitops_1.0-6       grid_3.3.1         zlibbioc_1.20.0    Matrix_1.2-8       BiocParallel_1.8.1 tools_3.3.1      

GenomicAlignments bam scanbamparam • 1.8k views
ADD COMMENT
0
Entering edit mode

It's difficult to know what the issue may be without seeing an example of your data.  Is it possible for you to make a small subset of your bam file available?

I would also try running the function using the default parameters, which are pretty inclusive and should get most reads, e.g.

readGAlignmentPairs('file.bam')

This should help identify if the problem is the function, the data, or the parameters.  Similarly you can try running readGAlignments() to make sure this isn't an issue with the pairing of your data.

readGAlignments('file.bam')
ADD REPLY
0
Entering edit mode

Thank you Mike!

> bam <- readGAlignmentPairs("test.bam")
> cvg <- coverage(bam)
> max(cvg$`6`)
[1] 1

Wrong.

> bam <- readGAlignments("test.bam")
> cvg <- coverage(bam)
> max(cvg$`6`)
[1] 50

OK!
This is what I see with Baseplayer https://www.cs.helsinki.fi/u/rkataine/BasePlayer/info.html Genome browser.

It looks like it is something in the pairing of the reads. Browser sees the pairs but readGAlignmentPairs() misses them.

If somebody is interested to have a look I can send a test file.

Best,

Mikko

ADD REPLY
0
Entering edit mode

readGAlignmentList() might be informative -- it can include unmated reads as well as reads that are ambiguously mated. Identify reads that you think should have been mated. Use ScanBamParam() to access the information used for mating and described in ?readGAlignmentsList to identify what criterion prevents them from being mated.

ADD REPLY
0
Entering edit mode

If the workflow Martin suggests doesn't shed any light I can look at the files for you. Make the files (or subsets) available to valerie.obenchain@roswellpark.org. 

Valerie

ADD REPLY

Login before adding your answer.

Traffic: 1064 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6