Entering edit mode
Fong Chun Chan
▴
320
@fong-chun-chan-5706
Last seen 10.6 years ago
Hi,
Was wondering if anyone had any experience working with the
GenomicAlignments R package when trying to retrieve per-exon count
data?
Specifically, whether these is rationale for using
readGAlignmentsPairs()
over readGAlignments() when using the summarizeOverlaps() function?
From my
understanding, the readGAlignmentsPairs() function will traverse the
input
bam file and return each read pair as one GAlignment. Looking at the
summarizeOverlaps() function document, it states that:
"paired-end reads : Paired-end reads are counted the same as single-
end
reads with a junction (N operation in the CIGAR); each pair registers
as a
single hit. Paired-end records can be counted in a GAlignmentPairs
container or BAM file."
If the paired-end reads are counted as single-end reads, does this not
suggest that using the readGAlignments() and readGAlignmentsPairs()
returns
you somewhat similar results?
Another question I have is if a single-read is actually split between
two
exons and the exons features are stored as a GRanges (and NOT a
GRangesList) object then the default summarizeOverlaps() function
would
technically discard this read if I understand correctly? Since the
default
mode is Union and the read overlaps with both features. In fact,
wouldn't
all the modes actually discard the read? If so, is it correct to use
inter.feature = FALSE, to include split-reads into the equationb?
Thanks for any advice on this. sessionInfo() below:
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8
LC_NAME=C LC_ADDRESS=C
LC_TELEPHONE=C
LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods
base
other attached packages:
[1] argparse_1.0.1 proto_0.3-10
GenomicAlignments_1.0.1 BSgenome_1.32.0 Rsamtools_1.16.1
Biostrings_2.32.0 XVector_0.4.0
GenomicFeatures_1.16.2
AnnotationDbi_1.26.0 Biobase_2.24.0 GenomicRanges_1.16.3
GenomeInfoDb_1.0.2 IRanges_1.22.9 BiocGenerics_0.10.0
vimcom.plus_0.9-93
[16] setwidth_1.0-3 colorout_1.0-2
loaded via a namespace (and not attached):
[1] BatchJobs_1.2 BBmisc_1.7 BiocParallel_0.6.1
biomaRt_2.20.0 bitops_1.0-6 brew_1.0-6 checkmate_1.0
codetools_0.2-8 DBI_0.2-7 digest_0.6.4 fail_1.2
findpython_1.0.1 foreach_1.4.2 getopt_1.20.0
iterators_1.0.7
plyr_1.8.1 Rcpp_0.11.2 RCurl_1.95-4.1
[19] rjson_0.2.14 RSQLite_0.11.4 rtracklayer_1.24.2
sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 tools_3.1.0
XML_3.98-1.1 zlibbioc_1.10.0
[[alternative HTML version deleted]]