Entering edit mode
Hello everyone,
I am having trouble counting RNA seq reads with GenomicRanges and
GenomicFeatures. Essentially the problem is I have many reads (10
million
mapped) and I only can count 200K in features and 300K between
features.
So obviously I am doing something wrong. I am working with Arabidopsis
thaliana, so I mapped to the TAIR10.17 release with tophat2. The gff
file I
use is TAIR10.17.gtf, I also tried using TAIR10.16.gtf. I also made a
GRangesList with the 10.17 with no effect.
This is what I have done while using 1 library
#outside of R stuff
tophat2 (no -G ) blah blah
samtools sort accepted_hits.bam foo
samtools index foo
#In R
#load bams
bf <- "myBam.bam"
bi <- "myBam.bai"
bfl <- BamFileList(bf, bi)
#set up TxDb
gff <- makeTranscriptDbFromGFF(gff)
tx <- transcriptsBy(gff)
#summarize
olap <- summarizeOverlaps(tx, bfl)
#how many reads mapped?
colSums(assays(olap)$counts)
#only 222,835! are the reads inbetween features?
abf <- BamFile("path to a single bam file")
reads <- readGappedAlignments(bf)
tx2 <- transcripts(gff)
strand(tx2) <- "*"
txs <- reduce(tx2)
nontx <- gaps(txs)
o <- summarizeOverlaps(nontx, reads, ignore.strand = TRUE)
sum(o)
#365401
So what happened? If the reads are not in the features, and not
inbetween
the features, where did they go?
Anything helps, and thanks in advance
Sam
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0
[4] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6
[7] BSgenome_1.30.0 DBI_0.2-7
GenomicFeatures_1.14.2
[10] GenomicRanges_1.14.4 IRanges_1.20.6 parallel_3.0.2
[13] RCurl_1.95-4.1 Rsamtools_1.14.2 RSQLite_0.11.4
[16] rtracklayer_1.22.0 stats4_3.0.2 tools_3.0.2
[19] XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0
--
Sam McInturf
[[alternative HTML version deleted]]