Hi,
I have RNAseq data with strand information and I'm using Rsubread
's featureCounts function to sum reads onto the GTF annotation.
This is the command I'm using:
featureCounts(files=my.bam,annot.ext=my.gtf,isGTFAnnotationFile=T,useMetaFeatures=T,isPairedEnd=T)
my.gtf
is gencode.v25.primary_assembly.annotation.gtf
BTW.
When I specify strandSpecific=1
I'm getting a weird behavior where, for an example gene, it's reporting many more reads mapping to an antisense gene encoded on the - strand than to an overlapping protein coding gene encoded on the + strand (260 vs. 12 reads). This is neither supported when I count the number of reads per strand for that genomic interval using samtools view
nor when I view this region with IGV
.
If I don't specify strandSpecific
I am getting nearly the reverse outcome: most of the reads are assigned to the protein coding gene (2 vs. 878).
Any idea what is this?
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
locale:
[1] C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] Rsubread_1.24.0 snpEnrichment_1.7.0 doParallel_1.0.10
[4] iterators_1.0.8 foreach_1.4.3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.7 lattice_0.20-34 codetools_0.2-15
[4] assertthat_0.1 grid_3.3.1 plyr_1.8.4
[7] gtable_0.2.0 scales_0.4.1 ggplot2_2.2.0
[10] zlibbioc_1.20.0 lazyeval_0.2.0 Matrix_1.2-7.1
[13] snpStats_1.24.0 splines_3.3.1 tools_3.3.1
[16] munsell_0.4.3 survival_2.40-1 BiocGenerics_0.20.0
[19] colorspace_1.2-7 tibble_1.2