Entering edit mode
Guest User
★
13k
@guest-user-4897
Last seen 10.4 years ago
Hi,
I am trying to use featureCounts to perform strand-specifc RNA-Seq
read counting with an Ensembl GTF file. However, it seems to count
reads from each strand separately rather than for each gene from the
strand where the gene is located. I have created a small BAM that
contains reads from only two genes (one from each strand) to
illustrate the problem.
First, if I specify strandSpefic = 0 I get 97.7% of the reads aligned
to two genes:
a = featureCounts("merged.bam", annot.ext =
"Homo_sapiens.GRCh38.76.gtf",
strandSpecific = 0, isGTFAnnotationFile = TRUE, isPairedEnd = TRUE)
> a$stat
Status counts.merged.bam
1 Assigned 16179
2 Unassigned_Ambiguity 0
3 Unassigned_MultiMapping 56
4 Unassigned_NoFeatures 335
And the reads counts for the two genes are:
> a$counts[c("ENSG00000066336","ENSG00000136869"),]
ENSG00000066336 ENSG00000136869
11759 4419
However, when I try to specifiy strandSpecific = 1 or 2 I get reads
from only one strand counted in either case
b = featureCounts("counts/merged.bam", annot.ext =
"../../annotations/GRCh38/genes/Homo_sapiens.GRCh38.76.gtf",
strandSpecific = 1, isGTFAnnotationFile = TRUE, isPairedEnd = TRUE)
> b$stat
Status counts.merged.bam
1 Assigned 5295
2 Unassigned_Ambiguity 0
3 Unassigned_MultiMapping 56
4 Unassigned_NoFeatures 11219
> b$counts[c("ENSG00000066336","ENSG00000136869"),]
ENSG00000066336 ENSG00000136869
952 4343
ENSG00000136869 is located on the forward strand, gets high counts
with strandSpecific = 1.
#####
c = featureCounts("counts/merged.bam", annot.ext =
"../../annotations/GRCh38/genes/Homo_sapiens.GRCh38.76.gtf",
strandSpecific = 2, isGTFAnnotationFile = TRUE, isPairedEnd = TRUE)
> c$stat
Status counts.merged.bam
1 Assigned 10884
2 Unassigned_Ambiguity 0
3 Unassigned_MultiMapping 56
4 Unassigned_NoFeatures 5630
> c$counts[c("ENSG00000066336","ENSG00000136869"),]
ENSG00000066336 ENSG00000136869
10807 76
ENSG00000066336 is located on the reverse strand, gets high counts
with strandSpecific = 2.
Furthermore,
a$counts[c("ENSG00000066336","ENSG00000136869"),] =
c$counts[c("ENSG00000066336","ENSG00000136869"),] +
b$counts[c("ENSG00000066336","ENSG00000136869"),]
I downloaded the Ensembl 76 GTF file from here:
ftp://ftp.ensembl.org/pub/release-76/gtf/homo_sapiens
The sample BAM files is available from here:
https://dl.dropboxusercontent.com/u/6006832/merged.bam
Is this a bug of featureCounts or am I doing something obviously
wrong?
Kind regards,
Kaur Alasoo
PhD student at Sanger Institute
-- output of sessionInfo():
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rsubread_1.12.6 BiocInstaller_1.12.1
loaded via a namespace (and not attached):
[1] tools_3.0.2
--
Sent via the guest posting facility at bioconductor.org.