I apologize in advance for my ignorance--this is my first time using both STAR and featureCounts.
my STAR command is STAR --runThreadN 4 --genomeDir $REF --readFilesIn ${FILE1} ${FILE2} --sjdbGTFfile $GTF --outSAMtype BAM Unsorted --outFileNamePrefix ${OUT_STAR}
my featureCounts command is featureCounts -T 2 -p -t exon -g gene_id -a ${GTF} -o counts.txt ${FILE1}
I am using STAR 2.5.2b and featureCounts 1.5.1
I am using human samples and the hg19 gtf annotation file. I ran featureCounts after getting unreliable results from htseq-count. My samples are paired, and featureCounts was appealing after struggling with htseq-count because I read that featureCounts handles paired-end reads very well.
The output from featureCounts is as follows:
Assigned 17798778
Unassigned_Ambiguity 316936
Unassigned_MultiMapping 3099021
Unassigned_NoFeatures 13022732
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_FragmentLength 0
Unassigned_Chimera 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_Duplicate 0
These values are from 34237467 total fragments, and corresponds to a success rate of 52.0%.
I am confused because my STAR output is as follows:
Number of input reads | 33385184
Average input read length | 175
UNIQUE READS:
Uniquely mapped reads number | 31138446
Uniquely mapped reads % | 93.27%
Average mapped length | 174.80
Number of splices: Total | 13176899
Number of splices: Annotated (sjdb) | 12931737
Number of splices: GT/AG | 12993469
Number of splices: GC/AG | 93376
Number of splices: AT/AC | 7197
Number of splices: Non-canonical | 82857
Mismatch rate per base, % | 0.25%
Deletion rate per base | 0.01%
Deletion average length | 1.62
Insertion rate per base | 0.01%
Insertion average length | 1.33
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 1399714
% of reads mapped to multiple loci | 4.19%
Number of reads mapped to too many loci | 6341
% of reads mapped to too many loci | 0.02%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 2.50%
% of reads unmapped: other | 0.02%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
Prior to running featureCounts I sorted my bam file by position.
I am confused primarily for 2 reasons. 1) why does featureCounts identify more input fragments (34237467) as were mapped by STAR (31138446). 2) Why is featureCounts returning such a high Unassigned_NoFeatures value?
Any help and advice would be invaluable and much appreciated.
Thanks,
Greg
tl;dr featureCounts assigning high number of no features and reading more input fragments than what was produced by STAR alignemnt.
Thank you for your thoughtful response. That definitely makes sense, I should've thought more closely about what the values meant before just comparing the numbers.