Question

Hisat2 vs featureCounts summary statistics

1

Entering edit mode

Koen Van den Berge ▴ 180

@koen-van-den-berge-6369

Last seen 6 months ago

Ghent University, Belgium

I am processing paired-end RNA-Seq data. I map them using Hisat2.0.3.Next I convert SAM files to BAM, sort and index the BAM file using samtools 1.3. I subsequently use the sorted BAM files to aggregate mapped reads based on annotation using featureCounts (Rsubread package v1.20.6).

After mapping the reads using Hisat2, we get some summary statistics like this:

20358841 reads; of these:
20358841 (100.00%) were paired; of these:
5046482 (24.79%) aligned concordantly 0 times
12600084 (61.89%) aligned concordantly exactly 1 time
2712275 (13.32%) aligned concordantly >1 times
----
5046482 pairs aligned concordantly 0 times; of these:
57755 (1.14%) aligned discordantly 1 time
----
4988727 pairs aligned 0 times concordantly or discordantly; of these:
9977454 mates make up the pairs; of these:
8723119 (87.43%) aligned 0 times
914540 (9.17%) aligned exactly 1 time
339795 (3.41%) aligned >1 times
78.58% overall alignment rate

After sorting and indexing, featureCounts on this BAM file tells me

Process BAM file xxxx.bam... ||
|| Paired-end reads are included. ||
|| Assign fragments (read pairs) to features... ||
|| ||
|| WARNING: reads from the same pair were found not adjacent to each ||
|| other in the input (due to read sorting by location or ||
|| reporting of multi-mapping read pairs). ||
|| ||
|| Read re-ordering is performed. ||
|| ||
|| Total fragments : 21770038 ||
|| Successfully assigned fragments : 10375129 (47.7%)

Which shows that the total fragments (i.e. paired-end reads) do not correspond with the number of paired-end reads that is given in the summary statistics from Hisat2. featureCounts tells me I have 21M fragments and Hisat2 tells me I have 20M fragments. How can this be possible?

featurecounts hisat2 rna-seq mapping • 6.5k views

ADD COMMENT • link updated 8.6 years ago by Wei Shi ★ 3.6k • written 8.6 years ago by Koen Van den Berge ▴ 180

0

Entering edit mode

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 3 hours ago

Australia/Melbourne

Can you provide the parameters you used when running featureCounts?

The discrepancy might be due to how you define "concordantly mapped reads". Although you count fragments where both ends are mapped, the fragments with paired end distance greater than the nominal template length are counted by featureCounts as well and this will give you larger counts. If you use -d and -D options to specify the minimum and maximum fragment length, you will get reduced number of counts. So this is all dependent on how you count the reads.

ADD COMMENT • link 8.6 years ago Wei Shi ★ 3.6k

0

Entering edit mode

Dear Wei, thanks again.

I used featureCounts (Rsubread_1.20.6) with the following arguments:

featureCounts ( f i l e s = bamFiles , annot . ext=”xxx.gff3”, isGTFAnnotationFile = TRUE, useMetaFeatures = TRUE, isPairedEnd=TRUE, requireBothEndsMapped = TRUE, GTF . featureType = ”CDS” , GTF. attrType = ”ID” , allowMultiOverlap=FALSE)

The genome is an in-house draft for a non-model organism.

For the mapping, Hisat2 has been used with default parameters, and I basically compared the number of concordantly mapped reads reported there with the number of assigned fragments in featureCounts.

ADD REPLY • link 8.6 years ago Koen Van den Berge ▴ 180

0

Entering edit mode

Thanks for the information. I think you will need to understand how Hisat2 defines "concordantly mapped reads". What is the limit on the distance between the two reads from the same pair?

If you set 'checkFragLength=TRUE' in your featureCounts command, you will get less assigned reads.

ADD REPLY • link 8.6 years ago Wei Shi ★ 3.6k

score 3 · Accepted Answer · 2016-05-04

3

Entering edit mode

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 3 hours ago

Australia/Melbourne

The total number of fragments reported by featureCounts is the total number of alignments included in your bam files. If there are fragments that were reported more than once in your bam file, the total number of alignments will be greater than the total number of fragments included in your data.

ADD COMMENT • link 8.6 years ago Wei Shi ★ 3.6k

0

Entering edit mode

Thank you for your reply, Wei.

So the larger number of fragments reported by featureCounts may be the result of multi-mapping reads?

However, similar, the number of counted fragments is higher than the number of concordantly mapped reads (exactly one time as in the example above) for some of the samples. This happens when we are not counting multi-mapping fragments, not counting multi-overlapping fragments and only counting fragments where both reads are mapped. How can this be explained?

Best,

Koen

ADD REPLY • link 8.6 years ago Koen Van den Berge ▴ 180