Difference between SummarizeOverlaps and HTSeq
@walter-f-baumann-12439

Hi, 

I compared the per-gene counts from summarizeOverlaps and HTSeq (Python). The correlation was ~0.98. That is very good, but I was surprised it was not equal or at least very close to 1, because according to the documentation summarizeOverlaps is modelled on the counting modes in HTSeq (I use "Union" mode for both, single-end reads). The settings in both tools are the same.
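For reference, the comparison can be sketched roughly as below. This is only a minimal illustration: the gene names and counts are made-up toy values, and `pearson` and `discordant_genes` are hypothetical helper names, not functions from either tool. Because a correlation over all genes is dominated by the highly expressed ones, listing the genes whose counts actually differ is usually more informative than the single number.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def discordant_genes(counts_a, counts_b):
    """Return {gene: (count_a, count_b)} for genes whose counts differ.

    Genes missing from one table are treated as having a count of 0.
    """
    genes = sorted(set(counts_a) | set(counts_b))
    return {
        g: (counts_a.get(g, 0), counts_b.get(g, 0))
        for g in genes
        if counts_a.get(g, 0) != counts_b.get(g, 0)
    }

# Toy per-gene counts (hypothetical values, for illustration only).
so_counts = {"geneA": 120, "geneB": 35, "geneC": 0, "geneD": 980}
ht_counts = {"geneA": 118, "geneB": 35, "geneC": 0, "geneD": 975}

genes = sorted(so_counts)
r = pearson([so_counts[g] for g in genes], [ht_counts[g] for g in genes])
diff = discordant_genes(so_counts, ht_counts)

print(f"Pearson r = {r:.4f}")
for gene, (a, b) in diff.items():
    print(gene, a, b)
```

With real data, the two dictionaries would be filled from the count tables written by each tool; the genes reported by `discordant_genes` are the ones worth inspecting read by read.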

While reading a bit more, I came across the paper introducing "featureCounts". When the authors compared featureCounts with summarizeOverlaps and HTSeq in section 5.2, the results of summarizeOverlaps and HTSeq also differed slightly from each other. 

My question is why the results of summarizeOverlaps and HTSeq differ slightly. Unfortunately, I could not find further reading on how the algorithms differ. So I assume the two tools are not identical, as I had previously thought. 

Thanks for any information!

thokall ▴ 160
@thokall-14310
Swedish Museum of Natural History

Hi,

The paper you mention discusses the differences between all three counting methods. Could this be enough to explain the difference you observe?

"htseq-count counted slightly fewer reads than featureCounts and summarizeOverlaps. We had a close look at the summarization results for each read given by htseq-count and featureCounts and found that only a small number of reads were assigned to different genes by the two methods (Fig. 2a). By comparing the features regions with the regions these reads were mapped to, we identified the reason causing this discrepancy. htseq-count takes the right-most base position of each feature as an open position and excludes it from read summarization, whereas featureCounts and summarizeOverlaps take it as a closed position and includes it in their summarizations. The GFF specification states that the start and end positions of features are inclusive (Wellcome Trust Sanger Institute, 2013), so the interpretation of featureCounts and summarizeOverlaps appears to be correct."
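To make the open versus closed end position concrete, here is a minimal sketch of the two interpretations described in the quote. It is not the code of either tool: the function names, the feature coordinates and the reads are hypothetical, and coordinates are assumed to be 1-based with inclusive read ends.

```python
def overlaps_closed(read_start, read_end, feat_start, feat_end):
    """Overlap test with the feature's last base included (GFF convention)."""
    return read_start <= feat_end and read_end >= feat_start

def overlaps_open_right(read_start, read_end, feat_start, feat_end):
    """Overlap test with the feature's right-most base excluded."""
    return read_start <= feat_end - 1 and read_end >= feat_start

# Hypothetical feature spanning positions 100..200.
feat_start, feat_end = 100, 200

# Hypothetical reads given as (start, end), both inclusive.
reads = {
    "r1": (150, 180),  # fully inside the feature
    "r2": (200, 230),  # touches only the feature's last base
    "r3": (190, 200),  # ends on the feature's last base
    "r4": (201, 240),  # starts just past the feature
}

closed_hits = [name for name, (s, e) in reads.items()
               if overlaps_closed(s, e, feat_start, feat_end)]
open_hits = [name for name, (s, e) in reads.items()
             if overlaps_open_right(s, e, feat_start, feat_end)]

print("closed end:", closed_hits)
print("open end:  ", open_hits)
```

Only the read that overlaps the feature solely at its right-most base (`r2` here) is treated differently, which fits the observation in the quote that just a small number of reads are affected.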

Thomas

 
