To count or not to count multi-overlapping reads?
Arindam ▴ 80
@ag1805x-15215
University of Eastern Finland

The featureCounts manuscript recommends that, for RNA-seq, reads overlapping more than one gene (multi-overlapping reads) not be counted, and the reasoning seems logical:

We recommend that reads or fragments overlapping more than one gene are not counted for RNA-seq experiments because any single fragment must originate from only one of the target genes but the identity of the true target gene cannot be confidently determined. On the other hand, we recommend that multi-overlap reads or fragments are counted for most ChIP-seq experiments because epigenetic modifications inferred from these reads may regulate the biological functions of all their overlapping genes.

How does it handle reads that map to a gene that is located in a region that also has another gene but on the alternate strand? If strand information is provided, I understand this should not be an issue. But what about unstranded sequencing?

I was particularly considering the situations as shown in this figure: https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-9-174/figures/1

RNA-Seq featureCounts Rsubread
@gordon-smyth
WEHI, Melbourne, Australia

How does it handle reads that map to a gene that is located in a region that also has another gene but on the alternate strand?

Overlapping genes are typically on different strands, but that does not help if the sequencing is unstranded, because the alignment cannot tell which strand a read came from. With unstranded sequencing, reads overlapping two genes on different strands will by default not be counted. With stranded sequencing they will be counted (provided the library's strandedness is specified to featureCounts), because each read can then be matched to the gene on the corresponding strand.
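To make the rule concrete, here is a minimal toy sketch in Python of the assignment logic described above. It is not the featureCounts implementation: the `Gene` class, the `assign_read` function, the gene names and the coordinates are all hypothetical, genes are treated as single intervals rather than sets of exons, and a stranded read is assumed to have the same orientation as its gene. The `stranded` and `allow_multi_overlap` parameters only loosely mimic the roles of the `strandSpecific` and `allowMultiOverlap` arguments of `featureCounts` in Rsubread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based inclusive start
    end: int     # 1-based inclusive end
    strand: str  # "+" or "-"


def assign_read(read_start, read_end, read_strand, genes,
                stranded=False, allow_multi_overlap=False):
    """Return the names of the genes a read is counted for (toy model)."""
    hits = []
    for gene in genes:
        # Two closed intervals overlap if each starts before the other ends.
        if not (read_start <= gene.end and gene.start <= read_end):
            continue
        # With stranded data, only genes on the read's strand are candidates.
        if stranded and gene.strand != read_strand:
            continue
        hits.append(gene.name)
    # Default behaviour: a read hitting more than one gene is not counted.
    if len(hits) > 1 and not allow_multi_overlap:
        return []
    return hits


# Hypothetical pair of genes overlapping on opposite strands.
genes = [Gene("geneA", 100, 500, "+"), Gene("geneB", 400, 900, "-")]

# A "+" strand read falling inside the region shared by both genes.
unstranded = assign_read(420, 480, "+", genes, stranded=False)
stranded = assign_read(420, 480, "+", genes, stranded=True)
multi = assign_read(420, 480, "+", genes, allow_multi_overlap=True)

print(unstranded)  # [] : ambiguous, so not counted
print(stranded)    # ['geneA'] : strand resolves the ambiguity
print(multi)       # ['geneA', 'geneB'] : counted for both genes
```

In this toy model the only reads lost by unstranded sequencing are those in the shared region; reads that overlap just one of the two genes are counted either way.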

Overlapping genes most often involve a pseudogene overlapping a protein-coding gene on the other strand. When I use featureCounts for RNA-seq data, I prefer to restrict the gene annotation to curated RefSeq genes. This has the effect of removing computationally predicted genes, most pseudogenes, and most cases of overlaps. If you're interested, you can see my Rsubread SAF files at https://bioinf.wehi.edu.au/Rsubread/annot/.
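If you want a rough idea of how many such overlaps a given annotation contains, a short script can list them. The sketch below is only an illustration under stated assumptions: it takes rows with the five SAF columns (GeneID, Chr, Start, End, Strand), the function name `opposite_strand_overlaps` and all gene names and coordinates are made up, and the all-pairs comparison is far too slow for a full genome annotation.

```python
from itertools import combinations


def opposite_strand_overlaps(saf_rows):
    """List gene pairs with overlapping exons on opposite strands (toy model).

    Each row is (GeneID, Chr, Start, End, Strand), one row per exon.
    """
    pairs = set()
    for row_a, row_b in combinations(saf_rows, 2):
        gene_a, chr_a, start_a, end_a, strand_a = row_a
        gene_b, chr_b, start_b, end_b, strand_b = row_b
        # Only exons of different genes, on the same chromosome
        # but on opposite strands, are of interest here.
        if gene_a == gene_b or chr_a != chr_b or strand_a == strand_b:
            continue
        # Closed-interval overlap test.
        if start_a <= end_b and start_b <= end_a:
            pairs.add(tuple(sorted((gene_a, gene_b))))
    return sorted(pairs)


# Hypothetical annotation: a pseudogene overlapping an exon of geneA
# on the other strand, plus two genes that overlap nothing.
saf_rows = [
    ("geneA", "chr1", 100, 300, "+"),
    ("geneA", "chr1", 600, 800, "+"),
    ("pseudoB", "chr1", 250, 350, "-"),
    ("geneC", "chr1", 400, 500, "-"),
    ("geneD", "chr2", 100, 300, "-"),
]

overlapping = opposite_strand_overlaps(saf_rows)
print(overlapping)  # [('geneA', 'pseudoB')]
```

Running the same check on a stricter annotation and on a more inclusive one would give a sense of how much the choice of annotation changes the number of overlaps.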

In our experiments with mouse stranded and unstranded RNA-seq data, we find that an extra 4% of reads can be assigned to genes with stranded instead of unstranded sequencing when using featureCounts with Gencode annotation. When using strict RefSeq annotation, the difference between stranded and unstranded sequencing is 2.4% extra reads assigned.
