I am working with paired-end RNA-seq data and have aligned the reads to the reference genome (unstranded) using STAR. Now, I need to count the aligned reads using featureCounts
.
I am aware of the --countReadPairs
option in featureCounts
, but I'm unsure about its importance in the context of paired-end data. Specifically:
- What is the role of
--countReadPairs
when usingfeatureCounts
on paired-end data? - How does using or not using this option affect the downstream analysis, especially in terms of gene expression quantification?
- Which command should I use to ensure accurate read counting for paired-end data?
Here are the commands I'm considering:
featureCounts -T 14 -s 0 -p -t exon -g gene_id -a genome.gtf -o counts.txt *.bam
or
featureCounts -T 14 -s 0 -p --countReadPairs -t exon -g gene_id -a genome.gtf -o counts.txt *.bam
Any guidance on the proper usage and its implications would be greatly appreciated.
Since I don't know which organism you are working on, I would recommend using the following:
-s 0
if library is unstranded-C
Discard reads where R1 maps to one chromosome and R2 maps to another chromosome or vice-versa`If your genome has gene duplication events (like I work with Plasmodium and Toxoplasma and even though 95-96% of reads are uniquely aligned I wish to account for multimapping reads that emnate from small gene family comprising of 60 genes) or for some reason you expect more multimapping reads (like in rice genome) and wanna account for them you can use some extra flags such as
--fraction -M
which will add 1/x to the read counts for every read that maps to x locations in the genome. The you can useround()
function in R to get integer that DESeq2 loves. Maybe @james-w-macdonald-5106 can attest to this usage can you?