Question

Behaviour of featureCounts when quantifying multi-mapping reads

Rachael • 0

@6f00b97d

Last seen 10 months ago

United Kingdom

Hi,

I am having trouble understanding the featureCounts settings for counting multi-mapping reads.

Previously we did the following:

Aligned with HISAT2
Used featureCounts from Subread 1.5.0 (for consistency with another paper)

featureCounts -p -C -B -f -T 5 --primary \
    -a examples.gtf \
    -o output.counts \
    input.bam >> featurecounts.txt 2>&1

This counted each multi-mapping read once.

Alternatively, running the same command without --primary:

featureCounts -p -C -B -f -T 5 \
    -a examples.gtf \
    -o output.counts \
    input.bam >> featurecounts.txt 2>&1

did not count multi-mapping reads.

(We know that they were/were not being counted from a small amplicon sequencing dataset with few reads)

We have now upgraded to Subread 1.6.4, because Subread 1.5.0 suddenly started failing with a gzip error (GZIP error 2, with any BAM file) after our admin changed the cluster hardware.

So now we do the following:

Align with HISAT2
Use featureCounts from Subread 1.6.4

However if we run

featureCounts -p -C -B -f -T 5 --primary \
    -a examples.gtf \
    -o output.counts \
    input.bam >> featurecounts.txt 2>&1

OR

featureCounts -p -C -B -f -T 5 \
    -a examples.gtf \
    -o output.counts \
    input.bam >> featurecounts.txt 2>&1

We get the same output...

1. Are the multi-mapping reads being counted when --primary is used in version 1.6.4?
2. Has there been a change in the usage of --primary between versions, or has something else happened?
3. How do we specify that we do or do not wish to count multi-mapping reads in 1.6.4? Some research has led us to the -M option, but the manual says that --primary disregards it.
4. Why are we getting the GZIP error 2 with 1.5.0? We have tried freshly downloaded TCGA BAMs.

Bioconductor featurecounts subread • 1.4k views

Posted 10 months ago by Rachael

score 4 · Accepted Answer · 2024-06-12

Both v1.5.0 and v1.6.4 are very old. I suggest upgrading to v2.0.6 (the latest version).

The behaviour of the --primary option did change between v1.5.0 and v1.6.4, and the manual seems to describe its usage inconsistently. Section 6.2.6 says:

When multi-mapping reads are reported with primary and secondary alignments and both -M and --primary are specified, only primary alignments will be considered in counting and secondary alignments will be ignored. If -M is specified but --primary is not specified, both primary and secondary alignments will be considered in counting.

This is the correct description for the --primary option. You must specify the -M option in v1.6.4 (and versions thereafter) to allow multimapping reads (even if you only want to count their primary alignments). I hope this can answer your first three questions.
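
To make the three cases concrete, here is a minimal sketch that prints the command line for each multi-mapping policy, assuming v1.6.4 or later and the file names from the question. The function name build_fc_cmd and the mode labels (none, primary, all) are hypothetical helpers for illustration, not part of Subread.

```shell
#!/usr/bin/env bash
# Sketch only: build the featureCounts command for three multi-mapping policies.
# build_fc_cmd and its mode labels are hypothetical; the flags are those
# discussed in this thread.

build_fc_cmd() {
    local mode="$1"
    local base="featureCounts -p -C -B -f -T 5"
    case "$mode" in
        none)    ;;                            # no -M: multi-mapping reads are not counted
        primary) base="$base -M --primary" ;;  # count only primary alignments of multi-mappers
        all)     base="$base -M" ;;            # count primary and secondary alignments
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "$base -a examples.gtf -o output.counts input.bam"
}

# Dry run: print each command so the flag differences can be compared.
for mode in none primary all; do
    build_fc_cmd "$mode"
done
```

After running the real commands, the .summary file written next to the counts file should report how many reads were left unassigned as multi-mapping, which is one way to check that -M had the intended effect on a small dataset.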

For your fourth question, I have no idea why gzip error 2 suddenly appeared after the hardware change. But because v1.6.4 works on the same BAM files, it may be caused by an upgraded zlib version that came with the hardware change.