Hi,
running Rsubread 2.8.2/2.12.0 or featureCounts 2.0.3/2.0.1, I stumbled over two issues when allowing ambiguous read assignment (-O/allowMultiOverlap)
1) regarding assignment via minimum fractional overlap (--fracOverlap) using featureCounts stand-alone binary.
2) when combined with --/largestOverlap and --/fraction using Rsubread featureCounts function or the stand-alone binary.
to 1)
Assume a read of length 20, which covers two regions, both by 4 nucleotides. Using option --minOverlap 4 reports a count of 1 for both regions, whereas --fracOverlap 0.2 does not (0.1999 does). According to the manual, --fracOverlap defines the minimum fraction of overlapping bases. So propably the check is not performed by len >= frac but len > frac?
to 2)
Assume a read, which covers two regions. Using -O/allowMultiOverlap option results in a count of 1 for both regions. When in addition fractional counting is enabled (--/fraction), both regions get a 0.5 count (as expected).
Now lets parameterize featureCounts via --/minOverlap, --/fracOverlap or --/largestOverlap so that the read is assigned to only one region. For --/largestOverlap there is an inconsistency in the behaviour when fractional counting (--/fraction) is enabled too. Whereas for --/minOverlap or --/fracOverlap the returned value is 1 (as expected), for --/largestOverlap it is 0.5. I guess this is due to a "wrong" logic, where the fraction is calculated before the assignment. Is this intended or an issue?
Please find below my test Bash script which generates BAM and SAF files and runs featureCounts stand-alone or Rsubread featureCounts via Rscript. It requires SAMtools, Rscript and featureCounts (>2.0.1) to be in in PATH.
echo
echo "to 1) fracOverlap"
# SE read of length 20, mapped from positions 1 to 20 without a mismatch
cat <<- 'EOF' | samtools view --no-PG -b > test.bam && samtools index test.bam
@HD VN:1.6 SO:coordinate
@SQ SN:chr LN:16569
read1 0 chr 1 0 20= * 0 0 AAAAAAAAAAAAAAAAAAAA FFFFFFFFFFFFFFFFFFFF MD:Z:20 NH:i:1 HI:i:0 NM:i:0
EOF
cat <<- 'EOF' > test.saf
chr:0-4 chr 1 4 . . .
chr:16-20 chr 17 20 . . .
EOF
echo "expected minoverlap 4"
featureCounts -O --minOverlap 4 -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "fracOverlap 0.2 - bug?! - should be 1"
featureCounts -O --fracOverlap 0.2 -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "fracOverlap 0.2 - works in Rsubread!"
Rscript -e 'library(Rsubread); sink("/dev/null"); x=featureCounts("test.bam",annot.ext="test.saf",allowMultiOverlap=T,fracOverlap=0.2,tmpDir="/tmp"); sink(); x$counts' | tail -n +2
echo
echo "to 2) largestOverlap - single-end"
# SE read of length 20, mapped from positions 1 to 20 without a mismatch
cat <<- 'EOF' | samtools view --no-PG -b > test.bam && samtools index test.bam
@HD VN:1.6 SO:coordinate
@SQ SN:chr LN:16569
read1 0 chr 1 0 20= * 0 0 AAAAAAAAAAAAAAAAAAAA FFFFFFFFFFFFFFFFFFFF MD:Z:20 NH:i:1 HI:i:0 NM:i:0
EOF
cat <<- 'EOF' > test.saf
chr:0-15 chr 1 15 . . .
chr:15-30 chr 16 30 . . .
EOF
echo "expected minoverlap 10"
featureCounts -O --minOverlap 10 -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "+ fraction enabled"
featureCounts -O --minOverlap 10 --fraction -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "expected largestoverlap"
featureCounts -O --largestOverlap -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "+ fraction enabled - bug?! - should be 1"
featureCounts -O --largestOverlap --fraction -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "+ fraction enabled - bug?! - also in Rsubread"
Rscript -e 'library(Rsubread); sink("/dev/null"); x=featureCounts("test.bam",annot.ext="test.saf",allowMultiOverlap=T,largestOverlap=T,fraction=T,tmpDir="/tmp"); sink(); x$counts' | tail -n +2 | column -t
echo
echo "to 2) largestOverlap - paired-end"
# PE read of lengths 20 and 30, mapped from positions 1 to 20 without a mismatch and from position 51 to 80 without a mismatch
cat <<- 'EOF' | samtools view --no-PG -b > test.bam && samtools index test.bam
@HD VN:1.6 SO:coordinate
@SQ SN:chr LN:1000
read1 99 chr 1 0 20= = 51 80 AAAAAAAAAAAAAAAAAAAA FFFFFFFFFFFFFFFFFFFF MD:Z:20 NH:i:1 HI:i:0 NM:i:0
read1 147 chr 51 0 30= = 1 -80 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MD:Z:30 NH:i:1 HI:i:0 NM:i:0
EOF
cat <<- 'EOF' > test.saf
chr:0-20 chr 1 20 . . .
chr:50-60 chr 51 60 . . .
chr:60-80 chr 61 80 . . .
EOF
echo "expected minoverlap 20"
featureCounts -p --countReadPairs -O --minOverlap 20 -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "+ fraction enabled"
featureCounts -p --countReadPairs -O --minOverlap 20 --fraction -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "expected largestoverlap"
featureCounts -p --countReadPairs -O --largestOverlap -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "+ fraction enabled - bug?! - should be 0.5"
featureCounts -p --countReadPairs -O --largestOverlap --fraction -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "+ fraction enabled - bug?! - also in Rsubread"
Rscript -e 'library(Rsubread); sink("/dev/null"); x=featureCounts("test.bam",annot.ext="test.saf",allowMultiOverlap=T,largestOverlap=T,isPairedEnd=T,countReadPairs=T,fraction=T,tmpDir="/tmp"); sink(); x$counts' | tail -n +2 | column -t
Thank you for reporting the bug and apologies for late response. We are currently working on fixing this bug. A fixed version of Rsubread should be released soon.