Possible bugs in Rsubread/stad-alone featureCounts options fracOverlap and largestOverlap with fractional counts
1
0
Entering edit mode
Konstantin • 0
@f38802ac
Last seen 2.1 years ago
Germany

Hi,

running Rsubread 2.8.2/2.12.0 or featureCounts 2.0.3/2.0.1, I stumbled over two issues when allowing ambiguous read assignment (-O/allowMultiOverlap)

1) regarding assignment via minimum fractional overlap (--fracOverlap) using featureCounts stand-alone binary.

2) when combined with --/largestOverlap and --/fraction using Rsubread featureCounts function or the stand-alone binary.

to 1)

Assume a read of length 20, which covers two regions, both by 4 nucleotides. Using option --minOverlap 4 reports a count of 1 for both regions, whereas --fracOverlap 0.2 does not (0.1999 does). According to the manual, --fracOverlap defines the minimum fraction of overlapping bases. So propably the check is not performed by len >= frac but len > frac?

to 2)

Assume a read, which covers two regions. Using -O/allowMultiOverlap option results in a count of 1 for both regions. When in addition fractional counting is enabled (--/fraction), both regions get a 0.5 count (as expected).

Now lets parameterize featureCounts via --/minOverlap, --/fracOverlap or --/largestOverlap so that the read is assigned to only one region. For --/largestOverlap there is an inconsistency in the behaviour when fractional counting (--/fraction) is enabled too. Whereas for --/minOverlap or --/fracOverlap the returned value is 1 (as expected), for --/largestOverlap it is 0.5. I guess this is due to a "wrong" logic, where the fraction is calculated before the assignment. Is this intended or an issue?

Please find below my test Bash script which generates BAM and SAF files and runs featureCounts stand-alone or Rsubread featureCounts via Rscript. It requires SAMtools, Rscript and featureCounts (>2.0.1) to be in in PATH.


echo
echo "to 1) fracOverlap"
# SE read of length 20, mapped from positions 1 to 20 without a mismatch
cat <<- 'EOF' | samtools view --no-PG -b > test.bam && samtools index test.bam
@HD VN:1.6  SO:coordinate
@SQ SN:chr  LN:16569
read1   0   chr 1   0   20= *   0   0   AAAAAAAAAAAAAAAAAAAA    FFFFFFFFFFFFFFFFFFFF    MD:Z:20 NH:i:1  HI:i:0  NM:i:0
EOF

cat <<- 'EOF' > test.saf
chr:0-4 chr 1   4   .   .   .
chr:16-20   chr 17  20  .   .   .
EOF
echo "expected minoverlap 4"
featureCounts -O --minOverlap 4 -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "fracOverlap 0.2 - bug?! - should be 1"
featureCounts -O --fracOverlap 0.2 -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "fracOverlap 0.2 - works in Rsubread!"
Rscript -e 'library(Rsubread); sink("/dev/null"); x=featureCounts("test.bam",annot.ext="test.saf",allowMultiOverlap=T,fracOverlap=0.2,tmpDir="/tmp"); sink(); x$counts' | tail -n +2

echo
echo "to 2) largestOverlap - single-end"
# SE read of length 20, mapped from positions 1 to 20 without a mismatch
cat <<- 'EOF' | samtools view --no-PG -b > test.bam && samtools index test.bam
@HD VN:1.6  SO:coordinate
@SQ SN:chr  LN:16569
read1   0   chr 1   0   20= *   0   0   AAAAAAAAAAAAAAAAAAAA    FFFFFFFFFFFFFFFFFFFF    MD:Z:20 NH:i:1  HI:i:0  NM:i:0
EOF
cat <<- 'EOF' > test.saf
chr:0-15    chr 1   15  .   .   .
chr:15-30   chr 16  30  .   .   .
EOF

echo "expected minoverlap 10"
featureCounts -O --minOverlap 10 -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "+ fraction enabled"
featureCounts -O --minOverlap 10 --fraction -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "expected largestoverlap"
featureCounts -O --largestOverlap -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "+ fraction enabled - bug?! - should be 1"
featureCounts -O --largestOverlap --fraction -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "+ fraction enabled - bug?! - also in Rsubread"
Rscript -e 'library(Rsubread); sink("/dev/null"); x=featureCounts("test.bam",annot.ext="test.saf",allowMultiOverlap=T,largestOverlap=T,fraction=T,tmpDir="/tmp"); sink(); x$counts' | tail -n +2 | column -t

echo
echo "to 2) largestOverlap - paired-end"
# PE read of lengths 20 and 30, mapped from positions 1 to 20 without a mismatch and from position 51 to 80 without a mismatch
cat <<- 'EOF' | samtools view --no-PG -b > test.bam && samtools index test.bam
@HD VN:1.6  SO:coordinate
@SQ SN:chr  LN:1000
read1   99  chr 1   0   20= =   51  80  AAAAAAAAAAAAAAAAAAAA    FFFFFFFFFFFFFFFFFFFF    MD:Z:20 NH:i:1  HI:i:0  NM:i:0
read1   147 chr 51  0   30= =   1   -80 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  MD:Z:30 NH:i:1  HI:i:0  NM:i:0
EOF
cat <<- 'EOF' > test.saf
chr:0-20    chr 1   20  .   .   .
chr:50-60   chr 51  60  .   .   .
chr:60-80   chr 61  80  .   .   .
EOF

echo "expected minoverlap 20"
featureCounts -p --countReadPairs -O --minOverlap 20 -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "+ fraction enabled"
featureCounts -p --countReadPairs -O --minOverlap 20 --fraction -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "expected largestoverlap"
featureCounts -p --countReadPairs -O --largestOverlap -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "+ fraction enabled - bug?! - should be 0.5"
featureCounts -p --countReadPairs -O --largestOverlap --fraction -F SAF -a test.saf --tmpDir /tmp -o /dev/stdout test.bam 2> /dev/null | tail -n +3 | cut -f 1,7 | column -t
echo "+ fraction enabled - bug?! - also in Rsubread"
Rscript -e 'library(Rsubread); sink("/dev/null"); x=featureCounts("test.bam",annot.ext="test.saf",allowMultiOverlap=T,largestOverlap=T,isPairedEnd=T,countReadPairs=T,fraction=T,tmpDir="/tmp"); sink(); x$counts' | tail -n +2 | column -t
Rsubread featureCounts • 1.2k views
ADD COMMENT
0
Entering edit mode

Thank you for reporting the bug and apologies for late response. We are currently working on fixing this bug. A fixed version of Rsubread should be released soon.

ADD REPLY
0
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 4 weeks ago
Australia/Melbourne

We have fixed these issues. Please try the latest version. If the problem persists, please let us know.

ADD COMMENT

Login before adding your answer.

Traffic: 725 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6