I increased the nthreads value in featureCounts, but the running time didn't decrease.
I used the following command: subread_res <- featureCounts(files, annot.ext = GTF_file, isGTFAnnotationFile = TRUE, nthreads = X, countMultiMappingReads = TRUE, fraction = TRUE)
The first run, with 5 threads, took 2.7 min:
//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           S /Result ...                                    ||
||                                                                            ||
||      Dir for temp files : .                                                ||
||                 Threads : 5                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : no                                               ||
||         Strand specific : no                                               ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file us_muscu ...                                          ||
||    Features : 729264                                                       ||
||    Meta-features : 48795                                                   ||
||    Chromosomes/contigs : 45                                                ||
||                                                                            ||
|| Process BAM file sults/01_STAR_r ...                                       ||
||    Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 53383794                                                  ||
||    Successfully assigned reads : 12345398 (23.1%)                          ||
||    Running time : 2.70 minutes                                             ||
||                                                                            ||
||                         Read assignment finished.                          ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//
Then with 1 thread it took 2.73 min, so there is almost no difference. What running times do you see when you vary nthreads? Any suggestions? Thanks!
        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.26.1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           S s/Result ...                                   ||
||                                                                            ||
||      Dir for temp files : .                                                ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : no                                               ||
||         Strand specific : no                                               ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file ensembl_86/Mus_muscu ...                              ||
||    Features : 729264                                                       ||
||    Meta-features : 48795                                                   ||
||    Chromosomes/contigs : 45                                                ||
||                                                                            ||
|| Process BAM file Results/01_STAR_r ...                                     ||
||    Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 53383794                                                  ||
||    Successfully assigned reads : 12345398 (23.1%)                          ||
||    Running time : 2.73 minutes                                             ||
||                                                                            ||
||                         Read assignment finished.                          ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//
Below is the session info:
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: .linuxbrew/Cellar/r/3.4.1_2/lib/R/lib/libRblas.so
LAPACK: .linuxbrew/Cellar/r/3.4.1_2/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rsubread_1.26.1
loaded via a namespace (and not attached):
[1] compiler_3.4.1
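For anyone who wants to repeat the comparison from the question, here is a minimal sketch of a timing helper in base R. The names `time_by_threads` and `count_fun` are hypothetical and not part of Rsubread; the commented call at the bottom assumes the `files` and `GTF_file` objects from the question. Comparing CPU time with elapsed time gives a rough hint whether the extra threads were busy at all: if `cpu_sec` stays close to `elapsed_sec`, most of the run was effectively single-threaded.

```r
# Hypothetical helper (not part of Rsubread): run a counting function once
# per thread count and record elapsed and CPU time for each run.
time_by_threads <- function(count_fun, thread_counts = c(1, 5)) {
  rows <- lapply(thread_counts, function(n) {
    t <- system.time(count_fun(n))
    data.frame(
      nthreads    = n,
      elapsed_sec = t[["elapsed"]],
      # CPU time used by the R process itself (user + system)
      cpu_sec     = t[["user.self"]] + t[["sys.self"]]
    )
  })
  do.call(rbind, rows)
}

# Usage with the call from the question (assumes `files` and `GTF_file` exist):
# timings <- time_by_threads(function(n) {
#   Rsubread::featureCounts(files, annot.ext = GTF_file,
#                           isGTFAnnotationFile = TRUE, nthreads = n,
#                           countMultiMappingReads = TRUE, fraction = TRUE)
# })
# print(timings)
```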
Thanks for the quick reply! I saw more than 100% CPU usage in the later stage of featureCounts (Mac OS X binary version), so the second reason you gave explains the running time. It would be great to have faster reading of BAM files.
It is true for me too that the majority of the time is spent reading and decompressing. For this reason it would be nice if Rsubread::featureCounts allowed some degree of parallelization at this level. Ideally it might support a new parameter, nbams, giving how many BAM files to process at a time (some number smaller than nthreads). If I could, on my compute server, I would probably call:
featureCounts(nbams = 5, nthreads = 40, ...)
... without getting IO bound.
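Until something like `nbams` exists, a similar effect might be approximated from the R side by running one single-file job per BAM in forked processes with `parallel::mclapply`. This is only a sketch: `nbams` is the hypothetical parameter proposed above, `count_bams_concurrently` and `count_fun` are made-up names, and whether concurrent featureCounts calls are safe (for example with temp files written to the same working directory) is an assumption that would need checking.

```r
library(parallel)

# Hypothetical helper: emulate the proposed `nbams` idea by processing
# several BAM files at once, splitting the thread budget between jobs.
count_bams_concurrently <- function(files, count_fun, nbams = 2, nthreads = 4) {
  # Threads handed to each concurrent job (at least 1)
  threads_per_job <- max(1L, as.integer(nthreads %/% nbams))
  # mclapply relies on forking, which is unavailable on Windows
  cores <- if (.Platform$OS.type == "windows") 1L else as.integer(nbams)
  results <- mclapply(
    files,
    function(f) count_fun(f, nthreads = threads_per_job),
    mc.cores = cores
  )
  names(results) <- basename(files)
  results
}

# Usage sketch (assumes `files` and `GTF_file` from the question):
# res <- count_bams_concurrently(
#   files, nbams = 5, nthreads = 40,
#   count_fun = function(f, nthreads) {
#     Rsubread::featureCounts(f, annot.ext = GTF_file,
#                             isGTFAnnotationFile = TRUE, nthreads = nthreads,
#                             countMultiMappingReads = TRUE, fraction = TRUE)
#   }
# )
# counts <- do.call(cbind, lapply(res, `[[`, "counts"))
```

With `nbams = 5` and `nthreads = 40`, each job would get 8 threads. Whether this actually helps depends on whether the disk can feed five decompression streams at once.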
Thanks for the suggestion. We will look into this.