HTSeqGenie runs very slowly
zh9118 • 0


I am running HTSeqGenie on paired-end RNA-Seq samples. The two FASTQ files are each about 3 GB. I tried the same code on both a Mac and Red Hat Linux, and it runs extremely slowly on both. The log messages are shown below the code. Thanks for your help.

# This is the R code used to run the sample
save_dir <- runPipeline( = T,

    ## input
    input_file = "~/Downloads/Pilot/RNA_R1_001.fastq.gz",
    input_file2 = "~/Downloads/Pilot/RNA_R2_001.fastq.gz",
    paired_ends = TRUE,
    quality_encoding = "illumina1.8",

    ## system
    num_cores = 6,
    debug.tracemem = F,

    ## output
    save_dir = paste0("~/Downloads/Pilot/analysis/", Sample.ID),
    prepend_str = Sample.ID,
    overwrite_save_dir = "erase",
    remove_processedfastq = F,
    remove_chunkdir = F,

    ## trim reads = FALSE,
    # trimReads.length = NULL,
    # trimReads.trim5 = 0,

    ## Filter = T,
    filterQuality.minQuality = 23,
    filterQuality.minFrac = 0.7,
    filterQuality.minLength = 18,

    ## detect adapter contamination = T,
    detectAdapterContam.force_paired_end_adapter = F,

    ## detect ribosomal RNA = T,
    detectRRNA.rrna_genome = "gencode_v43_rRNA",

    ## aligner
    path.gsnap_genomes = "~/Downloads/Genome/Human/",
    alignReads.genome = "hg38",
    alignReads.static_parameters = "-M 2 -n 10 -B 2 -i 1 -N 1 -w 200000 -E 1 --pairmax-rna=200000 --clip-overlap",
    alignReads.sam_id = Sample.ID,
    alignReads.use_gmapR_gsnap = F,

    ## gene model
    path.genomic_features = "~/Downloads/Gencode.v43/", = F,
    countGenomicFeatures.gfeatures = "Gencode.v43.RData",

    # Other process off = F, = F, = F
# Below are the log messages on the Mac:
checkConfig.R/checkConfig.template: loading template config= inst/config/default-config.txt 
sh: line 1: 82406 Abort trap: 6           samtools 2> /dev/null
sh: line 1: 82408 Abort trap: 6           samtools 2>&1
2023-09-25 17:53:04 INFO::preprocessReads.R/preprocessReads: starting...
2023-09-25 17:53:04 INFO::io.R/FastQStreamer.init: initialised FastQ streamer for filename= ~/Downloads/Pilot/RNA_R1_001.fastq.gz
2023-09-25 17:53:04 INFO::io.R/FastQStreamer.init: initialised FastQ streamer for filename= ~/Downloads/Pilot/RNA_R2_001.fastq.gz
2023-09-25 17:53:04 DEBUG::tools.R/processChunks: starting...
2023-09-25 17:53:12 DEBUG::tools.R/processChunks: waiting for chunkid=[  ] ...
2023-09-25 17:53:12 DEBUG::tools.R/processChunks: starting chunkid= 1 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000001/logs/progress.log
2023-09-25 17:53:19 DEBUG::tools.R/processChunks: starting chunkid= 2 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000002/logs/progress.log
2023-09-25 17:53:27 DEBUG::tools.R/processChunks: starting chunkid= 3 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000003/logs/progress.log
2023-09-25 17:53:35 DEBUG::tools.R/processChunks: starting chunkid= 4 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000004/logs/progress.log
2023-09-25 17:53:44 DEBUG::tools.R/processChunks: starting chunkid= 5 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000005/logs/progress.log
2023-09-25 17:53:53 DEBUG::tools.R/processChunks: starting chunkid= 6 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000006/logs/progress.log
2023-09-25 17:54:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 1, 2, 3, 4, 5, 6 ] ...
2023-09-25 17:55:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 1, 2, 3, 4, 5, 6 ] ...
2023-09-25 17:56:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 1, 2, 3, 4, 5, 6 ] ...
2023-09-25 17:57:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 1, 2, 3, 4, 5, 6 ] ...
2023-09-25 17:58:02 DEBUG::tools.R/processChunks: done with chunkid= 1 ; elapsed.time= 4.845 minutes
2023-09-25 17:58:02 DEBUG::tools.R/processChunks: starting chunkid= 7 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000007/logs/progress.log
2023-09-25 17:58:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 2, 3, 4, 5, 6, 7 ] ...
2023-09-25 18:30:43 INFO::preprocessReads.R/preprocessReads: done
2023-09-25 18:30:43 INFO::preprocessReads.R/buildShortReadReports: generating report_dir= ~/Downloads/Pilot/analysis/xxx/reports/shortReadReport_1 ...
2023-09-25 18:33:10 INFO::preprocessReads.R/buildShortReadReports: generating report_dir= ~/Downloads/Pilot/analysis/xxx/reports/shortReadReport_2 ...
# After this point it kept running with no further messages. It has now been more than 30 hours.
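The two `Abort trap: 6` lines near the top of the Mac log suggest that the `samtools` binary found on the PATH may be crashing at startup, which could be worth ruling out separately from the slowness. Below is a minimal sketch for checking this from R. `check_tool` is a hypothetical helper written for this post, not part of HTSeqGenie, and it assumes the installed samtools accepts `--version` (older releases may not, in which case a non-zero status is expected).

```r
# Hypothetical helper (not part of HTSeqGenie): report whether an external
# tool is visible to R and what exit status it returns when started.
check_tool <- function(tool, args = "--version") {
  # Sys.which() returns "" when the tool is not found on the PATH
  path <- unname(Sys.which(tool))
  if (!nzchar(path)) {
    return(list(found = FALSE, path = "", status = NA_integer_))
  }
  # With stdout/stderr discarded, system2() returns the exit status
  # (0 means the tool started and exited cleanly)
  status <- suppressWarnings(
    system2(path, args, stdout = FALSE, stderr = FALSE)
  )
  list(found = TRUE, path = path, status = status)
}

# Assumption: "--version" is accepted by the installed samtools
str(check_tool("samtools"))
```

A non-zero `status` here would point to a problem with the samtools installation itself rather than with the pipeline settings.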
# Below is the sessionInfo() output on the Mac:
sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] HTSeqGenie_4.28.1           VariantAnnotation_1.42.1    ShortRead_1.54.0            GenomicAlignments_1.32.1    SummarizedExperiment_1.26.1
 [6] Biobase_2.56.0              MatrixGenerics_1.8.1        matrixStats_0.63.0          BiocParallel_1.30.4         gmapR_1.38.0               
[11] Rsamtools_2.12.0            Biostrings_2.64.1           XVector_0.36.0              GenomicRanges_1.48.0        GenomeInfoDb_1.32.4        
[16] IRanges_2.30.1              S4Vectors_0.34.0            BiocGenerics_0.42.0        

loaded via a namespace (and not attached):
 [1] httr_1.4.5             bit64_4.0.5            VariantTools_1.38.0    BiocFileCache_2.4.0    latticeExtra_0.6-30    blob_1.2.3            
 [7] BSgenome_1.64.0        GenomeInfoDbData_1.2.8 yaml_2.3.7             progress_1.2.2         pillar_1.9.0           RSQLite_2.3.0         
[13] lattice_0.20-45        glue_1.6.2             digest_0.6.31          RColorBrewer_1.1-3     Matrix_1.5-3           chipseq_1.46.0        
[19] XML_3.99-0.13          pkgconfig_2.0.3        biomaRt_2.52.0         zlibbioc_1.42.0        jpeg_0.1-10            tibble_3.2.1          
[25] KEGGREST_1.36.3        generics_0.1.3         ellipsis_0.3.2         cachem_1.0.7           GenomicFeatures_1.48.4 cli_3.6.0             
[31] deldir_1.0-6           magrittr_2.0.3         crayon_1.5.2           memoise_2.0.1          fansi_1.0.4            xml2_1.3.3            
[37] hwriter_1.3.2.1        Cairo_1.6-0            tools_4.2.3            prettyunits_1.1.1      hms_1.1.2              BiocIO_1.6.0          
[43] lifecycle_1.0.3        stringr_1.5.0          interp_1.1-3           DelayedArray_0.22.0    AnnotationDbi_1.58.0   compiler_4.2.3        
[49] rlang_1.1.0            grid_4.2.3             RCurl_1.98-1.10        rstudioapi_0.14        rjson_0.2.21           rappdirs_0.3.3        
[55] bitops_1.0-7           restfulr_0.0.15        codetools_0.2-19       DBI_1.1.3              curl_5.0.0             R6_2.5.1              
[61] dplyr_1.1.1            rtracklayer_1.56.1     fastmap_1.1.1          bit_4.0.5              utf8_1.2.3             filelock_1.0.2        
[67] stringi_1.7.12         parallel_4.2.3         Rcpp_1.0.10            vctrs_0.6.1            png_0.1-8              dbplyr_2.3.1          
[73] tidyselect_1.2.0
I found that it is always the last chunk that takes forever, even though the last chunk is the smallest one.
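To see where the last chunk stalls, the per-chunk progress logs named in the messages above can be compared. The following is a minimal sketch in base R. `summarize_chunk_logs` is a hypothetical helper written for this post, and the directory layout (`<save_dir>/chunks/chunk_XXXXXX/logs/progress.log`) is assumed from the log paths shown earlier.

```r
# Hypothetical helper (not part of HTSeqGenie): for every chunk directory
# under <save_dir>/chunks, return the modification time and the last few
# lines of its logs/progress.log file.
summarize_chunk_logs <- function(save_dir, n = 3) {
  chunk_dirs <- list.files(file.path(save_dir, "chunks"),
                           pattern = "^chunk_[0-9]+$", full.names = TRUE)
  names(chunk_dirs) <- basename(chunk_dirs)
  lapply(chunk_dirs, function(d) {
    log_file <- file.path(d, "logs", "progress.log")
    if (!file.exists(log_file)) {
      # Chunk directory exists but has not written a progress log yet
      return(list(last_modified = NA, last_lines = character(0)))
    }
    list(
      last_modified = file.info(log_file)$mtime,
      last_lines    = tail(readLines(log_file, warn = FALSE), n)
    )
  })
}
```

For example, `str(summarize_chunk_logs(path.expand("~/Downloads/Pilot/analysis/xxx")))` (with `xxx` standing in for the sample ID, as in the log above) would show the last recorded step of each chunk, so the final step logged by the slow chunk can be compared with the chunks that finished.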
