Hi everyone.
My aim was to find alternative splicing alteration between conditions of mice samples. I made a workflow with Rsubread to align and featureCounts to count. For splicing I switched to the subjunc() function. It worked well.
Then I discovered the "reportAllJunctions" argument, which I set to TRUE in the subjunc() function and received an Error "ERROR: Cigar section longer than read length: 1170268831 >= 1210, '22M164379026079M'". I though that is strange but continued without having all Junctions reported.
Some weeks later, I tried my luck again, only feeding in one sample fastq file and this time I only receive a message by Rstudio that my session was aborted due to a fatal error.
If "reportAllJunction" is set to FALSE, all works fine, also feeding into 42 sample fastq files. I am working on a Ubuntu system and should have above 60GB of RAM. Genome and GTF are both from the Gencode Website.
Below, I send you the code and the output because the alignment reached 20% but then stopped there until it crashed after a few minutes being idle.
> subjunc(index = "/home/chuddy/bioinformatics/ref_genomes/mouse_39/genome/rsubread/GRCm39",
+ nthreads = 20,
+ readfile1 = fastq_files_abs[1],
+ sortReadsByCoordinates = TRUE,
+ reportAllJunctions = T, # if T then: ERROR: Cigar section longer than read length: 1170268831 >= 1210, '22M164379026079M'
+ isGTF = T,
+ useAnnotation = T,
+ annot.ext = "/home/chuddy/bioinformatics/ref_genomes/mouse_39/annotation/gencode_vm34/gencode.vM34.annotation.gtf")
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 2.14.2
//================================= setting ==================================\\
|| ||
|| Function : Read alignment + Junction/Fusion detection (RNA-Seq) ||
|| Input file : 07-06h-zh_rep1_R1.fastq.gz ||
|| Output file : 07-06h-zh_rep1_R1.fastq.gz.subjunc.BAM (BAM), Sorted ||
|| Index name : GRCm39 ||
|| ||
|| ------------------------------------ ||
|| ||
|| Threads : 20 ||
|| Phred offset : 33 ||
|| Min votes : 1 / 28 ||
|| Max mismatches : 3 ||
|| Max indel length : 5 ||
|| Report multi-mapping reads : yes ||
|| Max alignments per multi-mapping read : 1 ||
|| Annotations : gencode.vM34.annotation.gtf (GTF) ||
|| ||
\\============================================================================//
//=============== Running (25-Mär-2024 13:57:46, pid=128251) ================\\
|| ||
|| Check the input reads. ||
|| The input file contains base space reads. ||
|| Initialise the memory objects. ||
|| Estimate the mean read length. ||
|| The range of Phred scores observed in the data is [2,37] ||
|| Create the output BAM file. ||
|| Check the index. ||
|| Init the voting space. ||
|| Load the annotation file. ||
|| 863151 annotation records were loaded. ||
|| ||
|| Chromosomes/contigs in index but not in annotation : ||
|| GL456394.1 ||
|| GL456210.1 ||
|| JH584295.1 ||
|| GL456381.1 ||
|| GL456379.1 ||
|| GL456233.2 ||
|| JH584299.1 ||
|| GL456385.1 ||
|| MU069434.1 ||
|| JH584303.1 ||
|| GL456366.1 ||
|| GL456372.1 ||
|| GL456389.1 ||
|| GL456211.1 ||
|| JH584296.1 ||
|| GL456382.1 ||
|| JH584300.1 ||
|| MU069435.1 ||
|| JH584304.1 ||
|| GL456367.1 ||
|| GL456221.1 ||
|| GL456392.1 ||
|| GL456354.1 ||
|| GL456219.1 ||
|| GL456360.1 ||
|| GL456396.1 ||
|| GL456212.1 ||
|| JH584297.1 ||
|| GL456383.1 ||
|| JH584301.1 ||
|| GL456370.1 ||
|| GL456387.1 ||
|| GL456368.1 ||
|| GL456239.1 ||
|| GL456378.1 ||
|| GL456359.1 ||
|| JH584298.1 ||
|| JH584302.1 ||
|| GL456390.1 ||
|| ||
|| Global environment is initialised. ||
|| Load the 1-th index block... ||
|| The index block has been loaded. ||
|| Start read mapping in chunk. ||
|| 0% completed, 0.6 mins elapsed, rate=214.6k reads per second ||
|| 7% completed, 0.7 mins elapsed, rate=224.4k reads per second ||
|| 13% completed, 0.8 mins elapsed, rate=220.4k reads per second ||
|| 20% completed, 0.9 mins elapsed, rate=219.0k reads per second ||
...
HERE IT CRASHED
...
sessionInfo( ) # right before execution of subjunc()
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=de_CH.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=de_CH.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=de_CH.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Zurich
tzcode source: system (glibc)
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] VennDiagram_1.7.3 futile.logger_1.4.3 fgsea_1.26.0 ggrepel_0.9.5 ggforce_0.4.1
[6] viridis_0.6.5 viridisLite_0.4.2 ggConvexHull_0.1.0 edgeR_3.42.4 limma_3.56.2
[11] gghighcontrast_0.1.0 Rsubread_2.14.2 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[16] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
[21] ggplot2_3.4.4 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] utf8_1.2.4 generics_0.1.3 futile.options_1.0.1 stringi_1.8.3 lattice_0.22-5
[6] hms_1.1.3 magrittr_2.0.3 timechange_0.3.0 Matrix_1.6-5 formatR_1.14
[11] gridExtra_2.3 fansi_1.0.6 scales_1.3.0 tweenr_2.0.2 codetools_0.2-19
[16] cli_3.6.2 rlang_1.1.3 polyclip_1.10-6 cowplot_1.1.3 munsell_0.5.0
[21] withr_3.0.0 parallel_4.3.3 tools_4.3.3 BiocParallel_1.34.2 tzdb_0.4.0
[26] colorspace_2.1-0 fastmatch_1.1-4 locfit_1.5-9.8 lambda.r_1.2.4 vctrs_0.6.5
[31] R6_2.5.1 lifecycle_1.0.4 MASS_7.3-60 pkgconfig_2.0.3 pillar_1.9.0
[36] gtable_0.3.4 data.table_1.15.0 glue_1.7.0 Rcpp_1.0.12 xfun_0.41
[41] tidyselect_1.2.0 rstudioapi_0.15.0 knitr_1.45 farver_2.1.1 compiler_4.3.3
Thank for replying. I understand. Will this update be available soon in the next version and with R4.3 since the devel version needs R4.4? Because I wouldn't want to mess around with the current workflows and packages by updating my R version.
Thank you again!
This update will be included in the next Bioc release scheduled next month.