Hello, I'm trying to use Rsubread's cellCounts function to process a scRNA-seq dataset but am encountering an error. We sequenced the same sample on two instruments:
- Illumina NextSeq 500 (28 + 56 nt)
- Illumina NovaSeq 6000 (150 + 150 nt)
The cellCounts function returns successfully when run on the FASTQ files from the NextSeq instrument (28 + 56 nt), but throws the following memory mapping error on the files from the NovaSeq instrument (150 + 150 nt):
|| Sort the 28395 genes... ||
|| Load the 1-st index block... ||
*** caught segfault ***
address (nil), cause 'memory not mapped'
ERROR: the UMI length is abnormaly long (135 bases). This can be caused by an incorrect cell barcode file.
Has anyone encountered this before, or can anyone offer advice on how to process the NovaSeq sample successfully?
I am including the head from each R1 and R2 file below to show the data structure of each instrument. Perhaps there is something obvious I'm missing. Thanks for your time.
NextSeq 500, R1:
@NB552224:567:HKCJKBGXV:1:11101:19095:1039 1:N:0:GTAGACGA
ACGGTNGCATTGCAACATAACTGAACCN
+
AAAAA#EEEEEEEEEEEEEEEEEAEEE#
@NB552224:567:HKCJKBGXV:1:11101:6118:1039 1:N:0:GTAGACGA
CCGATNTAGTGCAAATCTGCCCAAAGTN
+
AAAAA#EEEEEEEEEEEEEEEEEEEEE#
@NB552224:567:HKCJKBGXV:1:11101:18652:1039 1:N:0:GTAGACGA
TGAATNCAGCCTCTTCTCCCCGCCTCTN
NextSeq 500, R2:
@NB552224:567:HKCJKBGXV:1:11101:19095:1039 2:N:0:GTAGACGA
NCTAACCGCTAACATTACTGCAGGCCACCTACTCATGCACCTAATTGGAAGCGCCA
+
#AAAAEEA<AEEEEE<E<E/<EE/</E<6EAAEAEE<6/6AEEEEE/<EEA///6E
@NB552224:567:HKCJKBGXV:1:11101:6118:1039 2:N:0:GTAGACGA
NTACGAAATCCTCCCCACTTTTGATGTTCTGCATTTCAAATCTGAAGGGTACAACG
+
#AAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAEAEEEEEEEEA
@NB552224:567:HKCJKBGXV:1:11101:18652:1039 2:N:0:GTAGACGA
NTGATCTCTAGTGGAGATCTCTTTGAGGGCTGTAGTACTAAAGTGTACTTAATGTT
NovaSeq 6000, R1:
@A01071:356:HTWLKDSX7:2:1101:24858:1016 1:N:0:GTAGACGAAA+NCCACACTAG
TNAATTCCAGACCTATCTTTTTACAGCCTCCCGGACGCCGATCGCCCATAGTGTTTTAGAAGTGAAAGTATATAATCCCAAATATACAGATTGAATTTGAGGGGACTATTTTGTCTGTGTGAGCCACAGAAAACACAGGTTGGTATTAAAT
+
:#,FFFFFFFFFFFF::FFFFFFFFFFF,,,,,,,,F,,,,,,:,,,F,:,F,,,,F:,,F,FF,F,,F,FF,::,,,FF:FFFF:::,:,F,:F,F,,,,::F:,F,,:,:,,,,:F,FF,,,FFFF,FF:FFF,,:,,F,,:,,,,::,
@A01071:356:HTWLKDSX7:2:1101:24876:1016 1:N:0:GTAGACGAAA+NCCACACTAG
GNACAGTAGCCAGAGTCGTACGGAAAGGGGTTTTAGGTATAAAATTTTTTTGTGTAGAAAAAAGCGGTGGTTAAATTTTGTCCAACGCTTGTTAGGTTAGTTAATAAACCTGCCTATTTGCTGTCGTGAGAGGCTTATGCAGAATCCCGAT
+
F#FFFFFFFFF,FFFFFFFFFFFF:FFF,,,,,,,,,,:,F,,,F,F,F,,,:F,,,,,F,,:F:F,,,,FF,,,F:FFFFFFFF,FF,,,::F,F,FFF:F,F:FFF,FFF,F,FFF:F,F:,,F,:,FFFFF,,,,FF,,FF,,:FF,F
@A01071:356:HTWLKDSX7:2:1101:24894:1016 1:N:0:GTAGACGAAA+NCCACACTAG
ANCCTATTCGGCGATCGTGGTCCGATGGTTATCAGGGTTTTGTTTCTGGCCCCTAACAGAGAGCCGTGCTCTGTCTCCCAGGCTGGAGTGCAGTGGCACAATCTCAGCTCACTGCACCCTCCACCTTCCGGGTTCACGCAATTTTCCTGCC
NovaSeq 6000, R2:
@A01071:356:HTWLKDSX7:2:1101:24858:1016 2:N:0:GTAGACGAAA+NCCACACTAG
TCATCTGGGAGCCTGTGCCCCTGGGTCCTCGAGGGTCATGGCTTGTCCCTGGTCAGTCCTGTCTGACTGACCTCAGGGCCTCACCTCTCTGCCCTTCCCTGCCCGGTTCCTACTCACCTGGCTAGGGCCAGTGCCCATTTTCAGCCCTACC
+
:,F::FFFFFFFFFFFFFFFFFFFFFFF:FF:FFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFF:,FFFFFFFFFFF,:FFF:FF,F,F:FFFFFFFFFFFF:FFFF,F,FFFFFFFF,,FF
@A01071:356:HTWLKDSX7:2:1101:24876:1016 2:N:0:GTAGACGAAA+NCCACACTAG
GAAGCTTATCAAGAAGATGGGTGACCACCTGACCAACCTCCACAGGCTGGGTGGCCCGGAGGCTGGGCTGGGCGAGTATCTCTTCGAAAGGCTCACTCTCAAGCACGACTAAGAGCCTTCTGAGCCCAGCGACTTCTGAAGGGCCCCTTGC
+
FFFFFFFFFFFFFFFF:FFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF
@A01071:356:HTWLKDSX7:2:1101:24894:1016 2:N:0:GTAGACGAAA+NCCACACTAG
GTCAAAATAGGCTGGGCGTGGTGGCTCACGCCTGTAATCCCTGCACTTTGGGAGGCCAAGGCGGGTAGATTACCTGAGGTCAGGAACTCAAGACCAGCCTGGCCAGCATGGCGAAATCCTGTCTCTACTAAAAATACAAAAATTAGCTGAG
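As a quick consistency check on the read layout shown above, the R1 read lengths can be tabulated directly from the FASTQ file. The sketch below is only illustrative: the function names and the 16-nt cell barcode length are assumptions (the chemistry is not stated in this post), and nothing here is taken from Rsubread itself. Under that assumption, the NovaSeq R1 reads printed above, which are 151 nt long, would leave 135 nt after the barcode, the same number reported in the error message, while the 28-nt NextSeq R1 reads would leave 12 nt.

```python
import gzip
from collections import Counter
from itertools import islice

# ASSUMPTION: a 16-nt cell barcode at the start of R1. The post does not
# state the chemistry, so treat this value as a placeholder.
ASSUMED_BARCODE_LEN = 16


def read_lengths(path, max_reads=10000):
    """Count sequence lengths in the first max_reads records of a FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        # In a 4-line FASTQ record the sequence is the 2nd line,
        # so take lines 1, 5, 9, ... (0-based) up to max_reads records.
        for seq in islice(handle, 1, 4 * max_reads, 4):
            counts[len(seq.rstrip("\r\n"))] += 1
    return counts


def implied_umi_length(read_len, barcode_len=ASSUMED_BARCODE_LEN):
    """Bases left in R1 after the barcode, if everything else is read as UMI."""
    return read_len - barcode_len
```

Calling `read_lengths` on each R1 file (the path is whatever the local file is called) and passing the most common length to `implied_umi_length` shows whether R1 carries more bases than a barcode plus a short UMI. This is a consistency check only, not a confirmed diagnosis of the segfault.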
sessionInfo()
> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Amazon Linux 2
Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.9.so; LAPACK version 3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] stringr_1.5.1 biomaRt_2.58.0 org.Hs.eg.db_3.18.0
[4] AnnotationDbi_1.64.1 IRanges_2.36.0 S4Vectors_0.40.2
[7] Biobase_2.62.0 BiocGenerics_0.48.1 Rsubread_2.16.0
loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 utf8_1.2.4 generics_0.1.3
[4] xml2_1.3.6 bitops_1.0-7 RSQLite_2.3.4
[7] stringi_1.8.3 lattice_0.22-5 hms_1.1.3
[10] digest_0.6.34 magrittr_2.0.3 grid_4.3.2
[13] fastmap_1.1.1 blob_1.2.4 Matrix_1.6-5
[16] progress_1.2.3 GenomeInfoDb_1.38.5 DBI_1.2.1
[19] httr_1.4.7 fansi_1.0.6 XML_3.99-0.16
[22] Biostrings_2.70.1 cli_3.6.2 rlang_1.1.3
[25] crayon_1.5.2 dbplyr_2.4.0 XVector_0.42.0
[28] bit64_4.0.5 cachem_1.0.8 tools_4.3.2
[31] memoise_2.0.1 dplyr_1.1.4 filelock_1.0.3
[34] GenomeInfoDbData_1.2.11 curl_5.2.0 vctrs_0.6.5
[37] R6_2.5.1 png_0.1-8 lifecycle_1.0.4
[40] BiocFileCache_2.10.1 zlibbioc_1.48.0 KEGGREST_1.42.0
[43] bit_4.0.5 pkgconfig_2.0.3 pillar_1.9.0
[46] glue_1.7.0 tibble_3.2.1 tidyselect_1.2.0
[49] compiler_4.3.2 prettyunits_1.2.0 RCurl_1.98-1.14
Can you provide more information about the chemistry/protocol used to generate the data that produced the error? We are wondering whether it came from 3' or 5' gene expression sequencing, or from some other 10x sequencing protocol.
It would also be very helpful if you could provide a few tens of thousands of reads from both the R1 and R2 fastq.gz files. This would help us figure out the type of data and run some tests.
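One way to produce such a subset is to copy the first N records from each gzip-compressed FASTQ file. The following is a minimal sketch using only the Python standard library; the function name and the default of 50,000 reads are illustrative choices, not part of any Rsubread tooling. It assumes R1 and R2 list their reads in the same order, so taking the same number of records from the top of each file keeps the pairs in sync.

```python
import gzip
from itertools import islice


def head_fastq(src, dst, n_reads=50000):
    """Copy the first n_reads FASTQ records from src to dst (both gzipped).

    Returns the number of complete records written.
    """
    lines_written = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        # Each FASTQ record spans exactly 4 lines.
        for line in islice(fin, 4 * n_reads):
            fout.write(line)
            lines_written += 1
    return lines_written // 4
```

Running `head_fastq` once on the R1 file and once on the R2 file with the same `n_reads` gives two small files that can be shared; the input and output paths are whatever the local files are called.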