I am aligning RNA-seq data from 36 samples to the CHM13v2.0 reference using subjunc():
align.res <- subjunc(index="chm13v2.0_maskedY", readfile1=fwdname, readfile2=revname, output_file=bamname, nthreads = 12)
where fwdname, revname, and bamname are character vectors with 36 elements each.
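For illustration only, three parallel vectors of this kind might be built as in the sketch below. The sample IDs, directory names, and file suffixes are hypothetical placeholders and are not taken from the actual data set; only the vector names and the length of 36 come from the call above.

```r
# Hypothetical sample IDs and directories (assumptions for illustration only).
samples  <- sprintf("sample%02d", 1:36)
fastqdir <- "fastq"
bamdir   <- "bam"

# One forward file, one reverse file, and one output BAM per sample,
# kept in the same order so that element i of each vector refers to sample i.
fwdname <- file.path(fastqdir, paste0(samples, "_R1.fastq.gz"))
revname <- file.path(fastqdir, paste0(samples, "_R2.fastq.gz"))
bamname <- file.path(bamdir, paste0(samples, ".bam"))

# Sanity check: all three vectors must have the same length.
stopifnot(length(fwdname) == 36,
          length(revname) == 36,
          length(bamname) == 36)
```

Keeping the three vectors in the same sample order matters, since the files are paired by position.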
Total alignment time averages 16 minutes per sample. Of this, 5 minutes per sample are spent at the "Global environment is initialised" stage, where the index (~18 GB) appears to be loaded into memory. This is also reflected in the working set of the Rgui process, which drops to 1.6 GB after each sample completes, rises to ~19 GB during preparation, and peaks at 21.4 GB during alignment.
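To put the overhead in perspective, the short calculation below extrapolates the per-sample averages quoted above to all 36 samples. It uses only those figures and assumes they hold uniformly across samples.

```r
n_samples        <- 36  # number of samples
total_per_sample <- 16  # average minutes per sample, as reported
init_per_sample  <- 5   # minutes per sample spent initialising, as reported

init_total <- n_samples * init_per_sample   # 180 minutes
run_total  <- n_samples * total_per_sample  # 576 minutes
init_share <- init_total / run_total        # 0.3125

cat(sprintf("Initialisation: %d of %d minutes (%.1f%%)\n",
            init_total, run_total, 100 * init_share))
```

Under that assumption, roughly 3 of the 9.6 hours of total runtime go to repeated initialisation, which is the portion that reusing a loaded index would save from the second sample onward.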
My question is whether there is a parameter that tells subjunc() to reuse the index already loaded for the first sample when aligning the subsequent samples.
The environment is as follows:
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rsubread_2.14.2
loaded via a namespace (and not attached):
[1] compiler_4.3.0 Matrix_1.5-4.1 tools_4.3.0 grid_4.3.0 lattice_0.21-8
P.S.: The ungapped, single-block index was created with buildindex(basename="chm13v2.0_maskedY", reference="chm13v2.0_maskedY.fa.gz", memory=18000)