Dear BioConductor community,
I've recently been trying to normalize large metagenome abundance matrices with the cpm(y) function of edgeR. I get the following error:
Error in .isAllZero(counts) :
long vectors not supported yet: memory.c:3438
Calls: mergeTables -> DGEList -> .isAllZero -> .Call
This is obviously a memory issue, and I was just wondering whether long vectors will be supported in the near future. I usually have no problems with this step, but this is the largest dataset I have processed so far.
Thanks!
-J
My edgeR version is edgeR_3.18.1, with R 3.4.0.
Okay it works fine with the devel package. Many thanks!
Well it seems I've got another error downstream - here is part of my code:
y <- DGEList(df, remove.zeros=TRUE)
y <- calcNormFactors(y, method="TMM") # Although not sure if necessary...
cpms = cpm(y)
cpms = round(cpms, digits=3)
write.table(cpms, outfileCpm, quote=FALSE, sep="\t", row.names=TRUE, col.names=NA)
and in output:
Removing 1241 rows with all zero counts
Error in write.table(cpms, outfileCpm, quote = FALSE, sep = "\t", row.names = TRUE, :
corrupt matrix -- dims not not match length
Calls: mergeTables -> write.table
Execution halted
I have 9,582,472 genes in the df object, if that's relevant.
Cheers,
I can't reproduce your error. I tried the same code in edgeR 3.19.7 with a 9582472 x 10 count matrix, and all ran fine.
I have a 9582472 x 381 matrix. I can share if needed. Thx!
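For scale, here is a rough calculation of what a matrix of these dimensions implies. It is only a sketch with illustrative variable names (not taken from the original script), assuming the values are stored as doubles at 8 bytes each. The element count is above 2^31 - 1, the point past which R treats a vector as a long vector, which is possibly related to the errors reported above.

```r
# Dimensions reported in this thread
n_genes   <- 9582472
n_samples <- 381

# Numeric literals are doubles, so this product does not hit the
# integer overflow that 9582472L * 381L would
n_elements <- n_genes * n_samples

# Assumption: double-precision storage, 8 bytes per value
size_gb <- n_elements * 8 / 1e9

# Anything longer than .Machine$integer.max (2^31 - 1) is a long vector
is_long_vector <- n_elements > .Machine$integer.max

cat(sprintf("elements: %.0f\nsize: %.1f GB\nlong vector: %s\n",
            n_elements, size_gb, is_long_vector))
```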
This seems more like a problem with write.table than with any edgeR functions. And little wonder - a 9582472 x 381 double-precision matrix occupies 29 GB in memory! Are you sure you want to write this to file?
It is huge, I concur, and I have to admit that I might have to revisit my SOPs as projects (and datasets) are becoming bigger and bigger. For downstream analyses, only subsets of this final normalized matrix will be pulled at a time (i.e. certain genes with selected functions), so it shouldn't be a problem at that point. I just need to normalize everything together before going forward. I'll try to write the table with fwrite (data.table) - any other suggestions are welcome.
It worked with fwrite (data.table).
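For anyone landing here with the same problem, the fwrite route might look roughly like the sketch below. The exact call used in this thread was not posted, so this is an assumption: the toy matrix and the temporary output path are hypothetical stand-ins for the real cpms and outfileCpm. Row names are carried over as an explicit first column because a data.table has no row names.

```r
library(data.table)

# Toy stand-in for the real CPM matrix (hypothetical data)
cpms <- matrix(round(runif(12) * 1e3, digits = 3), nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

# Convert to a data.table, keeping the row names as a first column
# (named "rn" by default)
dt <- as.data.table(cpms, keep.rownames = TRUE)

# Hypothetical output path; replace with the real outfileCpm
outfileCpm <- tempfile(fileext = ".tsv")

# Tab-separated, unquoted, with a header line
fwrite(dt, outfileCpm, sep = "\t", quote = FALSE)
```

Note that the conversion copies the matrix, so for a matrix of the size discussed here it needs enough free memory for a second copy.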