Ludo Pagie
Dear all,
I know I'm on a slippery slope when I claim that some R feature uses too much memory. Still, I have found something odd that keeps me from getting my job done. I make a DataFrame from a large integer matrix (rows x cols = 7e6 x 300), and the process of creating the DataFrame consumes more memory than I have. I'm working on an Ubuntu machine.
My question is: am I overlooking something, and can I change my code so that the memory overhead is more reasonable? Or is there a problem with the implementation of DataFrame, or is there another issue?
Thanks for helping me out. Ludo
(smaller) USE CASE:
##################
# load IRanges
library(IRanges)
# create a largish matrix
mm <- matrix(as.integer(NA),1e7,20)
# consumes about 800 Mb
print(object.size(mm), units="Mb")
# 762.9 Mb
# at this point top reports that my R job uses 900 Mb (VIRT) / 833 Mb (RES)
# of memory; that seems reasonable
# coerce the matrix into a regular data.frame
df <- as.data.frame(mm)
# also consumes about 800 Mb
print(object.size(df), units="Mb")
# 762.9 Mb
# at this point top reports that my R job uses 1800 Mb (VIRT) / 1750 Mb (RES)
# of memory; that seems reasonable, about double the size from before
# coerce the same matrix into a DataFrame
DF <- as(mm, "DataFrame")
# also consumes about 800 Mb
print(object.size(DF), units="Mb")
# 762.9 Mb
# but now top says that my R job takes 5500 Mb (VIRT) / 5400 Mb (RES) of
# memory!!! That's 3700 Mb extra for coercing an 800 Mb object
# sessioninfo
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=nl_NL.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=nl_NL.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=nl_NL.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] IRanges_1.20.6 BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] stats4_3.0.2
########################################################
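The figures above come from top, which reports memory for the whole R process. A way to cross-check them from inside R is to reset the "max used" counters with gc(reset = TRUE) before a coercion and read them back afterwards. The following is only a minimal sketch under a few assumptions: peak_overhead_mb is a hypothetical helper name (it is not part of IRanges or base R), the matrix is a small stand-in for the 1e7 x 20 one, and gc() only accounts for memory managed by R itself, so its numbers need not match top.

```r
# Hypothetical helper (not from IRanges or base R): evaluates `expr` and
# returns how far R's peak memory use rose above the starting level, in Mb.
peak_overhead_mb <- function(expr) {
  g0 <- gc(reset = TRUE)                          # collect garbage, reset "max used"
  used_mb <- which(colnames(g0) == "used") + 1    # "(Mb)" column next to "used"
  max_mb <- which(colnames(g0) == "max used") + 1 # "(Mb)" column next to "max used"
  baseline <- sum(g0[, used_mb])                  # Ncells + Vcells before the work
  force(expr)                                     # the lazy argument is evaluated here
  g1 <- gc()
  sum(g1[, max_mb]) - baseline                    # peak since the reset, minus baseline
}

# Small stand-in for the matrix in the use case (assumption: 1e5 rows, not 1e7)
mm <- matrix(NA_integer_, 1e5, 20)

overhead_df <- peak_overhead_mb(as.data.frame(mm))
print(overhead_df)

# Same measurement for the DataFrame coercion, only if IRanges is installed
if (requireNamespace("IRanges", quietly = TRUE)) {
  library(IRanges)
  overhead_DF <- peak_overhead_mb(as(mm, "DataFrame"))
  print(overhead_DF)
}
```

Comparing the two results against the object size printed by object.size() shows how many temporary copies each coercion makes at its peak, independently of what top reports.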
---
Ludo Pagie
Netherlands Cancer Institute
Gene Regulation (B4)
van Steensel Group
Plesmanlaan 121
1066 CX Amsterdam
The Netherlands
Tel.: ++ 20 512 7986
Fax: ++ 20 669 1383
email: l.pagie@nki.nl