Hi,
Does anyone know of an R function, code, or extension that quickly makes and stores multiple randomizations of a data matrix? Specifically, I am looking for a function that maintains the column sums (samples), the row sums (genes), or both. The functions I have found are slow for large data matrices.
Thank you in advance!
Matt
On Thu, Apr 3, 2008 at 3:06 PM, Matthew R. Helmus <mrhelmus at wisc.edu> wrote:
Hi, Matt.
You'll probably want to give us some code or at least more of a
description of what you are trying to do. That said, sample() is
useful for doing resampling. colSums() and rowSums() are optimized
for just those computations.
Sean
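A minimal illustration of Sean's pointers (a sketch, not code from the thread): `apply(x, 1, sample)` shuffles each row independently, but because `apply()` returns one column per row of `x`, its result is the transpose of a row-shuffled matrix. Wrapping it in `t()` restores the original orientation, and `rowSums()` can confirm that the row totals survived. The small Poisson matrix `x` is an assumed stand-in for real data.

```r
set.seed(1)
x <- matrix(rpois(12, 5), nrow = 3, ncol = 4)   # small example matrix of counts

# apply() returns one column per row of x, so transpose back with t()
shuffled <- t(apply(x, 1, sample))

dim(shuffled)                           # 3 4, the same shape as x
all(rowSums(shuffled) == rowSums(x))    # TRUE: row totals are preserved
colSums(shuffled) - colSums(x)          # column totals will usually differ
```

Column totals are not constrained by this kind of shuffle, which is why keeping both margins needs a different approach.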
Hi Sean,
Thanks for your reply. I have a large data matrix (500 x 50) of values and I need to:
1) randomize
2) for each row apply a function
3) calculate the mean of the outputs of the function across the rows
4) store this mean
5) then repeat the loop 10,000 times to produce a vector of 10,000 random means.
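The five steps above could be sketched as follows. This is a minimal sketch under stated assumptions: `row_stat()` is a hypothetical placeholder for the real per-row function, `one_replicate()` and `n_reps` are illustrative names, and the random matrix `a` stands in for the real data. `replicate()` collects the per-replicate means into a numeric vector, so no explicit `for` loop or preallocation is needed.

```r
set.seed(42)
a <- matrix(rnorm(500 * 50), nrow = 500, ncol = 50)   # stand-in for the real data

# Hypothetical per-row statistic; replace with the function actually needed.
# It must depend on the order of the values, or shuffling within rows
# would leave it unchanged.
row_stat <- function(v) cor(v, seq_along(v))

# Steps 1-3 for a single replicate.
one_replicate <- function(x) {
  xr <- t(apply(x, 1, sample))    # 1) shuffle each row (row sums unchanged)
  mean(apply(xr, 1, row_stat))    # 2) statistic per row, 3) mean across rows
}

# Steps 4-5: replicate() stores each mean in a numeric vector.
n_reps <- 100   # the thread asks for 10,000; 100 keeps this demo quick
null_means <- replicate(n_reps, one_replicate(a))
```

The resulting `null_means` is the null distribution against which the observed mean can be compared.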
I have used the code apply(X, 1, sample) to make a randomized matrix that maintains the row sums, but the for loop that creates each random matrix runs a bit slowly. I have not developed or found code that maintains both row and column totals while randomizing.
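Keeping both row and column totals fixed is harder. If the data were presence/absence (0/1), as is common for community matrices, one standard technique is the checkerboard swap: pick two rows and two columns whose 2 x 2 submatrix is [1 0; 0 1] or [0 1; 1 0] and flip it. The flip leaves every row and column total unchanged, and repeating many swaps randomizes the matrix. This is a sketch that assumes a 0/1 matrix. `swap_randomize()` and its `n_swaps`/`max_tries` arguments are hypothetical names, and how many swaps give adequate mixing is a judgment call. The technique does not apply directly to continuous values.

```r
# Hypothetical helper (assumes a 0/1 matrix): randomize m while keeping every
# row and column total fixed, using repeated checkerboard swaps.
swap_randomize <- function(m, n_swaps = 1000L, max_tries = 100L * n_swaps) {
  stopifnot(all(m %in% c(0, 1)), nrow(m) >= 2, ncol(m) >= 2)
  done <- 0L
  tries <- 0L
  # max_tries guards against matrices that contain no swappable 2 x 2 block
  while (done < n_swaps && tries < max_tries) {
    tries <- tries + 1L
    r <- sample.int(nrow(m), 2L)   # two distinct rows
    k <- sample.int(ncol(m), 2L)   # two distinct columns
    sub <- m[r, k]
    # Only [1 0; 0 1] or [0 1; 1 0] can be flipped without changing totals
    if (sub[1, 1] == sub[2, 2] && sub[1, 2] == sub[2, 1] &&
        sub[1, 1] != sub[1, 2]) {
      m[r, k] <- 1 - sub           # flip the checkerboard
      done <- done + 1L
    }
  }
  m
}

set.seed(7)
pa <- matrix(rbinom(20 * 10, 1, 0.4), nrow = 20, ncol = 10)
rand <- swap_randomize(pa, n_swaps = 500)
all(rowSums(rand) == rowSums(pa))   # TRUE
all(colSums(rand) == colSums(pa))   # TRUE
```

Because each swap changes only four cells, consecutive randomizations are correlated. Running many swaps between stored matrices reduces that correlation.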
Thanks in advance!
Matt
Sean Davis wrote:
On Thu, Apr 3, 2008 at 3:41 PM, Matthew R. Helmus <mrhelmus at wisc.edu> wrote:
> a <- matrix(rnorm(500*50),nc=50,nr=500)
> for(i in 1:100) {b <- apply(a,1,sample)}
> system.time(for(i in 1:100) {b <- apply(a,1,sample)})
   user  system elapsed
  1.164   0.008   1.170
> system.time(for(i in 1:100) {b <- a[,sample(ncol(a),replace=FALSE)]})
   user  system elapsed
  0.024   0.000   0.022
So the randomization code, at least, can be made quite a bit more efficient than your "apply(X,1,sample)". For 10,000 replicates, it runs in just over 2 seconds. Give this a try, and let us know more details if you are still having problems.
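`a[, sample(ncol(a), replace=FALSE)]` reorders whole columns, so every row receives the same permutation and both row and column totals are kept. `apply(a, 1, sample)`, by contrast, shuffles each row independently (and returns the transpose). If independent row shuffles are what the analysis needs, one way to avoid `apply()` is to sort all cells by row, using random keys to break ties within each row. This is a minimal sketch: `shuffle_rows()` is a hypothetical helper, and no timings are claimed for it.

```r
# Hypothetical helper: shuffle every row of x independently, without apply().
shuffle_rows <- function(x) {
  n <- nrow(x)
  p <- ncol(x)
  # Matrices are stored column-major, so cell k lies in row ((k - 1) %% n) + 1.
  row_id <- rep(seq_len(n), times = p)
  # Sort cells by row; the random keys put each row's cells in random order.
  ord <- order(row_id, runif(n * p))
  # Refill row by row; column labels no longer apply after shuffling.
  matrix(x[ord], nrow = n, ncol = p, byrow = TRUE,
         dimnames = list(rownames(x), NULL))
}

set.seed(3)
a <- matrix(rnorm(500 * 50), ncol = 50, nrow = 500)
b <- shuffle_rows(a)
all.equal(rowSums(b), rowSums(a))   # TRUE: each row holds the same values
```

`all.equal()` is used instead of `==` because summing the same values in a different order can change the last bits of a floating-point result.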
Sean