I am using the rbsurv package to select survival-associated probes. My data are pre-processed microarray data with ~7,000 probes and 192 tumor samples. The data were split in half into training and test sets for modeling and validation. I also included a risk factor, tumor stage, in the model to help model fitting. However, I am confused about how to choose suitable values for the max.n.genes and n.iter arguments. I understand that n.seq is for multi-model fitting and n.fold is for sampling and validation. Here is the code I tried, with the resulting error messages:
# code 1
fit4 <- rbsurv(time=t.train, status=s.train, x=x.train, method="efron", z=z.train, alpha=0.05,
               gene.ID=rownames(x.train), max.n.genes=30, n.iter=100, n.fold=3, n.seq=6, seed=1234)
Please wait...[1] "Too few genes or samples"
Error in rep(i, nrow(out$model)) : invalid 'times' argument

# code 2
fit4 <- rbsurv(time=t.train, status=s.train, x=x.train, method="efron", z=z.train, alpha=0.05,
               gene.ID=rownames(x.train), max.n.genes=20, n.iter=50, n.fold=3, n.seq=6, seed=1234)
Please wait...Error in if ((ncol(x) < 5) | (nrow(x) < 10)) { : argument is of length zero

# code 3
fit4 <- rbsurv(time=t.train, status=s.train, x=x.train, method="efron", z=z.train, alpha=0.05,
               gene.ID=rownames(x.train), max.n.genes=60, n.iter=50, n.fold=3, n.seq=6, seed=1234)
Please wait... Done.
# this one ran without any errors
To be brief:
In code 1, max.n.genes=30, n.iter=100, error.
In code 2, max.n.genes=20, n.iter=50, error.
In code 3, max.n.genes=60, n.iter=50, no error.
Although code 3 ran without any error, it uses max.n.genes=60, whereas I want a gene signature model with roughly 5 to 15 genes.
I also don't really understand what n.iter does or how it affects the modeling process.
Why does the error occur whenever a smaller max.n.genes is used? How should I determine optimal values for max.n.genes and n.iter?
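In case it helps rule out an input problem, below is a minimal sketch of the kind of shape check that could be run before calling rbsurv. It uses simulated stand-ins rather than my real data, and check_inputs is a hypothetical helper, not part of rbsurv. It assumes x has probes in rows and samples in columns and that z is a matrix with one row per sample, which is my reading of the package documentation. The last two lines only illustrate how ncol() can come back as NULL in base R, which would make the condition in the code 2 error have length zero; whether that is what actually happens inside rbsurv is a guess.

```r
# Simulated stand-ins for the real inputs (hypothetical values, not my data).
set.seed(1234)
n.probes  <- 200   # the real data have ~7,000 probes
n.samples <- 96    # half of the 192 tumor samples

x.train <- matrix(rnorm(n.probes * n.samples), nrow = n.probes,
                  dimnames = list(paste0("probe", seq_len(n.probes)), NULL))
t.train <- rexp(n.samples, rate = 0.1)               # survival times
s.train <- rbinom(n.samples, size = 1, prob = 0.6)   # 1 = event, 0 = censored
z.train <- cbind(stage = sample(1:4, n.samples, replace = TRUE))

# Hypothetical helper: returns TRUE when the shapes agree, otherwise stops.
check_inputs <- function(x, time, status, z = NULL) {
  stopifnot(is.matrix(x), !is.null(rownames(x)))
  stopifnot(ncol(x) == length(time), length(time) == length(status))
  stopifnot(all(status %in% c(0, 1)), !anyNA(x), !anyNA(time))
  if (!is.null(z)) stopifnot(is.matrix(z), nrow(z) == length(time))
  TRUE
}

check_inputs(x.train, t.train, s.train, z.train)

# Subsetting a single probe without drop = FALSE yields a plain vector,
# for which ncol() is NULL, so a comparison such as ncol(x) < 5 has length zero.
one.probe <- x.train[1, ]
is.null(ncol(one.probe))
```

If the check passes on the real x.train, t.train, s.train and z.train, the inputs themselves are probably not the cause, and the question about max.n.genes and n.iter stands as asked.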