(Note that I'm taking this back to the mailing list in case others are
interested.)
Orthogonal. One strategy is:

1. Randomly separate the data into training and test sets (using whatever
   proportions you think are appropriate for the size of your dataset).
2. On the training set, use the combination of polr+step to find an
   optimal model.
3. Repeat this lots of times.
4. Collect data on how often each predictor gets selected in the optimal
   model (which depends on the exact composition of the training set).
5. Also collect data on how well each trained model fits its test data.
   (This is tricky with an ordinal outcome. The key question is how to
   weight the penalties for prediction errors that are off by one ordinal
   category as opposed to two or more categories. You might want
   something like a weighted Cohen's kappa.)

Finally, you need to summarize the cross-validation results to decide
which predictors are the best. With only five possible predictors, one
idea is just to combine the out-of-sample prediction results for
each of the 2^5 possible model structures (or whatever subset actually
gets selected some number of times). The other possibility is to claim
that every time a predictor gets selected in one of the optimal models,
it gets credit (or blame) for all of the predictions those models
make. Intuitively, I prefer the first of these alternatives.
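
For concreteness, here is a rough R sketch of the procedure described
above. It is only an illustration under assumptions, not a tested recipe:
the helper names (weighted_kappa, one_split, run_splits), the 2/3 training
fraction, the quadratic kappa weights, and the simulated data frame are all
made up for the example. Only polr (from MASS), step, and predict are the
functions actually discussed in this thread.

```r
library(MASS)  # provides polr()

# Quadratic-weighted Cohen's kappa for two factors sharing the same ordered
# levels. Errors that are two categories off cost more than errors one off.
weighted_kappa <- function(truth, pred) {
  k <- nlevels(truth)
  obs <- table(truth, factor(pred, levels = levels(truth))) / length(truth)
  expct <- outer(rowSums(obs), colSums(obs))               # chance agreement
  w <- (outer(seq_len(k), seq_len(k), "-") / (k - 1))^2    # disagreement weights
  1 - sum(w * obs) / sum(w * expct)
}

# One random split: fit the full model on the training part, let step()
# prune it by AIC, then score the selected model on the held-out part.
one_split <- function(data, response, predictors, train_frac = 2/3) {
  train_idx <- sample(nrow(data), size = round(train_frac * nrow(data)))
  train <- data[train_idx, , drop = FALSE]
  test  <- data[-train_idx, , drop = FALSE]
  full  <- polr(reformulate(predictors, response), data = train)
  best  <- step(full, trace = 0)
  kept  <- attr(terms(best), "term.labels")
  pred  <- predict(best, newdata = test, type = "class")
  list(selected = predictors %in% kept,
       kappa = weighted_kappa(test[[response]], pred))
}

# Repeat the split many times and summarize: how often each predictor is
# selected, and the mean test-set kappa for each selected model structure.
run_splits <- function(data, response, predictors, n_rep = 100) {
  res <- replicate(n_rep, one_split(data, response, predictors),
                   simplify = FALSE)
  sel <- do.call(rbind, lapply(res, `[[`, "selected"))
  colnames(sel) <- predictors
  kappa <- vapply(res, `[[`, numeric(1), "kappa")
  structure_id <- apply(sel, 1,
                        function(s) paste(predictors[s], collapse = "+"))
  structure_id[structure_id == ""] <- "(none)"
  list(selection_freq = colMeans(sel),
       kappa_by_structure = tapply(kappa, structure_id, mean),
       times_selected = table(structure_id))
}

# Hypothetical demo data: 5 "genes", of which only g1 and g2 carry signal.
set.seed(1)
n <- 150
genes <- paste0("g", 1:5)
dataf <- as.data.frame(matrix(rnorm(n * 5), n, 5,
                              dimnames = list(NULL, genes)))
latent <- 1.5 * dataf$g1 - 1.5 * dataf$g2 + rlogis(n)
dataf$Group <- cut(latent, c(-Inf, -1, 1, Inf),
                   labels = c("healthy", "moderate", "severe"),
                   ordered_result = TRUE)

out <- run_splits(dataf, "Group", genes, n_rep = 50)
out$selection_freq       # fraction of splits in which each gene was kept
out$kappa_by_structure   # mean test-set weighted kappa per model structure
```

On real data you would probably want to stratify the split so that every
training set contains all three categories, and to wrap the polr call in
tryCatch in case a fit fails on some split.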
Kevin
On 6/24/2011 12:37 PM, Tim Triche, Jr. wrote:
> you prefer AIC to crossvalidation for model selection? or feel
> they're orthogonal?
>
> thanks for the tip about polr, I had a vague recollection of it, but
> this is the first time I actually read the man page. appreciate your
> taking the time to send it.
>
> --t
>
>
> On Fri, Jun 24, 2011 at 10:26 AM, Kevin R. Coombes
> <kevin.r.coombes@gmail.com> wrote:
>
> The standard MASS package includes the "polr" function to perform
> ordinal regression. After running polr to fit the base model with
> all parameters, you can pass the results through the "step"
> function to use AIC to select the best set of predictors.
>
> Kevin
>
>
> On 6/24/2011 10:38 AM, Tim Triche, Jr. wrote:
>
> You have an ordinal response, so you might consider an ordered
> probit model with interaction terms and a penalized likelihood fit, and
> determine the best penalty by cross-validation. I don't recall whether
> CMA supports ordered probit models, but it's probably the best approach,
> and you could just brute-force it -- you've only got 120 different
> models to fit under this scheme. At the very least, CMA would generate
> the cross-validation sets for you.
>
> You might also want to consider recursively fitting a shrunken
> LDA model (diseased/healthy, moderate/severe) and see how that compares
> to an ordinal model. Regardless, cross-validation is the obvious answer
> to how to pick one.
>
> Hope this helps,
> -t
>
> On Fri, Jun 24, 2011 at 8:24 AM, David martin <vilanew@gmail.com> wrote:
>
> thanks.
> It is not binary, since I have three categories and 5 genes. I
> have tried LDA and stepclass:
>
> # LDA stepwise (stepclass is in the klaR package)
> library(klaR)
> disc <- stepclass(Group ~ ., data = dataf, method = "lda",
>                   improvement = 0.001)
>
> where Group contains my three categories ("healthy", "moderate disease",
> "severe disease") and dataf the PCR values for my 5 genes.
>
> The problem I have is that stepwise generates a different signature each
> time (as it randomly picks up a gene to start with). This is OK for me,
> but how many times do you need to run stepclass to find the most
> probable genes that classify your groups? Do I need to run stepclass
> in a loop?
>
> thanks
>
>
>
> On 06/24/2011 05:17 PM, Kevin R. Coombes wrote:
>
> .. and probably should ...
>
> For a binary classification with only a few predictors, you can, for
> example, use logistic regression with some standard criterion like AIC,
> BIC, or Bayesian model averaging to decide which predictors should be
> retained.
>
> Kevin
>
> On 6/23/2011 6:10 PM, Moshe Olshansky wrote:
>
> If you have just 5 genes and a decent number of samples you can use
> any of the "conventional" (i.e. not high throughput) methods like LDA,
> trees, Random Forest, SVM, etc.
>
> I will have a look at both packages. It's PCR data, by the way.
>
> thanks
>
> On 06/23/2011 05:56 PM, Tim Triche, Jr. wrote:
>
> or CMA, which is perhaps a more systematic approach for classification.
> (the package name stands for Classification of MicroArrays) Very well
> thought out.
>
>
> On Thu, Jun 23, 2011 at 8:02 AM, Sean Davis <sdavis2@mail.nih.gov> wrote:
>
> On Thu, Jun 23, 2011 at 10:58 AM, David martin <vilanew@gmail.com> wrote:
>
> Hi,
> I have 5 genes of interest. I would like to know which combination(s)
> of genes gives the best disease separation. Which test could I use
> in my training set to see which combination is the best classifier
> between my disease and my healthy population?
>
> Thanks for any comment or test that could be useful to answer that
> question.
>
> Check out the MLInterfaces package. It should give you some ideas on
> where to start.
>
> Sean
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> When you emerge in a few years, you can ask someone what you missed,
> and you'll find it can be summed up in a few minutes.
>
> Derek Sivers <http://sivers.org/berklee>
>