Classification


David ▴ 860

@david-3335

Last seen 7.1 years ago

Hi, I have 5 genes of interest. I would like to know which combination(s) of genes gives the best disease separation. Which test could I use on my training set to see which combination is the best classifier between my disease and my healthy populations? Thanks for any comment or test that could be useful to answer that question.

• 2.4k views

ADD COMMENT • link updated 13.8 years ago by Kevin Coombes ▴ 430 • written 13.8 years ago by David ▴ 860


Sean Davis 21k

@sean-davis-490

Last seen 9 weeks ago

United States

Check out the MLInterfaces package. It should give you some ideas on where to start.

Sean

ADD COMMENT • link 13.8 years ago Sean Davis 21k


Or CMA, which is perhaps a more systematic approach for classification (the package name stands for Classification of MicroArrays). Very well thought out.

--
If people do not believe that mathematics is simple, it is only because they do not realize how complicated life is. John von Neumann

_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 13.8 years ago Tim Triche ★ 4.2k


I will have a look at both packages. It's PCR data, by the way.

Thanks

ADD REPLY • link 13.8 years ago David ▴ 860


If you have just 5 genes and a decent number of samples, you can use any of the "conventional" (i.e. not high-throughput) methods like LDA, trees, Random Forest, SVM, etc.

ADD REPLY • link 13.8 years ago Moshe Olshansky ▴ 260


.. and probably should ...

For a binary classification with only a few predictors, you can, for example, use logistic regression with some standard criterion like AIC, BIC, or Bayesian model averaging to decide which predictors should be retained.

Kevin
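A minimal sketch of the logistic-regression suggestion, for the binary (disease vs. healthy) case. The data are simulated and the column names (`gene1` to `gene5`, `disease`) are made up for illustration; which genes carry signal is an assumption of the simulation, not a result. It only shows how `glm` and `step` fit together when selecting predictors by AIC or BIC.

```r
# Simulated stand-in for real data: 5 "genes" and a binary disease status.
set.seed(1)
n <- 200
genes <- as.data.frame(matrix(rnorm(n * 5), ncol = 5,
                              dimnames = list(NULL, paste0("gene", 1:5))))
# Assumption for the simulation only: gene1 and gene2 drive the outcome.
lin <- 1.5 * genes$gene1 - 1.2 * genes$gene2
genes$disease <- rbinom(n, 1, plogis(lin))

# Full model with all five predictors.
full <- glm(disease ~ ., data = genes, family = binomial)

# Backward selection: the default penalty k = 2 gives AIC,
# k = log(n) gives BIC.
best_aic <- step(full, trace = 0)
best_bic <- step(full, k = log(n), trace = 0)

selected_aic <- attr(terms(best_aic), "term.labels")
selected_bic <- attr(terms(best_bic), "term.labels")
print(selected_aic)
print(selected_bic)
```

BIC penalizes each extra predictor more heavily than AIC once n is moderately large, so it tends to keep fewer genes.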

ADD REPLY • link 13.8 years ago Kevin Coombes ▴ 430


Thanks. It is not binary, since I have three categories and 5 genes. I have tried LDA and stepclass:

#LDR stepwise
disc <- stepclass(Group ~ ., data = dataf, method = "lda", improvement = 0.001)

where Group contains my three categories ("healthy", "moderate disease", "severe disease") and dataf the PCR values for my 5 genes.

The problem I have is that the stepwise procedure generates a different signature each time (as it randomly picks a gene to start with?). This is OK for me, but how many times do you need to run stepclass to find the genes most likely to classify your groups? Do I need to run stepclass in a loop?

Thanks
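One hedged note on the run-to-run variation: `stepclass` (klaR package) scores candidate models by cross-validation, and my understanding is that the folds are drawn at random, which would explain the different signatures; treat that as an assumption rather than a confirmed diagnosis. With only 5 genes, a deterministic alternative (a different technique from stepwise search) is to score all 31 non-empty subsets with leave-one-out cross-validation, which `MASS::lda` provides through `CV = TRUE`. The sketch below uses simulated data; `dataf`, `Group`, and the gene column names mirror the post but the values are invented.

```r
library(MASS)

# Simulated stand-in for the PCR data: three groups, five genes.
set.seed(2)
n <- 150
Group <- factor(rep(c("healthy", "moderate", "severe"), each = n / 3))
score <- as.integer(Group) - 1   # 0, 1, 2
dataf <- data.frame(
  Group = Group,
  gene1 = rnorm(n, mean = 1.5 * score),   # assumed informative
  gene2 = rnorm(n, mean = -1.0 * score),  # assumed informative
  gene3 = rnorm(n), gene4 = rnorm(n), gene5 = rnorm(n))

genes <- paste0("gene", 1:5)
# All 2^5 - 1 = 31 non-empty gene subsets.
subsets <- unlist(lapply(1:5, function(k) combn(genes, k, simplify = FALSE)),
                  recursive = FALSE)

# Leave-one-out accuracy for one subset; no random folds are involved,
# so repeated calls return the same value.
loo_accuracy <- function(vars) {
  f <- reformulate(vars, response = "Group")
  fit <- lda(f, data = dataf, CV = TRUE)
  mean(fit$class == dataf$Group)
}

acc <- vapply(subsets, loo_accuracy, numeric(1))
names(acc) <- vapply(subsets, paste, character(1), collapse = "+")
print(head(sort(acc, decreasing = TRUE), 5))
```

Because the best of 31 accuracies is itself selected on the same data, its value is optimistic; an outer validation loop or held-out set is still needed for an honest error estimate.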

ADD REPLY • link 13.8 years ago David ▴ 860


You have an ordinal response, so you might consider an ordered probit model with interaction terms and a penalized likelihood fit, and determine the best penalty by cross-validation. I don't recall whether CMA supports ordered probit models, but it's probably the best approach, and you could just brute-force it -- you've only got 120 different models to fit under this scheme. At the very least, CMA would generate the cross-validation sets for you.

You might also want to consider recursively fitting a shrunken LDA model (diseased/healthy, moderate/severe) and see how that compares to an ordinal model. Regardless, cross-validation is the obvious answer to how to pick one.

Hope this helps,
-t
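A rough sketch of the brute-force idea, under simplifying assumptions: it uses `MASS::polr` with `method = "probit"` as the ordered probit model, main effects only (no interaction terms), no penalty, and plain 5-fold cross-validation written by hand rather than CMA. The data are simulated and the names are illustrative.

```r
library(MASS)

# Simulated ordinal outcome with three ordered levels and five genes.
set.seed(3)
n <- 150
score <- rep(0:2, each = n / 3)
dataf <- data.frame(
  Group = factor(score, levels = 0:2,
                 labels = c("healthy", "moderate", "severe"), ordered = TRUE),
  gene1 = rnorm(n, mean = 1.0 * score),   # assumed informative
  gene2 = rnorm(n, mean = -0.8 * score),  # assumed informative
  gene3 = rnorm(n), gene4 = rnorm(n), gene5 = rnorm(n))

genes <- paste0("gene", 1:5)
subsets <- unlist(lapply(1:5, function(k) combn(genes, k, simplify = FALSE)),
                  recursive = FALSE)

# One fixed fold assignment, reused for every candidate model so that
# all models are compared on exactly the same splits.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# Cross-validated misclassification rate for one gene subset.
cv_error <- function(vars) {
  f <- reformulate(vars, response = "Group")
  wrong <- 0
  for (i in seq_len(k)) {
    fit <- polr(f, data = dataf[fold != i, ], method = "probit")
    pred <- predict(fit, newdata = dataf[fold == i, ], type = "class")
    truth <- dataf$Group[fold == i]
    wrong <- wrong + sum(as.character(pred) != as.character(truth))
  }
  wrong / n
}

err <- vapply(subsets, cv_error, numeric(1))
names(err) <- vapply(subsets, paste, character(1), collapse = "+")
print(head(sort(err), 5))
```

Plain misclassification treats a healthy/severe confusion the same as a healthy/moderate one; an ordinal-aware score may be preferable for a three-level outcome.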

ADD REPLY • link 13.8 years ago Tim Triche ★ 4.2k


The standard MASS package includes the "polr" function to perform ordinal regression. After running polr to fit the base model with all parameters, you can pass the results through the "step" function to use AIC to select the best set of predictors.

Kevin
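A minimal sketch of the polr-plus-stepwise workflow on simulated data. It uses `stepAIC`, the MASS counterpart of `step`, because that is the pairing I am sure works with `polr` fits; the data frame and gene names are invented for illustration. The code is meant to run at top level, since the stepwise refits look the data frame up by name.

```r
library(MASS)

# Simulated ordinal outcome with three ordered levels and five genes.
set.seed(4)
n <- 150
score <- rep(0:2, each = n / 3)
dataf <- data.frame(
  Group = factor(score, levels = 0:2,
                 labels = c("healthy", "moderate", "severe"), ordered = TRUE),
  gene1 = rnorm(n, mean = 1.0 * score),   # assumed informative
  gene2 = rnorm(n, mean = -0.8 * score),  # assumed informative
  gene3 = rnorm(n), gene4 = rnorm(n), gene5 = rnorm(n))

# Base model with all five genes (proportional-odds logistic by default).
full <- polr(Group ~ gene1 + gene2 + gene3 + gene4 + gene5,
             data = dataf, Hess = TRUE)

# Backward selection by AIC, starting from the full model.
best <- stepAIC(full, trace = 0)

print(attr(terms(best), "term.labels"))
print(c(full = AIC(full), best = AIC(best)))
```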

ADD REPLY • link 13.8 years ago Kevin Coombes ▴ 430


Agreed, cross-validation is needed! Thanks for your comments.

ADD REPLY • link 13.8 years ago David ▴ 860


Kevin Coombes ▴ 430

@kevin-coombes-3935

Last seen 2.5 years ago

United States

(Note that I'm taking this back to the mailing list in case others are interested.)

On 6/24/2011 12:37 PM, Tim Triche, Jr. wrote:
> you prefer AIC to crossvalidation for model selection? or feel they're orthogonal?
>
> thanks for the tip about polr, I had a vague recollection of it, but this is the first time I actually read the man page. appreciate your taking the time to send it.
>
> --t

Orthogonal. One strategy is:

Randomly separate the data into training and test sets (using whatever proportions you think are appropriate for the size of your dataset). On the training set, use the combination of polr+step to find an optimal model.

Repeat this lots of times. Collect data on how often each predictor gets selected in the optimal model (which depends on the exact composition of the training set). Also collect data on how well the trained model fits its test data. (This is tricky with an ordinal outcome. The key question is how to weight the penalties for prediction errors that are off by one ordinal category as opposed to two or more categories. You might want something like a weighted Cohen's kappa.)

Finally, you need to summarize the cross-validation results to decide which predictors are the best. With only five possible predictors, one idea might be just to combine the out-of-sample prediction results for each of the 2^5 possible model structures (or whatever subset actually gets selected some number of times). The other possibility is to claim that every time a predictor gets selected in one of the optimal models, it gets credit (or blame) for all of the predictions that model makes. Intuitively, I prefer the first of these alternatives.

Kevin
Bioconductor mailing list > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > https://stat.ethz.ch/mailman/**listinfo/bioconductor<htt ps:="" stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> > > > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gman="" e.org="" gmane.science.biology.informatics.conductor=""> > > > > > > > -- > When you emerge in a few years, you can ask someone what you missed, > and you'll find it can be summed up in a few minutes. > > Derek Sivers <http: sivers.org="" berklee=""> > [[alternative HTML version deleted]]
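The resampling strategy Kevin outlines at the top of this post can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the gene1..gene5 column names, the 50 repetitions and the 2/3 training proportion are made up, and MASS::stepAIC is used here as the MASS counterpart of step. Group and dataf follow the names in David's stepclass call.

```r
library(MASS)  # polr() for ordinal regression, stepAIC() for AIC-based selection

set.seed(1)

# Hypothetical stand-in for the real data: 5 PCR-measured genes, 3 ordered groups.
# In this simulation only gene1 and gene3 carry signal.
n <- 150
genes <- paste0("gene", 1:5)
dataf <- as.data.frame(matrix(rnorm(n * 5), ncol = 5,
                              dimnames = list(NULL, genes)))
latent <- 1.5 * dataf$gene1 - 1.0 * dataf$gene3 + rnorm(n)
dataf$Group <- cut(latent, breaks = c(-Inf, -0.8, 0.8, Inf),
                   labels = c("healthy", "moderate", "severe"),
                   ordered_result = TRUE)

n_rep <- 50  # number of random train/test splits (assumed value)
picked <- matrix(FALSE, nrow = n_rep, ncol = length(genes),
                 dimnames = list(NULL, genes))
mean_dist <- rep(NA_real_, n_rep)

for (i in seq_len(n_rep)) {
  # Random split into training and test sets
  train_idx <- sample(n, size = round(2 / 3 * n))
  train <- dataf[train_idx, ]
  test  <- dataf[-train_idx, ]

  # Full ordinal model on the training set, then AIC-based stepwise selection
  full <- polr(Group ~ ., data = train)
  best <- stepAIC(full, trace = 0)

  # Record which genes survived selection in this split
  picked[i, ] <- genes %in% attr(terms(best), "term.labels")

  # Test-set performance: mean distance (in ordinal categories) between
  # predicted and true class, so off-by-two errors cost twice off-by-one errors
  pred <- predict(best, newdata = test, type = "class")
  mean_dist[i] <- mean(abs(as.integer(pred) - as.integer(test$Group)))
}

sel_freq <- colMeans(picked)  # how often each gene was selected
print(round(sel_freq, 2))
print(summary(mean_dist))
```

The mean category distance is just one simple way to penalise errors by how far off they are; a weighted kappa could be substituted at that step.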

ADD COMMENT • link 13.8 years ago Kevin Coombes ▴ 430

Kellie Archer at VCU has done some work on weighting ordinal model selection in exactly this manner, for example with rpart for recursive partitioning. In a model with (say) categories ranging from "progressive disease" to "complete remission", it's a much smaller sin for the model to guess "partial remission" in a patient who experiences a complete remission than it is for the model to guess "progressive disease" (the opposite end of the scale). More recently she has been doing the same for e.g. lasso fits:

http://cran.r-project.org/web/packages/glmnetcr/index.html

On Fri, Jun 24, 2011 at 11:21 AM, Kevin R. Coombes wrote:
> Key question is how to weight the penalties for prediction errors that are off by one ordinal category as opposed to two or more categories. Might want something like a weighted Cohen's kappa.
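To make the "smaller sin" idea concrete, here is a minimal sketch of a linearly weighted Cohen's kappa, the kind of measure Kevin suggests. The function name weighted_kappa and the toy vectors are made up for illustration; the weights |i - j| / (k - 1) are one common choice, not necessarily the one used in the packages mentioned above.

```r
# Linearly weighted Cohen's kappa for ordered categories.
# 'levels' gives the categories in their natural order.
weighted_kappa <- function(truth, pred, levels) {
  truth <- factor(truth, levels = levels)
  pred  <- factor(pred,  levels = levels)
  k <- length(levels)
  obs  <- table(truth, pred) / length(truth)     # observed proportions
  expd <- outer(rowSums(obs), colSums(obs))      # expected under independence
  # Disagreement weights: 0 on the diagonal, growing with category distance
  w <- abs(outer(seq_len(k), seq_len(k), "-")) / (k - 1)
  1 - sum(w * obs) / sum(w * expd)
}

lev   <- c("healthy", "moderate", "severe")
truth <- c("healthy", "healthy", "moderate", "moderate", "severe", "severe")

# Two predictions with the same number of errors (two each):
pred_near <- c("healthy", "moderate", "moderate", "moderate", "severe", "moderate")  # off by one
pred_far  <- c("healthy", "severe",   "moderate", "moderate", "severe", "healthy")   # off by two

kappa_near <- weighted_kappa(truth, pred_near, lev)
kappa_far  <- weighted_kappa(truth, pred_far,  lev)
print(c(near = kappa_near, far = kappa_far))
```

Plain accuracy is 4/6 for both predictions, but the weighted kappa ranks the off-by-one predictions above the off-by-two ones.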

ADD REPLY • link 13.8 years ago Tim ▴ 160

Thanks for the great discussion. I'm just wondering whether someone already has a tutorial or R script that runs this pipeline? I guess it could easily be adapted.
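As a possible starting point for such a script, the first alternative Kevin mentions (scoring each of the possible model structures on held-out data) can be brute-forced, since five genes give only 31 non-empty subsets. This is a sketch under assumptions, not a tested pipeline: the data are simulated, the gene1..gene5 names and the choice of 5 folds are hypothetical, and the error measure (mean distance in ordinal categories) is just one option.

```r
library(MASS)  # polr() for ordinal regression

set.seed(2)

# Hypothetical stand-in for the real data (replace with your own dataf/Group).
n <- 150
genes <- paste0("gene", 1:5)
dataf <- as.data.frame(matrix(rnorm(n * 5), ncol = 5,
                              dimnames = list(NULL, genes)))
latent <- 1.5 * dataf$gene1 - 1.0 * dataf$gene3 + rnorm(n)
dataf$Group <- cut(latent, breaks = c(-Inf, -0.8, 0.8, Inf),
                   labels = c("healthy", "moderate", "severe"),
                   ordered_result = TRUE)

# All 31 non-empty subsets of the 5 genes
subsets <- unlist(lapply(seq_along(genes),
                         function(k) combn(genes, k, simplify = FALSE)),
                  recursive = FALSE)

# Assign each sample to one of 5 cross-validation folds
n_folds <- 5
folds <- sample(rep(seq_len(n_folds), length.out = n))

# Cross-validated mean category distance for each gene combination
cv_error <- sapply(subsets, function(vars) {
  f <- reformulate(vars, response = "Group")
  pred <- integer(n)
  for (k in seq_len(n_folds)) {
    fit <- polr(f, data = dataf[folds != k, ])
    pred[folds == k] <- as.integer(
      predict(fit, newdata = dataf[folds == k, ], type = "class"))
  }
  mean(abs(pred - as.integer(dataf$Group)))
})
names(cv_error) <- sapply(subsets, paste, collapse = "+")

# Gene combinations with the lowest held-out error
print(head(sort(cv_error), 5))
```

Repeating the fold assignment several times and averaging would give more stable rankings, at the cost of more model fits.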

ADD REPLY • link 13.8 years ago David ▴ 860
