Question

about label permutation test for binary classification


James Anderson ▴ 820

@james-anderson-1641

Last seen 10.6 years ago



updated 17.6 years ago by Joern Toedling ▴ 730 • written 17.6 years ago by James Anderson ▴ 820

score 0 · Answer 1 · 2007-09-13

Hello,

I am a bit puzzled about what you actually want to ask.

James Anderson wrote:
> For a binary classification problem in microarray data, suppose you do random subsampling classification: each time you split the data into 80% training and 20% test with stratification (preserving the ratio of each class), and you repeat this many times. When you get some results, one thing you would normally look at is how significantly different your results are from what you would get by chance; that's why people do a label permutation test. My question is: is the final result of the label permutation test for accuracy equal to the proportion of the larger class? Say there are 80 normal vs. 20 disease samples: is the mean accuracy of the label permutation test equal to 80/(80+20), as long as you repeat enough times? Is this independent of the classifier?

The mean accuracy of your classifier after label permutation, in a cross-validation setting presumably, depends very much on your classifier. What you should contrast it with is the accuracy of the naive classifier "assign every sample to the larger class", which is 80% in your case.

A good reason for label permutation in your case is that you want to assess the classifier's generalizability, because one can always construct a classifier that has an accuracy of 100% on the training data but performs badly on independent test data. That is one reason why people combine label permutation with cross-validation: the classifier's mean accuracy in a cross-validation setting gives a better estimate of its accuracy on test data than the training accuracy does. (You have to make sure that you do not use any aspect of the set-aside, held-out data for training the classifier, though.)

An even better estimate of your classifier's performance, however, would be its accuracy on a completely independent test data set. Cross-validation on your training data could then be used to select parameters of your classifier, if needed.

Hope this helps.

Regards,
Joern

> Thanks a lot!
>
> James
>
> ---------------------------------
> Building a website is a piece of cake.

well, classification sometimes isn't.
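To make the point about classifier dependence concrete, here is a minimal sketch in plain Python of a label permutation test with repeated stratified 80/20 splits. Everything in it is an assumption for illustration and not part of the original thread: the simulated one-dimensional data, the two toy classifiers (`majority_classifier` and `nearest_mean_classifier`), and the numbers of splits and permutations.

```python
import random

def stratified_split(labels, test_frac, rng):
    """Return (train_idx, test_idx), preserving the class ratio in each part."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test

def majority_classifier(x_train, y_train):
    """Naive baseline: always predict the most frequent training label."""
    top = max(set(y_train), key=y_train.count)
    return lambda x: top

def nearest_mean_classifier(x_train, y_train):
    """Toy 1-D classifier: predict the class whose training mean is closest."""
    means = {}
    for cls in set(y_train):
        vals = [x for x, y in zip(x_train, y_train) if y == cls]
        means[cls] = sum(vals) / len(vals)
    return lambda x: min(means, key=lambda c: abs(x - means[c]))

def mean_accuracy(fit, xs, ys, n_splits, rng):
    """Mean test accuracy over repeated stratified 80/20 splits."""
    total = 0.0
    for _ in range(n_splits):
        train, test = stratified_split(ys, 0.2, rng)
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        hits = sum(predict(xs[i]) == ys[i] for i in test)
        total += hits / len(test)
    return total / n_splits

def permutation_test(fit, xs, ys, n_perm, n_splits, rng):
    """Return observed accuracy, null accuracies, and a permutation p-value."""
    observed = mean_accuracy(fit, xs, ys, n_splits, rng)
    null = []
    for _ in range(n_perm):
        shuffled = ys[:]
        rng.shuffle(shuffled)  # break any link between features and labels
        null.append(mean_accuracy(fit, xs, shuffled, n_splits, rng))
    p = (1 + sum(a >= observed for a in null)) / (1 + n_perm)
    return observed, null, p

# Simulated data (hypothetical): 80 "normal" (label 0) vs. 20 "disease" (label 1).
rng = random.Random(0)
ys = [0] * 80 + [1] * 20
xs = [rng.gauss(0.0, 1.0) if y == 0 else rng.gauss(2.0, 1.0) for y in ys]

results = {}
for name, fit in [("majority", majority_classifier),
                  ("nearest-mean", nearest_mean_classifier)]:
    obs, null, p = permutation_test(fit, xs, ys, n_perm=100, n_splits=20, rng=rng)
    results[name] = (obs, null, p)
    print(name, "observed:", round(obs, 3),
          "permuted mean:", round(sum(null) / len(null), 3), "p:", round(p, 3))
```

With an 80/20 class ratio and stratified splits, the majority classifier scores 16/20 = 0.8 on every test set whether or not the labels are permuted, so its permuted mean is exactly the proportion of the larger class. The nearest-mean classifier does not simply predict the larger class once the labels carry no signal, so its permuted mean should typically land well below 0.8. That is the sense in which the answer to "is it always 80%?" depends on the classifier.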