Seeking assistance on ROC


Sean Davis 21k

@sean-davis-490


United States

On Mon, Feb 1, 2010 at 8:55 AM, Susan Bosco <susanbosco86@yahoo.com> wrote:

> Dear Sean,
>
> Thanks for your reply.
>
> Before getting into ROC we went through many research papers on ROC, but our understanding of ROC was wrong. We applied a filter to continuous data (log-ratio information) to classify the data as 0 and 1. From the discussion on the list we came to know that a cutoff should not be applied to the data, since we are looking for the cutoff using ROC. Hence we used the rbinom() function, as used in the ROC documentation.
>
> Later, as you suggested, we contacted a local biostatistician. The statistician is new to microarrays, and so explained ROC analysis to us using the example of diseased vs. normal patients in the context of blood pressure, and also mentioned that classification should be based on categorical data. But our microarray data (MeDIP enriched) do not contain any comparison groups and are in duplicates. So we are now quite confused about how to implement ROC.
>
> So, if you do not mind, could you please share your experience with ROC from scratch, and provide suggestions on implementing ROC in our context?
>
> Thanking you in anticipation.

Hi, Susan. I cannot think of a way to explain better than I have up to this point (but others might be able to, I admit). The answer to your questions from your local statistician sounds perfectly correct, and your reaction makes me suspect that ROC analysis may not be what you need here. I would suggest that, if you are still unclear about where to go from here, you continue to work with a local statistician or find a collaborator willing to work with you to complete your study. There is a limit to what can be done by email, unfortunately.

Sean

> --- On Mon, 25/1/10, Sean Davis <seandavi@gmail.com> wrote:
>
> From: Sean Davis <seandavi@gmail.com>
> Subject: Re: [BioC] Seeking assistance on ROC
> To: "Susan Bosco" <susanbosco86@yahoo.com>
> Cc: bioconductor@stat.math.ethz.ch, "prashantha hebbar" <prashantha.hebbar@manipal.edu>
> Date: Monday, 25 January, 2010, 6:54 PM
>
> On Sat, Jan 23, 2010 at 6:28 AM, Susan Bosco <susanbosco86@yahoo.com> wrote:
> > Dear Sean,
> >
> > Thanks again.
> >
> > I corrected the script, changing the value of the 'truth' variable with the rbinom() function. Since my data size is quite large (244K values), I tried with the first 200, for which I was able to get a proper ROC curve. However, when I include the complete data, the plot changes. For the whole data, I get a linear graph with small variations.
> >
> > My sessionInfo() looks like this:
> >
> > For the first 200 values of the data:
> > library(ROC)
> > load("RGKma.RData")
> > state = rbinom(length(RGKma$M[1:200,3]), 1, 0.33)
> > data = RGKma$M[1:200,3]
> > R1 <- rocdemo.sca(truth=state, data, dxrule.sca)
> > pdf("ROCk.pdf")
> > plot(R1, show.thresh=TRUE, col="red")
> > dev.off()
> >
> > For the complete data:
> > library(ROC)
> > load("RGKma.RData")
> > state = rbinom(length(RGKma$M[,3]), 1, 0.33)
> > data = RGKma$M[,3]
> > R1 <- rocdemo.sca(truth=state, data, dxrule.sca)
> > pdf("ROCallk.pdf")
> > plot(R1, show.thresh=TRUE, col="red")
> > dev.off()
> >
> > I have attached the PDFs of the plots. I would appreciate it if you could help me out with this problem that I encountered with a large data size.
>
> Hi, Susan. The problem is not the large data size, in particular. You need to know the TRUTH. You cannot assign the TRUTH using a random binomial. You need to KNOW which samples are of one class versus the other. Do you know that information? If not, then ROC analysis is not a useful thing to apply.
>
> Sean
>
> > Thanking you sincerely,
> > Susan.
> >
> > --- On Wed, 20/1/10, Sean Davis <seandavi@gmail.com> wrote:
> >
> > From: Sean Davis <seandavi@gmail.com>
> > Subject: Re: [BioC] Seeking assistance on ROC
> > To: "Susan Bosco" <susanbosco86@yahoo.com>
> > Cc: bioconductor@stat.math.ethz.ch, "prashantha hebbar" <prashantha.hebbar@manipal.edu>
> > Date: Wednesday, 20 January, 2010, 12:05 PM
> >
> > On Wed, Jan 20, 2010 at 12:39 AM, Susan Bosco <susanbosco86@yahoo.com> wrote:
> > > Dear Sean,
> > >
> > > Thank you so much for the help.
> > >
> > > I tried with a range of thresholds from 0-0.9. As you had mentioned, the true positive rates no doubt increased with thresholds below 0.9. However, I did get some false positive rates even at a minimum threshold of 0.1. Could you kindly explain the reason?
> > >
> > > Is there any method of finding the optimal threshold, maximizing the true positive rates while minimizing the false positives, instead of randomly choosing between 0-0.9?
> >
> > Hi, Susan. The ROC curve IS that method. The ROC curve represents ALL thresholds as applied to the data. If you plot with show.thresh=TRUE, you will see the thresholds that were tried and where they are on the curve.
> >
> > If the threshold to which you are referring is the one that you used to determine the variable you called "state", then we are talking about two different things. The "truth" variable is meant to be assigned by some source other than the data themselves. If you do not know the true state of your samples and find yourself assigning the state from the data, then ROC curve analysis will not be of any use.
> >
> > Sean
> >
> > > Thanks in advance,
> > > Susan.
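Sean's point that the truth labels must come from outside the data can be checked with a small simulation. The sketch below is illustrative only: the scores, group sizes, and the helper `auc()` are made up for this example and are not part of the ROC package or of the original data. It compares the area under the ROC curve when the labels are genuinely known with the area obtained when labels are drawn with `rbinom()`, as in the script quoted above.

```r
set.seed(1)

# Hypothetical scores: 300 "negative" and 150 "positive" items whose true
# class is known from an independent source (simulated here for illustration).
n_neg <- 300
n_pos <- 150
score <- c(rnorm(n_neg, mean = 0), rnorm(n_pos, mean = 1.5))

known_truth  <- c(rep(0, n_neg), rep(1, n_pos))
random_truth <- rbinom(length(score), 1, 0.33)  # labels unrelated to the scores

# Area under the ROC curve via the rank-sum (Mann-Whitney) identity: the
# probability that a random positive scores higher than a random negative.
auc <- function(score, truth) {
  r  <- rank(score)
  n1 <- sum(truth == 1)
  n0 <- sum(truth == 0)
  (sum(r[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_known  <- auc(score, known_truth)   # clearly above 0.5
auc_random <- auc(score, random_truth)  # near 0.5, i.e. the diagonal line
```

With random labels the curve hugs the diagonal, which is consistent with the "linear graph with small variations" reported for the full data set; with only 200 points the same random labels can produce a curve that merely looks informative by chance.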

Microarray GO Classification ROC graph ASSIGN

written 15.2 years ago by Sean Davis • updated 15.2 years ago by Steve Lianoglou


Steve Lianoglou ★ 13k

@steve-lianoglou-2771


United States

It's probably a mistake ... but I feel compelled to just try to add some input.

Susan: forget about the code that's necessary to do "ROC analysis" with R. As Sean mentioned, perhaps you should first check whether ROC analysis is what you want.

The thing that seems most painful to me is that you previously wrote this:

"""
I tried with a range of thresholds from 0-0.9. As you had mentioned, the true positive rates no doubt increased with thresholds below 0.9. However, I did get some false positive rates even at a minimum threshold of 0.1. Could you kindly explain the reason?

Is there any method of finding the optimal threshold, maximizing the true positive rates while minimizing the false positives, instead of randomly choosing between 0-0.9?
"""

And, again, as Sean mentioned, that's what a ROC curve is for. So:

1. Presumably you have some binary classifier that is classifying something of interest.
2. This classifier has some parameter(s) you can tune that adjust its sensitivity vs. specificity tradeoff.
3. You want to determine the value of this parameter that gives your classifier the best tradeoff.
4. Let's assume you vary this parameter over MANY values. You can now plot the sensitivity vs. specificity (1 - specificity, to be precise) of your classifier for all of these values to see *visually* what the tradeoff is.
5. Assuming you're plotting this sensitivity vs. specificity point for all of the values of your parameter ON THE SAME GRAPH, when you squint your eyes enough, the shape that you will see emerging from your plot is a curve. Your job is to find "the best" point on this curve.
6. You just have to find the top-left-most point on this curve: this is your best value for the parameter, since it gives you the best combination of sensitivity (as high as possible on the y axis) and specificity (as far left as possible on the x axis, since x = 1 - specificity).
7. Finally, since you made this plot, you know which value of your parameter was used to create each point on your curve. Just take the value of the parameter that gives you the point on the curve you found in (6).

Sorry, I'm not really inclined to provide any code that does this for you ... or walk you through a tutorial of any package that does this either. It's actually pretty straightforward to do so yourself without using any R packages, just using "straight R".

Read through the ROC page on Wikipedia. The intro and basic concept sections really tell you all you need to know, and that (along with my surely lucid list of points above :-) should help you decide if "ROC analysis" is what you're after:

http://en.wikipedia.org/wiki/Receiver_operating_characteristic

All the formulas you need are there (really just the true positive rate and the false positive rate). Assuming you have the known labels for a set of data that you are using your classifier to predict on, you should be able to whip up some R code that makes the ROC plot with some effort.

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
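For readers who want a starting point, steps 4 to 7 above can be sketched in "straight R" roughly as follows. This is a minimal illustration under stated assumptions, not the code anyone in the thread used: the data are simulated, the labels in `truth` are assumed to be known independently of `score`, and "top-left-most" is interpreted here as the smallest Euclidean distance to the corner (0, 1), which is only one of several reasonable criteria.

```r
set.seed(42)

# Hypothetical data: one continuous score per item plus KNOWN 0/1 labels.
truth <- c(rep(0, 200), rep(1, 100))
score <- c(rnorm(200, mean = 0), rnorm(100, mean = 2))

# Step 4: use every observed score as a candidate threshold and call an
# item "positive" when its score is at or above the threshold.
thresholds <- sort(unique(score), decreasing = TRUE)
tpr <- sapply(thresholds, function(t) sum(score >= t & truth == 1) / sum(truth == 1))
fpr <- sapply(thresholds, function(t) sum(score >= t & truth == 0) / sum(truth == 0))

# Step 6: find the point closest to the top-left corner (fpr = 0, tpr = 1).
dist_to_corner <- sqrt(fpr^2 + (1 - tpr)^2)
best <- which.min(dist_to_corner)

# Step 7: read off the threshold that produced that point.
best_threshold <- thresholds[best]
best_point <- c(fpr = fpr[best], tpr = tpr[best])
```

Calling `plot(fpr, tpr, type = "l", xlab = "1 - specificity", ylab = "sensitivity")` afterwards draws the curve described in step 5, and `points(fpr[best], tpr[best])` marks the chosen threshold on it.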

written 15.2 years ago by Steve Lianoglou
