limma_analysis result


Abhilash Venu ▴ 340

@abhilash-venu-2680

Last seen 10.6 years ago

Hi list,

I am still puzzled by the data that I analyzed with limma. I am a biology graduate student and still at the learning stage. I am analyzing single-color data generated on the Agilent 4x44k platform. With the help of the mailing list and the limma User's Guide, I have done the following analysis, but logFC takes very high values such as 320 or 1320. I don't understand how the fitting is really happening. Can I rely on this result? How should I go about it?

    # Reading the data.
    RG <- read.maimages(txt_files,
                        columns = list(G = "gMeanSignal", Gb = "gBGMeanSignal",
                                       R = "gMedianSignal", Rb = "gBGMedianSignal"),
                        annotation = c("Row", "Col", "ProbeUID", "ProbeName", "GeneName"))
    Rgene <- backgroundCorrect(RG, method = "subtract")

    # Considering only G, as it is a single-color experiment.
    MA <- normalizeBetweenArrays(Rgene$G, method = "quantile")

    design <- cbind(norm = 1, normvstest = c(1, 1, 1, 1, 0, 0, 0, 0))
    fit <- lmFit(MA, design)
    fit <- eBayes(fit)
    topTable(fit, coef = "normvstest", adjust = "fdr")

Regards,
Abhilash

GO limma GO limma • 1.6k views

ADD COMMENT • link updated 16.8 years ago by Mark Cowley ▴ 400 • written 16.8 years ago by Abhilash Venu ▴ 340


Mark Cowley ▴ 400

@mark-cowley-2858

Last seen 9.6 years ago

Australia

Hi Abhilash,

Your code looks good, except that you will usually want to normalise log-transformed data. Thus try:

    MA <- normalizeBetweenArrays(log2(Rgene$G), method = "quantile")

If your logFC values still look very high, then try convincing yourself of their accuracy by looking at the raw data (RG$R) for some of the most differentially expressed genes, and also plot the expression values for some of these DE genes.

Good luck,
Mark

----------------------------------------------------------------------
Mark Cowley, BSc (Bioinformatics)(Hons)
Peter Wills Bioinformatics Centre
Garvan Institute of Medical Research
384 Victoria St, Darlinghurst, NSW 2010, Australia
Tel: +61 2 9295 8542   Fax: +61 2 9295 8538
email: m.cowley at garvan.org.au
www.garvan.org.au
----------------------------------------------------------------------
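To see why un-logged input produces a "logFC" in the hundreds or thousands, here is a minimal sketch in base R. The intensities are invented for illustration and are not taken from the thread; only the 1/0 group coding follows the design in the original post.

```r
# Hypothetical background-corrected intensities for one probe on 8 arrays
# (invented numbers; arrays 1-4 are coded 1, arrays 5-8 are coded 0).
intensity <- c(1500, 1620, 1480, 1710, 310, 295, 350, 330)
group <- c(1, 1, 1, 1, 0, 0, 0, 0)

# Difference of group means on the raw scale: this is what a linear model
# reports in the logFC column when the input was never log-transformed.
raw_diff <- mean(intensity[group == 1]) - mean(intensity[group == 0])

# Difference of group means on the log2 scale: a genuine log2 fold change.
log_diff <- mean(log2(intensity[group == 1])) - mean(log2(intensity[group == 0]))

print(c(raw_diff = raw_diff, log2_fold_change = log_diff))
```

The raw-scale difference is in intensity units and can be arbitrarily large, whereas the log2-scale difference is a small number of the kind topTable is expected to report.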

ADD COMMENT • link 16.8 years ago Mark Cowley ▴ 400


Hi Mark,

That was a great suggestion. Below is the result I got from the command topTable(fit, coef="normvstest", adjust="fdr"). I believe these preliminary results solve the problem of the very high fold-change values I was getting earlier. I will look at the entire data set and provide a better view as soon as I can. But I have some doubts about this table itself: I am getting a negative odds ratio (the B column) and, in some cases, a negative t value. What should I do in this scenario?

    > topTable(fit, coef="normvstest", adjust="fdr")
               logFC   AveExpr         t      P.Value adj.P.Val         B
    35726 -2.1103554 11.936825 -9.602581 8.028641e-06 0.3614093 -3.912633
    1968  -1.3413791  9.974470 -6.960746 9.138620e-05 0.7305539 -3.971277
    5558  -1.6566417 10.885625 -6.506724 1.487751e-04 0.7305539 -3.987395
    34497  1.0445013 10.251047  6.185219 2.132063e-04 0.7305539 -4.000460
    33195 -1.3603874 13.373106 -6.116817 2.305481e-04 0.7305539 -4.003438
    44662  0.9528248 11.180259  6.045345 2.503347e-04 0.7305539 -4.006630
    24980 -1.5689151 10.824414 -5.932376 2.855094e-04 0.7305539 -4.011846
    30206  2.2991372 13.647875  5.926758 2.873946e-04 0.7305539 -4.012112
    26046 -1.1709614  9.505652 -5.746545 3.557246e-04 0.7305539 -4.020911
    27210  1.4815342  9.416698  5.656415 3.964086e-04 0.7305539 -4.025537

Thanks in advance.

Best,
Abhilash

ADD REPLY • link 16.8 years ago Abhilash Venu ▴ 340


Hi Abhilash,

You might want to read up a bit on those statistics and how they are generated. It definitely helps when it comes to interpreting the results properly.

The B statistic is a log odds. A negative value simply means that the actual odds are between 0 and 1: less than 50% probability of differential expression according to this test.

The t statistics can be positive or negative. A negative t-statistic simply means that the arrays coded 0 in the "normvstest" column of your design have the higher mean; which group that is depends on the order of your arrays. A large negative t-statistic is the same evidence of differential expression as an equally large positive t-statistic.

Francois
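As a rough illustration of the two points above, the sketch below uses base R only. The B values are copied from the topTable output quoted in this thread; the expression values in the second half are invented, and `lm.fit` stands in for limma's per-probe fit purely to show the sign convention.

```r
# B values copied from the topTable output in this thread.
B <- c(-3.912633, -3.971277, -3.987395, -4.000460, -4.003438)

# B is a natural-log odds, so the logistic function turns it into a probability.
odds <- exp(B)
prob_de <- plogis(B)  # identical to odds / (1 + odds)
print(round(cbind(B, odds, prob_de), 4))

# Sign of the coefficient: invented log2 values for one probe, with the design
# from the original post (arrays 1-4 coded 1, arrays 5-8 coded 0).
y <- c(9.8, 10.1, 9.9, 10.2, 12.0, 11.8, 12.1, 12.1)
design <- cbind(norm = 1, normvstest = c(1, 1, 1, 1, 0, 0, 0, 0))
coefs <- lm.fit(design, y)$coefficients

# The normvstest coefficient is mean(coded 1) - mean(coded 0); it is negative
# here because the arrays coded 0 have the higher mean.
print(coefs)
```

Every B in the table is negative, so every implied probability is below 0.5, and the sign of the coefficient (and hence of t) only records the direction of the difference.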

ADD REPLY • link 16.8 years ago Francois Pepin ★ 1.3k


Hi Abhilash,

In addition to Francois' comments, for me the biggest indicator is that your adjusted P-values are all greater than 0.05. My interpretation is that, after multiple testing correction, none of your genes is statistically significantly differentially expressed. This probably implies either that the differences between your two groups are not large or that the inter-sample variance is high; again, plotting some of these DE genes will help show you which is the case.

Cheers,
Mark
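The effect of the multiple testing correction can be sketched with base R's `p.adjust`. The raw P-values below are copied from the topTable output in this thread. The number of tests, 45015, is an assumption: it is not stated anywhere in the thread and is only inferred from the ratio of the top adjusted P-value to the top raw P-value.

```r
# Raw P-values for the top five probes, copied from the thread.
p_raw <- c(8.028641e-06, 9.138620e-05, 1.487751e-04, 2.132063e-04, 2.305481e-04)

# ASSUMPTION: roughly 45015 probes were tested (inferred, not stated in the thread).
n_tests <- 45015

# Benjamini-Hochberg ("fdr") adjustment, telling p.adjust how many tests
# were really performed even though only the top five are supplied.
p_adj <- p.adjust(p_raw, method = "BH", n = n_tests)
print(p_adj)

# Only the first value is comparable with the thread's table: the other
# entries are capped at 1 here because the lower-ranked probes that pull
# the adjusted values down in a full run are not included in this vector.
top_adj <- p_adj[1]
```

Under that assumption, a raw P-value of about 8e-06 becomes an adjusted P-value of about 0.36, which is why a very small raw P-value can still be far from significant when tens of thousands of probes are tested.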

ADD REPLY • link 16.8 years ago Mark Cowley ▴ 400


Hi,

Thank you, Mark. Thank you, Francois. I am currently analyzing the entire data set; once that is done I will get back to you.

Regards,
Abhilash

ADD REPLY • link 16.8 years ago Abhilash Venu ▴ 340


Hi Mark,

I have analyzed a set of data with the following targets file:

    FileName                                    Target
    251485013291_S01_GE1-v5_95_Feb07_1_1.txt    test
    51485013291_S01_GE1-v5_95_Feb07_1_2.txt     test
    251485013291_S01_GE1-v5_95_Feb07_1_3.txt    test
    251485013291_S01_GE1-v5_95_Feb07_1_4.txt    test
    251485013285_S01_GE1-v5_95_Feb07_1_1.txt    norm
    251485013285_S01_GE1-v5_95_Feb07_1_2.txt    norm
    251485013285_S01_GE1-v5_95_Feb07_1_3.txt    norm
    251485013285_S01_GE1-v5_95_Feb07_1_4.txt    norm

During normalization I get the following warning message. Is it going to create any trouble for me?

    > MA <- normalizeBetweenArrays(log2(Rgene$G), method="quantile")
    Warning message:
    In log(c(65230.44761, 175.96476, 170.2672, 186.6677, 176.24364,  :
      NaNs produced

I created the following design matrix:

    > design <- cbind(norm=c(0,0,0,0,1,1,1,1), test=c(1,1,1,1,0,0,0,0))
    > design
         norm test
    [1,]    0    1
    [2,]    0    1
    [3,]    0    1
    [4,]    0    1
    [5,]    1    0
    [6,]    1    0
    [7,]    1    0
    [8,]    1    0
    > fit <- lmFit(MA, design)
    > cont.matrix <- makeContrasts(normvstest=test-norm, levels=design)
    > fit2 <- contrasts.fit(fit, cont.matrix)
    > fit3 <- eBayes(fit2)
    > topTable(fit3, genelist=RG$genes, adjust="fdr")

Is my analysis fine? As you mentioned, I am getting adjusted P-values above 0.05, and the data show high variance among the samples (within the test group and within the normal group). In this scenario, should I try other normalization methods such as 'normexp' or 'vsn'?

One more question: can I get the fold change for each test sample by comparing it with all the normals? Or is there a better approach for single-color (Agilent) analysis? This would help me create a heatmap of genes across the samples.

Thank you in advance.

Best,
Abhilash
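The two designs used in this thread (an intercept plus a difference column in the first post, and one column per group plus a contrast here) should estimate the same test vs norm difference. The sketch below checks this for a single probe with base R's `lm.fit`; the expression values are invented, and arrays 1-4 are assumed to be test and arrays 5-8 norm, following the targets file above.

```r
# Invented log2 expression values for one probe; arrays 1-4 = test, 5-8 = norm.
y <- c(11.2, 11.5, 11.0, 11.3, 9.9, 10.2, 10.1, 9.8)

# Parameterisation 1: intercept plus a difference column (as in the first post).
design1 <- cbind(norm = 1, normvstest = c(1, 1, 1, 1, 0, 0, 0, 0))
coef1 <- lm.fit(design1, y)$coefficients
logfc1 <- coef1[["normvstest"]]

# Parameterisation 2: one column per group, then the contrast test - norm.
design2 <- cbind(norm = c(0, 0, 0, 0, 1, 1, 1, 1),
                 test = c(1, 1, 1, 1, 0, 0, 0, 0))
coef2 <- lm.fit(design2, y)$coefficients
contrast <- c(norm = -1, test = 1)
logfc2 <- sum(contrast * coef2[names(contrast)])

print(c(design1 = logfc1, design2 = logfc2))
```

Both routes give the mean of the test arrays minus the mean of the norm arrays, so switching parameterisation changes the bookkeeping but not the estimated log fold change.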

ADD REPLY • link 16.8 years ago Abhilash Venu ▴ 340


Hi Abhilash, On 26/06/2008, at 1:59 AM, Abhilash Venu wrote: > Hi Mark, > > I a have analyzed a set of data, with the following target file. > FileName > Target > 251485013291_S01_GE1-v5_95_Feb07_1_1.txt test > 51485013291_S01_GE1-v5_95_Feb07_1_2.txt test > 251485013291_S01_GE1-v5_95_Feb07_1_3.txt test > 251485013291_S01_GE1-v5_95_Feb07_1_4.txt test > 251485013285_S01_GE1-v5_95_Feb07_1_1.txt norm > 251485013285_S01_GE1-v5_95_Feb07_1_2.txt norm > 251485013285_S01_GE1-v5_95_Feb07_1_3.txt norm > 251485013285_S01_GE1-v5_95_Feb07_1_4.txt norm > > During normalization I am getting the following warning message? is > it going to create any trouble for me? > MA<-normalizeBetweenArrays(log2(Rgene$G), method="quantile") > Warning message: > In log(c(65230.44761, 175.96476, 170.2672, 186.6677, 176.24364, : > NaNs produced The NaN warning means that some of your numbers cannot be logged. Check which values have been converted to NaN's, and determine if they are because of malformed numbers in the text files (unlikely) or due to negative raw values. If you leave the NaN's in, then the ratio for that probe will also be NaN, and you'll loose info about those genes that may go from off to highly on. Thus adding a small offset to make them positive (BEFORE logging them) is quite commonly used. > > Created the following design file, > > design=cbind(norm=c(0,0,0,0,1,1,1,1),test=c(1,1,1,1,0,0,0,0)) > > design > norm test > [1,] 0 1 > [2,] 0 1 > [3,] 0 1 > [4,] 0 1 > [5,] 1 0 > [6,] 1 0 > [7,] 1 0 > [8,] 1 0 > > > fit<- lmFit(MA,design > cont.matrix=makeContrasts(normvstest=test-norm,levels=design)) > fit2=contrasts.fit(fit,cont.matrix) > fit3=eBayes(fit2) > toptable(fit3,genelist=RG$genes,adjust="fdr") > > Whether my analysis is fine? As you mentioned I am getting the > adjusted p values above 0.05, and the data is showing high variance > among the samples (within the test and in normal also). 
Your code looks good; however, I'm not 100% sure that your use of MA is correct, since MA objects usually contain 2 colour data. Generally stats on MA objects operate on the MA$M matrix, that is, the log2 ratios of red vs green channel. A foolproof approach would be to provide the matrix of log2, bg subtracted single channel data to a single channel normalisation method, such as normalizeQuantiles, then provide this single channel object to lmFit.

> In this scenario should I try other normalization methods like
> 'normexp' or 'vsn'.

Well, you would need to look into the assumptions behind these methods, and whether their application is appropriate. I haven't used normexp, but for vsn, I usually start by plotting the average expression level vs the stdev of each probe to see if there is an intensity dependent bias that could be fixed by vsn. Perhaps someone else can comment here.

> One more question I would like to ask: can I get the fold value
> for each test sample by comparing with all the normal? Or is there
> any other better approach for single color (Agilent) analysis? This
> will help me to create the heatmap for genes across the samples.

- you can get the logged values for each array by MA$A
- you can get the averaged values for test and norm from fit$coefficients, and the log ratio for test vs norm by fit2$coefficients, or from the logFC column in the topTable(fit3).

If you want a heatmap of all 8 columns of data, then you'll need to use the single channel, normalised object that I talked about above.

cheers,
Mark

> Thank you in advance.
>
> Best
> Abhilash
>
> On Wed, Jun 18, 2008 at 8:43 PM, Abhilash Venu <abhivenu@gmail.com> wrote:
> Hi
>
> Thank you Mark
> Thank you Francois.
> Currently I am analyzing the entire data; once it is over I will
> get back to you.
> Regards,
> Abhilash
>
> On Wed, Jun 18, 2008 at 11:21 AM, Mark Cowley <m.cowley0@gmail.com> wrote:
> Hi Abhilash,
> In addition to Francois' comments, for me the biggest indicator is
> that your adjusted P-values are all greater than 0.05.
> My interpretation of this is that after multiple testing correction,
> none of your genes are statistically significantly differentially
> expressed.
> This probably implies that either the differences between your two
> groups are not large, or that there is higher inter-sample variance;
> again, plotting some of these DE genes will help inform you as to
> which is the case.
>
> cheers,
> Mark
>
> On 18/06/2008, at 2:48 AM, Francois Pepin wrote:
> Hi Abilash,
>
> you might want to read up a bit on those statistics and how they are
> generated. It definitely helps when it comes to interpreting their
> results properly.
>
> The B statistic is a log odds ratio. A negative value simply means
> that the actual odds are between 0 and 1: less than 50% probability
> of differential expression according to this test.
>
> The t statistics can be positive or negative. A negative t-statistic
> simply means that the mean of the second group ("test" in this case)
> is higher. A high negative t-statistic would have the same evidence
> of differential expression as a high positive t-statistic.
>
> Francois
>
> Abhilash Venu wrote:
> Hi Mark,
> I think it was a great suggestion and I am providing the result,
> which I got after the command,
> topTable(fit, coef="normvstest", adjust="fdr").
> I believe these preliminary results solved the problem of very high
> fold values, which I was getting earlier. I will be looking at the
> entire data and provide a better view at the earliest. But I have
> some doubts about this table itself. Here I am getting negative odds
> ratios and in some cases negative t values. What should I do in
> these scenarios?
>
> topTable(fit, coef="normvstest", adjust="fdr")
>            logFC   AveExpr         t      P.Value adj.P.Val         B
> 35726 -2.1103554 11.936825 -9.602581 8.028641e-06 0.3614093 -3.912633
> 1968  -1.3413791  9.974470 -6.960746 9.138620e-05 0.7305539 -3.971277
> 5558  -1.6566417 10.885625 -6.506724 1.487751e-04 0.7305539 -3.987395
> 34497  1.0445013 10.251047  6.185219 2.132063e-04 0.7305539 -4.000460
> 33195 -1.3603874 13.373106 -6.116817 2.305481e-04 0.7305539 -4.003438
> 44662  0.9528248 11.180259  6.045345 2.503347e-04 0.7305539 -4.006630
> 24980 -1.5689151 10.824414 -5.932376 2.855094e-04 0.7305539 -4.011846
> 30206  2.2991372 13.647875  5.926758 2.873946e-04 0.7305539 -4.012112
> 26046 -1.1709614  9.505652 -5.746545 3.557246e-04 0.7305539 -4.020911
> 27210  1.4815342  9.416698  5.656415 3.964086e-04 0.7305539 -4.025537
>
> Thanks in advance
> Best
> Abhilash
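The single-channel route suggested in this reply (log2-transform, quantile-normalise the plain matrix, fit, then take the test-vs-norm contrast) can be sketched on simulated data as follows. This is only an illustration under stated assumptions: the intensities are random, the probe names and the spiked-in effect are made up, and with real Agilent files the matrix would come from the background-corrected green channel instead. The last lines also show how a B statistic (log odds) converts to a probability, and how to pull per-array values for a heatmap.

```r
library(limma)

set.seed(1)
n_probes <- 200

# Simulated background-subtracted single-channel intensities (hypothetical):
# columns 1-4 play the role of "test" arrays, columns 5-8 of "norm" arrays.
raw <- matrix(rlnorm(n_probes * 8, meanlog = 7, sdlog = 1), ncol = 8)
rownames(raw) <- paste0("probe", seq_len(n_probes))
colnames(raw) <- c(paste0("test", 1:4), paste0("norm", 1:4))
raw[1:10, 1:4] <- raw[1:10, 1:4] * 8  # make 10 probes higher in test

# Log-transform first, then quantile-normalise the plain matrix.
E <- normalizeQuantiles(log2(raw))

# One coefficient per group, then the test-minus-norm contrast.
group <- factor(rep(c("test", "norm"), each = 4), levels = c("norm", "test"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

fit  <- lmFit(E, design)
cont <- makeContrasts(testvsnorm = test - norm, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont))

# logFC is on the log2 scale; adj.P.Val uses Benjamini-Hochberg here.
tab <- topTable(fit2, coef = "testvsnorm", adjust.method = "BH", number = 10)

# B is a log odds; convert it to a probability of differential expression.
prob_de <- plogis(tab$B)  # same as exp(B) / (1 + exp(B))

# Per-array normalised values for the top probes, e.g. as heatmap input.
heat <- E[rownames(tab), ]
```

With values on the log2 scale, a logFC of 1 corresponds to a two-fold difference, which is why fold changes in the hundreds were a sign that unlogged intensities had gone into the fit. The matrix `heat` holds all 8 columns, so it can be passed to a plotting function such as `heatmap()`.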

ADD REPLY • link 16.8 years ago Mark Cowley ▴ 400
