goodness-of-fit for limma

Ye, Bin

Hello everybody,

I'm doing some affy microarray analysis using limma, and I'm not a statistician. I was told that I need to check whether the model fits the data before getting the lists of significant genes. How should I do that in limma? And is it really necessary? If not, why not?

Thanks a lot!
Bin

Microarray affy limma

Fangxin Hong

Hi Bin,

> I'm doing some affy microarray analysis using limma, and I'm not a
> statistician. I was told that I need to check whether the model fits the
> data before getting the lists of significant genes. How should I do that
> in limma? And is it really necessary? If not, why not?

From a statistical viewpoint, the results (such as the list of differentially expressed genes) are only valid when the model (for example, the linear model used in limma) is a good approximation of the true data-generating mechanism. The usual check is a residual plot, to see whether the residuals (the differences between the observed values and the fitted values) satisfy the model assumptions. You can extract the residuals from the lmFit result.

However, I don't see many people doing this when identifying genes. If you are worried that a linear model might not describe your data well, you can turn to non-parametric methods such as RankProd and siggenes.

Hope this helps,
Fangxin

--------------------
Fangxin Hong Ph.D.
Plant Biology Laboratory
The Salk Institute
10010 N. Torrey Pines Rd.
La Jolla, CA 92037
E-mail: fhong@salk.edu
(Phone): 858-453-4100 ext 1105
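Fangxin's residual check can be illustrated numerically. With its default least-squares method, lmFit fits one linear model per gene, and a residual is simply the observed value minus the fitted value. In R you would compute these from the lmFit result together with the original expression data; the NumPy sketch below only shows the arithmetic, on hypothetical simulated data (100 genes, two groups of three arrays), and prints two numeric summaries of what a residual plot would show visually. The function names are illustrative, not part of limma.

```python
import numpy as np

def fit_and_residuals(expr, design):
    """Per-gene ordinary least squares fit.

    expr:   genes x arrays matrix of log-scale expression values
    design: arrays x coefficients design matrix
    Returns (coefficients as genes x coefficients, residuals as genes x arrays),
    where residual = observed value - fitted value.
    """
    # Solving with expr.T (arrays x genes) fits every gene in one call
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    fitted = (design @ coef).T
    return coef.T, expr - fitted

def standardized_residuals(resid, design):
    """Scale each gene's residuals by that gene's residual standard deviation."""
    df = design.shape[0] - np.linalg.matrix_rank(design)
    s = np.sqrt((resid ** 2).sum(axis=1) / df)
    return resid / s[:, None]

# Hypothetical toy data: 100 genes, two groups of three arrays each
rng = np.random.default_rng(1)
group = np.array([0, 0, 0, 1, 1, 1])
design = np.column_stack([np.ones(6), group])   # intercept + group effect
effects = rng.normal(0.0, 1.0, size=100)
expr = 8.0 + np.outer(effects, group) + rng.normal(0.0, 0.3, size=(100, 6))

coef, resid = fit_and_residuals(expr, design)
z = standardized_residuals(resid, design)

# Mean |standardized residual| per array: one array much larger than the
# rest suggests it fits the model poorly (e.g. a bad chip)
print(np.round(np.abs(z).mean(axis=0), 2))
# Pooled quantiles: strong asymmetry or extreme tails hint at violated assumptions
print(np.round(np.quantile(z, [0.01, 0.5, 0.99]), 2))
```

In practice you would plot `z` against the fitted values, or draw a normal QQ plot, rather than rely on two summary lines.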

Ye, Bin

Dear Fangxin,

Thank you very much! I'll check those out.

BTW, our dataset is a kind of 3x3 factorial design: we have three different treatments and three time points, but because of the nature of the cell line we are using, we have controls (untreated cells at the same time points) for all 3x3 = 9 combinations. So far, I have been doing the analysis by first taking the ratio for each of the 9 data points and then using limma to test for differential expression, as follows (c1 to c9 are the corresponding controls):

         time1     time2     time3
drug1    d1t1/c1   d1t2/c2   d1t3/c3
drug2    d2t1/c4   d2t2/c5   d2t3/c6
drug3    d3t1/c7   d3t2/c8   d3t3/c9

Then I compare all possible pairs. I'm not a statistician, so I'm not sure whether this is a valid way to do the analysis, or whether I should add the variable "controls" to the model. I'm trying to add it to the model, but it is still unclear to me how to set up the matrices. All suggestions will be appreciated.

Also, for finding genes whose expression profiles follow the time series, which package would be more appropriate?

Thanks!
Bin

-----Original Message-----
From: fhong@salk.edu [mailto:fhong@salk.edu]
Sent: Tue 6/7/2005 2:54 PM
To: Ye, Bin
Cc: bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] goodness-of-fit for limma
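To make Bin's question about "the matrices" concrete, here is a NumPy sketch of what they look like for the ratio data. In R one would normally build these with `model.matrix` and limma's `makeContrasts`; the code below only mimics their shape. It assumes, hypothetically, two replicate log-ratio arrays per drug/time cell (18 arrays in total), and shows two equivalent designs: one column per cell, and the treatment-coded 3x3 factorial (intercept, drug, time, drug:time interaction). It also counts the pairwise comparisons: all 36 of them, 18 of which change both drug and time (such as d1t1/c1 vs d2t2/c5).

```python
import itertools
import numpy as np

DRUGS = ["d1", "d2", "d3"]
TIMES = ["t1", "t2", "t3"]
GROUPS = [(d, t) for d in DRUGS for t in TIMES]   # the 9 drug/time cells

def log_ratios(treated, control):
    """log2(treated / control), computed as a difference of logs.

    treated, control: arrays of positive intensities with matching shapes,
    where each treated column is paired with its own control column.
    """
    return np.log2(treated) - np.log2(control)

def cell_means_design(labels):
    """One 0/1 column per drug/time cell (like model.matrix(~0 + group) in R)."""
    return np.array([[1.0 if lab == g else 0.0 for g in GROUPS] for lab in labels])

def factorial_design(labels):
    """Treatment-coded 3x3 factorial: intercept, drug, time, drug:time columns."""
    rows = []
    for d, t in labels:
        drug = [1.0 if d == x else 0.0 for x in DRUGS[1:]]   # d1 is the baseline
        time = [1.0 if t == x else 0.0 for x in TIMES[1:]]   # t1 is the baseline
        inter = [a * b for a in drug for b in time]
        rows.append([1.0] + drug + time + inter)
    return np.array(rows)

# Hypothetical layout: two replicate log-ratio arrays per cell (18 arrays)
labels = [g for g in GROUPS for _ in range(2)]
X_cells = cell_means_design(labels)
X_fact = factorial_design(labels)

# All pairwise contrasts between the 9 cells, one column per comparison
pairs = list(itertools.combinations(range(len(GROUPS)), 2))
contrasts = np.zeros((len(GROUPS), len(pairs)))
for k, (i, j) in enumerate(pairs):
    contrasts[i, k], contrasts[j, k] = 1.0, -1.0

# Comparisons in which both the drug and the time point differ
mixed = [(GROUPS[i], GROUPS[j]) for i, j in pairs
         if GROUPS[i][0] != GROUPS[j][0] and GROUPS[i][1] != GROUPS[j][1]]

print(X_cells.shape, X_fact.shape)
print(np.linalg.matrix_rank(X_fact))
print(len(pairs), len(mixed))
```

Both designs span the same column space, so they give the same fitted values; the factorial coding is simply the one whose coefficients read directly as drug effects, time effects and interactions.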

> So far, I'm doing my analysis by first taking the ratio of each of the 9
> data points, then using limma for comparing the differential expression.
> [...]
> Then comparing all the possible pairs. I'm not a statistician, so I'm not
> sure if this is a valid way to do the analysis or I should add the
> variable "controls" to the model? I'm trying to add it to the model, but
> so far it's still cloudy in my mind how to make the matrices. All
> suggestions will be appreciated.

First, I think you should have a clear idea of what questions you want to answer by analyzing the data. Comparing all possible pairs sounds questionable: for example, what would d1t1/c1 vs d2t2/c5 tell you? And depending on your sampling procedure, e.g. if the 9 controls are independent of each other (I assume the samples in your 3x3 table are independent), I don't see why you need to include the variable "controls" in your model. It will then be a typical 3x3 factorial design.

> BTW, for finding the specific gene profiles that follow the time series,
> which package will be more appropriate? Thanks!

Do you mean finding different patterns (time-dependent profiles) of gene expression? I don't know any good package for doing that; please let me know if there is one. But with only three time points the patterns will be really simple, since no smooth curve can be assumed for the gene expression pattern. There is only a limited number of patterns: if you order a gene's expression values at the three time points as 1 (smallest), 2 (medium) and 3 (largest), you may have

time1  time2  time3
  1      2      3
  2      1      3
  ...

If this is what you want, look for a paper in Bioinformatics (I forget the title).

Hope this helps,
Fangxin
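Fangxin's ordering idea can be sketched in a few lines: with three time points there are only 3! = 6 possible orderings, so each gene can be assigned to one of six groups by ranking its three time-point means. The sketch below assumes hypothetical per-gene mean log-expression values; ties are broken by time order, which is an arbitrary choice. It does not test whether the differences between time points are real, so in practice it would only make sense for genes that have already been called significant (for example by limma).

```python
import itertools
import numpy as np

PATTERNS = list(itertools.permutations((1, 2, 3)))   # the 6 possible orderings

def rank_pattern(profile):
    """Rank the three time-point values: 1 = smallest, 3 = largest.

    Ties are broken by time order (stable sort), an arbitrary choice.
    """
    order = np.argsort(profile, kind="stable")
    ranks = np.empty(len(profile), dtype=int)
    ranks[order] = np.arange(1, len(profile) + 1)
    return tuple(int(r) for r in ranks)

def group_by_pattern(gene_ids, means):
    """Collect gene IDs under the rank pattern of their (time1, time2, time3) means."""
    groups = {p: [] for p in PATTERNS}
    for gid, profile in zip(gene_ids, means):
        groups[rank_pattern(profile)].append(gid)
    return groups

# Hypothetical per-gene mean log-expression at time1, time2, time3
gene_ids = ["geneA", "geneB", "geneC"]
means = np.array([[5.0, 6.0, 7.0],
                  [6.0, 5.0, 7.0],
                  [5.0, 7.0, 6.0]])

for pattern, genes in group_by_pattern(gene_ids, means).items():
    if genes:
        print(pattern, genes)
```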
