Question

[BioC] using limma with no replicates

Gordon Smyth 52k

@gordon-smyth

WEHI, Melbourne, Australia

Dear Pedro,

The strategy you are proposing is to ignore experimental factors which you think will have relatively small effects, so as to generate some degrees of freedom for error. This is an OK strategy, long used in statistics, as long as you understand clearly what you are testing for. If you do this, limma will try to find genes whose differential expression stands out relative to the effects you have ignored.

Power is not the issue here. This approach is actually conservative, in that the residual variability will be larger than if you had true replicate arrays, hence you will find fewer DE genes than you might otherwise.

Best wishes
Gordon

>Date: Fri, 31 Mar 2006 12:48:20 +0200
>From: Pedro López Romero <plopez at cnic.es>
>Subject: [BioC] using limma with no replicates
>To: <bioconductor at stat.math.ethz.ch>
>
>Dear list,
>
>I have been given some data to analyze. Unfortunately they only gave 1
>replicate per experimental condition, so I do not expect to draw meaningful
>information from here. Anyway, I would like to use limma, since I expect
>that this could be more powerful than the mere inspection of the log2 fold
>change.
>
>Although I do not have "true biological replicates", I think that I can
>group (in the design matrix) some arrays as if they were replicates
>according to the correlations that I expect from the experimental conditions
>and how the data have been generated. For example, I can group 2 arrays that
>belong to the same strain, although they have been treated a bit differently,
>or I can group 2 arrays that belong to the same strain and treatment but
>different age of the mouse. These "grouped data" are not going to be part of
>the contrast. My intention (and I do not know if it is right) is to group
>some correlated data to have some degrees of freedom available, so that the
>variance can be estimated, and then to make contrasts with 2 other
>non-replicated arrays. I think that this would be somehow more powerful
>than the log2 fold change inspection, since the information is better
>handled through the empirical Bayes that limma implements, but I would
>feel better if someone backed me up, because I am not sure if this is a
>good idea.
>
>Some of my code:
>
>design = model.matrix(~ -1 + factor(c(1,2,3,3,5,6,7,8)))
>colnames(design) = c("WT","upa","g1","f5","f6","f7","f8")
>
>Here g1 groups arrays of the same strain (different from the other
>strains) and the same mouse age but slightly different pharmacological
>treatment, and I will compare f5 vs f6 (these are the same strain,
>different from g1, and the same age, but the treatments are different).
>
>CM = makeContrasts(f5-f6, levels=design)
>
>Doing this, the M values that I observe in the top list are quite high (> 6),
>but the differences are not significant. I think that this is due to the
>absence of replication in a very noisy system.
>
>      ID          M                  A                 t                  P.Value  B
>23620 mCG147262   -9.0828928978708   7.04453315872284  -20.6287756557693           -0.823196144084987
>19275 mCG1047122  -6.22956426050092  .91829704792039   -15.5769614644597  1        -0.940793980765775
>
>If I use genefilter to filter out some genes, some genes do appear
>significantly DE, though. Would it be possible to explain this just by
>saying that FDR-like techniques become more sensitive as fewer comparisons
>are done?
>
>      ID          M                  A                 t                  P.Value              B
>263   mCG142389   -7.97481171094547  .73475871266083   -5.3168578969303   0.00832939443377308  6.57330274986848
>6756  BC027122    -7.40473059624002  .77564203692944   -4.93678117706839  0.0313305586976585   4.89829085664067
>
>I would appreciate any comment or suggestion very much.
>Thank you.
>
>plr.
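As a rough illustration of where the "degrees of freedom for error" in this design come from, the sketch below rebuilds the design matrix from the quoted code in base R and counts the residual degrees of freedom. The names `group`, `resid_df` and `contrast` are introduced here for illustration only, and the expression matrix `y` is simulated noise (an assumption, not the poster's data). The limma part runs only if limma is installed, and should be read as one plausible way to fit the model rather than the exact analysis performed in the thread.

```r
# Eight arrays; arrays 3 and 4 share level 3, so they are treated as
# replicates of one group ("g1"). All other arrays are unreplicated.
group <- factor(c(1, 2, 3, 3, 5, 6, 7, 8))
design <- model.matrix(~ 0 + group)
colnames(design) <- c("WT", "upa", "g1", "f5", "f6", "f7", "f8")

# Residual df = number of arrays minus number of estimable coefficients.
resid_df <- nrow(design) - qr(design)$rank
print(resid_df)  # 1

# The contrast f5 - f6 written as a plain coefficient vector.
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast[c("f5", "f6")] <- c(1, -1)

# Optional: run a limma fit on simulated data (hypothetical values,
# 100 genes of pure noise), only if limma is available.
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  y <- matrix(rnorm(100 * nrow(design)), nrow = 100)
  fit <- limma::lmFit(y, design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, contrasts = contrast))
  print(limma::topTable(fit2, number = 5))
}
```

With a single residual degree of freedom per gene, the gene-wise variance estimates rest almost entirely on the difference between the two grouped arrays, which is why the effects ignored by the grouping end up in the error term.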

genefilter limma

updated 18.7 years ago by Pedro López Romero • written 18.7 years ago by Gordon Smyth

score 0 · Answer 1 · 2006-04-03

Thanks for the reply.

Yes, this is basically the issue: to group some arrays that I expect to be experimentally correlated, so as to have some df available for estimating the variance. My main concern was that I could be violating some limma model assumptions, leading to parameter estimates that make no sense at all. I still have two slight doubts that you could probably clarify for me.

I am doing the grouping in two ways:

1) I group, for instance, arrays 1-2-3 and 4-5-6 and compare these two groups to look for DE. Here, I think that the systematic experimental effects are confounded within the two groups, and the problem would be if there were some interaction between the expression and some of the experimental conditions.

2) I group arrays 1-2, and compare arrays 3 vs 4 and 5 vs 6. Note that in this case I am grouping some arrays (to have some df) but I am comparing single, non-replicated arrays. I think that in this way I can obtain an estimate of the variance that improves the analysis compared with using the arrays alone (the log2 fold change approach). Am I right?

In fact, I am observing large M values, but nothing seems to be DE after multiple-testing correction. I think that this is because the estimated error variance is quite large, so only extremely DE genes can be detected. Is that right?

Thanks a lot.
Pedro

-----Original Message-----
From: Gordon Smyth [mailto:smyth at wehi.edu.au]
Sent: Sunday, 2 April 2006 1:08
To: Pedro López Romero
CC: bioconductor at stat.math.ethz.ch
Subject: [BioC] using limma with no replicates
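Two points raised in this exchange can be checked numerically with base R. The sketch below assumes six arrays, as the numbering in the two grouping schemes suggests, and uses made-up p-values. The names `scheme1`, `scheme2`, `resid_df` and `p_interesting` are hypothetical. The critical values are for an ordinary t-statistic; limma's moderated t adds prior degrees of freedom, so its thresholds would typically be less extreme than these.

```r
# Scheme 1: arrays 1-3 form one group and arrays 4-6 another.
scheme1 <- model.matrix(~ 0 + factor(c(1, 1, 1, 2, 2, 2)))
# Scheme 2: arrays 1-2 are grouped; arrays 3, 4, 5 and 6 each get
# their own coefficient.
scheme2 <- model.matrix(~ 0 + factor(c(1, 1, 2, 3, 4, 5)))

# Residual df = number of arrays minus number of estimable coefficients.
resid_df <- function(design) nrow(design) - qr(design)$rank
df1 <- resid_df(scheme1)  # 4
df2 <- resid_df(scheme2)  # 1

# Two-sided 5% critical values of an ordinary t-statistic.
crit1 <- qt(0.975, df = df1)  # about 2.78
crit2 <- qt(0.975, df = df2)  # about 12.71

# Benjamini-Hochberg adjustment of the same raw p-value when it is one
# of 10,000 tests versus one of 100 tests (all other p-values set to 0.5).
p_interesting <- 0.001
adj_all <- p.adjust(c(p_interesting, rep(0.5, 9999)), method = "BH")[1]
adj_filtered <- p.adjust(c(p_interesting, rep(0.5, 99)), method = "BH")[1]
print(c(all = adj_all, filtered = adj_filtered))  # about 0.5 and 0.1
```

Under these assumptions, a single residual degree of freedom requires a very large t-statistic before anything looks significant, and the same raw p-value receives a smaller adjusted value when fewer tests are made. The second effect only supports valid inference if the filter does not depend on the test statistic itself, for example a filter on overall intensity or variance that ignores the group labels.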