short time-course design. Any suggestion?


stecalza@tiscali.it ▴ 290


Hi everybody.

I'm looking at a small experiment with 12 chips (Affy), from 3 different cell lines measured at 4 different time points (0, 2, 8, and 24 h).

1) MAS5 expression values
2) selected about 1500 genes (out of ~22000) using GO annotations for those BP (biological processes) of possible interest
3) selected genes with at least 25% Present calls (I know this is quite arbitrary)
4) ANOVA using gls with a compound symmetry correlation structure
5) p-values corrected either using p.adjust(..., "fdr") or by computing q-values

I actually get few "significant" genes, mostly with low fold change (relative to time 0) and overall low expression intensities. Any objection to all this and/or any suggestions for improvement?

Thanks in advance,
Ste
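For readers unfamiliar with step 5: in R, p.adjust(..., "fdr") applies the Benjamini-Hochberg adjustment. The following is a minimal pure-Python sketch of that adjustment, not the R implementation itself; the function name `bh_adjust` and the example p-values are made up for illustration.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Hypothetical helper written for illustration only.
    """
    m = len(pvalues)
    # Visit the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        # Scale by m / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for four genes.
example = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(example))
```

Because each p-value is scaled by the number of tests divided by its rank, the adjusted values grow with the number of genes tested, which is why the size of the gene list matters for the final result.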


Matthew Hannah ▴ 940


Not too in depth, but in my view the analysis would be improved by using GCRMA or RMA and ignoring the P/A calls, then doing an unbiased analysis and looking at the GO annotations of the differentially expressed genes only afterwards. I can't really advise on the ANOVA, but I guess limma would be worth a look. HTH, Matt


Naomi Altman ★ 6.0k


You appear to have no replicates. Without replication you cannot do any statistical analysis such as ANOVA or limma.

--Naomi

At 06:10 PM 7/19/2004 +0000, Stefano Calza wrote:
>I'm looking at a small experiment with 12 chips (Affy), from 3 different
>cell lines measured at 4 different time points (0,2 hours, 8 h, 24 h).

Naomi S. Altman, Associate Professor, Bioinformatics Consulting Center, Dept. of Statistics, Penn State University, University Park, PA 16802-2111


a) Are you interested in the difference between cell lines over time, OR
b) are you treating the different cell lines as biological replicates?

Assuming the latter, you have a one-way ANOVA with time as the main factor and 3 replicates at each time point.

I would suggest you try RMA and GC-RMA on the whole dataset first and truncate your list later. The truncation at step 2 ignores more than 90% of the genes, and your number of true positives will be quite low.

You can use GO tools (I think Bioconductor has some packages to handle these) on the final gene list to see if your favourite pathway is involved.

On Tue, 2004-07-20 at 18:17, Naomi Altman wrote:
> You appear to have no replicates. Without replication you cannot do any
> statistical analysis such as ANOVA or limma.
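The one-way layout described in this reply (time as the only factor, the 3 cell lines as replicates at each of the 4 time points) can be sketched for a single gene as below. This is a plain-Python illustration under that assumption; it ignores any correlation between measurements from the same cell line, so it is not the gls model with compound symmetry from the original post. The function name `one_way_f` and the expression values are made up.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one inner list of values per time point.
    Hypothetical helper written for illustration only.
    """
    k = len(groups)                      # number of time points
    n = sum(len(g) for g in groups)      # total number of arrays
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the time-point means around the grand mean.
    ss_between = sum(
        len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means)
    )
    # Variation of the replicates around their own time-point mean.
    ss_within = sum(
        (x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up log-scale values for one gene: 4 time points x 3 cell lines.
gene = [[7.1, 7.4, 6.9], [7.8, 8.0, 7.5], [8.4, 8.9, 8.1], [7.2, 7.6, 7.0]]
print(one_way_f(gene))
```

With 12 arrays and 4 time points this leaves 3 and 8 degrees of freedom, so each gene's F statistic rests on very little residual information.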

Adaikalavan Ramasamy


On Tue, Jul 20, 2004 at 07:00:37PM +0100, Adaikalavan Ramasamy wrote:
> a) Are you interested in the difference in cell lines over times OR
> b) are you treating the different cell lines as biological replicates
>
> Assuming the latter, you have a oneway anova with time as a main factor
> and 3 replicates at each time point.

That's right. Sorry, my description was not that clear. This is what I did: an ANOVA with time as the main factor, but assuming a correlation structure among the observations.

> I would suggest you try RMA and GC-RMA on the whole dataset first and
> truncating your list later. The truncation at step 2 ignores more than
> 90% of the genes and your number of true positives will be quite low.

1) Using all the genes (or most of them, after some non-specific filtering such as on the lowest expression value across samples and on the CV) leads to so many comparisons that after correction none appears to be significant. Nevertheless, I could use this as an exploratory approach, i.e. to rank genes.

2) Prefiltering using an "a priori" biological framework would mean (but please correct me if I'm wrong) asking a different question: among those genes related to some biological process I'm interested in, which are actually differentially expressed?

Why should I use RMA? E.g. with a very naive approach (i.e. computing F statistics with arrayMagic without considering the correlation among observations, which is faster!) I find that MAS5 values give generally higher F values (a simple qqplot can help). Also, the overall analysis doesn't improve using RMA. I know of affycomp but I have never used it. I'll try.

Thanks.
Ste

> You can use GO tools (I think BioConductor have some packages to handle
> these) on the final gene list to see if your favourite pathway is
> involved.

--
Stefano Calza, Sezione di Statistica Medica, Dip. di Scienze Biomediche e Biotecnologie, Università degli Studi di Brescia, Italy
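The non-specific filtering mentioned in point 1 (on the lowest expression value across samples and on the CV) could look roughly like the sketch below. It is only one possible reading: it assumes the filter requires every sample to reach an expression floor and the coefficient of variation to reach a minimum, computed on unlogged values. The function names, thresholds, and data are all hypothetical.

```python
from statistics import mean, stdev


def passes_filter(values, min_expr, min_cv):
    """Return True if one gene passes both non-specific filters.

    Assumption: "lowest expression value" is read as the minimum across
    samples, which must be at least `min_expr`.
    """
    if min(values) < min_expr:
        return False
    # Coefficient of variation: sample standard deviation over the mean.
    cv = stdev(values) / mean(values)
    return cv >= min_cv


def filter_genes(expression, min_expr, min_cv):
    """Return the names of genes that pass, keeping the input order."""
    return [
        gene
        for gene, values in expression.items()
        if passes_filter(values, min_expr, min_cv)
    ]


# Made-up unlogged intensities for three genes across four arrays.
expression = {
    "gene_a": [100, 120, 80, 110],   # expressed and variable
    "gene_b": [100, 101, 99, 100],   # expressed but nearly constant
    "gene_c": [10, 40, 12, 35],      # variable but below the floor
}
print(filter_genes(expression, min_expr=50, min_cv=0.1))
```

A filter of this kind does not use the time labels, so it shrinks the number of tests without looking at the contrast being tested.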
