No replicates and differential analysis !!

Nicolas Servant ▴ 260

@nicolas-servant-1466

Last seen 2.6 years ago

France

Hello,

Does anybody know of an R package or function to compare expression levels (affy data) between two groups with no replicates in each group? In effect, we just want to compare one array to another. The purpose is to find differentially expressed genes. We cannot use a statistical test (not enough replicates), but we could use a graphical approach based on scatter plots, or an outlier-detection approach.

Thanks for your help,
Regards
Nicolas.

--
Nicolas Servant
Equipe Bioinformatique, Institut Curie
26, rue d'Ulm - 75248 Paris Cedex 05 - FRANCE
Email: Nicolas.Servant at curie.fr
Tel: 01 44 32 42 75

• 1.9k views

ADD COMMENT • link updated 18.9 years ago by Aedin Culhane ▴ 510 • written 18.9 years ago by Nicolas Servant ▴ 260

Ben Bolstad ★ 1.2k

@ben-bolstad-1494

Last seen 7.3 years ago

While I agree wholeheartedly with what others have said on the issue of replication and external validation (e.g. PCR), you might be able to do slightly better with a test statistic based on a probe-level analysis. Admittedly there is no polished function for doing this in general right now, but something like

    library(affyPLM)               # provides fitPLM(), coefs() and se()
    my.abatch <- ReadAffy()        # read the CEL files in the working directory
    my.Pset <- fitPLM(my.abatch)   # fit the probe-level model

    ## now assuming you have only two samples
    PLM.teststatistic <- (coefs(my.Pset)[,1] - coefs(my.Pset)[,2]) /
        sqrt(se(my.Pset)[,1]^2 + se(my.Pset)[,2]^2)

I have observed that thresholding on this does slightly better than thresholding on FC (or log FC, to be more exact) on spike-in datasets.

Hope that helps,
Ben
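As a way to see what this statistic does without any CEL files, here is a minimal base-R sketch on simulated numbers. The matrices `est` and `se_est` are hypothetical stand-ins for the per-array estimates and standard errors that a probe-level fit would return; nothing here calls affyPLM.

```r
set.seed(1)
n <- 1000

# Hypothetical stand-ins for the probe-level estimates and their standard
# errors: rows are probesets, columns are the two arrays.
est <- matrix(rnorm(2 * n, mean = 8, sd = 1), ncol = 2,
              dimnames = list(paste0("ps", seq_len(n)), c("A", "B")))
se_est <- matrix(runif(2 * n, min = 0.05, max = 0.5), ncol = 2,
                 dimnames = dimnames(est))

# Difference of the two estimates divided by its combined standard error.
plm_stat <- (est[, "A"] - est[, "B"]) /
  sqrt(se_est[, "A"]^2 + se_est[, "B"]^2)

# Rank probesets by the absolute statistic rather than by raw log fold change.
top <- head(sort(abs(plm_stat), decreasing = TRUE), 10)
print(top)
```

Because the standard errors here reflect only probe-level (technical) variation, the ranking is a prioritisation aid and not a substitute for biological replication.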

ADD COMMENT • link 18.9 years ago Ben Bolstad ★ 1.2k

> Admittedly there is no polished function for doing this in general right
> now, but something like

Wasn't the function 'ppsetApply' in 'affy' meant to be a general function to do whatever one likes across probe sets?

ADD REPLY • link 18.9 years ago lgautier@altern.org ▴ 950

On Thu, 2006-01-26 at 15:35 +0100, lgautier at altern.org wrote:
> Wasn't the function 'ppsetApply' in 'affy' meant to be a general function
> to do whatever one likes across probe sets ?

The generality I was referring to was for carrying out tests based on output stored in PLMset objects.

ADD REPLY • link 18.9 years ago Ben Bolstad ★ 1.2k

Hi Nicolas,

I tried to send this message to the mailing list yesterday but it seems not to have got through.

Like others, I'd caution against using no replicates at all, but I think we have some methods that are useful with no or very few replicates. As Ben Bolstad said in a previous message, I would suggest that you use a probe-level analysis to estimate the technical variance. We have a method, multi-mgMOS, which estimates the log-signal and attaches credibility intervals to this estimate (either variances or percentiles of the posterior log-signal). A paper describing this method is available from

http://bioinformatics.oxfordjournals.org/cgi/content/short/bti583v1

and the most recent code and other publications are on the project web-site,

http://umber.sbs.man.ac.uk/resources/puma/

An older version of mmgMOS is available in Bioconductor.

More recently we have developed a Bayesian t-test, PPLR, which allows this technical variance to be included when determining differential expression between replicated conditions. The PPLR code also works with no replicates, in which case it just uses the technical variance. R code is available from the above website (the paper is submitted and we will get PPLR into Bioconductor shortly).

I suggest you try mmgMOS, followed by mean or median centering of the log-signal estimates, followed by PPLR. Let me know how you get on; these are new methods and feedback is very welcome.

Regards,
Magnus.

--
Dr Magnus Rattray
School of Computer Science, University of Manchester,
Oxford Road, Manchester M13 9PL, UK.
Tel. +44-161-275-6187 Fax. +44-161-275-6204
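The centering step in the middle of that pipeline can be sketched in base R. This is only an illustration of median centering on simulated values; the matrix `log_signal` is a hypothetical stand-in for per-array log-signal estimates, and neither mmgMOS nor PPLR is called here.

```r
set.seed(2)

# Hypothetical log-signal estimates: 500 probesets on two arrays, with
# array B shifted upward to mimic a global intensity offset.
log_signal <- cbind(A = rnorm(500, mean = 7), B = rnorm(500, mean = 7.6))

# Subtract each array's median so that both columns are centred on zero.
centred <- sweep(log_signal, 2, apply(log_signal, 2, median))

print(apply(centred, 2, median))
```

Centering removes the array-wide offset, so that remaining differences between the two columns are not dominated by overall brightness.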

ADD REPLY • link 18.9 years ago Magnus Rattray ▴ 10

Sean Davis 21k

@sean-davis-490

Last seen 4 months ago

United States

Simply take array A and divide it by array B. Then rank the genes by those ratios.

Sean
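A minimal base-R sketch of this ratio ranking, on simulated values. The vectors `array_a` and `array_b` are hypothetical, and taking log2 of the ratio and ranking by magnitude are choices made for this sketch so that increases and decreases are treated symmetrically.

```r
set.seed(3)

# Hypothetical expression values on the raw (unlogged) scale for two arrays.
genes <- paste0("gene", 1:200)
array_a <- setNames(rlnorm(200, meanlog = 6, sdlog = 1), genes)
array_b <- setNames(rlnorm(200, meanlog = 6, sdlog = 1), genes)

# log2(A / B): positive values are higher on A, negative values higher on B.
log_ratio <- log2(array_a / array_b)

# Rank by magnitude so that up- and down-regulated genes are treated alike.
ranked <- log_ratio[order(abs(log_ratio), decreasing = TRUE)]
print(head(ranked, 10))
```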

ADD COMMENT • link 18.9 years ago Sean Davis 21k

Thanks for your answer.

But in this case, I have to choose a fold-change threshold! And it is supported that the FC tends to be greater at low expression levels. For instance, an FC greater than 2 is readily seen for expression values near 50, but an FC greater than 2 is unlikely for expression values near 1000. So I would like to use a more robust approach.

Regards,
Nicolas S.

ADD REPLY • link 18.9 years ago Nicolas Servant ▴ 260

Without replication, there is nothing statistical that is really "robust", because you do not know how variable the data are.

In the old industrial design literature, for experiments without replication, a normal probability plot (qqnorm) or half-normal plot was used to identify effects that were too large compared to random normal noise (which presumably fits most of the effects). You could do something similar (I would suggest using the quantiles of the t3 or t4 distribution rather than a normal), but the method requires 2 assumptions that are very unlikely to hold in the current situation: the responses must be independent (but responses on the same array are dependent), and the responses must be identically distributed as a K*t4 distribution, where K is a constant related to the gene-wise standard deviation, i.e. the SD for all genes must be equal.

There is also the volcano plot, which I have never used, but which is based on similar ideas.

A more robust idea is to use a binary search using PCR and the observed fold differences. Although, given the expense, it would be simpler to run a replicate for each condition.

--Naomi

Naomi S. Altman, Associate Professor
Dept. of Statistics, Penn State University
University Park, PA 16802-2111
814-865-3791 (voice), 814-863-7114 (fax), 814-865-1348 (Statistics)
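The probability-plot idea can be sketched in base R as below. The data are simulated, the choice of 4 degrees of freedom follows the suggestion above, and fitting the reference line through the quartiles is an assumption of this sketch. The two caveats stated above (independence, equal gene-wise SD) still apply to any real use.

```r
set.seed(4)

# Hypothetical log-ratios: most genes unchanged, 10 carrying a large effect.
log_ratio <- c(rt(990, df = 4) * 0.3, rep(c(-6, 6), each = 5))
n <- length(log_ratio)

# Sorted observations against the matching theoretical t4 quantiles.
obs <- sort(log_ratio)
theo <- qt(ppoints(n), df = 4)

# Reference line through the quartiles, which the bulk of genes should follow.
q_obs <- unname(quantile(obs, c(0.25, 0.75)))
q_theo <- qt(c(0.25, 0.75), df = 4)
slope <- diff(q_obs) / diff(q_theo)
intercept <- q_obs[1] - slope * q_theo[1]

# Only draw the plot when run interactively.
if (interactive()) {
  plot(theo, obs, xlab = "t4 quantiles", ylab = "Sorted log-ratios")
  abline(a = intercept, b = slope)
}

# Points furthest from the reference line are the candidate outliers.
resid <- obs - (intercept + slope * theo)
flagged <- order(abs(resid), decreasing = TRUE)[1:10]
print(obs[flagged])
```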

ADD REPLY • link 18.9 years ago Naomi Altman ★ 6.0k

On 1/25/06 9:19 AM, "Nicolas Servant" <nicolas.servant at curie.fr> wrote:
> So i would like to use a more robust approach.

Unfortunately, I don't think there is a truly more robust approach with only one measurement (the ratio) for each gene. The answer here is to do more replicates or to take your gene list as a source of "candidates" that you then validate using another technology (PCR, for example). If you really need to assign some statistical significance (rather than just ranking genes for further analysis), then I think you have no choice but to do further arrays.

Sean

ADD REPLY • link 18.9 years ago Sean Davis 21k

Nicolas Servant wrote:
> So i would like to use a more robust approach.

With only two samples, you are stuck with fold changes. However, you might be able to make your results more robust by filtering out those genes whose expression values you think are too small. I often use kOverA() in the genefilter package to do this.

Best,
Jim

--
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
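The filtering rule can be illustrated in base R: keep a probeset only if at least k of its values exceed a cutoff A, which is the kind of per-row test that genefilter's kOverA(k, A) constructs. The sketch below does not call genefilter; the matrix `expr` and the values of `k` and `A` are hypothetical and chosen only for illustration.

```r
set.seed(5)

# Hypothetical log2 expression matrix: 300 probesets on two arrays.
expr <- matrix(rnorm(600, mean = 7, sd = 2), ncol = 2,
               dimnames = list(paste0("ps", 1:300), c("A", "B")))

# Keep a probeset if at least k arrays exceed the intensity cutoff A.
# The values of k and A here are illustrative, not recommendations.
k <- 1
A <- 6
keep <- rowSums(expr > A) >= k

# Compute log fold changes only on the probesets that pass the filter.
log_fc <- expr[keep, "A"] - expr[keep, "B"]
print(sum(keep))
```

With k = 1 a probeset survives if it is above the cutoff on either array, which retains genes that are switched on in only one condition.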

ADD REPLY • link 18.9 years ago James W. MacDonald 67k

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060125/5a2a1cb9/attachment.pl

ADD REPLY • link 18.9 years ago Sharon Anbu ▴ 480

Hi Nicolas,

> And it is
> supported that the FC tends to be greater at low expression levels.

What is supported is that the variance of the _estimate_ of the FC (the true underlying quantity) by the log-ratio of measured probe intensities tends to be greater at low expression levels. Indeed, this depends on the preprocessing and background correction. Consider this paper:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12169536

and the accompanying "vsn" package in BioC. It removes the intensity-dependence of the variance, and you can use the "glog-ratio", which is an alternative estimator of FC, to select genes. This amounts to assuming that all genes have the same variance.

Of course the assumption is not really true: there can be gene-specific causes for different variances (besides overall intensity). But with only two arrays you have no way of seeing them. Hence, using the glog-ratio to select genes when there are no replicates is an extreme version of the moderated t-statistic (which is often used when there are few replicates).

Best wishes
Wolfgang

-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax: +44 1223 494486
Http: www.ebi.ac.uk/huber
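To illustrate why a glog-ratio damps fold changes at low intensity, here is a minimal base-R sketch. The function `glog2` and its constant `c0` are assumptions of this sketch: vsn estimates its transformation parameters from the data, whereas here the constant is fixed by hand purely for illustration.

```r
# Generalised log, base 2: close to log2(x) for x much larger than c0 and
# roughly linear near zero. The constant c0 is a hypothetical choice.
glog2 <- function(x, c0 = 50) log2((x + sqrt(x^2 + c0^2)) / 2)

# The same two-fold change at low and at high intensity.
low  <- c(A = 100, B = 50)
high <- c(A = 2000, B = 1000)

log_ratio  <- c(low  = log2(low[["A"]] / low[["B"]]),
                high = log2(high[["A"]] / high[["B"]]))
glog_ratio <- c(low  = glog2(low[["A"]]) - glog2(low[["B"]]),
                high = glog2(high[["A"]]) - glog2(high[["B"]]))

print(rbind(log_ratio, glog_ratio))
```

The plain log-ratio is identical for both pairs, while the glog-ratio is smaller for the low-intensity pair, so a fixed threshold on it is harder to cross where measurements are noisiest.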

ADD REPLY • link 18.9 years ago Wolfgang Huber ★ 13k

Aedin Culhane ▴ 510

@aedin-culhane-1526

Last seen 5.3 years ago

United States

Hi Nicolas,

I recently had to analyse the same type of data. We had only 2 arrays from rare mRNA (each array contained a pool of mRNA from 5 animals), and we wanted to compare them. All we could do was rank the genes by their difference and take those with the largest fold change.

We found that the method used to compute the probeset expression values made a big difference to the number of genes that had a >2-fold difference. When we used mas5 to compute the expression values, we had over 2,700 genes with greater than a 2-fold change. When gcRMA was used, 260 genes had a 2-fold difference, and with vsn only 11 genes had a 2-fold difference. I have lots of details on this analysis if it will help you. We found that most of the genes that mas5 called different were in the low expression range and could not be trusted.

We validated 8 genes which were >2-fold different with both vsn and gcRMA using RT-PCR. We had excellent correlation in all cases. vsn does very slightly "under-estimate" the fold difference. I would definitely trust any genes that have a >2-fold difference when using vsn. I would not trust them if they are called using mas5. The glog transformation is worth applying, particularly in these kinds of analyses. We found the glog-ratio to be reliable. Of course, we have no real idea of the number of true positives we missed (false negatives).

By using vsn and removing the intensity-dependence of the variance, you can argue that you have removed the denominator of the t-statistic and thus that comparing the "mean" difference is valid. Of course the mean has an n of 1, so it is just the glog-ratio. Albeit a woolly assumption, at least it gives a better basis for your analysis.

The second thing I might consider is checking for replicate probesets on the array: if the replicate probesets agree, then you can be more confident in the result.

Although fold change isn't a good statistical measure, a good variance estimate can be difficult to obtain. We just completed a comparison of feature selection methods (Jeffery et al.) in which we showed that at low numbers of replicates (n<5), rank products or even fold change can perform as well as or outperform t-statistic and moderated t-statistic methods, depending on the variance structure of the data.

Hope this helps,
Regards
Aedin

--------------------
www.hsph.harvard.edu/researchers/aculhane.html
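The idea of trusting genes that pass the 2-fold threshold under more than one preprocessing method can be sketched in base R as follows. The two fold-change vectors are simulated and hypothetical; they do not reproduce the mas5, gcRMA or vsn results quoted above.

```r
set.seed(7)
genes <- paste0("gene", 1:500)

# Hypothetical log2 fold changes for the same genes from two preprocessing
# methods; the second is a noisy, slightly shrunken version of the first.
lfc_method1 <- setNames(rnorm(500, sd = 0.8), genes)
lfc_method2 <- setNames(0.8 * lfc_method1 + rnorm(500, sd = 0.2), genes)

# |log2 FC| > 1 corresponds to a fold change above 2 in either direction.
hits1 <- names(lfc_method1)[abs(lfc_method1) > 1]
hits2 <- names(lfc_method2)[abs(lfc_method2) > 1]

# Genes exceeding the threshold under both methods are the candidates
# to prioritise for follow-up.
consensus <- intersect(hits1, hits2)
print(c(method1 = length(hits1), method2 = length(hits2),
        both = length(consensus)))
```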

ADD COMMENT • link 18.9 years ago Aedin Culhane ▴ 510

Again, we need to be careful about what is "validated" by PCR.

If the RNA used for PCR came from the same samples that were hybridized to the arrays, you have validated that the arrays "worked" technically. (And this is certainly worth knowing.)

But what we usually want to validate is that the genes are differentially expressed in the population, which can only be validated by use of an independent sample.

--Naomi

ADD REPLY • link 18.9 years ago Naomi Altman ★ 6.0k

Hi Naomi,

I completely agree with you. I think I was just glad to find that anything agreed with an n=1. So we first "validated" (I agree it's a bad word) that the results from the arrays made sense using RT-PCR. Then we followed up the findings and validated them using a different in vivo experimental approach.

Aedin

ADD REPLY • link 18.9 years ago Aedin Culhane ▴ 510
