LIMMA vs. dChip

jun.yan.a@utoronto.ca ▴ 10

@junyanautorontoca-1133

Last seen 10.7 years ago

Dear list members, I have a set of Affymetrix data from 10 HG_U133A arrays, separated into two unpaired groups of 5 arrays each. I processed the data using LIMMA and dChip. For dChip, I used all the default settings. The resulting lists of differentially expressed genes have less than 50% in common. Why is the overlap between the two results so low? Is there a problem? Can anyone help me? Thanks in advance, Jun

limma limma • 1.4k views

updated 20.1 years ago by Stephen Henderson ★ 1.0k • written 20.2 years ago by jun.yan.a@utoronto.ca ▴ 10

Adaikalavan Ramasamy ★ 1.8k

@adaikalavan-ramasamy-675

Last seen 10.7 years ago

Your question is a bit vague and you provide little information. I do not think LIMMA has preprocessing capabilities for Affymetrix data.

1) How did you preprocess the data?

2) How did you "analyse" your data in dChip? What technique (e.g. fold change, t-test, Wilcoxon) did you use in dChip?

3) How did you select the differentially expressed genes (e.g. via a p-value cutoff or biological significance)?

One possibility is that you are using very different test statistics. With 5 arrays in each group, it is difficult to draw any conclusions, as some methods are more robust than others with a small number of arrays.

Another is that you chose a threshold that includes a lot of noisy genes. An extreme example is to select all genes with a p-value less than 1, in which case you get 100% agreement between the two methods.

And yet another is that you may have made a coding/programming error somewhere.

Regards, Adai
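A minimal Python sketch of the threshold effect described above: the agreement between two gene lists depends on the cutoff, and a cutoff of 1 trivially gives full agreement. The function name `overlap_at_cutoff`, the gene ids, and the p-values are all made up for illustration; nothing here comes from the poster's data, dChip, or limma. Overlap is measured here as intersection over union, which is only one of several reasonable definitions.

```python
def overlap_at_cutoff(pvals_a, pvals_b, cutoff):
    """Fraction of genes selected by either method that are selected by both.

    pvals_a, pvals_b: dicts mapping gene id -> p-value (hypothetical inputs).
    A gene counts as selected when its p-value is strictly below the cutoff.
    """
    selected_a = {g for g, p in pvals_a.items() if p < cutoff}
    selected_b = {g for g, p in pvals_b.items() if p < cutoff}
    union = selected_a | selected_b
    if not union:
        return float("nan")  # neither method selected anything
    return len(selected_a & selected_b) / len(union)


# Made-up p-values for five genes from two hypothetical analyses.
method_a = {"g1": 0.001, "g2": 0.020, "g3": 0.300, "g4": 0.040, "g5": 0.700}
method_b = {"g1": 0.002, "g2": 0.200, "g3": 0.030, "g4": 0.045, "g5": 0.900}

for cutoff in (0.01, 0.05, 1.0):
    print(cutoff, overlap_at_cutoff(method_a, method_b, cutoff))
```

Reporting the overlap at several cutoffs, rather than at a single one, makes it easier to see whether a low figure reflects the methods or just the choice of threshold.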

written 20.2 years ago by Adaikalavan Ramasamy ★ 1.8k

We normalized the same data set using RMA and a very similar procedure that used Tukey's biweight within array to combine probes into gene expression, instead of median polish. We then applied 2-sample t-tests and SAM to both sets of data. The overlap in the "top 100" and "top 200" sets of differentially expressed genes was 50%.

Normalization makes a huge difference, even though the correlation between the expression values, array by array, can be very close to 100%. This has been found many times. The recent thread "RMA vs gcRMA" sheds some light on this problem. I suspect that much of the difference lies in the low expressing genes - but this does not mean that these genes are "absent".

--Naomi

Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111
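The point that near-perfect array-by-array correlation can coexist with a modest top-list overlap can be explored with a small simulation. This is a hedged sketch, not a reconstruction of the analysis described above: the two "procedures" are simply the same signal plus independent noise, every number (gene count, noise levels, seed) is an arbitrary assumption, Welch's t-statistic is used as one possible 2-sample t-test, and no gene is truly differentially expressed, so it represents a worst case.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def welch_t(x, y):
    """Two-sample t-statistic with unequal variances (Welch)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (statistics.fmean(x) - statistics.fmean(y)) / (vx / len(x) + vy / len(y)) ** 0.5


def top_k(scores, k):
    """Indices of the k scores that are largest in absolute value."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return set(order[:k])


def simulate(n_genes=2000, n_per_group=5, k=100, seed=1):
    """Return (correlation on the first array, top-k overlap) for two procedures."""
    rng = random.Random(seed)
    proc_a, proc_b = [], []
    for _ in range(n_genes):
        level = rng.uniform(4.0, 14.0)  # assumed true log2 expression of the gene
        row_a, row_b = [], []
        for _ in range(2 * n_per_group):
            shared = level + rng.gauss(0.0, 0.2)         # array-to-array variation
            row_a.append(shared + rng.gauss(0.0, 0.1))   # noise specific to procedure A
            row_b.append(shared + rng.gauss(0.0, 0.1))   # noise specific to procedure B
        proc_a.append(row_a)
        proc_b.append(row_b)
    # Correlation across genes of the first array's values under A and B.
    corr = pearson([row[0] for row in proc_a], [row[0] for row in proc_b])
    t_a = [welch_t(row[:n_per_group], row[n_per_group:]) for row in proc_a]
    t_b = [welch_t(row[:n_per_group], row[n_per_group:]) for row in proc_b]
    overlap = len(top_k(t_a, k) & top_k(t_b, k)) / k
    return corr, overlap


if __name__ == "__main__":
    corr, overlap = simulate()
    print(f"array 1 correlation between procedures: {corr:.4f}")
    print(f"top-100 overlap between procedures: {overlap:.2f}")
```

The correlation is dominated by the large spread of expression levels across genes, while the ranking of t-statistics depends on the much smaller within-gene noise, which is why the two summaries can disagree.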

written 20.1 years ago by Naomi Altman ★ 6.0k

> Normalization makes a huge difference, even though the correlation between the expression values, array by array, can be very close to 100%. This has been found many times. The recent thread "RMA vs gcRMA" sheds some light on this problem. I suspect that much of the difference lies in the low expressing genes - but this does not mean that these genes are "absent".

I agree with Naomi: those low-expressing genes might still be present, although their expression is low. For RMA- and GCRMA-normalized data, some of the low-expression values also agree well, while there are also discrepancies in the high-expression range. My confusion is what to do: filter out genes with inconsistent results between RMA and GCRMA, or filter out genes with low intensities (by MAS5 call?) and use one normalization result to draw conclusions?

Thanks! Fangxin

--
Fangxin Hong, Ph.D.
Plant Biology Laboratory
The Salk Institute
10010 N. Torrey Pines Rd.
La Jolla, CA 92037
E-mail: fhong@salk.edu
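The two filtering options raised in this question can be written down as a short Python sketch. It does not say which option is preferable. All names and values are hypothetical, and the plain intensity cutoff in `filter_low_intensity` is only a stand-in: it is not the MAS5 detection call mentioned above.

```python
def consistent_genes(sig_rma, sig_gcrma):
    """Option 1: keep only genes called significant under both normalizations."""
    return set(sig_rma) & set(sig_gcrma)


def filter_low_intensity(expr, min_level, min_arrays):
    """Option 2: keep genes whose log2 expression reaches min_level
    on at least min_arrays arrays.

    expr: dict mapping gene id -> list of log2 expression values, one per array.
    """
    return {
        gene
        for gene, values in expr.items()
        if sum(v >= min_level for v in values) >= min_arrays
    }


# Made-up significant-gene lists from two hypothetical normalizations.
sig_rma = ["g1", "g2", "g3"]
sig_gcrma = ["g2", "g3", "g4"]
print(sorted(consistent_genes(sig_rma, sig_gcrma)))

# Made-up log2 expression values on four arrays.
expr = {
    "g1": [9.1, 9.4, 8.8, 9.0],
    "g2": [4.2, 4.0, 4.5, 4.1],
    "g3": [6.5, 5.8, 6.9, 7.2],
}
print(sorted(filter_low_intensity(expr, min_level=6.0, min_arrays=3)))
```

Option 1 is conservative and may discard real changes that only one normalization detects; option 2 depends heavily on the chosen cutoff, which is an assumption the analyst has to justify.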

written 20.1 years ago by Fangxin Hong ▴ 810

Stephen Henderson ★ 1.0k

@stephen-henderson-71

Last seen 8.0 years ago

What result do you get if you try to estimate how many genes are changing, and then compute the Spearman rank correlation for that set? This seems a more meaningful metric, as up to 50% of genes in some experiments may be changing.
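A Spearman rank correlation restricted to a chosen gene set, as suggested here, can be sketched in plain Python. The helper names, the gene set, and the statistics below are hypothetical; how to estimate which genes are changing is left open, since the thread does not specify a method. Ties receive average ranks.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def spearman_on_set(stats_a, stats_b, genes):
    """Spearman correlation of two methods' statistics over a gene set."""
    genes = sorted(genes)
    return spearman([stats_a[g] for g in genes], [stats_b[g] for g in genes])


# Made-up test statistics from two hypothetical methods.
stats_a = {"g1": 5.2, "g2": -3.1, "g3": 0.4, "g4": 2.8}
stats_b = {"g1": 4.0, "g2": -2.5, "g3": 1.9, "g4": 1.1}
changing = {"g1", "g2", "g4"}  # assumed set of changing genes
print(spearman_on_set(stats_a, stats_b, changing))
```

Unlike a top-k overlap, this uses the ordering of every gene in the set, so it is less sensitive to where the list is cut.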

written 20.1 years ago by Stephen Henderson ★ 1.0k

We did not do any further analysis, and we currently have no plans to do any. To really solve this, a properly designed experiment, possibly WT versus a well-understood knockout, should be done. The data we have at hand are not suitable for determining which normalization is best for detecting differential expression.

--Naomi

written 20.1 years ago by Naomi Altman ★ 6.0k

Stephen Henderson ★ 1.0k

@stephen-henderson-71

Last seen 8.0 years ago

True, but I think you may be overstating the problem. Differences in the tail are not all that interesting biologically. The range of RMA is squashed compared to other methods, and Tukey's biweight has an unreliable baseline for low-expressing values. The top 100 is often a small fraction, and the tail may not be that extreme. The interesting point is whether all the data considered significant by one test are significant by the other and, as you say, how well correlated the raw data are. I think??? I sometimes worry about this too.

S

written 20.1 years ago by Stephen Henderson ★ 1.0k
