Fwd: Statistical approach to compare differentially expressed gene lists


qinghua xu

Dear all,

I have identified two lists of differentially expressed genes from the same expression data, processed with two different normalisation methods. List A contains 995 genes and list B contains 2400 genes. More than nine hundred genes overlap between the two lists; that is, most of the genes in list A are also in list B. The idea is to check whether list B is better than list A.

In addition to visualisation approaches (such as a hierarchical clustering heatmap) or biological interpretation, I am wondering whether there is any other statistical approach available to compare two differentially expressed gene lists?

I would appreciate any advice, or pointers to any references on this!

Bests,
Qinghua

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

written 14.9 years ago by qinghua xu • updated 14.9 years ago by Wolfgang Huber

Wolfgang Huber

EMBL European Molecular Biology Laborat…

Dear Qinghua,

I am afraid your question may be too vague. You will need to define more precisely what you mean by "better". Then it should be straightforward to compute a quantitative criterion. It wouldn't be wise to wait for someone else to define what is "better" for you.

Also, for any analysis method I know of, gene lists depend in a trivial manner on a cut-off (e.g. on a p-value or a score), and if you want to do something more meaningful than exegesis of someone's cut-off choice, then I'd suggest plotting ROC curves for both methods, using a reference set of genes that is enriched for "truly differentially expressed" genes.

Best wishes
Wolfgang

-- 
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber/contact

written 14.9 years ago by Wolfgang Huber
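Wolfgang's suggestion can be sketched in Python. Rather than comparing two fixed lists, each method's full gene ranking is scored against a reference set of genes believed to be truly differentially expressed. The sketch assumes each pipeline yields one score per gene (larger meaning stronger evidence, e.g. an absolute test statistic) and that such a reference set is available; the gene IDs, scores and reference set below are hypothetical. Tied scores are grouped, so the trapezoidal area equals the probability that a randomly chosen reference gene outscores a randomly chosen non-reference gene, with ties counted as one half.

```python
from typing import Dict, List, Set, Tuple


def roc_points(scores: Dict[str, float], reference: Set[str]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the cut-off through all observed scores.

    Genes in `reference` count as positives; all other genes count as negatives.
    """
    n_pos = sum(1 for gene in scores if gene in reference)
    n_neg = len(scores) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one reference and one non-reference gene")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Genes with tied scores cross the cut-off together.
        j = i
        while j < len(ranked) and ranked[j][1] == ranked[i][1]:
            if ranked[j][0] in reference:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(scores: Dict[str, float], reference: Set[str]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(scores, reference)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical per-gene scores from the two pipelines and a hypothetical reference set.
scores_rma = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": 2.0, "g5": 1.0}
scores_combat = {"g1": 5.0, "g2": 2.0, "g3": 4.0, "g4": 3.0, "g5": 1.0}
reference = {"g1", "g2"}
print("RMA:", auc(scores_rma, reference), "RMA + ComBat:", auc(scores_combat, reference))
```

Because the whole ranking is used, the comparison does not hinge on any single cut-off. Note that this ROC is computed over genes (reference vs. non-reference), not over samples (male vs. female).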

Dear Wolfgang,

It is a really nice surprise to have your attention! Thank you!

I am sorry that the question was too vague. In more detail: we would like to study gene expression profiles in human peripheral blood and identify DEGs (differentially expressed genes) between males and females. As I mentioned in my previous email, the raw data were preprocessed in two ways: one simply by RMA, and the other by RMA followed by further adjustment with ComBat (http://statistics.byu.edu/johnson/ComBat/) to remove potential batch effects. The dataset was relatively small: 12 males and 12 females. In the end, we obtained two DEG lists using SAM at FDR = 0.05. The basic idea is to show that, by removing potential batch effects, we are able to extract more information from the gene expression data about the difference between males and females in peripheral blood. On the other hand, we would also like to check whether the additional batch-effect adjustment introduces artificial DEGs.

Based on the preliminary results, we observe that the differences between males and females in peripheral blood are very impressive, especially for X- and Y-chromosome-specific genes. Hence, when we plotted ROC curves for both methods, both DEG lists easily reached the maximum AUC = 1. The same holds for the hierarchical clustering heatmap: both DEG lists achieved perfect discrimination.

Thanks again!

Qinghua

________________________________
From: Wolfgang Huber <whuber@embl.de>
To: bioconductor <bioconductor@stat.math.ethz.ch>; qinghua.xu@as.biomerieux.com
Sent: 2009/12/28 (Monday) 4:56:18
Subject: Re: [BioC] Fwd: Statistical approach to compare differentially expressed gene lists

written 14.9 years ago by qinghua xu
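Both lists in this thread come from SAM at FDR = 0.05, and SAM estimates its FDR by permutation. As a simpler stand-in, and explicitly not SAM itself, the Benjamini-Hochberg step-up procedure sketched below makes Wolfgang's cut-off point concrete: the same p-values yield lists of different lengths at different FDR levels. The ten p-values are hypothetical.

```python
from typing import List


def bh_reject(pvalues: List[float], alpha: float) -> List[bool]:
    """Benjamini-Hochberg step-up procedure.

    Returns one flag per hypothesis, in input order: True if it is called
    significant while controlling the false discovery rate at level `alpha`.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Every hypothesis ranked at or below k_max is rejected (step-up rule).
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten genes.
p = [0.0001, 0.0004, 0.002, 0.008, 0.012, 0.028, 0.045, 0.2, 0.5, 0.9]
for alpha in (0.01, 0.05, 0.10):
    print(f"FDR level {alpha}: {sum(bh_reject(p, alpha))} genes in the list")
```

With these p-values the list grows from 3 to 6 to 7 genes as the FDR level goes from 0.01 to 0.05 to 0.10, so list length alone reflects the chosen cut-off as well as the preprocessing.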

Dear Qinghua,

I am not sure I would call those differences "very impressive". As your samples have different numbers of X and Y chromosomes, I would definitely expect many of their genes to be differentially expressed. After all, no genes on Y should be expressed in any of the females, right? The fact that you can trivially tell the two groups apart should suggest that what you are doing is not difficult at all.

As Wolfgang is saying, you need another criterion to define what is a "good list". No statistical test is going to tell you that one list is better than the other.

That being said, one of your lists is larger, so it is likely that it contains more of the differences between your groups. On the other hand, it could be giving you more false positives. You could look at the extra genes and see if they make sense. In this case, if a lot of the extra genes were on the X and Y chromosomes, they are likely truly differentially expressed.

Keep in mind that you have a large overlap between the lists, so it will be more difficult to choose between them, but it also matters much less which one you choose.

It would be very convenient if there were a simple test that would tell us which method is best for an analysis, but generally no such method exists.

Francois

On 12/29/2009 04:31 AM, qinghua xu wrote:
> Based on the preliminary results, we observe that the differences between males and females in peripheral blood are very impressive, especially for X- and Y-chromosome-specific genes.

written 14.9 years ago by Francois Pepin
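Francois's suggestion to inspect the genes that only the larger list contains can be sketched as a simple tally by chromosome. The gene IDs and the gene-to-chromosome map are hypothetical; in practice the map would come from an annotation resource for the array platform. Genes annotated to both X and Y get their own category because, as Robert points out in his reply in this thread, such genes complicate the simple expectation that Y genes are silent in females.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def summarise_extra_genes(
    list_a: Iterable[str],
    list_b: Iterable[str],
    chromosomes: Dict[str, Set[str]],
) -> Counter:
    """Tally the chromosomes of genes that are in list B but not in list A.

    `chromosomes` maps each gene ID to the set of chromosomes it is annotated
    to; a gene may map to more than one (e.g. to both X and Y).
    """
    extra = set(list_b) - set(list_a)
    tally: Counter = Counter()
    for gene in sorted(extra):
        chrs = chromosomes.get(gene, set())
        if not chrs:
            category = "unannotated"
        elif {"X", "Y"} <= chrs:
            category = "X and Y"
        elif "X" in chrs:
            category = "X"
        elif "Y" in chrs:
            category = "Y"
        else:
            category = "other"
        tally[category] += 1
    return tally


# Hypothetical lists and annotation, for illustration only; g7 is left unannotated.
list_a = ["g1", "g2", "g3"]
list_b = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
chromosomes = {
    "g1": {"X"}, "g2": {"Y"}, "g3": {"Y"},
    "g4": {"X"}, "g5": {"X", "Y"}, "g6": {"12"},
}
tally = summarise_extra_genes(list_a, list_b, chromosomes)
sex_linked = tally["X"] + tally["Y"] + tally["X and Y"]
print(dict(tally), f"{sex_linked}/{sum(tally.values())} extra genes on X or Y")
```

A high share of X or Y genes among the extra genes would support list B in the way Francois describes; it is a plausibility check, not a formal test.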

Hi Qinghua,

That is not really the way to approach the problem. You should consult a local statistician or biostatistician who can help you set up an appropriate model to test for sex differences (it is relatively straightforward and could easily be done in limma or just about any other piece of sensible software). A few more comments below.

On Tue, Dec 29, 2009 at 8:49 AM, Francois Pepin <fpepin at cs.mcgill.ca> wrote:
> Dear Qinghua,
>
> I am not sure I would call those differences "very impressive". As your samples have different numbers of X and Y chromosomes, I would definitely expect many of their genes to be differentially expressed. After all, no genes on Y should be expressed in any of the females, right?

Not true: there are pseudoautosomal regions that are essentially duplicated between X and Y (look at just about any annotation package in Bioc for genes that map to two different chromosomes). Also, X inactivation makes life reasonably difficult when working with genes on the X chromosome. One would need to develop some sort of sensible model (almost surely data-driven in the first instance).

best wishes
Robert

-- 
Robert Gentleman
rgentlem at gmail.com

written 14.9 years ago by rgentleman
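Robert's suggestion of an explicit model can be illustrated with a per-gene linear model that includes sex and batch as covariates, one common alternative to adjusting the data with ComBat before testing. This is an assumption for illustration, not Robert's prescription, and it uses plain ordinary least squares with an ordinary t-statistic, not limma's moderated t. The simulated data (12 males, 12 females, two hypothetical batches) are made up.

```python
import numpy as np


def sex_t_statistics(expr: np.ndarray, sex: np.ndarray, batch: np.ndarray) -> np.ndarray:
    """Per-gene OLS t-statistic for the sex coefficient, adjusting for batch.

    expr  : genes x samples matrix of log-scale expression values
    sex   : length-n array, 1 for male and 0 for female
    batch : length-n array of batch labels
    """
    n = expr.shape[1]
    batch = np.asarray(batch)
    batches = np.unique(batch)
    # Design: intercept, sex indicator, one indicator per non-reference batch.
    columns = [np.ones(n), np.asarray(sex, dtype=float)]
    columns += [(batch == b).astype(float) for b in batches[1:]]
    X = np.column_stack(columns)
    p = X.shape[1]
    if np.linalg.matrix_rank(X) < p:
        raise ValueError("design is not of full rank (is sex confounded with batch?)")
    XtX_inv = np.linalg.inv(X.T @ X)
    # Coefficients for all genes at once: (p x n) @ (n x genes) -> p x genes.
    beta = XtX_inv @ X.T @ expr.T
    resid = expr.T - X @ beta  # samples x genes
    sigma2 = (resid ** 2).sum(axis=0) / (n - p)
    se_sex = np.sqrt(sigma2 * XtX_inv[1, 1])
    return beta[1] / se_sex


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sex = np.array([1] * 12 + [0] * 12)
    batch = np.array([0, 1] * 12)       # hypothetical batches, balanced across sexes
    expr = rng.normal(size=(5, 24))     # five hypothetical genes
    expr[0] += 2.0 * sex                # gene 0 carries a sex effect
    expr[:, batch == 1] += 1.0          # every gene shifted in batch 1
    print(np.round(sex_t_statistics(expr, sex, batch), 2))
```

If sex were completely confounded with batch, the design would lose rank and the function raises an error instead of returning meaningless statistics; in that case the data cannot separate the two effects, with or without batch correction.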
