edgeR question


王喆 ▴ 60

@-4142

Last seen 10.4 years ago

Hello,

I am learning edgeR and would like to use it for my Tag-seq and RNA-seq data. I have several questions:

1. Does the DE analysis using common dispersion or moderated tagwise dispersions use the TMM method for normalization? I am not sure of the relationship between Section 6 (Normalization) and the following sections in the user manual. I suppose I should normalize the data first, and then perform the DE analysis.

2. Do you suggest using P-value < 0.01? What about FDR < 0.05? After saving de.tagwise (> write.table(de.com[[1]], file = "/Users/Zhe/edgeR/page7", sep = "\t")), I found there is no FDR column. How do I calculate the FDR for each gene and save it in the output file?

Thanks a lot.

Best wishes,
Zhe

edgeR edgeR • 1.6k views

ADD COMMENT • link updated 14.5 years ago by Gordon Smyth 52k • written 14.5 years ago by 王喆 ▴ 60


Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 3.7 years ago

United States

Hi Zhe,

1. First normalize and then do the DE analysis. (I found this confusing in the vignette, too.)

2. I do not suggest using FDR at this time. The standard FDR computations need to be adjusted for count data. I do not think this has been worked out yet.

--Naomi

ADD COMMENT • link 14.5 years ago Naomi Altman ★ 6.0k


Gordon Smyth 52k

@gordon-smyth

Last seen 5 hours ago

WEHI, Melbourne, Australia

Dear Zhe,

To get FDR, you must use the topTags() function. Is your de.com object a deDGEList object? If it is, then

  top <- topTags(de.com, n=Inf)
  write.table(top$table, file="yourfile.txt")

will do what you want. (I can't tell you what level of FDR to use as your cutoff though, that's up to you.)

Naomi, I don't know of any problem with FDR from edgeR. It should work just fine.

Best wishes
Gordon
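For readers following along with a current edgeR release, the order of steps discussed in this thread (normalize, estimate dispersions, test, then extract FDR with topTags()) can be sketched as below. This is only an illustration under stated assumptions: the count matrix is simulated, the object names are made up for the example, and the function names are those of recent edgeR versions, which differ from the 2010 interface (for instance the deDGEList class mentioned above).

```r
library(edgeR)

set.seed(1)
# Simulated toy data (hypothetical): 1000 genes, 2 groups x 3 replicates
counts <- matrix(rnbinom(6000, mu = 50, size = 10), ncol = 6,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)      # TMM scaling factors, stored in y$samples$norm.factors
y <- estimateCommonDisp(y)   # common dispersion
y <- estimateTagwiseDisp(y)  # moderated tagwise dispersions

de.tagwise <- exactTest(y)            # uses the tagwise dispersions when present
top <- topTags(de.tagwise, n = Inf)   # adds a BH-adjusted FDR column

# Write every gene, including the FDR column, to a tab-separated file
out_file <- file.path(tempdir(), "edgeR_results.txt")
write.table(top$table, file = out_file, sep = "\t", quote = FALSE)
```

The normalization factors are not applied to the counts themselves; they are carried in the DGEList and used as effective library sizes by the later steps, which is why normalization comes first.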

ADD COMMENT • link 14.5 years ago Gordon Smyth 52k


Dear Gordon,

Thank you for your very detailed and clear answer to my question about the dispersion model.

Regarding FDR: For discrete-valued test statistics, the distribution of the p-values under the null hypothesis is a discrete uniform which depends on the marginal total. As a result, the distribution of p-values from the null hypotheses is a mixture of discrete uniforms, which can be marginally very non-uniform. Even after filtering out low expressing genes, it is common to see a peak of p-values near 1.0 due to this effect. It is less evident that there are multiple other peaks, one at each of the discrete values of the p-value for each marginal total. The result of this is that FDR computations are far too conservative for lowly expressing genes, and far too liberal for highly expressing genes, which basically magnifies the power differential that already exists due to the relationship between the mean and variance.

--Naomi
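The discreteness described here can be made concrete with a small base R sketch. It is a simplification, not what edgeR computes: it assumes two libraries of equal size and no overdispersion, so that, conditional on a gene's total count K, the count in the first library is Binomial(K, 0.5). The function names null_pvalues and mass_at are made up for this illustration.

```r
# Null distribution of two-sided binomial p-values for a gene with K reads
# in total (assumption: two equal-sized libraries, no biological variation).
null_pvalues <- function(K) {
  x <- 0:K
  prob <- dbinom(x, size = K, prob = 0.5)  # probability of each possible split
  # Two-sided p-value: total probability of splits no more likely than the observed one
  p <- vapply(prob, function(d) sum(prob[prob <= d * (1 + 1e-7)]), numeric(1))
  data.frame(x = x, prob = prob, p = pmin(p, 1))
}

# Probability mass sitting on each achievable p-value
mass_at <- function(K) {
  tab <- null_pvalues(K)
  tapply(tab$prob, signif(tab$p, 6), sum)
}

mass_at(5)            # only three p-values are possible: 0.0625, 0.375 and 1
length(mass_at(200))  # many more achievable values for a highly expressed gene
```

With K = 5 the value p = 1 alone carries probability 0.625 under the null, which is the kind of peak near 1.0 mentioned above, and no p-value below 0.0625 can ever occur.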

ADD REPLY • link 14.5 years ago Naomi Altman ★ 6.0k


Hi Naomi,

I agree that the discreteness of the counts introduces conservatism, and that there is a power differential between low and high expressed genes. However, the expected overall FDR is still controlled at a rate less than or equal to the nominal rate, and that is all we promise.

To reduce the trend in DE vs expression level, I like to combine FDR with a fold-change cutoff or, perhaps better, use a TREAT-like test.

Regards
Gordon
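As an illustration of the first suggestion, combining an FDR threshold with a fold-change cutoff, here is a base R sketch on a made-up results table. The values are invented for the example, the column names mimic those of a topTags() table, and the cutoffs (FDR < 0.05, at least 2-fold) are arbitrary assumptions rather than recommendations from this thread. A TREAT-style test is a different procedure and is not shown here.

```r
# Hypothetical results table using the column names of a topTags() table
res <- data.frame(
  logFC  = c(3.1, 0.2, -2.4, 0.9, -0.1),   # log2 fold changes (invented)
  PValue = c(1e-6, 0.004, 2e-4, 0.03, 0.8),
  row.names = paste0("gene", 1:5)
)

# Benjamini-Hochberg adjustment, the default method used by topTags()
res$FDR <- p.adjust(res$PValue, method = "BH")

# Keep genes passing both the FDR threshold and a 2-fold change (|log2 FC| >= 1)
selected <- res[res$FDR < 0.05 & abs(res$logFC) >= 1, ]
rownames(selected)
```

Here four genes pass the FDR threshold alone, but only gene1 and gene3 also meet the fold-change cutoff.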

ADD REPLY • link 14.5 years ago Gordon Smyth 52k


Basically, if a global FDR is used with discrete data, then one should filter low expressing genes pretty stringently. For example, one could compute K (the marginal total for the gene) for which the smallest possible p-value is .001 (e.g. use the ordinary Fisher's exact test as an approximation) and use only features with K or more reads in the study. This improves power for the (much smaller number of) remaining features, but obviously you will then need to sort manually through the low expressing genes to determine if you have missed something striking (such as all K-1 reads being in a single sample).

--Naomi
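The threshold K described above can be approximated in base R. This sketch replaces Fisher's exact test with an even simpler conditional binomial test and assumes two libraries of equal size, so the most extreme outcome (all K reads in one sample) has two-sided p-value 2 * 0.5^K. The helper names min_p and min_total and the toy count matrix are made up for the illustration.

```r
# Smallest achievable two-sided p-value for a gene with K reads in total
# (assumption: two equal-sized libraries, binomial conditional test).
min_p <- function(K) pmin(1, 2 * 0.5^K)

# Smallest marginal total K whose best-case p-value reaches alpha
min_total <- function(alpha, K_max = 1000L) {
  K <- seq_len(K_max)
  K[which(min_p(K) <= alpha)[1]]
}

min_total(0.05)    # 6
min_total(0.001)   # 11

# Filter a toy count matrix (invented values) on the marginal total
counts <- matrix(c(0, 3,
                   10, 2,
                   0, 0,
                   40, 55), ncol = 2, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2")))
keep <- rowSums(counts) >= min_total(0.001)
counts[keep, , drop = FALSE]   # gene2 and gene4 remain
```

With unequal library sizes or overdispersion the required total would be larger, so these numbers are best read as lower bounds under the stated assumptions.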

ADD REPLY • link 14.5 years ago Naomi Altman ★ 6.0k


Hi Naomi,

edgeR already does exactly what you suggest, although we chose p=0.05 (leading to K=5) for this purpose rather than 0.001. You're right that a more conservative value would probably be better. However, all the NextGen data sets we've analysed so far have huge amounts of DE, so it hasn't been an issue.

Regards
Gordon

ADD REPLY • link 14.5 years ago Gordon Smyth 52k


Hi Naomi,

Davis has pointed out to me that I'm not quite correct. edgeR does automatically filter out K<6 when estimating the common dispersion, but when doing the DE analysis the only automatic filter is to remove K=0.

I agree that a filter like you suggest is very sensible as a routine procedure, and I've been thinking along the same lines. We did do this filtering for the 't Hoen data case study in the edgeR user's guide, but haven't done it so far for the other case studies.

Regards
Gordon

ADD REPLY • link 14.5 years ago Gordon Smyth 52k
