EdgeR condition-specific dispersion


Thomas Frederick Willems ▴ 20

@thomas-frederick-willems-5527


I'm dealing with a factorial RNA-seq data set in which cells have been stimulated with various combinations of extracellular cues. As such, I was interested in applying the GLM framework in edgeR to assess the contribution of each extracellular cue to the differential expression of certain genes. My concern, however, is that both the expression level and the dispersion of each gene vary greatly with the combination of cues. edgeR doesn't seem to estimate condition-specific dispersions, but rather one dispersion per gene (if the tagwise option is used). My question is therefore two-fold:

1) Does it make sense to want to estimate condition-specific dispersions?
2) Is there a way to modify the edgeR framework so that it does this?

Thanks,
Thomas


Written 12.5 years ago by Thomas Frederick Willems ▴ 20 • updated 12.5 years ago by Robert Castelo ★ 3.4k


Mark Robinson ▴ 880

@mark-robinson-4908


Hi Thomas,

A couple of thoughts below.

On 02.10.2012, at 19:15, Thomas Frederick Willems wrote:

> 1) Does it make sense to want to estimate condition-specific dispersions?

Maybe. I haven't seen much evidence of this in the data I've analyzed. Could you show a compelling example?

> 2) Is there a way to modify the edgeR framework so that it does this?

It's not so easy. Unless I'm mistaken, the standard likelihood ratio test can't handle this setting. A conservative approach would be to estimate the dispersions from the more variable state only and use these in the DE analysis. But then your dispersion estimates may be less accurate (they use less data), so it may not buy you much in the end.

A recent paper describes an extension that might be able to handle this more general situation, but I haven't figured out all the details yet:

http://biostatistics.oxfordjournals.org/content/early/2012/09/16/biostatistics.kxs031.short

Hope that helps.

Best,
Mark

12.5 years ago by Mark Robinson ▴ 880
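One way to put together the kind of example Mark asks for is to compare a crude per-condition dispersion estimate for a single gene. The sketch below is not edgeR's estimator: it uses the simple method-of-moments formula (s^2 - m) / m^2 for a negative binomial, assumes equal library sizes across samples, and runs on made-up counts. The function name `moment_dispersion` and the data are hypothetical.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Crude method-of-moments NB dispersion: (s^2 - m) / m^2, floored at 0.

    Assumes all samples have the same library size (an assumption of this
    sketch, not something edgeR requires).
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    s2 = variance(counts)  # sample variance with n - 1 denominator
    return max((s2 - m) / (m * m), 0.0)


# Hypothetical counts for one gene, five replicates per condition.
counts_by_condition = {
    "control": [100, 120, 85, 110, 85],
    "stimulated": [100, 160, 50, 140, 60],
}

per_condition = {
    name: moment_dispersion(counts)
    for name, counts in counts_by_condition.items()
}

for name, phi in per_condition.items():
    print(f"{name}: moment dispersion estimate = {phi:.4f}")
```

With only a handful of replicates such estimates are very noisy, so a difference between conditions seen this way is a prompt for closer inspection rather than evidence on its own.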


Gordon Smyth 52k

@gordon-smyth


WEHI, Melbourne, Australia

Dear Thomas,

It does make sense to estimate condition-specific dispersions, but most of the time it isn't worthwhile to do so. The only penalty for not doing so when you could have is some loss of statistical power (fewer DE genes).

It makes sense when a perturbed condition is more variable than a 'normal' condition, for example cancer tumour vs normal tissue, or knockout vs wildtype. For it to be worthwhile, there must be a substantial difference in variability between conditions and a relatively large number of replicate samples in each group. It is almost certainly not worthwhile if you have only 2-3 replicates in each condition.

I wonder how you have established that the dispersion varies with the combination of cues. By running edgeR separately on different conditions? Otherwise you might be examining standard deviations rather than dispersions, and they are not the same thing. Is the sequencing depth similar between the different conditions? If the library sizes are different, then edgeR will assign different variances to different observations, even though the dispersions might be the same.

Anyway, edgeR is limited to estimating the dispersion at the gene level. It cannot easily be modified to estimate the dispersion on a condition-specific basis. On the other hand, voom (a function in the limma package) estimates observation-specific dispersions, and can easily be modified to do so in a condition-specific manner. This is part of the work of Charity Law, who is currently writing up her PhD thesis. If you really need to go in this direction, I can show you how to do so using voom.

Best wishes
Gordon

> Date: Tue, 2 Oct 2012 17:15:47 +0000
> From: Thomas Frederick Willems <twillems at mit.edu>
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] EdgeR condition-specific dispersion

12.5 years ago by Gordon Smyth 52k
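The point about library sizes can be illustrated with the negative binomial variance function, var = mu + phi * mu^2, where mu is the expected count and phi the dispersion. The sketch below assumes a single gene with a fixed dispersion and a fixed share of reads, and only changes the library size. The numeric values and the helper name `nb_moments` are invented for illustration and are not taken from the thread.

```python
import math


def nb_moments(library_size, proportion, dispersion):
    """Mean, variance and SD of an NB count with mean library_size * proportion."""
    mu = library_size * proportion
    var = mu + dispersion * mu ** 2  # NB variance function
    return mu, var, math.sqrt(var)


dispersion = 0.05   # hypothetical dispersion, identical in both libraries
proportion = 1e-5   # hypothetical fraction of reads coming from this gene

for library_size in (5_000_000, 20_000_000):
    mu, var, sd = nb_moments(library_size, proportion, dispersion)
    print(
        f"library size {library_size:>10,}: "
        f"mean={mu:.0f} variance={var:.0f} sd={sd:.1f}"
    )
```

Here the dispersion is identical in both libraries, yet the variance grows from roughly 175 to roughly 2200 and the standard deviation from about 13 to about 47, which is why raw standard deviations should not be read as dispersions.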


Robert Castelo ★ 3.4k

@rcastelo


Barcelona/Universitat Pompeu Fabra

Dear Thomas,

If you have 10 or more samples per condition, you could try the tweeDEseq package. It is based on a more flexible family of count data distributions, the Poisson-Tweedie, and will estimate different dispersions and shapes per condition. The shape is a third parameter that provides additional flexibility over the negative binomial to fit distributions with features such as heavy tails or zero inflation.

Cheers,
Robert

--
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550

12.5 years ago by Robert Castelo ★ 3.4k
