brief quesion on DESeq2

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 3 hours ago

United States

hi Daniel, On Mon, Apr 1, 2013 at 6:41 PM, daniel.aguirre <daniel.aguirre@cbm.uam.es>wrote: > Hi, > > I´m a little puzzled about your 'Di erential analysis of count data { the > DESeq2 package' protocol. > > I was trying it with two samples and got the DE results, then I tried the > suggested transformations: > > (being 'des' my previous results, just as it appears in the 'manual') > > dseBlind <- dse > design(dseBlind) <- formula(~ 1) > dseBlind <- estimateDispersions(dseBlind) > > rld <- rlogTransformation(dseBlind) > vsd <- varianceStabilizingTransformat**ion(dseBlind) > > At this point I had assumed that the 'rld' and 'vsd' objects are like > 'dse' but with the transformations, however whe i try to retrieve the > results I get: > > Prueba.rld.res <- results(rld) >> > Error in tail(all.vars(design(object)), 1) : > error in evaluating the argument 'x' in selecting a method for function > 'tail': Error in (function (classes, fdef, mtable) : > unable to find an inherited method for function design for signature > "SummarizedExperiment" > > Am I missing something, should I instead use the 'rld' or 'vsd' objects > with my DE analysis somehow??? > > Both varianceStabilizingTransformation and rlogTransformation return SummarizedExperiment objects: see the value section of the man pages for these functions, and the transformed values are accessed using the assay() accessor, see the GenomicRanges manual pages on SummarizedExperiment. (you can do class(dse) or class(rld) to see what kind of object you have) Section 7 and 8 in the vignette no longer have to do with DE analysis, maybe we should make this more clear in the vignette. Here we describe optional transformations of the data which might be useful for other applications, such as clustering, which might give nicer results when the variance is relatively constant across the range of values. For example we show a hierarchical clustering of the samples by transformed values in Figure 8 of the vignette. > many many thanks!! > > (also, I assume that the aanlysis takes into account differences in > library depth and hence normalizes in this regard?) > > if I have several conditions (only one sample each though) should I > counduct pairwise analyses or would it be better to pool them together so > that the dispersion model is better? how would the formula be written in > that case? > cheers! if you have several conditions for one factor, we address this in Section G of the vignette on multi-level conditions. You just need to specify which level is the base level. Then in the DE analysis, the other two levels will be compared against this one. We are working to implement the contrasts between all 3. If you have only one replicate per condition, you can treat the samples as replicates in order to calculate dispersion. In the original DESeq paper, they advise, "While one may not want to draw strong conclusions from such an analysis, it may still be useful for exploration and hypothesis generation." This is done automatically for the 2 sample case, but I still need to generalize this code. You can use the code below in the meantime: The recommended pipeline then, for three samples with something like colData(dse)$condition <- factor(c("ctrl","A","B"), levels=c("ctrl","A","B")), would be: design(dse) <- ~ 1 dse <- estimateSizeFactors(dse) dse <- estimateDispersions(dse) design(dse) <- ~ condition dse <- nbinomWaldTest(dse) resultsNames(dse) # prints out the names of the variables in the final model results(dse,"conditionA") # gets the table of logFC, p-values and FDRs for a single variable results(dse,"conditionB") Mike [[alternative HTML version deleted]]

Clustering GenomicRanges DESeq2 Clustering GenomicRanges DESeq2 • 2.9k views

ADD COMMENT • link updated 11.8 years ago by daniel.aguirre ▴ 20 • written 11.8 years ago by Michael Love 43k

0

Entering edit mode

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 4 months ago

EMBL European Molecular Biology Laborat…

Dear Daniel sorry for lecturing, but...: cross-posting the same question on different mailing lists (here: SEQanswers and Bioconductor, and who knows where else) is considered bad form by many. It requires those that might be willing to help do extra work, or alternatively, if they decide not to do that, leaves some of these threads dangling, potentially confusing future users who find such a thread on a web search. Please, please, let's be considerate of other people's time. Best wishes Wolfgang El Apr 2, 2013, a las 5:37 pm, Michael Love <michaelisaiahlove at="" gmail.com=""> escribi?: > hi Daniel, > > On Mon, Apr 1, 2013 at 6:41 PM, daniel.aguirre <daniel.aguirre at="" cbm.uam.es="">wrote: > >> Hi, >> >> I?m a little puzzled about your 'Di erential analysis of count data { the >> DESeq2 package' protocol. >> >> I was trying it with two samples and got the DE results, then I tried the >> suggested transformations: >> >> (being 'des' my previous results, just as it appears in the 'manual') >> >> dseBlind <- dse >> design(dseBlind) <- formula(~ 1) >> dseBlind <- estimateDispersions(dseBlind) >> >> rld <- rlogTransformation(dseBlind) >> vsd <- varianceStabilizingTransformat**ion(dseBlind) >> >> At this point I had assumed that the 'rld' and 'vsd' objects are like >> 'dse' but with the transformations, however whe i try to retrieve the >> results I get: >> >> Prueba.rld.res <- results(rld) >>> >> Error in tail(all.vars(design(object)), 1) : >> error in evaluating the argument 'x' in selecting a method for function >> 'tail': Error in (function (classes, fdef, mtable) : >> unable to find an inherited method for function ?design? for signature >> ?"SummarizedExperiment"? >> >> Am I missing something, should I instead use the 'rld' or 'vsd' objects >> with my DE analysis somehow??? >> >> > Both varianceStabilizingTransformation and rlogTransformation return > SummarizedExperiment objects: see the value section of the man pages for > these functions, and the transformed values are accessed using the assay() > accessor, see the GenomicRanges manual pages on SummarizedExperiment. (you > can do class(dse) or class(rld) to see what kind of object you have) > > Section 7 and 8 in the vignette no longer have to do with DE analysis, > maybe we should make this more clear in the vignette. Here we describe > optional transformations of the data which might be useful for other > applications, such as clustering, which might give nicer results when the > variance is relatively constant across the range of values. For example we > show a hierarchical clustering of the samples by transformed values in > Figure 8 of the vignette. > > > >> many many thanks!! >> >> (also, I assume that the aanlysis takes into account differences in >> library depth and hence normalizes in this regard?) >> >> if I have several conditions (only one sample each though) should I >> counduct pairwise analyses or would it be better to pool them together so >> that the dispersion model is better? how would the formula be written in >> that case? >> cheers! > > > if you have several conditions for one factor, we address this in Section G > of the vignette on multi-level conditions. You just need to specify which > level is the base level. Then in the DE analysis, the other two levels > will be compared against this one. We are working to implement the > contrasts between all 3. > > If you have only one replicate per condition, you can treat the samples as > replicates in order to calculate dispersion. In the original DESeq paper, > they advise, "While one may not want to draw strong conclusions from such > an analysis, it may still be useful for exploration and hypothesis > generation." This is done automatically for the 2 sample case, but I still > need to generalize this code. You can use the code below in the meantime: > > The recommended pipeline then, for three samples with something like > colData(dse)$condition <- factor(c("ctrl","A","B"), > levels=c("ctrl","A","B")), would be: > > design(dse) <- ~ 1 > dse <- estimateSizeFactors(dse) > dse <- estimateDispersions(dse) > design(dse) <- ~ condition > dse <- nbinomWaldTest(dse) > resultsNames(dse) # prints out the names of the variables in the final model > results(dse,"conditionA") # gets the table of logFC, p-values and FDRs for > a single variable > results(dse,"conditionB") > > Mike > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 11.8 years ago Wolfgang Huber ★ 13k

0

Entering edit mode

daniel.aguirre ▴ 20

@danielaguirre-5880

Last seen 10.4 years ago

On Tue, 2 Apr 2013 17:37:11 +0200, Michael Love wrote: > hi Daniel, > > On Mon, Apr 1, 2013 at 6:41 PM, daniel.aguirre wrote: > >> Hi, >> >> I?m a little puzzled about your 'Di erential analysis of count >> data { the >> DESeq2 package' protocol. >> >> I was trying it with two samples and got the DE results, then I >> tried the suggested transformations: >> >> (being 'des' my previous results, just as it appears in the >> 'manual') >> >> dseBlind > > Both varianceStabilizingTransformation and rlogTransformation return > SummarizedExperiment objects: see the value section of the man pages > for these functions, and the transformed values are accessed using > the > assay() accessor, see the GenomicRanges manual pages on > SummarizedExperiment. (you can do class(dse) or class(rld) to see > what > kind of object you have) > > Section 7 and 8 in the vignette no longer have to do with DE > analysis, > maybe we should make this more clear in the vignette. Here we > describe > optional transformations of the data which might be useful for other > applications, such as clustering, which might give nicer results when > the variance is relatively constant across the range of values. For > example we show a hierarchical clustering of the samples by > transformed values in Figure 8 of the vignette. > > ? > >> many many thanks!! >> >> (also, I assume that the aanlysis takes into account differences in >> library depth and hence normalizes in this regard?) >> >> if I have several conditions (only one sample each though) should I >> counduct pairwise analyses or would it be better to pool them >> together so that the dispersion model is better? how would the >> formula be written in that case? >> cheers! > > if you have several conditions for one factor, we address this in > Section G of the vignette on multi-level conditions. You just need to > specify which level is the base level.? Then in the DE analysis, the > other two levels will be compared against this one. We are working to > implement the contrasts between all 3. > > If you have only one replicate per condition, you can treat the > samples as replicates in order to calculate dispersion. In the > original DESeq paper, they advise, "While one may not want to draw > strong conclusions from such an analysis, it may still be useful for > exploration and hypothesis generation." This is done automatically > for > the 2 sample case, but I still need to generalize this code. You can > use the code below in the meantime: > > The recommended pipeline then, for three samples with something like > colData(dse)$condition > > Links: > ------ > [1] mailto:daniel.aguirre at cbm.uam.es Hi again and Many thanks, it appears to have worked fine! I was expecting that (having no replicates, for several conditions of a single factor/variable) using a dispersion model built from all my samples will increase the number of DE genes found with respect to a single pairwise analysis. However I found the contrary; e.g. comparing Condition A vs. B (one replicate each) [using DESeqSummarizedExperimentFromHTSeqCount with A and B only], I find 56 DE genes (pairwise comparison), when I build the dispersion model etc. including other two samples the comparison between A and B yields only 13 [here using DESeqSummarizedExperimentFromHTSeqCount with A,B,C,D following the proposed pipeline and then retrieving results for A vs B]. Would you care to comment on this? Could you give me a hint as to why this happens and I was wrong to assume te contrary) cheers, D

ADD COMMENT • link 11.8 years ago daniel.aguirre ▴ 20

0

Entering edit mode

Hi, On Wed, Apr 10, 2013 at 1:38 AM, daniel.aguirre <daniel.aguirre@cbm.uam.es>wrote: [snip] I was expecting that (having no replicates, for several conditions of a > single factor/variable) using a dispersion model built from all my samples > will increase the number of DE genes found with respect to a single > pairwise analysis. > > However I found the contrary; e.g. comparing Condition A vs. B (one > replicate each) [using DESeqSummarizedExperimentFromH**TSeqCount with A > and B only], I find 56 DE genes (pairwise comparison), when I build the > dispersion model etc. including other two samples the comparison between A > and B yields only 13 [here using DESeqSummarizedExperimentFromH**TSeqCount > with A,B,C,D following the proposed pipeline and then retrieving results > for A vs B]. > > Would you care to comment on this? Could you give me a hint as to why this > happens and I was wrong to assume te contrary) > I haven't been following this thread very closely, but briefly skimming the emails it would seem as if Michael already suggested why this would be the case. In a previous email he wrote: """If you have only one replicate per condition, you can treat the samples as replicates in order to calculate dispersion. """ I guess you are using a "blind" method (not sure if the nomenclature changed in DESeq2) to estimate your dispersions in this scenario. To do so, all of your data are assumed to be replicates of *the same condition* and the dispersion is estimated from them. Can you see why this scenario would result in inflated dispersion estimates "across the board"? If so, then there's your answer as to why you are getting fewer DE genes when you are using no-replicate data across a larger number of conditions to estimate the dispersion -- the dispersion estimates are higher as you add more samples from different conditions but are forced to assume that they are coming from the same condition. HTH, -steve -- Steve Lianoglou Defender of The Thesis | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact [[alternative HTML version deleted]]

ADD REPLY • link 11.8 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

hi, > I was expecting that (having no replicates, for several conditions of a >> single factor/variable) using a dispersion model built from all my samples >> will increase the number of DE genes found with respect to a single >> pairwise analysis. >> >> > > Adding C and D to the analysis can either decrease or increase the dispersion estimates for a given gene, for example if the counts for C and D are close to or far from the counts for A and B. Mike [[alternative HTML version deleted]]

ADD REPLY • link 11.8 years ago Michael Love 43k

0

Entering edit mode

On Wed, 10 Apr 2013 11:26:34 +0200, Michael Love wrote: > hi, > >>> I was expecting that (having no replicates, for several conditions >>> of a single factor/variable) using a dispersion model built from >>> all my samples will increase the number of DE genes found with >>> respect to a single pairwise analysis. >>> ? > > Adding C and D to the analysis can either decrease or increase the > dispersion estimates for a given gene, for example if the counts for > C > and D are close to or far from the counts for A and B. > > Mike thanks guys! D

ADD REPLY • link 11.8 years ago daniel.aguirre ▴ 20

Login before adding your answer.