multi-level, multi-factor results interpretation.
1
0
Entering edit mode
@ingrid-lindquist-5181
Last seen 10.4 years ago
Hi Simon, I am working with a multi-factor experiment that, in a sense, doesn't have biological replicates (replicates were done at different times, and the differences in the chemistry/instrumentation is enough to be a rather large component of variance) and has three levels. By three levels, I mean there are 2 different treatments and 1 control. When I push the workflow through, everything works fine, but I'm having a hard time delineating which level is responsible for a gene being identified as significantly differentially expressed. I understand that (in the following example) ConditionX, ConditionY, AlternativeVariableold are log2 fold changes in relation to "ctrl" (in the case of condition) or "new" (in the case of AlternativeVariable). Can I assume that the largest FC of these three fields within each gene instance is likely the reason that gene had a padj within significance? My plan is to separate out the dataset so that I'm only running 2 levels at once (x vs ctrl; y vs ctrl), but I'm hoping you can shed light on how I can interpret these results I've mentioned here. The output I'm referring to has the following headers (generated by the last line (write.table) in my workflow below): Gene Pval Padj Intercept ConditionX ConditionY AlternativeVariableold Deviance Converged My workflow: counts <- (file, header=TRUE, row.names=1) Design<- data.frame( + row.names = colnames(SampledCounts), + Condition = c( "x", "ctrl", "ctrl", "x", "y", "y"), + AlterativeVariable= c( "old", "new", "old", "new", "new", "old")) library(DESeq) cdsFull <- newCountDataSet (counts, Design) cdsFull <-estimateSizeFactors(cdsFull) cdsFull <-estimateDispersions(cdsFull, method="blind", sharingMode="fit-only") fit1<-fitNbinomGLMs(cdsFull, count ~ condition + AlternativeVariable) fit0<-fitNbinomGLMs(cdsFull, count ~ AlternativeVariable) pvalsGLM <-nbinomGLMTest(fit1, fit0) padjGLM <-p.adjust(pvalsGLM, method = "BH") write.table(data.frame(geneID=row.names(counts(cdsFull)), pval=pvalsGLM, padj=padjGLM, fit1), file= "result.txt") Thanks for your help, Ingrid
• 1.2k views
ADD COMMENT
0
Entering edit mode
@wolfgang-huber-3550
Last seen 4 months ago
EMBL European Molecular Biology Laborat…
Dear Ingrid I wonder whether you really sent the code that you ran, because 'AlterativeVariable' is sometimes spellt with 'n' and sometimes without, and R wouldn't normally tolerate that. The question you are asking is not specific for NGS data or for DESeq, but rather concerns multivariate and multilevel ANOVA in general. I see two main options, as you suggest: 1. test for more specific hypotheses, such as ctrl vs x and ctrl vs y separately. 2. do a scatterplot with your significant hits (ctrl-vs-x on one axis, and ctrl-vs-y on the other; possibly old vs new on the third) to get an overview of the hits and cluster them into subgroups. Best wishes Wolfgang Apr/20/12 11:37 PM, Ingrid Lindquist scripsit:: > Hi Simon, > > I am working with a multi-factor experiment that, in a sense, doesn't > have biological replicates (replicates were done at different times, and > the differences in the chemistry/instrumentation is enough to be a > rather large component of variance) and has three levels. By three > levels, I mean there are 2 different treatments and 1 control. When I > push the workflow through, everything works fine, but I'm having a hard > time delineating which level is responsible for a gene being identified > as significantly differentially expressed. I understand that (in the > following example) ConditionX, ConditionY, AlternativeVariableold are > log2 fold changes in relation to "ctrl" (in the case of condition) or > "new" (in the case of AlternativeVariable). Can I assume that the > largest FC of these three fields within each gene instance is likely the > reason that gene had a padj within significance? My plan is to separate > out the dataset so that I'm only running 2 levels at once (x vs ctrl; y > vs ctrl), but I'm hoping you can shed light on how I can interpret these > results I've mentioned here. > > The output I'm referring to has the following headers (generated by the > last line (write.table) in my workflow below): > > Gene Pval Padj Intercept ConditionX ConditionY AlternativeVariableold > Deviance Converged > > > My workflow: > > counts <- (file, header=TRUE, row.names=1) > > Design<- data.frame( > + row.names = colnames(SampledCounts), > + Condition = c( "x", "ctrl", "ctrl", "x", "y", "y"), > + AlterativeVariable= c( "old", "new", "old", "new", "new", "old")) > > library(DESeq) > > cdsFull <- newCountDataSet (counts, Design) > cdsFull <-estimateSizeFactors(cdsFull) > cdsFull <-estimateDispersions(cdsFull, method="blind", > sharingMode="fit-only") > fit1<-fitNbinomGLMs(cdsFull, count ~ condition + AlternativeVariable) > fit0<-fitNbinomGLMs(cdsFull, count ~ AlternativeVariable) > pvalsGLM <-nbinomGLMTest(fit1, fit0) > padjGLM <-p.adjust(pvalsGLM, method = "BH") > write.table(data.frame(geneID=row.names(counts(cdsFull)), pval=pvalsGLM, > padj=padjGLM, fit1), file= "result.txt") > > Thanks for your help, > Ingrid > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Best wishes Wolfgang Wolfgang Huber EMBL http://www.embl.de/research/units/genome_biology/huber
ADD COMMENT

Login before adding your answer.

Traffic: 641 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6