DESeq vs DESeq2 have different DEGs results

0

Entering edit mode

Catalina Aguilar Hurtado ▴ 90

@catalina-aguilar-hurtado-6554

Last seen 4.1 years ago

United States

Hi I want to compare DESeq vs DESeq2 and I am getting different number of DEGs which I will expect to be normal. But when I compare the 149 genes iD that I get with DESeq with the 869 from DESeq2 there are only ~10 genes that are in common which I don\'92t understand (using FDR <0.05 for both). I want to block the Subject effect for which I am including the reduced formula of ~1. Shouldn\'92t this two methods output similar results? Because at the moment I could interpret my results with different ways.\ \ Thanks for your help,\ \ Catalina\ \ \ This the DESeq script that I am using:\ \ \ DESeq\ \ library(DESeq)\ \ co=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",", row.names=1))\ \ \ Subject=c(1,2,3,4,5,1,2,4,5)\ \ Treatment=c(rep("co",5),rep("c2",4))\ a.con=cbind(Subject,Treatment)\ \ cds=newCountDataSet(co,a.con)\ \ \ cds <- estimateSizeFactors( cds)\ \ cds <- estimateDispersions(cds,method="pooled-CR", modelFormula=count~Subject+Treatment)\ \ \ #filtering\ \ rs = rowSums ( counts ( cds ))\ theta = 0.2\ use = (rs > quantile(rs, probs=theta))\ table(use)\ cdsFilt= cds[ use, ]\ \ \ \ fit0 <- fitNbinomGLMs (cdsFilt, count~1)\ \ fit1 <- fitNbinomGLMs (cdsFilt, count~Treatment)\ \ pvals <- nbinomGLMTest (fit1, fit0)\ \ \ padj <- p.adjust( pvals, method="BH" )\ \ padj <- data.frame(padj)\ \ row.names(padj)=row.names(cdsFilt)\ \ padj_fil <- subset (padj,padj <0.05 )\ \ dim (padj_fil)\ \ [1] 149 1\ \ \ ------------ \ library ("DESeq2")\ \ countdata=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",", row.names=1))\ \ coldata= read.table ("targets.csv", header = T, sep=",",row.names=1)\ \ coldata\ \ >Subject Treatment\ >F1 1 co\ >F2 2 co\ >F3 3 co\ >F4 4 co\ >F5 5 co\ >H1 1 c2\ >H2 2 c2\ >H4 4 c2\ >H5 5 c2\ \ dds <- DESeqDataSetFromMatrix(\ countData = countdata,\ colData = coldata,\ design = ~ Subject + Treatment)\ dds\ \ dds$Treatment <- relevel (dds$Treatment, "co")\ \ \ dds <- estimateSizeFactors( dds)\ \ dds <- estimateDispersions(dds)\ \ \ rs = rowSums ( counts ( dds ))\ theta = 0.2\ use = (rs > quantile(rs, probs=theta))\ table(use)\ ddsFilt= dds[ use, ]\ \ \ dds <- nbinomLRT(ddsFilt, full = design(dds), reduced = ~ 1)\ \ resLRT <- results(dds)\ \ sum( resLRT$padj < 0.05, na.rm=TRUE )\ \ #[1] 869} [[alternative HTML version deleted]]

DESeq DESeq2 DESeq DESeq2 • 4.1k views

ADD COMMENT • link 10.6 years ago Catalina Aguilar Hurtado ▴ 90

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

hi Catalina, If you want to test for the treatment effect and control the subject effect you should use the reduced design of ~ subject. In the DESeq2 vignette we say: "the user provides the full formula (the formula stored in design(dds)), and a reduced formula, e.g. one which does not contain the variable of interest." For your filtering for the old version of DESeq, how did you chose the value of theta=0.2? Note that such filtering in DESeq2 is accomplished automatically by results(), and the function optimizes for the best value of theta (most adjusted p-values less than a threshold). So you do not need to pre-filter. Mike On Fri, May 16, 2014 at 1:42 AM, Catalina Aguilar Hurtado <catagui at="" gmail.com=""> wrote: > Hi I want to compare DESeq vs DESeq2 and I am getting different number of > DEGs which I will expect to be normal. But when I compare the 149 genes iD > that I get with DESeq with the 869 from DESeq2 there are only ~10 genes > that are in common which I don\'92t understand (using FDR <0.05 for both). > I want to block the Subject effect for which I am including the reduced > formula of ~1. > > Shouldn\'92t this two methods output similar results? Because at the > moment I could interpret my results with different ways.\ > \ > Thanks for your help,\ > \ > Catalina\ > \ > \ > This the DESeq script that I am using:\ > \ > \ > DESeq\ > \ > library(DESeq)\ > \ > co=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",", > row.names=1))\ > \ > \ > Subject=c(1,2,3,4,5,1,2,4,5)\ > \ > Treatment=c(rep("co",5),rep("c2",4))\ > a.con=cbind(Subject,Treatment)\ > \ > cds=newCountDataSet(co,a.con)\ > \ > \ > cds <- estimateSizeFactors( cds)\ > \ > cds <- estimateDispersions(cds,method="pooled-CR", > modelFormula=count~Subject+Treatment)\ > \ > \ > #filtering\ > \ > rs = rowSums ( counts ( cds ))\ > theta = 0.2\ > use = (rs > quantile(rs, probs=theta))\ > table(use)\ > cdsFilt= cds[ use, ]\ > \ > \ > \ > fit0 <- fitNbinomGLMs (cdsFilt, count~1)\ > \ > fit1 <- fitNbinomGLMs (cdsFilt, count~Treatment)\ > \ > pvals <- nbinomGLMTest (fit1, fit0)\ > \ > \ > padj <- p.adjust( pvals, method="BH" )\ > \ > padj <- data.frame(padj)\ > \ > row.names(padj)=row.names(cdsFilt)\ > \ > padj_fil <- subset (padj,padj <0.05 )\ > \ > dim (padj_fil)\ > \ > [1] 149 1\ > \ > \ > ------------ > \ > library ("DESeq2")\ > \ > countdata=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",", > row.names=1))\ > \ > coldata= read.table ("targets.csv", header = T, sep=",",row.names=1)\ > \ > coldata\ > \ >>Subject Treatment\ >>F1 1 co\ >>F2 2 co\ >>F3 3 co\ >>F4 4 co\ >>F5 5 co\ >>H1 1 c2\ >>H2 2 c2\ >>H4 4 c2\ >>H5 5 c2\ > \ > dds <- DESeqDataSetFromMatrix(\ > countData = countdata,\ > colData = coldata,\ > design = ~ Subject + Treatment)\ > dds\ > \ > dds$Treatment <- relevel (dds$Treatment, "co")\ > \ > \ > dds <- estimateSizeFactors( dds)\ > \ > dds <- estimateDispersions(dds)\ > \ > \ > rs = rowSums ( counts ( dds ))\ > theta = 0.2\ > use = (rs > quantile(rs, probs=theta))\ > table(use)\ > ddsFilt= dds[ use, ]\ > \ > \ > dds <- nbinomLRT(ddsFilt, full = design(dds), reduced = ~ 1)\ > \ > resLRT <- results(dds)\ > \ > sum( resLRT$padj < 0.05, na.rm=TRUE )\ > \ > #[1] 869} > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 10.6 years ago Michael Love 43k

0

Entering edit mode

Hi Mike, Thanks for your reply. I had read the manual and did try the reduced design ~Subject and got 5 DEGs. I tried with both DESeq and DESeq2. My data has a stronger effect of the subject compare to the treatment, if I do a heatmap the samples group by subject rather than by treatment. The filtering theta value was choose based on a plot of "Number of DEGs vs theta" when using filtered_R in genefilter. Thanks for pointing out that results() was doing automatically. I have just run it again without pre-filter and got 812 DEGs. Still my concern is why the DEGs that I get in DESeq disappear in the DESeq2 results? Thanks again, Catalina On Fri, May 16, 2014 at 10:35 PM, Michael Love <michaelisaiahlove@gmail.com>wrote: > hi Catalina, > > If you want to test for the treatment effect and control the subject > effect you should use the reduced design of ~ subject. > > In the DESeq2 vignette we say: "the user provides the full formula > (the formula stored in design(dds)), and a reduced formula, e.g. one > which does not contain the variable of interest." > > For your filtering for the old version of DESeq, how did you chose the > value of theta=0.2? Note that such filtering in DESeq2 is accomplished > automatically by results(), and the function optimizes for the best > value of theta (most adjusted p-values less than a threshold). So you > do not need to pre-filter. > > Mike > > > On Fri, May 16, 2014 at 1:42 AM, Catalina Aguilar Hurtado > <catagui@gmail.com> wrote: > > Hi I want to compare DESeq vs DESeq2 and I am getting different number of > > DEGs which I will expect to be normal. But when I compare the 149 genes > iD > > that I get with DESeq with the 869 from DESeq2 there are only ~10 genes > > that are in common which I don\'92t understand (using FDR <0.05 for > both). > > I want to block the Subject effect for which I am including the reduced > > formula of ~1. > > > > Shouldn\'92t this two methods output similar results? Because at the > > moment I could interpret my results with different ways.\ > > \ > > Thanks for your help,\ > > \ > > Catalina\ > > \ > > \ > > This the DESeq script that I am using:\ > > \ > > \ > > DESeq\ > > \ > > library(DESeq)\ > > \ > > co=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",", > > row.names=1))\ > > \ > > \ > > Subject=c(1,2,3,4,5,1,2,4,5)\ > > \ > > Treatment=c(rep("co",5),rep("c2",4))\ > > a.con=cbind(Subject,Treatment)\ > > \ > > cds=newCountDataSet(co,a.con)\ > > \ > > \ > > cds <- estimateSizeFactors( cds)\ > > \ > > cds <- estimateDispersions(cds,method="pooled-CR", > > modelFormula=count~Subject+Treatment)\ > > \ > > \ > > #filtering\ > > \ > > rs = rowSums ( counts ( cds ))\ > > theta = 0.2\ > > use = (rs > quantile(rs, probs=theta))\ > > table(use)\ > > cdsFilt= cds[ use, ]\ > > \ > > \ > > \ > > fit0 <- fitNbinomGLMs (cdsFilt, count~1)\ > > \ > > fit1 <- fitNbinomGLMs (cdsFilt, count~Treatment)\ > > \ > > pvals <- nbinomGLMTest (fit1, fit0)\ > > \ > > \ > > padj <- p.adjust( pvals, method="BH" )\ > > \ > > padj <- data.frame(padj)\ > > \ > > row.names(padj)=row.names(cdsFilt)\ > > \ > > padj_fil <- subset (padj,padj <0.05 )\ > > \ > > dim (padj_fil)\ > > \ > > [1] 149 1\ > > \ > > \ > > ------------ > > \ > > library ("DESeq2")\ > > \ > > countdata=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",", > > row.names=1))\ > > \ > > coldata= read.table ("targets.csv", header = T, sep=",",row.names=1)\ > > \ > > coldata\ > > \ > >>Subject Treatment\ > >>F1 1 co\ > >>F2 2 co\ > >>F3 3 co\ > >>F4 4 co\ > >>F5 5 co\ > >>H1 1 c2\ > >>H2 2 c2\ > >>H4 4 c2\ > >>H5 5 c2\ > > \ > > dds <- DESeqDataSetFromMatrix(\ > > countData = countdata,\ > > colData = coldata,\ > > design = ~ Subject + Treatment)\ > > dds\ > > \ > > dds$Treatment <- relevel (dds$Treatment, "co")\ > > \ > > \ > > dds <- estimateSizeFactors( dds)\ > > \ > > dds <- estimateDispersions(dds)\ > > \ > > \ > > rs = rowSums ( counts ( dds ))\ > > theta = 0.2\ > > use = (rs > quantile(rs, probs=theta))\ > > table(use)\ > > ddsFilt= dds[ use, ]\ > > \ > > \ > > dds <- nbinomLRT(ddsFilt, full = design(dds), reduced = ~ 1)\ > > \ > > resLRT <- results(dds)\ > > \ > > sum( resLRT$padj < 0.05, na.rm=TRUE )\ > > \ > > #[1] 869} > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- *Catalina Aguilar Hurtado* PhD Candidate ARC Centre of Excellence for Coral Reef Studies Department of Biochemistry and Molecular Biology AIMS@JCU James Cook University Townsville, QLD 4811, Australia ph: +61 7 4781 6009 [[alternative HTML version deleted]]

ADD REPLY • link 10.6 years ago Catalina Aguilar Hurtado ▴ 90

0

Entering edit mode

Hi Catalina, What are the numbers exactly and the intersection for using a reduced design of ~ subject for both, and not filtering in either case (independentFiltering = FALSE)? Mike On May 18, 2014 10:34 PM, "Catalina Aguilar Hurtado" <catagui@gmail.com> wrote: > Hi Mike, > > Thanks for your reply. > > I had read the manual and did try the reduced design ~Subject and got 5 > DEGs. I tried with both DESeq and DESeq2. > > My data has a stronger effect of the subject compare to the treatment, if I > do a heatmap the samples group by subject rather than by treatment. > > The filtering theta value was choose based on a plot of "Number of DEGs vs > theta" when using filtered_R in genefilter. Thanks for pointing out that > results() was doing automatically. I have just run it again without > pre-filter and got 812 DEGs. > > Still my concern is why the DEGs that I get in DESeq disappear in the > DESeq2 results? > > Thanks again, > > Catalina > > > On Fri, May 16, 2014 at 10:35 PM, Michael Love > <michaelisaiahlove@gmail.com>wrote: > > > hi Catalina, > > > > If you want to test for the treatment effect and control the subject > > effect you should use the reduced design of ~ subject. > > > > In the DESeq2 vignette we say: "the user provides the full formula > > (the formula stored in design(dds)), and a reduced formula, e.g. one > > which does not contain the variable of interest." > > > > For your filtering for the old version of DESeq, how did you chose the > > value of theta=0.2? Note that such filtering in DESeq2 is accomplished > > automatically by results(), and the function optimizes for the best > > value of theta (most adjusted p-values less than a threshold). So you > > do not need to pre-filter. > > > > Mike > > > > > > On Fri, May 16, 2014 at 1:42 AM, Catalina Aguilar Hurtado > > <catagui@gmail.com> wrote: > > > Hi I want to compare DESeq vs DESeq2 and I am getting different number > of > > > DEGs which I will expect to be normal. But when I compare the 149 genes > > iD > > > that I get with DESeq with the 869 from DESeq2 there are only ~10 genes > > > that are in common which I don\'92t understand (using FDR <0.05 for > > both). > > > I want to block the Subject effect for which I am including the reduced > > > formula of ~1. > > > > > > Shouldn\'92t this two methods output similar results? Because at the > > > moment I could interpret my results with different ways.\ > > > \ > > > Thanks for your help,\ > > > \ > > > Catalina\ > > > \ > > > \ > > > This the DESeq script that I am using:\ > > > \ > > > \ > > > DESeq\ > > > \ > > > library(DESeq)\ > > > \ > > > co=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",", > > > row.names=1))\ > > > \ > > > \ > > > Subject=c(1,2,3,4,5,1,2,4,5)\ > > > \ > > > Treatment=c(rep("co",5),rep("c2",4))\ > > > a.con=cbind(Subject,Treatment)\ > > > \ > > > cds=newCountDataSet(co,a.con)\ > > > \ > > > \ > > > cds <- estimateSizeFactors( cds)\ > > > \ > > > cds <- estimateDispersions(cds,method="pooled-CR", > > > modelFormula=count~Subject+Treatment)\ > > > \ > > > \ > > > #filtering\ > > > \ > > > rs = rowSums ( counts ( cds ))\ > > > theta = 0.2\ > > > use = (rs > quantile(rs, probs=theta))\ > > > table(use)\ > > > cdsFilt= cds[ use, ]\ > > > \ > > > \ > > > \ > > > fit0 <- fitNbinomGLMs (cdsFilt, count~1)\ > > > \ > > > fit1 <- fitNbinomGLMs (cdsFilt, count~Treatment)\ > > > \ > > > pvals <- nbinomGLMTest (fit1, fit0)\ > > > \ > > > \ > > > padj <- p.adjust( pvals, method="BH" )\ > > > \ > > > padj <- data.frame(padj)\ > > > \ > > > row.names(padj)=row.names(cdsFilt)\ > > > \ > > > padj_fil <- subset (padj,padj <0.05 )\ > > > \ > > > dim (padj_fil)\ > > > \ > > > [1] 149 1\ > > > \ > > > \ > > > ------------ > > > \ > > > library ("DESeq2")\ > > > \ > > > countdata=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, > sep=",", > > > row.names=1))\ > > > \ > > > coldata= read.table ("targets.csv", header = T, sep=",",row.names=1)\ > > > \ > > > coldata\ > > > \ > > >>Subject Treatment\ > > >>F1 1 co\ > > >>F2 2 co\ > > >>F3 3 co\ > > >>F4 4 co\ > > >>F5 5 co\ > > >>H1 1 c2\ > > >>H2 2 c2\ > > >>H4 4 c2\ > > >>H5 5 c2\ > > > \ > > > dds <- DESeqDataSetFromMatrix(\ > > > countData = countdata,\ > > > colData = coldata,\ > > > design = ~ Subject + Treatment)\ > > > dds\ > > > \ > > > dds$Treatment <- relevel (dds$Treatment, "co")\ > > > \ > > > \ > > > dds <- estimateSizeFactors( dds)\ > > > \ > > > dds <- estimateDispersions(dds)\ > > > \ > > > \ > > > rs = rowSums ( counts ( dds ))\ > > > theta = 0.2\ > > > use = (rs > quantile(rs, probs=theta))\ > > > table(use)\ > > > ddsFilt= dds[ use, ]\ > > > \ > > > \ > > > dds <- nbinomLRT(ddsFilt, full = design(dds), reduced = ~ 1)\ > > > \ > > > resLRT <- results(dds)\ > > > \ > > > sum( resLRT$padj < 0.05, na.rm=TRUE )\ > > > \ > > > #[1] 869} > > > > > > [[alternative HTML version deleted]] > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor@r-project.org > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > -- > *Catalina Aguilar Hurtado* > PhD Candidate > ARC Centre of Excellence for Coral Reef Studies > Department of Biochemistry and Molecular Biology > AIMS@JCU > James Cook University > Townsville, QLD 4811, Australia > ph: +61 7 4781 6009 > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD REPLY • link 10.6 years ago Michael Love 43k

0

Entering edit mode

Catalina Aguilar Hurtado ▴ 90

@catalina-aguilar-hurtado-6554

Last seen 4.1 years ago

United States

Hi Mike, Not sure what you mean by the intersection. I have run both without any filter and reduced formula of ~Subject. As you can see below, there is almost no DEGs and the only way I could get was when ~1. DESeq2: > dds <- nbinomLRT(dds, full = design(dds), reduced = ~ Subject) > resLRT <- results(dds, independentFiltering=FALSE) > sum( resLRT$padj < 0.5, na.rm=TRUE ) [1] 6 DESeq: >fit0 <- fitNbinomGLMs (cds, count~Subject) >fit1 <- fitNbinomGLMs (cds, count~Treatment) >pvals <- nbinomGLMTest (fit1, fit0) > padj <- p.adjust( pvals, method="BH" ) > padj <- data.frame(padj) > row.names(padj)=row.names(cds) > padj_fil <- subset (padj,padj <0.05 ) > dim (padj_fil) [1] 0 1 Thanks, Cata On Mon, May 19, 2014 at 10:31 PM, Michael Love <michaelisaiahlove@gmail.com>wrote: > Hi Catalina, > > What are the numbers exactly and the intersection for using a reduced > design of ~ subject for both, and not filtering in either case > (independentFiltering = FALSE)? > > Mike > On May 18, 2014 10:34 PM, "Catalina Aguilar Hurtado" <catagui@gmail.com> > wrote: > >> Hi Mike, >> >> Thanks for your reply. >> >> I had read the manual and did try the reduced design ~Subject and got 5 >> DEGs. I tried with both DESeq and DESeq2. >> >> My data has a stronger effect of the subject compare to the treatment, if >> I >> do a heatmap the samples group by subject rather than by treatment. >> >> The filtering theta value was choose based on a plot of "Number of DEGs vs >> theta" when using filtered_R in genefilter. Thanks for pointing out that >> results() was doing automatically. I have just run it again without >> pre-filter and got 812 DEGs. >> >> Still my concern is why the DEGs that I get in DESeq disappear in the >> DESeq2 results? >> >> Thanks again, >> >> Catalina >> >> >> On Fri, May 16, 2014 at 10:35 PM, Michael Love >> <michaelisaiahlove@gmail.com>wrote: >> >> > hi Catalina, >> > >> > If you want to test for the treatment effect and control the subject >> > effect you should use the reduced design of ~ subject. >> > >> > In the DESeq2 vignette we say: "the user provides the full formula >> > (the formula stored in design(dds)), and a reduced formula, e.g. one >> > which does not contain the variable of interest." >> > >> > For your filtering for the old version of DESeq, how did you chose the >> > value of theta=0.2? Note that such filtering in DESeq2 is accomplished >> > automatically by results(), and the function optimizes for the best >> > value of theta (most adjusted p-values less than a threshold). So you >> > do not need to pre-filter. >> > >> > Mike >> > >> > >> > On Fri, May 16, 2014 at 1:42 AM, Catalina Aguilar Hurtado >> > <catagui@gmail.com> wrote: >> > > Hi I want to compare DESeq vs DESeq2 and I am getting different >> number of >> > > DEGs which I will expect to be normal. But when I compare the 149 >> genes >> > iD >> > > that I get with DESeq with the 869 from DESeq2 there are only ~10 >> genes >> > > that are in common which I don\'92t understand (using FDR <0.05 for >> > both). >> > > I want to block the Subject effect for which I am including the >> reduced >> > > formula of ~1. >> > > >> > > Shouldn\'92t this two methods output similar results? Because at the >> > > moment I could interpret my results with different ways.\ >> > > \ >> > > Thanks for your help,\ >> > > \ >> > > Catalina\ >> > > \ >> > > \ >> > > This the DESeq script that I am using:\ >> > > \ >> > > \ >> > > DESeq\ >> > > \ >> > > library(DESeq)\ >> > > \ >> > > co=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",", >> > > row.names=1))\ >> > > \ >> > > \ >> > > Subject=c(1,2,3,4,5,1,2,4,5)\ >> > > \ >> > > Treatment=c(rep("co",5),rep("c2",4))\ >> > > a.con=cbind(Subject,Treatment)\ >> > > \ >> > > cds=newCountDataSet(co,a.con)\ >> > > \ >> > > \ >> > > cds <- estimateSizeFactors( cds)\ >> > > \ >> > > cds <- estimateDispersions(cds,method="pooled-CR", >> > > modelFormula=count~Subject+Treatment)\ >> > > \ >> > > \ >> > > #filtering\ >> > > \ >> > > rs = rowSums ( counts ( cds ))\ >> > > theta = 0.2\ >> > > use = (rs > quantile(rs, probs=theta))\ >> > > table(use)\ >> > > cdsFilt= cds[ use, ]\ >> > > \ >> > > \ >> > > \ >> > > fit0 <- fitNbinomGLMs (cdsFilt, count~1)\ >> > > \ >> > > fit1 <- fitNbinomGLMs (cdsFilt, count~Treatment)\ >> > > \ >> > > pvals <- nbinomGLMTest (fit1, fit0)\ >> > > \ >> > > \ >> > > padj <- p.adjust( pvals, method="BH" )\ >> > > \ >> > > padj <- data.frame(padj)\ >> > > \ >> > > row.names(padj)=row.names(cdsFilt)\ >> > > \ >> > > padj_fil <- subset (padj,padj <0.05 )\ >> > > \ >> > > dim (padj_fil)\ >> > > \ >> > > [1] 149 1\ >> > > \ >> > > \ >> > > ------------ >> > > \ >> > > library ("DESeq2")\ >> > > \ >> > > countdata=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, >> sep=",", >> > > row.names=1))\ >> > > \ >> > > coldata= read.table ("targets.csv", header = T, sep=",",row.names=1)\ >> > > \ >> > > coldata\ >> > > \ >> > >>Subject Treatment\ >> > >>F1 1 co\ >> > >>F2 2 co\ >> > >>F3 3 co\ >> > >>F4 4 co\ >> > >>F5 5 co\ >> > >>H1 1 c2\ >> > >>H2 2 c2\ >> > >>H4 4 c2\ >> > >>H5 5 c2\ >> > > \ >> > > dds <- DESeqDataSetFromMatrix(\ >> > > countData = countdata,\ >> > > colData = coldata,\ >> > > design = ~ Subject + Treatment)\ >> > > dds\ >> > > \ >> > > dds$Treatment <- relevel (dds$Treatment, "co")\ >> > > \ >> > > \ >> > > dds <- estimateSizeFactors( dds)\ >> > > \ >> > > dds <- estimateDispersions(dds)\ >> > > \ >> > > \ >> > > rs = rowSums ( counts ( dds ))\ >> > > theta = 0.2\ >> > > use = (rs > quantile(rs, probs=theta))\ >> > > table(use)\ >> > > ddsFilt= dds[ use, ]\ >> > > \ >> > > \ >> > > dds <- nbinomLRT(ddsFilt, full = design(dds), reduced = ~ 1)\ >> > > \ >> > > resLRT <- results(dds)\ >> > > \ >> > > sum( resLRT$padj < 0.05, na.rm=TRUE )\ >> > > \ >> > > #[1] 869} >> > > >> > > [[alternative HTML version deleted]] >> > > >> > > _______________________________________________ >> > > Bioconductor mailing list >> > > Bioconductor@r-project.org >> > > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > > Search the archives: >> > http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >> >> >> >> -- >> *Catalina Aguilar Hurtado* >> >> PhD Candidate >> ARC Centre of Excellence for Coral Reef Studies >> Department of Biochemistry and Molecular Biology >> AIMS@JCU >> James Cook University >> Townsville, QLD 4811, Australia >> ph: +61 7 4781 6009 >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > [[alternative HTML version deleted]]

ADD COMMENT • link 10.6 years ago Catalina Aguilar Hurtado ▴ 90

0

Entering edit mode

hi Catalina, On Mon, May 19, 2014 at 8:38 PM, Catalina Aguilar Hurtado <catagui at="" gmail.com=""> wrote: > Hi Mike, > > Not sure what you mean by the intersection. I meant those genes which had padj < cutoff for both DESeq and DESeq2. You can calculate this with length(intersect(padj1rownames, padj2rownames)) It's safest to work with the rownames, to make sure you are referring to the same genes. You can get this vector, e.g. in DESeq2 with: padj2rownames <- rownames(results[which(results$padj < cutoff),]) > I have run both without any > filter and reduced formula of ~Subject. As you can see below, there is > almost no DEGs and the only way I could get was when ~1. > This means that, without independent filtering, for most genes, the treatment effect was too small to observe a significant difference for the experiment's sample size and/or sequencing depth. However, you should use independent filtering to report as many genes as possible, and I'd recommend using DESeq2 which will do this automatically within the results() function. The reason I asked for you to not use independent filtering in both is to see if in fact there were many genes called by either version of DESeq but not by both, when the settings were made comparable. Note that the ~ 1 reduced design is not what you want: OK: "we tested for the treatment effect, controlling for the subject effect" full: ~ subject + treament reduced: ~ subject not ok: "we tested for the treatment and subject effect" full: ~ subject + treament reduced: ~ 1 Mike > DESeq2: > >> dds <- nbinomLRT(dds, full = design(dds), reduced = ~ Subject) >> resLRT <- results(dds, independentFiltering=FALSE) > >> sum( resLRT$padj < 0.5, na.rm=TRUE ) > [1] 6 > > > DESeq: > > >>fit0 <- fitNbinomGLMs (cds, count~Subject) > >>fit1 <- fitNbinomGLMs (cds, count~Treatment) > >>pvals <- nbinomGLMTest (fit1, fit0) > >> padj <- p.adjust( pvals, method="BH" ) >> padj <- data.frame(padj) > >> row.names(padj)=row.names(cds) >> padj_fil <- subset (padj,padj <0.05 ) >> dim (padj_fil) > [1] 0 1 > > Thanks, > > Cata > > On Mon, May 19, 2014 at 10:31 PM, Michael Love > <michaelisaiahlove at="" gmail.com="">wrote: > >> Hi Catalina, >> >> What are the numbers exactly and the intersection for using a reduced >> design of ~ subject for both, and not filtering in either case >> (independentFiltering = FALSE)? >> >> Mike >> On May 18, 2014 10:34 PM, "Catalina Aguilar Hurtado" <catagui at="" gmail.com=""> >> wrote: >> >>> Hi Mike, >>> >>> Thanks for your reply. >>> >>> I had read the manual and did try the reduced design ~Subject and got 5 >>> DEGs. I tried with both DESeq and DESeq2. >>> >>> My data has a stronger effect of the subject compare to the treatment, if >>> I >>> do a heatmap the samples group by subject rather than by treatment. >>> >>> The filtering theta value was choose based on a plot of "Number of DEGs vs >>> theta" when using filtered_R in genefilter. Thanks for pointing out that >>> results() was doing automatically. I have just run it again without >>> pre-filter and got 812 DEGs. >>> >>> Still my concern is why the DEGs that I get in DESeq disappear in the >>> DESeq2 results? >>> >>> Thanks again, >>> >>> Catalina >>> >>> >>> On Fri, May 16, 2014 at 10:35 PM, Michael Love >>> <michaelisaiahlove at="" gmail.com="">wrote: >>> >>> > hi Catalina, >>> > >>> > If you want to test for the treatment effect and control the subject >>> > effect you should use the reduced design of ~ subject. >>> > >>> > In the DESeq2 vignette we say: "the user provides the full formula >>> > (the formula stored in design(dds)), and a reduced formula, e.g. one >>> > which does not contain the variable of interest." >>> > >>> > For your filtering for the old version of DESeq, how did you chose the >>> > value of theta=0.2? Note that such filtering in DESeq2 is accomplished >>> > automatically by results(), and the function optimizes for the best >>> > value of theta (most adjusted p-values less than a threshold). So you >>> > do not need to pre-filter. >>> > >>> > Mike >>> > >>> > >>> > On Fri, May 16, 2014 at 1:42 AM, Catalina Aguilar Hurtado >>> > <catagui at="" gmail.com=""> wrote: >>> > > Hi I want to compare DESeq vs DESeq2 and I am getting different >>> number of >>> > > DEGs which I will expect to be normal. But when I compare the 149 >>> genes >>> > iD >>> > > that I get with DESeq with the 869 from DESeq2 there are only ~10 >>> genes >>> > > that are in common which I don\'92t understand (using FDR <0.05 for >>> > both). >>> > > I want to block the Subject effect for which I am including the >>> reduced >>> > > formula of ~1. >>> > > >>> > > Shouldn\'92t this two methods output similar results? Because at the >>> > > moment I could interpret my results with different ways.\ >>> > > \ >>> > > Thanks for your help,\ >>> > > \ >>> > > Catalina\ >>> > > \ >>> > > \ >>> > > This the DESeq script that I am using:\ >>> > > \ >>> > > \ >>> > > DESeq\ >>> > > \ >>> > > library(DESeq)\ >>> > > \ >>> > > co=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",", >>> > > row.names=1))\ >>> > > \ >>> > > \ >>> > > Subject=c(1,2,3,4,5,1,2,4,5)\ >>> > > \ >>> > > Treatment=c(rep("co",5),rep("c2",4))\ >>> > > a.con=cbind(Subject,Treatment)\ >>> > > \ >>> > > cds=newCountDataSet(co,a.con)\ >>> > > \ >>> > > \ >>> > > cds <- estimateSizeFactors( cds)\ >>> > > \ >>> > > cds <- estimateDispersions(cds,method="pooled-CR", >>> > > modelFormula=count~Subject+Treatment)\ >>> > > \ >>> > > \ >>> > > #filtering\ >>> > > \ >>> > > rs = rowSums ( counts ( cds ))\ >>> > > theta = 0.2\ >>> > > use = (rs > quantile(rs, probs=theta))\ >>> > > table(use)\ >>> > > cdsFilt= cds[ use, ]\ >>> > > \ >>> > > \ >>> > > \ >>> > > fit0 <- fitNbinomGLMs (cdsFilt, count~1)\ >>> > > \ >>> > > fit1 <- fitNbinomGLMs (cdsFilt, count~Treatment)\ >>> > > \ >>> > > pvals <- nbinomGLMTest (fit1, fit0)\ >>> > > \ >>> > > \ >>> > > padj <- p.adjust( pvals, method="BH" )\ >>> > > \ >>> > > padj <- data.frame(padj)\ >>> > > \ >>> > > row.names(padj)=row.names(cdsFilt)\ >>> > > \ >>> > > padj_fil <- subset (padj,padj <0.05 )\ >>> > > \ >>> > > dim (padj_fil)\ >>> > > \ >>> > > [1] 149 1\ >>> > > \ >>> > > \ >>> > > ------------ >>> > > \ >>> > > library ("DESeq2")\ >>> > > \ >>> > > countdata=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, >>> sep=",", >>> > > row.names=1))\ >>> > > \ >>> > > coldata= read.table ("targets.csv", header = T, sep=",",row.names=1)\ >>> > > \ >>> > > coldata\ >>> > > \ >>> > >>Subject Treatment\ >>> > >>F1 1 co\ >>> > >>F2 2 co\ >>> > >>F3 3 co\ >>> > >>F4 4 co\ >>> > >>F5 5 co\ >>> > >>H1 1 c2\ >>> > >>H2 2 c2\ >>> > >>H4 4 c2\ >>> > >>H5 5 c2\ >>> > > \ >>> > > dds <- DESeqDataSetFromMatrix(\ >>> > > countData = countdata,\ >>> > > colData = coldata,\ >>> > > design = ~ Subject + Treatment)\ >>> > > dds\ >>> > > \ >>> > > dds$Treatment <- relevel (dds$Treatment, "co")\ >>> > > \ >>> > > \ >>> > > dds <- estimateSizeFactors( dds)\ >>> > > \ >>> > > dds <- estimateDispersions(dds)\ >>> > > \ >>> > > \ >>> > > rs = rowSums ( counts ( dds ))\ >>> > > theta = 0.2\ >>> > > use = (rs > quantile(rs, probs=theta))\ >>> > > table(use)\ >>> > > ddsFilt= dds[ use, ]\ >>> > > \ >>> > > \ >>> > > dds <- nbinomLRT(ddsFilt, full = design(dds), reduced = ~ 1)\ >>> > > \ >>> > > resLRT <- results(dds)\ >>> > > \ >>> > > sum( resLRT$padj < 0.05, na.rm=TRUE )\ >>> > > \ >>> > > #[1] 869} >>> > > >>> > > [[alternative HTML version deleted]] >>> > > >>> > > _______________________________________________ >>> > > Bioconductor mailing list >>> > > Bioconductor at r-project.org >>> > > https://stat.ethz.ch/mailman/listinfo/bioconductor >>> > > Search the archives: >>> > http://news.gmane.org/gmane.science.biology.informatics.conductor >>> > >>> >>> >>> >>> -- >>> *Catalina Aguilar Hurtado* >>> >>> PhD Candidate >>> ARC Centre of Excellence for Coral Reef Studies >>> Department of Biochemistry and Molecular Biology >>> AIMS at JCU >>> James Cook University >>> Townsville, QLD 4811, Australia >>> ph: +61 7 4781 6009 >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 10.6 years ago Michael Love 43k

0

Entering edit mode

Hi Mike, We have this problem with our organisms of study (coral) were there is a higher variability due to individual than due to the treatment. If I get almost (~8) no DEGs by the reduced: ~subject and I get results (879) using reduced:~1 what is wrong with it? I donât understand what is wrong if I want to compare my design with the null model of no variability. Otherwise it would mean that my whole experiment didnât work and can not get any data out of it. Thanks again, Catalina On Tue, May 20, 2014 at 10:12 PM, Michael Love <michaelisaiahlove@gmail.com>wrote: > hi Catalina, > > > On Mon, May 19, 2014 at 8:38 PM, Catalina Aguilar Hurtado > <catagui@gmail.com> wrote: > > Hi Mike, > > > > Not sure what you mean by the intersection. > > I meant those genes which had padj < cutoff for both DESeq and DESeq2. > > You can calculate this with > > length(intersect(padj1rownames, padj2rownames)) > > It's safest to work with the rownames, to make sure you are referring > to the same genes. > > You can get this vector, e.g. in DESeq2 with: > > padj2rownames <- rownames(results[which(results$padj < cutoff),]) > > > I have run both without any > > filter and reduced formula of ~Subject. As you can see below, there is > > almost no DEGs and the only way I could get was when ~1. > > > > This means that, without independent filtering, for most genes, the > treatment effect was too small to observe a significant difference for > the experiment's sample size and/or sequencing depth. However, you > should use independent filtering to report as many genes as possible, > and I'd recommend using DESeq2 which will do this automatically within > the results() function. > > The reason I asked for you to not use independent filtering in both is > to see if in fact there were many genes called by either version of > DESeq but not by both, when the settings were made comparable. > > Note that the ~ 1 reduced design is not what you want: > > OK: "we tested for the treatment effect, controlling for the subject > effect" > full: ~ subject + treament > reduced: ~ subject > > > not ok: "we tested for the treatment and subject effect" > full: ~ subject + treament > reduced: ~ 1 > > > Mike > > > > DESeq2: > > > >> dds <- nbinomLRT(dds, full = design(dds), reduced = ~ Subject) > >> resLRT <- results(dds, independentFiltering=FALSE) > > > >> sum( resLRT$padj < 0.5, na.rm=TRUE ) > > [1] 6 > > > > > > DESeq: > > > > > >>fit0 <- fitNbinomGLMs (cds, count~Subject) > > > >>fit1 <- fitNbinomGLMs (cds, count~Treatment) > > > >>pvals <- nbinomGLMTest (fit1, fit0) > > > >> padj <- p.adjust( pvals, method="BH" ) > >> padj <- data.frame(padj) > > > >> row.names(padj)=row.names(cds) > >> padj_fil <- subset (padj,padj <0.05 ) > >> dim (padj_fil) > > [1] 0 1 > > > > Thanks, > > > > Cata > > > > On Mon, May 19, 2014 at 10:31 PM, Michael Love > > <michaelisaiahlove@gmail.com>wrote: > > > >> Hi Catalina, > >> > >> What are the numbers exactly and the intersection for using a reduced > >> design of ~ subject for both, and not filtering in either case > >> (independentFiltering = FALSE)? > >> > >> Mike > >> On May 18, 2014 10:34 PM, "Catalina Aguilar Hurtado" <catagui@gmail.com> > > >> wrote: > >> > >>> Hi Mike, > >>> > >>> Thanks for your reply. > >>> > >>> I had read the manual and did try the reduced design ~Subject and got 5 > >>> DEGs. I tried with both DESeq and DESeq2. > >>> > >>> My data has a stronger effect of the subject compare to the treatment, > if > >>> I > >>> do a heatmap the samples group by subject rather than by treatment. > >>> > >>> The filtering theta value was choose based on a plot of "Number of > DEGs vs > >>> theta" when using filtered_R in genefilter. Thanks for pointing out > that > >>> results() was doing automatically. I have just run it again without > >>> pre-filter and got 812 DEGs. > >>> > >>> Still my concern is why the DEGs that I get in DESeq disappear in the > >>> DESeq2 results? > >>> > >>> Thanks again, > >>> > >>> Catalina > >>> > >>> > >>> On Fri, May 16, 2014 at 10:35 PM, Michael Love > >>> <michaelisaiahlove@gmail.com>wrote: > >>> > >>> > hi Catalina, > >>> > > >>> > If you want to test for the treatment effect and control the subject > >>> > effect you should use the reduced design of ~ subject. > >>> > > >>> > In the DESeq2 vignette we say: "the user provides the full formula > >>> > (the formula stored in design(dds)), and a reduced formula, e.g. one > >>> > which does not contain the variable of interest." > >>> > > >>> > For your filtering for the old version of DESeq, how did you chose > the > >>> > value of theta=0.2? Note that such filtering in DESeq2 is > accomplished > >>> > automatically by results(), and the function optimizes for the best > >>> > value of theta (most adjusted p-values less than a threshold). So you > >>> > do not need to pre-filter. > >>> > > >>> > Mike > >>> > > >>> > > >>> > On Fri, May 16, 2014 at 1:42 AM, Catalina Aguilar Hurtado > >>> > <catagui@gmail.com> wrote: > >>> > > Hi I want to compare DESeq vs DESeq2 and I am getting different > >>> number of > >>> > > DEGs which I will expect to be normal. But when I compare the 149 > >>> genes > >>> > iD > >>> > > that I get with DESeq with the 869 from DESeq2 there are only ~10 > >>> genes > >>> > > that are in common which I don\'92t understand (using FDR <0.05 > for > >>> > both). > >>> > > I want to block the Subject effect for which I am including the > >>> reduced > >>> > > formula of ~1. > >>> > > > >>> > > Shouldn\'92t this two methods output similar results? Because at > the > >>> > > moment I could interpret my results with different ways.\ > >>> > > \ > >>> > > Thanks for your help,\ > >>> > > \ > >>> > > Catalina\ > >>> > > \ > >>> > > \ > >>> > > This the DESeq script that I am using:\ > >>> > > \ > >>> > > \ > >>> > > DESeq\ > >>> > > \ > >>> > > library(DESeq)\ > >>> > > \ > >>> > > co=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",", > >>> > > row.names=1))\ > >>> > > \ > >>> > > \ > >>> > > Subject=c(1,2,3,4,5,1,2,4,5)\ > >>> > > \ > >>> > > Treatment=c(rep("co",5),rep("c2",4))\ > >>> > > a.con=cbind(Subject,Treatment)\ > >>> > > \ > >>> > > cds=newCountDataSet(co,a.con)\ > >>> > > \ > >>> > > \ > >>> > > cds <- estimateSizeFactors( cds)\ > >>> > > \ > >>> > > cds <- estimateDispersions(cds,method="pooled-CR", > >>> > > modelFormula=count~Subject+Treatment)\ > >>> > > \ > >>> > > \ > >>> > > #filtering\ > >>> > > \ > >>> > > rs = rowSums ( counts ( cds ))\ > >>> > > theta = 0.2\ > >>> > > use = (rs > quantile(rs, probs=theta))\ > >>> > > table(use)\ > >>> > > cdsFilt= cds[ use, ]\ > >>> > > \ > >>> > > \ > >>> > > \ > >>> > > fit0 <- fitNbinomGLMs (cdsFilt, count~1)\ > >>> > > \ > >>> > > fit1 <- fitNbinomGLMs (cdsFilt, count~Treatment)\ > >>> > > \ > >>> > > pvals <- nbinomGLMTest (fit1, fit0)\ > >>> > > \ > >>> > > \ > >>> > > padj <- p.adjust( pvals, method="BH" )\ > >>> > > \ > >>> > > padj <- data.frame(padj)\ > >>> > > \ > >>> > > row.names(padj)=row.names(cdsFilt)\ > >>> > > \ > >>> > > padj_fil <- subset (padj,padj <0.05 )\ > >>> > > \ > >>> > > dim (padj_fil)\ > >>> > > \ > >>> > > [1] 149 1\ > >>> > > \ > >>> > > \ > >>> > > ------------ > >>> > > \ > >>> > > library ("DESeq2")\ > >>> > > \ > >>> > > countdata=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, > >>> sep=",", > >>> > > row.names=1))\ > >>> > > \ > >>> > > coldata= read.table ("targets.csv", header = T, > sep=",",row.names=1)\ > >>> > > \ > >>> > > coldata\ > >>> > > \ > >>> > >>Subject Treatment\ > >>> > >>F1 1 co\ > >>> > >>F2 2 co\ > >>> > >>F3 3 co\ > >>> > >>F4 4 co\ > >>> > >>F5 5 co\ > >>> > >>H1 1 c2\ > >>> > >>H2 2 c2\ > >>> > >>H4 4 c2\ > >>> > >>H5 5 c2\ > >>> > > \ > >>> > > dds <- DESeqDataSetFromMatrix(\ > >>> > > countData = countdata,\ > >>> > > colData = coldata,\ > >>> > > design = ~ Subject + Treatment)\ > >>> > > dds\ > >>> > > \ > >>> > > dds$Treatment <- relevel (dds$Treatment, "co")\ > >>> > > \ > >>> > > \ > >>> > > dds <- estimateSizeFactors( dds)\ > >>> > > \ > >>> > > dds <- estimateDispersions(dds)\ > >>> > > \ > >>> > > \ > >>> > > rs = rowSums ( counts ( dds ))\ > >>> > > theta = 0.2\ > >>> > > use = (rs > quantile(rs, probs=theta))\ > >>> > > table(use)\ > >>> > > ddsFilt= dds[ use, ]\ > >>> > > \ > >>> > > \ > >>> > > dds <- nbinomLRT(ddsFilt, full = design(dds), reduced = ~ 1)\ > >>> > > \ > >>> > > resLRT <- results(dds)\ > >>> > > \ > >>> > > sum( resLRT$padj < 0.05, na.rm=TRUE )\ > >>> > > \ > >>> > > #[1] 869} > >>> > > > >>> > > [[alternative HTML version deleted]] > >>> > > > >>> > > _______________________________________________ > >>> > > Bioconductor mailing list > >>> > > Bioconductor@r-project.org > >>> > > https://stat.ethz.ch/mailman/listinfo/bioconductor > >>> > > Search the archives: > >>> > http://news.gmane.org/gmane.science.biology.informatics.conductor > >>> > > >>> > >>> > >>> > >>> -- > >>> *Catalina Aguilar Hurtado* > >>> > >>> PhD Candidate > >>> ARC Centre of Excellence for Coral Reef Studies > >>> Department of Biochemistry and Molecular Biology > >>> AIMS@JCU > >>> James Cook University > >>> Townsville, QLD 4811, Australia > >>> ph: +61 7 4781 6009 > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> _______________________________________________ > >>> Bioconductor mailing list > >>> Bioconductor@r-project.org > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>> Search the archives: > >>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >>> > >> > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD REPLY • link 10.6 years ago Catalina Aguilar Hurtado ▴ 90

0

Entering edit mode

Hi Catalina On 21/05/14 03:03, Catalina Aguilar Hurtado wrote: > If I get almost (~8) no DEGs by the reduced: ~subject and I get results > (879) using reduced:~1 what is wrong with it? > > I don???t understand what is wrong if I want to compare my design with the > null model of no variability. You seem to seriously misunderstand what the models mean. Your two model formulas were: >> full: ~ subject + treament >> reduced: ~ 1 This way, DESeq2 will report to you all genes that seem to be significantly affected by any of the factor that are present in the full model and absent in the reduced model, i.e., subject and treatment. Hence, you get all genes that differ between subjects _or_ that respond to treatment. But you don't want to see genes that differ between subjects without being affected by treatment, and this is why "subject" has to appear in the reduced model, too: If you do what Mike suggested >> full: ~ subject + treatment >> reduced: ~ subject you account for "subject" in both models. The difference between models is only "treatment", i.e. you get the genes that respond to treatment. I should add that all this is nothing specific two DESeq2, but just usual linear modelling. Any statistics text-book with a chapter on ANOVA will help you. Also, look up the difference between an unpaired and a paired t test. In RNA-Seq analysis, a simple t test is underpowered, unless you have many samples, and also not quite appropriate, because count data is not normally distributed. But as most people are familiar with t tests, it helps to know that the comparison full: ~ treatment reduced: ~ 1 is essentially the same as an unpaired two-sample t test, and full: ~ subject + treatment reduced: ~ subject is the same as a paired t test. Hence, reminding yourself about when and why one uses paired t test might be illuminating. Simon

ADD REPLY • link 10.6 years ago Simon Anders ★ 3.8k

0

Entering edit mode

hi Catalina, Simon addressed the question about models, but I want to follow up on another point. On Tue, May 20, 2014 at 9:03 PM, Catalina Aguilar Hurtado <catagui at="" gmail.com=""> wrote: > > Otherwise it would mean that my whole experiment didn?t work and can not > get any data out of it. > in the sense that it was underpowered to detect changes from the 5 control to the 4 treated samples at an FDR cutoff of 0.05. Note that 1/20 is an arbitrary cutoff. You might also consider valuable to create lists of DE genes with 1/5 false discoveries for follow up study, although it might be the case that there are few genes at this threshold as well. Furthermore, you can still use the experiment for exploratory analysis. For instance, examining the MA plot will give you insights into how many genes had very high or low fold changes. You can rank the results table by absolute value of log2FoldChange, and prioritize these genes for follow up study. Mike > Thanks again, > > Catalina > > > On Tue, May 20, 2014 at 10:12 PM, Michael Love > <michaelisaiahlove at="" gmail.com="">wrote: > >> hi Catalina, >> >> >> On Mon, May 19, 2014 at 8:38 PM, Catalina Aguilar Hurtado >> <catagui at="" gmail.com=""> wrote: >> > Hi Mike, >> > >> > Not sure what you mean by the intersection. >> >> I meant those genes which had padj < cutoff for both DESeq and DESeq2. >> >> You can calculate this with >> >> length(intersect(padj1rownames, padj2rownames)) >> >> It's safest to work with the rownames, to make sure you are referring >> to the same genes. >> >> You can get this vector, e.g. in DESeq2 with: >> >> padj2rownames <- rownames(results[which(results$padj < cutoff),]) >> >> > I have run both without any >> > filter and reduced formula of ~Subject. As you can see below, there is >> > almost no DEGs and the only way I could get was when ~1. >> > >> >> This means that, without independent filtering, for most genes, the >> treatment effect was too small to observe a significant difference for >> the experiment's sample size and/or sequencing depth. However, you >> should use independent filtering to report as many genes as possible, >> and I'd recommend using DESeq2 which will do this automatically within >> the results() function. >> >> The reason I asked for you to not use independent filtering in both is >> to see if in fact there were many genes called by either version of >> DESeq but not by both, when the settings were made comparable. >> >> Note that the ~ 1 reduced design is not what you want: >> >> OK: "we tested for the treatment effect, controlling for the subject >> effect" >> full: ~ subject + treament >> reduced: ~ subject >> >> >> not ok: "we tested for the treatment and subject effect" >> full: ~ subject + treament >> reduced: ~ 1 >> >> >> Mike >> >> >> > DESeq2: >> > >> >> dds <- nbinomLRT(dds, full = design(dds), reduced = ~ Subject) >> >> resLRT <- results(dds, independentFiltering=FALSE) >> > >> >> sum( resLRT$padj < 0.5, na.rm=TRUE ) >> > [1] 6 >> > >> > >> > DESeq: >> > >> > >> >>fit0 <- fitNbinomGLMs (cds, count~Subject) >> > >> >>fit1 <- fitNbinomGLMs (cds, count~Treatment) >> > >> >>pvals <- nbinomGLMTest (fit1, fit0) >> > >> >> padj <- p.adjust( pvals, method="BH" ) >> >> padj <- data.frame(padj) >> > >> >> row.names(padj)=row.names(cds) >> >> padj_fil <- subset (padj,padj <0.05 ) >> >> dim (padj_fil) >> > [1] 0 1 >> > >> > Thanks, >> > >> > Cata >> > >> > On Mon, May 19, 2014 at 10:31 PM, Michael Love >> > <michaelisaiahlove at="" gmail.com="">wrote: >> > >> >> Hi Catalina, >> >> >> >> What are the numbers exactly and the intersection for using a reduced >> >> design of ~ subject for both, and not filtering in either case >> >> (independentFiltering = FALSE)? >> >> >> >> Mike >> >> On May 18, 2014 10:34 PM, "Catalina Aguilar Hurtado" <catagui at="" gmail.com="">> > >> >> wrote: >> >> >> >>> Hi Mike, >> >>> >> >>> Thanks for your reply. >> >>> >> >>> I had read the manual and did try the reduced design ~Subject and got 5 >> >>> DEGs. I tried with both DESeq and DESeq2. >> >>> >> >>> My data has a stronger effect of the subject compare to the treatment, >> if >> >>> I >> >>> do a heatmap the samples group by subject rather than by treatment. >> >>> >> >>> The filtering theta value was choose based on a plot of "Number of >> DEGs vs >> >>> theta" when using filtered_R in genefilter. Thanks for pointing out >> that >> >>> results() was doing automatically. I have just run it again without >> >>> pre-filter and got 812 DEGs. >> >>> >> >>> Still my concern is why the DEGs that I get in DESeq disappear in the >> >>> DESeq2 results? >> >>> >> >>> Thanks again, >> >>> >> >>> Catalina >> >>> >> >>> >> >>> On Fri, May 16, 2014 at 10:35 PM, Michael Love >> >>> <michaelisaiahlove at="" gmail.com="">wrote: >> >>> >> >>> > hi Catalina, >> >>> > >> >>> > If you want to test for the treatment effect and control the subject >> >>> > effect you should use the reduced design of ~ subject. >> >>> > >> >>> > In the DESeq2 vignette we say: "the user provides the full formula >> >>> > (the formula stored in design(dds)), and a reduced formula, e.g. one >> >>> > which does not contain the variable of interest." >> >>> > >> >>> > For your filtering for the old version of DESeq, how did you chose >> the >> >>> > value of theta=0.2? Note that such filtering in DESeq2 is >> accomplished >> >>> > automatically by results(), and the function optimizes for the best >> >>> > value of theta (most adjusted p-values less than a threshold). So you >> >>> > do not need to pre-filter. >> >>> > >> >>> > Mike >> >>> > >> >>> > >> >>> > On Fri, May 16, 2014 at 1:42 AM, Catalina Aguilar Hurtado >> >>> > <catagui at="" gmail.com=""> wrote: >> >>> > > Hi I want to compare DESeq vs DESeq2 and I am getting different >> >>> number of >> >>> > > DEGs which I will expect to be normal. But when I compare the 149 >> >>> genes >> >>> > iD >> >>> > > that I get with DESeq with the 869 from DESeq2 there are only ~10 >> >>> genes >> >>> > > that are in common which I don\'92t understand (using FDR <0.05 >> for >> >>> > both). >> >>> > > I want to block the Subject effect for which I am including the >> >>> reduced >> >>> > > formula of ~1. >> >>> > > >> >>> > > Shouldn\'92t this two methods output similar results? Because at >> the >> >>> > > moment I could interpret my results with different ways.\ >> >>> > > \ >> >>> > > Thanks for your help,\ >> >>> > > \ >> >>> > > Catalina\ >> >>> > > \ >> >>> > > \ >> >>> > > This the DESeq script that I am using:\ >> >>> > > \ >> >>> > > \ >> >>> > > DESeq\ >> >>> > > \ >> >>> > > library(DESeq)\ >> >>> > > \ >> >>> > > co=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",", >> >>> > > row.names=1))\ >> >>> > > \ >> >>> > > \ >> >>> > > Subject=c(1,2,3,4,5,1,2,4,5)\ >> >>> > > \ >> >>> > > Treatment=c(rep("co",5),rep("c2",4))\ >> >>> > > a.con=cbind(Subject,Treatment)\ >> >>> > > \ >> >>> > > cds=newCountDataSet(co,a.con)\ >> >>> > > \ >> >>> > > \ >> >>> > > cds <- estimateSizeFactors( cds)\ >> >>> > > \ >> >>> > > cds <- estimateDispersions(cds,method="pooled-CR", >> >>> > > modelFormula=count~Subject+Treatment)\ >> >>> > > \ >> >>> > > \ >> >>> > > #filtering\ >> >>> > > \ >> >>> > > rs = rowSums ( counts ( cds ))\ >> >>> > > theta = 0.2\ >> >>> > > use = (rs > quantile(rs, probs=theta))\ >> >>> > > table(use)\ >> >>> > > cdsFilt= cds[ use, ]\ >> >>> > > \ >> >>> > > \ >> >>> > > \ >> >>> > > fit0 <- fitNbinomGLMs (cdsFilt, count~1)\ >> >>> > > \ >> >>> > > fit1 <- fitNbinomGLMs (cdsFilt, count~Treatment)\ >> >>> > > \ >> >>> > > pvals <- nbinomGLMTest (fit1, fit0)\ >> >>> > > \ >> >>> > > \ >> >>> > > padj <- p.adjust( pvals, method="BH" )\ >> >>> > > \ >> >>> > > padj <- data.frame(padj)\ >> >>> > > \ >> >>> > > row.names(padj)=row.names(cdsFilt)\ >> >>> > > \ >> >>> > > padj_fil <- subset (padj,padj <0.05 )\ >> >>> > > \ >> >>> > > dim (padj_fil)\ >> >>> > > \ >> >>> > > [1] 149 1\ >> >>> > > \ >> >>> > > \ >> >>> > > ------------ >> >>> > > \ >> >>> > > library ("DESeq2")\ >> >>> > > \ >> >>> > > countdata=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, >> >>> sep=",", >> >>> > > row.names=1))\ >> >>> > > \ >> >>> > > coldata= read.table ("targets.csv", header = T, >> sep=",",row.names=1)\ >> >>> > > \ >> >>> > > coldata\ >> >>> > > \ >> >>> > >>Subject Treatment\ >> >>> > >>F1 1 co\ >> >>> > >>F2 2 co\ >> >>> > >>F3 3 co\ >> >>> > >>F4 4 co\ >> >>> > >>F5 5 co\ >> >>> > >>H1 1 c2\ >> >>> > >>H2 2 c2\ >> >>> > >>H4 4 c2\ >> >>> > >>H5 5 c2\ >> >>> > > \ >> >>> > > dds <- DESeqDataSetFromMatrix(\ >> >>> > > countData = countdata,\ >> >>> > > colData = coldata,\ >> >>> > > design = ~ Subject + Treatment)\ >> >>> > > dds\ >> >>> > > \ >> >>> > > dds$Treatment <- relevel (dds$Treatment, "co")\ >> >>> > > \ >> >>> > > \ >> >>> > > dds <- estimateSizeFactors( dds)\ >> >>> > > \ >> >>> > > dds <- estimateDispersions(dds)\ >> >>> > > \ >> >>> > > \ >> >>> > > rs = rowSums ( counts ( dds ))\ >> >>> > > theta = 0.2\ >> >>> > > use = (rs > quantile(rs, probs=theta))\ >> >>> > > table(use)\ >> >>> > > ddsFilt= dds[ use, ]\ >> >>> > > \ >> >>> > > \ >> >>> > > dds <- nbinomLRT(ddsFilt, full = design(dds), reduced = ~ 1)\ >> >>> > > \ >> >>> > > resLRT <- results(dds)\ >> >>> > > \ >> >>> > > sum( resLRT$padj < 0.05, na.rm=TRUE )\ >> >>> > > \ >> >>> > > #[1] 869} >> >>> > > >> >>> > > [[alternative HTML version deleted]] >> >>> > > >> >>> > > _______________________________________________ >> >>> > > Bioconductor mailing list >> >>> > > Bioconductor at r-project.org >> >>> > > https://stat.ethz.ch/mailman/listinfo/bioconductor >> >>> > > Search the archives: >> >>> > http://news.gmane.org/gmane.science.biology.informatics.conductor >> >>> > >> >>> >> >>> >> >>> >> >>> -- >> >>> *Catalina Aguilar Hurtado* >> >>> >> >>> PhD Candidate >> >>> ARC Centre of Excellence for Coral Reef Studies >> >>> Department of Biochemistry and Molecular Biology >> >>> AIMS at JCU >> >>> James Cook University >> >>> Townsville, QLD 4811, Australia >> >>> ph: +61 7 4781 6009 >> >>> >> >>> [[alternative HTML version deleted]] >> >>> >> >>> _______________________________________________ >> >>> Bioconductor mailing list >> >>> Bioconductor at r-project.org >> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >>> Search the archives: >> >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >>> >> >> >> > >> > [[alternative HTML version deleted]] >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor at r-project.org >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 10.6 years ago Michael Love 43k

0

Entering edit mode

Catalina Aguilar Hurtado ▴ 90

@catalina-aguilar-hurtado-6554

Last seen 4.1 years ago

United States

Hi Mike and Simon, Thanks for your help. You were right, I was lost and didn't understand the models properly. Additionally I realized that when calling colData(), my Subject was not defined as a factor and hence was getting 0 DEGs. Now I am getting 480 DEGs (FDR 0.1) when defining Subject as a factor and setting the reduced model properly: dds <- DESeqDataSetFromMatrix( countData = countdata, colData = coldata, design = ~ Subject + Treatment) dds dds$Treatment <- factor(dds$Treatment, levels=c("PBS", "LPS")) dds$Subject <- factor(dds$Subject) colData (dds) ddsLRT <- DESeq (dds, test ="LRT", reduced= ~Subject) Thanks a lot for you help, Catalina On Wed, May 21, 2014 at 11:36 PM, Michael Love <michaelisaiahlove@gmail.com>wrote: > hi Catalina, > > Simon addressed the question about models, but I want to follow up on > another point. > > On Tue, May 20, 2014 at 9:03 PM, Catalina Aguilar Hurtado > <catagui@gmail.com> wrote: > > > > Otherwise it would mean that my whole experiment didnât work and can not > > get any data out of it. > > > > in the sense that it was underpowered to detect changes from the 5 > control to the 4 treated samples at an FDR cutoff of 0.05. Note that > 1/20 is an arbitrary cutoff. You might also consider valuable to > create lists of DE genes with 1/5 false discoveries for follow up > study, although it might be the case that there are few genes at this > threshold as well. Furthermore, you can still use the experiment for > exploratory analysis. For instance, examining the MA plot will give > you insights into how many genes had very high or low fold changes. > You can rank the results table by absolute value of log2FoldChange, > and prioritize these genes for follow up study. > > Mike > > > > Thanks again, > > > > Catalina > > > > > > On Tue, May 20, 2014 at 10:12 PM, Michael Love > > <michaelisaiahlove@gmail.com>wrote: > > > >> hi Catalina, > >> > >> > >> On Mon, May 19, 2014 at 8:38 PM, Catalina Aguilar Hurtado > >> <catagui@gmail.com> wrote: > >> > Hi Mike, > >> > > >> > Not sure what you mean by the intersection. > >> > >> I meant those genes which had padj < cutoff for both DESeq and DESeq2. > >> > >> You can calculate this with > >> > >> length(intersect(padj1rownames, padj2rownames)) > >> > >> It's safest to work with the rownames, to make sure you are referring > >> to the same genes. > >> > >> You can get this vector, e.g. in DESeq2 with: > >> > >> padj2rownames <- rownames(results[which(results$padj < cutoff),]) > >> > >> > I have run both without any > >> > filter and reduced formula of ~Subject. As you can see below, there is > >> > almost no DEGs and the only way I could get was when ~1. > >> > > >> > >> This means that, without independent filtering, for most genes, the > >> treatment effect was too small to observe a significant difference for > >> the experiment's sample size and/or sequencing depth. However, you > >> should use independent filtering to report as many genes as possible, > >> and I'd recommend using DESeq2 which will do this automatically within > >> the results() function. > >> > >> The reason I asked for you to not use independent filtering in both is > >> to see if in fact there were many genes called by either version of > >> DESeq but not by both, when the settings were made comparable. > >> > >> Note that the ~ 1 reduced design is not what you want: > >> > >> OK: "we tested for the treatment effect, controlling for the subject > >> effect" > >> full: ~ subject + treament > >> reduced: ~ subject > >> > >> > >> not ok: "we tested for the treatment and subject effect" > >> full: ~ subject + treament > >> reduced: ~ 1 > >> > >> > >> Mike > >> > >> > >> > DESeq2: > >> > > >> >> dds <- nbinomLRT(dds, full = design(dds), reduced = ~ Subject) > >> >> resLRT <- results(dds, independentFiltering=FALSE) > >> > > >> >> sum( resLRT$padj < 0.5, na.rm=TRUE ) > >> > [1] 6 > >> > > >> > > >> > DESeq: > >> > > >> > > >> >>fit0 <- fitNbinomGLMs (cds, count~Subject) > >> > > >> >>fit1 <- fitNbinomGLMs (cds, count~Treatment) > >> > > >> >>pvals <- nbinomGLMTest (fit1, fit0) > >> > > >> >> padj <- p.adjust( pvals, method="BH" ) > >> >> padj <- data.frame(padj) > >> > > >> >> row.names(padj)=row.names(cds) > >> >> padj_fil <- subset (padj,padj <0.05 ) > >> >> dim (padj_fil) > >> > [1] 0 1 > >> > > >> > Thanks, > >> > > >> > Cata > >> > > >> > On Mon, May 19, 2014 at 10:31 PM, Michael Love > >> > <michaelisaiahlove@gmail.com>wrote: > >> > > >> >> Hi Catalina, > >> >> > >> >> What are the numbers exactly and the intersection for using a reduced > >> >> design of ~ subject for both, and not filtering in either case > >> >> (independentFiltering = FALSE)? > >> >> > >> >> Mike > >> >> On May 18, 2014 10:34 PM, "Catalina Aguilar Hurtado" < > catagui@gmail.com > >> > > >> >> wrote: > >> >> > >> >>> Hi Mike, > >> >>> > >> >>> Thanks for your reply. > >> >>> > >> >>> I had read the manual and did try the reduced design ~Subject and > got 5 > >> >>> DEGs. I tried with both DESeq and DESeq2. > >> >>> > >> >>> My data has a stronger effect of the subject compare to the > treatment, > >> if > >> >>> I > >> >>> do a heatmap the samples group by subject rather than by treatment. > >> >>> > >> >>> The filtering theta value was choose based on a plot of "Number of > >> DEGs vs > >> >>> theta" when using filtered_R in genefilter. Thanks for pointing out > >> that > >> >>> results() was doing automatically. I have just run it again without > >> >>> pre-filter and got 812 DEGs. > >> >>> > >> >>> Still my concern is why the DEGs that I get in DESeq disappear in > the > >> >>> DESeq2 results? > >> >>> > >> >>> Thanks again, > >> >>> > >> >>> Catalina > >> >>> > >> >>> > >> >>> On Fri, May 16, 2014 at 10:35 PM, Michael Love > >> >>> <michaelisaiahlove@gmail.com>wrote: > >> >>> > >> >>> > hi Catalina, > >> >>> > > >> >>> > If you want to test for the treatment effect and control the > subject > >> >>> > effect you should use the reduced design of ~ subject. > >> >>> > > >> >>> > In the DESeq2 vignette we say: "the user provides the full formula > >> >>> > (the formula stored in design(dds)), and a reduced formula, e.g. > one > >> >>> > which does not contain the variable of interest." > >> >>> > > >> >>> > For your filtering for the old version of DESeq, how did you chose > >> the > >> >>> > value of theta=0.2? Note that such filtering in DESeq2 is > >> accomplished > >> >>> > automatically by results(), and the function optimizes for the > best > >> >>> > value of theta (most adjusted p-values less than a threshold). So > you > >> >>> > do not need to pre-filter. > >> >>> > > >> >>> > Mike > >> >>> > > >> >>> > > >> >>> > On Fri, May 16, 2014 at 1:42 AM, Catalina Aguilar Hurtado > >> >>> > <catagui@gmail.com> wrote: > >> >>> > > Hi I want to compare DESeq vs DESeq2 and I am getting different > >> >>> number of > >> >>> > > DEGs which I will expect to be normal. But when I compare the > 149 > >> >>> genes > >> >>> > iD > >> >>> > > that I get with DESeq with the 869 from DESeq2 there are only > ~10 > >> >>> genes > >> >>> > > that are in common which I don\'92t understand (using FDR <0.05 > >> for > >> >>> > both). > >> >>> > > I want to block the Subject effect for which I am including the > >> >>> reduced > >> >>> > > formula of ~1. > >> >>> > > > >> >>> > > Shouldn\'92t this two methods output similar results? Because > at > >> the > >> >>> > > moment I could interpret my results with different ways.\ > >> >>> > > \ > >> >>> > > Thanks for your help,\ > >> >>> > > \ > >> >>> > > Catalina\ > >> >>> > > \ > >> >>> > > \ > >> >>> > > This the DESeq script that I am using:\ > >> >>> > > \ > >> >>> > > \ > >> >>> > > DESeq\ > >> >>> > > \ > >> >>> > > library(DESeq)\ > >> >>> > > \ > >> >>> > > co=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, > sep=",", > >> >>> > > row.names=1))\ > >> >>> > > \ > >> >>> > > \ > >> >>> > > Subject=c(1,2,3,4,5,1,2,4,5)\ > >> >>> > > \ > >> >>> > > Treatment=c(rep("co",5),rep("c2",4))\ > >> >>> > > a.con=cbind(Subject,Treatment)\ > >> >>> > > \ > >> >>> > > cds=newCountDataSet(co,a.con)\ > >> >>> > > \ > >> >>> > > \ > >> >>> > > cds <- estimateSizeFactors( cds)\ > >> >>> > > \ > >> >>> > > cds <- estimateDispersions(cds,method="pooled-CR", > >> >>> > > modelFormula=count~Subject+Treatment)\ > >> >>> > > \ > >> >>> > > \ > >> >>> > > #filtering\ > >> >>> > > \ > >> >>> > > rs = rowSums ( counts ( cds ))\ > >> >>> > > theta = 0.2\ > >> >>> > > use = (rs > quantile(rs, probs=theta))\ > >> >>> > > table(use)\ > >> >>> > > cdsFilt= cds[ use, ]\ > >> >>> > > \ > >> >>> > > \ > >> >>> > > \ > >> >>> > > fit0 <- fitNbinomGLMs (cdsFilt, count~1)\ > >> >>> > > \ > >> >>> > > fit1 <- fitNbinomGLMs (cdsFilt, count~Treatment)\ > >> >>> > > \ > >> >>> > > pvals <- nbinomGLMTest (fit1, fit0)\ > >> >>> > > \ > >> >>> > > \ > >> >>> > > padj <- p.adjust( pvals, method="BH" )\ > >> >>> > > \ > >> >>> > > padj <- data.frame(padj)\ > >> >>> > > \ > >> >>> > > row.names(padj)=row.names(cdsFilt)\ > >> >>> > > \ > >> >>> > > padj_fil <- subset (padj,padj <0.05 )\ > >> >>> > > \ > >> >>> > > dim (padj_fil)\ > >> >>> > > \ > >> >>> > > [1] 149 1\ > >> >>> > > \ > >> >>> > > \ > >> >>> > > ------------ > >> >>> > > \ > >> >>> > > library ("DESeq2")\ > >> >>> > > \ > >> >>> > > countdata=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, > >> >>> sep=",", > >> >>> > > row.names=1))\ > >> >>> > > \ > >> >>> > > coldata= read.table ("targets.csv", header = T, > >> sep=",",row.names=1)\ > >> >>> > > \ > >> >>> > > coldata\ > >> >>> > > \ > >> >>> > >>Subject Treatment\ > >> >>> > >>F1 1 co\ > >> >>> > >>F2 2 co\ > >> >>> > >>F3 3 co\ > >> >>> > >>F4 4 co\ > >> >>> > >>F5 5 co\ > >> >>> > >>H1 1 c2\ > >> >>> > >>H2 2 c2\ > >> >>> > >>H4 4 c2\ > >> >>> > >>H5 5 c2\ > >> >>> > > \ > >> >>> > > dds <- DESeqDataSetFromMatrix(\ > >> >>> > > countData = countdata,\ > >> >>> > > colData = coldata,\ > >> >>> > > design = ~ Subject + Treatment)\ > >> >>> > > dds\ > >> >>> > > \ > >> >>> > > dds$Treatment <- relevel (dds$Treatment, "co")\ > >> >>> > > \ > >> >>> > > \ > >> >>> > > dds <- estimateSizeFactors( dds)\ > >> >>> > > \ > >> >>> > > dds <- estimateDispersions(dds)\ > >> >>> > > \ > >> >>> > > \ > >> >>> > > rs = rowSums ( counts ( dds ))\ > >> >>> > > theta = 0.2\ > >> >>> > > use = (rs > quantile(rs, probs=theta))\ > >> >>> > > table(use)\ > >> >>> > > ddsFilt= dds[ use, ]\ > >> >>> > > \ > >> >>> > > \ > >> >>> > > dds <- nbinomLRT(ddsFilt, full = design(dds), reduced = ~ 1)\ > >> >>> > > \ > >> >>> > > resLRT <- results(dds)\ > >> >>> > > \ > >> >>> > > sum( resLRT$padj < 0.05, na.rm=TRUE )\ > >> >>> > > \ > >> >>> > > #[1] 869} > >> >>> > > > >> >>> > > [[alternative HTML version deleted]] > >> >>> > > > >> >>> > > _______________________________________________ > >> >>> > > Bioconductor mailing list > >> >>> > > Bioconductor@r-project.org > >> >>> > > https://stat.ethz.ch/mailman/listinfo/bioconductor > >> >>> > > Search the archives: > >> >>> > http://news.gmane.org/gmane.science.biology.informatics.conductor > >> >>> > > >> >>> > >> >>> > >> >>> > >> >>> -- > >> >>> *Catalina Aguilar Hurtado* > >> >>> > >> >>> PhD Candidate > >> >>> ARC Centre of Excellence for Coral Reef Studies > >> >>> Department of Biochemistry and Molecular Biology > >> >>> AIMS@JCU > >> >>> James Cook University > >> >>> Townsville, QLD 4811, Australia > >> >>> ph: +61 7 4781 6009 > >> >>> > >> >>> [[alternative HTML version deleted]] > >> >>> > >> >>> _______________________________________________ > >> >>> Bioconductor mailing list > >> >>> Bioconductor@r-project.org > >> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> >>> Search the archives: > >> >>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> >>> > >> >> > >> > > >> > [[alternative HTML version deleted]] > >> > > >> > _______________________________________________ > >> > Bioconductor mailing list > >> > Bioconductor@r-project.org > >> > https://stat.ethz.ch/mailman/listinfo/bioconductor > >> > Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > > > [[alternative HTML version deleted]] > > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 10.6 years ago Catalina Aguilar Hurtado ▴ 90

Login before adding your answer.