edgeR - paired samples with multifactorial design

edgeR - paired samples with multifactorial design - errors

1

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 10.6 years ago

Hi All, I had been trying to do DE analysis of my RNAseq experiment using edgeR and am having some isssues. The details of the Experiment and the R code I tried below: (a) Paired experimental design with 45 pairs (b) Treatment: "Before" and "After" (c) Phenotype: 1 & 2 Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking into account the paired design The R code tried: library(edgeR) counts<-read.delim(file="counts.dat",header=T) pair=factor(pdata$pair) Treat=factor( pdata$treat) Phenotype=factor(pdata$pheno) group<-paste(Treat,Phenotype,sep=".") design=model.matrix(~pair+Treat:Phenotype, data=counts) counts.DGEList<-DGEList(counts, group=group) y<-calcNormFactors(counts.DGEList) y<-estimateCommonDisp(y, design) y<-estimateGLMTrendedDisp(y, design) Error message I get: Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, : Design matrix not of full rank. The following coefficients not estimable: TreatBefore:Phenotype1 TreatBefore:Phenotype2 Any idea to solve this out? Thanks, Preethy -- output of sessionInfo(): R version 3.1.0 (2014-04-10) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.4.2 limma_3.18.13 -- Sent via the guest posting facility at bioconductor.org.

RNASeq RNASeq • 2.5k views

ADD COMMENT • link updated 11.0 years ago by Devon Ryan ▴ 200 • written 11.0 years ago by Guest User ★ 13k

0

Entering edit mode

Devon Ryan ▴ 200

@devon-ryan-6054

Last seen 9.3 years ago

Germany

Hi Preethy, You likely want: design=model.matrix(~pair+Treat:Phenotype, data=pdata) If that still yields the error, then you'll need to share "pdata" or "design". Also, please don't crosspost on both this list and biostars ( https://www.biostars.org/p/98907/), it duplicates the community effort. Devon -- Devon Ryan, Ph.D. Email: dpryan@dpryan.com Laboratory for Molecular and Cellular Cognition German Centre for Neurodegenerative Diseases (DZNE) Ludwig-Erhard-Allee 2 53175 Bonn Germany <devon.ryan@dzne.de> On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] <guest@bioconductor.org>wrote: > > Hi All, > > I had been trying to do DE analysis of my RNAseq experiment using edgeR > and am having some isssues. The details of the Experiment and the R code I > tried below: > > (a) Paired experimental design with 45 pairs > (b) Treatment: "Before" and "After" > (c) Phenotype: 1 & 2 > Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking > into account the paired design > > The R code tried: > > library(edgeR) > counts<-read.delim(file="counts.dat",header=T) > pair=factor(pdata$pair) > Treat=factor( pdata$treat) > Phenotype=factor(pdata$pheno) > group<-paste(Treat,Phenotype,sep=".") > design=model.matrix(~pair+Treat:Phenotype, data=counts) > counts.DGEList<-DGEList(counts, group=group) > y<-calcNormFactors(counts.DGEList) > y<-estimateCommonDisp(y, design) > y<-estimateGLMTrendedDisp(y, design) > > > Error message I get: > > Error in glmFit.default(y, design = design, dispersion = dispersion, > offset = offset, : > Design matrix not of full rank. The following coefficients not > estimable: > TreatBefore:Phenotype1 TreatBefore:Phenotype2 > > > Any idea to solve this out? > > Thanks, > Preethy > > -- output of sessionInfo(): > > R version 3.1.0 (2014-04-10) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 > [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 > [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.4.2 limma_3.18.13 > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 11.0 years ago Devon Ryan ▴ 200

0

Entering edit mode

Hi Devon, Thanks for the replies both on biostars and here. Sorry for crossposting. Both mailing lists and discussion have been very helpful to me. But, I rarely see replies from BioC package maintainers at biostars. All the samples are paired and I have them all correct - I mean none of them are empty I tried different designs. But ending up with the same error message. What I want to do: Getting DE genes between (Phenotype1.Before-Phenotype1.After) & (Phenotype2.Before- Phenotype2.After) pdata here: pair=rep(c(1:45), each=2) Treat=rep(c("Before", "After"),45) Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2), each=2) pdata<-data.frame(pair,Treat,Phenotype) Preethy On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan@dpryan.com> wrote: > Hi Preethy, > > You likely want: > > design=model.matrix(~pair+Treat:Phenotype, data=pdata) > > If that still yields the error, then you'll need to share "pdata" or > "design". Also, please don't crosspost on both this list and biostars ( > https://www.biostars.org/p/98907/), it duplicates the community effort. > > Devon > > > -- > Devon Ryan, Ph.D. > Email: dpryan@dpryan.com > Laboratory for Molecular and Cellular Cognition > German Centre for Neurodegenerative Diseases (DZNE) > Ludwig-Erhard-Allee 2 > 53175 Bonn > Germany > <devon.ryan@dzne.de> > > > On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] <guest@bioconductor.org>wrote: > >> >> Hi All, >> >> I had been trying to do DE analysis of my RNAseq experiment using edgeR >> and am having some isssues. The details of the Experiment and the R code I >> tried below: >> >> (a) Paired experimental design with 45 pairs >> (b) Treatment: "Before" and "After" >> (c) Phenotype: 1 & 2 >> Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking >> into account the paired design >> >> The R code tried: >> >> library(edgeR) >> counts<-read.delim(file="counts.dat",header=T) >> pair=factor(pdata$pair) >> Treat=factor( pdata$treat) >> Phenotype=factor(pdata$pheno) >> group<-paste(Treat,Phenotype,sep=".") >> design=model.matrix(~pair+Treat:Phenotype, data=counts) >> counts.DGEList<-DGEList(counts, group=group) >> y<-calcNormFactors(counts.DGEList) >> y<-estimateCommonDisp(y, design) >> y<-estimateGLMTrendedDisp(y, design) >> >> >> Error message I get: >> >> Error in glmFit.default(y, design = design, dispersion = dispersion, >> offset = offset, : >> Design matrix not of full rank. The following coefficients not >> estimable: >> TreatBefore:Phenotype1 TreatBefore:Phenotype2 >> >> >> Any idea to solve this out? >> >> Thanks, >> Preethy >> >> -- output of sessionInfo(): >> >> R version 3.1.0 (2014-04-10) >> Platform: x86_64-pc-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 >> [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 >> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] edgeR_3.4.2 limma_3.18.13 >> >> >> -- >> Sent via the guest posting facility at bioconductor.org. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]

ADD REPLY • link 11.0 years ago Preethy Venkat Ram ▴ 30

0

Entering edit mode

Hi Preethy, This experiment is very similar to the example in part 3.5 of the edgeR User's guide, starting on page 31. Best, Jim On 4/25/2014 11:29 AM, Preethy Venkat Ram wrote: > Hi Devon, > > Thanks for the replies both on biostars and here. Sorry for > crossposting. Both > mailing lists and discussion have been very helpful to me. > > But, I rarely see replies from BioC package maintainers at biostars. > > All the samples are paired and I have them all correct - I mean none of > them are empty > I tried different designs. But ending up with the same error message. > What I want to do: Getting DE genes between > (Phenotype1.Before-Phenotype1.After) & (Phenotype2.Before- Phenotype2.After) > > pdata here: > > pair=rep(c(1:45), each=2) > Treat=rep(c("Before", "After"),45) > Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, > 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, > 2), each=2) > pdata<-data.frame(pair,Treat,Phenotype) > > > Preethy > > > > On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan at="" dpryan.com=""> wrote: > >> Hi Preethy, >> >> You likely want: >> >> design=model.matrix(~pair+Treat:Phenotype, data=pdata) >> >> If that still yields the error, then you'll need to share "pdata" or >> "design". Also, please don't crosspost on both this list and biostars ( >> https://www.biostars.org/p/98907/), it duplicates the community effort. >> >> Devon >> >> >> -- >> Devon Ryan, Ph.D. >> Email: dpryan at dpryan.com >> Laboratory for Molecular and Cellular Cognition >> German Centre for Neurodegenerative Diseases (DZNE) >> Ludwig-Erhard-Allee 2 >> 53175 Bonn >> Germany >> <devon.ryan at="" dzne.de=""> >> >> >> On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] <guest at="" bioconductor.org="">wrote: >> >>> Hi All, >>> >>> I had been trying to do DE analysis of my RNAseq experiment using edgeR >>> and am having some isssues. The details of the Experiment and the R code I >>> tried below: >>> >>> (a) Paired experimental design with 45 pairs >>> (b) Treatment: "Before" and "After" >>> (c) Phenotype: 1 & 2 >>> Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking >>> into account the paired design >>> >>> The R code tried: >>> >>> library(edgeR) >>> counts<-read.delim(file="counts.dat",header=T) >>> pair=factor(pdata$pair) >>> Treat=factor( pdata$treat) >>> Phenotype=factor(pdata$pheno) >>> group<-paste(Treat,Phenotype,sep=".") >>> design=model.matrix(~pair+Treat:Phenotype, data=counts) >>> counts.DGEList<-DGEList(counts, group=group) >>> y<-calcNormFactors(counts.DGEList) >>> y<-estimateCommonDisp(y, design) >>> y<-estimateGLMTrendedDisp(y, design) >>> >>> >>> Error message I get: >>> >>> Error in glmFit.default(y, design = design, dispersion = dispersion, >>> offset = offset, : >>> Design matrix not of full rank. The following coefficients not >>> estimable: >>> TreatBefore:Phenotype1 TreatBefore:Phenotype2 >>> >>> >>> Any idea to solve this out? >>> >>> Thanks, >>> Preethy >>> >>> -- output of sessionInfo(): >>> >>> R version 3.1.0 (2014-04-10) >>> Platform: x86_64-pc-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 >>> [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 >>> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] edgeR_3.4.2 limma_3.18.13 >>> >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org. >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD REPLY • link 11.0 years ago James W. MacDonald 68k

0

Entering edit mode

Hi Jim, Thanks for the reply. I've tried that - with a slighlty modified code. I am sorry. But, I'm getting error again. Here: pair=factor(rep(c(1:45), each=2)).Treat=factor( rep(c("Before", "After"),45), levels=c("Before", "After")) Phenotype=factor(rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2), each=2),levels=c("1","2")) design=model.matrix(~Phenotype+Phenotype:pair+Phenotype:Treat) colnames(design) counts.DGEList<-DGEList(counts, group=Treat) y<-calcNormFactors(counts.DGEList) y<-estimateCommonDisp(y, design) y<-estimateGLMTrendedDisp(y, design) The error message here: > y<-estimateGLMTrendedDisp(y, design) Error in return(NA, ntags) : multi-argument returns are not permitted In addition: Warning message: In estimateGLMTrendedDisp.default(y = y$counts, design = design, : No residual df: cannot estimate dispersion One reason I can see is that the number of columns in my design matrix is "92" as they are more replicates/patients than in the 3.5 example. And there are only "90" columns in my "count" matrix. But, do you have any idea how can I solve this ? Thanks, Preethy On Fri, Apr 25, 2014 at 6:45 PM, James W. MacDonald <jmacdon@uw.edu> wrote: > Hi Preethy, > > This experiment is very similar to the example in part 3.5 of the edgeR > User's guide, starting on page 31. > > Best, > > Jim > > > > On 4/25/2014 11:29 AM, Preethy Venkat Ram wrote: > >> Hi Devon, >> >> Thanks for the replies both on biostars and here. Sorry for >> crossposting. Both >> mailing lists and discussion have been very helpful to me. >> >> But, I rarely see replies from BioC package maintainers at biostars. >> >> All the samples are paired and I have them all correct - I mean none of >> them are empty >> I tried different designs. But ending up with the same error message. >> What I want to do: Getting DE genes between >> (Phenotype1.Before-Phenotype1.After) & (Phenotype2.Before- Phenotype2. >> After) >> >> pdata here: >> >> pair=rep(c(1:45), each=2) >> Treat=rep(c("Before", "After"),45) >> Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, >> 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, >> 2, >> 2), each=2) >> pdata<-data.frame(pair,Treat,Phenotype) >> >> >> Preethy >> >> >> >> On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan@dpryan.com> wrote: >> >> Hi Preethy, >>> >>> You likely want: >>> >>> design=model.matrix(~pair+Treat:Phenotype, data=pdata) >>> >>> If that still yields the error, then you'll need to share "pdata" or >>> "design". Also, please don't crosspost on both this list and biostars ( >>> https://www.biostars.org/p/98907/), it duplicates the community effort. >>> >>> Devon >>> >>> >>> -- >>> Devon Ryan, Ph.D. >>> Email: dpryan@dpryan.com >>> Laboratory for Molecular and Cellular Cognition >>> German Centre for Neurodegenerative Diseases (DZNE) >>> Ludwig-Erhard-Allee 2 >>> 53175 Bonn >>> Germany >>> <devon.ryan@dzne.de> >>> >>> >>> >>> On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] < >>> guest@bioconductor.org>wrote: >>> >>> Hi All, >>>> >>>> I had been trying to do DE analysis of my RNAseq experiment using edgeR >>>> and am having some isssues. The details of the Experiment and the R >>>> code I >>>> tried below: >>>> >>>> (a) Paired experimental design with 45 pairs >>>> (b) Treatment: "Before" and "After" >>>> (c) Phenotype: 1 & 2 >>>> Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking >>>> into account the paired design >>>> >>>> The R code tried: >>>> >>>> library(edgeR) >>>> counts<-read.delim(file="counts.dat",header=T) >>>> pair=factor(pdata$pair) >>>> Treat=factor( pdata$treat) >>>> Phenotype=factor(pdata$pheno) >>>> group<-paste(Treat,Phenotype,sep=".") >>>> design=model.matrix(~pair+Treat:Phenotype, data=counts) >>>> counts.DGEList<-DGEList(counts, group=group) >>>> y<-calcNormFactors(counts.DGEList) >>>> y<-estimateCommonDisp(y, design) >>>> y<-estimateGLMTrendedDisp(y, design) >>>> >>>> >>>> Error message I get: >>>> >>>> Error in glmFit.default(y, design = design, dispersion = dispersion, >>>> offset = offset, : >>>> Design matrix not of full rank. The following coefficients not >>>> estimable: >>>> TreatBefore:Phenotype1 TreatBefore:Phenotype2 >>>> >>>> >>>> Any idea to solve this out? >>>> >>>> Thanks, >>>> Preethy >>>> >>>> -- output of sessionInfo(): >>>> >>>> R version 3.1.0 (2014-04-10) >>>> Platform: x86_64-pc-linux-gnu (64-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 >>>> [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 >>>> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] edgeR_3.4.2 limma_3.18.13 >>>> >>>> >>>> -- >>>> Sent via the guest posting facility at bioconductor.org. >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>> [[alternative HTML version deleted]] >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane. >> science.biology.informatics.conductor >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > [[alternative HTML version deleted]]

ADD REPLY • link 11.0 years ago Preethy Venkat Ram ▴ 30

0

Entering edit mode

Hi Preethy, You need to re-read that section of edgeR, and in particular look at the patient column of the revised targets frame. Best, Jim On 4/25/2014 12:46 PM, Preethy Venkat Ram wrote: > Hi Jim, > > Thanks for the reply. I've tried that - with a slighlty modified > code. I am sorry. But, I'm getting error again. > > Here: > > pair=factor(rep(c(1:45), each=2)).Treat=factor( rep(c("Before", > "After"),45), levels=c("Before", "After")) > Phenotype=factor(rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, > 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, > 2, 1, 2, 2, 1, 2, 2), each=2),levels=c("1","2")) > design=model.matrix(~Phenotype+Phenotype:pair+Phenotype:Treat) > colnames(design) > counts.DGEList<-DGEList(counts, group=Treat) > y<-calcNormFactors(counts.DGEList) > y<-estimateCommonDisp(y, design) > y<-estimateGLMTrendedDisp(y, design) > > > The error message here: > > > y<-estimateGLMTrendedDisp(y, design) > Error in return(NA, ntags) : multi-argument returns are not permitted > In addition: Warning message: > In estimateGLMTrendedDisp.default(y = y$counts, design = design, : > No residual df: cannot estimate dispersion > > > One reason I can see is that the number of columns in my design matrix > is "92" as they are more replicates/patients than in the 3.5 > example. And there are only "90" columns in my "count" matrix. > > But, do you have any idea how can I solve this ? > > Thanks, > Preethy > > > > > > On Fri, Apr 25, 2014 at 6:45 PM, James W. MacDonald <jmacdon at="" uw.edu=""> <mailto:jmacdon at="" uw.edu="">> wrote: > > Hi Preethy, > > This experiment is very similar to the example in part 3.5 of the > edgeR User's guide, starting on page 31. > > Best, > > Jim > > > > On 4/25/2014 11:29 AM, Preethy Venkat Ram wrote: > > Hi Devon, > > Thanks for the replies both on biostars and here. Sorry for > crossposting. Both > mailing lists and discussion have been very helpful to me. > > But, I rarely see replies from BioC package maintainers at > biostars. > > All the samples are paired and I have them all correct - I > mean none of > them are empty > I tried different designs. But ending up with the same error > message. > What I want to do: Getting DE genes between > (Phenotype1.Before-Phenotype1.After) & > (Phenotype2.Before-Phenotype2.After) > > pdata here: > > pair=rep(c(1:45), each=2) > Treat=rep(c("Before", "After"),45) > Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, > 2, 2, 2, 2, > 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, > 1, 2, 2, 1, 2, > 2), each=2) > pdata<-data.frame(pair,Treat,Phenotype) > > > Preethy > > > > On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan at="" dpryan.com=""> <mailto:dpryan at="" dpryan.com="">> wrote: > > Hi Preethy, > > You likely want: > > design=model.matrix(~pair+Treat:Phenotype, data=pdata) > > If that still yields the error, then you'll need to share > "pdata" or > "design". Also, please don't crosspost on both this list > and biostars ( > https://www.biostars.org/p/98907/), it duplicates the > community effort. > > Devon > > > -- > Devon Ryan, Ph.D. > Email: dpryan at dpryan.com <mailto:dpryan at="" dpryan.com=""> > Laboratory for Molecular and Cellular Cognition > German Centre for Neurodegenerative Diseases (DZNE) > Ludwig-Erhard-Allee 2 > 53175 Bonn > Germany > <devon.ryan at="" dzne.de="" <mailto:devon.ryan="" at="" dzne.de="">> > > > > On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] > <guest at="" bioconductor.org="" <mailto:guest="" at="" bioconductor.org="">>wrote: > > Hi All, > > I had been trying to do DE analysis of my RNAseq > experiment using edgeR > and am having some isssues. The details of the > Experiment and the R code I > tried below: > > (a) Paired experimental design with 45 pairs > (b) Treatment: "Before" and "After" > (c) Phenotype: 1 & 2 > Aim: Look for DE genes between Phenotype 1 and 2 upon > treatment taking > into account the paired design > > The R code tried: > > library(edgeR) > counts<-read.delim(file="counts.dat",header=T) > pair=factor(pdata$pair) > Treat=factor( pdata$treat) > Phenotype=factor(pdata$pheno) > group<-paste(Treat,Phenotype,sep=".") > design=model.matrix(~pair+Treat:Phenotype, data=counts) > counts.DGEList<-DGEList(counts, group=group) > y<-calcNormFactors(counts.DGEList) > y<-estimateCommonDisp(y, design) > y<-estimateGLMTrendedDisp(y, design) > > > Error message I get: > > Error in glmFit.default(y, design = design, dispersion > = dispersion, > offset = offset, : > Design matrix not of full rank. The following > coefficients not > estimable: > TreatBefore:Phenotype1 TreatBefore:Phenotype2 > > > Any idea to solve this out? > > Thanks, > Preethy > > -- output of sessionInfo(): > > R version 3.1.0 (2014-04-10) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 > [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 > [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] edgeR_3.4.2 limma_3.18.13 > > > -- > Sent via the guest posting facility at > bioconductor.org <http: bioconductor.org="">. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD REPLY • link 11.0 years ago James W. MacDonald 68k

0

Entering edit mode

Hi Jim and All, Thanks really a lot for bringing that to my attention. It didn't work this way as well as I had different number of samples for each Phenotype unlike the edgeR example which had 3 patients for each "Disease" category. But, I took off the empty columns from the design matrix and it worked. Is the analysis done correctly this way ? I have one more doubt - Would it be possible to account for batch effect too with this type of design matrix ? Like this: design=model.matrix(~Phenotype+Phenotype:pair+ Phenotype:Treat+Phenotype:batch) Thanks, Preethy On Fri, Apr 25, 2014 at 8:00 PM, James W. MacDonald <jmacdon@uw.edu> wrote: > Hi Preethy, > > You need to re-read that section of edgeR, and in particular look at the > patient column of the revised targets frame. > > Best, > > Jim > > > > On 4/25/2014 12:46 PM, Preethy Venkat Ram wrote: > >> Hi Jim, >> >> Thanks for the reply. I've tried that - with a slighlty modified code. I >> am sorry. But, I'm getting error again. >> >> Here: >> >> pair=factor(rep(c(1:45), each=2)).Treat=factor( rep(c("Before", >> "After"),45), levels=c("Before", "After")) >> Phenotype=factor(rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, >> 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, >> 2, 1, 2, 2), each=2),levels=c("1","2")) >> design=model.matrix(~Phenotype+Phenotype:pair+Phenotype:Treat) >> colnames(design) >> counts.DGEList<-DGEList(counts, group=Treat) >> y<-calcNormFactors(counts.DGEList) >> y<-estimateCommonDisp(y, design) >> y<-estimateGLMTrendedDisp(y, design) >> >> >> The error message here: >> >> > y<-estimateGLMTrendedDisp(y, design) >> Error in return(NA, ntags) : multi-argument returns are not permitted >> In addition: Warning message: >> In estimateGLMTrendedDisp.default(y = y$counts, design = design, : >> No residual df: cannot estimate dispersion >> >> >> One reason I can see is that the number of columns in my design matrix >> is "92" as they are more replicates/patients than in the 3.5 example. And >> there are only "90" columns in my "count" matrix. >> >> But, do you have any idea how can I solve this ? >> >> Thanks, >> Preethy >> >> >> >> >> >> On Fri, Apr 25, 2014 at 6:45 PM, James W. MacDonald <jmacdon@uw.edu<mailto:>> jmacdon@uw.edu>> wrote: >> >> Hi Preethy, >> >> This experiment is very similar to the example in part 3.5 of the >> edgeR User's guide, starting on page 31. >> >> Best, >> >> Jim >> >> >> >> On 4/25/2014 11:29 AM, Preethy Venkat Ram wrote: >> >> Hi Devon, >> >> Thanks for the replies both on biostars and here. Sorry for >> crossposting. Both >> mailing lists and discussion have been very helpful to me. >> >> But, I rarely see replies from BioC package maintainers at >> biostars. >> >> All the samples are paired and I have them all correct - I >> mean none of >> them are empty >> I tried different designs. But ending up with the same error >> message. >> What I want to do: Getting DE genes between >> (Phenotype1.Before-Phenotype1.After) & >> (Phenotype2.Before-Phenotype2.After) >> >> pdata here: >> >> pair=rep(c(1:45), each=2) >> Treat=rep(c("Before", "After"),45) >> Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, >> 2, 2, 2, 2, >> 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, >> 1, 2, 2, 1, 2, >> 2), each=2) >> pdata<-data.frame(pair,Treat,Phenotype) >> >> >> Preethy >> >> >> >> On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan@dpryan.com>> <mailto:dpryan@dpryan.com>> wrote: >> >> Hi Preethy, >> >> You likely want: >> >> design=model.matrix(~pair+Treat:Phenotype, data=pdata) >> >> If that still yields the error, then you'll need to share >> "pdata" or >> "design". Also, please don't crosspost on both this list >> and biostars ( >> https://www.biostars.org/p/98907/), it duplicates the >> community effort. >> >> Devon >> >> >> -- >> Devon Ryan, Ph.D. >> Email: dpryan@dpryan.com <mailto:dpryan@dpryan.com> >> >> Laboratory for Molecular and Cellular Cognition >> German Centre for Neurodegenerative Diseases (DZNE) >> Ludwig-Erhard-Allee 2 >> 53175 Bonn >> Germany >> <devon.ryan@dzne.de <mailto:devon.ryan@dzne.de="">> >> >> >> >> >> On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] >> <guest@bioconductor.org <mailto:guest@bioconductor.org="">> >>wrote: >> >> >> Hi All, >> >> I had been trying to do DE analysis of my RNAseq >> experiment using edgeR >> and am having some isssues. The details of the >> Experiment and the R code I >> tried below: >> >> (a) Paired experimental design with 45 pairs >> (b) Treatment: "Before" and "After" >> (c) Phenotype: 1 & 2 >> Aim: Look for DE genes between Phenotype 1 and 2 upon >> treatment taking >> into account the paired design >> >> The R code tried: >> >> library(edgeR) >> counts<-read.delim(file="counts.dat",header=T) >> pair=factor(pdata$pair) >> Treat=factor( pdata$treat) >> Phenotype=factor(pdata$pheno) >> group<-paste(Treat,Phenotype,sep=".") >> design=model.matrix(~pair+Treat:Phenotype, data=counts) >> counts.DGEList<-DGEList(counts, group=group) >> y<-calcNormFactors(counts.DGEList) >> y<-estimateCommonDisp(y, design) >> y<-estimateGLMTrendedDisp(y, design) >> >> >> Error message I get: >> >> Error in glmFit.default(y, design = design, dispersion >> = dispersion, >> offset = offset, : >> Design matrix not of full rank. The following >> coefficients not >> estimable: >> TreatBefore:Phenotype1 TreatBefore:Phenotype2 >> >> >> Any idea to solve this out? >> >> Thanks, >> Preethy >> >> -- output of sessionInfo(): >> >> R version 3.1.0 (2014-04-10) >> Platform: x86_64-pc-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 >> [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 >> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets >> methods base >> >> other attached packages: >> [1] edgeR_3.4.2 limma_3.18.13 >> >> >> -- >> Sent via the guest posting facility at >> bioconductor.org <http: bioconductor.org="">. >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> <mailto:bioconductor@r-project.org> >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics. >> conductor >> >> >> [[alternative HTML version deleted]] >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org <mailto:bioconductor@r-project.org> >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> >> > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > [[alternative HTML version deleted]]

ADD REPLY • link 11.0 years ago Preethy Venkat Ram ▴ 30

Login before adding your answer.