Question

DESeq2 error

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 5 days ago

United States

hi Xianjun, On Wed, Feb 5, 2014 at 12:26 PM, Dong, Xianjun <xdong@rics.bwh.harvard.edu>wrote: > Hi Mike, > > Thanks for quick response. It helps a lot. > > Note that the transformations have nothing to do with testing. They are > not used by DESeq() or results(). The transformations are an entirely > separate functionality included in our package for users who wish to do > clustering, visualization, or machine learning applications. This is > explained in the vignette. > > > Oh, this is very important clarification. I thought DEseq() use the > fancy vst, and then use generalized linear model to do the differential > analysis. May I know why DEseq2 not using vsd for testing, since you > already proved it better in stabilizing the variance in vst.pdf? Also, is > this same case for rlogTransformation()? > > In my last email I recommended you read the vignette, as it will help explain a lot of your questions. As would reading the documentation for ?DESeq. In the very first sentence of the section of the vignette on transformations we say: > For testing for differential expression we operate on raw counts and use > discrete distributions, however for other downstream analyses (e.g. for > visualization or clustering it might be useful to work with transformed > versions of the count data.) âDESeq() does not use the VST or the rlog transformation. It performs the exact steps outlined in ?DESeq, estimating model parameters, but always evaluating the likelihood of the model with respect to raw counts. Why? because this allows us to model and account for shot noise/Poisson variance and biological variance. At high counts this becomes less important, but performing modeling and inference on the raw count scale allows a consistent approach to high and low counts. We do not need to stabilize the variance ahead of DESeq(), because DESeq() involves an estimation of dispersion step which takes into account the dependence of variance on mean. âbest, Mikeâ > -Xianjun > > > On Feb 5, 2014, at 11:40 AM, Michael Love <michaelisaiahlove@gmail.com> > wrote: > > hi Xianjun, > > base mean is rowMeans(counts(dds, normalized=TRUE)), the mean of the > counts divided by the size factors. > > The transformations involve much more complicated mathematics than > simply dividing the size factors and taking logarithm. It is not expected > that this will relate 1-1 with the simple base mean calculation. > > Note that the transformations have nothing to do with testing. They are > not used by DESeq() or results(). The transformations are an entirely > separate functionality included in our package for users who wish to do > clustering, visualization, or machine learning applications. This is > explained in the vignette. > > I'm happy to answer further questions, but please CC the Bioconductor > mailing list. This way other users can benefit from the questions and > answers. > > best > > Mike > > > > > On Wed, Feb 5, 2014 at 11:27 AM, Dong, Xianjun <xdong@rics.bwh.harvard.edu> > wrote: > >> Hi Mike >> >> One more question: whatâs the baseMean from result(dds) mean? I tried to >> compare rowMeans(assay(vsd)) with result(dds)$baseMean. It seems they are >> different. I also tried log(rowMeans(assay(vsd))). They also look >> different. >> >> p.s. I calculate vsd as vsd <- varianceStabilizingTransformation(dds, >> blind=TRUE) >> >> >> > head(rowMeans(assay(vsd))) >> ENST00000516445.1___ENSG00000252254.1___misc_RNA___Y_RNA >> 4.570189 >> ENST00000363889.1___ENSG00000200759.1___snRNA___U6 >> 4.702925 >> ENST00000458942.1___ENSG00000238545.1___snoRNA___snoU13 >> 4.551350 >> ENST00000416470.1___ENSG00000235185.1___antisense___RP5-1056L3.1 >> 7.839727 >> ENST00000400490.2___ENSG00000158828.5___protein_coding___PINK1 >> 4.650183 >> ENST00000374514.3___ENSG00000011009.6___protein_coding___LYPLA2 >> 6.800622 >> --------------------------------------------------------------------- >> >> >> > head(res) >> DataFrame with 6 rows and 6 columns >> >> baseMean >> >> <numeric> >> ENST00000516445.1___ENSG00000252254.1___misc_RNA___Y_RNA >> 0.7759528 >> ENST00000363889.1___ENSG00000200759.1___snRNA___U6 >> 1.2084540 >> ENST00000458942.1___ENSG00000238545.1___snoRNA___snoU13 >> 0.5918945 >> ENST00000416470.1___ENSG00000235185.1___antisense___RP5-1056L3.1 >> 189.9427698 >> ENST00000400490.2___ENSG00000158828.5___protein_coding___PINK1 >> 1.1895997 >> ENST00000374514.3___ENSG00000011009.6___protein_coding___LYPLA2 >> 74.6999546 >> >> Instructor, Department of Neurology >> Harvard Medical School >> Faculty of Neurology Department >> Brigham and Women's Hospital >> >> Tel: (+1)617-768-8691 >> Address: 65 Landsdowne St, Cambridge, MA 02139 >> Blog: http://onetipperday.blogspot.com/ >> >> On Jan 16, 2014, at 8:07 AM, Michael Love <michaelisaiahlove@gmail.com> >> wrote: >> >> > hi Xianjun, >> > >> > >> > On Wed, Jan 15, 2014 at 11:03 PM, Dong, Xianjun < >> XDONG@rics.bwh.harvard.edu> wrote: >> > Hi Michael, >> > >> > I was recently using your DESeq2 a lot, but unfortunately I could not >> get any run successful when I use multiple factors. So, I wonder it must >> be something I misused or misunderstood. >> > >> > Here is one of such examples: >> > >> > > dds <- DESeqDataSetFromMatrix(countData = countData, colData = >> colData, design = ~ age + gender + PMI + condition) >> > it appears that the last variable in the design formula, 'condition', >> has a factor level, 'control', which is not the base level. we recommend >> you use factor(...,levels=...) or relevel() to set this as the base level >> before proceeding. for more information, please see the 'Note on factor >> levels' in vignette('DESeq2'). >> > > colData(dds)$condition <- factor(colData(dds)$condition, >> levels=c("control","case")) >> > > colData(dds)$condition <- relevel(colData(dds)$condition, "control") >> > > >> > > dds <- DESeq(dds) >> > estimating size factors >> > estimating dispersions >> > gene-wise dispersion estimates >> > >> > error: inv(): matrix appears to be singular >> > >> > Error: >> > >> > â >> > I thought I had solved many of these such problems in DESeq2 version >> 1.2. I don't know what version you are using, but if it is v1.0 that could >> be a solution.â >> > >> > >> > >> > >> > colData is like this: >> > > colData >> > condition age gender PMI >> > h3k4me3_HD_batch1_3576 case 55 1 37.25 >> > h3k4me3_HD_batch1_3584 case 56 2 19.00 >> > h3k4me3_HD_batch1_4189 case 71 1 20.50 >> > h3k4me3_HD_batch1_4242 case 69 1 19.06 >> > h3k4me3_HD_batch1_4412 case 43 1 21.30 >> > h3k4me3_HD_batch1_4430 case 45 1 3.73 >> > Schraham_1644 control 81 1 8.00 >> > Schraham_3706R control 63 1 24.50 >> > Schraham_7244 control 64 1 28.21 >> > Schraham_R30 control 74 1 12.00 >> > Schraham_R7 control 55 1 17.00 >> > >> > >> > âThe error means that during the estimation steps, it has to solve (X' >> W Xâ)^-1, where X is the design matrix and W is a weight matrix based on >> the variance model for each gene. >> > >> > Looking at the column data, I would guess that the single '2' in the >> gender column might be the problem, as this column is nearly the same as >> the intercept. If you remove the 'gender' variable from the design, do you >> still get the error? >> > >> > Also, in general, we recommend splitting continuous variables into >> factors like so: >> > >> > colData(dds)$ageFctr <- cut(colData(dds)$age, 3)) >> > >> > cutting the age into three groups, then: >> > >> > design â(dds) <- ~ ageâFctrâ + gender + PMI âFctrâ + condition >> > â >> > âThis allows a more general dependence of counts on continuous >> covariates rather than the exponential: >> > >> > counts ~ exp(age * beta) >> > >> > âIn some experiments, it makes sense to have this exponential increase >> in mRNA with a covariate, but with things like 'age' or other continuous >> covariates about subjects, it makes more sense to fit means for each group.â >> > >> > âMikeâ >> > >> > >> > >> > Do you have any clue for the error? >> > >> > -Xianjun >> > --------------------------------------------------------------------- >> > Instructor, Department of Neurology >> > Harvard Medical School >> > Faculty of Neurology Department >> > Brigham and Women's Hospital >> > >> > Tel: (+1)617-768-8691 >> > Address: 65 Landsdowne St, Cambridge, MA 02139 >> > Blog: http://onetipperday.blogspot.com/ >> > >> > >> > >> > The information in this e-mail is intended only for the person to whom >> it is >> > addressed. If you believe this e-mail was sent to you in error and the >> e-mail >> > contains patient information, please contact the Partners Compliance >> HelpLine at >> > http://www.partners.org/complianceline . If the e-mail was sent to you >> in error >> > but does not contain patient information, please contact the sender and >> properly >> > dispose of the e-mail. >> > >> > >> >> > > [[alternative HTML version deleted]]

Visualization Clustering DESeq2 Visualization Clustering DESeq2 • 2.9k views

ADD COMMENT • link 11.2 years ago Michael Love 43k

score 0 · Answer 1 · 2014-02-05

hi Xianjun, Let's keep all questions on the mailing list. On Wed, Feb 5, 2014 at 4:16 PM, Dong, Xianjun <xdong@rics.bwh.harvard.edu>wrote: > > > One more question if you donât mind to answer: > According to your reply, baseMean=rowMeans(counts(dds, normalized=TRUE)). > Just a comment: itâs mean for all samples, not mean of two means, e.g. > (mean_case + mean_control)/2. When the sample size are different btw case > and control, they are different. I am sure you already noticed this. > My question is about log2FoldChange. From what I understand, itâs > log2(mean_case/mean_control), where mean_case and mean_control are based on > normalized count e.g. counts(dds, normalized=TRUE). Am I right? but it > seems that I am wrong, because when I calculate the log2FoldChange in the > way I understood, itâs different from the log2FoldChange output from > results(). Could you help on this? > > This is answered in the manual pages and vignette, specifically ?nbinomWaldTest in the man pages, and Fig 1 and Section 4.2 "Changes compared to the DESeq package" âMikeâ > Best, > Xianjun > --------------------------------------------------------------------- > > Instructor, Department of Neurology > Harvard Medical School > Faculty of Neurology Department > Brigham and Women's Hospital > > Tel: (+1)617-768-8691 > Address: 65 Landsdowne St, Cambridge, MA 02139 > Blog: http://onetipperday.blogspot.com/ > > On Feb 5, 2014, at 12:47 PM, Michael Love <michaelisaiahlove@gmail.com> > wrote: > > hi Xianjun, > > > > On Wed, Feb 5, 2014 at 12:26 PM, Dong, Xianjun <xdong@rics.bwh.harvard.edu> > wrote: > >> Hi Mike, >> >> Thanks for quick response. It helps a lot. >> >> Note that the transformations have nothing to do with testing. They >> are not used by DESeq() or results(). The transformations are an >> entirely separate functionality included in our package for users who wish >> to do clustering, visualization, or machine learning applications. This is >> explained in the vignette. >> >> >> Oh, this is very important clarification. I thought DEseq() use the >> fancy vst, and then use generalized linear model to do the differential >> analysis. May I know why DEseq2 not using vsd for testing, since you >> already proved it better in stabilizing the variance in vst.pdf? Also, is >> this same case for rlogTransformation()? >> >> > In my last email I recommended you read the vignette, as it will help > explain a lot of your questions. As would reading the documentation for > ?DESeq. In the very first sentence of the section of the vignette on > transformations we say: > > >> For testing for differential expression we operate on raw counts and use >> discrete distributions, however for other downstream analyses (e.g. for >> visualization or clustering it might be useful to work with transformed >> versions of the count data.) > > > âDESeq() does not use the VST or the rlog transformation. It performs > the exact steps outlined in ?DESeq, estimating model parameters, but always > evaluating the likelihood of the model with respect to raw counts. Why? > because this allows us to model and account for shot noise/Poisson variance > and biological variance. At high counts this becomes less important, but > performing modeling and inference on the raw count scale allows a > consistent approach to high and low counts. We do not need to stabilize > the variance ahead of DESeq(), because DESeq() involves an estimation of > dispersion step which takes into account the dependence of variance on mean. > > âbest, > > Mikeâ > > > > > >> -Xianjun >> >> >> On Feb 5, 2014, at 11:40 AM, Michael Love <michaelisaiahlove@gmail.com> >> wrote: >> >> hi Xianjun, >> >> base mean is rowMeans(counts(dds, normalized=TRUE)), the mean of the >> counts divided by the size factors. >> >> The transformations involve much more complicated mathematics than >> simply dividing the size factors and taking logarithm. It is not expected >> that this will relate 1-1 with the simple base mean calculation. >> >> Note that the transformations have nothing to do with testing. They are >> not used by DESeq() or results(). The transformations are an entirely >> separate functionality included in our package for users who wish to do >> clustering, visualization, or machine learning applications. This is >> explained in the vignette. >> >> I'm happy to answer further questions, but please CC the Bioconductor >> mailing list. This way other users can benefit from the questions and >> answers. >> >> best >> >> Mike >> >> >> >> >> On Wed, Feb 5, 2014 at 11:27 AM, Dong, Xianjun < >> XDONG@rics.bwh.harvard.edu> wrote: >> >>> Hi Mike >>> >>> One more question: whatâs the baseMean from result(dds) mean? I tried to >>> compare rowMeans(assay(vsd)) with result(dds)$baseMean. It seems they are >>> different. I also tried log(rowMeans(assay(vsd))). They also look >>> different. >>> >>> p.s. I calculate vsd as vsd <- varianceStabilizingTransformation(dds, >>> blind=TRUE) >>> >>> >>> > head(rowMeans(assay(vsd))) >>> ENST00000516445.1___ENSG00000252254.1___misc_RNA___Y_RNA >>> 4.570189 >>> ENST00000363889.1___ENSG00000200759.1___snRNA___U6 >>> 4.702925 >>> ENST00000458942.1___ENSG00000238545.1___snoRNA___snoU13 >>> 4.551350 >>> ENST00000416470.1___ENSG00000235185.1___antisense___RP5-1056L3.1 >>> 7.839727 >>> ENST00000400490.2___ENSG00000158828.5___protein_coding___PINK1 >>> 4.650183 >>> ENST00000374514.3___ENSG00000011009.6___protein_coding___LYPLA2 >>> 6.800622 >>> --------------------------------------------------------------------- >>> >>> >>> > head(res) >>> DataFrame with 6 rows and 6 columns >>> >>> baseMean >>> >>> <numeric> >>> ENST00000516445.1___ENSG00000252254.1___misc_RNA___Y_RNA >>> 0.7759528 >>> ENST00000363889.1___ENSG00000200759.1___snRNA___U6 >>> 1.2084540 >>> ENST00000458942.1___ENSG00000238545.1___snoRNA___snoU13 >>> 0.5918945 >>> ENST00000416470.1___ENSG00000235185.1___antisense___RP5-1056L3.1 >>> 189.9427698 >>> ENST00000400490.2___ENSG00000158828.5___protein_coding___PINK1 >>> 1.1895997 >>> ENST00000374514.3___ENSG00000011009.6___protein_coding___LYPLA2 >>> 74.6999546 >>> >>> Instructor, Department of Neurology >>> Harvard Medical School >>> Faculty of Neurology Department >>> Brigham and Women's Hospital >>> >>> Tel: (+1)617-768-8691 >>> Address: 65 Landsdowne St, Cambridge, MA 02139 >>> Blog: http://onetipperday.blogspot.com/ >>> >>> On Jan 16, 2014, at 8:07 AM, Michael Love <michaelisaiahlove@gmail.com> >>> wrote: >>> >>> > hi Xianjun, >>> > >>> > >>> > On Wed, Jan 15, 2014 at 11:03 PM, Dong, Xianjun < >>> XDONG@rics.bwh.harvard.edu> wrote: >>> > Hi Michael, >>> > >>> > I was recently using your DESeq2 a lot, but unfortunately I could not >>> get any run successful when I use multiple factors. So, I wonder it must >>> be something I misused or misunderstood. >>> > >>> > Here is one of such examples: >>> > >>> > > dds <- DESeqDataSetFromMatrix(countData = countData, colData = >>> colData, design = ~ age + gender + PMI + condition) >>> > it appears that the last variable in the design formula, 'condition', >>> has a factor level, 'control', which is not the base level. we recommend >>> you use factor(...,levels=...) or relevel() to set this as the base level >>> before proceeding. for more information, please see the 'Note on factor >>> levels' in vignette('DESeq2'). >>> > > colData(dds)$condition <- factor(colData(dds)$condition, >>> levels=c("control","case")) >>> > > colData(dds)$condition <- relevel(colData(dds)$condition, "control") >>> > > >>> > > dds <- DESeq(dds) >>> > estimating size factors >>> > estimating dispersions >>> > gene-wise dispersion estimates >>> > >>> > error: inv(): matrix appears to be singular >>> > >>> > Error: >>> > >>> > â >>> > I thought I had solved many of these such problems in DESeq2 version >>> 1.2. I don't know what version you are using, but if it is v1.0 that could >>> be a solution.â >>> > >>> > >>> > >>> > >>> > colData is like this: >>> > > colData >>> > condition age gender PMI >>> > h3k4me3_HD_batch1_3576 case 55 1 37.25 >>> > h3k4me3_HD_batch1_3584 case 56 2 19.00 >>> > h3k4me3_HD_batch1_4189 case 71 1 20.50 >>> > h3k4me3_HD_batch1_4242 case 69 1 19.06 >>> > h3k4me3_HD_batch1_4412 case 43 1 21.30 >>> > h3k4me3_HD_batch1_4430 case 45 1 3.73 >>> > Schraham_1644 control 81 1 8.00 >>> > Schraham_3706R control 63 1 24.50 >>> > Schraham_7244 control 64 1 28.21 >>> > Schraham_R30 control 74 1 12.00 >>> > Schraham_R7 control 55 1 17.00 >>> > >>> > >>> > âThe error means that during the estimation steps, it has to solve (X' >>> W Xâ)^-1, where X is the design matrix and W is a weight matrix based on >>> the variance model for each gene. >>> > >>> > Looking at the column data, I would guess that the single '2' in the >>> gender column might be the problem, as this column is nearly the same as >>> the intercept. If you remove the 'gender' variable from the design, do you >>> still get the error? >>> > >>> > Also, in general, we recommend splitting continuous variables into >>> factors like so: >>> > >>> > colData(dds)$ageFctr <- cut(colData(dds)$age, 3)) >>> > >>> > cutting the age into three groups, then: >>> > >>> > design â(dds) <- ~ ageâFctrâ + gender + PMI âFctrâ + condition >>> > â >>> > âThis allows a more general dependence of counts on continuous >>> covariates rather than the exponential: >>> > >>> > counts ~ exp(age * beta) >>> > >>> > âIn some experiments, it makes sense to have this exponential increase >>> in mRNA with a covariate, but with things like 'age' or other continuous >>> covariates about subjects, it makes more sense to fit means for each group.â >>> > >>> > âMikeâ >>> > >>> > >>> > >>> > Do you have any clue for the error? >>> > >>> > -Xianjun >>> > --------------------------------------------------------------------- >>> > Instructor, Department of Neurology >>> > Harvard Medical School >>> > Faculty of Neurology Department >>> > Brigham and Women's Hospital >>> > >>> > Tel: (+1)617-768-8691 >>> > Address: 65 Landsdowne St, Cambridge, MA 02139 >>> > Blog: http://onetipperday.blogspot.com/ >>> > >>> > >>> > >>> > The information in this e-mail is intended only for the person to whom >>> it is >>> > addressed. If you believe this e-mail was sent to you in error and the >>> e-mail >>> > contains patient information, please contact the Partners Compliance >>> HelpLine at >>> > http://www.partners.org/complianceline . If the e-mail was sent to >>> you in error >>> > but does not contain patient information, please contact the sender >>> and properly >>> > dispose of the e-mail. >>> > >>> > >>> >>> >> >> > > [[alternative HTML version deleted]]

score 0 · Answer 2 · 2014-02-05

Hi Mike, Thanks for quick response. It helps a lot. Note that the transformations have nothing to do with testing. They are not used by DESeq() or results(). The transformations are an entirely separate functionality included in our package for users who wish to do clustering, visualization, or machine learning applications. This is explained in the vignette. Oh, this is very important clarification. I thought DEseq() use the fancy vst, and then use generalized linear model to do the differential analysis. May I know why DEseq2 not using vsd for testing, since you already proved it better in stabilizing the variance in vst.pdf? Also, is this same case for rlogTransformation()? -Xianjun On Feb 5, 2014, at 11:40 AM, Michael Love <michaelisaiahlove@gmail.com<mailto:michaelisaiahlove@gmail.com>> wrote: hi Xianjun, base mean is rowMeans(counts(dds, normalized=TRUE)), the mean of the counts divided by the size factors. The transformations involve much more complicated mathematics than simply dividing the size factors and taking logarithm. It is not expected that this will relate 1-1 with the simple base mean calculation. Note that the transformations have nothing to do with testing. They are not used by DESeq() or results(). The transformations are an entirely separate functionality included in our package for users who wish to do clustering, visualization, or machine learning applications. This is explained in the vignette. I'm happy to answer further questions, but please CC the Bioconductor mailing list. This way other users can benefit from the questions and answers. best Mike On Wed, Feb 5, 2014 at 11:27 AM, Dong, Xianjun <xdong@rics.bwh.harvard.edu<mailto:xdong@rics.bwh.harvard.edu>> wrote: Hi Mike One more question: whatâs the baseMean from result(dds) mean? I tried to compare rowMeans(assay(vsd)) with result(dds)$baseMean. It seems they are different. I also tried log(rowMeans(assay(vsd))). They also look different. p.s. I calculate vsd as vsd <- varianceStabilizingTransformation(dds, blind=TRUE) > head(rowMeans(assay(vsd))) ENST00000516445.1___ENSG00000252254.1___misc_RNA___Y_RNA 4.570189 ENST00000363889.1___ENSG00000200759.1___snRNA___U6 4.702925 ENST00000458942.1___ENSG00000238545.1___snoRNA___snoU13 4.551350 ENST00000416470.1___ENSG00000235185.1___antisense___RP5-1056L3.1 7.839727 ENST00000400490.2___ENSG00000158828.5___protein_coding___PINK1 4.650183 ENST00000374514.3___ENSG00000011009.6___protein_coding___LYPLA2 6.800622 --------------------------------------------------------------------- > head(res) DataFrame with 6 rows and 6 columns baseMean <numeric> ENST00000516445.1___ENSG00000252254.1___misc_RNA___Y_RNA 0.7759528 ENST00000363889.1___ENSG00000200759.1___snRNA___U6 1.2084540 ENST00000458942.1___ENSG00000238545.1___snoRNA___snoU13 0.5918945 ENST00000416470.1___ENSG00000235185.1___antisense___RP5-1056L3.1 189.9427698 ENST00000400490.2___ENSG00000158828.5___protein_coding___PINK1 1.1895997 ENST00000374514.3___ENSG00000011009.6___protein_coding___LYPLA2 74.6999546 Instructor, Department of Neurology Harvard Medical School Faculty of Neurology Department Brigham and Women's Hospital Tel: (+1)617-768-8691<tel:%28%2b1%29617-768-8691> Address: 65 Landsdowne St, Cambridge, MA 02139 Blog: http://onetipperday.blogspot.com/ On Jan 16, 2014, at 8:07 AM, Michael Love <michaelisaiahlove@gmail.com<mailto:michaelisaiahlove@gmail.com>> wrote: > hi Xianjun, > > > On Wed, Jan 15, 2014 at 11:03 PM, Dong, Xianjun <xdong@rics.bwh.harvard.edu<mailto:xdong@rics.bwh.harvard.edu>> wrote: > Hi Michael, > > I was recently using your DESeq2 a lot, but unfortunately I could not get any run successful when I use multiple factors. So, I wonder it must be something I misused or misunderstood. > > Here is one of such examples: > > > dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ age + gender + PMI + condition) > it appears that the last variable in the design formula, 'condition', has a factor level, 'control', which is not the base level. we recommend you use factor(...,levels=...) or relevel() to set this as the base level before proceeding. for more information, please see the 'Note on factor levels' in vignette('DESeq2'). > > colData(dds)$condition <- factor(colData(dds)$condition, levels=c("control","case")) > > colData(dds)$condition <- relevel(colData(dds)$condition, "control") > > > > dds <- DESeq(dds) > estimating size factors > estimating dispersions > gene-wise dispersion estimates > > error: inv(): matrix appears to be singular > > Error: > > â > I thought I had solved many of these such problems in DESeq2 version 1.2. I don't know what version you are using, but if it is v1.0 that could be a solution.â > > > > > colData is like this: > > colData > condition age gender PMI > h3k4me3_HD_batch1_3576 case 55 1 37.25 > h3k4me3_HD_batch1_3584 case 56 2 19.00 > h3k4me3_HD_batch1_4189 case 71 1 20.50 > h3k4me3_HD_batch1_4242 case 69 1 19.06 > h3k4me3_HD_batch1_4412 case 43 1 21.30 > h3k4me3_HD_batch1_4430 case 45 1 3.73 > Schraham_1644 control 81 1 8.00 > Schraham_3706R control 63 1 24.50 > Schraham_7244 control 64 1 28.21 > Schraham_R30 control 74 1 12.00 > Schraham_R7 control 55 1 17.00 > > > âThe error means that during the estimation steps, it has to solve (X' W Xâ)^-1, where X is the design matrix and W is a weight matrix based on the variance model for each gene. > > Looking at the column data, I would guess that the single '2' in the gender column might be the problem, as this column is nearly the same as the intercept. If you remove the 'gender' variable from the design, do you still get the error? > > Also, in general, we recommend splitting continuous variables into factors like so: > > colData(dds)$ageFctr <- cut(colData(dds)$age, 3)) > > cutting the age into three groups, then: > > design â(dds) <- ~ ageâFctrâ + gender + PMI âFctrâ + condition > â > âThis allows a more general dependence of counts on continuous covariates rather than the exponential: > > counts ~ exp(age * beta) > > âIn some experiments, it makes sense to have this exponential increase in mRNA with a covariate, but with things like 'age' or other continuous covariates about subjects, it makes more sense to fit means for each group.â > > âMikeâ > > > > Do you have any clue for the error? > > -Xianjun > --------------------------------------------------------------------- > Instructor, Department of Neurology > Harvard Medical School > Faculty of Neurology Department > Brigham and Women's Hospital > > Tel: (+1)617-768-8691 > Address: 65 Landsdowne St, Cambridge, MA 02139 > Blog: http://onetipperday.blogspot.com/ > > > > The information in this e-mail is intended only for the person to whom it is > addressed. If you believe this e-mail was sent to you in error and the e-mail > contains patient information, please contact the Partners Compliance HelpLine at > http://www.partners.org/complianceline . If the e-mail was sent to you in error > but does not contain patient information, please contact the sender and properly > dispose of the e-mail. > > [[alternative HTML version deleted]]