Entering edit mode
hi Xianjun,
On Wed, Feb 5, 2014 at 12:26 PM, Dong, Xianjun
<xdong@rics.bwh.harvard.edu>wrote:
> Hi Mike,
>
> Thanks for quick response. It helps a lot.
>
> Note that the transformations have nothing to do with testing.
They are
> not used by DESeq() or results(). The transformations are an
entirely
> separate functionality included in our package for users who wish to
do
> clustering, visualization, or machine learning applications. This
is
> explained in the vignette.
>
>
> Oh, this is very important clarification. I thought DEseq() use the
> fancy vst, and then use generalized linear model to do the
differential
> analysis. May I know why DEseq2 not using vsd for testing, since you
> already proved it better in stabilizing the variance in vst.pdf?
Also, is
> this same case for rlogTransformation()?
>
>
In my last email I recommended you read the vignette, as it will help
explain a lot of your questions. As would reading the documentation
for
?DESeq. In the very first sentence of the section of the vignette on
transformations we say:
> For testing for differential expression we operate on raw counts and
use
> discrete distributions, however for other downstream analyses (e.g.
for
> visualization or clustering it might be useful to work with
transformed
> versions of the count data.)
âDESeq() does not use the VST or the rlog transformation. It
performs the
exact steps outlined in ?DESeq, estimating model parameters, but
always
evaluating the likelihood of the model with respect to raw counts.
Why?
because this allows us to model and account for shot noise/Poisson
variance
and biological variance. At high counts this becomes less important,
but
performing modeling and inference on the raw count scale allows a
consistent approach to high and low counts. We do not need to
stabilize
the variance ahead of DESeq(), because DESeq() involves an estimation
of
dispersion step which takes into account the dependence of variance on
mean.
âbest,
Mikeâ
> -Xianjun
>
>
> On Feb 5, 2014, at 11:40 AM, Michael Love
<michaelisaiahlove@gmail.com>
> wrote:
>
> hi Xianjun,
>
> base mean is rowMeans(counts(dds, normalized=TRUE)), the mean of
the
> counts divided by the size factors.
>
> The transformations involve much more complicated mathematics than
> simply dividing the size factors and taking logarithm. It is not
expected
> that this will relate 1-1 with the simple base mean calculation.
>
> Note that the transformations have nothing to do with testing. They
are
> not used by DESeq() or results(). The transformations are an
entirely
> separate functionality included in our package for users who wish to
do
> clustering, visualization, or machine learning applications. This
is
> explained in the vignette.
>
> I'm happy to answer further questions, but please CC the
Bioconductor
> mailing list. This way other users can benefit from the questions
and
> answers.
>
> best
>
> Mike
>
>
>
>
> On Wed, Feb 5, 2014 at 11:27 AM, Dong, Xianjun
<xdong@rics.bwh.harvard.edu> > wrote:
>
>> Hi Mike
>>
>> One more question: whatâs the baseMean from result(dds) mean? I
tried to
>> compare rowMeans(assay(vsd)) with result(dds)$baseMean. It seems
they are
>> different. I also tried log(rowMeans(assay(vsd))). They also look
>> different.
>>
>> p.s. I calculate vsd as vsd <-
varianceStabilizingTransformation(dds,
>> blind=TRUE)
>>
>>
>> > head(rowMeans(assay(vsd)))
>> ENST00000516445.1___ENSG00000252254.1___misc_RNA___Y_RNA
>> 4.570189
>> ENST00000363889.1___ENSG00000200759.1___snRNA___U6
>> 4.702925
>> ENST00000458942.1___ENSG00000238545.1___snoRNA___snoU13
>> 4.551350
>> ENST00000416470.1___ENSG00000235185.1___antisense___RP5-1056L3.1
>> 7.839727
>> ENST00000400490.2___ENSG00000158828.5___protein_coding___PINK1
>> 4.650183
>> ENST00000374514.3___ENSG00000011009.6___protein_coding___LYPLA2
>> 6.800622
>>
---------------------------------------------------------------------
>>
>>
>> > head(res)
>> DataFrame with 6 rows and 6 columns
>>
>> baseMean
>>
>> <numeric>
>> ENST00000516445.1___ENSG00000252254.1___misc_RNA___Y_RNA
>> 0.7759528
>> ENST00000363889.1___ENSG00000200759.1___snRNA___U6
>> 1.2084540
>> ENST00000458942.1___ENSG00000238545.1___snoRNA___snoU13
>> 0.5918945
>> ENST00000416470.1___ENSG00000235185.1___antisense___RP5-1056L3.1
>> 189.9427698
>> ENST00000400490.2___ENSG00000158828.5___protein_coding___PINK1
>> 1.1895997
>> ENST00000374514.3___ENSG00000011009.6___protein_coding___LYPLA2
>> 74.6999546
>>
>> Instructor, Department of Neurology
>> Harvard Medical School
>> Faculty of Neurology Department
>> Brigham and Women's Hospital
>>
>> Tel: (+1)617-768-8691
>> Address: 65 Landsdowne St, Cambridge, MA 02139
>> Blog: http://onetipperday.blogspot.com/
>>
>> On Jan 16, 2014, at 8:07 AM, Michael Love
<michaelisaiahlove@gmail.com>
>> wrote:
>>
>> > hi Xianjun,
>> >
>> >
>> > On Wed, Jan 15, 2014 at 11:03 PM, Dong, Xianjun <
>> XDONG@rics.bwh.harvard.edu> wrote:
>> > Hi Michael,
>> >
>> > I was recently using your DESeq2 a lot, but unfortunately I could
not
>> get any run successful when I use multiple factors. So, I wonder
it must
>> be something I misused or misunderstood.
>> >
>> > Here is one of such examples:
>> >
>> > > dds <- DESeqDataSetFromMatrix(countData = countData, colData =
>> colData, design = ~ age + gender + PMI + condition)
>> > it appears that the last variable in the design formula,
'condition',
>> has a factor level, 'control', which is not the base level. we
recommend
>> you use factor(...,levels=...) or relevel() to set this as the base
level
>> before proceeding. for more information, please see the 'Note on
factor
>> levels' in vignette('DESeq2').
>> > > colData(dds)$condition <- factor(colData(dds)$condition,
>> levels=c("control","case"))
>> > > colData(dds)$condition <- relevel(colData(dds)$condition,
"control")
>> > >
>> > > dds <- DESeq(dds)
>> > estimating size factors
>> > estimating dispersions
>> > gene-wise dispersion estimates
>> >
>> > error: inv(): matrix appears to be singular
>> >
>> > Error:
>> >
>> > â
>> > I thought I had solved many of these such problems in DESeq2
version
>> 1.2. I don't know what version you are using, but if it is v1.0
that could
>> be a solution.â
>> >
>> >
>> >
>> >
>> > colData is like this:
>> > > colData
>> > condition age gender PMI
>> > h3k4me3_HD_batch1_3576 case 55 1 37.25
>> > h3k4me3_HD_batch1_3584 case 56 2 19.00
>> > h3k4me3_HD_batch1_4189 case 71 1 20.50
>> > h3k4me3_HD_batch1_4242 case 69 1 19.06
>> > h3k4me3_HD_batch1_4412 case 43 1 21.30
>> > h3k4me3_HD_batch1_4430 case 45 1 3.73
>> > Schraham_1644 control 81 1 8.00
>> > Schraham_3706R control 63 1 24.50
>> > Schraham_7244 control 64 1 28.21
>> > Schraham_R30 control 74 1 12.00
>> > Schraham_R7 control 55 1 17.00
>> >
>> >
>> > âThe error means that during the estimation steps, it has to
solve (X'
>> W Xâ)^-1, where X is the design matrix and W is a weight matrix
based on
>> the variance model for each gene.
>> >
>> > Looking at the column data, I would guess that the single '2' in
the
>> gender column might be the problem, as this column is nearly the
same as
>> the intercept. If you remove the 'gender' variable from the design,
do you
>> still get the error?
>> >
>> > Also, in general, we recommend splitting continuous variables
into
>> factors like so:
>> >
>> > colData(dds)$ageFctr <- cut(colData(dds)$age, 3))
>> >
>> > cutting the age into three groups, then:
>> >
>> > design â(dds) <- ~ ageâFctrâ + gender + PMI âFctrâ +
condition
>> > â
>> > âThis allows a more general dependence of counts on continuous
>> covariates rather than the exponential:
>> >
>> > counts ~ exp(age * beta)
>> >
>> > âIn some experiments, it makes sense to have this exponential
increase
>> in mRNA with a covariate, but with things like 'age' or other
continuous
>> covariates about subjects, it makes more sense to fit means for
each group.â
>> >
>> > âMikeâ
>> >
>> >
>> >
>> > Do you have any clue for the error?
>> >
>> > -Xianjun
>> >
---------------------------------------------------------------------
>> > Instructor, Department of Neurology
>> > Harvard Medical School
>> > Faculty of Neurology Department
>> > Brigham and Women's Hospital
>> >
>> > Tel: (+1)617-768-8691
>> > Address: 65 Landsdowne St, Cambridge, MA 02139
>> > Blog: http://onetipperday.blogspot.com/
>> >
>> >
>> >
>> > The information in this e-mail is intended only for the person to
whom
>> it is
>> > addressed. If you believe this e-mail was sent to you in error
and the
>> e-mail
>> > contains patient information, please contact the Partners
Compliance
>> HelpLine at
>> > http://www.partners.org/complianceline . If the e-mail was sent
to you
>> in error
>> > but does not contain patient information, please contact the
sender and
>> properly
>> > dispose of the e-mail.
>> >
>> >
>>
>>
>
>
[[alternative HTML version deleted]]