Hi Sophie,
On Wed, Apr 23, 2014 at 5:44 PM, Sophie Josephine Weiss
<sophie.weiss at colorado.edu> wrote:
>
> Thanks Michael,
> The entire dataset (attached code and .biom) is negatives
I don't see that the entire dataset is all negatives. I get the same
percent of negatives as you had zeros in the original counts:
> z <- otu_table(x)
> zz <- otu_table(DESeq_data)
> table(as.vector(z) > 0) / prod(dim(z))
     FALSE       TRUE
0.98022416 0.01977584
> table(as.vector(zz) > 0) / prod(dim(zz))
     FALSE       TRUE
0.98022416 0.01977584
> summary(as.vector(zz))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -1.200  -1.197  -1.197  -1.126  -1.197 280.300
> summary(as.vector(zz)[as.vector(zz) > 0])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.304   1.304   1.304   2.409   2.476 280.300
If you want to do something downstream which requires non-negative
values, set all the negative values to 0, as I wrote previously. Or you
can add the absolute value of the smallest value: if the smallest value
is -1.200, just add 1.2 to the matrix. I don't have a recommendation,
though, as to which of these is the better choice here.
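
A minimal sketch of the two options above, using a small made-up matrix
in place of the real transformed table. The values and the names
vst_mat, thresholded, and shifted are illustrative assumptions, not
taken from the attached dataset:

```r
# Toy stand-in for the variance-stabilized matrix; values are illustrative only.
vst_mat <- matrix(c(-1.2, -1.197, 1.304, 2.476, -1.197, 280.3), nrow = 3)

# Option 1: threshold at zero, so every negative entry becomes 0.
thresholded <- pmax(vst_mat, 0)

# Option 2: add the absolute value of the smallest entry to the whole matrix,
# so the minimum becomes 0 and differences between entries are preserved.
# This assumes the smallest entry is negative.
shifted <- vst_mat + abs(min(vst_mat))

# Both results are free of negative values.
stopifnot(all(thresholded >= 0), all(shifted >= 0))
```

Thresholding collapses all negative entries to the same value, while
shifting keeps their relative spacing but moves every entry, including
the positive ones. Which behavior is preferable likely depends on the
downstream method.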
Mike
>
> - there was an error of "out of vertex space" as described here, so
> I tried setting maxk=300 as suggested.
> Commands are below.
> Thanks again!
> Sophie
>
> source("http://bioconductor.org/biocLite.R")
> biocLite("phyloseq")
> biocLite("DESeq")
>
> library("phyloseq")
> library("DESeq")
> library("biom")
>
> file = "~/Downloads/study_449_closed_reference_otu_table.biom"
> x = import_biom(file)
> source("~/Downloads/deseq_varstab.R")
> DESeq_data = deseq_varstab(x, method = "blind", sharingMode = "maximum",
>     fitType = "local", locfit_extra_args = list(maxk = 300))
> write_biom(make_biom(DESeq_data@otu_table),
>     "~/Desktop/449_Costello_DESeq.biom.tsv")
>
>
> On Sat, Apr 19, 2014 at 11:29 AM, Michael Love <michaelisaiahlove at gmail.com> wrote:
>>
>> hi Sophie,
>>
>> You are getting negative values from the transformation for the
>> reason I mentioned earlier: the transformation is log2-like.
>>
>> If you want to do something downstream of our software which requires
>> non-negative values, below is some example code of how to threshold
>> negative values for a matrix in R.
>>
>> The question of what is the best distance to use for taxa counts, or
>> whether ANOVA on variance stabilized data is a good idea for taxa
>> counts, depends on the properties of the data, and this is an area of
>> active research. As I don't have experience analyzing this kind of
>> data, I don't want to make any guesses.
>>
>> > m <- matrix(-2:5, ncol=2)
>> > m
>>      [,1] [,2]
>> [1,]   -2    2
>> [2,]   -1    3
>> [3,]    0    4
>> [4,]    1    5
>> > m[m < 0] <- 0
>> > m
>>      [,1] [,2]
>> [1,]    0    2
>> [2,]    0    3
>> [3,]    0    4
>> [4,]    1    5
>>
>> On Fri, Apr 18, 2014 at 3:32 PM, Sophie Josephine Weiss
>> <sophie.weiss at colorado.edu> wrote:
>> > Hi Mike,
>> > Could you please check whether I am running this correctly? I have
>> > double checked all the parameters, but for some reason, I am getting
>> > negatives using the R script on the attached .biom dataset. There are
>> > no replicates in this microbial dataset.
>> > Thanks for your advice,
>> > Sophie
>> >
>> >
>> > On Wed, Apr 16, 2014 at 4:02 PM, Sophie Josephine Weiss
>> > <sophie.weiss at colorado.edu> wrote:
>> >>
>> >> Thanks Mike, that is what I thought. What if we wanted to perform
>> >> Kruskal-Wallis, or is it possible to perform ANOVA on the
>> >> variance-stabilized matrix?
>> >>
>> >>
>> >> On Wed, Apr 16, 2014 at 2:29 PM, Michael Love
>> >> <michaelisaiahlove at gmail.com> wrote:
>> >>>
>> >>> hi Sophie,
>> >>>
>> >>> We recommend using the standard DESeq() function for differential
>> >>> expression.
>> >>>
>> >>> This is mentioned in the first line of the vignette section on
>> >>> transformations:
>> >>>
>> >>> "In order to test for differential expression, we operate on raw
>> >>> counts and use discrete distributions as described in the previous
>> >>> section"
>> >>>
>> >>> Also, in the McMurdie and Holmes paper, they use the DESeq()
>> >>> function, as shown in their supplemental material:
>> >>>
>> >>> http://joey711.github.io/waste-not-supplemental/simulation-differential-abundance/simulation-differential-abundance-server.html
>> >>>
>> >>> On Wed, Apr 16, 2014 at 3:22 PM, Sophie Josephine Weiss
>> >>> <sophie.weiss at colorado.edu> wrote:
>> >>> > Please help with this? Thanks again.
>> >>> >
>> >>> >
>> >>> > On Mon, Apr 14, 2014 at 6:02 PM, Sophie Josephine Weiss
>> >>> > <sophie.weiss at colorado.edu> wrote:
>> >>> >>
>> >>> >> Thanks again Mike - would it be ok to do chi-squared and other
>> >>> >> significance tests on the DESeq transformed datasets using
>> >>> >> independent code, or is it necessary to do the differential
>> >>> >> expression tests strictly within DESeq2?
>> >>> >>
>> >>> >> Sophie
>> >>> >>
>> >>> >>
>> >>> >> On Mon, Apr 14, 2014 at 5:41 PM, Michael Love
>> >>> >> <michaelisaiahlove at gmail.com> wrote:
>> >>> >>>
>> >>> >>> hi Sophie,
>> >>> >>>
>> >>> >>> The VST code is the same in DESeq and DESeq2. The estimation of
>> >>> >>> dispersion is slightly different (details are in the vignette
>> >>> >>> "Changes from DESeq to DESeq2"), but the fitted line (which is
>> >>> >>> used by the VST) should be very similar.
>> >>> >>>
>> >>> >>> Mike
>> >>> >>>
>> >>> >>> On Mon, Apr 14, 2014 at 6:27 PM, Sophie Josephine Weiss
>> >>> >>> <sophie.weiss at colorado.edu> wrote:
>> >>> >>> > Hi Mike,
>> >>> >>> > The McMurdie and Holmes paper uses DESeq for matrix
>> >>> >>> > normalization - do you think that is ok, or would it be better
>> >>> >>> > to use DESeq2?
>> >>> >>> > Thanks again,
>> >>> >>> > Sophie
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > On Mon, Apr 14, 2014 at 3:40 PM, Michael Love
>> >>> >>> > <michaelisaiahlove at gmail.com> wrote:
>> >>> >>> >>
>> >>> >>> >> hi Sophie,
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >> On Mon, Apr 14, 2014 at 1:15 PM, Sophie Josephine Weiss
>> >>> >>> >> <sophie.weiss at colorado.edu> wrote:
>> >>> >>> >> >
>> >>> >>> >> > Hi Mike,
>> >>> >>> >> > Thanks for the references. By "threshold at 0" do you mean
>> >>> >>> >> > set any negative values equal to 0?
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >> yes.
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >> >
>> >>> >>> >> > Do you think this is the best approach?
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >> I haven't explored this area, and would defer to the McMurdie
>> >>> >>> >> and Holmes paper for the best combinations of distance and
>> >>> >>> >> transformation.
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >> >
>> >>> >>> >> > Thanks again,
>> >>> >>> >> > Sophie
>> >>> >>> >> >
>> >>> >>> >> >
>> >>> >>> >> > On Mon, Apr 14, 2014 at 11:01 AM, Michael Love
>> >>> >>> >> > <michaelisaiahlove at gmail.com> wrote:
>> >>> >>> >> >>
>> >>> >>> >> >> I tried poking around here
>> >>> >>> >> >> http://joey711.github.io/phyloseq/distance
>> >>> >>> >> >> but couldn't see if the authors did anything for distances
>> >>> >>> >> >> requiring non-negative data. It appears from
>> >>> >>> >> >>
>> >>> >>> >> >> http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003531
>> >>> >>> >> >>
>> >>> >>> >> >> that VST was tested with Bray-Curtis distance. I think the
>> >>> >>> >> >> distance is designed for counts, but you could always
>> >>> >>> >> >> threshold at 0 to insist that the log2-like quantity act
>> >>> >>> >> >> more like a count.
>> >>> >>> >> >>
>> >>> >>> >> >>
>> >>> >>> >> >>
>> >>> >>> >> >> On Mon, Apr 14, 2014 at 12:23 PM, Sophie Josephine Weiss
>> >>> >>> >> >> <sophie.weiss at colorado.edu> wrote:
>> >>> >>> >> >>>
>> >>> >>> >> >>> Hi Mike,
>> >>> >>> >> >>> Thanks for explaining more. I am used to working with
>> >>> >>> >> >>> rarefied microbial datasets, that is why. Instead of
>> >>> >>> >> >>> rarefying I would like to use the DESeq method.
>> >>> >>> >> >>>
>> >>> >>> >> >>> How would you then suggest going about calculating
>> >>> >>> >> >>> Bray-Curtis distance, or summarized taxa diagrams, with
>> >>> >>> >> >>> these new transformed matrices with negative values?
>> >>> >>> >> >>> Thanks again,
>> >>> >>> >> >>> Sophie
>> >>> >>> >> >>>
>> >>> >>> >> >>>
>> >>> >>> >> >>> On Mon, Apr 14, 2014 at 7:17 AM, Michael Love
>> >>> >>> >> >>> <michaelisaiahlove at gmail.com> wrote:
>> >>> >>> >> >>>>
>> >>> >>> >> >>>> hi Sophie,
>> >>> >>> >> >>>>
>> >>> >>> >> >>>> Can you explain why you don't want negative values in the
>> >>> >>> >> >>>> transformed values? Adding one to the raw counts is not
>> >>> >>> >> >>>> sufficient. I should have said in my previous email, "the
>> >>> >>> >> >>>> expected counts on the common scale". If the size factor
>> >>> >>> >> >>>> for a sample is 2, then an expected count of 1 leads to an
>> >>> >>> >> >>>> expected count of 1/2 on the common scale (after
>> >>> >>> >> >>>> accounting for size factors).
>> >>> >>> >> >>>>
>> >>> >>> >> >>>>
>> >>> >>> >> >>>> On Sun, Apr 13, 2014 at 11:50 PM, Sophie Josephine Weiss
>> >>> >>> >> >>>> <sophie.weiss at colorado.edu> wrote:
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>> Hi Mike,
>> >>> >>> >> >>>>> Thanks for your reply! Ok, makes sense, but I added 1 to
>> >>> >>> >> >>>>> all my matrix values, so the lowest value in the matrix is
>> >>> >>> >> >>>>> 1 - there are still negatives?
>> >>> >>> >> >>>>> Thanks again,
>> >>> >>> >> >>>>> Sophie
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>> On Sun, Apr 13, 2014 at 9:01 PM, Michael Love
>> >>> >>> >> >>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> hi Sophie,
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> The transformations in DESeq and DESeq2 are log2-like
>> >>> >>> >> >>>>>> transformations. If the expected count is between 0 and
>> >>> >>> >> >>>>>> 1, the values can be negative; this does not indicate a
>> >>> >>> >> >>>>>> problem.
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> Mike
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> On Sun, Apr 13, 2014 at 5:17 PM, Sophie Josephine Weiss
>> >>> >>> >> >>>>>> <sophie.weiss at colorado.edu> wrote:
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> Hello,
>> >>> >>> >> >>>>>>> I have microbiome data with no replicates, from different
>> >>> >>> >> >>>>>>> conditions. I am trying to transform the data using the
>> >>> >>> >> >>>>>>> DESeq method, as described in McMurdie and Holmes 2014.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> The attached file is the definition I am using, as per the
>> >>> >>> >> >>>>>>> supplemental info in McMurdie and Holmes 2014, and the
>> >>> >>> >> >>>>>>> .biom file I am using.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> Thank you for your help,
>> >>> >>> >> >>>>>>> Sophie
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> _______________________________________________
>> >>> >>> >> >>>>>>> Bioconductor mailing list
>> >>> >>> >> >>>>>>> Bioconductor at r-project.org
>> >>> >>> >> >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >>> >>> >> >>>>>>> Search the archives:
>> >>> >>> >> >>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor