Entering edit mode
assaf www
▴
140
@assaf-www-6709
Last seen 5.4 years ago
Thanks Tim , quickly going over the article, it looks like a kind of
cross-species-WGCNA ? very intriguing.
Does Edger DE analysis is built on the assumption that most genes are
not
differentially expressed,
and that only a small portion of them do (say <20%) ?
I mean, in cross-species studies, or when comparing different tissues
of
the same organism, if this assumption doesn't hold, should it be a
serious
concern ?
Assaf
On Thu, Aug 28, 2014 at 7:42 PM, Tim Triche, Jr. <tim.triche at="" gmail.com="">
wrote:
> This is super helpful. Just to be clear, the most robust solution
is to
> use edgeR and offset for putative gene length, TMM & library size
while
> using raw counts (not effective counts) estimated by e.g. RSEM,
eXpress, or
> the like?
>
> Also re: cross-species comparisons, while my experience is that it
is
> indeed a can of worms, Mark Gerstein's group recently published a
method
> that might interest others working on non-model or incompletely
annotated
> organisms:
>
> http://genomebiology.com/2014/15/8/R100
>
> Any thoughts on applicability of the method for kooky experiments
such as
> comparing Drosophila hemocytes, zebrafish vascular endothelial
progenitors
> and the same in mice? Or for that matter, alligator
differentiation.
>
> I never realized how hard RNAseq in non-model organisms was until I
tried
> it.
>
>
> Statistics is the grammar of science.
> Karl Pearson <http: en.wikipedia.org="" wiki="" the_grammar_of_science="">
>
>
> On Thu, Aug 28, 2014 at 8:37 AM, assaf www <assafwww at="" gmail.com="">
wrote:
>
>> I checked, it's true, the results look the same.
>> As for FPKM, of course you are right.
>>
>> Thanks a lot
>> Assaf
>>
>>
>>
>> On Thu, Aug 28, 2014 at 2:47 AM, Gordon K Smyth <smyth at="" wehi.edu.au="">
>> wrote:
>>
>> > The code should have been:
>> >
>> > offset <- expandAsMatrix(getOffset(y),dim(y))
>> > offset <- offset + gl
>> >
>> > This should give same result as your code.
>> >
>> > rpkm() corrects for gene length as well as library size -- that's
the
>> > whole purpose of RPKMs:
>> >
>> > rpkm(y, gene.length=geneLength)
>> >
>> > should work for you without any modification.
>> >
>> > Gordon
>> >
>> >
>> > On Wed, 27 Aug 2014, assaf www wrote:
>> >
>> > This is very helpful for me, thanks.
>> >>
>> >> A slight change that I made in the code you sent, to avoid some
R
>> erros,
>> >> is
>> >>
>> >> # to replace:
>> >> offset = offset + gl
>> >> # with:
>> >> offset = sweep(gl,2,offset,"+")
>> >>
>> >> In addition to differential expression tests, I wanted also to
extract
>> >> FPKMs values (and/or normalized CPM values), that would take
into
>> account
>> >> all components of the offset (which if I'm not mistaken are:
>> library_size
>> >> +
>> >> TMM + gene_size).
>> >> I assume rpkm()/cpm() should correct only for library_size +
TMM.
>> >> Is there a possibly "decent" solution for that ?
>> >>
>> >> all the best, and thanks,
>> >> Assaf
>> >>
>> >>
>> >>
>> >> On Wed, Aug 27, 2014 at 4:45 AM, Gordon K Smyth <smyth at="" wehi.edu.au="">
>> >> wrote:
>> >>
>> >> It works something like this:
>> >>>
>> >>> library(edgeR)
>> >>> y <- DGEList(counts=counts)
>> >>> y <- calcNormFactors(y)
>> >>>
>> >>> # Column correct log gene lengths
>> >>> # Columns of gl should add to zero
>> >>>
>> >>> gl <- log(geneLength)
>> >>> gl <- t(t(gl)-colMeans(gl))
>> >>>
>> >>> # Combine library sizes, norm factors and gene lengths:
>> >>>
>> >>> offset <- expandAsMatrix(getOffset(y))
>> >>> offset <- offset + gl
>> >>>
>> >>> Then
>> >>>
>> >>> y$offset <- offset
>> >>> y <- estimateGLMCommonDisp(y,design)
>> >>>
>> >>> etc.
>> >>>
>> >>> Note that I have not tried this myself. It should work in
principle
>> from
>> >>> a differential expression point of view.
>> >>>
>> >>> On the other hand, there may be side effects regarding
dispersion
>> trend
>> >>> estimation -- I do not have time to explore this.
>> >>>
>> >>> Gordon
>> >>>
>> >>> ---------------------------------------------
>> >>> Professor Gordon K Smyth,
>> >>> Bioinformatics Division,
>> >>> Walter and Eliza Hall Institute of Medical Research,
>> >>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> >>> http://www.statsci.org/smyth
>> >>>
>> >>> On Wed, 27 Aug 2014, assaf www wrote:
>> >>>
>> >>> Probably wrong, but the reason I thought of using quantile
>> normalization
>> >>>
>> >>>> is
>> >>>> the need to correct both for the species-length, and library
size.
>> >>>>
>> >>>>
>> >>>> On Wed, Aug 27, 2014 at 2:40 AM, Gordon K Smyth <smyth at="" wehi.edu.au="">
>> >>>> wrote:
>> >>>>
>> >>>> That doesn't look helpful to me. I suggested that you
incorporate
>> gene
>> >>>>
>> >>>>> lengths into the offsets, not do quantile normalization of
cpms.
>> >>>>>
>> >>>>> Sorry, I just don't have time to develop a code example for
you. I
>> >>>>> hope
>> >>>>> someone else will help.
>> >>>>>
>> >>>>> The whole topic of interspecies differential expression is a
can of
>> >>>>> worms.
>> >>>>> Even if you adjust for gene length, there will still be
differences
>> in
>> >>>>> gc
>> >>>>> content and mappability between the species.
>> >>>>>
>> >>>>> Gordon
>> >>>>>
>> >>>>>
>> >>>>> On Wed, 27 Aug 2014, assaf www wrote:
>> >>>>>
>> >>>>> Dear Gordon thanks,
>> >>>>>
>> >>>>>
>> >>>>>> Suppose I start with the following matrices:
>> >>>>>>
>> >>>>>> # 'counts' is the Rsem filtered counts
>> >>>>>>
>> >>>>>> counts[1:4,]
>> >>>>>>
>> >>>>>>>
>> >>>>>>> h0 h1 h2 n0 n1 n2
>> >>>>>>>
>> >>>>>> ENSRNOG00000000021 36 17 20 10 25 38
>> >>>>>> ENSRNOG00000000024 1283 615 731 644 807 991
>> >>>>>> ENSRNOG00000000028 26 12 11 18 23 28
>> >>>>>> ENSRNOG00000000029 22 13 12 16 17 15
>> >>>>>>
>> >>>>>> # 'geneLength' is the species-specific gene lengths, for
species
>> 'h'
>> >>>>>> and
>> >>>>>> 'n':
>> >>>>>>
>> >>>>>> geneLength[1:3,]
>> >>>>>>
>> >>>>>>>
>> >>>>>>> h0.length h1.length h2.length n0.length
>> n1.length
>> >>>>>>>
>> >>>>>> n2.length
>> >>>>>> ENSRNOG00000000021 1200 1200 1200 1303
>> 1303
>> >>>>>> 1303
>> >>>>>> ENSRNOG00000000024 1050 1050 1050 3080
>> 3080
>> >>>>>> 3080
>> >>>>>> ENSRNOG00000000028 1047 1047 1047 1121
>> 1121
>> >>>>>> 1121
>> >>>>>>
>> >>>>>>
>> >>>>>> does the following code look correct, and may allow allows
across
>> >>>>>> species
>> >>>>>> analysis ?:
>> >>>>>> (technically it works, I checked)
>> >>>>>>
>> >>>>>> # quantile normalization: (as in here:
>> >>>>>> http://davetang.org/wiki/tiki-index.php?page=Analysing+
>> >>>>>> digital+gene+expression
>> >>>>>> )
>> >>>>>>
>> >>>>>> x1 = counts*1000/geneLength
>> >>>>>> library(limma)
>> >>>>>> x2 =
normalizeBetweenArrays(data.matrix(x1),method="quantile")
>> >>>>>> offset = log(counts+0.1)-log(x2+0.1)
>> >>>>>>
>> >>>>>> ...
>> >>>>>>
>> >>>>>> y <- estimateGLMCommonDisp(y,design,offset=offset)
>> >>>>>> y <- estimateGLMTrendedDisp(y,design,offset=offset)
>> >>>>>> y <- estimateGLMTagwiseDisp(y,design,offset=offset)
>> >>>>>> fit <- glmFit(y,design,offset=offset)
>> >>>>>>
>> >>>>>> ...
>> >>>>>>
>> >>>>>>
>> >>>>>> Thanks in advance for any help..,
>> >>>>>> Assaf
>> >>>>>>
>> >>>>>>
>> >>>>> ____________________________________________________________
>> >>> __________
>> >>> The information in this email is confidential and intended
solely for
>> the
>> >>> addressee.
>> >>> You must not disclose, forward, print or use it without the
>> permission of
>> >>> the sender.
>> >>>
______________________________________________________________________
>> >>>
>> >>>
>> >>
>> >
______________________________________________________________________
>> > The information in this email is confidential and
inte...{{dropped:10}}
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
[[alternative HTML version deleted]]