Interspecies differential expression of orthologs with Edger
1
0
Entering edit mode
assaf www ▴ 140
@assaf-www-6709
Last seen 5.4 years ago
Thanks Tim , quickly going over the article, it looks like a kind of cross-species-WGCNA ? very intriguing. Does Edger DE analysis is built on the assumption that most genes are not differentially expressed, and that only a small portion of them do (say <20%) ? I mean, in cross-species studies, or when comparing different tissues of the same organism, if this assumption doesn't hold, should it be a serious concern ? Assaf On Thu, Aug 28, 2014 at 7:42 PM, Tim Triche, Jr. <tim.triche at="" gmail.com=""> wrote: > This is super helpful. Just to be clear, the most robust solution is to > use edgeR and offset for putative gene length, TMM & library size while > using raw counts (not effective counts) estimated by e.g. RSEM, eXpress, or > the like? > > Also re: cross-species comparisons, while my experience is that it is > indeed a can of worms, Mark Gerstein's group recently published a method > that might interest others working on non-model or incompletely annotated > organisms: > > http://genomebiology.com/2014/15/8/R100 > > Any thoughts on applicability of the method for kooky experiments such as > comparing Drosophila hemocytes, zebrafish vascular endothelial progenitors > and the same in mice? Or for that matter, alligator differentiation. > > I never realized how hard RNAseq in non-model organisms was until I tried > it. > > > Statistics is the grammar of science. > Karl Pearson <http: en.wikipedia.org="" wiki="" the_grammar_of_science=""> > > > On Thu, Aug 28, 2014 at 8:37 AM, assaf www <assafwww at="" gmail.com=""> wrote: > >> I checked, it's true, the results look the same. >> As for FPKM, of course you are right. >> >> Thanks a lot >> Assaf >> >> >> >> On Thu, Aug 28, 2014 at 2:47 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> >> wrote: >> >> > The code should have been: >> > >> > offset <- expandAsMatrix(getOffset(y),dim(y)) >> > offset <- offset + gl >> > >> > This should give same result as your code. >> > >> > rpkm() corrects for gene length as well as library size -- that's the >> > whole purpose of RPKMs: >> > >> > rpkm(y, gene.length=geneLength) >> > >> > should work for you without any modification. >> > >> > Gordon >> > >> > >> > On Wed, 27 Aug 2014, assaf www wrote: >> > >> > This is very helpful for me, thanks. >> >> >> >> A slight change that I made in the code you sent, to avoid some R >> erros, >> >> is >> >> >> >> # to replace: >> >> offset = offset + gl >> >> # with: >> >> offset = sweep(gl,2,offset,"+") >> >> >> >> In addition to differential expression tests, I wanted also to extract >> >> FPKMs values (and/or normalized CPM values), that would take into >> account >> >> all components of the offset (which if I'm not mistaken are: >> library_size >> >> + >> >> TMM + gene_size). >> >> I assume rpkm()/cpm() should correct only for library_size + TMM. >> >> Is there a possibly "decent" solution for that ? >> >> >> >> all the best, and thanks, >> >> Assaf >> >> >> >> >> >> >> >> On Wed, Aug 27, 2014 at 4:45 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> >> >> wrote: >> >> >> >> It works something like this: >> >>> >> >>> library(edgeR) >> >>> y <- DGEList(counts=counts) >> >>> y <- calcNormFactors(y) >> >>> >> >>> # Column correct log gene lengths >> >>> # Columns of gl should add to zero >> >>> >> >>> gl <- log(geneLength) >> >>> gl <- t(t(gl)-colMeans(gl)) >> >>> >> >>> # Combine library sizes, norm factors and gene lengths: >> >>> >> >>> offset <- expandAsMatrix(getOffset(y)) >> >>> offset <- offset + gl >> >>> >> >>> Then >> >>> >> >>> y$offset <- offset >> >>> y <- estimateGLMCommonDisp(y,design) >> >>> >> >>> etc. >> >>> >> >>> Note that I have not tried this myself. It should work in principle >> from >> >>> a differential expression point of view. >> >>> >> >>> On the other hand, there may be side effects regarding dispersion >> trend >> >>> estimation -- I do not have time to explore this. >> >>> >> >>> Gordon >> >>> >> >>> --------------------------------------------- >> >>> Professor Gordon K Smyth, >> >>> Bioinformatics Division, >> >>> Walter and Eliza Hall Institute of Medical Research, >> >>> 1G Royal Parade, Parkville, Vic 3052, Australia. >> >>> http://www.statsci.org/smyth >> >>> >> >>> On Wed, 27 Aug 2014, assaf www wrote: >> >>> >> >>> Probably wrong, but the reason I thought of using quantile >> normalization >> >>> >> >>>> is >> >>>> the need to correct both for the species-length, and library size. >> >>>> >> >>>> >> >>>> On Wed, Aug 27, 2014 at 2:40 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> >> >>>> wrote: >> >>>> >> >>>> That doesn't look helpful to me. I suggested that you incorporate >> gene >> >>>> >> >>>>> lengths into the offsets, not do quantile normalization of cpms. >> >>>>> >> >>>>> Sorry, I just don't have time to develop a code example for you. I >> >>>>> hope >> >>>>> someone else will help. >> >>>>> >> >>>>> The whole topic of interspecies differential expression is a can of >> >>>>> worms. >> >>>>> Even if you adjust for gene length, there will still be differences >> in >> >>>>> gc >> >>>>> content and mappability between the species. >> >>>>> >> >>>>> Gordon >> >>>>> >> >>>>> >> >>>>> On Wed, 27 Aug 2014, assaf www wrote: >> >>>>> >> >>>>> Dear Gordon thanks, >> >>>>> >> >>>>> >> >>>>>> Suppose I start with the following matrices: >> >>>>>> >> >>>>>> # 'counts' is the Rsem filtered counts >> >>>>>> >> >>>>>> counts[1:4,] >> >>>>>> >> >>>>>>> >> >>>>>>> h0 h1 h2 n0 n1 n2 >> >>>>>>> >> >>>>>> ENSRNOG00000000021 36 17 20 10 25 38 >> >>>>>> ENSRNOG00000000024 1283 615 731 644 807 991 >> >>>>>> ENSRNOG00000000028 26 12 11 18 23 28 >> >>>>>> ENSRNOG00000000029 22 13 12 16 17 15 >> >>>>>> >> >>>>>> # 'geneLength' is the species-specific gene lengths, for species >> 'h' >> >>>>>> and >> >>>>>> 'n': >> >>>>>> >> >>>>>> geneLength[1:3,] >> >>>>>> >> >>>>>>> >> >>>>>>> h0.length h1.length h2.length n0.length >> n1.length >> >>>>>>> >> >>>>>> n2.length >> >>>>>> ENSRNOG00000000021 1200 1200 1200 1303 >> 1303 >> >>>>>> 1303 >> >>>>>> ENSRNOG00000000024 1050 1050 1050 3080 >> 3080 >> >>>>>> 3080 >> >>>>>> ENSRNOG00000000028 1047 1047 1047 1121 >> 1121 >> >>>>>> 1121 >> >>>>>> >> >>>>>> >> >>>>>> does the following code look correct, and may allow allows across >> >>>>>> species >> >>>>>> analysis ?: >> >>>>>> (technically it works, I checked) >> >>>>>> >> >>>>>> # quantile normalization: (as in here: >> >>>>>> http://davetang.org/wiki/tiki-index.php?page=Analysing+ >> >>>>>> digital+gene+expression >> >>>>>> ) >> >>>>>> >> >>>>>> x1 = counts*1000/geneLength >> >>>>>> library(limma) >> >>>>>> x2 = normalizeBetweenArrays(data.matrix(x1),method="quantile") >> >>>>>> offset = log(counts+0.1)-log(x2+0.1) >> >>>>>> >> >>>>>> ... >> >>>>>> >> >>>>>> y <- estimateGLMCommonDisp(y,design,offset=offset) >> >>>>>> y <- estimateGLMTrendedDisp(y,design,offset=offset) >> >>>>>> y <- estimateGLMTagwiseDisp(y,design,offset=offset) >> >>>>>> fit <- glmFit(y,design,offset=offset) >> >>>>>> >> >>>>>> ... >> >>>>>> >> >>>>>> >> >>>>>> Thanks in advance for any help.., >> >>>>>> Assaf >> >>>>>> >> >>>>>> >> >>>>> ____________________________________________________________ >> >>> __________ >> >>> The information in this email is confidential and intended solely for >> the >> >>> addressee. >> >>> You must not disclose, forward, print or use it without the >> permission of >> >>> the sender. >> >>> ______________________________________________________________________ >> >>> >> >>> >> >> >> > ______________________________________________________________________ >> > The information in this email is confidential and inte...{{dropped:10}} >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]
RNASeq Normalization Organism zebrafish edgeR • 1.3k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 7 hours ago
WEHI, Melbourne, Australia
On Tue, 2 Sep 2014, assaf www wrote: > Does Edger DE analysis is built on the assumption that most genes are > not differentially expressed, and that only a small portion of them do > (say <20%) ? Only the calcNormFactors() step of edgeR makes any assumption of this sort. calcNormFactors assumes that either that most genes are not DE or that the DE is reasonably symmetric. > I mean, in cross-species studies, or when comparing different tissues of > the same organism, if this assumption doesn't hold, should it be a > serious concern ? In a cross-species comparison there will be many DE genes, but some will be up and some will be down. The DE will not be all in one direction, I would guess that normalization will not be a serious concern. Of all the concerns with cross-species comparisons, this seems to me to be far from the most serious. Best wishes Gordon ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
Thanks Gordon, To summarize the results I got on the cross-species data, after embedding the length-effect to the GLM offset matrix, as in the code you sent, please see the attached MA plot: 1) for >5 and <-5 log fold change, genes' logFC is positively correlated with mean log CPM, something I haven?t seen before in Edger standard runs. 2) most genes with fold change around > 1.3, or < -1.3, are significant, which looks to me too ?liberal?. Please note that each group contains 6 true biological replicates (variance within each group is large) . The first problem worries me most, any idea is very welcomed. Many thanks, Assaf On Wed, Sep 3, 2014 at 2:08 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > > On Tue, 2 Sep 2014, assaf www wrote: > > Does Edger DE analysis is built on the assumption that most genes are not >> differentially expressed, and that only a small portion of them do (say >> <20%) ? >> > > Only the calcNormFactors() step of edgeR makes any assumption of this > sort. calcNormFactors assumes that either that most genes are not DE or > that the DE is reasonably symmetric. > > I mean, in cross-species studies, or when comparing different tissues of >> the same organism, if this assumption doesn't hold, should it be a serious >> concern ? >> > > In a cross-species comparison there will be many DE genes, but some will > be up and some will be down. The DE will not be all in one direction, I > would guess that normalization will not be a serious concern. > > Of all the concerns with cross-species comparisons, this seems to me to be > far from the most serious. > > Best wishes > Gordon > > ______________________________________________________________________ > The information in this email is confidential and intended solely for the > addressee. > You must not disclose, forward, print or use it without the permission of > the sender. > ______________________________________________________________________ > -------------- next part -------------- A non-text attachment was scrubbed... Name: crossspecies.png Type: image/png Size: 65085 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20140905="" c599392b="" attachment.png="">
ADD REPLY

Login before adding your answer.

Traffic: 708 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6