Is there any prejudice whether to use edgeR or DESeq for differential expression analysis for RNA Seq data
@sakshi-gulati-5596
Last seen 10.3 years ago
Hi,

I am unsure whether there is any particular condition that decides between using the edgeR or DESeq package for differential expression analysis of RNA-Seq data. For example, does it depend on how the counts were normalized?

Thanks,
Sakshi
edgeR DESeq • 4.5k views
Mark Robinson ▴ 880
@mark-robinson-4908
Last seen 6.1 years ago
Hi Sakshi,

The two packages are indeed fairly similar. They differ in their:

i) look-and-feel: overall the pipelines are quite similar, but things like specifying arbitrary contrasts, offsets, packaging the output statistics, etc. are, IMHO, easier in edgeR.
ii) standard normalization (edgeR: TMM; DESeq: what I call "RLE", which is also implemented in edgeR's calcNormFactors). These are actually very similar anyway in the situations where I've tested it.
iii) dispersion estimation (edgeR default: moderate to trend; DESeq default: take maximum of individual or trend). My impression is that this makes DESeq (slightly?) less powerful and edgeR (slightly?) sensitive to outliers.

> I am unsure as to if there is any particular condition that is the deciding factor between whether to use edgeR or DESeq packages for differential expression analysis for RNA Seq data.

I prefer edgeR, but there is some pretty strong prejudice behind that :)

> For example, does it depend upon how the counts were normalized?

I don't understand this question, since both packages expect un-normalized counts.

Best,
Mark
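For readers wondering what the "RLE" normalization in point ii) amounts to, here is a minimal sketch of the median-of-ratios idea in plain Python. It is only an illustration under simplifying assumptions, not the code of either package: the function name `rle_size_factors` and the toy matrix are made up, and the real implementations differ in detail (for example in how the factors are rescaled).

```python
import math
from statistics import median


def rle_size_factors(counts):
    """Median-of-ratios size factors for a genes x samples matrix of raw counts.

    `counts` is a list of rows (one per gene), each a list of counts per sample.
    Hypothetical helper for illustration only.
    """
    n_samples = len(counts[0])
    # Keep only genes with a positive count in every sample, so that the
    # geometric mean (computed on the log scale) is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has positive counts in all samples")
    # Log of the per-gene geometric mean across samples (the reference sample).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of sample j to the reference, gene by gene.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        # The median ratio is robust to a minority of strongly changed genes.
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data (made up): sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 7],  # contains a zero, so it is skipped when computing the factors
]
size_factors = rle_size_factors(toy_counts)
print(size_factors)
```

On this toy matrix the second factor comes out twice as large as the first, which is the depth difference built into the data. Note that the input is the raw, un-normalized count matrix, in line with the last point of the answer above.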
Hi Mark,

Thanks for answering. It makes more sense now. I have upper quartile normalised RSEM counts per gene. Is that OK as an input for edgeR and/or DESeq?

Thanks,
Sakshi
Hi,

> I have upper quartile normalised RSEM counts per gene. Is that ok as an input for edgeR and/or DESeq?

My guess is that they are not fine.

I'm not familiar with RSEM, but if these are actually *counts* (a first hint that they are not counts is if they are not integers), then you are OK. If these are something like (R|F)PKM, then you're not; ditto if you are inputting numbers that have already been scaled to library size.

-steve

Steve Lianoglou
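The "are they integers?" hint can be turned into a quick sanity check before handing a matrix to either package. The sketch below is plain Python with a made-up helper name (`looks_like_raw_counts`) and made-up example values. Passing the check is only a necessary condition: values that were scaled and then rounded would pass it too.

```python
def looks_like_raw_counts(values, tol=1e-8):
    """Check a genes x samples matrix for the basic properties of raw counts.

    Returns (ok, reason). Hypothetical helper for illustration only.
    """
    flat = [v for row in values for v in row]
    if any(v < 0 for v in flat):
        return False, "negative values"
    # Count entries that are not (numerically) whole numbers.
    non_integer = sum(1 for v in flat if abs(v - round(v)) > tol)
    if non_integer:
        return False, f"{non_integer} non-integer values"
    return True, "all values are non-negative integers"


# Made-up examples: raw counts versus FPKM-like values.
raw_like = [[3, 0], [12, 7]]
fpkm_like = [[3.5, 0.2], [12.0, 7.0]]

print(looks_like_raw_counts(raw_like))
print(looks_like_raw_counts(fpkm_like))
```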
RSEM counts are indeed counts (not a normalized/scaled FPKM), but for the relatively small subset of reads that are ambiguously mapped ("multi-mapped"), the count for each such read gets broken up across the (probabilistically weighted) possibilities. So the final counts are non-integer.

We (and others) round these values to integers, usually to good effect, since their magnitude remains consistent with the philosophically "true" count (i.e. the mean/dispersion relationship remains similar to what it would have been had we just ignored the multi-mapped reads and used true counts).

-Aaron
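As a small illustration of the rounding step described above, here is a sketch in plain Python. The gene names and values are made up, and `round_expected_counts` is a hypothetical helper, not part of RSEM, edgeR or DESeq.

```python
def round_expected_counts(expected):
    """Round fractional expected counts to the nearest integer, per gene."""
    # Note: Python's built-in round() sends exact halves (e.g. 2.5) to the
    # nearest even integer; for count data this detail rarely matters.
    return {gene: int(round(value)) for gene, value in expected.items()}


# Made-up expected counts: fractional where multi-mapped reads were split.
expected = {"geneA": 10.4, "geneB": 0.6, "geneC": 250.0}
rounded = round_expected_counts(expected)

print(rounded)
print(sum(expected.values()), sum(rounded.values()))
```

The rounded values stay on the same scale as the originals, which is the property the argument above relies on. This is different from rescaling the values (for example to FPKM), which changes their magnitude.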
Hi Aaron,

Yes, I had indeed rounded them up to avoid the non-integer issue as well. But, as the others pointed out, do you think it is an issue for me to use them in packages like edgeR or DESeq?

Thanks,
Sakshi
Hi all,

There is some information about performing differential expression analysis after aligning with RSEM on the RSEM webpage: http://deweylab.biostat.wisc.edu/rsem/README.html#de. It uses EBSeq, an R package (not yet in Bioconductor, as far as I can tell).

Depending on how you ran RSEM (see http://deweylab.biostat.wisc.edu/rsem/rsem-calculate-expression.html for RSEM output details), you'll get TPM or FPKM values, which are not suitable for use with edgeR/DESeq. But an interesting proxy for counts might be the "expected_count" column of the result file, i.e. the count corrected for multiple mapping. Comparing the outcome of edgeR/DESeq using these with the result you'd obtain from EBSeq is certainly worth a try.

HTH,
Nico

Nicolas Delhomme
Nathaniel Street Lab, Department of Plant Physiology, Umeå Plant Science Center
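To make the "expected_count" suggestion concrete, here is a minimal sketch of pulling that column out of an RSEM gene-level results file in plain Python. It assumes a tab-separated file with a header row containing `gene_id` and `expected_count` columns, as described in the RSEM output documentation linked above; the example rows and the helper name `read_expected_counts` are made up.

```python
import csv
import io


def read_expected_counts(handle):
    """Map gene_id -> expected_count from a tab-separated RSEM results table.

    Assumes a header row with 'gene_id' and 'expected_count' columns.
    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["expected_count"]) for row in reader}


# Made-up stand-in for a results file; with a real file you would pass
# an open file handle instead of this in-memory text.
example = io.StringIO(
    "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
    "geneA\ttxA1,txA2\t1500.00\t1350.00\t10.40\t3.10\t2.90\n"
    "geneB\ttxB1\t800.00\t650.00\t250.00\t155.00\t145.00\n"
)
counts = read_expected_counts(example)
print(counts)
```

Only the `expected_count` column is kept here; the TPM and FPKM columns are ignored, in line with the advice that those values are not suitable input for edgeR/DESeq.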