Heatmaps for edgeR

Eleanor Su ▴ 110

Hi there,

I'm an amateur edgeR user, and I'm having trouble generating a heatmap for the differentially expressed genes. All the examples that I've looked at require that I normalize the counts, but I've already normalized them prior to doing the analysis in R. I'm running a GLM with blocking and have generated my topTags. From here, I'm not sure how to generate a heatmap. Could you offer any advice or suggestions?

Best,
Eleanor Su
M.S. Candidate, Department of Biology
University of Nevada, Reno

Written 10.6 years ago by Eleanor Su • updated 10.5 years ago by Steve Lianoglou

Steve Lianoglou ★ 13k

Hi Eleanor,

On Thu, Mar 20, 2014 at 4:36 PM, Eleanor Su wrote:

> I'm an amateur edgeR user, and I'm having trouble generating a heat map for
> the differentially expressed genes. All examples that I've looked at
> requires that I normalize the counts but I've already normalized them prior
> to doing analysis in R.

Can you explain what you mean by that a bit more? You shouldn't be doing any normalization of your actual counts prior to feeding them to edgeR, are you?

> I'm running a glm with blocking and have generated
> my topTags. From here, I'm not sure how to generate a heatmap. Could you
> offer any advice or suggestions?

Look at section 2.10 of the edgeR User's Guide (Clustering, heatmaps, etc.), where the authors note that this is still a matter of research but suggest using "moderated log-counts-per-million".

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Genentech
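A minimal sketch of the "moderated log-counts-per-million" suggestion, assuming the edgeR package is installed. The count matrix here is simulated because the thread does not include any data, and `prior.count = 2` is an illustrative choice rather than a value given in the thread.

```r
library(edgeR)

# Simulated RAW counts: 200 features x 6 samples (hypothetical data).
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200,
                 dimnames = list(paste0("seq", 1:200), paste0("s", 1:6)))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)   # stores scaling factors; the counts are not altered

# Moderated log2 counts-per-million: the prior count is added before taking
# logs, which damps the variability of features with very low counts.
logcpm <- cpm(y, log = TRUE, prior.count = 2)
dim(logcpm)               # same features x samples layout as `counts`
```

The resulting `logcpm` matrix, rather than the raw counts, is what would typically be handed to a clustering or heatmap function.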

Steve Lianoglou ★ 13k

Hi Eleanor,

Please CC (use "reply-all") the Bioconductor mailing list on all correspondence so that everyone can help with (and benefit from) this discussion.

Comments inline:

On 21 Mar 2014, at 11:02, Eleanor Su wrote:

>> Can you explain what you mean by that a bit more? You shouldn't be
>> doing any normalization of your actual counts prior to feeding them to
>> edgeR, are you?
>
> I'm only working with small non-coding RNAs of a non-model organism. Since
> this is a fairly new kind of analysis, I'm following someone else's
> pipeline. Thus I've normalized my samples prior to doing analysis in R. I've
> normalized all my counts based on the reads generated.

What I mean is that you shouldn't do that :-)

Have you read through the edgeR User's Guide? The `calcNormFactors` function does the step that it sounds like you are doing before analysis, but it also keeps the count data intact, which is what you want. I guess you are dividing your counts by some normalization constant prior to edgeR analysis, which is a big no-no.

The (expression) input to edgeR should be the raw count matrix of features x samples. Many people choose to use only uniquely mapping reads for this purpose, so it is probably a good idea for you to ensure that is the case (at least for your first analysis).

>> Look at section 2.10 of the edgeR User's Guide (Clustering, heatmaps,
>> etc.) where the authors identify this to still be a matter of
>> research, but they suggest to use "moderated log-counts-per-million"
>
> I've generated a heatmap already using this script, but I only want a
> heatmap of the significant differentially expressed sequences.

What script?

> When I generate the heatmap according to section 2.10, I end up with a
> heatmap that I can't even read because it's plotting all the sequences.
> Would you suggest just generating a new file with only significant
> sequences and then generating a heatmap according to section 2.10?

When you call the `heatmap` function (or whatever function you are using to generate these plots; the `aheatmap` function from the NMF package is quite nice, btw), you should pass it a matrix that consists of only the rows you want to plot.

You do not have to generate an intermediary new file to do this.

Don't take this the wrong way, but it sounds like you are quite new to not just this analysis, but to R as a whole, since indexing things (vectors, lists, matrices) is something very basic that you need to master before being conversant with the language.

If this is the case, I'd strongly recommend you spend some time reading up on introductory R material (R comes with "An Introduction to R") before trying to do anything more advanced.

Doing so will not only reduce the chances of you shooting yourself in the foot by doing something silly, but it will also allow you to get better (and more considered) help here, since you will be able to ask the type of questions that leverage the expertise of the people subscribed to this list.

For instance, if you have questions regarding fundamental "R programming" topics (indexing a matrix, for example), you should direct those to R-help, which you can subscribe to here:

https://stat.ethz.ch/mailman/listinfo/r-help

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Genentech
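The advice above (raw counts in, normalization factors computed inside edgeR, then plot only the rows of interest) could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the simulated counts, the `block` and `treat` factors, the use of `estimateDisp`, and the FDR cutoff of 0.05 are not taken from the thread. Base R's `heatmap` is used here; `aheatmap` from the NMF package could be called on the same matrix.

```r
library(edgeR)

set.seed(42)
n_feat <- 500
# Hypothetical blocked design: 3 blocks, each with a control and a treated sample.
block <- factor(rep(1:3, each = 2))
treat <- factor(rep(c("ctrl", "trt"), times = 3))

# Simulated RAW counts; the first 25 features are up-regulated under treatment.
mu <- matrix(100, n_feat, 6)
mu[1:25, treat == "trt"] <- 800
counts <- matrix(rnbinom(n_feat * 6, mu = mu, size = 10), nrow = n_feat,
                 dimnames = list(paste0("seq", seq_len(n_feat)),
                                 paste0("s", 1:6)))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)                  # normalization factors; counts stay raw
design <- model.matrix(~ block + treat)  # blocking term plus treatment effect
y <- estimateDisp(y, design)
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef = ncol(design))  # test the last column (treatment)

# IDs of features passing the (illustrative) FDR cutoff.
tt <- topTags(lrt, n = Inf)$table
sig <- rownames(tt)[tt$FDR < 0.05]

# Index the logCPM matrix by row name: no intermediate file is needed.
logcpm <- cpm(y, log = TRUE, prior.count = 2)
sig_mat <- logcpm[sig, , drop = FALSE]
heatmap(sig_mat, scale = "row", margins = c(5, 8))
```

The key step is `logcpm[sig, , drop = FALSE]`: the heatmap function only ever sees the rows selected from the `topTags` table, so the plot stays readable.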

Hi Steve,

> Don't take this the wrong way, but it sounds like you are quite new to not
> just this analysis, but to R as a whole since indexing things (vectors,
> lists, matrices) is something very basic that you need to master before
> being conversant with the language.

Indeed, I have limited knowledge of R and edgeR. Thanks for the suggestion to contact R-help with these questions. Unfortunately, my graduate program offers very little help with statistics in R and even less with bioinformatics, especially for small RNAs. With these limited resources, I feel like I'm working in the dark, and my analysis is, to say the least, cryptic. I'll take a step back before I jump the gun with the analysis. Thanks for the insight.

Best,
Eleanor

Reply written 10.5 years ago by Eleanor Su

Steve Lianoglou ★ 13k

Hi,

On 21 Mar 2014, at 11:39, Eleanor Su wrote:

> Indeed, I have limited knowledge in using R and edgeR. Thanks for the
> suggestion to contacting R-help for these questions. Unfortunately my
> graduate program offers very little help with R statistics and even fewer
> with bioinformatics especially that of small RNAs. With these limited
> resources, I feel like I'm working in the dark and my analysis, to say the
> least, is cryptic. I'll take a step back before I jump the gun with the
> analysis. Thanks for the insight.

This exact issue has been making the rounds on the internet due to this recent blog post:

http://biomickwatson.wordpress.com/2014/03/20/is-this-a-realistic-portrait-of-a-modern-studentpost-doc-in-biology/

So you are not alone ... but rest assured that many of us are here to help (and happy to do so ;-)

Your analysis is on the right track. You should follow along with the examples in the edgeR user's guide (or even the limma user's guide, for limma::voom and its extensive linear modeling material) to get an idea of how to set up analyses for differential expression. Both of these manuals are very thorough and great to just digest and understand (be sure to read the relevant primary publications as well). You should also take a look at the DESeq2 vignette, as similar material is presented there, and perhaps this (third) treatment of the material might help it all to click.

The fact that you are working with small RNAs doesn't change the picture *too much* for the "simple" differential expression stage of the game (putting aside mapping issues for small molecules).

Lastly, and this is important, you are also fortunate to be "in training" during the era of MOOCs. Coursera has a data analysis "track" that covers many things that will be relevant to you:

https://www.coursera.org/specialization/jhudatascience/1

(and other courses of interest):

https://www.coursera.org/jhu

And ESPECIALLY take note of this class that is starting shortly:

Data Analysis for Genomics
https://www.edx.org/course/harvardx/harvardx-ph525x-data-analysis-genomics-1401

Don't miss it! The material is exactly the type of stuff that you need to know and, as a special treat, is taught by top-notch instructors. I'm planning to audit the class, and I (should ;-) know most of this stuff already!

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Genentech
