edgeR: design matrix for different condition

0

Entering edit mode

KJ Lim ▴ 420

@kj-lim-5288

Last seen 4.6 years ago

Finland

Dear the edgeR community, Good day. I'm analyzing my RNA-Seq experiment with edgeR. My study has 2 different genotypes (treeHS, treeLS) and time of treatment (0H, 3H, 24H, 96H). I first assumed that the treatment effect will be the same for each genotype and I have the design matrix as: design <- model.matrix(~tree+treat) > design (Intercept) treat03H treat24H treat96H treeLS 1 1 0 0 0 0 2 1 0 0 0 0 3 1 1 0 0 0 4 1 1 0 0 0 5 1 0 1 0 0 6 1 0 1 0 0 7 1 0 0 1 0 8 1 0 0 1 0 9 1 0 0 0 1 10 1 0 0 0 1 11 1 1 0 0 1 12 1 1 0 0 1 13 1 0 1 0 1 14 1 0 1 0 1 15 1 0 0 1 1 16 1 0 0 1 1 I used coef=4 to test for the differential expressions between treeHS and treeLS within the time of treatment, coef=3 to learn the differential expressions in 2 genotypes at time of treatment 96H. ** I would like to learn the genes are express in treeHS but not treeLS and vice versa at certain time of treatment let's say 24H or across the whole time of treatment, should I have the design matrix as below or more steps I need to do? design <- model.matrix(~tree*treat) Kindly please light me on this. I appreciate very much for your help and time. Have a nice day. Best regards, KJ Lim

edgeR edgeR • 2.6k views

ADD COMMENT • link updated 12.7 years ago by Gordon Smyth 52k • written 12.7 years ago by KJ Lim ▴ 420

0

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 3 hours ago

WEHI, Melbourne, Australia

Dear KJ Lim, I don't quite understand your question, because you seem to be asking for something that isn't a test for differential expression, which is what edgeR does. You question "I would like to learn the genes are express in treeHS but not treeLS and vice versa" seems to be about absolute expression levels. edgeR doesn't test for genes that are not expressed in particular conditions. I'll refer you to the limma section on interaction models in case that helps, see Section 8.5 of: http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst/doc/us ersguide.pdf Setting up a design matrix is the same for edgeR as it is for limma. Best wishes Gordon > Date: Tue, 24 Jul 2012 17:12:00 +0300 > From: KJ Lim <jinkeanlim at="" gmail.com=""> > To: Bioconductor mailing list <bioconductor at="" r-project.org=""> > Subject: [BioC] edgeR: design matrix for different condition > > Dear the edgeR community, > > Good day. > > I'm analyzing my RNA-Seq experiment with edgeR. My study has 2 > different genotypes (treeHS, treeLS) and time of treatment (0H, 3H, > 24H, 96H). > > I first assumed that the treatment effect will be the same for each > genotype and I have the design matrix as: > > design <- model.matrix(~tree+treat) > >> design > (Intercept) treat03H treat24H treat96H treeLS > 1 1 0 0 0 0 > 2 1 0 0 0 0 > 3 1 1 0 0 0 > 4 1 1 0 0 0 > 5 1 0 1 0 0 > 6 1 0 1 0 0 > 7 1 0 0 1 0 > 8 1 0 0 1 0 > 9 1 0 0 0 1 > 10 1 0 0 0 1 > 11 1 1 0 0 1 > 12 1 1 0 0 1 > 13 1 0 1 0 1 > 14 1 0 1 0 1 > 15 1 0 0 1 1 > 16 1 0 0 1 1 > > I used coef=4 to test for the differential expressions between treeHS > and treeLS within the time of treatment, coef=3 to learn the > differential expressions in 2 genotypes at time of treatment 96H. > > ** I would like to learn the genes are express in treeHS but not > treeLS and vice versa at certain time of treatment let's say 24H or > across the whole time of treatment, should I have the design matrix as > below or more steps I need to do? > > design <- model.matrix(~tree*treat) > > Kindly please light me on this. I appreciate very much for your help and time. > > Have a nice day. > > Best regards, > KJ Lim > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 12.7 years ago Gordon Smyth 52k

0

Entering edit mode

Dear Prof Gordon, Good day. Thanks for your prompt replied. Please allow me to rephrase my previous question. This model, ~tree+treat, is assumed effect of the time of treatment will be the same irrespective of genotype. Thus, test for the coef=3 will gives me the differential expression at 96H irrespective of genotype. I would like to learn the differential expression of treeHS vs treeLS at specific time points i.e. 24H or if possible across the whole time points. Should I fit my model as model A: ~tree*treat OR model B: ~tree+tree:treat ? The design matrix columns of model B give: "(Intercept)" "treeLS" "treeHS:treat03H" "treeLS:treat03H" "treeHS:treat24H" "treeLS:treat24H" "treeHS:treat96H" "treeLS:treat96H" Am I doing right if I fit the model B and test for the coef=3:4 to learn the differential expression of treeHS vs treeLS at specific time points i.e. 03H? Does this test could tells me which set of genes up/down-regulated in treeLS or treeHS? I'm not good in statistic and I'm learning, kindly please correct me if I'm wrong. Thank you very much for your time and suggestion. Best regards, KJ Lim On 26 July 2012 03:39, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > > Dear KJ Lim, > > I don't quite understand your question, because you seem to be asking for something that isn't a test for differential expression, which is what edgeR does. You question "I would like to learn the genes are express in treeHS but not treeLS and vice versa" seems to be about absolute expression levels. edgeR doesn't test for genes that are not expressed in particular conditions. > > I'll refer you to the limma section on interaction models in case that helps, see Section 8.5 of: > > http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst/doc/ usersguide.pdf > > Setting up a design matrix is the same for edgeR as it is for limma. > > Best wishes > Gordon > >> Date: Tue, 24 Jul 2012 17:12:00 +0300 >> From: KJ Lim <jinkeanlim at="" gmail.com=""> >> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> >> Subject: [BioC] edgeR: design matrix for different condition >> >> Dear the edgeR community, >> >> Good day. >> >> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2 >> different genotypes (treeHS, treeLS) and time of treatment (0H, 3H, >> 24H, 96H). >> >> I first assumed that the treatment effect will be the same for each >> genotype and I have the design matrix as: >> >> design <- model.matrix(~tree+treat) >> >>> design >> >> (Intercept) treat03H treat24H treat96H treeLS >> 1 1 0 0 0 0 >> 2 1 0 0 0 0 >> 3 1 1 0 0 0 >> 4 1 1 0 0 0 >> 5 1 0 1 0 0 >> 6 1 0 1 0 0 >> 7 1 0 0 1 0 >> 8 1 0 0 1 0 >> 9 1 0 0 0 1 >> 10 1 0 0 0 1 >> 11 1 1 0 0 1 >> 12 1 1 0 0 1 >> 13 1 0 1 0 1 >> 14 1 0 1 0 1 >> 15 1 0 0 1 1 >> 16 1 0 0 1 1 >> >> I used coef=4 to test for the differential expressions between treeHS >> and treeLS within the time of treatment, coef=3 to learn the >> differential expressions in 2 genotypes at time of treatment 96H. >> >> ** I would like to learn the genes are express in treeHS but not >> treeLS and vice versa at certain time of treatment let's say 24H or >> across the whole time of treatment, should I have the design matrix as >> below or more steps I need to do? >> >> design <- model.matrix(~tree*treat) >> >> Kindly please light me on this. I appreciate very much for your help and time. >> >> Have a nice day. >> >> Best regards, >> KJ Lim >> > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:6}}

ADD REPLY • link 12.7 years ago KJ Lim ▴ 420

0

Entering edit mode

Dear Prof. Gordon and List, I have an RNA-Seq expt for which I'd like to use edgeR, as it is multifactorial in design. Having gone through the user guide, I am a bit confused as to how to generate the model for my expt. The expt: 2 cell-lines (mut,wt), 2 conditions(stimulated, unstimulated), n=2 in each group. My aim: to detect DE genes based on the effect of stimulus on mut cells. Thus, dat Sample Group Stim 1 1 WT No 2 2 WT No 3 3 WT+ Yes 4 4 WT+ Yes 5 5 Mut No 6 6 Mut No 7 7 Mut+ Yes 8 8 Mut+ Yes Now, if this were array data the model would be: design = model.matrix(~dat$Group) and whilst fitting the model I could make a contrast such as (Mut+ - Mut) - (WT+ - WT) I am not sure how to do this for the RNA-Seq data (i.e. what should the model be? And what coefficients should I pull out?) Whether the model should be: 1) model.matrix(~dat$Group) and somehow in the glmLRT function specify the above contrast in some manner? 2) model.matrix(~dat$Group+dat$Group*dat$Stim) (coefficient/contrast?) 3) model.matrix(~dat$Group*dat$Stim) (coefficient/contrast?) I'd appreciate any help and advice. Many Thanks, Natasha

ADD REPLY • link 12.7 years ago Natasha ▴ 440

0

Entering edit mode

Hi Natasha, On 7/26/12 8:29 AM, Natasha Sahgal wrote: > Dear Prof. Gordon and List, > > I have an RNA-Seq expt for which I'd like to use edgeR, as it is multifactorial in design. > Having gone through the user guide, I am a bit confused as to how to generate the model for my expt. > > The expt: 2 cell-lines (mut,wt), 2 conditions(stimulated, unstimulated), n=2 in each group. > My aim: to detect DE genes based on the effect of stimulus on mut cells. > > Thus, > dat > Sample Group Stim > 1 1 WT No > 2 2 WT No > 3 3 WT+ Yes > 4 4 WT+ Yes > 5 5 Mut No > 6 6 Mut No > 7 7 Mut+ Yes > 8 8 Mut+ Yes > > Now, if this were array data the model would be: > design = model.matrix(~dat$Group) > and whilst fitting the model I could make a contrast such as (Mut+ - Mut) - (WT+ - WT) > > I am not sure how to do this for the RNA-Seq data (i.e. what should the model be? And what coefficients should I pull out?) You use the same exact design matrix you would have used for array data. Note that glmLRT() has a contrast argument, so if you parameterize the way you are doing here, you will need to use that (I am assuming you are just doing something like model.matrix(~Group), as including Stim doesn't make sense). Alternatively you could parameterize by separating out the group from the stim and fit the interaction as part of the design matrix: > Group <- factor(rep(c("WT", "Mut"), each=4)) > Group <- relevel(Group, "WT") > Stim <- factor(rep(c("No","Yes"), each=2, times=2)) > model.matrix(~Group*Stim) (Intercept) GroupMut StimYes GroupMut:StimYes 1 1 0 0 0 2 1 0 0 0 3 1 0 1 0 4 1 0 1 0 5 1 1 0 0 6 1 1 0 0 7 1 1 1 1 8 1 1 1 1 And glmLRT() will test the interaction term by default. Best, Jim > Whether the model should be: > 1) model.matrix(~dat$Group) and somehow in the glmLRT function specify the above contrast in some manner? > > 2) model.matrix(~dat$Group+dat$Group*dat$Stim) (coefficient/contrast?) > > 3) model.matrix(~dat$Group*dat$Stim) (coefficient/contrast?) > > > I'd appreciate any help and advice. > > > Many Thanks, > Natasha > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 12.7 years ago James W. MacDonald 68k

0

Entering edit mode

Dear James, Thank you for the prompt response. When it comes to interactions, I do tend to get stumped easily. I had some additional queries: 1) I realized I made a typo in the array design model which would be: design = model.matrix(~0+dat$Group): to give the contrast of (Mut+ - Mut) - (WT+ - WT). As you mentioned, I did notice the glmLRT function has a contrast parameter, but was/am not sure if the above design model (i.e. ~0+dat$Group) would work with the aforementioned contrast? Since that was limma based for arrays and this is RNA-Seq. 2) Thank you for the detailed alternative parameterization method, I realize the error in my dat object. On another note: 3) I do not completely follow the cpm (counts per million) concept for filtering the data. (Probably very na?ve but) How does one decide on the cpm? How does one equate it to (read) counts - that's what the count table is right - number of reads? Why not just use those (i.e. say something like >= 10 reads in at least 2 samples (in my case here, as I gather from reading the user guide its about 50% of samples)? 4) Since tag-wise dispersion is recommended and one must calculate either common or trended dispersion prior to that. How does one decide to use common dispersion or trended dispersion or both before calculating tag-wise dispersion? (Or am I missing something obvious here). Many Thanks in advance. Best Wishes, Natasha -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: 26 July 2012 16:03 To: Natasha Sahgal; 'bioconductor at r-project.org' Subject: Re: [BioC] edgeR: generating a correct design matrix - multifactorial design Hi Natasha, On 7/26/12 8:29 AM, Natasha Sahgal wrote: > Dear Prof. Gordon and List, > > I have an RNA-Seq expt for which I'd like to use edgeR, as it is multifactorial in design. > Having gone through the user guide, I am a bit confused as to how to generate the model for my expt. > > The expt: 2 cell-lines (mut,wt), 2 conditions(stimulated, unstimulated), n=2 in each group. > My aim: to detect DE genes based on the effect of stimulus on mut cells. > > Thus, > dat > Sample Group Stim > 1 1 WT No > 2 2 WT No > 3 3 WT+ Yes > 4 4 WT+ Yes > 5 5 Mut No > 6 6 Mut No > 7 7 Mut+ Yes > 8 8 Mut+ Yes > > Now, if this were array data the model would be: > design = model.matrix(~dat$Group) > and whilst fitting the model I could make a contrast such as (Mut+ - > Mut) - (WT+ - WT) > > I am not sure how to do this for the RNA-Seq data (i.e. what should > the model be? And what coefficients should I pull out?) You use the same exact design matrix you would have used for array data. Note that glmLRT() has a contrast argument, so if you parameterize the way you are doing here, you will need to use that (I am assuming you are just doing something like model.matrix(~Group), as including Stim doesn't make sense). Alternatively you could parameterize by separating out the group from the stim and fit the interaction as part of the design matrix: > Group <- factor(rep(c("WT", "Mut"), each=4)) > Group <- relevel(Group, "WT") > Stim <- factor(rep(c("No","Yes"), each=2, times=2)) > model.matrix(~Group*Stim) (Intercept) GroupMut StimYes GroupMut:StimYes 1 1 0 0 0 2 1 0 0 0 3 1 0 1 0 4 1 0 1 0 5 1 1 0 0 6 1 1 0 0 7 1 1 1 1 8 1 1 1 1 And glmLRT() will test the interaction term by default. Best, Jim > Whether the model should be: > 1) model.matrix(~dat$Group) and somehow in the glmLRT function specify the above contrast in some manner? > > 2) model.matrix(~dat$Group+dat$Group*dat$Stim) (coefficient/contrast?) > > 3) model.matrix(~dat$Group*dat$Stim) (coefficient/contrast?) > > > I'd appreciate any help and advice. > > > Many Thanks, > Natasha > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 12.7 years ago Natasha ▴ 440

0

Entering edit mode

Hi Natasha, On 7/27/12 6:58 AM, Natasha Sahgal wrote: > Dear James, > > Thank you for the prompt response. > When it comes to interactions, I do tend to get stumped easily. > > I had some additional queries: > > 1) I realized I made a typo in the array design model which would be: design = model.matrix(~0+dat$Group): to give the contrast of (Mut+ - Mut) - (WT+ - WT). > > As you mentioned, I did notice the glmLRT function has a contrast parameter, but was/am not sure if the above design model (i.e. ~0+dat$Group) would work with the aforementioned contrast? Since that was limma based for arrays and this is RNA-Seq. Yes it will. You specify the design/contrast matrices exactly as you would for limma. > 2) Thank you for the detailed alternative parameterization method, I realize the error in my dat object. > > On another note: > > 3) I do not completely follow the cpm (counts per million) concept for filtering the data. (Probably very na?ve but) How does one decide on the cpm? How does one equate it to (read) counts - that's what the count table is right - number of reads? Why not just use those (i.e. say something like>= 10 reads in at least 2 samples (in my case here, as I gather from reading the user guide its about 50% of samples)? You use cpm rather than raw counts because the different samples may have different numbers of total reads. For example, one sample may have twice as many reads as another, in which case you wouldn't want to use raw reads to do any comparisons or filtering (e.g., if you have twice as many total reads in one sample vs another, you would expect approximately twice as many reads per transcript as well, but that wouldn't indicate differential expression). As for how many cpm to use to filter, this is sort of ad hoc, and is really intended to get rid of all those transcripts for which there is really no evidence for any expression. > 4) Since tag-wise dispersion is recommended and one must calculate either common or trended dispersion prior to that. > How does one decide to use common dispersion or trended dispersion or both before calculating tag-wise dispersion? (Or am I missing something obvious here). The default is to use trended dispersions if both common and trended dispersion data exist, so I assume that means the better idea is to use trended. But I don't know empirically how one might decide between the two. Best, Jim > > Many Thanks in advance. > > Best Wishes, > Natasha > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: 26 July 2012 16:03 > To: Natasha Sahgal; 'bioconductor at r-project.org' > Subject: Re: [BioC] edgeR: generating a correct design matrix - multifactorial design > > Hi Natasha, > > On 7/26/12 8:29 AM, Natasha Sahgal wrote: >> Dear Prof. Gordon and List, >> >> I have an RNA-Seq expt for which I'd like to use edgeR, as it is multifactorial in design. >> Having gone through the user guide, I am a bit confused as to how to generate the model for my expt. >> >> The expt: 2 cell-lines (mut,wt), 2 conditions(stimulated, unstimulated), n=2 in each group. >> My aim: to detect DE genes based on the effect of stimulus on mut cells. >> >> Thus, >> dat >> Sample Group Stim >> 1 1 WT No >> 2 2 WT No >> 3 3 WT+ Yes >> 4 4 WT+ Yes >> 5 5 Mut No >> 6 6 Mut No >> 7 7 Mut+ Yes >> 8 8 Mut+ Yes >> >> Now, if this were array data the model would be: >> design = model.matrix(~dat$Group) >> and whilst fitting the model I could make a contrast such as (Mut+ - >> Mut) - (WT+ - WT) >> >> I am not sure how to do this for the RNA-Seq data (i.e. what should >> the model be? And what coefficients should I pull out?) > You use the same exact design matrix you would have used for array data. > Note that glmLRT() has a contrast argument, so if you parameterize the way you are doing here, you will need to use that (I am assuming you are just doing something like model.matrix(~Group), as including Stim doesn't make sense). > > Alternatively you could parameterize by separating out the group from the stim and fit the interaction as part of the design matrix: > > > Group<- factor(rep(c("WT", "Mut"), each=4))> Group<- relevel(Group, "WT")> Stim<- factor(rep(c("No","Yes"), each=2, times=2))> model.matrix(~Group*Stim) > (Intercept) GroupMut StimYes GroupMut:StimYes > 1 1 0 0 0 > 2 1 0 0 0 > 3 1 0 1 0 > 4 1 0 1 0 > 5 1 1 0 0 > 6 1 1 0 0 > 7 1 1 1 1 > 8 1 1 1 1 > > And glmLRT() will test the interaction term by default. > > Best, > > Jim >> Whether the model should be: >> 1) model.matrix(~dat$Group) and somehow in the glmLRT function specify the above contrast in some manner? >> >> 2) model.matrix(~dat$Group+dat$Group*dat$Stim) (coefficient/contrast?) >> >> 3) model.matrix(~dat$Group*dat$Stim) (coefficient/contrast?) >> >> >> I'd appreciate any help and advice. >> >> >> Many Thanks, >> Natasha >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 12.7 years ago James W. MacDonald 68k

0

Entering edit mode

Dear KJ Lim, Thanks for the rephrasing, which is clearer. I would have liked you to mention however whether you read the documentation that I refered you do. On Thu, 26 Jul 2012, KJ Lim wrote: > Dear Prof Gordon, > > Good day. Thanks for your prompt replied. > > Please allow me to rephrase my previous question. > > This model, ~tree+treat, is assumed effect of the time of treatment > will be the same irrespective of genotype. Yes, this model makes that assumption. > Thus, test for the coef=3 will gives me the differential expression at > 96H irrespective of genotype. Actually coef=3 refers to 24H, according to the design matrix in your original email. A test for coef=3 would test for DE at 24H vs 0H, irrespective of genotype. But only if the assumption mentioned above is true, which is unlikely given the rest of your email. > I would like to learn the differential expression of treeHS vs treeLS > at specific time points i.e. 24H or if possible across the whole time > points. Should I fit my model as > > model A: ~tree*treat OR model B: ~tree+tree:treat ? These models are equivalent, so it is just a matter of convenience which one you use, as I tried to explain in the limma documentation I refered you to. > The design matrix columns of model B give: > "(Intercept)" "treeLS" "treeHS:treat03H" "treeLS:treat03H" > "treeHS:treat24H" "treeLS:treat24H" "treeHS:treat96H" > "treeLS:treat96H" > > Am I doing right if I fit the model B and test for the coef=3:4 to > learn the differential expression of treeHS vs treeLS at specific time > points i.e. 03H? Does this test could tells me which set of genes > up/down-regulated in treeLS or treeHS? Yes. Coef 3 tests for a 3H effect in HS. Coef 4 tests for a 3H in LS. So testing for both coefficients coef=3:4 tests for a 3H effect in either treeLS or treeHS. Best wishes Gordon > I'm not good in statistic and I'm learning, kindly please correct me > if I'm wrong. > > Thank you very much for your time and suggestion. > > Best regards, > KJ Lim > > > > > On 26 July 2012 03:39, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >> >> Dear KJ Lim, >> >> I don't quite understand your question, because you seem to be asking >> for something that isn't a test for differential expression, which is >> what edgeR does. You question "I would like to learn the genes are >> express in treeHS but not treeLS and vice versa" seems to be about >> absolute expression levels. edgeR doesn't test for genes that are not >> expressed in particular conditions. >> >> I'll refer you to the limma section on interaction models in case that >> helps, see Section 8.5 of: >> >> http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst/doc /usersguide.pdf >> >> Setting up a design matrix is the same for edgeR as it is for limma. >> >> Best wishes >> Gordon >> >>> Date: Tue, 24 Jul 2012 17:12:00 +0300 >>> From: KJ Lim <jinkeanlim at="" gmail.com=""> >>> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> >>> Subject: [BioC] edgeR: design matrix for different condition >>> >>> Dear the edgeR community, >>> >>> Good day. >>> >>> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2 >>> different genotypes (treeHS, treeLS) and time of treatment (0H, 3H, >>> 24H, 96H). >>> >>> I first assumed that the treatment effect will be the same for each >>> genotype and I have the design matrix as: >>> >>> design <- model.matrix(~tree+treat) >>> >>>> design >>> >>> (Intercept) treat03H treat24H treat96H treeLS >>> 1 1 0 0 0 0 >>> 2 1 0 0 0 0 >>> 3 1 1 0 0 0 >>> 4 1 1 0 0 0 >>> 5 1 0 1 0 0 >>> 6 1 0 1 0 0 >>> 7 1 0 0 1 0 >>> 8 1 0 0 1 0 >>> 9 1 0 0 0 1 >>> 10 1 0 0 0 1 >>> 11 1 1 0 0 1 >>> 12 1 1 0 0 1 >>> 13 1 0 1 0 1 >>> 14 1 0 1 0 1 >>> 15 1 0 0 1 1 >>> 16 1 0 0 1 1 >>> >>> I used coef=4 to test for the differential expressions between treeHS >>> and treeLS within the time of treatment, coef=3 to learn the >>> differential expressions in 2 genotypes at time of treatment 96H. >>> >>> ** I would like to learn the genes are express in treeHS but not >>> treeLS and vice versa at certain time of treatment let's say 24H or >>> across the whole time of treatment, should I have the design matrix as >>> below or more steps I need to do? >>> >>> design <- model.matrix(~tree*treat) >>> >>> Kindly please light me on this. I appreciate very much for your help and time. >>> >>> Have a nice day. >>> >>> Best regards, >>> KJ Lim ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD REPLY • link 12.7 years ago Gordon Smyth 52k

0

Entering edit mode

Dear Prof Gordon, Good day. I have read the section 8.5 of the Limma manual as you suggested in previous emails. Thanks. If I understand you correctly, you would suggest me to carry out my DE analysis with limma package if I would like to learn the which genes are express in treeHS compare to treeLS (vice versa) at i.e. 24H. May I ask how can I generate the EList, "eset", in order to fit in the linear model as mentioned in the limma manual fit <- lmFit(eset, design) fit <- eBayes(fit) Please correct me if I'm wrong. Thank you very much for your time and help. Best regards, KJ Lim On 27 July 2012 02:31, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > Dear KJ Lim, > > Thanks for the rephrasing, which is clearer. I would have liked you to > mention however whether you read the documentation that I refered you do. > > > On Thu, 26 Jul 2012, KJ Lim wrote: > >> Dear Prof Gordon, >> >> Good day. Thanks for your prompt replied. >> >> Please allow me to rephrase my previous question. >> >> This model, ~tree+treat, is assumed effect of the time of treatment >> will be the same irrespective of genotype. > > > Yes, this model makes that assumption. > > >> Thus, test for the coef=3 will gives me the differential expression at 96H >> irrespective of genotype. > > > Actually coef=3 refers to 24H, according to the design matrix in your > original email. A test for coef=3 would test for DE at 24H vs 0H, > irrespective of genotype. But only if the assumption mentioned above is > true, which is unlikely given the rest of your email. > > >> I would like to learn the differential expression of treeHS vs treeLS >> at specific time points i.e. 24H or if possible across the whole time >> points. Should I fit my model as >> >> model A: ~tree*treat OR model B: ~tree+tree:treat ? > > > These models are equivalent, so it is just a matter of convenience which one > you use, as I tried to explain in the limma documentation I refered you to. > > >> The design matrix columns of model B give: >> "(Intercept)" "treeLS" "treeHS:treat03H" "treeLS:treat03H" >> "treeHS:treat24H" "treeLS:treat24H" "treeHS:treat96H" >> "treeLS:treat96H" >> >> Am I doing right if I fit the model B and test for the coef=3:4 to >> learn the differential expression of treeHS vs treeLS at specific time >> points i.e. 03H? Does this test could tells me which set of genes >> up/down-regulated in treeLS or treeHS? > > > Yes. Coef 3 tests for a 3H effect in HS. Coef 4 tests for a 3H in LS. So > testing for both coefficients coef=3:4 tests for a 3H effect in either > treeLS or treeHS. > > Best wishes > Gordon > > >> I'm not good in statistic and I'm learning, kindly please correct me >> if I'm wrong. >> >> Thank you very much for your time and suggestion. >> >> Best regards, >> KJ Lim >> >> >> >> >> On 26 July 2012 03:39, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>> >>> >>> Dear KJ Lim, >>> >>> I don't quite understand your question, because you seem to be asking for >>> something that isn't a test for differential expression, which is what edgeR >>> does. You question "I would like to learn the genes are express in treeHS >>> but not treeLS and vice versa" seems to be about absolute expression levels. >>> edgeR doesn't test for genes that are not expressed in particular >>> conditions. >>> >>> I'll refer you to the limma section on interaction models in case that >>> helps, see Section 8.5 of: >>> >>> >>> http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst/do c/usersguide.pdf >>> >>> Setting up a design matrix is the same for edgeR as it is for limma. >>> >>> Best wishes >>> Gordon >>> >>>> Date: Tue, 24 Jul 2012 17:12:00 +0300 >>>> From: KJ Lim <jinkeanlim at="" gmail.com=""> >>>> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> >>>> Subject: [BioC] edgeR: design matrix for different condition >>>> >>>> Dear the edgeR community, >>>> >>>> Good day. >>>> >>>> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2 >>>> different genotypes (treeHS, treeLS) and time of treatment (0H, 3H, >>>> 24H, 96H). >>>> >>>> I first assumed that the treatment effect will be the same for each >>>> genotype and I have the design matrix as: >>>> >>>> design <- model.matrix(~tree+treat) >>>> >>>>> design >>>> >>>> >>>> (Intercept) treat03H treat24H treat96H treeLS >>>> 1 1 0 0 0 0 >>>> 2 1 0 0 0 0 >>>> 3 1 1 0 0 0 >>>> 4 1 1 0 0 0 >>>> 5 1 0 1 0 0 >>>> 6 1 0 1 0 0 >>>> 7 1 0 0 1 0 >>>> 8 1 0 0 1 0 >>>> 9 1 0 0 0 1 >>>> 10 1 0 0 0 1 >>>> 11 1 1 0 0 1 >>>> 12 1 1 0 0 1 >>>> 13 1 0 1 0 1 >>>> 14 1 0 1 0 1 >>>> 15 1 0 0 1 1 >>>> 16 1 0 0 1 1 >>>> >>>> I used coef=4 to test for the differential expressions between treeHS >>>> and treeLS within the time of treatment, coef=3 to learn the >>>> differential expressions in 2 genotypes at time of treatment 96H. >>>> >>>> ** I would like to learn the genes are express in treeHS but not >>>> treeLS and vice versa at certain time of treatment let's say 24H or >>>> across the whole time of treatment, should I have the design matrix as >>>> below or more steps I need to do? >>>> >>>> design <- model.matrix(~tree*treat) >>>> >>>> Kindly please light me on this. I appreciate very much for your help and >>>> time. >>>> >>>> Have a nice day. >>>> >>>> Best regards, >>>> KJ Lim > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:6}}

ADD REPLY • link 12.7 years ago KJ Lim ▴ 420

0

Entering edit mode

Dear KH Kim, No, you have not understood me correctly. I did not suggest that you change from edgeR to limma. I suggested that you read the limma documentation because the design matrix is the same for edgeR as it is for limma, so the limma documentation would help you create the design matrix for your edgeR analysis. I thought from your last email that you had already done this and that you had completed the edgeR analysis satisfactorily. I wrote more documentation for the edgeR User's Guide a couple of days ago, trying to give advice for experiments such as yours. You might find that Section 3.3 of http://bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc/ed geRUsersGuide.pdf gives more explanation. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. Tel: (03) 9345 2326, Fax (03) 9347 0852, http://www.statsci.org/smyth On Mon, 30 Jul 2012, KJ Lim wrote: > Dear Prof Gordon, > > Good day. > > I have read the section 8.5 of the Limma manual as you suggested in > previous emails. Thanks. > > If I understand you correctly, you would suggest me to carry out my DE > analysis with limma package if I would like to learn the which genes > are express in treeHS compare to treeLS (vice versa) at i.e. 24H. > > May I ask how can I generate the EList, "eset", in order to fit in the > linear model as mentioned in the limma manual > > fit <- lmFit(eset, design) > fit <- eBayes(fit) > > Please correct me if I'm wrong. > > Thank you very much for your time and help. > > Best regards, > KJ Lim > > > > On 27 July 2012 02:31, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >> Dear KJ Lim, >> >> Thanks for the rephrasing, which is clearer. I would have liked you to >> mention however whether you read the documentation that I refered you do. >> >> >> On Thu, 26 Jul 2012, KJ Lim wrote: >> >>> Dear Prof Gordon, >>> >>> Good day. Thanks for your prompt replied. >>> >>> Please allow me to rephrase my previous question. >>> >>> This model, ~tree+treat, is assumed effect of the time of treatment >>> will be the same irrespective of genotype. >> >> >> Yes, this model makes that assumption. >> >> >>> Thus, test for the coef=3 will gives me the differential expression at 96H >>> irrespective of genotype. >> >> >> Actually coef=3 refers to 24H, according to the design matrix in your >> original email. A test for coef=3 would test for DE at 24H vs 0H, >> irrespective of genotype. But only if the assumption mentioned above is >> true, which is unlikely given the rest of your email. >> >> >>> I would like to learn the differential expression of treeHS vs treeLS >>> at specific time points i.e. 24H or if possible across the whole time >>> points. Should I fit my model as >>> >>> model A: ~tree*treat OR model B: ~tree+tree:treat ? >> >> >> These models are equivalent, so it is just a matter of convenience which one >> you use, as I tried to explain in the limma documentation I refered you to. >> >> >>> The design matrix columns of model B give: >>> "(Intercept)" "treeLS" "treeHS:treat03H" "treeLS:treat03H" >>> "treeHS:treat24H" "treeLS:treat24H" "treeHS:treat96H" >>> "treeLS:treat96H" >>> >>> Am I doing right if I fit the model B and test for the coef=3:4 to >>> learn the differential expression of treeHS vs treeLS at specific time >>> points i.e. 03H? Does this test could tells me which set of genes >>> up/down-regulated in treeLS or treeHS? >> >> >> Yes. Coef 3 tests for a 3H effect in HS. Coef 4 tests for a 3H in LS. So >> testing for both coefficients coef=3:4 tests for a 3H effect in either >> treeLS or treeHS. >> >> Best wishes >> Gordon >> >> >>> I'm not good in statistic and I'm learning, kindly please correct me >>> if I'm wrong. >>> >>> Thank you very much for your time and suggestion. >>> >>> Best regards, >>> KJ Lim >>> >>> >>> >>> >>> On 26 July 2012 03:39, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>> >>>> >>>> Dear KJ Lim, >>>> >>>> I don't quite understand your question, because you seem to be asking for >>>> something that isn't a test for differential expression, which is what edgeR >>>> does. You question "I would like to learn the genes are express in treeHS >>>> but not treeLS and vice versa" seems to be about absolute expression levels. >>>> edgeR doesn't test for genes that are not expressed in particular >>>> conditions. >>>> >>>> I'll refer you to the limma section on interaction models in case that >>>> helps, see Section 8.5 of: >>>> >>>> >>>> http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst/d oc/usersguide.pdf >>>> >>>> Setting up a design matrix is the same for edgeR as it is for limma. >>>> >>>> Best wishes >>>> Gordon >>>> >>>>> Date: Tue, 24 Jul 2012 17:12:00 +0300 >>>>> From: KJ Lim <jinkeanlim at="" gmail.com=""> >>>>> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> >>>>> Subject: [BioC] edgeR: design matrix for different condition >>>>> >>>>> Dear the edgeR community, >>>>> >>>>> Good day. >>>>> >>>>> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2 >>>>> different genotypes (treeHS, treeLS) and time of treatment (0H, 3H, >>>>> 24H, 96H). >>>>> >>>>> I first assumed that the treatment effect will be the same for each >>>>> genotype and I have the design matrix as: >>>>> >>>>> design <- model.matrix(~tree+treat) >>>>> >>>>>> design >>>>> >>>>> >>>>> (Intercept) treat03H treat24H treat96H treeLS >>>>> 1 1 0 0 0 0 >>>>> 2 1 0 0 0 0 >>>>> 3 1 1 0 0 0 >>>>> 4 1 1 0 0 0 >>>>> 5 1 0 1 0 0 >>>>> 6 1 0 1 0 0 >>>>> 7 1 0 0 1 0 >>>>> 8 1 0 0 1 0 >>>>> 9 1 0 0 0 1 >>>>> 10 1 0 0 0 1 >>>>> 11 1 1 0 0 1 >>>>> 12 1 1 0 0 1 >>>>> 13 1 0 1 0 1 >>>>> 14 1 0 1 0 1 >>>>> 15 1 0 0 1 1 >>>>> 16 1 0 0 1 1 >>>>> >>>>> I used coef=4 to test for the differential expressions between treeHS >>>>> and treeLS within the time of treatment, coef=3 to learn the >>>>> differential expressions in 2 genotypes at time of treatment 96H. >>>>> >>>>> ** I would like to learn the genes are express in treeHS but not >>>>> treeLS and vice versa at certain time of treatment let's say 24H or >>>>> across the whole time of treatment, should I have the design matrix as >>>>> below or more steps I need to do? >>>>> >>>>> design <- model.matrix(~tree*treat) >>>>> >>>>> Kindly please light me on this. I appreciate very much for your help and >>>>> time. >>>>> >>>>> Have a nice day. >>>>> >>>>> Best regards, >>>>> KJ Lim >> >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the >> addressee. >> You must not disclose, forward, print or use it without the permission of >> the sender. >> ______________________________________________________________________ > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD REPLY • link 12.7 years ago Gordon Smyth 52k

0

Entering edit mode

Dear Prof. Gordon, Good day. Thanks for the correction, I'm sorry about that. I will read through the latest edgeR User's Guide. Thank you very much for your time and also the community. Have a nice day. Best regards, KJ Lim On 31 July 2012 02:24, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > Dear KH Kim, > > No, you have not understood me correctly. I did not suggest that you change > from edgeR to limma. I suggested that you read the limma documentation > because the design matrix is the same for edgeR as it is for limma, so the > limma documentation would help you create the design matrix for your edgeR > analysis. > > I thought from your last email that you had already done this and that you > had completed the edgeR analysis satisfactorily. > > I wrote more documentation for the edgeR User's Guide a couple of days ago, > trying to give advice for experiments such as yours. You might find that > Section 3.3 of > > http://bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc/ edgeRUsersGuide.pdf > > gives more explanation. > > Best wishes > Gordon > > --------------------------------------------- > Professor Gordon K Smyth, > Bioinformatics Division, > Walter and Eliza Hall Institute of Medical Research, > 1G Royal Parade, Parkville, Vic 3052, Australia. > Tel: (03) 9345 2326, Fax (03) 9347 0852, > http://www.statsci.org/smyth > > > On Mon, 30 Jul 2012, KJ Lim wrote: > >> Dear Prof Gordon, >> >> Good day. >> >> I have read the section 8.5 of the Limma manual as you suggested in >> previous emails. Thanks. >> >> If I understand you correctly, you would suggest me to carry out my DE >> analysis with limma package if I would like to learn the which genes >> are express in treeHS compare to treeLS (vice versa) at i.e. 24H. >> >> May I ask how can I generate the EList, "eset", in order to fit in the >> linear model as mentioned in the limma manual >> >> fit <- lmFit(eset, design) >> fit <- eBayes(fit) >> >> Please correct me if I'm wrong. >> >> Thank you very much for your time and help. >> >> Best regards, >> KJ Lim >> >> >> >> On 27 July 2012 02:31, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>> >>> Dear KJ Lim, >>> >>> Thanks for the rephrasing, which is clearer. I would have liked you to >>> mention however whether you read the documentation that I refered you do. >>> >>> >>> On Thu, 26 Jul 2012, KJ Lim wrote: >>> >>>> Dear Prof Gordon, >>>> >>>> Good day. Thanks for your prompt replied. >>>> >>>> Please allow me to rephrase my previous question. >>>> >>>> This model, ~tree+treat, is assumed effect of the time of treatment >>>> will be the same irrespective of genotype. >>> >>> >>> >>> Yes, this model makes that assumption. >>> >>> >>>> Thus, test for the coef=3 will gives me the differential expression at >>>> 96H >>>> irrespective of genotype. >>> >>> >>> >>> Actually coef=3 refers to 24H, according to the design matrix in your >>> original email. A test for coef=3 would test for DE at 24H vs 0H, >>> irrespective of genotype. But only if the assumption mentioned above is >>> true, which is unlikely given the rest of your email. >>> >>> >>>> I would like to learn the differential expression of treeHS vs treeLS >>>> at specific time points i.e. 24H or if possible across the whole time >>>> points. Should I fit my model as >>>> >>>> model A: ~tree*treat OR model B: ~tree+tree:treat ? >>> >>> >>> >>> These models are equivalent, so it is just a matter of convenience which >>> one >>> you use, as I tried to explain in the limma documentation I refered you >>> to. >>> >>> >>>> The design matrix columns of model B give: >>>> "(Intercept)" "treeLS" "treeHS:treat03H" "treeLS:treat03H" >>>> "treeHS:treat24H" "treeLS:treat24H" "treeHS:treat96H" >>>> "treeLS:treat96H" >>>> >>>> Am I doing right if I fit the model B and test for the coef=3:4 to >>>> learn the differential expression of treeHS vs treeLS at specific time >>>> points i.e. 03H? Does this test could tells me which set of genes >>>> up/down-regulated in treeLS or treeHS? >>> >>> >>> >>> Yes. Coef 3 tests for a 3H effect in HS. Coef 4 tests for a 3H in LS. >>> So >>> testing for both coefficients coef=3:4 tests for a 3H effect in either >>> treeLS or treeHS. >>> >>> Best wishes >>> Gordon >>> >>> >>>> I'm not good in statistic and I'm learning, kindly please correct me >>>> if I'm wrong. >>>> >>>> Thank you very much for your time and suggestion. >>>> >>>> Best regards, >>>> KJ Lim >>>> >>>> >>>> >>>> >>>> On 26 July 2012 03:39, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>>> >>>>> >>>>> >>>>> Dear KJ Lim, >>>>> >>>>> I don't quite understand your question, because you seem to be asking >>>>> for >>>>> something that isn't a test for differential expression, which is what >>>>> edgeR >>>>> does. You question "I would like to learn the genes are express in >>>>> treeHS >>>>> but not treeLS and vice versa" seems to be about absolute expression >>>>> levels. >>>>> edgeR doesn't test for genes that are not expressed in particular >>>>> conditions. >>>>> >>>>> I'll refer you to the limma section on interaction models in case that >>>>> helps, see Section 8.5 of: >>>>> >>>>> >>>>> >>>>> http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst/ doc/usersguide.pdf >>>>> >>>>> Setting up a design matrix is the same for edgeR as it is for limma. >>>>> >>>>> Best wishes >>>>> Gordon >>>>> >>>>>> Date: Tue, 24 Jul 2012 17:12:00 +0300 >>>>>> From: KJ Lim <jinkeanlim at="" gmail.com=""> >>>>>> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> >>>>>> Subject: [BioC] edgeR: design matrix for different condition >>>>>> >>>>>> Dear the edgeR community, >>>>>> >>>>>> Good day. >>>>>> >>>>>> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2 >>>>>> different genotypes (treeHS, treeLS) and time of treatment (0H, 3H, >>>>>> 24H, 96H). >>>>>> >>>>>> I first assumed that the treatment effect will be the same for each >>>>>> genotype and I have the design matrix as: >>>>>> >>>>>> design <- model.matrix(~tree+treat) >>>>>> >>>>>>> design >>>>>> >>>>>> >>>>>> >>>>>> (Intercept) treat03H treat24H treat96H treeLS >>>>>> 1 1 0 0 0 0 >>>>>> 2 1 0 0 0 0 >>>>>> 3 1 1 0 0 0 >>>>>> 4 1 1 0 0 0 >>>>>> 5 1 0 1 0 0 >>>>>> 6 1 0 1 0 0 >>>>>> 7 1 0 0 1 0 >>>>>> 8 1 0 0 1 0 >>>>>> 9 1 0 0 0 1 >>>>>> 10 1 0 0 0 1 >>>>>> 11 1 1 0 0 1 >>>>>> 12 1 1 0 0 1 >>>>>> 13 1 0 1 0 1 >>>>>> 14 1 0 1 0 1 >>>>>> 15 1 0 0 1 1 >>>>>> 16 1 0 0 1 1 >>>>>> >>>>>> I used coef=4 to test for the differential expressions between treeHS >>>>>> and treeLS within the time of treatment, coef=3 to learn the >>>>>> differential expressions in 2 genotypes at time of treatment 96H. >>>>>> >>>>>> ** I would like to learn the genes are express in treeHS but not >>>>>> treeLS and vice versa at certain time of treatment let's say 24H or >>>>>> across the whole time of treatment, should I have the design matrix as >>>>>> below or more steps I need to do? >>>>>> >>>>>> design <- model.matrix(~tree*treat) >>>>>> >>>>>> Kindly please light me on this. I appreciate very much for your help >>>>>> and >>>>>> time. >>>>>> >>>>>> Have a nice day. >>>>>> >>>>>> Best regards, >>>>>> KJ Lim >>> >>> >>> >>> ______________________________________________________________________ >>> The information in this email is confidential and intended solely for the >>> addressee. >>> You must not disclose, forward, print or use it without the permission of >>> the sender. >>> ______________________________________________________________________ >> >> > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:6}}

ADD REPLY • link 12.7 years ago KJ Lim ▴ 420

Login before adding your answer.