Differential expression analysis in Limma for one factor after adjusting for a covariate



Sam McInturf


James,

Concerning how to interpret the coefficient names: when I fit a model with an intercept, I understand the coefficients.

model.matrix(~gender+group+gender:group)
(Intercept) - test for a significant intercept
groupnormal - test for differences between the two groups
genderM - test for differences between the genders
genderM:groupnormal - test for the interaction term

When I fit without an intercept:

model.matrix(~0+gender+group)
groupnormal - ?
groupdiseased - ?
genderM - significant difference between genders

What are groupnormal and groupdiseased tested against? Is it just that the coefficient is not equal to zero, or is it a test of a difference of means? (Newbie questions :/)

For microarrays and RNA-seq I have always set up my design matrix as you specified:

groupGend <- factor(paste(group, gender, sep = "_"))
design <- model.matrix(~0+groupGend)

where each contrast name is easy to read, e.g. "male.diseased compared to normal". Is this equivalent to the other matrices?

Thanks

On Fri, Aug 30, 2013 at 8:34 AM, James W. MacDonald <jmacdon@uw.edu> wrote:

> On Friday, August 30, 2013 5:49:51 AM, QAMRA Aditi (GIS) wrote:
>
>> Hi,
>>
>> I have an expression dataset for both normal and diseased patients, as well as their gender information. I want to test for differences in expression between males and females after adjusting for differences between normal and diseased tissue type (group), using limma rather than the anova function in R.
>>
>> I have 2 questions:
>>
>> 1. Does limma allow inclusion of covariates? How do I first adjust the expression dataset to remove differences due to a sample being diseased, and then estimate the true difference in expression between males and females in limma? So far I have only been able to test males vs females and normals vs diseased. Would (Male.Diseased-Male.Normal)-(Female.Diseased-Female.Normal) (which is basically an interaction term) give me this?
>
> Any time you fit a model with various coefficients included, you are automatically adjusting for those coefficients. In other words, if you fit a model with sex and treatment and then compute the contrast between male and female, you are doing so after adjusting for treatment.
>
> But your question isn't that clear, so I don't know if that answers it. The interaction term gives you those genes that react differently to the treatment in males as compared to females. This is different from finding genes that are different in males vs females after adjusting for treatment, but again it isn't totally clear to me what you are asking.
>
>> 2. I was trying to include both gender and group information as factors, but when I try to build the model matrix
>>
>> design <- model.matrix(~0+gender+group)
>>
>> where both gender and group are factors, I get the following layout of the design matrix:
>>
>>   groupnormal groupdiseased genderM
>> 1           1             0       0
>> 2           1             0       1
>>
>> attr(,"assign")
>> [1] 1 1 2
>> attr(,"contrasts")
>> attr(,"contrasts")$group
>> [1] "contr.treatment"
>>
>> attr(,"contrasts")$gender
>> [1] "contr.treatment"
>>
>> Why do I not also see genderF as a column here?
>
> Because that is the way R sets up the model matrix. The genderM coefficient is computing the difference between males and females, so if you want to test for sex differences you would simply test that this coefficient is different from zero.
>
> But this is something that Gordon has been pointing out for years; the conventional coefficients that you get from model.matrix() may not be the most useful in the context of a microarray experiment.
> You could instead do something like
>
> groupGend <- factor(paste(group, gender, sep = "_"))
> design <- model.matrix(~0+groupGend)
>
> and then your coefficients will be something directly interpretable, and easier to understand (e.g., you will have four coefficients, male_normal, male_diseased, female_normal, female_diseased, and then you can make more directed comparisons).
>
> Best,
>
> Jim
>
>> Thanks!
>>
>> -------------------------------
>> This e-mail and any attachments are only for the use of the intended recipient and may be confidential and/or privileged. If you are not the recipient, please delete it or notify the sender immediately. Please do not copy or use it for any purpose or disclose the contents to any other person as it may be an offence under the Official Secrets Act.
>> -------------------------------
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099

--
Sam McInturf
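Both Aditi's and Sam's questions turn on how model.matrix() names and codes the columns. The following is a minimal sketch, not part of the original thread. It uses a hypothetical toy sample sheet with two samples per group-by-sex cell, with levels set so that "normal" and "F" are the reference levels. It also uses made-up, noiseless log-expression values for a single gene, and base R lm() stands in for limma's lmFit().

The sketch suggests two things:

- Without an intercept, only the first factor in the formula gets one column per level. The column names shown above (groupnormal, groupdiseased, genderM) therefore appear to come from ~0+group+gender rather than ~0+gender+group.
- In that parametrization, groupnormal and groupdiseased estimate the mean expression of females in each group. Testing either one against zero only asks whether that mean is zero, so a difference of group means has to be formed as a contrast.

```r
# Hypothetical toy sample sheet: 2 samples per group-by-sex cell.
# Levels are set explicitly so "normal" and "F" are the reference levels.
group  <- factor(rep(c("normal", "diseased"), each = 4),
                 levels = c("normal", "diseased"))
gender <- factor(rep(c("F", "M"), times = 4), levels = c("F", "M"))

# Without an intercept, only the FIRST factor in the formula gets one column
# per level; later factors are still coded relative to their reference level.
cols_gender_first <- colnames(model.matrix(~0 + gender + group))
cols_group_first  <- colnames(model.matrix(~0 + group + gender))
print(cols_gender_first)  # "genderF" "genderM" "groupdiseased"
print(cols_group_first)   # "groupnormal" "groupdiseased" "genderM"

# Made-up, noiseless log-expression for one gene with an additive pattern:
# normal females 5, diseased females 7, males 1 unit higher in both groups.
y <- c(5, 6, 5, 6, 7, 8, 7, 8)
fit <- lm(y ~ 0 + group + gender)
print(coef(fit))
# groupnormal   -> mean of normal females   (5)
# groupdiseased -> mean of diseased females (7)
# genderM       -> male minus female, adjusted for group (1)

# A difference of group means is a contrast between two coefficients,
# not a test of either coefficient against zero.
diff_means <- unname(coef(fit)["groupdiseased"] - coef(fit)["groupnormal"])
print(diff_means)
```

In limma, the same difference would usually be specified with makeContrasts(groupdiseased - groupnormal, levels = design) and applied to the fit with contrasts.fit() before eBayes().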

Microarray limma

written 11.2 years ago by Sam McInturf • updated 11.2 years ago by QAMRA Aditi GIS



QAMRA Aditi GIS


Thanks a lot, it is exactly what I was trying to understand!

Could you help me understand one more thing? Given the aim of finding genes that react differently to the treatment in males as compared to females, which approach would be better?

Approach 1: find the list of significantly differentially expressed genes between the 2 treatments, and then run limma again on only this subset of genes to compare males and females.

Approach 2: use the interaction term to get the list of DEGs that react differently to the treatment in males as compared to females.

Going by the results, Approach 2 is more strict, but I want to understand the pitfalls of Approach 1.

Thank you!
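Approach 2 rests on the interaction term, and Sam's question above asks whether the cell-means design is equivalent to the other design matrices. The following is a minimal sketch, not part of the original thread. It assumes a hypothetical toy sample sheet with two samples per group-by-sex cell and made-up values for a single gene, and base R lm() stands in for limma's lmFit().

It illustrates two points:

- The four-column groupGend design is a reparametrization of the full interaction model ~gender*group, giving the same fitted values. It is not a reparametrization of the additive ~0+gender+group model.
- The contrast (Male.Diseased-Male.Normal)-(Female.Diseased-Female.Normal) from the original question equals the genderM:groupdiseased interaction coefficient.

```r
# Hypothetical sample sheet: 2 samples per group-by-sex cell,
# with "normal" and "F" as reference levels.
group  <- factor(rep(c("normal", "diseased"), each = 4),
                 levels = c("normal", "diseased"))
gender <- factor(rep(c("F", "M"), times = 4), levels = c("F", "M"))
groupGend <- factor(paste(group, gender, sep = "_"))

# Made-up log-expression values for one gene (sample order as above).
y <- c(5.1, 6.0, 4.9, 6.2, 7.0, 6.1, 7.2, 5.9)

# Cell-means parametrization: one coefficient per group-by-sex combination.
fit_cell <- lm(y ~ 0 + groupGend)
b <- setNames(coef(fit_cell), levels(groupGend))

# Conventional parametrization: intercept, main effects and interaction.
fit_int <- lm(y ~ gender * group)

# Both designs span the same column space, so their fitted values agree.
same_fit <- isTRUE(all.equal(unname(fitted(fit_cell)), unname(fitted(fit_int))))

# The interaction contrast from the original question, built from cell means ...
interaction_cell <- unname((b["diseased_M"] - b["normal_M"]) -
                           (b["diseased_F"] - b["normal_F"]))
# ... equals the genderM:groupdiseased coefficient of the interaction model.
interaction_int <- unname(coef(fit_int)["genderM:groupdiseased"])

print(c(same_fit = same_fit))
print(c(cell_means = interaction_cell, interaction_model = interaction_int))
```

In limma, the same interaction would typically be requested from the cell-means design with something like makeContrasts((diseased_M - normal_M) - (diseased_F - normal_F), levels = design). This assumes colnames(design) <- levels(groupGend) has been set first, and is followed by contrasts.fit() and eBayes(). The level names used here are specific to this toy example.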

written 11.2 years ago by QAMRA Aditi GIS



On 9/11/2013 7:55 AM, QAMRA Aditi (GIS) wrote:
> Thanks a lot, it is exactly what I was trying to understand!
>
> Could you help me understand one more thing? Given the aim of finding genes that react differently to the treatment in males as compared to females, which approach would be better?
>
> Approach 1: find the list of significantly differentially expressed genes between the 2 treatments, and then run limma again on only this subset of genes to compare males and females.
>
> Approach 2: use the interaction term to get the list of DEGs that react differently to the treatment in males as compared to females.
>
> Going by the results, Approach 2 is more strict, but I want to understand the pitfalls of Approach 1.

The difference is that Approach 1, as you describe it, doesn't test what you want to find. The interaction specifically tests for a difference in the response to treatment in males vs females. Your Approach 1 tests for a difference in response to treatment, and for those genes it then tests for a difference between the sexes.

Now an example. Let's say a gene is highly up-regulated in females when you treat, and highly down-regulated in males when you treat. When you test for an overall treatment effect, the up-regulation in females and the down-regulation in males can cancel out, so you may not achieve significance. So you won't bring that gene forward to the next step, even though that is exactly the gene you are looking for.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
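James's example can be made concrete with a small sketch, not part of the original thread. It assumes a hypothetical sample sheet with three samples per group-by-sex cell. It also uses made-up values for one gene that is up-regulated by 2 log units in diseased females and down-regulated by 2 in diseased males.

Ordinary least squares via base R lm() stands in for limma here. limma's lmFit() and eBayes() would fit all genes at once and report moderated t-statistics, so actual p-values would differ, but the hypotheses being tested are the same. The gene shows no overall group effect, so Approach 1 would filter it out at the first step, while the interaction test flags it.

```r
# Hypothetical sample sheet: 3 samples per group-by-sex cell,
# with "normal" and "F" as reference levels.
group  <- factor(rep(c("normal", "diseased"), each = 6),
                 levels = c("normal", "diseased"))
gender <- factor(rep(c("F", "M"), times = 6), levels = c("F", "M"))

# Made-up log-expression for one gene: cell means are 5 (normal F),
# 5 (normal M), 7 (diseased F) and 3 (diseased M), plus small fixed jitter.
# So the gene goes up by 2 in diseased females and down by 2 in diseased males.
y <- c(5.0, 5.0, 5.2, 5.2, 4.8, 4.8,
       7.0, 3.0, 7.2, 3.2, 6.8, 2.8)

# Approach 1, step 1: test for a treatment (group) effect, adjusting for sex.
fit_add <- lm(y ~ group + gender)
p_group <- summary(fit_add)$coefficients["groupdiseased", "Pr(>|t|)"]

# Approach 2: test the interaction directly.
fit_int <- lm(y ~ gender * group)
est_int <- coef(fit_int)[["genderM:groupdiseased"]]
p_int   <- summary(fit_int)$coefficients["genderM:groupdiseased", "Pr(>|t|)"]

# The opposite responses cancel in the group effect (estimate ~0, p ~1),
# so Approach 1 would drop this gene before ever testing the sexes;
# the interaction estimate is (3 - 5) - (7 - 5) = -4 and highly significant.
print(c(p_group = p_group, interaction = est_int, p_interaction = p_int))
```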

written 11.2 years ago by James W. MacDonald
