[Limma] lmFit handle missing value in phenotype

0

Entering edit mode

yao chen ▴ 210

@yao-chen-5205

Last seen 10.6 years ago

Hi, I am using Limma to get differential expressed genes between case and control samples. In addition, I want to adjust the "weight" of patients by including it in the "design" matrix. However, one weight value of a patient is missing, so I have a "NA" in design matrix. When I run lmFit, I got the error message :Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1). Can limma handle the missing value in design matrix or I have to remove this sample ? Thanks, Chen [[alternative HTML version deleted]]

limma limma • 2.4k views

ADD COMMENT • link updated 13.0 years ago by Moshe Olshansky ▴ 260 • written 13.0 years ago by yao chen ▴ 210

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 1 day ago

United States

Hi Chen, On 4/4/2012 3:31 PM, yao chen wrote: > Hi, > > I am using Limma to get differential expressed genes between case and > control samples. In addition, I want to adjust the "weight" of patients by > including it in the "design" matrix. However, one weight value of a patient > is missing, so I have a "NA" in design matrix. When I run lmFit, I got the > error message :Error in qr.default(x) : NA/NaN/Inf in foreign function call > (arg 1). > > Can limma handle the missing value in design matrix or I have to remove > this sample ? It's not clear what you mean by weight here. Do you mean to say that you have weighed these subjects and want to control for their weight in your model? If I assume that this is what you mean, then you will not be able to include that person. If you use model.matrix() to create your design matrix, it will automatically drop that subject. Best, Jim > > Thanks, > > Chen > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD COMMENT • link 13.0 years ago James W. MacDonald 68k

0

Entering edit mode

Moshe Olshansky ▴ 260

@moshe-olshansky-4491

Last seen 10.6 years ago

Why do you want to include weight in the design matrix? It may be more reasonable to include weight in the linear model, i.e. expression ~ condition + weight and then your design matrix will have two columns (if there are two conditions): the first column will contain 0's and 1's depending on the condition (group) of the patient while the second one will be identically 1. And then the weights of the patients will be in the values (i.e. the second argument to lmFit) which I believe allows missing values. But this implicitly assumes that the weight (or some function of it - you can use any transformation you like) contributes additively (and linearly) to the expression (or it's logarithm). Moshe. > Hi, > > I am using Limma to get differential expressed genes between case and > control samples. In addition, I want to adjust the "weight" of patients by > including it in the "design" matrix. However, one weight value of a > patient > is missing, so I have a "NA" in design matrix. When I run lmFit, I got the > error message :Error in qr.default(x) : NA/NaN/Inf in foreign function > call > (arg 1). > > Can limma handle the missing value in design matrix or I have to remove > this sample ? > > Thanks, > > Chen > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 13.0 years ago Moshe Olshansky ▴ 260

0

Entering edit mode

Hi, Moshe, I still did not get it. How to use your method? lmFit<-(x,design) which x allow missing values, but design didn't. usually, I want to adjust batch effects, I will include it in design matrix, i.e. case control batch 1 0 1 0 1 1 1 0 2 which x is expression matrix, and then I will get the differential expressed gene between case and control with adjusting batch. In this case, I just want to simply use "weight" instead of "batch" in design matrix. But unfortunately, you can't have missing values in design matrix which I think it's very often if we have many samples. 2012/4/4 Moshe Olshansky <olshansky@wehi.edu.au> > Why do you want to include weight in the design matrix? > It may be more reasonable to include weight in the linear model, i.e. > expression ~ condition + weight and then your design matrix will have two > columns (if there are two conditions): the first column will contain 0's > and 1's depending on the condition (group) of the patient while the second > one will be identically 1. And then the weights of the patients will be in > the values (i.e. the second argument to lmFit) which I believe allows > missing values. > But this implicitly assumes that the weight (or some function of it - you > can use any transformation you like) contributes additively (and linearly) > to the expression (or it's logarithm). > > Moshe. > > > Hi, > > > > I am using Limma to get differential expressed genes between case and > > control samples. In addition, I want to adjust the "weight" of patients > by > > including it in the "design" matrix. However, one weight value of a > > patient > > is missing, so I have a "NA" in design matrix. When I run lmFit, I got > the > > error message :Error in qr.default(x) : NA/NaN/Inf in foreign function > > call > > (arg 1). > > > > Can limma handle the missing value in design matrix or I have to remove > > this sample ? > > > > Thanks, > > > > Chen > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}

ADD REPLY • link 13.0 years ago yao chen ▴ 210

0

Entering edit mode

You are right. My mistake. Sorry... > Hi, Moshe, > > I still did not get it. How to use your method? > > lmFit<-(x,design) which x allow missing values, but design didn't. > > usually, I want to adjust batch effects, I will include it in design > matrix, i.e. > case control batch > 1 0 1 > 0 1 1 > 1 0 2 > which x is expression matrix, and then I will get the differential > expressed gene between case and control with adjusting batch. > > In this case, I just want to simply use "weight" instead of "batch" in > design matrix. But unfortunately, you can't have missing values in design > matrix which I think it's very often if we have many samples. > > 2012/4/4 Moshe Olshansky <olshansky at="" wehi.edu.au=""> > >> Why do you want to include weight in the design matrix? >> It may be more reasonable to include weight in the linear model, i.e. >> expression ~ condition + weight and then your design matrix will have >> two >> columns (if there are two conditions): the first column will contain 0's >> and 1's depending on the condition (group) of the patient while the >> second >> one will be identically 1. And then the weights of the patients will be >> in >> the values (i.e. the second argument to lmFit) which I believe allows >> missing values. >> But this implicitly assumes that the weight (or some function of it - >> you >> can use any transformation you like) contributes additively (and >> linearly) >> to the expression (or it's logarithm). >> >> Moshe. >> >> > Hi, >> > >> > I am using Limma to get differential expressed genes between case and >> > control samples. In addition, I want to adjust the "weight" of >> patients >> by >> > including it in the "design" matrix. However, one weight value of a >> > patient >> > is missing, so I have a "NA" in design matrix. When I run lmFit, I got >> the >> > error message :Error in qr.default(x) : NA/NaN/Inf in foreign function >> > call >> > (arg 1). >> > >> > Can limma handle the missing value in design matrix or I have to >> remove >> > this sample ? >> > >> > Thanks, >> > >> > Chen >> > >> > [[alternative HTML version deleted]] >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor at r-project.org >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: >> > http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >> >> >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for >> the >> addressee. >> You must not disclose, forward, print or use it without the permission >> of >> the sender. >> ______________________________________________________________________ >> > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD REPLY • link 13.0 years ago Moshe Olshansky ▴ 260

0

Entering edit mode

Thanks, Moshe and James Any question about confounding factor if you or anybody can help me. Actually, I have many different phenotype of samples here, such as weight, age, cholesterol. I want to find the differential expressed genes between case and control (say:high blood pressure vs health ). I found some significant genes when I adjust the age, weight. However, when I adjust the cholesterol level, these genes became not significant (*P* values larger). So either cholesterol is confounding factor which I should control , or these differential expressed genes control the cholesterol level which I should not adjust. I am not sure which assumption is right. I am thinking it should be the last one. 2012/4/4 Moshe Olshansky <olshansky@wehi.edu.au> > You are right. My mistake. Sorry... > > > Hi, Moshe, > > > > I still did not get it. How to use your method? > > > > lmFit<-(x,design) which x allow missing values, but design didn't. > > > > usually, I want to adjust batch effects, I will include it in design > > matrix, i.e. > > case control batch > > 1 0 1 > > 0 1 1 > > 1 0 2 > > which x is expression matrix, and then I will get the differential > > expressed gene between case and control with adjusting batch. > > > > In this case, I just want to simply use "weight" instead of "batch" in > > design matrix. But unfortunately, you can't have missing values in design > > matrix which I think it's very often if we have many samples. > > > > 2012/4/4 Moshe Olshansky <olshansky@wehi.edu.au> > > > >> Why do you want to include weight in the design matrix? > >> It may be more reasonable to include weight in the linear model, i.e. > >> expression ~ condition + weight and then your design matrix will have > >> two > >> columns (if there are two conditions): the first column will contain 0's > >> and 1's depending on the condition (group) of the patient while the > >> second > >> one will be identically 1. And then the weights of the patients will be > >> in > >> the values (i.e. the second argument to lmFit) which I believe allows > >> missing values. > >> But this implicitly assumes that the weight (or some function of it - > >> you > >> can use any transformation you like) contributes additively (and > >> linearly) > >> to the expression (or it's logarithm). > >> > >> Moshe. > >> > >> > Hi, > >> > > >> > I am using Limma to get differential expressed genes between case and > >> > control samples. In addition, I want to adjust the "weight" of > >> patients > >> by > >> > including it in the "design" matrix. However, one weight value of a > >> > patient > >> > is missing, so I have a "NA" in design matrix. When I run lmFit, I got > >> the > >> > error message :Error in qr.default(x) : NA/NaN/Inf in foreign function > >> > call > >> > (arg 1). > >> > > >> > Can limma handle the missing value in design matrix or I have to > >> remove > >> > this sample ? > >> > > >> > Thanks, > >> > > >> > Chen > >> > > >> > [[alternative HTML version deleted]] > >> > > >> > _______________________________________________ > >> > Bioconductor mailing list > >> > Bioconductor@r-project.org > >> > https://stat.ethz.ch/mailman/listinfo/bioconductor > >> > Search the archives: > >> > http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > >> > >> > >> > >> ______________________________________________________________________ > >> The information in this email is confidential and intended solely for > >> the > >> addressee. > >> You must not disclose, forward, print or use it without the permission > >> of > >> the sender. > >> ______________________________________________________________________ > >> > > > > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}

ADD REPLY • link 13.0 years ago yao chen ▴ 210

0

Entering edit mode

Hi Chen, On 4/5/2012 9:12 AM, yao chen wrote: > Thanks, Moshe and James > > Any question about confounding factor if you or anybody can help me. > > Actually, I have many different phenotype of samples here, such as > weight, age, cholesterol. I want to find the differential expressed > genes between case and control (say:high blood pressure vs health ). I > found some significant genes when I adjust the age, weight. However, > when I adjust the cholesterol level, these genes became not > significant (/P/ values larger). So either cholesterol is confounding > factor which I should control , or these differential expressed genes > control the cholesterol level which I should not adjust. I am not sure > which assumption is right. I am thinking it should be the last one. I'm not sure I completely understand what you are saying here. It is conventional in this field to state that 'genes are significant', but that is an incomplete statement. The real statement is that the gene expression appears to be significantly differentially expressed between (e.g., treatment and control). So are you saying that the case vs control contrast is no longer significant if you add cholesterol level to the model? If you add cholesterol level to the model, you have moved from an ANOVA model to an ANCOVA model in which you are fitting a linear model between the gene expression and cholesterol level, and allowing different intercepts between the different factor levels, but assuming equal slopes. If the contrasts between factor levels are no longer significant, that implies that the intercepts aren't different. In this case, the cholesterol level 'explains' more about gene expression than the treatment. Whether or not the genes control the cholesterol level or the cholesterol level is affecting gene expression cannot be determined. But what can be said is that once you control for cholesterol level, the differences between the factor levels are no longer significant. Also note that fitting a linear model (as compared to ANOVA) is trickier. For ANOVA, you really just assume that the distribution of the data for the different factor levels is relatively close to normal (or at least 'hump shaped'). With linear regression you can run into problems if your residuals are not correctly distributed, if you have high leverage data points, etc, which are not things you can check for thousands of models. In addition, you are making the assumption that the slopes of the regression lines are parallel, which might not always be valid. You may need to introduce an interaction term to allow for diverging slopes. Best, Jim > > 2012/4/4 Moshe Olshansky <olshansky at="" wehi.edu.au=""> <mailto:olshansky at="" wehi.edu.au="">> > > You are right. My mistake. Sorry... > > > Hi, Moshe, > > > > I still did not get it. How to use your method? > > > > lmFit<-(x,design) which x allow missing values, but design didn't. > > > > usually, I want to adjust batch effects, I will include it in design > > matrix, i.e. > > case control batch > > 1 0 1 > > 0 1 1 > > 1 0 2 > > which x is expression matrix, and then I will get the differential > > expressed gene between case and control with adjusting batch. > > > > In this case, I just want to simply use "weight" instead of > "batch" in > > design matrix. But unfortunately, you can't have missing values > in design > > matrix which I think it's very often if we have many samples. > > > > 2012/4/4 Moshe Olshansky <olshansky at="" wehi.edu.au=""> <mailto:olshansky at="" wehi.edu.au="">> > > > >> Why do you want to include weight in the design matrix? > >> It may be more reasonable to include weight in the linear > model, i.e. > >> expression ~ condition + weight and then your design matrix > will have > >> two > >> columns (if there are two conditions): the first column will > contain 0's > >> and 1's depending on the condition (group) of the patient while the > >> second > >> one will be identically 1. And then the weights of the patients > will be > >> in > >> the values (i.e. the second argument to lmFit) which I believe > allows > >> missing values. > >> But this implicitly assumes that the weight (or some function > of it - > >> you > >> can use any transformation you like) contributes additively (and > >> linearly) > >> to the expression (or it's logarithm). > >> > >> Moshe. > >> > >> > Hi, > >> > > >> > I am using Limma to get differential expressed genes between > case and > >> > control samples. In addition, I want to adjust the "weight" of > >> patients > >> by > >> > including it in the "design" matrix. However, one weight > value of a > >> > patient > >> > is missing, so I have a "NA" in design matrix. When I run > lmFit, I got > >> the > >> > error message :Error in qr.default(x) : NA/NaN/Inf in foreign > function > >> > call > >> > (arg 1). > >> > > >> > Can limma handle the missing value in design matrix or I have to > >> remove > >> > this sample ? > >> > > >> > Thanks, > >> > > >> > Chen > >> > > >> > [[alternative HTML version deleted]] > >> > > >> > _______________________________________________ > >> > Bioconductor mailing list > >> > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > >> > https://stat.ethz.ch/mailman/listinfo/bioconductor > >> > Search the archives: > >> > http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > >> > >> > >> > >> > ______________________________________________________________________ > >> The information in this email is confidential and intended > solely for > >> the > >> addressee. > >> You must not disclose, forward, print or use it without the > permission > >> of > >> the sender. > >> > ______________________________________________________________________ > >> > > > > > > ______________________________________________________________________ > The information in this email is confidential and intended solely > for the addressee. > You must not disclose, forward, print or use it without the > permission of the sender. > ______________________________________________________________________ > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD REPLY • link 13.0 years ago James W. MacDonald 68k

0

Entering edit mode

Thanks, James. You are right. I find expression of some genes are significant between treatment and control: I use lmFit(x,design), in which design only include treatment vs control and the case vs control contrast is no longer significant when I add cholesterol level to the model. In this case, cholesterol level is in the design matrix, as case control cholesterol 1 0 130 0 1 120 1 0 110 My question is, the cholesterol explain gene expression more may be the reflection of treatment . That is treatment change the expression of some genes and then cause the cholesterol difference. Therefore it's not the confounding factor. But I don't know how to distinguish them. Any suggestion? Chen 2012/4/5 James W. MacDonald <jmacdon@uw.edu> > Hi Chen, > > > On 4/5/2012 9:12 AM, yao chen wrote: > >> Thanks, Moshe and James >> >> Any question about confounding factor if you or anybody can help me. >> >> Actually, I have many different phenotype of samples here, such as >> weight, age, cholesterol. I want to find the differential expressed genes >> between case and control (say:high blood pressure vs health ). I found some >> significant genes when I adjust the age, weight. However, when I adjust the >> cholesterol level, these genes became not significant (/P/ values larger). >> So either cholesterol is confounding factor which I should control , or >> these differential expressed genes control the cholesterol level which I >> should not adjust. I am not sure which assumption is right. I am thinking >> it should be the last one. >> > > I'm not sure I completely understand what you are saying here. It is > conventional in this field to state that 'genes are significant', but that > is an incomplete statement. The real statement is that the gene expression > appears to be significantly differentially expressed between (e.g., > treatment and control). So are you saying that the case vs control contrast > is no longer significant if you add cholesterol level to the model? > > If you add cholesterol level to the model, you have moved from an ANOVA > model to an ANCOVA model in which you are fitting a linear model between > the gene expression and cholesterol level, and allowing different > intercepts between the different factor levels, but assuming equal slopes. > If the contrasts between factor levels are no longer significant, that > implies that the intercepts aren't different. > > In this case, the cholesterol level 'explains' more about gene expression > than the treatment. Whether or not the genes control the cholesterol level > or the cholesterol level is affecting gene expression cannot be determined. > But what can be said is that once you control for cholesterol level, the > differences between the factor levels are no longer significant. > > Also note that fitting a linear model (as compared to ANOVA) is trickier. > For ANOVA, you really just assume that the distribution of the data for the > different factor levels is relatively close to normal (or at least 'hump > shaped'). With linear regression you can run into problems if your > residuals are not correctly distributed, if you have high leverage data > points, etc, which are not things you can check for thousands of models. In > addition, you are making the assumption that the slopes of the regression > lines are parallel, which might not always be valid. You may need to > introduce an interaction term to allow for diverging slopes. > > Best, > > Jim > > > >> 2012/4/4 Moshe Olshansky <olshansky@wehi.edu.au <mailto:="">> olshansky@wehi.edu.au>**> >> >> >> You are right. My mistake. Sorry... >> >> > Hi, Moshe, >> > >> > I still did not get it. How to use your method? >> > >> > lmFit<-(x,design) which x allow missing values, but design didn't. >> > >> > usually, I want to adjust batch effects, I will include it in design >> > matrix, i.e. >> > case control batch >> > 1 0 1 >> > 0 1 1 >> > 1 0 2 >> > which x is expression matrix, and then I will get the differential >> > expressed gene between case and control with adjusting batch. >> > >> > In this case, I just want to simply use "weight" instead of >> "batch" in >> > design matrix. But unfortunately, you can't have missing values >> in design >> > matrix which I think it's very often if we have many samples. >> > >> > 2012/4/4 Moshe Olshansky <olshansky@wehi.edu.au>> <mailto:olshansky@wehi.edu.au>**> >> >> > >> >> Why do you want to include weight in the design matrix? >> >> It may be more reasonable to include weight in the linear >> model, i.e. >> >> expression ~ condition + weight and then your design matrix >> will have >> >> two >> >> columns (if there are two conditions): the first column will >> contain 0's >> >> and 1's depending on the condition (group) of the patient while the >> >> second >> >> one will be identically 1. And then the weights of the patients >> will be >> >> in >> >> the values (i.e. the second argument to lmFit) which I believe >> allows >> >> missing values. >> >> But this implicitly assumes that the weight (or some function >> of it - >> >> you >> >> can use any transformation you like) contributes additively (and >> >> linearly) >> >> to the expression (or it's logarithm). >> >> >> >> Moshe. >> >> >> >> > Hi, >> >> > >> >> > I am using Limma to get differential expressed genes between >> case and >> >> > control samples. In addition, I want to adjust the "weight" of >> >> patients >> >> by >> >> > including it in the "design" matrix. However, one weight >> value of a >> >> > patient >> >> > is missing, so I have a "NA" in design matrix. When I run >> lmFit, I got >> >> the >> >> > error message :Error in qr.default(x) : NA/NaN/Inf in foreign >> function >> >> > call >> >> > (arg 1). >> >> > >> >> > Can limma handle the missing value in design matrix or I have to >> >> remove >> >> > this sample ? >> >> > >> >> > Thanks, >> >> > >> >> > Chen >> >> > >> >> > [[alternative HTML version deleted]] >> >> > >> >> > ______________________________**_________________ >> >> > Bioconductor mailing list >> >> > Bioconductor@r-project.org <mailto:bioconductor@r-**project.org<bioconductor@r-project.org> >> > >> >> >> > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >> >> > Search the archives: >> >> > http://news.gmane.org/gmane.**science.biology.informatics.** >> conductor<http: news.gmane.org="" gmane.science.biology.informatics.c="" onductor=""> >> >> > >> >> >> >> >> >> >> >> >> ______________________________**______________________________** >> __________ >> >> The information in this email is confidential and intended >> solely for >> >> the >> >> addressee. >> >> You must not disclose, forward, print or use it without the >> permission >> >> of >> >> the sender. >> >> >> ______________________________**______________________________** >> __________ >> >> >> > >> >> >> >> ______________________________**______________________________** >> __________ >> The information in this email is confidential and intended solely >> for the addressee. >> You must not disclose, forward, print or use it without the >> permission of the sender. >> ______________________________**______________________________** >> __________ >> >> >> > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > [[alternative HTML version deleted]]

ADD REPLY • link 13.0 years ago yao chen ▴ 210

0

Entering edit mode

Hi Chen, On 4/5/2012 10:03 AM, yao chen wrote: > Thanks, James. > > You are right. I find expression of some genes are significant between > treatment and control: I use lmFit(x,design), in which design only > include treatment vs control > > and the case vs control contrast is no longer significant when I add > cholesterol level to the model. In this case, cholesterol level is in > the design matrix, as > case control cholesterol > 1 0 130 > 0 1 120 > 1 0 110 > My question is, the cholesterol explain gene expression more may be > the reflection of treatment . That is treatment change the expression > of some genes and then cause the cholesterol difference. Therefore > it's not the confounding factor. But I don't know how to distinguish them. That is the point I tried to make in my last response. All you know at this point is that the difference in gene expression correlates with cholesterol level. The model only tells you correlation, not causation. If you want to know if the gene causes differences in cholesterol levels or if cholesterol levels cause differential gene expression, you will have to design and perform an experiment to test for the causal factor. Best, Jim > > Any suggestion? > > > Chen > > > > 2012/4/5 James W. MacDonald <jmacdon at="" uw.edu="" <mailto:jmacdon="" at="" uw.edu="">> > > Hi Chen, > > > On 4/5/2012 9:12 AM, yao chen wrote: > > Thanks, Moshe and James > > Any question about confounding factor if you or anybody can > help me. > > Actually, I have many different phenotype of samples here, > such as weight, age, cholesterol. I want to find the > differential expressed genes between case and control > (say:high blood pressure vs health ). I found some significant > genes when I adjust the age, weight. However, when I adjust > the cholesterol level, these genes became not significant (/P/ > values larger). So either cholesterol is confounding factor > which I should control , or these differential expressed genes > control the cholesterol level which I should not adjust. I am > not sure which assumption is right. I am thinking it should be > the last one. > > > I'm not sure I completely understand what you are saying here. It > is conventional in this field to state that 'genes are > significant', but that is an incomplete statement. The real > statement is that the gene expression appears to be significantly > differentially expressed between (e.g., treatment and control). So > are you saying that the case vs control contrast is no longer > significant if you add cholesterol level to the model? > > If you add cholesterol level to the model, you have moved from an > ANOVA model to an ANCOVA model in which you are fitting a linear > model between the gene expression and cholesterol level, and > allowing different intercepts between the different factor levels, > but assuming equal slopes. If the contrasts between factor levels > are no longer significant, that implies that the intercepts aren't > different. > > In this case, the cholesterol level 'explains' more about gene > expression than the treatment. Whether or not the genes control > the cholesterol level or the cholesterol level is affecting gene > expression cannot be determined. But what can be said is that once > you control for cholesterol level, the differences between the > factor levels are no longer significant. > > Also note that fitting a linear model (as compared to ANOVA) is > trickier. For ANOVA, you really just assume that the distribution > of the data for the different factor levels is relatively close to > normal (or at least 'hump shaped'). With linear regression you can > run into problems if your residuals are not correctly distributed, > if you have high leverage data points, etc, which are not things > you can check for thousands of models. In addition, you are making > the assumption that the slopes of the regression lines are > parallel, which might not always be valid. You may need to > introduce an interaction term to allow for diverging slopes. > > Best, > > Jim > > > > 2012/4/4 Moshe Olshansky <olshansky at="" wehi.edu.au=""> <mailto:olshansky at="" wehi.edu.au=""> <mailto:olshansky at="" wehi.edu.au=""> <mailto:olshansky at="" wehi.edu.au="">>> > > > You are right. My mistake. Sorry... > > > Hi, Moshe, > > > > I still did not get it. How to use your method? > > > > lmFit<-(x,design) which x allow missing values, but design > didn't. > > > > usually, I want to adjust batch effects, I will include it > in design > > matrix, i.e. > > case control batch > > 1 0 1 > > 0 1 1 > > 1 0 2 > > which x is expression matrix, and then I will get the > differential > > expressed gene between case and control with adjusting batch. > > > > In this case, I just want to simply use "weight" instead of > "batch" in > > design matrix. But unfortunately, you can't have missing values > in design > > matrix which I think it's very often if we have many samples. > > > > 2012/4/4 Moshe Olshansky <olshansky at="" wehi.edu.au=""> <mailto:olshansky at="" wehi.edu.au=""> > <mailto:olshansky at="" wehi.edu.au="" <mailto:olshansky="" at="" wehi.edu.au="">>> > > > > >> Why do you want to include weight in the design matrix? > >> It may be more reasonable to include weight in the linear > model, i.e. > >> expression ~ condition + weight and then your design matrix > will have > >> two > >> columns (if there are two conditions): the first column will > contain 0's > >> and 1's depending on the condition (group) of the patient > while the > >> second > >> one will be identically 1. And then the weights of the patients > will be > >> in > >> the values (i.e. the second argument to lmFit) which I believe > allows > >> missing values. > >> But this implicitly assumes that the weight (or some function > of it - > >> you > >> can use any transformation you like) contributes additively > (and > >> linearly) > >> to the expression (or it's logarithm). > >> > >> Moshe. > >> > >> > Hi, > >> > > >> > I am using Limma to get differential expressed genes between > case and > >> > control samples. In addition, I want to adjust the > "weight" of > >> patients > >> by > >> > including it in the "design" matrix. However, one weight > value of a > >> > patient > >> > is missing, so I have a "NA" in design matrix. When I run > lmFit, I got > >> the > >> > error message :Error in qr.default(x) : NA/NaN/Inf in foreign > function > >> > call > >> > (arg 1). > >> > > >> > Can limma handle the missing value in design matrix or I > have to > >> remove > >> > this sample ? > >> > > >> > Thanks, > >> > > >> > Chen > >> > > >> > [[alternative HTML version deleted]] > >> > > >> > _______________________________________________ > >> > Bioconductor mailing list > >> > Bioconductor at r-project.org > <mailto:bioconductor at="" r-project.org=""> > <mailto:bioconductor at="" r-project.org=""> <mailto:bioconductor at="" r-project.org="">> > > >> > https://stat.ethz.ch/mailman/listinfo/bioconductor > >> > Search the archives: > >> > > http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > >> > >> > >> > >> > > ______________________________________________________________________ > >> The information in this email is confidential and intended > solely for > >> the > >> addressee. > >> You must not disclose, forward, print or use it without the > permission > >> of > >> the sender. > >> > > ______________________________________________________________________ > >> > > > > > > > ______________________________________________________________________ > The information in this email is confidential and intended > solely > for the addressee. > You must not disclose, forward, print or use it without the > permission of the sender. > > ______________________________________________________________________ > > > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD REPLY • link 13.0 years ago James W. MacDonald 68k

0

Entering edit mode

Thanks a lot. Chen 2012/4/5 James W. MacDonald <jmacdon@uw.edu> > Hi Chen, > > > On 4/5/2012 10:03 AM, yao chen wrote: > >> Thanks, James. >> >> You are right. I find expression of some genes are significant between >> treatment and control: I use lmFit(x,design), in which design only include >> treatment vs control >> >> and the case vs control contrast is no longer significant when I add >> cholesterol level to the model. In this case, cholesterol level is in the >> design matrix, as >> case control cholesterol >> 1 0 130 >> 0 1 120 >> 1 0 110 >> My question is, the cholesterol explain gene expression more may be the >> reflection of treatment . That is treatment change the expression of some >> genes and then cause the cholesterol difference. Therefore it's not the >> confounding factor. But I don't know how to distinguish them. >> > > That is the point I tried to make in my last response. All you know at > this point is that the difference in gene expression correlates with > cholesterol level. The model only tells you correlation, not causation. > > If you want to know if the gene causes differences in cholesterol levels > or if cholesterol levels cause differential gene expression, you will have > to design and perform an experiment to test for the causal factor. > > Best, > > Jim > > > >> Any suggestion? >> >> >> Chen >> >> >> >> 2012/4/5 James W. MacDonald <jmacdon@uw.edu <mailto:jmacdon@uw.edu="">> >> >> >> Hi Chen, >> >> >> On 4/5/2012 9:12 AM, yao chen wrote: >> >> Thanks, Moshe and James >> >> Any question about confounding factor if you or anybody can >> help me. >> >> Actually, I have many different phenotype of samples here, >> such as weight, age, cholesterol. I want to find the >> differential expressed genes between case and control >> (say:high blood pressure vs health ). I found some significant >> genes when I adjust the age, weight. However, when I adjust >> the cholesterol level, these genes became not significant (/P/ >> values larger). So either cholesterol is confounding factor >> which I should control , or these differential expressed genes >> control the cholesterol level which I should not adjust. I am >> not sure which assumption is right. I am thinking it should be >> the last one. >> >> >> I'm not sure I completely understand what you are saying here. It >> is conventional in this field to state that 'genes are >> significant', but that is an incomplete statement. The real >> statement is that the gene expression appears to be significantly >> differentially expressed between (e.g., treatment and control). So >> are you saying that the case vs control contrast is no longer >> significant if you add cholesterol level to the model? >> >> If you add cholesterol level to the model, you have moved from an >> ANOVA model to an ANCOVA model in which you are fitting a linear >> model between the gene expression and cholesterol level, and >> allowing different intercepts between the different factor levels, >> but assuming equal slopes. If the contrasts between factor levels >> are no longer significant, that implies that the intercepts aren't >> different. >> >> In this case, the cholesterol level 'explains' more about gene >> expression than the treatment. Whether or not the genes control >> the cholesterol level or the cholesterol level is affecting gene >> expression cannot be determined. But what can be said is that once >> you control for cholesterol level, the differences between the >> factor levels are no longer significant. >> >> Also note that fitting a linear model (as compared to ANOVA) is >> trickier. For ANOVA, you really just assume that the distribution >> of the data for the different factor levels is relatively close to >> normal (or at least 'hump shaped'). With linear regression you can >> run into problems if your residuals are not correctly distributed, >> if you have high leverage data points, etc, which are not things >> you can check for thousands of models. In addition, you are making >> the assumption that the slopes of the regression lines are >> parallel, which might not always be valid. You may need to >> introduce an interaction term to allow for diverging slopes. >> >> Best, >> >> Jim >> >> >> >> 2012/4/4 Moshe Olshansky <olshansky@wehi.edu.au>> <mailto:olshansky@wehi.edu.au> <mailto:olshansky@wehi.edu.au>> >> <mailto:olshansky@wehi.edu.au>**>> >> >> >> You are right. My mistake. Sorry... >> >> > Hi, Moshe, >> > >> > I still did not get it. How to use your method? >> > >> > lmFit<-(x,design) which x allow missing values, but design >> didn't. >> > >> > usually, I want to adjust batch effects, I will include it >> in design >> > matrix, i.e. >> > case control batch >> > 1 0 1 >> > 0 1 1 >> > 1 0 2 >> > which x is expression matrix, and then I will get the >> differential >> > expressed gene between case and control with adjusting batch. >> > >> > In this case, I just want to simply use "weight" instead of >> "batch" in >> > design matrix. But unfortunately, you can't have missing values >> in design >> > matrix which I think it's very often if we have many samples. >> > >> > 2012/4/4 Moshe Olshansky <olshansky@wehi.edu.au>> <mailto:olshansky@wehi.edu.au> >> <mailto:olshansky@wehi.edu.au <mailto:olshansky@wehi.edu.au="">**>> >> >> >> > >> >> Why do you want to include weight in the design matrix? >> >> It may be more reasonable to include weight in the linear >> model, i.e. >> >> expression ~ condition + weight and then your design matrix >> will have >> >> two >> >> columns (if there are two conditions): the first column will >> contain 0's >> >> and 1's depending on the condition (group) of the patient >> while the >> >> second >> >> one will be identically 1. And then the weights of the patients >> will be >> >> in >> >> the values (i.e. the second argument to lmFit) which I believe >> allows >> >> missing values. >> >> But this implicitly assumes that the weight (or some function >> of it - >> >> you >> >> can use any transformation you like) contributes additively >> (and >> >> linearly) >> >> to the expression (or it's logarithm). >> >> >> >> Moshe. >> >> >> >> > Hi, >> >> > >> >> > I am using Limma to get differential expressed genes between >> case and >> >> > control samples. In addition, I want to adjust the >> "weight" of >> >> patients >> >> by >> >> > including it in the "design" matrix. However, one weight >> value of a >> >> > patient >> >> > is missing, so I have a "NA" in design matrix. When I run >> lmFit, I got >> >> the >> >> > error message :Error in qr.default(x) : NA/NaN/Inf in foreign >> function >> >> > call >> >> > (arg 1). >> >> > >> >> > Can limma handle the missing value in design matrix or I >> have to >> >> remove >> >> > this sample ? >> >> > >> >> > Thanks, >> >> > >> >> > Chen >> >> > >> >> > [[alternative HTML version deleted]] >> >> > >> >> > ______________________________**_________________ >> >> > Bioconductor mailing list >> >> > Bioconductor@r-project.org >> <mailto:bioconductor@r-**project.org <bioconductor@r-project.org="">> >> <mailto:bioconductor@r-**project.org <bioconductor@r-project.org=""> >> >> <mailto:bioconductor@r-**project.org <bioconductor@r-project.org=""> >> >> >> >> >> > https://stat.ethz.ch/mailman/**listinfo/bioconductor<ht tps:="" stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >> >> > Search the archives: >> >> > >> http://news.gmane.org/gmane.**science.biology.informatics.** >> conductor<http: news.gmane.org="" gmane.science.biology.informatics.c="" onductor=""> >> >> > >> >> >> >> >> >> >> >> >> ______________________________** >> ______________________________**__________ >> >> The information in this email is confidential and intended >> solely for >> >> the >> >> addressee. >> >> You must not disclose, forward, print or use it without the >> permission >> >> of >> >> the sender. >> >> >> ______________________________** >> ______________________________**__________ >> >> >> > >> >> >> >> ______________________________** >> ______________________________**__________ >> The information in this email is confidential and intended >> solely >> for the addressee. >> You must not disclose, forward, print or use it without the >> permission of the sender. >> ______________________________** >> ______________________________**__________ >> >> >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> >> > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > [[alternative HTML version deleted]]

ADD REPLY • link 13.0 years ago yao chen ▴ 210

Login before adding your answer.