Limma Design for Paired Samples Question
1
2
Entering edit mode
@zeynep-ozkeserli-5250
Last seen 10.0 years ago
Turkey
Dear Gordon and All, I have a question about creating the proper design matrix for my data. I am using Affymetrix GeneChips and the experimental design is like that: - We have two groups, which are "normal" and "diseased". For each person in each group, we collected tissues from two zones of the same organ. So my data looks like this: - normal tissue1 tissue2 - diseased tissue1 tissue2 So these are paired samples. Our questions are as follows: 1- Which genes are expressed differentially because of being normal or diseased. (the effect of being normal and diseased on gene expression) 2- Which genes are expressed differentially because of being tissue1 or tissue2. (the effect of being tissue type 1 or 2 on gene expression) 3- Which genes are expressed differentially in normal group between tissue1 and tissue2 4- Which genes are expressed differentially in diseased group between tissue1 and tissue2 For this purpose, I prepared a targets file in which I encoded the samples like I indicated. Which looks like: SlideNumber FileName Condition TissueType 1 AF01L.CEL AF LA 2 AF01R.CEL AF RA 3 AF02L.CEL AF LA 4 AF02R.CEL AF RA 5 AF03L.CEL AF LA 6 AF03R.CEL AF RA . . . 42 SR10L.CEL SR LA 43 SR10R.CEL SR RA 44 SR11L.CEL SR LA There are 45 CEL files, 22 with pairs and one single (would this be a problem?Should I take out this sample?). So I followed the code in limma users guide section 8.3 Paired Samples, which is: SibShip <- factor(targets$TissueType) Treat <- factor(targets$Condition, levels=c("AF","SR")) design <- model.matrix(~SibShip+Treat) fit <- lmFit(eset, design) fit <- eBayes(fit) With this, I think :) I formed a model with two factors: Condition as the main factor, and TissueType as the blocking factor. After fitting, my coefficients in MArrayLM is (Intercept) SibShipRA TreatSR 1007_s_at 7.828460 0.15501147 0.232908384 1053_at 3.868841 -0.01223435 -0.134041754 117_at 3.016848 0.08888230 0.197765562 121_at 4.721647 -0.04720335 0.052025910 1255_g_at 2.520547 -0.07208771 0.009031277 . . . So; 1- For my first question, which questions the effect of the main factor "Condition", (saved as Treat <- factor(targets$Condition, levels=c("AF","SR"))) topTable(fit, coef="TreatSR") gives me the answer. Is it true? 2- Same for the blocking factor, topTable(fit, coef="SibShipRA") gives me the answer for TissueType effect. Is this also true? >From previous correspondences I concluded that I could not get answers for my questions 3 and 4 with this model. And it might be meaningful to add an interaction term to the model. So for 3 and 4, I tried this: design<-model.matrix(~Sibship + Treat + Sibship:Treat) (because I thought Treat is varying faster, but I don't know how to check it.) After applying this, 1- Results for Treat drastically changed. I got nearly completely different top genes with topTable 2- Results for SibShip also changed, not that much, but p values were higher. 3- The fold change values for the same genes were different between two models. So I am suspicious. Am I doing something wrong here, or am I missing something? Any help would be great! Thank you in advance, Zeynep [[alternative HTML version deleted]]
limma limma • 7.4k views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 39 minutes ago
WEHI, Melbourne, Australia
Dear Zeynep, You are right to be suspicious, because the design of your experiment is more complex than your analysis has allowed for. Your experiment has two levels. First of all, suppose that you only wanted to compared tissue1 (LA) to tissue2 (RA). In that case you would have paired samples, and the comparison would be made within each subject, because each person returns both tissues. However, you also want to compare normal to diseased. This comparison can only be made between subjects, because each person can only be one of diseased or normal. Hence you need a type of analysis that allows you to make comparisons both within and between subjects (persons). This requires that you treat person as a random effect. To do this is limma, you need the following: Person <- substring(targets$FileName,3,4) Treat <- factor(paste(targets$Condition,targets$TissueType,sep=".")) design <- model.matrix(~0+Treat) colnames(design) <- levels(Treat) corfit <- duplicateCorrelation(eset,design,block=Person) corfit$consensus This should give a positive correlation value. It represents the correlation between measurements made on the same person. fit <- lmFit(eset,design,block=Person,correlation=corfit$consensus) Now you can make any comparison you want, for example: cm <- makeContrasts(AF.RAvsLA=AF.RA-AF.LA, RA.AFvsSR=AF.RA-SR.RA, levels=design) I have shown just two contrasts, but you could make any number of comparisons in this way. Then fit2 <- contrasts.fit(fit, cm) fit2 <- eBayes(fit2) Then topTable(fit2, coef="AF.RAvsLA") will give you genes that are differentially expressed between the RA and LA for AF subjects. Or topTable(fit2, coef="RA.AFvsSR") will find genes that are differentially expressed between the normal and diseased subjects in RA tissue. And so on. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. Tel: (03) 9345 2326, Fax (03) 9347 0852, http://www.statsci.org/smyth On Mon, 16 Jul 2012, zeynep ?zkeserli wrote: > Dear Gordon and All, > > I have a question about creating the proper design matrix for my data. > > I am using Affymetrix GeneChips and the experimental design is like that: > > - We have two groups, which are "normal" and "diseased". For each person in > each group, we collected tissues from two zones of the same organ. So my > data looks like this: > > - normal > > tissue1 > tissue2 > > - diseased > > tissue1 > tissue2 > > So these are paired samples. > > Our questions are as follows: > > 1- Which genes are expressed differentially because of being normal or > diseased. (the effect of being normal and diseased on gene expression) > 2- Which genes are expressed differentially because of being tissue1 or > tissue2. (the effect of being tissue type 1 or 2 on gene expression) > 3- Which genes are expressed differentially in normal group between tissue1 > and tissue2 > 4- Which genes are expressed differentially in diseased group between > tissue1 and tissue2 > > For this purpose, I prepared a targets file in which I encoded the samples > like I indicated. Which looks like: > > SlideNumber FileName Condition TissueType > 1 AF01L.CEL AF LA > 2 AF01R.CEL AF RA > 3 AF02L.CEL AF LA > 4 AF02R.CEL AF RA > 5 AF03L.CEL AF LA > 6 AF03R.CEL AF RA > . > . > . > 42 SR10L.CEL SR LA > 43 SR10R.CEL SR RA > 44 SR11L.CEL SR LA > > > There are 45 CEL files, 22 with pairs and one single (would this be a > problem? Should I take out this sample?). > > So I followed the code in limma users guide section 8.3 Paired Samples, > which is: > > SibShip <- factor(targets$TissueType) > Treat <- factor(targets$Condition, levels=c("AF","SR")) > design <- model.matrix(~SibShip+Treat) > fit <- lmFit(eset, design) > fit <- eBayes(fit) > > With this, I think :) I formed a model with two factors: Condition as the > main factor, and TissueType as the blocking factor. > After fitting, my coefficients in MArrayLM is > > (Intercept) SibShipRA TreatSR > 1007_s_at 7.828460 0.15501147 0.232908384 > 1053_at 3.868841 -0.01223435 -0.134041754 > 117_at 3.016848 0.08888230 0.197765562 > 121_at 4.721647 -0.04720335 0.052025910 > 1255_g_at 2.520547 -0.07208771 0.009031277 > . > . > . > > So; > > 1- For my first question, which questions the effect of the main factor > "Condition", (saved as Treat <- factor(targets$Condition, > levels=c("AF","SR"))) > > topTable(fit, coef="TreatSR") > > gives me the answer. Is it true? > > 2- Same for the blocking factor, > > topTable(fit, coef="SibShipRA") > > gives me the answer for TissueType effect. Is this also true? > > From previous correspondences I concluded that I could not get answers > for my questions 3 and 4 with this model. And it might be meaningful to > add an interaction term to the model. So for 3 and 4, I tried this: > > design<-model.matrix(~Sibship + Treat + Sibship:Treat) > > (because I thought Treat is varying faster, but I don't know how to > check it.) > > After applying this, > > 1- Results for Treat drastically changed. I got nearly completely > different top genes with topTable > 2- Results for SibShip also changed, not that much, but p values were > higher. > 3- The fold change values for the same genes were different between two > models. > > So I am suspicious. Am I doing something wrong here, or am I missing > something? > > Any help would be great! > Thank you in advance, > > Zeynep > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}
ADD COMMENT
0
Entering edit mode
Dear Gordon, Thank you very much for your kind, fast and detailed answer. I performed the limma analysis following your suggestions, without any problems and I think we've got some meaningful results. But at this point, I have another question regarding to sample size. I don't know if it is convenient to ask this question or not. If it is not, I am so sorry in advance. Q1- Is our dataset sufficient for performing all of these comparisons? I am sure we must have made serious calculations i.e. power analysis on this but maybe you can roughly warn us for some of them. Is there any hypothesis for which we can't get reliable results with this dataset? For this, let me remind you the group sizes, and the comparisons we are making: n_Diseased= 23 n_Normal= 22 n_LA= 23 n_RA= 22 n_Diseased&LA= 12 n_Normal&LA= 11 n_Diseased&RA= 11 n_Normal&LA= 11 Comparisons: 1- Which genes are expressed differentially because of being normal or diseased. (the effect of being normal and diseased on gene expression) 2- Which genes are expressed differentially because of being tissue1 or tissue2. (the effect of being tissue type 1 or 2 on gene expression) 3- Which genes are expressed differentially in normal group between tissue1 and tissue2 4- Which genes are expressed differentially in diseased group between tissue1 and tissue2 Q2: Is the model I applied on our data "Two way random effects model"? Thank you for your previous support again and thank you in advance for your answers. So much appreciated. Zeynep On Tue, Jul 17, 2012 at 11:17 AM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Dear Zeynep, > > You are right to be suspicious, because the design of your experiment is > more complex than your analysis has allowed for. > > Your experiment has two levels. First of all, suppose that you only > wanted to compared tissue1 (LA) to tissue2 (RA). In that case you would > have paired samples, and the comparison would be made within each subject, > because each person returns both tissues. > > However, you also want to compare normal to diseased. This comparison can > only be made between subjects, because each person can only be one of > diseased or normal. > > Hence you need a type of analysis that allows you to make comparisons both > within and between subjects (persons). This requires that you treat person > as a random effect. To do this is limma, you need the following: > > Person <- substring(targets$FileName,3,**4) > Treat <- factor(paste(targets$**Condition,targets$TissueType,**sep=".")) > design <- model.matrix(~0+Treat) > colnames(design) <- levels(Treat) > corfit <- duplicateCorrelation(eset,**design,block=Person) > corfit$consensus > > This should give a positive correlation value. It represents the > correlation between measurements made on the same person. > > fit <- lmFit(eset,design,block=**Person,correlation=corfit$**consensus) > > Now you can make any comparison you want, for example: > > cm <- makeContrasts(AF.RAvsLA=AF.RA-**AF.LA <http: af.ra-af.la="">, > RA.AFvsSR=AF.RA-SR.RA, levels=design) > > I have shown just two contrasts, but you could make any number of > comparisons in this way. Then > > fit2 <- contrasts.fit(fit, cm) > fit2 <- eBayes(fit2) > > Then > > topTable(fit2, coef="AF.RAvsLA") > > will give you genes that are differentially expressed between the RA and > LA for AF subjects. Or > > topTable(fit2, coef="RA.AFvsSR") > > will find genes that are differentially expressed between the normal and > diseased subjects in RA tissue. > > And so on. > > Best wishes > Gordon > > ------------------------------**--------------- > Professor Gordon K Smyth, > Bioinformatics Division, > Walter and Eliza Hall Institute of Medical Research, > 1G Royal Parade, Parkville, Vic 3052, Australia. > Tel: (03) 9345 2326, Fax (03) 9347 0852, > http://www.statsci.org/smyth > > > On Mon, 16 Jul 2012, zeynep özkeserli wrote: > > Dear Gordon and All, >> >> I have a question about creating the proper design matrix for my data. >> >> I am using Affymetrix GeneChips and the experimental design is like that: >> >> - We have two groups, which are "normal" and "diseased". For each person >> in >> each group, we collected tissues from two zones of the same organ. So my >> data looks like this: >> >> - normal >> >> tissue1 >> tissue2 >> >> - diseased >> >> tissue1 >> tissue2 >> >> So these are paired samples. >> >> Our questions are as follows: >> >> 1- Which genes are expressed differentially because of being normal or >> diseased. (the effect of being normal and diseased on gene expression) >> 2- Which genes are expressed differentially because of being tissue1 or >> tissue2. (the effect of being tissue type 1 or 2 on gene expression) >> 3- Which genes are expressed differentially in normal group between >> tissue1 >> and tissue2 >> 4- Which genes are expressed differentially in diseased group between >> tissue1 and tissue2 >> >> For this purpose, I prepared a targets file in which I encoded the samples >> like I indicated. Which looks like: >> >> SlideNumber FileName Condition TissueType >> 1 AF01L.CEL AF LA >> 2 AF01R.CEL AF RA >> 3 AF02L.CEL AF LA >> 4 AF02R.CEL AF RA >> 5 AF03L.CEL AF LA >> 6 AF03R.CEL AF RA >> . >> . >> . >> 42 SR10L.CEL SR LA >> 43 SR10R.CEL SR RA >> 44 SR11L.CEL SR LA >> >> >> There are 45 CEL files, 22 with pairs and one single (would this be a >> problem? Should I take out this sample?). >> >> >> So I followed the code in limma users guide section 8.3 Paired Samples, >> which is: >> >> SibShip <- factor(targets$TissueType) >> Treat <- factor(targets$Condition, levels=c("AF","SR")) >> design <- model.matrix(~SibShip+Treat) >> fit <- lmFit(eset, design) >> fit <- eBayes(fit) >> >> With this, I think :) I formed a model with two factors: Condition as the >> main factor, and TissueType as the blocking factor. >> After fitting, my coefficients in MArrayLM is >> >> (Intercept) SibShipRA TreatSR >> 1007_s_at 7.828460 0.15501147 0.232908384 >> 1053_at 3.868841 -0.01223435 -0.134041754 >> 117_at 3.016848 0.08888230 0.197765562 >> 121_at 4.721647 -0.04720335 0.052025910 >> 1255_g_at 2.520547 -0.07208771 0.009031277 >> . >> . >> . >> >> So; >> >> 1- For my first question, which questions the effect of the main factor >> "Condition", (saved as Treat <- factor(targets$Condition, >> levels=c("AF","SR"))) >> >> topTable(fit, coef="TreatSR") >> >> gives me the answer. Is it true? >> >> 2- Same for the blocking factor, >> >> topTable(fit, coef="SibShipRA") >> >> gives me the answer for TissueType effect. Is this also true? >> >> From previous correspondences I concluded that I could not get answers >> for my questions 3 and 4 with this model. And it might be meaningful to add >> an interaction term to the model. So for 3 and 4, I tried this: >> >> design<-model.matrix(~Sibship + Treat + Sibship:Treat) >> >> (because I thought Treat is varying faster, but I don't know how to check >> it.) >> >> After applying this, >> >> 1- Results for Treat drastically changed. I got nearly completely >> different top genes with topTable >> 2- Results for SibShip also changed, not that much, but p values were >> higher. >> 3- The fold change values for the same genes were different between two >> models. >> >> So I am suspicious. Am I doing something wrong here, or am I missing >> something? >> >> Any help would be great! >> Thank you in advance, >> >> Zeynep >> >> > ______________________________**______________________________**____ ______ > The information in this email is confidential and inte...{{dropped:9}}
ADD REPLY

Login before adding your answer.

Traffic: 584 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6