Question

Limma Design for Paired Samples Question

2

Entering edit mode

zeynep özkeserli ▴ 160

@zeynep-ozkeserli-5250

Last seen 10.3 years ago

Turkey

Dear Gordon and All, I have a question about creating the proper design matrix for my data. I am using Affymetrix GeneChips and the experimental design is like that: - We have two groups, which are "normal" and "diseased". For each person in each group, we collected tissues from two zones of the same organ. So my data looks like this: - normal tissue1 tissue2 - diseased tissue1 tissue2 So these are paired samples. Our questions are as follows: 1- Which genes are expressed differentially because of being normal or diseased. (the effect of being normal and diseased on gene expression) 2- Which genes are expressed differentially because of being tissue1 or tissue2. (the effect of being tissue type 1 or 2 on gene expression) 3- Which genes are expressed differentially in normal group between tissue1 and tissue2 4- Which genes are expressed differentially in diseased group between tissue1 and tissue2 For this purpose, I prepared a targets file in which I encoded the samples like I indicated. Which looks like: SlideNumber FileName Condition TissueType 1 AF01L.CEL AF LA 2 AF01R.CEL AF RA 3 AF02L.CEL AF LA 4 AF02R.CEL AF RA 5 AF03L.CEL AF LA 6 AF03R.CEL AF RA . . . 42 SR10L.CEL SR LA 43 SR10R.CEL SR RA 44 SR11L.CEL SR LA There are 45 CEL files, 22 with pairs and one single (would this be a problem?Should I take out this sample?). So I followed the code in limma users guide section 8.3 Paired Samples, which is: SibShip <- factor(targets$TissueType) Treat <- factor(targets$Condition, levels=c("AF","SR")) design <- model.matrix(~SibShip+Treat) fit <- lmFit(eset, design) fit <- eBayes(fit) With this, I think :) I formed a model with two factors: Condition as the main factor, and TissueType as the blocking factor. After fitting, my coefficients in MArrayLM is (Intercept) SibShipRA TreatSR 1007_s_at 7.828460 0.15501147 0.232908384 1053_at 3.868841 -0.01223435 -0.134041754 117_at 3.016848 0.08888230 0.197765562 121_at 4.721647 -0.04720335 0.052025910 1255_g_at 2.520547 -0.07208771 0.009031277 . . . So; 1- For my first question, which questions the effect of the main factor "Condition", (saved as Treat <- factor(targets$Condition, levels=c("AF","SR"))) topTable(fit, coef="TreatSR") gives me the answer. Is it true? 2- Same for the blocking factor, topTable(fit, coef="SibShipRA") gives me the answer for TissueType effect. Is this also true? >From previous correspondences I concluded that I could not get answers for my questions 3 and 4 with this model. And it might be meaningful to add an interaction term to the model. So for 3 and 4, I tried this: design<-model.matrix(~Sibship + Treat + Sibship:Treat) (because I thought Treat is varying faster, but I don't know how to check it.) After applying this, 1- Results for Treat drastically changed. I got nearly completely different top genes with topTable 2- Results for SibShip also changed, not that much, but p values were higher. 3- The fold change values for the same genes were different between two models. So I am suspicious. Am I doing something wrong here, or am I missing something? Any help would be great! Thank you in advance, Zeynep [[alternative HTML version deleted]]

limma limma • 7.5k views

ADD COMMENT • link updated 12.8 years ago by Gordon Smyth 52k • written 12.8 years ago by zeynep özkeserli ▴ 160

score 2 · Answer 1 · 2012-07-17

Dear Zeynep, You are right to be suspicious, because the design of your experiment is more complex than your analysis has allowed for. Your experiment has two levels. First of all, suppose that you only wanted to compared tissue1 (LA) to tissue2 (RA). In that case you would have paired samples, and the comparison would be made within each subject, because each person returns both tissues. However, you also want to compare normal to diseased. This comparison can only be made between subjects, because each person can only be one of diseased or normal. Hence you need a type of analysis that allows you to make comparisons both within and between subjects (persons). This requires that you treat person as a random effect. To do this is limma, you need the following: Person <- substring(targets$FileName,3,4) Treat <- factor(paste(targets$Condition,targets$TissueType,sep=".")) design <- model.matrix(~0+Treat) colnames(design) <- levels(Treat) corfit <- duplicateCorrelation(eset,design,block=Person) corfit$consensus This should give a positive correlation value. It represents the correlation between measurements made on the same person. fit <- lmFit(eset,design,block=Person,correlation=corfit$consensus) Now you can make any comparison you want, for example: cm <- makeContrasts(AF.RAvsLA=AF.RA-AF.LA, RA.AFvsSR=AF.RA-SR.RA, levels=design) I have shown just two contrasts, but you could make any number of comparisons in this way. Then fit2 <- contrasts.fit(fit, cm) fit2 <- eBayes(fit2) Then topTable(fit2, coef="AF.RAvsLA") will give you genes that are differentially expressed between the RA and LA for AF subjects. Or topTable(fit2, coef="RA.AFvsSR") will find genes that are differentially expressed between the normal and diseased subjects in RA tissue. And so on. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. Tel: (03) 9345 2326, Fax (03) 9347 0852, http://www.statsci.org/smyth On Mon, 16 Jul 2012, zeynep ?zkeserli wrote: > Dear Gordon and All, > > I have a question about creating the proper design matrix for my data. > > I am using Affymetrix GeneChips and the experimental design is like that: > > - We have two groups, which are "normal" and "diseased". For each person in > each group, we collected tissues from two zones of the same organ. So my > data looks like this: > > - normal > > tissue1 > tissue2 > > - diseased > > tissue1 > tissue2 > > So these are paired samples. > > Our questions are as follows: > > 1- Which genes are expressed differentially because of being normal or > diseased. (the effect of being normal and diseased on gene expression) > 2- Which genes are expressed differentially because of being tissue1 or > tissue2. (the effect of being tissue type 1 or 2 on gene expression) > 3- Which genes are expressed differentially in normal group between tissue1 > and tissue2 > 4- Which genes are expressed differentially in diseased group between > tissue1 and tissue2 > > For this purpose, I prepared a targets file in which I encoded the samples > like I indicated. Which looks like: > > SlideNumber FileName Condition TissueType > 1 AF01L.CEL AF LA > 2 AF01R.CEL AF RA > 3 AF02L.CEL AF LA > 4 AF02R.CEL AF RA > 5 AF03L.CEL AF LA > 6 AF03R.CEL AF RA > . > . > . > 42 SR10L.CEL SR LA > 43 SR10R.CEL SR RA > 44 SR11L.CEL SR LA > > > There are 45 CEL files, 22 with pairs and one single (would this be a > problem? Should I take out this sample?). > > So I followed the code in limma users guide section 8.3 Paired Samples, > which is: > > SibShip <- factor(targets$TissueType) > Treat <- factor(targets$Condition, levels=c("AF","SR")) > design <- model.matrix(~SibShip+Treat) > fit <- lmFit(eset, design) > fit <- eBayes(fit) > > With this, I think :) I formed a model with two factors: Condition as the > main factor, and TissueType as the blocking factor. > After fitting, my coefficients in MArrayLM is > > (Intercept) SibShipRA TreatSR > 1007_s_at 7.828460 0.15501147 0.232908384 > 1053_at 3.868841 -0.01223435 -0.134041754 > 117_at 3.016848 0.08888230 0.197765562 > 121_at 4.721647 -0.04720335 0.052025910 > 1255_g_at 2.520547 -0.07208771 0.009031277 > . > . > . > > So; > > 1- For my first question, which questions the effect of the main factor > "Condition", (saved as Treat <- factor(targets$Condition, > levels=c("AF","SR"))) > > topTable(fit, coef="TreatSR") > > gives me the answer. Is it true? > > 2- Same for the blocking factor, > > topTable(fit, coef="SibShipRA") > > gives me the answer for TissueType effect. Is this also true? > > From previous correspondences I concluded that I could not get answers > for my questions 3 and 4 with this model. And it might be meaningful to > add an interaction term to the model. So for 3 and 4, I tried this: > > design<-model.matrix(~Sibship + Treat + Sibship:Treat) > > (because I thought Treat is varying faster, but I don't know how to > check it.) > > After applying this, > > 1- Results for Treat drastically changed. I got nearly completely > different top genes with topTable > 2- Results for SibShip also changed, not that much, but p values were > higher. > 3- The fold change values for the same genes were different between two > models. > > So I am suspicious. Am I doing something wrong here, or am I missing > something? > > Any help would be great! > Thank you in advance, > > Zeynep > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}