Question

help for designing a multifactorial GLM framework edgeR

0

Entering edit mode

damiano.porcelli • 0

@damianoporcelli-7118

Last seen 9.5 years ago

United Kingdom

Hello,

I have a big RNA-seq experiment and an "easy" biological question but not sure how to make the exact design and contrast within the GLM framework in edgeR. Any help is much appreciated.

I have 6 populations, 3 replicate x population (Cages), 2 treatments (cold and benign), 3 time blocks (Season). A dataframe showing the samples and experimental factors is reported below

My basic question is to find shared differentially expressed genes between cold and benign treatments, between all populations and between different seasons. In other words, I a looking for consistent expression changes moving from benign to cold at any given transition. I would also consider not using the "Cage" description to make the design more easy.

Is there a specific design and contrast I should use? Or should I use more contrast and then make a comparison between the outputs? I can't find any valuable example in the edgeR guide that would fit my problem.

Also, since some populations have not been tested in one season (winter2013) and some treatment are missing (e.g. Groningen/benign/winter.2011), some interactions won't work because the glmFit error "Design matrix not of full rank".

Thanks for the help!

Season	Popularion	Cage	Treatment	Sample
Winter.2011	Barcelona	1	Benign	BAR1A
Winter.2011	Barcelona	1	Cold	BAR1B
Winter.2011	Barcelona	2	Benign	BAR2A
Winter.2011	Barcelona	2	Cold	BAR2B
Winter.2011	Barcelona	3	Benign	BAR3A
Winter.2011	Barcelona	3	Cold	BAR3B
Winter.2011	Gif	1	Benign	GIF1A
Winter.2011	Gif	1	Cold	GIF1B
Winter.2011	Gif	2	Benign	GIF2A
Winter.2011	Gif	2	Cold	GIF2B
Winter.2011	Gif	3	Benign	GIF3A
Winter.2011	Gif	3	Cold	GIF3B
Winter.2011	Groningen	1	Cold	GRO1A
Winter.2011	Groningen	2	Cold	GRO2A
Winter.2011	Groningen	3	Cold	GRO3A
Winter.2011	Montpellier	1	Benign	MON1A
Winter.2011	Montpellier	1	Cold	MON1B
Winter.2011	Montpellier	2	Benign	MON2A
Winter.2011	Montpellier	2	Cold	MON2B
Winter.2011	Montpellier	3	Benign	MON3A
Winter.2011	Montpellier	3	Cold	MON3B
Winter.2011	Uppsala	1	Cold	UPP1A
Winter.2011	Uppsala	2	Cold	UPP2A
Winter.2011	Uppsala	3	Cold	UPP3A
Winter.2011	Valencia	1	Benign	VAL1A
Winter.2011	Valencia	1	Cold	VAL1B
Winter.2011	Valencia	2	Benign	VAL2A
Winter.2011	Valencia	2	Cold	VAL2B
Winter.2011	Valencia	3	Benign	VAL3A
Winter.2011	Valencia	3	Cold	VAL3B
Winter.2012	Barcelona	1	Benign	BAR1E
Winter.2012	Barcelona	1	Cold	BAR1F
Winter.2012	Barcelona	2	Benign	BAR2E
Winter.2012	Barcelona	2	Cold	BAR2F
Winter.2012	Barcelona	3	Benign	BAR3E
Winter.2012	Barcelona	3	Cold	BAR3F
Winter.2012	Gif	1	Benign	GIF1E
Winter.2012	Gif	1	Cold	GIF1F
Winter.2012	Gif	2	Benign	GIF2E
Winter.2012	Gif	2	Cold	GIF2F
Winter.2012	Gif	3	Benign	GIF3E
Winter.2012	Gif	3	Cold	GIF3F
Winter.2012	Groningen	1	Benign	GRO1E
Winter.2012	Groningen	1	Cold	GRO1F
Winter.2012	Groningen	2	Benign	GRO2E
Winter.2012	Groningen	2	Cold	GRO2F
Winter.2012	Groningen	3	Benign	GRO3E
Winter.2012	Groningen	3	Cold	GRO3F
Winter.2012	Montpellier	1	Benign	MON1E
Winter.2012	Montpellier	1	Cold	MON1F
Winter.2012	Montpellier	2	Benign	MON2E
Winter.2012	Montpellier	2	Cold	MON2F
Winter.2012	Montpellier	3	Benign	MON3E
Winter.2012	Montpellier	3	Cold	MON3F
Winter.2012	Uppsala	1	Benign	UPP_VAL1C
Winter.2012	Uppsala	1	Cold	UPP_VAL1D
Winter.2012	Uppsala	2	Benign	UPP_VAL2C
Winter.2012	Uppsala	2	Cold	UPP_VAL2D
Winter.2012	Uppsala	3	Benign	UPP_VAL3C
Winter.2012	Uppsala	3	Cold	UPP_VAL3D
Winter.2012	Valencia	1	Benign	VAL_VAL1E
Winter.2012	Valencia	1	Cold	VAL_VAL1F
Winter.2012	Valencia	2	Benign	VAL_VAL2E
Winter.2012	Valencia	2	Cold	VAL_VAL2F
Winter.2012	Valencia	3	Benign	VAL_VAL3E
Winter.2012	Valencia	3	Cold	VAL_VAL3F
Winter.2013	Gif	1	Benign	GIF1I
Winter.2013	Gif	1	Cold	GIF1L
Winter.2013	Gif	2	Benign	GIF2I
Winter.2013	Gif	2	Cold	GIF2L
Winter.2013	Gif	3	Benign	GIF3I
Winter.2013	Gif	3	Cold	GIF3L
Winter.2013	Uppsala	1	Benign	UPP_UPP1F
Winter.2013	Uppsala	1	Cold	UPP_UPP1G
Winter.2013	Uppsala	2	Benign	UPP_UPP2F
Winter.2013	Uppsala	2	Cold	UPP_UPP2G
Winter.2013	Uppsala	3	Benign	UPP_UPP3F
Winter.2013	Uppsala	3	Cold	UPP_UPP3G
Winter.2013	Uppsala	1	Benign	UPP_VAL1G
Winter.2013	Uppsala	1	Cold	UPP_VAL1H
Winter.2013	Uppsala	2	Benign	UPP_VAL2G
Winter.2013	Uppsala	2	Cold	UPP_VAL2H
Winter.2013	Uppsala	3	Benign	UPP_VAL3G
Winter.2013	Uppsala	3	Cold	UPP_VAL3H
Winter.2013	Valencia	1	Benign	VAL_UPP1E
Winter.2013	Valencia	1	Cold	VAL_UPP1F
Winter.2013	Valencia	2	Benign	VAL_UPP2E
Winter.2013	Valencia	2	Cold	VAL_UPP2F
Winter.2013	Valencia	3	Benign	VAL_UPP3E
Winter.2013	Valencia	3	Cold	VAL_UPP3F
Winter.2013	Valencia	1	Benign	VAL_VAL1I
Winter.2013	Valencia	1	Cold	VAL_VAL1L
Winter.2013	Valencia	2	Benign	VAL_VAL2I
Winter.2013	Valencia	2	Cold	VAL_VAL2L
Winter.2013	Valencia	3	Benign	VAL_VAL3I
Winter.2013	Valencia	3	Cold	VAL_VAL3L

edgeR GLM • 1.3k views

ADD COMMENT • link updated 9.5 years ago by Aaron Lun ★ 28k • written 9.5 years ago by damiano.porcelli • 0

score 4 · Accepted Answer · 2014-12-02

It's not entirely clear what you mean by 'shared differentially expressed genes', so here are a couple of possibilities. You could fit a 'conventional' ANOVA where you estimate the difference between cold and benign after controlling for the other factors. Let's say your data.frame is called 'samples':

> design <- model.matrix(~Treatment+Season+Population+Cage, samples)
> colnames(design)
 [1] "(Intercept)"           "TreatmentCold"         "SeasonWinter.2012"    
 [4] "SeasonWinter.2013"     "PopulationGif"         "PopulationGroningen"  
 [7] "PopulationMontpellier" "PopulationUppsala"     "PopulationValencia"   
[10] "Cage2"                 "Cage3"

In this parameterization, TreatmentCold estimates the difference between cold and benign, after accounting for the other parameters in the model. This is the conventional way to fit a model, and depending on your goals, may be the way to go (especially if you care to know if e.g. the Groningen and Montpellier populations are different, after controlling for the other parameters, etc).

An alternative is to fit a cell means model, where each 'cell' is a combination of season/population/treatment:

> combo <- factor(apply(samples[,c(1,2,4)], 1, paste, collapse = "_"))
> design2 <- model.matrix(~0+combo)
> colnames(design2)
 [1] "comboWinter.2011_Barcelona_Benign"   "comboWinter.2011_Barcelona_Cold"    
 [3] "comboWinter.2011_Gif_Benign"         "comboWinter.2011_Gif_Cold"          
 [5] "comboWinter.2011_Groningen_Cold"     "comboWinter.2011_Montpellier_Benign"
 [7] "comboWinter.2011_Montpellier_Cold"   "comboWinter.2011_Uppsala_Cold"      
 [9] "comboWinter.2011_Valencia_Benign"    "comboWinter.2011_Valencia_Cold"     
[11] "comboWinter.2012_Barcelona_Benign"   "comboWinter.2012_Barcelona_Cold"    
[13] "comboWinter.2012_Gif_Benign"         "comboWinter.2012_Gif_Cold"          
[15] "comboWinter.2012_Groningen_Benign"   "comboWinter.2012_Groningen_Cold"    
[17] "comboWinter.2012_Montpellier_Benign" "comboWinter.2012_Montpellier_Cold"  
[19] "comboWinter.2012_Uppsala_Benign"     "comboWinter.2012_Uppsala_Cold"      
[21] "comboWinter.2012_Valencia_Benign"    "comboWinter.2012_Valencia_Cold"     
[23] "comboWinter.2013_Gif_Benign"         "comboWinter.2013_Gif_Cold"          
[25] "comboWinter.2013_Uppsala_Benign"     "comboWinter.2013_Uppsala_Cold"      
[27] "comboWinter.2013_Valencia_Benign"    "comboWinter.2013_Valencia_Cold"

This parameterization allows you to make any combination you might care about, but you have to construct the contrasts by hand. In this parameterization you would test for differences between the benign and cold using a contrast like

> contrast <- vector(length = ncol(design2))
> contrast[grep("Cold", colnames(design2))] <- 1/15
> contrast[grep("Benign", colnames(design2))] <- -1/13
> contrast
 [1] -0.07692308  0.06666667 -0.07692308  0.06666667  0.06666667 -0.07692308
 [7]  0.06666667  0.06666667 -0.07692308  0.06666667 -0.07692308  0.06666667
[13] -0.07692308  0.06666667 -0.07692308  0.06666667 -0.07692308  0.06666667
[19] -0.07692308  0.06666667 -0.07692308  0.06666667 -0.07692308  0.06666667
[25] -0.07692308  0.06666667 -0.07692308  0.06666667

Note that this testing a slightly different hypothesis. Instead of testing Cold vs Benign after controlling for the other parameters, you are instead testing to see if on average the Cold samples are higher or lower than the Benign samples (essentially ignoring the other parameters). But with this parameterization you can easily ask different questions, like 'Is there an interaction between Benign and Cold with respect to the Valencia and Uppsala populations?'

score 2 · Accepted Answer · 2014-12-02

In addition to James' suggestions, another approach is to set up the design matrix like so:

> Combination <- paste0(samples$Season, "_", samples$Population)
> Cage <- factor(samples$Cage)
> Treatment <- samples$Treatment
> design3 <- model.matrix(~0 + Combination + Combination:Cage + Combination:Treatment)
> qrd <- qr(design3)
> design3 <- design3[,qrd$pivot[1:qrd$rank],drop=FALSE]
> colnames(design3)
 [1] "CombinationWinter.2011_Barcelona"
 [2] "CombinationWinter.2011_Gif"
 [3] "CombinationWinter.2011_Groningen"
# ... and so on
[16] "CombinationWinter.2011_Barcelona:Cage2"
[17] "CombinationWinter.2011_Gif:Cage2"
[18] "CombinationWinter.2011_Groningen:Cage2"
# ... and so on
[31] "CombinationWinter.2011_Barcelona:Cage3"
[32] "CombinationWinter.2011_Gif:Cage3"       
[33] "CombinationWinter.2011_Groningen:Cage3"
# ... and so on
[46] "CombinationWinter.2011_Barcelona:TreatmentCold"  
[47] "CombinationWinter.2011_Gif:TreatmentCold"        
[48] "CombinationWinter.2011_Montpellier:TreatmentCold"
# ... and so on

To understand this set up, think of each season/population combination in isolation. For example, let's consider Barcelona in Winter 2011. Here, we have three pairs of cold/benign samples, with one pair from each cage. This can be analyzed with a paired-sample design where we block on the Cage factor. We have one coefficient describing the change between treatments across all cages; dropping this coefficent will test for the effect of the treatment.

Now, design3 is simply an amped-up version of this paired-sample approach. Each season/population combination has its own blocking factors for the cage effect, as well as its own coefficient for the treatment effect. If you drop one of the treatment coefficients, you can test for the difference between benign and cold treatments for any season/population combination. If you want to find DE genes that are shared across seasons or populations, you can test the treatment effect in each season/population and perform an intersection on the DE lists between seasons/populations.

Note that the QR decomposition just removes the treatment coefficients corresponding to the Uppsala and Groningen samples of Winter 2011. This is because they don't have any benign samples, so there's nothing to compare the cold samples to.