Dear All,
I am having difficulties interpreting the results from a factorial design experimental setup in limma. I sampled 12 individuals repeatedly, before and after treatment (3hrs, 6hrs, 12hrs, and 24hrs post stimulation) in a stimulated group and a control group. Half of the individuals for each treatment were females, and half males.
Sample | Group | sex | individual |
1 | Stimulated.0 | F | 1 |
2 | Stimulated.3 | F | 1 |
3 | Stimulated.6 | F | 1 |
4 | Stimulated.12 | F | 1 |
5 | Stimulated.24 | F | 1 |
6 | Stimulated.0 | M | 2 |
7 | Stimulated.3 | M | 2 |
8 | Stimulated.6 | M | 2 |
9 | Stimulated.12 | M | 2 |
10 | Stimulated.24 | M | 2 |
… | |||
51 | Control.0 | F | 5 |
52 | Control.3 | F | 5 |
53 | Control.6 | F | 5 |
54 | Control.12 | F | 5 |
55 | Control.24 | F | 5 |
56 | Control.0 | M | 6 |
57 | Control.3 | M | 6 |
58 | Control.6 | M | 6 |
59 | Control.12 | M | 6 |
60 | Control.24 | M | 6 |
In my PCA plots the samples clusters according to treatment as well as sex, and individual and hence I have included both fixed (sex) and random effects (individual) in my model for comparative expression. I followed the protocol “RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR” to convert my RNA-seq data to a format that can be analysed with limma.
Some of the questions I want to answer are:
Q1: Which genes respond differently over time in the stimulated group relative to the control?
Q2: What genes respond differently over time for the stimulant in females and males?
For both questions I would like to know what the foldchange of gene expression is between the groups of interest.
Q1:
As I have samples before and after treatment in both the control group and the stimulated group I used the setup from section 9.6 (Time Course Experiments) in the limma user guide to answer Which genes respond differently over time in the stimulated group relative to the control?
lev <- c("Control.0", "Control.3", "Control.6", "Control.12", "Control.24", "Stimulated.0", "Stimulated.3", "Stimulated.6", "Stimulated.12", "Stimulated.24") f <- factor(y$samples$group, levels=lev) sex <- colData$sex design <- model.matrix(~0+f+sex) colnames(design) <- gsub("f", "", colnames(design)) cont.dif <- makeContrasts(Dif3hr =(Stimulated.3-Stimulated.0)-(Control.3-Control.0), Dif6hr =(Stimulated.6-Stimulated.0)-(Control.6-Control.0), Dif12hr =(Stimulated.12-Stimulated.0)-(Control.12-Control.0), Dif24hr =(Stimulated.24-Stimulated.0)-(Control.24-Control.0), levels =colnames(design)) v <- voom(y, design, plot=TRUE) corfit <- duplicateCorrelation(v,design,block=y$samples$ind) vfit2 <- lmFit(v, design, block=y$samples$ind, correlation=corfit$consensus) vfit2 <- contrasts.fit(vfit2, contrasts=cont.dif) efit2 <- eBayes(vfit2) results_limma_summarizeOverlaps <- summary(decideTests(efit2))
Q2:
For this analysis I used the setup from section 9.5.4 (Classic Interaction Models) in the limma used guide to find genes that respond differently to the stimulus in females vs males.
levg <- c("Control.0.F", "Control.3.F", "Control.6.F", "Control.12.F", "Control.24.F", "Stimulated.0.F", "Stimulated.3.F", "Stimulated.6.F", "Stimulated.12.F", "Stimulated.24.F", "Control.0.M", "Control.3.M", "Control.6.M", "Control.12.M", "Control.24.M", "Stimulated.0.M", "Stimulated.3.M", "Stimulated.6.M", "Stimulated.12.M", "Stimulated.24.M") g <- factor(y$samples$group.sex, levels=levg) design <- model.matrix(~0+g) colnames(design) <- gsub("g", "", colnames(design)) cont.dif <- makeContrasts(Dif_F_M_3=(Stimulated.3.F-Stimulated.0.F)-(Stimulated.3.M-Stimulated.0.M), Dif_F_M_6=(Stimulated.6.F-Stimulated.0.F)-(Stimulated.6.M-Stimulated.0.M), Dif_F_M_12=(Stimulated.12.F-Stimulated.0.F)-(Stimulated.12.M-Stimulated.0.M), Dif_F_M_24=(Stimulated.24.F-Stimulated.0.F)-(Stimulated.24.M-Stimulated.0.M), levels =colnames(design)) v <- voom(y, design, plot=TRUE) corfit <- duplicateCorrelation(v,design,block=y$samples$ind) vfit2 <- lmFit(v, design, block=y$samples$ind, correlation=corfit$consensus) vfit2 <- contrasts.fit(vfit2, contrasts=cont.dif) efit2 <- eBayes(vfit2) results_limma_summarizeOverlaps <- summary(decideTests(efit2))
After finding out what genes responded differentially to the stimulant in females and males I also wanted to visualize the foldchanges for the sexes separately for the genes of interest, and therefore I additionally analysed what genes respond differentially to the treatment for females and males separately.
cont.dif <- makeContrasts(Dif3_F =(Stimulated.3.F-Stimulated.0.F)-(Control.3.F-Control.0.F), Dif6_F =(Stimulated.6.F-Stimulated.0.F)-(Control.6.F-Control.0.F), Dif12_F =(Stimulated.12.F-Stimulated.0.F)-(Control.12.F-Control.0.F), Dif24_F =(Stimulated.24.F-Stimulated.0.F)-(Control.24.F-Control.0.F), Dif3_M =(Stimulated.3.M-Stimulated.0.M)-(Control.3.M-Control.0.M), Dif6_M =(Stimulated.6.M-Stimulated.0.M)-(Control.6.M-Control.0.M), Dif12_M =(Stimulated.12.M-Stimulated.0.M)-(Control.12.M-Control.0.M), Dif24_M =(Stimulated.24.M-Stimulated.0.M)-(Control.24.M-Control.0.M), Dif_F_M_3 = (Stimulated.3.F-Stimulated.0.F)-(Stimulated.3.M-Stimulated.0.M), Dif_F_M_6 = (Stimulated.6.F-Stimulated.0.F)-(Stimulated.6.M-Stimulated.0.M), Dif_F_M_12 = (Stimulated.12.F-Stimulated.0.F)-(Stimulated.12.M-Stimulated.0.M), Dif_F_M_24 = (Stimulated.24.F-Stimulated.0.F)-(Stimulated.24.M-Stimulated.0.M), levels =colnames(design))
(I also tried “Dif_F_M_3_B = ((Stimulated.3.F-Stimulated.0.F)-(Control.3.F-Control.0.F))-((Stimulated.3.M-Stimulated.0.M)-(Control.3.M-Control.0.M))” which gave very similar result to “Dif_F_M_3 = (Stimulated.3.F-Stimulated.0.F)-(Stimulated.3.M-Stimulated.0.M)”).
However, when comparing the results (log foldchanges) from the analysis between the females and males and for males and females separately I realized something strange (?). Sometimes the log foldchange is namely very similar in females and males for a particular gene, but still this gene has a significant value and high log foldchange value in the model comparing the response between the sexes.
In the table below I have listed a couple of the genes with a p.adj value < 0.05 from the comparison between the sexes, and I added the foldchanges values from the analyses for each sex separately to show what I mean:
Sex_Comparison | Sex_Comparison | Female | Male | |
logFC | adj.P.Val | logFC | logFC | |
GeneX1 | 3.77 | 0.03 | -3.40 | -1.66 |
GeneX2 | 3.71 | 0.03 | -2.14 | -2.02 |
GeneX3 | 3.29 | 0.05 | -2.57 | -1.84 |
GeneX4 | 3.26 | 0.03 | -1.78 | -2.07 |
GeneX5 | -2.15 | 0.04 | -0.24 | 3.03 |
GeneX6 | -2.44 | 0.05 | -0.76 | 3.30 |
GeneX7 | -5.61 | 0.03 | 3.18 | 4.38 |
GeneX8 | -6.89 | 0.04 | 1.33 | 4.74 |
GeneX9 | 0.91 | 0.05 | -0.39 | -0.73 |
GeneX10 | -0.94 | 0.03 | -0.14 | 1.48 |
For example, the log foldchange in GeneX2 is very similar in the males and females, but comes out as significant with a log foldchange of 3.7 when comparing the change in both sexes. This makes me wonder whether it is not the foldchange that is compared between the groups? Or could it be that when a gene is downregulated in both groups, the minuses used in makeContrast will become plus instead (-X)-(-X), hence leading to a high foldchange between the sexes even though the foldchange is very similar in both groups? Or are there other reason for my results?
I would very much appreciate if someone could give me a hint on what is happening and how I should interpret the data. At the moment I feel very confused and unsure about how to explain what the log foldchange value actually show from this analysis.
Best wishes,
Elinor
PS. Even though my question mainly covers Q2 in this post, the same issue (?) applies to Q1 as I am also comparing the change in two groups there.
Dear Aaron,
Thank you very much for your answer, I really appreciate your feedback! Yes it makes sense that there will be a difference when including the control for one comparison and not the other. So would you say that it is ok to use the following makeContrast?
Regarding your comment on Q1: Thank you for this feedback, I see your point. I assumed that including sex as a fixed factor would be enough and thought that having six samples per treatment/time point would increase the power of the analysis rather than the opposite. But in your opinion this is not a good idea then? I would be happy to exclude this model if you do not see this appropriate, and analyse the sexes separately from the beginning.
Best wishes,
Elinor
Regarding your contrast matrix: yes, this looks fine, provided that you are willing and able to interpret differences of differences of differences (i.e., log-fold changes). I would suggest reporting all the individual log-fold changes as well, e.g., for
Dif_F_M_3B
, you should also show, in the result table, log-fold changes for the following contrasts:You don't need their test statistics, just the log-fold changes. This is necessary to figure out what a positive or negative effect in the reported "log-fold change" for
Dif_F_M_3B
actually means from a scientific/biological perspective.Regarding the models: as it stands, the model that you use in Q1 is only properly specified if you think that there is no difference in the stimulation response between sexes. Now, normally, this might be a reasonable assumption/approximation. However, in Q2, your aim is to go and find differences in the stimulation response between sexes. If you should find any, you will violate your assumption for the model of Q1. Power considerations aside, you can avoid this logical inconsistency by just using the same design for both questions. In fact, explicitly considering sex in Q1 may give even better answers, by ensuring that differences in the time response between stimulated/control conditions are consistent between sexes.