Limma: Contrasts with interaction term
1
1
Entering edit mode
huma ▴ 10
@huma-12568
Last seen 7.7 years ago

Hi

I am stuck with limma design and contrasts. My pdata file consist of 3 variables

   change Cells Days
    SelP G2215   D1
    SelP G2215   D1
    SelN G2215   D1
    SelN G2215   D1
    SelP    G2   D1
    SelP    G2   D1
    SelN    G2   D1
    SelN    G2   D1
    SelP G2215   D2
   SelP G2215   D2
   SelN G2215   D2
   SelN G2215   D2
   SelP    G2   D2
   SelP    G2   D2
   SelN    G2   D2
   SelN    G2   D2
   SelP G2215   D3
   SelP G2215   D3
   SelN G2215   D3
   SelN G2215   D3
   SelP    G2   D3
   SelP    G2   D3
   SelN    G2   D3
   SelN    G2   D3

I want to compare the difference between selP and selN in G2 and G2215 cells and difference between these two cell types and how change in days is regulating this differential expression. I am not very well familiar with statistics so after reading manual and many online forums, i wrote the following code for this purpose

day=as.numeric(pdata$Days)
sc=paste(pdata$change,pdata$Cells,sep='.')
design<-model.matrix(~0+sc+ sc:day)
colnames(design)=gsub(':','_',colnames(design))

ContMatrix<-makeContrasts(G2=scSelP.G2-scSelN.G2,
                          G2215=scSelP.G2215-scSelN.G2215,
                          diff1=((scSelP.G2-scSelN.G2)-(scSelP.G2215-scSelN.G2215)),
                          G2_day=scSelP.G2_day-scSelN.G2_day,
                          G2215_day=scSelP.G2215_day-scSelN.G2215_day,
                          diff2=((scSelP.G2_day-scSelN.G2_day)-(scSelP.G2215_day-scSelN.G2215_day)),
                          levels=design)

fit<-lmFit(data,design)
fit2<-contrasts.fit(fit,ContMatrix)

fit2<-eBayes(fit2)

I am assuming that coef=1 to 3 will give me difference in the group without modulating changes depending on time (days) however coef=4 to 6 will give same results but this time how this differential expression is varying with time.

 Can any one please suggest me if this design is ok or does it need further modification? 

limma interactions continuous term multiple factor design • 2.6k views
ADD COMMENT
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 9 hours ago
The city by the bay

Looks fine to me, but let's go through it. First, I'll explain what the coefficients in your design matrix mean:

[1] "scSelN.G2"        "scSelN.G2215"     "scSelP.G2"        "scSelP.G2215"    
[5] "scSelN.G2_day"    "scSelN.G2215_day" "scSelP.G2_day"    "scSelP.G2215_day"

The first four coefficients represent the average log-expression in each "change+cells" combination at a hypothetical day 0 time point. The last four coefficients represent the log-fold change per day for each change+cells combination (i.e., the gradient in log-expression with respect to time, also the time effect).

Okay, let's now have a look at your contrasts. The G2 contrast tests for DE between SelP and SelN for G2 cells at day 0; similarly, the G2215 contrast tests for DE between SelP and SelN for G2215 cells at day 0. The diff1 contrast tests for differences in the SelP/SelN log-fold changes between G2 and G2215 cells - again, at day 0. So far, so good.

Now, the remaining three contrasts are more complicated. G2_day tests for differences in the time effect between SelP and SelN G2 cells; the same for G2215_day for G2215 cells. The results of these contrasts will be difficult to interpret without recording the values of the individual gradients for each change+cells combination involved in each contrast. For example, is a positive log-fold change in G2_day due to a gradient of zero for SelN and a positive gradient for SelP; a negative gradient for SelN and a zero gradient for SelP; or a negative gradient for SelN and a positive gradient for SelP? Each of these scenarios will have different biological interpretations, so make sure you report the gradients for each change+cells combination when you show the results.

The point above is especially applicable for the last diff2 contrast, which is testing for differences in the differences in the time effect between G2 and G2215 cells. What this actually means is unclear to me, so I can only hope that you have a good reason for making such a contrast. In any case, you had better show all the gradients for the individual change+cells combinations as well as the log-fold changes for G2_day and G2215_day, if you want to make sense of the results of this contrast.

P.S. Conversion of a factor via as.numeric may not give you the expected results, depending on the order of the levels. In this case it is fine, but a safer procedure is to extract the day from the string and coerce that instead:

as.numeric(sub("^D", "", pdata$Day))
ADD COMMENT

Login before adding your answer.

Traffic: 950 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6