Hi All
The short version of the question.
I have a study that is basically the same as the one in section 3.5, page 40, of the edgeR manual. The manual shows how to extract the differentially expressed genes for various comparisons. One comparison that escapes me: what if I would like to know whether gene expression differs between untreated samples from healthy versus diseased patients? Basically, "DiseaseDisease1:TreatmentNone"-"DiseaseHealthy:TreatmentNone". How would I extract that contrast from the design proposed in the manual?
The longer version.
In my study all the patients have the disease, just for a different length of time (long vs short). I have samples from control tissue and diseased tissue from the same patient. Basically it looks like this.
    pat <- gl(10, 2)
    time <- gl(2, 10, labels = c("short", "long"))
    tissue <- gl(2, 1, length = 20, labels = c("ctrl", "dis"))
    data.frame(time, pat, tissue)

        time pat tissue
    1  short   1   ctrl
    2  short   1    dis
    3  short   2   ctrl
    4  short   2    dis
    5  short   3   ctrl
    6  short   3    dis
    7  short   4   ctrl
    8  short   4    dis
    9  short   5   ctrl
    10 short   5    dis
    11  long   6   ctrl
    12  long   6    dis
    13  long   7   ctrl
    14  long   7    dis
    15  long   8   ctrl
    16  long   8    dis
    17  long   9   ctrl
    18  long   9    dis
    19  long  10   ctrl
    20  long  10    dis
Renumber the patients within the time groups as per section 3.5 in the manual.
    patb <- gl(5, 2, length = 20)
    data.frame(time, patb, tissue)

        time patb tissue
    1  short    1   ctrl
    2  short    1    dis
    3  short    2   ctrl
    4  short    2    dis
    5  short    3   ctrl
    6  short    3    dis
    7  short    4   ctrl
    8  short    4    dis
    9  short    5   ctrl
    10 short    5    dis
    11  long    1   ctrl
    12  long    1    dis
    13  long    2   ctrl
    14  long    2    dis
    15  long    3   ctrl
    16  long    3    dis
    17  long    4   ctrl
    18  long    4    dis
    19  long    5   ctrl
    20  long    5    dis
    design <- model.matrix(~time + time:tissue + time:patb)
    colnames(design)

    [1] "(Intercept)" "timelong" "timeshort:tissuedis" "timelong:tissuedis" "timeshort:patb2" "timelong:patb2" "timeshort:patb3"
    [8] "timelong:patb3" "timeshort:patb4" "timelong:patb4" "timeshort:patb5" "timelong:patb5"
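As a sanity check on the design above, the sketch below rebuilds it in base R and confirms that it has 12 columns and full column rank, so every coefficient is estimable. It uses only the factor definitions given in this post; no count data or edgeR objects are assumed.

```r
# Rebuild the factors from the post: 10 patients, 2 tissues each.
time   <- gl(2, 10, labels = c("short", "long"))
tissue <- gl(2, 1, length = 20, labels = c("ctrl", "dis"))
patb   <- gl(5, 2, length = 20)  # patients renumbered within each time group

design <- model.matrix(~ time + time:tissue + time:patb)

# Number of samples (rows) and coefficients (columns).
dim(design)

# TRUE when the design has full column rank, i.e. no coefficient is
# a linear combination of the others.
full_rank <- qr(design)$rank == ncol(design)
full_rank
```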
So if I would like to see whether gene expression differs between patients who have had the disease for a long time vs a short time, I could use coef="timelong" or contrast=c(0,1,0,0,0,0,0,0,0,0,0,0).
For differences in the diseased tissue between short-time and long-time patients, I could use contrast=c(0,0,1,-1,0,0,0,0,0,0,0,0).
But what if I would like to see whether there is differential expression in the control tissue between the short-time and long-time patients? Would contrast=c(0,1,-1,-1,0,0,0,0,0,0,0,0) work?
Also, if I would like to make the comparison "diseased tissue" - "control tissue", would it be correct to use a contrast like c(0,0,-1,-1,0,0,0,0,0,0,0,0)?
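Counting positions in a 12-element vector is easy to get wrong, so the contrast vectors can also be built by coefficient name. This is a minimal base R sketch; `make_contrast` is a hypothetical helper written for this post, not an edgeR function, and it only handles the bookkeeping. It says nothing about whether a given contrast answers the biological question.

```r
time   <- gl(2, 10, labels = c("short", "long"))
tissue <- gl(2, 1, length = 20, labels = c("ctrl", "dis"))
patb   <- gl(5, 2, length = 20)
design <- model.matrix(~ time + time:tissue + time:patb)

# Hypothetical helper: expand named weights into a full-length contrast
# vector aligned with the columns of the design matrix.
make_contrast <- function(design, weights) {
  unknown <- setdiff(names(weights), colnames(design))
  if (length(unknown) > 0) {
    stop("Unknown coefficient(s): ", paste(unknown, collapse = ", "))
  }
  out <- setNames(numeric(ncol(design)), colnames(design))
  out[names(weights)] <- weights
  out
}

# Same vector as contrast=c(0,1,0,0,0,0,0,0,0,0,0,0)
con_time <- make_contrast(design, c("timelong" = 1))

# Same vector as contrast=c(0,0,1,-1,0,0,0,0,0,0,0,0)
con_dis <- make_contrast(
  design,
  c("timeshort:tissuedis" = 1, "timelong:tissuedis" = -1)
)
```

Either vector could then be passed (after `unname()` if preferred) as the `contrast` argument of edgeR's `glmLRT` or `glmQLFTest`, assuming the model was fitted with this same design.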
Hi Aaron
Thanks a lot for taking the time and for your very helpful answer. I have been avoiding these kinds of interaction designs and usually break everything down into smaller subgroups, since I have an easier time wrapping my head around those. So indeed, subsetting the data was the first thing I did, but I wanted to explore whether there are other possibilities.
If you have the time, one follow-up question.
Instead of subsetting the data, could one make a design like:
and then to get the difference in the control tissue for long vs short time patients use
while for the long_dis vs short_dis
You still need to subset, otherwise you won't be accounting for the correlations between samples from the same patient. Your design only works if you were using limma with duplicateCorrelation, which models these correlations explicitly. With edgeR, subsetting keeps only one sample per patient so you don't need to worry about these correlations.
OK, great, thanks. I have been using limma with duplicateCorrelation previously in the way you mention, so my thinking about designs is still influenced by that.
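For reference, the subsetting approach recommended in this thread can be sketched in base R as follows. It keeps only the control-tissue samples, checks that each patient then contributes exactly one sample, and builds a simple long-vs-short design. The data frame name `samples` and the index `keep` are illustrative assumptions; in a real analysis the columns of the count matrix or DGEList would have to be subset with the same index before fitting.

```r
pat    <- gl(10, 2)
time   <- gl(2, 10, labels = c("short", "long"))
tissue <- gl(2, 1, length = 20, labels = c("ctrl", "dis"))
samples <- data.frame(time, pat, tissue)

# Keep control tissue only; the same index would be applied to the counts.
keep <- samples$tissue == "ctrl"
ctrl <- droplevels(samples[keep, ])

# One sample per patient, so no within-patient correlation remains.
stopifnot(!anyDuplicated(ctrl$pat))

# With a single tissue, long vs short is just the "timelong" coefficient.
design_ctrl <- model.matrix(~ time, data = ctrl)
colnames(design_ctrl)
```

The same pattern with `samples$tissue == "dis"` would give the long vs short comparison in the diseased tissue.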