I have set up my model matrix following example 3.3.4, "interaction at any time", in the edgeR user guide. My experimental design has 2 strains (KO and WT) and 3 time points (0h, 12h, and 36h). I'm interested in genes that respond differently to infection between the KO and WT at 12h or 36h. When I tested for an interaction at any time (coef=5:6), I found 424 genes that were significant at an FDR of 5%. I then wanted separate lists of DE genes for each time point, so I tested coef=5 (StrainKO.Time12h) and coef=6 (StrainKO.Time36h) separately. For coefficient 5 I found 9 DE genes, and for coefficient 6 I found 39 DE genes. I was expecting the number of DE genes from the joint test of coefficients 5 and 6 to be slightly less than the sum from the individual tests (because some genes might be significantly different at both time points); however, that does not appear to be the case. Why does testing multiple coefficients together result in many more significant genes than the sum of the significant genes obtained by testing the coefficients separately?
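library(edgeR)

# setting up design matrix
Strain <- c(rep("WT", 6), rep("KO", 4))
Time <- c(rep(c("0h", "12h", "36h"), each = 2), "0h", "12h", rep("36h", 2))
samples <- rownames(RNA_data$samples)
targets <- data.frame(samples, Strain, Time)
# make WT the reference level (factor() also works when Strain is stored as character)
targets$Strain <- factor(targets$Strain, levels = c("WT", "KO"))
design <- model.matrix(~Strain * Time, data = targets)

# fit the GLM (assumes dispersions were already estimated on RNA_data_filtered)
fit <- glmFit(RNA_data_filtered, design)

# interaction at any time (coefficients 5 and 6 together), then each separately
lrt_12h_36h <- glmLRT(fit, coef = 5:6)
lrt_12h <- glmLRT(fit, coef = 5)
lrt_36h <- glmLRT(fit, coef = 6)
lrts <- list(lrt_12h_36h = lrt_12h_36h, lrt_12h = lrt_12h, lrt_36h = lrt_36h)

# extract the results table from each lrt object
Results <- lapply(lrts, topTags, n = Inf)

# subset significant genes
Sig_genes <- lapply(Results, function(x) x$table[x$table$FDR <= 0.05, ])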
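To check how the three gene lists actually overlap, rather than only comparing their sizes, something like the following could be used. This is only a sketch in base R: `compare_sig_lists` is a hypothetical helper (not an edgeR function), it assumes gene IDs are the row names of each table, and the `toy` list with made-up gene IDs is a stand-in for `Sig_genes` so the snippet runs on its own.

```r
# Hypothetical helper (not part of edgeR): compare the gene IDs found by the
# joint test with those found by the single-coefficient tests.
# `sig` is a named list of data frames whose row names are gene IDs.
compare_sig_lists <- function(sig, joint = "lrt_12h_36h",
                              single = c("lrt_12h", "lrt_36h")) {
  ids <- lapply(sig, rownames)
  single_union <- Reduce(union, ids[single])
  list(
    n_per_test         = lengths(ids),                               # size of each list
    n_single_union     = length(single_union),                       # genes in any single test
    n_single_both      = length(Reduce(intersect, ids[single])),     # genes in every single test
    n_joint_only       = length(setdiff(ids[[joint]], single_union)),  # joint test only
    n_single_not_joint = length(setdiff(single_union, ids[[joint]]))   # missed by joint test
  )
}

# Toy stand-in for Sig_genes; gene IDs and FDR values are invented for illustration
toy <- list(
  lrt_12h_36h = data.frame(FDR = c(0.01, 0.02, 0.03, 0.04),
                           row.names = c("g1", "g2", "g3", "g4")),
  lrt_12h     = data.frame(FDR = 0.01, row.names = "g1"),
  lrt_36h     = data.frame(FDR = c(0.02, 0.03), row.names = c("g1", "g5"))
)

overlap <- compare_sig_lists(toy)
print(overlap)
```

With the real data, `compare_sig_lists(Sig_genes)` would show how many of the 424 genes from the joint test appear in neither single-coefficient list, and whether any of the 9 or 39 genes are missing from the joint list.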