Hello everyone
For a group project, we had to process a raw micro array data set using bioconductor and draw some conclusions. We chose this data set for our project. We looked a bit at the data and preprocessed it, but when we wanted to test for differential expressed genes, we got some strange results. Only a couple of genes showed up as DE, where we expected a lot more. The last few days we've searched through our code for errors and possible explanations, but we don't seem to find any differences in method of working from the examples we got. Can anyone could spot an obvious error in our code or give us another way to work? The code shows the part where we tested for DE genes.
#Obtain expression values ########################## biocLite("affyPLM") library("affyPLM") eset<-threestep(affy.data,background.method="RMA.2",normalize.method="quantile", summary.method="median.polish") #obtain the expression values on gene level with the function exprs() e<-exprs(eset) head(e) dim(e) #construction of design matrix library(limma) treatment<-factor(c(rep("Tis11withLPS",3),rep("Tis11",3),rep("GFPwithLPS",3),rep("GFP",3))) X<-model.matrix(~0+treatment) colnames(X)<-c("GFP","GFPwithLPS","Tis11","Tis11withLPS") #fit linear model lm.fit<-lmFit(e,X) #construction of contrasts mc<-makeContrasts("Tis11withLPS-GFPwithLPS","Tis11-GFP","Tis11withLPS-Tis11","GFPwithLPS-GFP", levels=X) c.fit<-contrasts.fit(lm.fit,mc) eb<-eBayes(c.fit) toptable(eb, sort.by = "logFC") topTableF(eb) #Extract p-values & adjustment modFpvalue <- eb$F.p.value indx <- p.adjust(modFpvalue, method = "bonferroni") < 0.05 sig <- modFpvalue[indx] nsiggenes <- length(sig)
Most of the code is basically copy-paste from the examples we got, including the use of Bonferonni. But I don't really think the error lies in the correction method. With Bonferonni we got 10 DE genes and with BH that still was only a meager 20 genes. On such a large dataset, that just seems way too low.
In that case, there are several other possibilities:
arrayWeights
?trend=TRUE
ineBayes
?robust=TRUE
ineBayes
may also be useful.Read the limma user's guide and documentation for more details on each of these functions/settings. If you still get few DE genes... well, maybe that's because you don't have many DE genes. It happens sometimes.
Had few spare minutes left.... so I quickly analyzed this data set. Regarding Aaron's final conclusion: if you create an MDS plot using
limma
'splotMDS
function, you indeed see the treatment groups don't nicely separate. Hence, not surprising that only few probe sets are differentially expressed.... Even though LPS was used as stimulus....Plot is available here:
http://imgur.com/Ykv97b0
Also, realize that by using topTable without speciifying a contrast, you basically check which probe sets are differentially regulated among all 4 tested contrasts. If you specify a coefficient, e.g.
topTableF(eb, coef=4)
, you will get the probe sets that are differentially expressed by LPS in the control-transfected cells. That may still provide some interesting results....Guido's plot also suggests that there is a strong pair effect. It may be worth digging into the metadata and seeing if samples were generated in batches. If you can block on the batch, you can reduce the variance estimates and improve power to detect DE genes within batches.
Thanks for your replies! For the group task we had to analyze a similar RNAseq dataset as well. We finished this yesterday and the results showed little differential expression too. So we already kind of figured that it was indeed because of the data set, not because of some big mistake we made.
We will try reproducing the MDS-figure and write our conclusions from that, and draw what conclusions we can using the topTable function. Again, thanks for the replies and making time to take a deeper look into it.