Dear community,
i would like to address an important question regarding an illumina microarray dataset, i'm currently re-analyzing. It's about the experimantal design of the dataset with GSE32091, which has also been publiced.
In detail, there are 3 different conditions (Irradiated, bystander, control) in 3 different time points( 4h, 8h, 26h) with 3 or 4 biological replicates. The problem is that in some samples there are 3 bystander replicates in 4h and 4 bystander control replicates in 4 hour. Below, im posting the code of an MDS plot and a basic limma DE analysis of some contrasts:
> head(filtered.2$targets) Assay.Name Sample.and.Data.Relationship.Format time Batch 1 GSM795538 irradiated_ctrl 4h 1 2 GSM795539 irradiated_ctrl 4h 2 3 GSM795540 irradiated_ctrl 4h 3 4 GSM795541 irradiated 4h 1 5 GSM795542 irradiated 4h 2 6 GSM795543 irradiated 4h 3 |
> plotMDS(filtered$E,labels=as.factor(paste0(targets$Sample.and.Data.Relationship.Format, ".", targets$time)),col= as.numeric(as.factor(targets$Batch))) |
# here is the link to the MDS plot https://www.dropbox.com/s/ah4v0pem30irjvz/MDSplot.image.png?dl=0 ## Continue with the DE analysis .... samples <- as.factor(paste0(targets$Sample.and.Data.Relationship.Format, ".", targets$time)) # to make a factor including both conditions and 3 time points batch <- filtered.2$targets$Batch # filtered.2 the eSet design <- model.matrix(~0+samples+batch) fit <- lmFit(filtered.2, design) > table(batch) batch 1 2 3 4 12 12 12 6 con <- makeContrasts(test=samplesbystander.4h-samplesbystander_ctrl.4h,levels=design) # one specific example for one specific contrast fit2 <- contrasts.fit(fit, contrasts=con) ebayes <- eBayes(fit2, trend=TRUE) top2 <- topTable(ebayes, coef=1, number=nrow(ebayes), adjust.method="fdr", sort.by="none").. To conclude, most of the contrasts implemented, don't return any DE, after controlling FDR < 0.05. Thus, from a first naive inspection, there are not any DE genes for most of the comparisons. So, my main questions are the following: 1. This is very obvious-evident from the MDS plot i posted above ? In which, we can see that in the majority of the time points, the control samples group together with either of the conditions tested (bystander and/or irradiated) ? 2. Moreover, could the different batch size as illustrated in some of the experimental groups above --even with a relatively small replicate difference-- hamper further the DE analysis, regarding the DE genes ? (Just to add a small note: from the relative publication, the authors used a cutoff of FDR <0.15 combined with a nonimal p-value <0.05, ending with a relatively low number of DE genes...) Any comments or suggestions would be grateful !! |
Indeed, even if a lot of the spread across the plot is due to a batch effect, I would expect that each control sample would be "shifted" in a consistent direction relative to the non-control sample in the same batch, if there were a systematic control/non-control change in expression across all batches. This does not seem to be the case, though you'd get a clearer picture of what's going on if you performed
removeBatchEffect
on the expression matrix prior to runningplotMDS
.Another thing to try would be treatment (i.e., bystander/irradiated)-specific blocking factors for the batch effects. This would model differences in how the bystander/irradiated samples respond to the batch effect, e.g., the irradiated samples might not be affected much by batch but the bystander samples show a greater variation. Adding specific blocking might improve the model fit, reduce the sample variance and increase power. Or maybe it won't, because you'll lose residual d.f.; you'll have to test it to find out.