I am performing an analysis to look at the impact of 3 types of stimulation on 6 stem cell lines (each derived from 1 individual) and I ran the following code:
exprdata<-read.csv("Replicates Merged.csv", row.names=1)
metadata<-read.csv("Metadata.csv",row.names=1)
metadata$ID<-as.factor(metadata$ID)
precursors<-read.csv("Counts.csv",row.names=1)
form <- ~ 0 + Stim + (1 | ID)
compare <- makeContrastsDream(form, metadata, contrasts = c(compare3_1 = "StimFib - StimCon", compare3_2 = "StimFib - StimPDL", compare2_1 = "StimPDL - StimCon"))
fit <- dream(exprdata, form, metadata, compare, ddf = "Kenward-Roger")
fit2 <- variancePartition::eBayes(fit, robust=T, trend=log(precursors$Counts))
FibxCon<-variancePartition::topTable(fit, coef = "compare3_1", number=3469)
FibxPDL<-variancePartition::topTable(fit, coef = "compare3_2", number=3469)
PDLxCon<-variancePartition::topTable(fit, coef = "compare2_1", number=3469)
Of note, each cell line was plated in triplicate before stimulation. I see the following when running a PCA plot:
The most variation is attributed to the different cell lines. However, some of the technical replicates do not cluster well.
I thought it was best to merge the triplicates via median before running the model above. However, I started to wonder whether it be better to keep the triplicates separate? If so, would I have to change the structure of my random effect (ie nested)?
Any insights would be much appreciated!
gabriel.hoffman any insight?