Merge Replicates vs Mixed Model
abadgerw asks:

I am performing an analysis to look at the impact of 3 types of stimulation on 6 stem cell lines (each derived from a single individual), and I ran the following code:

library(variancePartition)

# Merged expression matrix, sample metadata, and peptide counts
exprdata <- read.csv("Replicates Merged.csv", row.names = 1)
metadata <- read.csv("Metadata.csv", row.names = 1)
metadata$ID <- as.factor(metadata$ID)
precursors <- read.csv("Counts.csv", row.names = 1)

# Fixed effect of stimulation, random intercept for cell line (donor)
form <- ~ 0 + Stim + (1 | ID)

compare <- makeContrastsDream(form, metadata,
                              contrasts = c(compare3_1 = "StimFib - StimCon",
                                            compare3_2 = "StimFib - StimPDL",
                                            compare2_1 = "StimPDL - StimCon"))

fit <- dream(exprdata, form, metadata, compare, ddf = "Kenward-Roger")
fit2 <- variancePartition::eBayes(fit, robust = TRUE, trend = log(precursors$Counts))

FibxCon <- variancePartition::topTable(fit2, coef = "compare3_1", number = 3469)
FibxPDL <- variancePartition::topTable(fit2, coef = "compare3_2", number = 3469)
PDLxCon <- variancePartition::topTable(fit2, coef = "compare2_1", number = 3469)

Of note, each cell line was plated in triplicate before stimulation. I see the following when running a PCA plot: [PCA plot of the samples]

Most of the variation is attributed to the different cell lines. However, some of the technical replicates do not cluster well.

I thought it was best to merge the triplicates via the median before running the model above. However, I started to wonder whether it would be better to keep the triplicates separate. If so, would I have to change the structure of my random effect (i.e. make it nested)?

Any insights would be much appreciated!

limma DifferentialExpression variancePartition dream

gabriel.hoffman, any insight?

gabriel.hoffman replies:

It is almost always better to retain the full observed data if the statistical software is able to model the repeated measures. This is a good application for a mixed model.

With count data, it is not clear how to collapse technical replicates while still allowing the statistical model to account for count error, varying library size, and overdispersion.
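
A minimal sketch of what keeping the triplicates would look like, assuming an expression matrix with one column per technical replicate and a metadata table with one matching row per replicate (the file and object names below are placeholders, not the poster's actual files):

library(variancePartition)

# Placeholder inputs: one column/row per technical replicate
# (6 cell lines x 3 stimuli x 3 replicates = 54 samples)
exprAll <- read.csv("All Replicates.csv", row.names = 1)
metaAll <- read.csv("Metadata All Replicates.csv", row.names = 1)
metaAll$ID <- as.factor(metaAll$ID)

# The formula can stay as before: the random intercept for cell line now
# simply has several observations (technical replicates) per line and stimulus
formAll <- ~ 0 + Stim + (1 | ID)

contrAll <- makeContrastsDream(formAll, metaAll,
                               contrasts = c(compare3_1 = "StimFib - StimCon",
                                             compare3_2 = "StimFib - StimPDL",
                                             compare2_1 = "StimPDL - StimCon"))

fitAll <- dream(exprAll, formAll, metaAll, contrAll, ddf = "Kenward-Roger")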

1) In cases where there are multiple sources of expression variation, the variance partitioning plot can give insight not found in the PCA plot (a sketch follows this list).

2) It's not clear whether you are using a voom step here. dream() is designed to take the result of voom() or voomWithDreamWeights() (a second sketch follows this list).

3) Fitting eBayes() with a trend is not recommended. Instead, incorporating precision weights from a voom step into dream() is the best way to handle counts.

4) It looks like you have biological expression variation due to Donor and Stimulus.
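
A minimal sketch of the variance partitioning plot mentioned in point 1), using the object names from the question; the formula treats both categorical variables as random effects, following the variancePartition vignette:

library(variancePartition)

# Fraction of expression variance explained by cell line and stimulation, per feature
formVP <- ~ (1 | ID) + (1 | Stim)
vp <- fitExtractVarPartModel(exprdata, formVP, metadata)
plotVarPart(sortCols(vp))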

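A minimal sketch of points 2) and 3), assuming the input were a raw RNA-seq count matrix (the thread later clarifies the data are proteomics intensities, so this is illustrative only): estimate precision weights with voomWithDreamWeights() and pass the result to dream(), instead of adding a trend in eBayes().

library(variancePartition)
library(edgeR)

dge <- DGEList(counts)        # 'counts' is an assumed raw count matrix, not from the question
dge <- calcNormFactors(dge)

# Precision weights carry the mean-variance relationship, so no eBayes() trend is needed
vobj <- voomWithDreamWeights(dge, form, metadata)
fitv <- dream(vobj, form, metadata, compare, ddf = "Kenward-Roger")
fitv <- variancePartition::eBayes(fitv)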

abadgerw replies:

Thanks, gabriel.hoffman

The data are actually proteomics intensity data, which is why I did not use voom but rather ran eBayes() with a trend that included the log of the peptide counts corresponding to the intensity measures. I went this route as it was previously suggested by Gordon Smyth for limma: https://www.biostars.org/p/9509937/.

Given that the data are proteomics rather than RNA-seq and my goal is to understand the impact of stimulus, would it still be advisable not to collapse the technical replicates, or would taking the median of the technical replicates and still using dream with a random effect of cell line ID be appropriate? I was just concerned that not collapsing the technical replicates would inflate the p-values. Thanks for your help and advice!
