I have a gene-expression (GE) data for 100 patients at 2 different time (Before & After). So, each patient has 2 samples.
Patients were under 2 treatment (treatA & treatB). I would like to to find the differential gene expression for the following contrast:
con <- makeContrasts(diff=(treatA_before-treatB_Before)-(treatA_After-treatB_After), levels=design)
However, I also would like to adjust my model for 4 covarites (age, sex, center, BMI) and also remove batch effect. So, I used the following design matrix:
treat=factor(rep(pheno$treat, 2),levels=c("A","B"))
time <- rep(c("Before", "After"), c(ncol(GE_before), ncol(GE_after)))
design=model.matrix(~0+age+sex+center+BMI+batch+treat:time)
Could you please let me know that my design model is correct? can I remove the batch effect using this linear model. please advise. Thanks
Thanks Aaron for your great explanation. How about batch effect? My RNAseq data for those 100 patients prepared at 7 different time date.
So, I have batch effects in my GE data (Before & after). How to remove the batch effect? If I consider "patient" as a random effect in the design model, do you think the batch effect is remove automatically? Please advise
For any given patient, do both of his/her samples belong in the same batch? If yes, the batch effect is automatically removed by blocking on the patient ID. If not, you'll need to include the batch as a separate factor in the design matrix.
Also, the patient ID is a fixed effect when you put it in the design matrix, see A: How does edgeR compare to lme4?.
OK, I got it. If I put "0" in the design model, is there any difference for the DGE results.
Then I have no "intercept" in the output.
The presence or absence of an intercept will not affect the interpretation of the two terms of interest.
Thank you very much!
Does the presence of an intercept make it so that each patient term is relative to the first patient term? For example, isn't the column "patientB" in the design matrix is actually patientB relative to patientA? If so, does this still accomplish the intended blocking effect?
Yes, that interpretation is correct. And yes, the blocking still works.
Hi Aaron,
I am just confused why these two contrast are "equivalent" as you mentioned in above:
con <- makeContrasts(diff=(treatA_before-treatB_Before)-(treatA_After-treatB_After), levels=design)
Could you please explain more? thanks
I'll assume that treatment A = X and B = Y. The individual equivalences are:
The result should be obvious after some rearrangement of terms:
... which is just your original contrast multiplied by a factor of -1. In other words, the results of the two contrasts are (conceptually) equivalent, only differing by the sign of the log-fold change.
Ok, great. But if I choose "timeBefore" for any difference contrast, then the results should be change. is that right or I'm wrong?
treamnetX_timeBefore = treatA_After - treatA_Before
treatmentY_timeBefore= treatB_After - treatB_Before
I don't understand your question, nor your equations. In any case, they should be:
... assuming you removed all
timeAfter
coefficients from the design matrix. This will give the same results as in my original response, but with the sign of the log-FC flipped.My 100 patients sequenced at 7 different times (So, I have 7 different class of batch effect in my data). When I put "Batch" as a separate factor in the model as below:
design=model.matrix(~0+Batch+patient+treatment:time)
I Got the following error:
Coefficients not estimable: Batch
Warning message: Partial NA coefficients for 18258 probe(s)
What is wrong with this design model?
Please move this latest comment to a response to my first post, the space is getting too small to answer it effectively here.