Hello, my name is Alex Martinez and I am currently a graduate student at Purdue University. I am using edgeR to conduct differential gene expression for a project wherein I'd like to compare 2 treatment groups (control versus treatment) across 3 population groups (lm, lc, and ct). After reading through the edgeR user guide, I decided to opt for the Blocking method (section 3.4.2) and initially tested for DGE with all three populations combined in the model using the design matrix:
design <- model.matrix(~population+treatment) head(design) (Intercept) lc lm trtmntTrt ctc_1_RSEM 1 0 0 0 ctc_2_RSEM 1 0 0 0 ctc_3_RSEM 1 0 0 0 ctt_1_RSEM 1 0 0 1 ctt_2_RSEM 1 0 0 1 ctt_3_RSEM 1 0 0 1
and then performing the likelihood ratio test coefficients associated with the treatment variable.
lrt <- glmLRT(fit, coef=4)
The list of DEGs was ~50 genes in length. Next, I decided to test for DEGs between control and treatment individuals for each population separately to see if the individual lists roughly corroborated the list of DEGs in the Blocking model. The results were surprising: one population had over 300 DEGs while the other two had fewer than 40 each. Additionally, there were only a small number of genes across the individual population lists (~10) that were present in the Blocking model DEG list.
My questions are:
(a) Is there a good statistical reason to opt for the Blocking method over comparing the three populations separately?
(b) Why might the overlap between individual population DEG lists and the combined Blocking DEG list be so small?
Any thoughts or insight regarding these questions would be greatly appreciated! Thanks in advance.
Alex
Thanks for the quick response Aaron. I was definitely concerned that I might be missing out on DEGs by comparing each population in a separate model and this solution puts those worries at ease. When making the contrast to control all populations, I was planning on doing something like the following ("_c" refers to control individuals and "_t" refers to treatment individuals):
I was just hoping to clarify that my understanding of setting up the contrasts to test differences between treatment groups across populations is correct. Thank you again for the insight.
Alex
You need to divide
allpops_contrast
by 3 to get the average treatment log-fold change across populations.Great, that makes sense. I appreciate your help in clarifying this issue!
Best,
Alex.