Hello, my name is Alex and I am currently a graduate student at Purdue University. I am using edgeR to conduct a differential gene expression analysis for a project in which I'd like to compare three treatment groups [control (c), treatment-low (tl), and treatment-high (th)] across three populations (lm, lc, and ct), and I have a question about forming contrasts. I want to make two general types of comparison:
- compare all treated individuals (i.e., combined th + tl) to all control individuals across populations
- compare treated individuals (again, combined th+tl) to control individuals within each population
Following section 3.3 of the edgeR User Guide, I combined population and treatment into a single factor and created the following design matrix:

    design <- model.matrix(~0+group)
    colnames(design) <- levels(group)
            ct_c ct_th ct_tl lc_c lc_th lc_tl lm_c lm_th lm_tl
    ct_c_1     1     0     0    0     0     0    0     0     0
    ct_c_2     1     0     0    0     0     0    0     0     0
    ct_c_3     1     0     0    0     0     0    0     0     0
    ct_th_1    0     1     0    0     0     0    0     0     0
    ct_th_2    0     1     0    0     0     0    0     0     0
    ct_th_3    0     1     0    0     0     0    0     0     0
    ct_tl_1    0     0     1    0     0     0    0     0     0
    ct_tl_2    0     0     1    0     0     0    0     0     0
    ct_tl_3    0     0     1    0     0     0    0     0     0
    lc_c_1     0     0     0    1     0     0    0     0     0
    lc_c_2     0     0     0    1     0     0    0     0     0
    lc_c_3     0     0     0    1     0     0    0     0     0
    lc_th_1    0     0     0    0     1     0    0     0     0
    lc_th_2    0     0     0    0     1     0    0     0     0
    lc_th_3    0     0     0    0     1     0    0     0     0
    lc_tl_1    0     0     0    0     0     1    0     0     0
    lc_tl_2    0     0     0    0     0     1    0     0     0
    lc_tl_3    0     0     0    0     0     1    0     0     0
    lm_c_1     0     0     0    0     0     0    1     0     0
    lm_c_2     0     0     0    0     0     0    1     0     0
    lm_c_3     0     0     0    0     0     0    1     0     0
    lm_c_4     0     0     0    0     0     0    1     0     0
    lm_c_5     0     0     0    0     0     0    1     0     0
    lm_th_1    0     0     0    0     0     0    0     1     0
    lm_th_2    0     0     0    0     0     0    0     1     0
    lm_tl_1    0     0     0    0     0     0    0     0     1
    lm_tl_2    0     0     0    0     0     0    0     0     1
    lm_tl_3    0     0     0    0     0     0    0     0     1
    lm_tl_4    0     0     0    0     0     0    0     0     1
    lm_tl_5    0     0     0    0     0     0    0     0     1
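For anyone wanting to reproduce this layout, here is a minimal base-R sketch that rebuilds the same no-intercept design. It assumes the sample names follow the `population_treatment_replicate` pattern visible in the matrix rows; the `n_per_group` and `samples` objects are hypothetical stand-ins, not taken from the original script. Note that `model.matrix()` prefixes each column with the factor name (`groupct_c`, ...), so the columns need renaming before `makeContrasts()` can refer to them as `lm_th` and so on.

```r
# Hypothetical replicate counts matching the matrix rows:
# 3 libraries per group in ct and lc; lm has 5 c, 2 th and 5 tl libraries.
n_per_group <- c(ct_c = 3, ct_th = 3, ct_tl = 3,
                 lc_c = 3, lc_th = 3, lc_tl = 3,
                 lm_c = 5, lm_th = 2, lm_tl = 5)

# Build sample names such as "lm_th_2" (illustrative only).
samples <- unlist(lapply(names(n_per_group), function(g) {
  paste(g, seq_len(n_per_group[[g]]), sep = "_")
}))

# Strip the trailing replicate number to get the combined population_treatment factor.
group <- factor(sub("_[0-9]+$", "", samples))

# No-intercept design: one column (one mean) per population/treatment combination.
design <- model.matrix(~0 + group)

# Remove the "group" prefix so contrasts can name the levels directly.
colnames(design) <- levels(group)
rownames(design) <- samples

# Number of libraries in each group.
print(colSums(design))
```

With this parameterization each coefficient is simply the (log-scale) mean of one population/treatment group, which is what makes the contrasts below straightforward to write.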
Using this design matrix, I created the following contrasts for the two comparisons outlined above:
1. allpops_contrast <- c(-1/2,1/4,1/4,-1/2,1/4,1/4,-1/2,1/4,1/4)
2. my.contrasts <- makeContrasts(lm_cvt=(lm_th+lm_tl)/2-lm_c, lc_cvt=(lc_th+lc_tl)/2-lc_c, ct_cvt=(ct_th+ct_tl)/2-ct_c, levels=design)
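The two contrast definitions can be checked against each other numerically, without fitting any counts. The sketch below builds the within-population contrasts by hand in base R (the helper `within_pop` is hypothetical; it produces the same coefficients as the `makeContrasts()` formulas in item 2) and compares them with `allpops_contrast` from item 1.

```r
lv <- c("ct_c", "ct_th", "ct_tl", "lc_c", "lc_th", "lc_tl", "lm_c", "lm_th", "lm_tl")

# Contrast 1 from the post, labelled by the coefficient each weight multiplies.
allpops_contrast <- setNames(c(-1/2, 1/4, 1/4, -1/2, 1/4, 1/4, -1/2, 1/4, 1/4), lv)

# Hypothetical helper: weights for (pop_th + pop_tl)/2 - pop_c.
within_pop <- function(pop) {
  w <- setNames(numeric(length(lv)), lv)
  w[paste0(pop, "_th")] <- 1/2
  w[paste0(pop, "_tl")] <- 1/2
  w[paste0(pop, "_c")]  <- -1
  w
}

# 9 x 3 matrix, one column per population (same numbers as my.contrasts).
my_contrasts <- sapply(c(lm_cvt = "lm", lc_cvt = "lc", ct_cvt = "ct"), within_pop)

# Equal-weight average of the three within-population treatment effects.
avg_contrast <- rowMeans(my_contrasts)

print(round(cbind(allpops_contrast, avg_contrast), 3))

# allpops_contrast is exactly 1.5 times the averaged contrast.
print(all.equal(allpops_contrast, 1.5 * avg_contrast))
```

Two things follow from the arithmetic. First, `allpops_contrast` puts total weight -3/2 on the controls and +3/2 on the treated groups, so its estimate is 1.5 times the average of the three within-population effects; the averaged version uses -1/3 for each control and 1/6 for each treated group. Because the null hypothesis (contrast = 0) is unchanged by a positive rescaling, this mainly affects the reported logFC rather than which genes are tested. Second, each population receives equal weight regardless of how many libraries it has, so this contrast is the average of the population-specific effects rather than a pooled comparison of all treated libraries against all control libraries.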
I then used edgeR's GLM functionality to test these contrasts. My question is whether they compare the groups I intend, given the objectives listed above. The all-population comparison yields a large number of DEGs (> 4,000), whereas each within-population comparison yields far fewer (< 80 in every population). Am I missing something about how contrasts should be formed?
Thanks in advance for any insight!
Alex
Thanks Aaron, I see my mistake with the first set of contrasts now. Your explanation of how increasing library size affects the standard error and the power to detect DE genes was also extremely clear and much appreciated. I was concerned that the all-population comparison had nearly two orders of magnitude more DEGs than the individual comparisons, but I now see how large differences in standard error and variance can arise from differences in 'n' between the all-population and single-population comparisons.
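To make the effect of 'n' concrete, here is a rough sketch that assumes a simple one-way linear model with a common residual variance, where the variance of an estimated contrast is proportional to the sum of squared weights divided by the group sizes. This is an approximation chosen for illustration: edgeR's negative binomial GLM weights observations by their means and dispersions, so the real standard errors will differ gene by gene. The group sizes are read off the design matrix; the `var_factor` helper and the averaged all-population weights (-1/3 per control, 1/6 per treated group) are assumptions of the sketch.

```r
# Libraries per group, taken from the design matrix.
n <- c(ct_c = 3, ct_th = 3, ct_tl = 3,
       lc_c = 3, lc_th = 3, lc_tl = 3,
       lm_c = 5, lm_th = 2, lm_tl = 5)

# Relative variance of a contrast estimate under a one-way linear model:
# sum(w^2 / n_group), up to the shared residual variance sigma^2.
var_factor <- function(w) sum(w^2 / n[names(w)])

contrast_weights <- list(
  all_pops_avg = setNames(rep(c(-1/3, 1/6, 1/6), 3), names(n)),
  ct_cvt = c(ct_c = -1, ct_th = 1/2, ct_tl = 1/2),
  lc_cvt = c(lc_c = -1, lc_th = 1/2, lc_tl = 1/2),
  lm_cvt = c(lm_c = -1, lm_th = 1/2, lm_tl = 1/2)
)

vf <- sapply(contrast_weights, var_factor)

# Standard error of each contrast relative to the all-population contrast.
print(round(sqrt(vf / vf["all_pops_avg"]), 2))
```

Under this approximation the single-population standard errors come out roughly 1.6 to 1.8 times larger than the all-population one (the lm comparison is helped by its five control and five tl libraries, but only has two th libraries). Since all contrasts share the same fit and dispersion estimates, a gap of this size in standard error can translate into a large difference in the number of genes passing an FDR threshold.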
I just wanted to confirm the point you made in your P.S.: if I want to test the null hypothesis within a given population, given the contrasts below:
I could achieve this by calling the corresponding set of contrasts within the my.contrasts object (in this example, testing the null hypothesis in the lm population):
Is my understanding correct, or were you referring to something else? Thanks for your input.
Yes, that's correct.