I am trying to use limma-voom to determine the effect of a condition between males and females. I have paired samples, and about 70 male and 45 females. At first I organized my data according to section 3.5 of the edgeR users manual (https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf). My data therefore looked as such:
sample | condition | sex | nested |
sample1 | pre | M | 1 |
sample1 | post | M | 1 |
sample2 | pre | M | 2 |
sample2 | post | M | 2 |
sample70 | pre | F | 1 |
sample 70 | post | F | 1 |
... | ... | ... | ... |
My code looks as follows:
design = model.matrix(~ sex + sex:nested + sex:condition, data)
colnames(design) <- make.names(colnames(design), unique=TRUE)
v = voomWithQualityWeights(dge, design, plot=FALSE)
fit = lmFit(v, design)
#construct contrast matrix
p <- ncol(fit)
cont <- rep(0, p)
cont[p] <- 1
cont[p-1] <- -1
fit2 = contrasts.fit(fit, cont)
fit2 = eBayes(fit2)
results = decideTests(fit2)
> summary(results)
[,1]
-1 0
0 21457
1 0
Getting 0 sig genes from the contrast seems unlikely because, when I go to see the genes from just the interaction between sex and condition I get the following:
fit_simple = eBayes(fit)
results_simple = decideTests(fit_simple)
summary(results_simple)
#just showing columns of interest
sexMale.conditionpost sexFemale.conditionpost
-1 4500 3859
0 9812 11541
1 6819 5731
Because getting no sig genes seemed suspicious I then tried to incorporate pairing using the duplicateCorrelation function instead of in the design matrix. I also simplified the design, so I did a separate run for male samples, for female samples, and for the interaction between sex and condition and the contrast between the two sexes. My data then looked as such (for just males and females I just had those samples in a table with the patient number starting at 1. Patient is a factor, not numeric):
sample | condition | sex | patient |
sample1 | pre | M | 1 |
sample1 | post | M | 1 |
sample2 | pre | M | 2 |
sample2 | post | M | 2 |
sample70 | pre | F | 70 |
sample 70 | post | F | 70 |
... | ... | ... | ... |
The code for male samples (and female samples, just different count matrix and data table) looks like this:
dge_Male <- DGEList(countdata_filtered_Male)
dge_Male <- calcNormFactors(dge_Male)
design_Male = model.matrix(~ condition, colData_Male)
v_preCor_Male = voomWithQualityWeights(dge_Male, design_Male, plot=FALSE)
first_cor_Male = duplicateCorrelation(v_preCor_Male, design = design_Male, block = colData_Male$nested)
v_postCor_Male = voomWithQualityWeights(dge_Male, design_Male, block = colData_Male$nested, correlation = first_cor_Male$consensus, plot=FALSE)
second_cor_Male = duplicateCorrelation(v_postCor_Male, design = design_Male, block = colData_Male$nested)
fit_Male <- lmFit(v_postCor_Male, design_Male, block = colData_Male$nested, correlation = second_cor_Male$consensus)
fit_Male <- eBayes(fit_Male)
results_Male <- decideTests(fit_Male)
summary(results_Male)
(Intercept) conditionpost
-1 6709 3572
0 575 8409
1 11315 6618
summary(results_Female)
(Intercept) conditionpost
-1 5412 2429
0 929 8720
1 9624 4816
The code for the interaction and contrast is as follows:
dge_pairingCor <- DGEList(countdata_filtered_pairingCor)
dge_pairingCor <- calcNormFactors(dge_pairingCor)
design_pairingCor = model.matrix(~ sex + sex:condition, colData_pairingCor)
colnames(design_pairingCor) <- make.names(colnames(design_pairingCor), unique=TRUE)
v_preCor_pairingCor = voomWithQualityWeights(dge_pairingCor, design_pairingCor, plot=FALSE)
first_cor_pairingCor = duplicateCorrelation(v_preCor_pairingCor, design = design_pairingCor, block = colData_pairingCor$patient)
v_postCor_pairingCor = voomWithQualityWeights(dge_pairingCor, design_pairingCor, block = colData_pairingCor$patient, correlation = first_cor_pairingCor$consensus, plot=FALSE)
second_cor_pairingCor = duplicateCorrelation(v_postCor_pairingCor, design = design_pairingCor, block = colData_pairingCor$patient)
fit_pairingCor <- lmFit(v_postCor_pairingCor, design_pairingCor, block = colData_pairingCor$patient, correlation = second_cor_pairingCor$consensus)
p_pairingCor <- ncol(fit_pairingCor)
cont_pairingCor <- rep(0, p_pairingCor)
cont_pairingCor[p_pairingCor] <- 1
cont_pairingCor[p_pairingCor-1] <- -1
fit_pairingCor2 = contrasts.fit(fit_pairingCor, cont_pairingCor)
fit_pairingCor2 = eBayes(fit_pairingCor2)
results_pairingCor = decideTests(fit_pairingCor2)
#results for contrast
summary(results_pairingCor)
[,1]
-1 45
0 21041
1 45
#results for interaction between sex and condition
fit_pairingCor_simple = eBayes(fit_pairingCor)
results_pairingCor_simple = decideTests(fit_pairingCor_simple)
summary(results_pairingCor_simple)
X.Intercept. sexFemale sexMale.conditionpost sexFemale.conditionpost
-1 7628 260 4694 4022
0 476 20610 8583 10274
1 13027 261 7854 6835
I am confused about several things: 1) Why do I get different results when taking the pairing out of the design matrix and instead use duplicateCorrelation? 2) Which result should I trust? 3) Why are the results for the interaction term different from when I run just male and female samples? Shouldn't the results be the same?
I am clearly confused as to how best run this type of analysis. I apologize if I did not explain my problem clearly enough, and will do my best to clarify promptly. Any help and guidance would be greatly appreciated! Thank you!
Great, thank you very much for the help!
When I run the same analysis in edgeR I get a lot of significantly expressed genes in the contrast. Should I just chalk this up to a difference in the softwares? Or does this indicate that I'm implementing something incorrectly?
Depends on whether you're using
glmLRT
or the QL framework. The former is also a bit liberal, whereas the latter is more accurate and, in fact, should behave much likevoom
(without the sample weighting).I used the glmLRT framework but will try the QL framework. Thank you for the advice!