Hello,
I try to get the list of significant DMCs from RRBS data using the DSS package.
For now, the pipeline I’m using takes into account only one factor (with DMLtest function).
I would like to implement the multifactor analysis of RRBS data into this pipeline with the DMLfit.multifactor and DMLtest.multifactor functions.
To compare the results, I performed these two methods on RRBS data of the DSS package using the cell type as the comparison parameter:
- 1. DMLTest : 2 groups compared : group1 with samples from cell=rN and group2 with samples from cell=aN
- 2. DMLfit.multifactor and DMLtest.multifactor : linear model = ~ cell
My problem is the following: I don’t have the same results when I perform a DMLtest and a DMLtest.multifactor (different p-values) and when I apply a threshold of 0.05, I don't get exactly the same DMCs.
Below, an example of my code:
library(DSS)
data(RRBS)
# I. Single factor:
## Samples:
sampleNames(RRBS)
group1 = c("Sample1","Sample2","Sample3","Sample5","Sample7","Sample9","Sample11","Sample16")
group2 = c("Sample4","Sample6","Sample8","Sample10","Sample12","Sample13","Sample14","Sample15")
## Perform the test
dmlTest_sf = DMLtest(RRBS, group1=group1, group2=group2)
head(dmlTest_sf)
##Significant DMCs:
nrow(dmlTest_sf[dmlTest_sf$pval < 0.05,])
# 435
# II. Multifactor:
## General experimental design:
design
## Model matrix:
model.matrix(~cell, data=design)
## Fit the linear model
DMLfit = DMLfit.multiFactor(RRBS, design=design, formula=~cell)
## Perform the test
dmlTest_mf = DMLtest.multiFactor(DMLfit, coef=2)
##Significant DMCs:
nrow(dmlTest_mf[dmlTest_mf$pvals < 0.05,])
# 695
# III. Correlation between Single and Multifactor p-values:
cor(dmlTest_sf$pval, dmlTest_mf$pvals)
# 0.8655487
How to explain this difference between these two methods ?
Thanks in advance for your answers