Hello!
I am using the DSS package to identify differentially methylated loci (DMLs) and regions (DMRs) between treatment and control samples of a time-series data set. However, I am getting varying results based on how I design my BSobj dataset. If anyone has any insight as to why I am getting varying results, I would greatly appreciate it!
The dataset I am using is a time-series dataset composed of 9-time points (6, 12, 24, 48, 72, 96, 120, 144, 168); with one treatment and control per timepoint. There is only one replicate per sample. The contrasts that I am running are between the treatment and the control of each time point independently, and I am trying to identify differentially methylated loci (DMLs) and regions (DMRs)
The first way I designed my BSobj was:
library(DSS)
BSobj.allhr = makeBSseqData( list(C_6hr_cpg[,c(1:4)], T_6hr_cpg[,c(1:4)],
C_12hr_cpg[,c(1:4)], T_12hr_cpg[,c(1:4)],
C_24hr_cpg[,c(1:4)], T_24hr_cpg[,c(1:4)],
C_48hr_cpg[,c(1:4)], T_48hr_cpg[,c(1:4)],
C_72hr_cpg[,c(1:4)], T_72hr_cpg[,c(1:4)],
C_96hr_cpg[,c(1:4)], T_96hr_cpg[,c(1:4)],
C_120hr_cpg[,c(1:4)], T_120hr_cpg[,c(1:4)],
C_144hr_cpg[,c(1:4)], T_144hr_cpg[,c(1:4)],
C_168hr_cpg[,c(1:4)], T_168hr_cpg[,c(1:4)]),
c("C6", "H6","C12", "H12","C24","H24","C48", "H48",
"C72", "H72","C96", "H96","C120", "H120","C144", "H144","C168", "H168") )
where I incorporated all of my timepoints into a single object called BSobj.allhr. From here I am comparing each timepoint treatment to it's control. As an example, for my 6hr samples, I would run:
dmltest.6hr.sm500 = DMLtest(BSobj.allhr,
group1 = c("H6"),group2 = c("C6"),
smoothing = TRUE, smoothing.span = 500)
dmltest.6hr.sm500.dml = callDML(dmltest.6hr.sm500, delta = 0.1, p.threshold=0.01)
dmltest.6hr.sm500.dmrs = callDMR(dmltest.6hr.sm500, p.threshold=0.05, delta = 0.1, dis.merge = 250, minCG = 3, pct.sig = 0.5, minlen = 50)
The resulting DML and DMR file produced approx. ~800 DML's and 185 DMRs.
However, if I create a reduced BSobj for only the timepoint comparison I am running (e.g. 6hr) and run the DML and DMR analysis as such:
BSobj.6hr = makeBSseqData( list(C_6hr_cpg[,c(1:4)], T_6hr_cpg[,c(1:4)]),
c("C6", "H6") )
dmltest.6hr.sm500.dml.reduced = callDML(dmltest.6hr.sm500.reduced, delta = 0.1, p.threshold=0.01)
dmltest.6hr.sm500.dmrs.reduced = callDMR(dmltest.6hr.sm500.reduced, p.threshold=0.05, delta = 0.1, dis.merge = 250, minCG = 3, pct.sig = 0.5, minlen = 50)
I identify ~1360 DML's and ~380 DMR's, a significant increase from the previous analysis.
Several questions arise: By incorporating all of the timepoints, is that affecting the smoothing approach somehow? Of the two approaches, would you recommend the second? Any advice would be greatly appreciated.
Thank you.