Question

Bumphunting for longitudinal data

0

Entering edit mode

Ashlynn • 0

@ashlynn-15794

Last seen 6.8 years ago

Hello everyone,

I'm working with some DNA methylation data and would like to use either Jaffe et al.'s bumphunter algorithm as implemented in the minfi package or the DMRcate package to identify differentially methylated regions. My study is a bit more complex as I have repeated measures (two time points) and therefore am running into some trouble specifying a design matrix. I would like to use a linear mixed effect model (lmer) to account for the repeated measures. Does anyone have experience with this? Any guidance would be much appreciated!

I would like to specify my design matrix to look something like this...

design<-lmer(cpg_mval~case_status+age+sex+cell_composition+smoking+meds+(1|id), data=data)

Sincerely,

Ash

bumphunter minfi lmer dmrcate dmr analysis • 2.4k views

ADD COMMENT • link updated 6.8 years ago by Tim Peters ▴ 230 • written 6.8 years ago by Ashlynn • 0

score 1 · Accepted Answer · 2018-05-11

1

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 1 hour ago

United States

Do note that lmer isn't a function to generate a design matrix - it's a function intended to fit a linear mixed model that happens to create a design matrix as part of the model fit.

If you have just two time points, then there isn't really a compelling reason to use a repeated measures design. Instead you can just block on patient, removing all the then redundant covariates (which, depending on the timing between observations may include age, smoking, and meds, and will certainly include sex). It then depends on what the research question might be.

If you are doing something like a case/control with treatment and you want to know if the meds cause differential methylation depending on the case/control status, then you have to do some trickeration in order to get a full rank design matrix. See section 3.5 in the edgeR User's Guide, which shows how to do that sort of thing.

ADD COMMENT • link 6.8 years ago James W. MacDonald 68k

0

Entering edit mode

Dear James,

Thank you very much for the clear explanation. I am wondering if you could help me resolve my confusion with a similar matter. I am doing a longitudinal study with repeated measures at three time points. We are studying methylation changes during the course of malaria infection and after treatment in the host. We have three repeated measures at three time points for each individual (before infection T1, after infection T2, and after treatment T3). My initial plan was to treat individual as a "random effect" in a mixed-model, after coming across your post along with reading section 3.5.2 on "Blocking" at the edgeR User's I decided to block on the "individual" in my model. However, after reading a bit in the literature I got more confused and I am wondering if its actually the right thing to do given that I have three repeated measures.

So my question, Can I still block "individual" in my model given that we have 3 repeated measures?

I have provided the script that I used for you just in case you needed to have a look on what I did:

Age <- pData(New.gset.funnorm.addsnp)$Age Individual <- pData(New.gset.funnorm.addsnp)$Individual InfectionStatus <- pData(New.gset.funnorm.addsnp)$InfectionStatus Sex <- pData(New.gset.funnorm.addsnp)$Sex

biolrep<-Individual

designMatrix.snp1 <- model.matrix(~Infection_Status+Age+Sex)

corfit1 <- duplicateCorrelation(myMs, design=designMatrix.snp1, block = biolrep) dmrCate.snp1<-cpg.annotate("array", myMs, what="Beta", arraytype = "EPIC", analysis.type="differential", design=designMatrix.snp1, coef=2, block=biolrep, correlation=corfit1$consensus.correlation,fdr=0.05) dmrcoutputall.snp1 <- dmrcate(dmrCate.snp1,lambda=500, C=5, pcutoff="fdr",min.cpgs=3,mc.cores=1) wgbs.rangesall.snp1 <- extractRanges(dmrcoutputall.snp1, genome = "hg19") write.csv (wgbs.rangesall.snp1,file="DMRCateModel1DifferentailRange.txt")

Thanks a lot for your time

Dareen

ADD REPLY • link 6.0 years ago dareenalmojil • 0

score 1 · Accepted Answer · 2018-05-11

1

Entering edit mode

Tim Peters ▴ 230

@tim-peters-7579

Last seen 11 weeks ago

Australia

Hi Ash,

Yes, like James said, you'll need to sacrifice some of your covariates to get your design matrix full rank. Once you have a full rank matrix, you can then pass it to cpg.annotate() along with your coefficient of interest (and optionally contrast matrix), and then the regular workflow will do the rest with limma under the hood.

Cheers,

Tim

ADD COMMENT • link 6.8 years ago Tim Peters ▴ 230

0

Entering edit mode

Hi Tim,

Thank you for your guidance. I am trying to identify DMRs for cases compared to controls, using age, sex, cell composition and smoking as covariates in my design matrix. As this was a study design where we collected samples from all cases and controls at two time points I would like to account for the repeated measures for each sample. I have identified ids as biological replicates (each case/control has one unique id used at both time points) and have run corfit. My question is whether I have correctly used block and corfit$consenus correlation in cpg.annotate() or if they should have been included in my model.matrix(). One further question I have is whether DMRcate can be used for continuous independent variables of interest and if so how would one interpret the results/plot the DMRs? I appreciate any further help you can offer! Kind regards, Ash

design<-model.matrix(~case_status + Age + Sex + smoking + ccomp)
biolrep<-Target_Pheno_3$id
corfit <- duplicateCorrelation(myMs.noSNPs, design=design, block = biolrep)
my_annotation<-cpg.annotate("array", myMs.noSNPs, what="M", arraytype = "EPIC", analysis.type="differential", design=design, coef=2, block=biolrep, correlation=corfit$consensus.correlation)

ADD REPLY • link 6.8 years ago Ashlynn • 0

0

Entering edit mode

Hi Ash,

Yes, it looks like that will work. The ellipsis argument (...) at the end of cpg.annotate() is matched to extra arguments for limma::lmFit(), so block and correlation will propagate to the linear fitting.

And yes, DMRcate can definitely be used for continuous responses as well. The only caveats I would add are that a) they are not exactly "DMRs" anymore but regions that significantly correlate with the response variable, b) The "betafc" field in the output should be interpreted as the mean regression slope across the region, instead of the mean difference c) You 'll need to make phen.col length 1 (since response is not categorical) and perhaps order the columns of CpGs to that of your response vector in order to make DMR.plot() output look nice (i.e. see the gradation of methylation values vertically within the "DMR").
Cheers,

Tim

ADD REPLY • link 6.8 years ago Tim Peters ▴ 230

0

Entering edit mode

Hi Tim,

Thanks a bunch for your help with this! I appreciate your guidance.

Best

Ash

ADD REPLY • link 6.8 years ago Ashlynn • 0