Question

Analyzing paired pre/post data using limma and adjusting for continuous covariates

1

Entering edit mode

Smeeta Shrestha ▴ 20

@smeeta-shrestha-6393

Last seen 9.2 years ago

Hi Ryan and the limma team

I am working on 450K methylation data. We have two groups A and B. Each group has two time points (pre and post intervention). At present I would just like to analyse within group A.

My data looks like this:

Subject ID	Group	Time point (pre/post)	lymphocyte %	monocyte %	granulocyte %

1	A	Pre	lymp_x1	mo_x1	gr_x1
1	A	Post	lymp_x2	mo_x2	gr_x2
2	A	Pre	lymp_x3	mo_x3	gr_x3
2	A	Post	lymp_x4	mo_x4	gr_x4
3	A	Pre	lymp_x5	mo_x5	gr_x5
3	A	Post	lymp_x6	mo_x6	gr_x6
4	B	Pre	lymp_x7	mo_x7	gr_x7
4	B	Post	lymp_x8	mo_x8	gr_x8
5	B	Pre	lymp_x9	mo_x9	gr_x9
5	B	Post	lymp_x10	mo_x10	gr_x10
6	B	Pre	lymp_x11	mo_x11	gr_x11
6	B	Post	lymp_x12	mo_x12	gr_x12

All the “a-f” and “x1-x12” are variables.

My data set is a paired. As most recommended I use the sibship design matrix for the study. I am following page 42 section 9.4.1 in the limma manual.

However I need to adjust for the blood count information which are my continuous covariates (ly , mo and gr).

I have read a few links in the Bioconductor forum (link: Removing continuous covariate effects in limma analysis).

I would like to adjust my data for these covariates and have attempted my code as follows:

First script is normal sibship without adjustment for cell count (fig 2 part 1)

Second script is adjustment for cell count ( fig2 part 2)

Script 1

#########################################################################################
## 1. Normal SibShip WITHOUT covariate adjustment
#########################################################################################
library("lumi")
library("limma")
Beta <- read.table("Beta.csv",head=TRUE,sep=",",check.names=FALSE)
Beta <- Beta [-1]## remove column 1 with probe ID
Mval <- beta2m(Beta) ## convert beta to M values
Target <- read.table("Target.2.csv",head=TRUE,sep=",",check.names=FALSE)
SibShip <- factor(Target$SibShip)
Treat <- factor(Target$Treatment, levels=c("C","T"))
design <- model.matrix(~SibShip+Treatment)
## with M values
fit.M <- lmFit(Mval, design)
fit.M <- eBayes(fit.M)
topTable(fit.M, coef="TreatT")

Script 2

#########################################################################################
## 2. Adjustment for blood count -ly mo and gr
#########################################################################################
library("lumi")
library("limma")
Beta <- read.table("Beta.csv",head=TRUE,sep=",",check.names=FALSE)
Beta <- Beta [-1]## remove column 1 with probe ID
Mval <- beta2m(Beta) ## convert beta to M values
Target <- read.table("Target.2.csv",head=TRUE,sep=",",check.names=FALSE)
SibShip <- factor(Target$SibShip)
Treat <- factor(Target$Treatment, levels=c("C","T"))
design <- model.matrix(~SibShip+Treat)
ly <- factor (Target$ly)
mo <- factor (Target$mo)
gr <- factor (Target$gr)
design.1 <- model.matrix(~SibShip+Treat+ly+mo,Target)
fit.cov <- lmFit(Mval,design.1)
fit.cov <- eBayes(fit.cov)
topTable(fit.cov, coef="TreatT")
topTable(fit.cov, adjust="BH")

My input files -- My target file looks like this:

fig 2

FileName	SibShip	Treatment	Sample_Group	cov1	cov2	cov3
File 1	26	C	Pre	x1	y1	z1
File 2	26	T	Post	x2	y2	z2
File 3	28	C	Pre	x3	y3	z3
File 4	28	T	Post	x4	y4	z4
File 5	32	C	Pre	x5	y5	z5
File 6	32	T	Post	x6	y6	z6
File 7	34	C	Pre	x7	y7	z7
File 8	34	T	Post	x8	y8	z8
File 9	36	C	Pre	x9	y9	z9
File 10	36	T	Post	x10	y10	z10

And my beta file looks like this:

fig 3

	File 1	File 2	File 3	File 4	File 5	File 6	File 7	File 8
cg13869341	0.81435	0.86286	0.90742	0.880754	0.877508	0.841507	0.714104	0.877911
cg15560884	0.772191	0.787651	0.761191	0.799986	0.747465	0.752523	0.776277	0.801894
cg01014490	0.045361	0.077699	0.058017	0.033827	0.038075	0.055925	0.031437	0.057016
cg17505339	0.913544	0.929492	0.941861	0.917624	0.901202	0.909436	0.938309	0.918753
cg16736630	0.545144	0.517417	0.462861	0.438669	0.44927	0.555143	0.648387	0.531261
cg03128332	0.254491	0.229242	0.32559	0.211406	0.303473	0.282476	0.159184	0.236487
cg18147296	0.892314	0.891493	0.882442	0.8969	0.884726	0.860286	0.876255	0.924667
cg05597748	0.775692	0.786959	0.815256	0.814986	0.809486	0.816575	0.805664	0.845997

I need to compare pre vs post so do I need to make a constrast matrix or the 1st script ok.

cont.matrix <- makeContrasts(prevspost=post-pre,levels=design)

Some warnings were generated in the 2nd script ( figure 1 below)
Is my 2nd script correct or do I need to use the removeBatchEffect()function.
My objective is to adjust for blood count and proceed to do a differential analysis between pre and post for group A.
Then do the same for group B.
Finally use the limma manual page 48 section 9.7 Multi- level experiments.

Figure 5 .Warning on running script 2

http://tinypic.com?ref=2qst1c8" target="_blank">http://i64.tinypic.com/2qst1c8.jpg" border="0" alt="Image and video hosting by TinyPic">

Thank you for your attention and hope to hear from you.

Regards

smeeta

limma pre/post continous covariate • 2.9k views

ADD COMMENT • link 9.3 years ago Smeeta Shrestha ▴ 20

score 1 · Answer 1 · 2016-01-13

For your first question, you haven't put Sample_Group in your design matrix, so the cont.matrix you've constructed doesn't make any sense. In fact, you can't put Sample_Group in your design matrix because it's identical to Treatment, so one would be redundant with the other. Rather, you should be comparing the levels C and T in Treatment, by dropping TreatT as you've done in your first script.

For your second question, the code in your supplied image shows design.1 is constructed as:

design.1 <- model.matrix(~SibShip+Treat+ly+mo+gr,Target)

... but the gr is missing in the code you've posted above. In the image, the warning is because the sum of the cell proportions is equal to 1 for each sample. This results in linear dependencies because the addition of all the SibShip blocking factors will also be equal to 1. There will be an infinite number of solutions by swapping values between coefficients corresponding to SibShip and the cell proportions. Hence, one of the proportion terms is dropped to ensure that everything is estimable.

Your remaining points aren't really questions, so I'll just give some general comments. The first is that you better make sure that the proportions are not correlated with the treatment, i.e., you don't get consistently more or less of one cell type after treatment across all samples. If this occurs, the terms corresponding to the cell proportions may absorb the effect of interest. Conversely, if the cell proportions are the same before/after treatment, then they aren't really necessary as they will be redundant with SibShip.

In addition, your model assumes that you have a linear response between the cell proportion and the observations. This may not be entirely accurate when your observations are M-values that are logit-transformed. For example, if your lymphocytes were heavily methylated (and all other cell types were non-methylated), and you had twice as many of them in one sample compared to another (and all other things being equal), you might expect a doubling of the methylation level, rather than a doubling of the M-values. Normally, this wouldn't be a problem because I'd suggest you use splines that can accommodate arbitrary trends with respect to the covariates; however, this isn't practical here because you've got too many covariates and not enough samples. I don't really see an alternative model that can overcome this, so assuming a linear response is probably better than nothing at all if the cell proportion really has an effect.

score 0 · Answer 2 · 2016-01-13

Hi Aaron,

Thank you so much for the reply.

I would like to discuss the points which you have brought out.

Well for the first point brought up. I agree that “Treatment” is same as Sample_Group. So my question there was do I just proceed as per the script 1 or do I need to add a contrast. My objective is just to find differentially methylated CpGs in pre (c) compared to post (T) for Group A. Do I need to make a constrast matrix ?

cont.matrix <- makeContrasts(CvsT=T-C,levels=design) or will it be redundant to the script 1.

Yes there was an error from my end. My syntax needed to include the “gr” as well.

The general comments

The first is that you better make sure that the proportions are not correlated with the treatment, i.e., you don't get consistently more or less of one cell type after treatment across all samples. I have checked the data, Some individuals do show change of proportion with treatment, albeit not significant. I think we should go for point 5- same cell count before/after treatment.
If this occurs, the terms corresponding to the cell proportions may absorb the effect of interest. Yes, agreed but I need to negate/adjust this change so that I get the change in methylation only from the treatment effect and not contributed by the cell count effect.
Conversely, if the cell proportions are the same before/after treatment, then they aren't really necessary as they will be redundant with SibShip.. So do I understand you correctly that if I use the sibship design, I am already adjusting for the other factors (cell count)
In addition, your model assumes that you have a linear response between the cell proportion and the observations. This may not be entirely accurate when your observations are M-values that are logit-transformed. For example, if your lymphocytes were heavily methylated (and all other cell types were non-methylated), and you had twice as many of them in one sample compared to another (and all other things being equal), you might expect a doubling of the methylation level, rather than a doubling of the M-values. Yes I do agree with the explanation, so would it be better if I used the beta values directly and not the M values ?

Finally

Is there anything wrong with the syntax for script 1 and 2 one for without cell count adjustment (script 1) and one for the cell count adjustment (script2).
I am new to R and it is taking me time to understand the logic beside the different syntax used in limma model design. With the available information online and the limma manual I attempted to address a simple question ie how can I obtain the list of differentially methylated regions between Pre and post in a single group A with and without adjusting for continues cell count covariates. I hope the method is correct.
Or should I do a linear regression to check if there is any significant linear relationship between each covariate (ly,mo and gr) separately with my pre/post methylation data set and IF and ONLY IF I get a significant association of the cell count with the methylation data then I do the adjustment.
Also if I understand your input at point #5. The sibship design takes all these factors into consideration when generating the model.

Last doubt:

Is sibship model applicable for both of these expt .

Expt Case 1. Same individual but different tissue- tumor and Normal.

Expt Case 2. Same individual data at time point 1 and data at time point 2 (1 yr interval).

Mine is the second case, so I was wondering is the sibship from 9.4.1 Paired Samples pg 42 the correct model design or should I be using the section 9.6 Time Course Experiments pg 46

Thank you so much for your attention and hope to hear soon.

Smeeta

score 0 · Answer 3 · 2016-01-15

0

Entering edit mode

Smeeta Shrestha ▴ 20

@smeeta-shrestha-6393

Last seen 9.2 years ago

Hi Aaron,

Thank you for your response.

I will look into this.

regards

smeeta

ADD COMMENT • link 9.3 years ago Smeeta Shrestha ▴ 20