Dear all,
I'm performing a differential gene expression analysis using limma.
My study design is simple in that I only want to compare control and disease within three different groups, however, I want to control for 3 categorical covariates - site, sampling location and gender, and one continuous covariate - age. The contrasts of interest are:
CP_1 to CP_2: DEGs between control CP (CP_1) and diseased CP (CP_2)
C_1 to C_2: DEGs between control C (C_1) and diseased C (C_2)
N_1 to N_2: DEGs between control N (N_1) and diseased N (N_2)
Thus, '_1' represents a control, and '_2' represents a diseased case. I am not interested in genes associated with gender/age or any of the covariates, I just want to identify the disease effect whilst controlling for the covariates.
My targets file is follows:
> targets[1:10,] Sample_ID Group Site Sampling_location Gender Age 1 emiasb1 CP_1 leg S_2 Female 54 2 emiasb2 CP_1 leg S_1 Female 74 3 emiasb3 CP_1 leg S_1 Female 39 4 emiasb4 CP_1 leg S_1 Female 57 5 emiasb5 CP_1 leg S_1 Male 32 6 emiasb6 CP_1 leg S_1 Male 32 7 emiasb7 CP_1 buttocks S_3 Male 63 8 emiasb8 CP_1 leg S_2 Male 46 9 emiasb9 CP_1 leg S_3 Male 72 10 emiasb10 CP_1 leg S_2 Male 58 # sample numbers > table(targets$Group) C_1 C_2 CP_1 CP_2 N_1 N_2 136 57 87 27 87 30 f <- factor(targets$Group) # cell means model with sampling location, age, site, and gender as blocking factors design <- model.matrix(~0 + f + Sampling_location + as.numeric(Age) + Site + Gender, targets) head(design) > head(design) fC_1 fC_2 fCP_1 fCP_2 fN_1 fN_2 Sampling_locationS_2 Sampling_locationS_3 Age Sitebuttocks Siteleg Sitethigh GenderMale 1 0 0 1 0 0 0 1 0 54 0 1 0 0 2 0 0 1 0 0 0 0 0 74 0 1 0 0 3 0 0 1 0 0 0 0 0 39 0 1 0 0 4 0 0 1 0 0 0 0 0 57 0 1 0 0 5 0 0 1 0 0 0 0 0 32 0 1 0 1 6 0 0 1 0 0 0 0 0 32 0 1 0 1 #fit model fit <- lmFit(affydat, design) # define contrast matrix cont.matrix <- makeContrasts(P1="fCP_1-fCP_2",P2 = "fN_1-fN_2", P3 = "fC_1-fC_2", levels=design) > cont.matrix Contrasts Levels P1 P2 P3 fC_1 0 0 1 fC_2 0 0 -1 fCP_1 1 0 0 fCP_2 -1 0 0 fN_1 0 1 0 fN_2 0 -1 0 Sampling_locationS_2 0 0 0 Sampling_locationS_3 0 0 0 Age 0 0 0 Sitebuttocks 0 0 0 Siteleg 0 0 0 Sitethigh 0 0 0 GenderMale 0 0 0 fit2 <- contrasts.fit(fit, cont.matrix) fit3 <- eBayes(fit2, trend = TRUE) # the results for CP_1-CP_2 is in res <- topTable(fit3,coef=1,n = Inf, adjust="fdr", p = 0.05)
My results appear to be consistent with published studies however, I would like to confirm that my design is correct, and that I am appropriately controlling for the confounding variables before taking this any further.
Cheers,
Dave
The design and correction looks fine to me except I don't think eBayes(fit2, trend = TRUE ) is necessary? Why did you add trend=TRUE? Perhaps Gordon or some one similar can advise here also.
The
trend=TRUE
argument will model any mean-variance trend in the data. This is required in cases where you have a mean-variance trend (obviously), otherwise your estimates of the prior d.f. and subsequent EB shrinkage would be confounded by systematic mean-dependent differences from a common prior variance. You can check whether you have a mean-dependent trend withplotSA
- read the limma user's guide for more details.Hello,
I'm trying to do a differential methylation analysis with the Limma package and was wondering if you had any suggestions on how to handle this trend argument with this type of data? By applying plotSA empirically on my data, I don't have any particular trend on my variance but I don't know if this type of analysis is adapted to methylation data? In particular, when I try trend = TRUE or not in my eBayes function I get quite a different number of differentially methylated CpGs without really understanding the subtlety of what this represents from a biological point of view?
Thanks in advance for any suggestion,
Hortense