edgeR design matrix question
2
0
Entering edit mode
Kemal Akat • 0
@kemal-akat-7178
Last seen 4.2 years ago
USA/New York/The Rockefeller University

Dear Bioconductor Community,

I have an RNAseq experiment and was wondering if I could get some input on my design matrix (here used for an analysis in edgeR).

Tissue samples from 7 healthy individuals were evaluated after treatment with a drug (XTC) or placebo at two time points, 1 and 2 h after treatment, for all participants, and after 6 h for 3 of the participants. For every time point, except the 6 h time point, we have matching samples from each patient.

The target frame looks like this:

               Patient     Treatment   TimeHours
P1XTC2         P1           XTC             2
P1XTC1         P1           XTC             1
P1Placebo2     P1           Placebo         2
P1Placebo1     P1           Placebo        1
P2XTC2         P2           XTC              2
P2XTC1         P2           XTC              1
P2XTC6         P2           XTC              6
P2Placebo2     P2           Placebo         2
P2Placebo1     P2           Placebo         1
P3XTC2         P3           XTC               2
P3XTC1         P3           XTC               1
P3XTC6         P3           XTC               6
P3Placebo2     P3           Placebo          2
P3Placebo1     P3           Placebo          1
P4XTC2         P4           XTC               2
P4XTC1         P4           XTC               1
P4XTC6         P4           XTC               6
P4Placebo2     P4           Placebo         2
P4Placebo1     P4           Placebo         1
P5XTC2         P5           XTC              2
P5XTC1         P5           XTC              1
P5XTC6         P5           XTC              6
P5Placebo2     P5           Placebo         2
P5Placebo1     P5           Placebo         1
P6XTC2         P6           XTC              2
P6XTC1         P6           XTC              1
P6Placebo2     P6           Placebo        2
P6Placebo1     P6           Placebo        1
P7XTC2         P7           XTC              2
P7XTC1         P7           XTC              1
P7Placebo2     P7           Placebo         2
P7Placebo1     P7           Placebo         1


The count matrix like this:

Gene    P1XTC2    P1XTC1    P1Placebo2    P1Placebo1    P2XTC2    P2XTC1    P2XTC6

Gene1    130041    294709    693018    392083    271540    149505    450974
Gene2    125    473    548    251    926    327    171
Gene3    0    22    25    16    34    6    12
Gene4    56870    195167    194034    230998    140803    83530    344095
Gene5    256    540    1153    999    808    291    485
Gene6    18813    51631    101162    74585    27923    19618    68625
Gene7    19    27    69    41    88    10    27
Gene8    7817    18459    33485    20042    22944    9367    22605
Gene9    776    1449    2652    1985    1303    911    1448
Gene10    5089    9470    24026    13697    11781    4119    10764


Now, fitting and testing the model:
 

> library(edgeR)
> Group = factor(paste(targets$Treatment, targets$TimeHours, sep = "."))
> Group = relevel(Group, ref = "Placebo.1")
> cbind(targets, Group)
> d = DGEList(counts, group = Group)
> d = calcNormFactors(d)
> cpm_d = cpm(d, normalized.lib.sizes = TRUE)
> d = d[rowSums(cpm_d > 1) >= 2, ]
> d = estimateDisp(d)
> patient = factor(targets$Patient)
> design = model.matrix(~ 0 + patient + Group)
> colnames(design) = gsub("patient|Group", "", colnames(design))
> fit = glmFit(d, design)


If defined like this, the comparisons are based on the 1 h placebo (Placebo.1),
which as expected does not appear as a coefficient in the design matrix.
 

> colnames(fit)

[1] "P1"        "P2"        "P3"        "P4"        "P5"        "P6"        "P7"        "Placebo.2"
[9] "XTC.1"     "XTC.2"     "XTC.6”


Now testing for the different comparisons of interest.

## 1 hour treatment vs. 1 hour placebo

> lrt = glmLRT(fit, coef = 9)

## 2 hour treatment vs. 2 hour placebo

> lrt = glmLRT(fit, contrast = c(-1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0))

## 6 hour treatment vs. 1 hour placebo, no 6 hour placebo available.

> lrt = glmLRT(fit)) # last coefficient, no need to specify.


I have tested a variety of design matrices (with and without intercept, not considering the pairing etc.) and the results are quite robust across all these tests. But I would be thankful for some insightful comments from the community - of course pointing at obvious mistakes/misconceptions, or giving examples of some potentially informative alternative approaches.

Cheers,
Kemal

--
Dr. Kemal Akat
HHMI/The Rockefeller University
Laboratory of RNA Molecular Biology
1230 York Avenue, Box #186
New York, NY 10065

R> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] locfit_1.5-9.1 edgeR_3.8.2    limma_3.22.1   colorout_1.0-3 setwidth_1.0-3

loaded via a namespace (and not attached):
[1] grid_3.1.2      lattice_0.20-29 tools_3.1.2    

edgeR limma rnaseq design and contrast matrix • 1.8k views
ADD COMMENT
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 8 hours ago
The city by the bay

Your call to estimateDisp should be replaced with:

d = estimateDisp(d, design)

This means that the dispersion estimation will account for the patient blocking factors, rather than assuming that corresponding samples from different patients are direct replicates in a one-way layout.

In addition, for your comparison between 2 hour placebo and treatment, it seems that the correct call should be:

lrt = glmLRT(fit, contrast = c(0, 0, 0, 0, 0, 0, 0, -1, 0, 1, 0))

This should give the desired contrast of XTC.2 - Placebo.2 = 0, according to your design matrix.

ADD COMMENT
0
Entering edit mode
Kemal Akat • 0
@kemal-akat-7178
Last seen 4.2 years ago
USA/New York/The Rockefeller University

Thank you Aaron! Much appreciated.

 

ADD COMMENT

Login before adding your answer.

Traffic: 592 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6