Hi all,
I am looking for a bit of enlightenment regarding edgeR. We are
applying
edgeR using ANOVA-like testing. We have 2 conditions (Cases and
Controls)
sampled at 2 treatments (untreated and treated) with 64 biological
replicates per condition (64 X 2 = 128 samples).
We want to test 3 ways:
1. DE between untreated and treated for cases.
2. DE between untreated and treated for controls.
3. DE between untreated and treated responding differently between
Cases and Controls.
I am curious if my design matrix is set up properly and if we are
testing
with the right coefficients and contrasts as seen below to address
these
three approaches.
Any insight is helpful.
Yours,
Michael
rawdata <- read.delim("Matrix.txt", check.names=FALSE,
stringsAsFactors=FALSE)
targets <- read.delim("Described.txt", check.names=FALSE,
stringsAsFactors=FALSE)
head(targets)
Condition Patient Treatments
1 Case 4060 untreated
2 Case 4060 treated
3 Case 4096 untreated
4 Case 4096 treated
5 Case 4099 untreated
6 Case 4099 treated
y <- DGEList(counts=rawdata[,2:129], genes=rawdata[,1:1])
keep <- rowSums(cpm(y)>1) >= 64
y <- y[keep,]
dim (y)
y <- calcNormFactors(y)
y$samples
design <- model.matrix(~Condition+Condition:Patient+Condition:
Treatments,
data=targets)
colnames(design)
[1] "(Intercept)" "ConditionControl"
[3] "ConditionCase:Patient" "ConditionControl:Patient"
[5] "ConditionCase:Treatments " "ConditionControl:Treatments "
y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)
colnames(fit)
# DE between untreated and treated for cases
lrt <- glmLRT(fit, coef="ConditionCase: Treatments ")
# DE between untreated and treated for cases
lrt2 <- glmLRT(fit, coef="ConditionControl:Treatments ")
# DE between untreated and treated responding differently between
cases and
controls
lrt3 <- glmLRT(fit, contrast=c(0,0,0,0,1,-1)
[[alternative HTML version deleted]]
Gordon,
Any insight on the below?
I am looking for a bit of enlightenment regarding edgeR. We are
applying
edgeR using ANOVA-like testing. We have 2 conditions (Cases and
Controls)
sampled at 2 treatments (untreated and treated) with 64 biological
replicates per condition (64 X 2 = 128 samples).
We want to test 3 ways:
1. DE between untreated and treated for cases.
2. DE between untreated and treated for controls.
3. DE between untreated and treated responding differently between
Cases and Controls.
I am curious if my design matrix is set up properly and if we are
testing
with the right coefficients and contrasts as seen below to address
these
three approaches.
Any insight is helpful.
Yours,
Michael
rawdata <- read.delim("Matrix.txt", check.names=FALSE,
stringsAsFactors=FALSE)
targets <- read.delim("Described.txt", check.names=FALSE,
stringsAsFactors=FALSE)
head(targets)
Condition Patient Treatments
1 Case 4060 untreated
2 Case 4060 treated
3 Case 4096 untreated
4 Case 4096 treated
5 Case 4099 untreated
6 Case 4099 treated
y <- DGEList(counts=rawdata[,2:129], genes=rawdata[,1:1])
keep <- rowSums(cpm(y)>1) >= 64
y <- y[keep,]
dim (y)
y <- calcNormFactors(y)
y$samples
design <- model.matrix(~Condition+Condition:Patient+Condition:
Treatments,
data=targets)
colnames(design)
[1] "(Intercept)" "ConditionControl"
[3] "ConditionCase:Patient" "ConditionControl:Patient"
[5] "ConditionCase:Treatments " "ConditionControl:Treatments "
y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)
colnames(fit)
# DE between untreated and treated for cases
lrt <- glmLRT(fit, coef="ConditionCase: Treatments ")
# DE between untreated and treated for cases
lrt2 <- glmLRT(fit, coef="ConditionControl:Treatments ")
# DE between untreated and treated responding differently between
cases and
controls
lrt3 <- glmLRT(fit, contrast=c(0,0,0,0,1,-1)
[[alternative HTML version deleted]]
The analysis not correct at the moment because you have not declared Patient to be a factor in R. Because the patient identifiers are numbers ("4060" etc), R by default thinks that Patient is to be treated as a numerical covariate. You need to use the factor() function to tell R that the entries for Patient are just identifiers, not numerical
quantities.
Otherwise the analysis might be fine, but I can't tell for sure without seeing all 128 rows of the targets frame. I think that you may be following Section 3.5 of the edgeR User's Guide. Look carefully at how Patient is defined in that example.
There needs to be a column in the design matrix for each Patient, except for the first in each condition group.