Dear users,
I am struggling to understand if my design is correct. I found edgeR section 3.3.1 similar to my situation but I am not that confident.
Here is my experimental design:
samples_table
sampleId cellLine treatment time IC
s1 a vehicle 0 S
s2 a drug 48 S
s3 a drug 168 S
s4 b vehicle 0 S
s5 b drug 48 S
s6 b drug 168 S
s7 c vehicle 0 S
s8 c drug 48 S
s9 c drug 168 S
s10 d vehicle 0 S
s11 d drug 48 S
s12 d drug 168 S
s13 e vehicle 0 R
s14 e drug 48 R
s15 e drug 168 R
s16 f vehicle 0 R
s17 f drug 48 R
s18 f drug 168 R
I have 6 cell lines treated with a drug and RNA sequenced after 48 and 168 hours of treatment. Last column indicates if the cell line is susceptible or resistant to another compound.
I would like to find how resistant cell lines differentially respond to the drug at 48 and/or at 168 hours compared to resistant ones.
Here is my approach:
group <- factor(paste(samples_table$IC, samples_table$time, sep="."))
y <- DGEList(counts.keep, group=group)
y <- calcNormFactors(y)
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)
y <- estimateDisp(y, design)
# Quasi-likelihood test
fit <- glmQLFit(y, design)
design
R.0 R.168 R.48 S.0 S.168 S.48
1 0 0 0 1 0 0
2 0 0 0 0 0 1
3 0 0 0 0 1 0
4 0 0 0 1 0 0
5 0 0 0 0 0 1
6 0 0 0 0 1 0
7 0 0 0 1 0 0
8 0 0 0 0 0 1
9 0 0 0 0 1 0
10 1 0 0 0 0 0
11 0 0 1 0 0 0
12 0 1 0 0 0 0
13 1 0 0 0 0 0
14 0 0 1 0 0 0
15 0 1 0 0 0 0
attr(,"assign")
[1] 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$group
[1] "contr.treatment"
my.contrasts <- makeContrasts(
S48 = S.48 - S.0,
S168 = S.168 - S.0,
S168vs48 = S.168 - S.48,
R48 = R.48 - R.0,
R168 = R.168 - R.0,
R168vs48 = R.168 - R.48,
RvsS.0 = R.0 - S.0,
RvsS.48 = (R.48 - R.0) - (S.48 - S.0),
RvsS.168 = (R.168 - R.0) - (S.168 - S.0),
all = (R.48 + R.168 - R.0) - (S.48 + S.168 - S.0),
levels = design
)
# to find genes that differentially respond at 48h between resistant and susceptible cell lines
qlf <- glmQLFTest(fit,
contrast = my.contrasts[ , "RvsS.48"])
topTags(qlf)
What do you think? Shouldn't I account the fact that I have different cell lines as a sort of batch effect?
Thanks a lot
Pietro
PS: cross posted to https://www.biostars.org/p/370900/
Interesting, thank you Aaron.
Unfortunately I get no significant DE genes, probably because of the high variability of the cell lines.
The high variability of the response of each cell line, to be precise.
If you can't treat the cell lines as replicates, an alternative would be to do something like this:
Here, you get one intercept and slope term per cell line. You might then do:
i.e., is there any difference between the average response of the resistant and sensitive cell lines. This will avoid problems with variability across cell lines, but the resulting DE genes are less generalizable (e.g., to new cell lines). There's also an assumption of linearity with respect to time.
Thanks Aaron,
It's amazing how much we learn from guys like you.
Still all FDR == 1, though.
Here is what I meant... poor design?
The strong separation between cells lines is not, in and of itself, poor design, because that just represents the biological truth. The "poorness" of the design stems from the absence of replicates within each cell line, requiring you to treat cell lines as replicates of each other. This is probably iffy, not least because the resistant cell lines are unlikely to be randomly and independently sampled from the "distribution" of all possible resistant cell lines (whatever that means); same for the sensitive cell lines.
If one had replicates within each cell line, there would be more statistical power from more samples. You wouldn't have to make the linearity assumption, which would probably improve the model fit and allow you to test differences at specific time points. Testing the effects of specific cell lines is also more concrete with regards to reproducibility, as anyone attempting to replicate the study would only need to choose the same cell lines (rather than trying to figure out how to sample from the "distribution" of cell lines).