Hi everybody,
Recently I have been doing differential gene expression analysis. I am new to this, and I struggle a bit with my design.
I have 4 different cell lines: A, B, C and D. In previous experiments I found that cell lines A and B are resistant to radiation, while cell lines C and D are sensitive. I am interested in the differential gene expression between the resistant and sensitive groups. However, I want to control for cell-line-specific differences.
My dataset looks something like this:
# create example data frame: 4 cell lines x 3 replicates
df <- data.frame(
  cline     = factor(rep(c("A", "C", "B", "D"), each = 3)),
  replicate = factor(rep(c("1", "2", "3"), times = 4)),
  group     = factor(rep(rep(c("resistant", "sensitive"), each = 3), 2))
)
# reorder by cell line to increase readability
df <- df[order(df$cline), ]
# show data frame
print(df)
Initially I tried to model this using:
design = ~ cline + group
However, when I use this design in DESeq2, I get a 'the model matrix is not full rank' error. I know this is probably because each cell line belongs to exactly one group (A and B are always resistant, C and D always sensitive), so group is completely determined by cline. However, I am unsure how to redefine the design formula to account for this.
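The rank problem can be checked directly in base R, without DESeq2. The following is only a minimal sketch that rebuilds the cline and group columns from the example above and compares the number of columns of the model matrix with its rank; the object names mm, n_cols and mm_rank are chosen here for illustration.

```r
# Rebuild the cell line and group columns from the example data frame.
df <- data.frame(
  cline = factor(rep(c("A", "C", "B", "D"), each = 3)),
  group = factor(rep(rep(c("resistant", "sensitive"), each = 3), 2))
)

# Cross-tabulation: every cell line appears in exactly one group.
print(table(df$cline, df$group))

# Model matrix for the design that triggers the error.
mm <- model.matrix(~ cline + group, data = df)
print(colnames(mm))

n_cols  <- ncol(mm)     # number of coefficients the design asks for
mm_rank <- qr(mm)$rank  # number that can actually be estimated

# The group column is exactly the sum of the clineC and clineD columns,
# which is why the rank is one less than the number of columns.
redundant <- all(mm[, "groupsensitive"] == mm[, "clineC"] + mm[, "clineD"])
print(c(columns = n_cols, rank = mm_rank))
print(redundant)
```

Here the design has 5 columns but rank 4, so one coefficient cannot be estimated separately from the others.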
Any help would be really welcome!
Kind regards,
Dear Dr. MacDonald,
Thank you for your insightful explanation. I assume there is also no way to account for this in the experimental design (I don't mean the model matrix but the actual experimental design)? In other words, would this always be a problem in these kinds of experiments and these kinds of comparisons?
Kind regards,
No, you can't account for it in the experimental design either. You can get a set of genes that are differentially expressed, but at that point you have a conundrum. You already know the cell lines have different phenotypes (the sensitivity or resistance to radiation), and those differences may be conferred by one or more of the differentially expressed genes (or maybe not; it's always possible that the difference is due to a mutation in one or more genes, and they may well be expressed at the same level). But they are different cell lines, and there may be other expression differences as well that have nothing to do with the radiation sensitivity.
You can always get the list of differentially expressed genes, try to infer which are responsible for the radiation sensitivity differences, and then do knockouts in the resistant lines to check.
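One possible way to get such a list, sketched here under assumptions rather than as a recommendation from this thread, is to drop group from the design, fit one coefficient per cell line, and compare the average of the resistant lines with the average of the sensitive lines. The names mm and contrast_vec are illustrative. The caveat above still applies: any difference found this way could be a cell line difference unrelated to radiation sensitivity.

```r
# Same cell line and group columns as in the question.
df <- data.frame(
  cline = factor(rep(c("A", "C", "B", "D"), each = 3)),
  group = factor(rep(rep(c("resistant", "sensitive"), each = 3), 2))
)

# Without 'group' the design is full rank: one coefficient per cell line.
mm <- model.matrix(~ 0 + cline, data = df)
stopifnot(qr(mm)$rank == ncol(mm))

# Resistant (A, B) vs sensitive (C, D) as an average-of-lines contrast.
# The order must follow colnames(mm): clineA, clineB, clineC, clineD.
contrast_vec <- c(clineA = 0.5, clineB = 0.5, clineC = -0.5, clineD = -0.5)
stopifnot(identical(names(contrast_vec), colnames(mm)))
print(contrast_vec)

# Assumption: given a DESeqDataSet 'dds' (not created here) built with
# design = ~ 0 + cline and fitted with DESeq(), the comparison could be
# requested as results(dds, contrast = unname(contrast_vec)).
```

A simpler alternative would be design = ~ group, which treats the two cell lines in each group as replicates of that group; either way, the comparison is between specific cell lines, not between resistance and sensitivity as such.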