Hello,
I am new to the use of DESeq2 and cannot seem to get my head around how to properly set up a baseline adjusted model for an experimental design.
We have conducted an experiment in which 10 subjects were randomized to either treatment or control conditions. Given the small sample size, randomization was incomplete and minor differences remain in read counts between conditions at baseline. I would therefore like to fit an ANCOVA “type” model to adjust for baseline read counts when testing for the effect of treatment assignment at the end of the experiment. Something like: y = b0 + b1(baseline count) + b2(treatment); where y= read count for gene 1 post-treatment, baseline count = read count for gene 1 pre-treatment, and treatment is an indicator variable for treatment assignment. I would like to do this for ~200 genes. Basically, how would I set up this model to get the baseline adjusted treatment effect for each of the 200 genes?
I assume this has been described elsewhere, but I have been unable to find a detailed example (only examples for paired designs where each subject experiences both conditions). I have provided a bit of code below to give an idea of what I would like to.
Thanks in advance for any thoughts or assistance in this matter.
library(DESeq2)
dds.pre <- makeExampleDESeqDataSet(betaSD=1, n=10, m=10)
dds.post <- makeExampleDESeqDataSet(betaSD=2, n=10, m=10)
colData(dds.pre)
colData(dds.post)
rowData(dds.pre)
rowData(dds.post)
#log2fold change for differences in treatment group at end post treatment
dds <- DESeq(dds.post)
res <- results(dds)
res
#What I would like to do (something along the lines of...but unsure how to format up the data to properly join the two matrices and get post treatment gene 1 adjusted for baseline gene 1)
dds.post.adj <- formula (~ baseline_count + condition)
#where baseline_count allows each post treatment gene expression count to be adjusted for its baseline expression count
dds.2 <- DESeq(dds.post.adj)
res.2 <- results(dds.2)
res.2
Can I rephrase your experiment to see if I got this right?
You essentially have 10 samples, each of which comes from a different person/patient/animal. About half of these samples were "treated" and the others were not.
Now you just want to ask what the relative difference of expression is between the treated and untreated group for each gene?
So, you have two RNA-Seq libraries from each subject, one from before and one from after treatment?
Just to echo these other two comments, the best way to convey would just be to give an example of the column data:
subject condition
1 control
1 treat
2 control
2 treat
...