Enter the body of text here Hi, I am analysing a RNA Seq dataset coming from 3 independent cell isolates (isolate1, isolate2, isolate3), each given 3 different treatments (control, drug1, drug2).
We are testing drug 1 against control in the first instance:
We also observed that there is some variation among isolates - i.e., batch effect
We load the data into DESeq2, the coldata looks like this:
drug isolate
Sample 1 control isolate1
Sample 2 control isolate2
Sample 3 control isolate3
Sample 4 drug1 isolate1
Sample 5 drug1 isolate2
Sample 6 drug1 isolate3
Code should be placed in three backticks as shown below
configure = data.frame(drug=factor(c("control","control","control","drug1","drug1","drug1")), isolate=c("isolate1","isolate2","isolate3","isolate1","isolate2","isolate3"))
dds = DESeqDataSetFromMatrix(countData = readcount[-1],colData = configure,design = ~ drug + isolate)
Then I performed DEG calling later
dds <- DESeq(dds)
res <- results(dds)
However it showed this message:
log2 fold change (MLE): isolate isolate1 vs isolate2
Wald test p-value: isolate isolate1 vs isolate2
DataFrame with 15793 rows and 6 columns
I suppose as the formula design ~ drug + isolate suggests, it is going to test the supposed major effet which is the drug, rather than the isolate, and in the same time account for the batch effect brought in by isolates. But it has tested the effect of isolate1 vs isolate 2 instead, even though there are only 2 samples per isolate.
Could you please help how it would happen and how should I correct it? Or should I rearrange the design to ~ isolate + drug.
I am also very curious, given the 3X3 nested design, do we have an option to include all 9 samples under one dds and specify with R which ones we would like to involve in the test each time? Thank you very much!
Thank you very much for your suggestions!