Hi,
I would like to ask for help with the design component of my DESeq2 RNA-seq analysis. My data set consists of 6 samples: 3 treated samples (gene overexpression) and 3 untreated samples (control_vector). The samples are paired: the biological reps of each group come from the same plant leaf, cut in half, with each half exposed to one of the two treatments.

Another confounding factor in this sequencing experiment seems to be a batch effect. Bio rep #1 comes from a first round of experiments and was sequenced at a depth of ~12.5 million reads/sample (this is 3' seq). Bio reps #2 and #3 are from an experiment conducted on a separate day, with the sequencing also done later, at a depth of ~6 million reads/sample.

When I look at the PCA of my normalized reads, my data is first separated by leaf. That is, the bio rep #1 samples (treated and untreated) pair together, the bio rep #2 samples pair together, and the bio rep #3 samples pair together. The data separates on PC2 according to batch. That is, bio rep #1 is separated from bio reps #2 and #3 along this axis. I'm unsure what causes the separation on PC3, but I don't see the effect of my treatment (gene overexpression) until PC4.

Below are my coldata and the current design I'm using for my DEG analysis. When analyzing all the data together, I get only 4 significantly DE genes, including my OE gene. If I analyze just batch two, I get ~40 DE genes, but many genes, including my overexpressed gene, now have p-adj values of NA. Any advice is appreciated.

Coldata:
SampleName  LeafNo  Treatment  Batch
treated_1   one     OE         one
treated_2   two     OE         two
treated_3   three   OE         two
control_1   one     control    one
control_2   two     control    two
control_3   three   control    two
Current design: design = ~LeafNo + Treatment
Hi Michael,
Thanks for taking the time to look at my question. I was just reading back through it and realized I didn't explicitly ask what I meant to (though maybe you got the gist): my formula currently controls for the LeafNo effect only. Can the design be written so that both 'LeafNo' and 'Batch' are controlled for, reducing the background noise so that a greater effect of the treatment can be seen? Maybe the answer is no!?!
Thank you again for your time.
Leaf number controls for batch if the leaves are nested within batch.
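
To make the nesting point concrete, here is a rough sketch (plain Python rather than R, and not anything DESeq2 runs itself) that builds the 0/1 design matrix for the coldata posted above and checks its rank. The treatment-style coding, with leaf "one", batch "one" and "control" as reference levels, is an assumption chosen to mirror what a formula like ~LeafNo + Treatment would typically produce; the helper names design_matrix and matrix_rank are made up for illustration.

```python
from fractions import Fraction

# Sample sheet copied from the question: (sample, leaf, treatment, batch).
COLDATA = [
    ("treated_1", "one", "OE", "one"),
    ("treated_2", "two", "OE", "two"),
    ("treated_3", "three", "OE", "two"),
    ("control_1", "one", "control", "one"),
    ("control_2", "two", "control", "two"),
    ("control_3", "three", "control", "two"),
]


def design_matrix(rows, with_batch):
    """Hypothetical helper: 0/1 design matrix with reference-level coding.

    Columns: intercept, leaf two, leaf three, [batch two], treatment OE.
    """
    matrix = []
    for _name, leaf, treatment, batch in rows:
        row = [1, int(leaf == "two"), int(leaf == "three")]
        if with_batch:
            row.append(int(batch == "two"))
        row.append(int(treatment == "OE"))
        matrix.append(row)
    return matrix


def matrix_rank(matrix):
    """Hypothetical helper: rank by Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column adds no new information
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


leaf_only = design_matrix(COLDATA, with_batch=False)
leaf_and_batch = design_matrix(COLDATA, with_batch=True)

print("LeafNo + Treatment:", len(leaf_only[0]), "columns, rank", matrix_rank(leaf_only))
print("LeafNo + Batch + Treatment:", len(leaf_and_batch[0]), "columns, rank", matrix_rank(leaf_and_batch))
print("Residual degrees of freedom (LeafNo + Treatment):", len(COLDATA) - matrix_rank(leaf_only))
```

Under these assumptions, adding Batch adds a column but does not raise the rank, because the "batch two" column is exactly the sum of the "leaf two" and "leaf three" columns. In other words, the LeafNo terms already absorb any batch difference, and a separate Batch coefficient could not be estimated.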
Question: how good a job can you do modeling LeafNo with only 2 samples from each leaf? Is it really fruitful to try to fit two factors with only 6 samples in total?