There are maybe two questions going on here.
Normalizing by "replicate"
Unless the samples that share a replicate label actually have something in common (e.g. they were processed together), I'm not sure that including a replicate term in your model, as you are doing, and trying to normalize by it is possible or reasonable. Allow me to outline a few scenarios in which the kind of replicate term you seem to have in mind would make sense.
You mention that you have 5 replicates for each of your 6 serial dilutions including control, i.e. 1:1, 1:2, 1:3, 1:4, 1:5, and control.
You may have done one complete round of serial dilutions + controls, with RNA extraction, on each of five days: if so, adding a "replicate" term with values d1, d2, ..., d5 can make sense, and your experimental colData() might look something like this:
dilution replicate
1:1 d1
1:2 d1
1:3 d1
1:4 d1
ctrl d1
1:1 d2
1:2 d2
...
1:5 d5
ctrl d5
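A table like this can be built rather than typed by hand. The following is only a minimal sketch in base R, assuming one full round of all six dilution levels per day; the object names (dilutions, days, coldata) are made up for illustration.

```r
# Hypothetical layout: 6 dilution levels x 5 days (one full round per day).
dilutions <- c("1:1", "1:2", "1:3", "1:4", "1:5", "ctrl")
days <- paste0("d", 1:5)

# expand.grid() varies its first argument fastest, so each day's round
# lists all six dilution levels before moving on to the next day.
coldata <- expand.grid(dilution = dilutions, replicate = days,
                       stringsAsFactors = FALSE)

# Make the control the reference level so comparisons are made against it.
coldata$dilution <- relevel(factor(coldata$dilution), ref = "ctrl")
coldata$replicate <- factor(coldata$replicate, levels = days)

head(coldata, 7)
table(coldata$dilution, coldata$replicate)  # every cell should be 1
```

If every cell of that cross-tabulation is 1, each dilution level appears exactly once per day, which is what lets a replicate term be separated from the dilution effect.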
Or perhaps you did full rounds of 1:1, ..., 1:5 at different times during the same day? Say, one before your first coffee, another before your second coffee, and a third before your third coffee; then you had breakfast, figured "I don't have time for all this", bypassed the whole "brewing the coffee" step, and managed to fit a few more rounds of dilution and RNA extraction between the spoonfuls of ground coffee you were shoveling.
dilution replicate
1:1 coffee1
1:2 coffee1
1:3 coffee1
1:4 coffee1
ctrl coffee1
1:1 coffee2
1:2 coffee2
...
1:5 spoon2
ctrl spoon2
Another scenario might have been that you prepped three rounds of dilution + control samples, then you said "jeez, I could really use a coffee", so you went ahead and drank some, after which you prepared the final two dilution rounds and two control samples. You might encode your "replicate" covariate with BC (before coffee) and AC (after coffee):
dilution replicate
1:1 BC
1:2 BC
1:3 BC
1:4 BC
ctrl BC
1:1 BC
1:2 BC
...
1:5 AC
ctrl AC
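The BC/AC encoding differs from the per-round one in that several rounds share a label. A rough sketch in base R, assuming rounds 1 to 3 were prepared before the coffee and rounds 4 and 5 after (the names prep_round and coldata are hypothetical):

```r
# Hypothetical: rounds 1-3 were prepared before coffee, rounds 4-5 after.
dilutions <- c("1:1", "1:2", "1:3", "1:4", "1:5", "ctrl")
prep_round <- rep(1:5, each = length(dilutions))

coldata <- data.frame(
  dilution  = rep(dilutions, times = 5),
  replicate = factor(ifelse(prep_round <= 3, "BC", "AC"),
                     levels = c("BC", "AC"))
)

# Each dilution level should show 3 samples under BC and 2 under AC.
table(coldata$dilution, coldata$replicate)
```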
Or perhaps you had 5 different people each perform a full round of dilutions; in that case, the replicate would correspond to the "operator" who processed a complete set of samples.
Is there something along those lines you can try to control for?
Normalization against housekeeping genes
This most likely isn't going to be the way to go ...
So is our design correct? Does the design "~ replicate + condition" mean that when we do a contrast for condition "control" vs condition "treated" that the replicate differences will be normalized? I'm also a little unclear, after reading through the DESeq2 literature, and the R script explanations for design variables, what the " + " sign is telling DESeq2 to do with regards to normalization.
You should consult with a statistician to better understand what is happening.
Yes, that is what your design does: the replicate differences are accounted for when you test the condition contrast.
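As for what the "+" does, one way to see it is to look at the design matrix R builds from the formula. The sketch below uses base R's model.matrix() on a made-up colData with three hypothetical replicates (r1 to r3) and two conditions; the sample layout and names are assumptions for illustration, not your data.

```r
# Hypothetical two-level condition measured in three replicates (batches).
coldata <- data.frame(
  replicate = factor(rep(c("r1", "r2", "r3"), each = 2)),
  condition = factor(rep(c("control", "treated"), times = 3),
                     levels = c("control", "treated"))
)

# "+" adds another set of columns (terms) to the model; it does not
# rescale the counts.
design <- model.matrix(~ replicate + condition, data = coldata)

colnames(design)
# "(Intercept)" "replicater2" "replicater3" "conditiontreated"
design
```

The conditiontreated column carries the treated-vs-control effect, estimated with the replicate columns in the model, and that is what the condition contrast tests. Library-size normalization in DESeq2 is handled separately through size factors, not through the design formula.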