For concreteness, suppose that my metadata has the following form:
> # the following is a partial listing!
> head(metadata, 12)
   drug dose replicate
1  NONE  NaN         1
2     A    1         1
3     A   10         1
4     A  100         1
5     B    1         1
6     B   10         1
7     B  100         1
8     C    1         1
9     C   10         1
10    C  100         1
11 NONE  NaN         2
12    A    1         2
In words, for each (biological) replicate I have one "no-treatment" observation (drug = NONE, dose = NaN) and 9 "treatment" observations, which result from applying each of 3 drugs at 3 different doses. (The metadata consists of n blocks of rows, one per replicate number, that are identical in every way except for the value of the replicate column, which is constant within each block. IOW, if n is the number of replicates, then
table(metadata$drug, metadata$dose, useNA = "ifany")
would produce something like this:
       1  10 100 NaN
  A    n   n   n   0
  B    n   n   n   0
  C    n   n   n   0
  NONE 0   0   0   n
The excerpt shown earlier includes the block for replicate k = 1, plus the first two rows of the block for k = 2. Also, each value of the replicate column corresponds to a different batch of cells: the cells in each batch are grown together, then divided into wells, and each well is treated with one of the 10 possible combinations of the drug and dose factors.)
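To make the layout easy to reproduce, here is a minimal base-R sketch that rebuilds metadata with this block structure and tabulates it. The helper name make_metadata and the choice of n = 3 replicates are illustrative assumptions, not part of the actual experiment.

```r
# Hypothetical reconstruction of the metadata layout described above:
# one 10-row block per replicate (1 untreated row + 3 drugs x 3 doses)
make_metadata <- function(n) {
  block <- data.frame(
    drug = c("NONE", rep(c("A", "B", "C"), each = 3)),
    dose = c(NaN, rep(c(1, 10, 100), times = 3))
  )
  # Stack n copies of the block, tagging copy k with replicate = k
  do.call(rbind, lapply(seq_len(n), function(k) transform(block, replicate = k)))
}

metadata <- make_metadata(3)
head(metadata, 12)

# Every treated drug/dose cell and the NONE/NaN cell should contain n = 3
tab <- table(metadata$drug, metadata$dose, useNA = "ifany")
tab
```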
In its simplest form, the goal of such an experiment is to compare the effect of the various treatments against the untreated condition.
One way to do this would be to add a condition pseudo-factor to the metadata that encodes each combination of drug and dose:
> # the following is a partial listing!
> head(augmented_metadata, 12)
   drug dose replicate condition
1  NONE  NaN         1         0
2     A    1         1         1
3     A   10         1         2
4     A  100         1         3
5     B    1         1         4
6     B   10         1         5
7     B  100         1         6
8     C    1         1         7
9     C   10         1         8
10    C  100         1         9
11 NONE  NaN         2         0
12    A    1         2         1
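A minimal sketch of how such a condition column could be derived in base R, assuming the metadata layout above. The names combo and combo_levels are hypothetical; listing "NONE_NaN" first is what makes the untreated wells code 0, and storing the codes as a factor keeps them categorical rather than numeric.

```r
# Toy metadata with the layout above (2 replicates)
metadata <- data.frame(
  drug = rep(c("NONE", rep(c("A", "B", "C"), each = 3)), times = 2),
  dose = rep(c(NaN, rep(c(1, 10, 100), times = 3)), times = 2),
  replicate = rep(1:2, each = 10)
)

# Label each drug/dose combination; "NONE_NaN" comes first so that it gets code 0
combo <- paste(metadata$drug, metadata$dose, sep = "_")
combo_levels <- c("NONE_NaN",
                  paste(rep(c("A", "B", "C"), each = 3),
                        rep(c(1, 10, 100), times = 3), sep = "_"))

augmented_metadata <- metadata
# Integer codes 0..9, stored as a factor so the condition is treated as categorical
augmented_metadata$condition <- factor(match(combo, combo_levels) - 1L)
head(augmented_metadata, 12)
```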
Then I would begin my DESeq2 analysis with
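augmented_metadata$condition <- factor(augmented_metadata$condition)  # categorical; level "0" = untreated
dds <- DESeqDataSetFromMatrix(countData = counts, colData = augmented_metadata, design = ~ condition)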
...etc., and would get my results with expressions like
results(dds, contrast = c("condition", "1", "0"))  # condition 1 vs. untreated (condition 0)
...and so on.
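Rather than typing the nine treated-vs-untreated comparisons one by one, the contrast specifications can be generated in a loop. This is a sketch assuming condition is a factor whose reference level is "0"; results() does accept a character vector of the form c(factor name, numerator level, denominator level). The name contrast_specs is hypothetical, and the DESeq2 calls are left as comments because no fitted dds is built here.

```r
# Levels of the condition factor: "0" = untreated, "1".."9" = drug/dose combinations
treated_levels <- as.character(1:9)

# One c(factor, numerator, denominator) spec per treated level, each vs. level "0"
contrast_specs <- lapply(treated_levels, function(lvl) c("condition", lvl, "0"))
names(contrast_specs) <- paste0("condition_", treated_levels, "_vs_0")

str(contrast_specs[1:2])

# With a fitted object (not built here), one could then run:
# dds <- DESeq(dds)
# res_list <- lapply(contrast_specs, function(cn) results(dds, contrast = cn))
```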
On the upside, this approach is both conceptually straightforward and extensible to more factors (e.g., if, in addition to drug and dose, one also varied the time between dosing and measurement, the condition column would encode each combination of drug, dose, and time).
On the downside, the relationship among the conditions for each drug gets lost; IOW, all the treatment conditions are treated as unrelated to each other.
I know that, instead of creating a condition pseudo-factor, I can specify a "composite" design, like this:
dds <- DESeqDataSetFromMatrix(countData = counts, colData = metadata, design = ~ dose + drug)
...but I'm not sure how to specify meaningful contrasts given this design.
More specifically, assuming that resultsNames(dds) returns strings like these:
doseNaN dose1 dose10 dose100
drugNONE drugA drugB drugC
...are the results I want the ones given by the following 3 expressions?
results(dds, contrast = list("drugA", "drugNONE"))
results(dds, contrast = list("drugB", "drugNONE"))
results(dds, contrast = list("drugC", "drugNONE"))
If so, how is the dose information taken into account?
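One way to see how a formula like ~ dose + drug handles dose is to inspect the model matrix that base R builds for it. The sketch below uses toy metadata with the layout above and, as an assumption, makes the untreated levels (NONE, NaN) the references. Because dose is NaN exactly when drug is NONE, the dose and drug indicator columns are linearly dependent. This is true whichever levels are chosen as references.

```r
# Toy metadata with the layout above (2 replicates)
metadata <- data.frame(
  drug = rep(c("NONE", rep(c("A", "B", "C"), each = 3)), times = 2),
  dose = rep(c(NaN, rep(c(1, 10, 100), times = 3)), times = 2),
  replicate = rep(1:2, each = 10)
)

# Treat both variables as categorical, with the untreated levels as reference
metadata$drug <- factor(metadata$drug, levels = c("NONE", "A", "B", "C"))
metadata$dose <- factor(as.character(metadata$dose), levels = c("NaN", "1", "10", "100"))

# Model matrix for the additive design ~ dose + drug
mm <- model.matrix(~ dose + drug, data = metadata)
colnames(mm)

# A rank below the number of columns means some coefficients are not estimable
qr(mm)$rank
ncol(mm)
```

With this layout the rank comes out one less than the number of columns (6 vs. 7), because dose1 + dose10 + dose100 equals drugA + drugB + drugC in every row. DESeq2 checks that the model matrix is full rank before fitting and stops with an error otherwise, so this check is cheap to run before building the DESeqDataSet.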
Can you say more about the replicates? Do you have multiple replicates for all combinations of drug and dose?
table(dds$drug, dds$dose)
Are the samples with replicate=1 related in any way?
For each k, the observations with replicate=k are related in that they refer to cells coming from the same batch. IOW, each replicate represents a batch of cells that were grown together, then split into wells, and treated with a particular combination of drug and dose.
For this question, it is OK to assume that the metadata consists of n blocks of rows, one per replicate number, that are identical in every way except for the value of the replicate column (which is constant within each block). (In my original description, I show the block for k = 1, plus the first two rows of the block for k = 2.) (Sorry, I should have made all these points clearer in my original post. I will fix it.)
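The block-structure assumption can be checked directly in base R. Here is a minimal sketch, assuming a data frame named metadata with the columns drug, dose and replicate as shown above. The toy data and the names blocks and all_blocks_identical are illustrative.

```r
# Toy metadata with the layout above (3 replicates)
metadata <- data.frame(
  drug = rep(c("NONE", rep(c("A", "B", "C"), each = 3)), times = 3),
  dose = rep(c(NaN, rep(c(1, 10, 100), times = 3)), times = 3),
  replicate = rep(1:3, each = 10)
)

# Split the non-replicate columns by replicate and reset the row names
blocks <- split(metadata[, c("drug", "dose")], metadata$replicate)
blocks <- lapply(blocks, function(b) { rownames(b) <- NULL; b })

# TRUE if every block matches the first one exactly (identical() treats NaN as equal to NaN)
all_blocks_identical <- all(vapply(blocks, identical, logical(1), blocks[[1]]))
all_blocks_identical
```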