Question

DESeq2, results(dds, name = <reference>) interpretation results

0

Entering edit mode

Rockbar • 0

@rockbar-9979

Last seen 17 months ago

Germany

Hello together, hello Michael,

I have an understanding problem using results(dds, ...)

I have RNAseq data and a grouping of two conditions with interaction of time with the defined reference levels, see code below. And run the code as below.


coldata$Cond <- factor(coldata$Cond, levels = c("A","B"))
coldata$Time <- factor(coldata$Time, levels = c("2","5","7"))

formula = ~ Cond + Cond:Time - 1

dds <- DESeqDataSetFromMatrix(countData = round(cts),
                              colData = coldata,
                              design = formula)
dds <- DESeq(dds)

> resultsNames(dds)
[1] "CondA"       "CondB"       "CondA.Time5" "CondA.Time7" "CondB.Time5" "CondB.Time7"

"CondA" contains implicitly Time2, as far as I understood. So from my expectation, calling "result(dds, name="CondA")" is calling actually the comparison of reference vs reference, which I expect to cause only 0 logFC. But I get the following head output with 0 << logFC (in this head section; p-values > 0.1 in other sections ).

log2 fold change (MLE): CondA
Wald test p-value: CondA 
DataFrame with 12659 rows and 6 columns
                baseMean log2FoldChange     lfcSE      stat       pvalue         padj
               <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
gene1 1535.6225       11.16962 0.1156640   96.5696            0            0
gene2 7642.6626       12.91082 0.0487283  264.9556            0            0
gene3   30.0613        4.90618 0.0976447   50.2453            0            0
gene4   36.8394        5.45187 0.1199860   45.4375            0            0
gene5 1533.5904       10.99511 0.0618889  177.6589            0            0

Did I miss something? Is the reference level something, else, so there is a reasonable explanation to that?

At the end I want to compare for each Condition each time-point against each other, and for each time-point the two Conditions against each other

Thank you very much.

DESeq2 • 967 views

ADD COMMENT • link 18 months ago Rockbar • 0

score 0 · Answer 1 · 2023-09-26

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 2 days ago

United States

CondA is the mean expression for the time 2, condition A samples. If you do results(dds, name = "CondA"), you are testing that the mean expression of the time 2, condition A samples is different from zero, which is an uninteresting thing to test (the expectation is that most genes will be significant).

The interesting coefficients are the last four, which are inherently comparisons. For example, CondA.Time5 is the difference in expression between time 5 and 2 for the condition A samples. You could test the interaction by doing

results(dds, contrast = list("CondA.Time5","CondB.Time5"))

Which will test for genes that vary differently between times 5 and 2 depending on the condition.

ADD COMMENT • link 18 months ago James W. MacDonald 68k

0

Entering edit mode

Hello James,

Thank you for answering

testing that the mean expression of the time 2, condition A samples is different from zero

I haved used -1 in the design formula, so my understanding so far was that the comparison is always against reference (some in the groups), at least with contrast. Or is that different with "name"?

results(dds, contrast = list("CondA.Time5","CondB.Time5"))

I am currently testing this Cond difference at Time5 with

results(dds, contrast = list( c("CondB", "CondB.Time5"), c("CondA", "CondA.Time5") ))

But this gives me different results. How does it come?

ADD REPLY • link 18 months ago Rockbar • 0

0

Entering edit mode

No, putting -1 in the design matrix is a cell means model, which is the opposite. If you used ~ Cond + Cond:Time, you would specify a factor effects model in which case there would be a baseline.

The call to results that I suggested gives you the correct interaction. I don't know what your call to results does. I don't see anything in the help page or vignette that says you should use a list of two vectors that are each length two. But I don't really use DESeq2 primarily because I find the contrasts specification mystifying, so maybe it's perfectly fine? Hopefully Mike Love will be along in a bit to explain.

ADD REPLY • link 18 months ago James W. MacDonald 68k

0

Entering edit mode

Yes, I agree, the correct contrast setting is quite challenging. The vignette says

contrast: a list of 2 character vectors: the names of the fold changes for the numerator, and the names of the fold changes for the denominator ... (more general case, can be to combine interaction terms and main effects)

And ?results shows the combination of main effect and interaction term in one character vector. EDIT: Using the formula without -1, would give me an intercept (again reflects CondA at time2) and results(dds, name="Intercept") provides me the same results like for CondA, where reference is unclear.

ADD REPLY • link 18 months ago Rockbar • 0

0

Entering edit mode

When I apply the model.matrix approach for getting the contrasts, the code

mod_mat <- model.matrix(design(dds), colData(dds))

CondA.time5 <- colMeans(mod_mat[(dds$Cond == "A") & (dds$Time == "5"), ])

CondB.time5 <- colMeans(mod_mat[(dds$Cond == "B") & (dds$Time == "5"), ])

results(dds, contrast = CondB.time5 - CondA.time5)

gives me the same result as calling

results(dds, contrast = list( c("CondB", "CondB.Time5"), c("CondA", "CondA.Time5") ))

So I conclude this is the correct call? Still, what is

results(dds, contrast = list("CondA.Time5","CondB.Time5"))

then providing me?

ADD REPLY • link 18 months ago Rockbar • 0