Hi,
The DEseq2 vignettes is very helpful, but I'd still like to confirm if my understanding to the contrast design for correlated dataset is right.
I have 80 samples (from 10 case and 10 control subjects, each subject having 4 different cell types, no tech replicate). My goal is to study (1) differentially expressed genes between case and control; (2) cell-type specific genes.
- My first question is regarding the best way to do DE analysis for multiple group design.
I could simply split the input dataset into subsets, e.g. running DEseq2 separately for each cell type (n=20), and report four results and later combined them e.g. using venn diagram.
design(subset_dds_cellA) = ~ condition
design(subset_dds_cellB) = ~ condition
design(subset_dds_cellC) = ~ condition
design(subset_dds_cellD) = ~ condition
I can also run DEseq2 once by using all 80 samples, e.g.
design(dds) = ~ cell + condition
My understanding is, in the latter way, I will have a better power (because the sample size is larger, 80 vs. 20), but what I will get is a list of genes differentially expressed between case vs. control, accounting for the cell types (meaning only the DE genes common to all cell types). If I want to discover the DE genes specific to one cell type, I need to include the interaction term, e.g. design(dds) = ~ cell + condition + cell * condition, and test the interaction term. Am I right?
- My second question is about correlated data.
Unlike 40 vs. 40 independent samples, my dataset is not totally independent, e.g. samples from the same subject should be more close to each other, similar to the examples in the Regression Analysis for Correlated Data. In that case, should I include subjectID in the design, e.g. design(dds) = ~ subjectID + cell + condition + cell * condition, in order to account for the structural correlation of my dataset?
Thanks for helps.
-Xianjun
Thanks for your always prompt help, Michael.
If I use design = ~cell + cell:condition, how would I detect the overall condition effect (i.e. the case vs. control regardless cell type)? Do you mean I still need to run DEseq2 separately with only design ~ condition to capture the overall effect on condition? A bit confused here.
Following this, I also have a related question: If I have 4 cell types in the experiment (e.g. A, B, C, D, where A is reference level in the initial design), I noticed that the resultsNames(dds) will return 3 comparisons for cell type; all are relative to the reference level (B_vs_A, C_vs_A, D_vs_A). If I also want to get comparison for C_vs_B, can I simply run
results(dds, contrasts=c("cell", "C","B"))
?Thanks
You either have an overall effect (~cell + condition) or your have cell-specific effects (~cell + cell:condition). You can't get the first from the latter with fixed effects. You could average the 4 cell-specific effects using a numeric 'contrast', where you give each cell-specific effect a 1/4 and all other coefs a 0.
Yes to second question.
Hi, Michael, first of all thanks a lot for helping DESeq2 users!
Regarding the correlation between samples from the same individual, do you think that it is always necessary to duplicateCorrelation() and limma? How well do you think DESeq2 performs when there are correlated samples that are not strict technical replicates (resequencing of the same library)?
In my case, the experiment was performed on the same cell line 3 times (in 3 different cell lines), control and treatment, and is therefore a type of technical replication/repeated measures.
Would a paired design be enough?
Thank you!
I know that it's hard to see this without everyone writing out the full sample table, but there is a very important difference between what you describe and the design of the original post.
In your case you have control and treatment for each cell line (it sounds like), and so you can use DESeq2 and fixed effects to account for the baseline differences. Yes, ~cell + condition is sufficient for you. There's also a scenario in the vignette for using fixed effects for even more complex designs, where there are group-specific condition effects, and individual nested within groups. Still, each subject has a control and treated sample and so everything can be fit with fixed effects.
The above post wants to compare across condition, but the subjects are nested within condition, e.g. fully confounded with condition. This requires something like a random effect.
You are right, I have control and treatment for each replicate and cell line.
I have been using ~cell + condition, but I just wanted to make sure I shouldn't be doing something else, so thanks for the clarification.
For others reading this thread in the future, this is the design I have:
Hi Michael,
Thanks so much for your time and helps!
I've tried design ~cell + cell:condition, where I have four cell types (A, B, C, and D, where A is the reference level) and two conditions (HC and MS, where HC is reference level). What I got from resultsNames(dds) are "cell_B_vs_A" "cell_C_vs_A" "cell_D_vs_A" "condition_MS_vs_HC" "cellB.conditionMS" "cellC.conditionMS" "cellD.conditionMS". I have few questions. Wish you could help to clarify.
Q1: I don't see "cellA.conditionMS". Is "condition_MS_vs_HC" the same as "cellA.conditionMS"?
Q2: I understand the four comparisons ("condition_MS_vs_HC" "cellB.conditionMS" "cellC.conditionMS" and "cellD.conditionMS") are cell-specific condition effect. And they are unlike interaction, they actually act EXACTLY same as if we run DESeq2 separately for subset of four cell types, e.g. design(subset_dds_cellA) = ~ condition. There is no interaction (even though we have an interaction term in the design). Am I right?
Q3: And if we want to call interaction using the same design (e.g. ~cell + cell:condition), we can run for example results(dds, contrast = list("cellC.conditionMS", "cellB.conditionMS")) to test if the condition effect is different across cell group B and C. Right?
Q4: Last question: How would I extract subset of samples for a DESeqDataSet object? Do I have to run DESeqDataSetFromMatrix() again? I am wondering if there is a direct function like subset applying to dds based on its colData.
Thanks again!
Can you double check your code? Those coefficients seem to indicate your added a "+ condition" to the design, instead of just "~cell + cell:condition".
No, I didn't. Here is what I copied from the Rstudio console (sorry the variable name is slightly different):
You have to be careful about symbols. "*" and ":" are not the same to R. "*" means to add the main effect and interaction term.
Sorry... I should have read more carefully.
Note for the details from https://stat.ethz.ch/R-manual/R-devel/library/stats/html/formula.html:
An expression of the form
y ~ model
is interpreted as a specification that the responsey
is modelled by a linear predictor specified symbolically bymodel
. Such a model consists of a series of terms separated by+
operators. The terms themselves consist of variable and factor names separated by:
operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term.In addition to
+
and:
, a number of other operators are useful in model formulae. The*
operator denotes factor crossing:a*b
interpreted asa+b+a:b
. The^
operator indicates crossing to the specified degree. For example(a+b+c)^2
is identical to(a+b+c)*(a+b+c)
which in turn expands to a formula containing the main effects fora
,b
andc
together with their second-order interactions. The%in%
operator indicates that the terms on its left are nested within those on the right. For examplea + b %in% a
expands to the formulaa + a:b
. The-
operator removes the specified terms, so that(a+b+c)^2 - a:b
is identical toa + b + c + b:c + a:c
. It can also used to remove the intercept term: when fitting a linear modely ~ x - 1
specifies a line through the origin. A model with no intercept can be also specified asy ~ x + 0
ory ~ 0 + x
.OK. Now I got the all four cell-specific conditions (e.g. "cellA.conditionMS" "cellB.conditionMS" "cellC.conditionMS" and "cellD.conditionMS"). Thanks.
Could you please help the other three questions?
Q2: Are the cell-specific comparisons ("cellA.conditionMS" "cellB.conditionMS" "cellC.conditionMS" and "cellD.conditionMS") from design ~cell + cell:condition act EXACTLY same as if we run DESeq2 separately for subset of four cell types, e.g. design(subset_dds_cellA) = ~ condition?
Q3: And if we want to call interaction using the same design (e.g. ~cell + cell:condition), we can run for example results(dds, contrast = list("cellC.conditionMS", "cellB.conditionMS")) to test if the condition effect is different across cell group B and C. Right?
Q4: Last question: How would I extract subset of samples for a DESeqDataSet object? Do I have to run DESeqDataSetFromMatrix() again? I am wondering if there is a direct function like subset applying to dds based on its colData.
Q2: no, see the FAQ in the vignette about running all samples together vs splitting into groups
Q3: Yes.
Q4: You can use the square brackets to index rows or columns of a SummarizedExperiment (of which dds is a subclass): `dds[,idx]` where `idx` is a logical, numeric or character vector.