Hi,
I am analysing the dataset with two independent factors - genotype (WT, MUT) and age (pup, adult). I have 4 samples for each condition.
- mouse1 MUT pup
- mouse2 MUT adult
- mouse3 WT pup
- mouse4 WT adult etc.
What I want to check is how being the MUTANT affects gene expression across the age and if the impact of mutation is age dependent.
What I need is to build a model that shows me:
- how age affects the counts
- how genotype affects the counts
- and how the interaction age*genotype affects the counts - is it significant?
However after reading on forums I am confused and I am not sure which approach is best. Also I am not sure if I should use LRT and in which step.
My first idea is the model with interaction, as a 'baseline' I set Wild Type adult:
dds <- DESeqDataSetFromMatrix(countData=countData,
colData=metaData,
design= ~genotype+age+genotype:age,
tidy = TRUE)
dds
dds$genotype
dds$genotype = relevel( dds$genotype, "wt")
dds$genotype
dds$age
dds$age = relevel( dds$age, "adult")
dds$age
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
dds <- estimateSizeFactors(dds)
dds <- DESeq(dds)
resultsNames(dds)
Or is it better to use 'group' design? What is more informative in my case?
dds$group <- factor(paste0(dds$genotype, dds$age))
design(dds) <- ~ group
dds <- DESeq(dds)
resultsNames(dds)
resultsNames(dds) [1] "Intercept" "group_mutpup_vs_mutadult" "group_wtadult_vs_mutadult" [4] "group_wtpup_vs_mutadult"
results(dds, contrast=c("group", "wtpup", "mutpup"))
results(dds, contrast=c("group", "wtadult", "mutadult"))
results(dds, contrast=c("group", "wtpup", "mutadult"))
In the group design how do I interpret the last contrast - is it the same as interaction from the previous model? I will be grateful for any advice, this is the first time I am trying multifactorial analysis in DESeq2.
I have a problem with results interpretation. It looks like my main effect is somehow not consistent with the contrast and I am confused. I set the reference level as WT PUP
and then I set the contrast for the main effect (age in WT)
All seems to be correct...
What I obtained however is log2FoldChanges of opposite sign to what I see with plots (plotCounts) - genes that are upregulated in WT Adult according to my res1 have in fact higher expression in pups. I double checked the counts and it is clearly opposite to my results.
What am I doing wrong here?
You should set the levels before running
DESeq()
I did but maybe there is an error somewhere in my script, please see below:
The log2FoldChange i.e. for this gene Tdg is according to my results - 0.9 while the counts are as below:
Hi again, for these situations, it is best to keep the design formula as simple as possible so that no mis-interpretation is made on the results. If you literally just want adult vs pup in WT, then I would create a new merged variable, called
age_genotype
, fit a formula of type~ age_genotype
, and then fit a contrast of formcontrast=c("age_genotype","adult_wt","pup_wt")
. Then, there can be no mis-interpretation about the meaning.Don't also forget to perform log2 fold-change shrinkage after you run
results()
- see Quick startHi Kevin, yes, I know I can do that but I am confused with why this result is opposite and I want to understand what went wrong. Since I am taking the contrast (age, adult, pup) I expect to see genes that are differentially regulated in WT Adult according to WT pup so I don't understand why I am getting the opposite result.
This is very disturbing for me and I am worried that the Interaction may also be wrong however I am not able to check it.
Is my understanding of "adult vs pup" correct? Or is it just a misunderstanding and this contrast is in fact genes in pup differently expressed according to adult? I am very confused with these results.
Following the logic, what you are deriving with this contrast is the 'age effect for wt', and the order of comparison is adult / pup. What was the accompanying p-value and what happens after you additionally run
lfcShrink()
?Can you clarify which counts you are plotting (normalised or raw)?; Can you also plot the variance-stabilised expression levels?
In your code, should it not be:
...or, perhaps, this makes no difference
I think this must have a been a kind of bug - I restarted R and my computer and everything is ok (the sign now reflects the counts [ 0.9 instead of -0.9]). I don't know why this error occurred - maybe this is a problem with workspace refreshing in R (?)
Sorry for that and thank you for time and help, I really appreciate it.
Can you print:
Yes, my plot was based on normalized counts, I restarted now and everything is fine - probably the workspace was not refreshed properly.
Now it looks like this: (21665 is Tdg Entrez ID)
> dds$genotype
[1] mut mut mut mut wt wt wt mut mut mut wt wt wt wt
Levels: wt mut
> dds$age
[1] pup pup pup pup adult adult adult adult adult adult pup pup pup pup
Levels: pup adult
> res1["21665",]
Great - it is a good idea, in my opinion, to start a new session for every new analysis.
If you enter the contrasts manually it doesn't matter what you set as the reference level. Just do
results(dds, contrast=c("age","pup","adult"))
I am confused with the contrast - I thought the contrast (age, adult, pup) would give me the log2FoldChange of the genes disregulated in adult with pup as a reference, is this correct?
My problem is not that I need this contrast, I noticed it is wrong so I am worried if other comparisons and mainly the Interaction are correct. I am confused with the fact that contrast adult vs pup is giving me result for pup vs adult (if I understand the order of factors in contrast correctly).