Hi there, I have a question regarding RNASeq analysis and the DESeq2 package.
I have a 2x2 full factorial design using Drosophila:
- 2 mating status (virgin, mated)
- 2 diets (A and B)
I started out with looking at an overall model and using contrasts got a list of genes that were DE for each main category
dds <- DESeqDataSetFromMatrix(countData = count.fem,
colData = meta,
design = ~ diet + status + diet:status)
For simplicity here I will just focus on effects due to changing diet.
Question: is this the correct contrasts to look at genes that respond to diet in both mated and virgin flies? (Overall diet effect)
## DIET GENES
comp.d <- results( dds, contrast = c("diet", "B", "A") )
summary(comp.d)
out of 12996 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 248, 1.9%
LFC < 0 (down) : 59, 0.45%
outliers [1] : 31, 0.24%
low counts [2] : 2770, 21%
(mean count < 9)
Now that I have the overall effects, I wanted to do more specific contrasts. For example:
- changing diets in virgin and mated flies separately
To do these comparisons I created another column with both factors and ran the below model:
> meta.t
diet status treat
VF-B B virgin B.v
VF-B.1 B virgin B.v
VF-B.2 B virgin B.v
MF-B B mated B.m
MF-B.1 B mated B.m
MF-B.2 B mated B.m
VF-A A virgin A.v
VF-A.1 A virgin A.v
VF-A.2 A virgin A.v
MF-A A mated A.m
MF-A.1 A mated A.m
MF-A.2 A mated A.m
dds.t <- DESeqDataSetFromMatrix(countData = count.fem,
colData = meta.t,
design = ~ treat)
So far everything seems to be going OK, but when I start checking the individual comparisons I seem to be getting a lot more genes than the overall models:
## GENES THAT RESPOND TO DIET IN MATED FLIES
comparison1 <-results(dds.t, contrast = c("treat", "B.m", "A.m"))
summary(comparison1)
> summary(comparison1)
out of 12996 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 872, 6.7%
LFC < 0 (down) : 920, 7.1%
outliers [1] : 31, 0.24%
low counts [2] : 3265, 25%
(mean count < 16)
## GENES THAT RESPOND TO DIET IN VIRGIN FLIES
comparison2 <-results(dds.t, contrast = c("treat", "B.v", "A.v"))
summary(comparison2)
> summary(comparison2)
out of 12996 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 248, 1.9%
LFC < 0 (down) : 59, 0.45%
outliers [1] : 31, 0.24%
low counts [2] : 2770, 21%
(mean count < 9)
Question: why do I get many more genes that respond to diet when I run these specific comparisons, rather than the overall model. Also why is the overall number of genes that respond to diet the same as the the result when I only look at virgin flies?
Thanks!
Many thanks for the comment! So would you suggest that I use the model ~ diet + status to get the overall effects, and then ~ diet + status + diet:status to get the list of genes with an interaction term?
thanks again!
Depends what contrast you set, and what exact question you are asking.