Dear all,
I'm doing timecourse RNAseq analysis. I'm quite new with R and I would have question concerning DESeqDataSetFromHTSeqCount(...,design=...) and later DESeq(... test= "LRT", reduced = ...) and using results() with Wald test.
I have treated (APO) /untreated (DMSO) sample and 6 timepoint (incl 0) 4 replicate.
When I run my script I've got the same result for every timepoint when I want to compare APO vs DMSO. My aim is to see which are those genes which are regulated by APO from 0 to 24 hrs.
Theoretically, it means that I have exactly the same set of genes if I ask DEGs after 30 min of treatment or 3 hrs of treatment which does not make any sense. I think I'm missing something but I followed the tutorial and some previous discussion and I made everything the same. If I use wald test as an argument after results() than I've got different sets of genes but it does not convince me. I would be happy with any feedback.
My sampleTable is:
> sampleTable
sampleName fileName condition replicate timeP 1 APO_0_A APO_0_A.counts APO A 0 2 APO_0_B APO_0_B.counts APO B 0 3 APO_0_C APO_0_C.counts APO C 0 4 APO_0_D APO_0_D.counts APO D 0 5 APO_05_A APO_05_A.counts APO A 05 6 APO_05_B APO_05_B.counts APO B 05 7 APO_05_C APO_05_C.counts APO C 05 8 APO_05_D APO_05_D.counts APO D 05 9 APO_1_A APO_1_A.counts APO A 1 10 APO_1_B APO_1_B.counts APO B 1 11 APO_1_C APO_1_C.counts APO C 1 12 APO_1_D APO_1_D.counts APO D 1
...
and
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design = ~condition+timeP+condition:timeP)
> ddsHTSeq
class: DESeqDataSet dim: 37336 48 metadata(1): version assays(1): counts rownames(37336): AT1G01010 AT1G01020 ... ATMG09960 ATMG09980 rowData names(0): colnames(48): APO_0_A APO_0_B ... DMSO_6_C DMSO_6_D colData names(3): condition replicate timeP
I filter for genes which has low count (for us it's 10) and than using relevel function to set the reference
ddsK$condition <- relevel(ddsK$condition, ref = "DMSO")
than I make the DESeq with LRT test and reduced timeP+condition
dds <- DESeq(ddsK, test="LRT", reduced = ~timeP+condition)
resultsNames(dds)
[1] "Intercept" "condition_APO_vs_DMSO" "timeP_05_vs_0" "timeP_1_vs_0" "timeP_24_vs_0" [6] "timeP_3_vs_0" "timeP_6_vs_0" "conditionAPO.timeP05" "conditionAPO.timeP1" "conditionAPO.timeP24" [11] "conditionAPO.timeP3" "conditionAPO.timeP6"
res <- results(dds, contrast = list(c("timeP_05_vs_0", "conditionAPO.timeP05")), test = "Wald")
res
log2 fold change (MLE): timeP_05_vs_0+conditionAPO.timeP05 effect Wald test p-value: timeP_05_vs_0+conditionAPO.timeP05 effect DataFrame with 24455 rows and 6 columns baseMean log2FoldChange lfcSE stat pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> AT1G01010 383.915799076324 0.089056481257827 0.193991260802991 0.459074707227502 0.64618051895494 0.873859813008998
...
resSig <- subset(res, padj < 0.1) resSig
log2 fold change (MLE): timeP_05_vs_0+conditionAPO.timeP05 effect Wald test p-value: timeP_05_vs_0+conditionAPO.timeP05 effect DataFrame with 2858 rows and 6 columns baseMean log2FoldChange lfcSE stat pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> AT1G01060 48.6720330765987 -0.688897898327615 0.26567978724056 -2.59296315117812 0.00951529650094275 0.0799481384205183
Let me know if you have any suggestion or comment on it.
Best,
Judit
Yes, I tried to follow some of those (e.g. I followed this link: http://bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#time-course-experiments and this: https://support.bioconductor.org/p/65676/#66860)
What I didn't get that if I use
or
and filter for padj < 0.1 than I've got always the same list. Why does it happen? I thought that they should be different.
Thank you!
This is what I was talking about. Did you read ?results yet? Or the FAQ? Please read those first, because it explains what is going on, exactly answering your question.
If you want to make new p-values you need to switch from test="LRT" to test="Wald".
Dear Michael,
thank you for your help.
I went through on your suggested pages and check other comments from Bioconductor.
I re-arranged my script:
and I didn't use any reduced formula with LRT for DESeq():
and made a comparison:
My question would be that is it right that resSig05 gives me those genes which are DE after 30 min because of the treatment and not because of the time compare to control condition (which is time_0)?
If it's correct than is it correct if I do all the comparison for the other time points, separately and at the end I can merge them for further analysis.
Thanks a lot for your comments!
That gives you the genes that are differentially expressed for condition APO at time 5. It's the main effect for time 5 vs 0 and the interaction for condition APO added together, for this design, that gives you the time difference specifically for that condition group.
The analysis is up to you. If you're having trouble with the LRT and interaction design, and the vignette sections are not sufficient to give some sense of what the models are doing, I'd recommend to consult with a statistician. It's pretty important to have a good sense what you are doing.