Hello,
I've some bulk-RNA-seq data and I want to perform differential gene expression analysis. This data came from a time course experiment where along a set of time points, wild-type and strain individuals were sampled.
I'm familiar with pairwise comparisons in differential gene expression analysis. However, since this data came from a longitudinal study, I think it would be interesting to search for genes differentially expressed over time instead across specific time points through pairwise comparisons of strain vs. WT (Population) at time 0, 4, 8, 12 hours (Time).
Therefore I was reading the section "9 Time course experiments" (from the vignette: http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#time-course-experiments ) in order to understand what to do and how to interpret the results. I tried my own with some data and it just worked fine. The code that I've run appears below:
condition <- c("S_0H", "WT_0H", "S_4H", "WT_4H", "S_8H", "WT_8H", "S_12H", "WT_12H") # conditions
## Metadata
#
metaData <- data.frame(
"Condition" = c(
rep(condition, each = 3)
),
row.names = colnames(cds_ord)
)
coldata <- data.frame("Condition" = metaData$Condition,
"Time" = stringr::str_remove_all(metaData$Condition, "^S_|^WT_"),
"Population" = gsub("[^SWT]", "", metaData$Condition) ) %>%
`rownames<-`(rownames(metaData)) # get metadata to use in the heatmap
coldata$Condition <- factor( coldata$Condition,
levels = condition ) # order 'Condition' var
coldata$Time <- factor( coldata$Time,
levels = c("0H", "4H", "8H", "12H") ) # order 'Time' var
### DESeq2 pipeline - time dependent model
## Metadata
#
metaData <- coldata
metaData$Population <- factor(metaData$Population,
levels = c("WT", "S"))
## DESeq2 pipeline
#
set.seed(2020) # set seed
deseq <- DESeqDataSetFromMatrix(countData = cds_ord,
colData = metaData,
design = ~ Population + Time + Population:Time) # import data to DESeq2
deseq_time <- DESeq( object = deseq, test = "LRT", reduced = ~ Population + Time )
dge_time <- results( object = deseq_time,
alpha = 0.05 )
## Format the DGE tbl
#
dge_time <- dge_time %>%
as.data.frame(.) %>%
mutate("Geneid" = rownames(.)) %>% # add 'Geneid' col
select(c("Geneid", colnames(.)[colnames(.) != "Geneid"])) %>% # order cols
arrange(padj)
## Save the DGE tbl
#
write_tsv(x = dge_time,
path = "../results/tables/dge_pop_time_exp.tsv")
comps <- resultsNames(deseq_time)[-1]
for ( ele in comps ) {
data_tbl <- results( object = deseq_time, name = ele, test = "Wald" )
data_tbl <- data_tbl %>%
as.data.frame(.) %>%
mutate("Geneid" = rownames(.)) %>% # add 'Geneid' col
select(c("Geneid", colnames(.)[colnames(.) != "Geneid"])) %>% # order cols
arrange(padj)
write_tsv( x = data_tbl, path = paste0("../results/tables/", ele, "_pop_time_exp_dge.tsv") )
}
However I'm not sure how I should interpret the results. At this point I have the table dge_pop_time_exp.tsv
that results from dge_time
object and 7 tables where I performed the Wald tests for each coefficient (I did not include the interception): "Population_S_vs_WT", "Time_4H_vs_0H", "Time_8H_vs_0H", "Time_12H_vs_0H", "PopulationS.Time4H", "PopulationS.Time8H", "PopulationS.Time12H"
.
So, my questions are:
(1) What contains the dge_time
/dge_pop_time_exp.tsv
?
When I performed this I thought that included all the genes differentially expressed which at one or more time points after time 0 showed a population-specific effect. However when I print the dge_time
appears:
`log2 fold change (MLE): PopulationS.Time12H
LRT p-value: '~ Population + Time + Population:Time' vs '~ Population + Time' `
Since it appeared the interaction I'm a bit confusing.
(2) Which comparisons were made in the other tables obtained with Wald tests?
From the vignette and this post ( https://www.biostars.org/p/188808/ ) I would say that tables "Time_4H_vs_0H", "Time_8H_vs_0H", "Time_12H_vs_0H"
contain the genes differentially expressed between the time points highlighted, i.e., 4H vs. 0H, 8H vs. 0H, 12H vs. 0H, for the WT population. I would say that the table "Population_S_vs_WT
contains the genes differentially expressed between populations, i.e., population S vs. pop. WT, at time 0H.
The interaction is more difficult to interpret. From what I've read in the documentation and Biostar post, for instance in the case of "PopulationS.Time4H"
it contains the genes differentially expressed across population S vs. WT, at time point 4H and also the "additional difference" between population S vs. WT, at time point 0H. Is that right?
Sorry for the long post.
Thank you in advance.
António