I am analyzing RNA-Seq data from a drug response dataset. There, we have a timecourse experiment with the following experimental design:
3 timepoints: 4h, 24h, 48h
3 drugs (A,B,C) + 1 untreated control
3 replicates per drug-timepoints
I would like to get differentially expressed genes for each timepoint compared to the untreated control. I have followed the limma-voom tutorial, and I get close to 300.000 differentially expressed genes (that go down to 10.000 if I apply an FDR and logFC threshold). I am expecting to find differences compared to the untreated samples, however, I am not sure whether the p-values are inflatted due to fewer changes in the earlier time points, as it has been suggested in another question in BioConductor regarding limma. Is there a difference on how limma performs differential analysis if fewer or more contrasts are specified?
My code can be found below:
# samples: df with sample metadata. Condition column combines both Drug + Timepoint (e.g. A_4h, B_4h, C_4h, ...., untreated_48h)
# gene_counts: df with gene counts
lev <- unique(samples$Condition)
fac <- factor(samples$Condition, levels=lev)
design <- model.matrix(~ 0 + fac)
colnames(design) <- lev
x <- DGEList(counts=count_df)
cpm <- cpm(x)
lcpm <- cpm(x, log=TRUE)
keep.exprs <- rowSums(cpm>1)>=3
x <- x[keep.exprs, keep.lib.sizes=FALSE]
dim(x)
contrast_list <- makeContrasts(
Diff_A_4h = A_4h - untreated_4h,
Diff_B_4h = B_4h - untreated_4h,
Diff_C_4h = C_4h - untreated_4h,
Diff_A_24h = A_24h - untreated_24h,
Diff_A_24h = A_24h - untreated_24h,
Diff_A_24h = A_24h - untreated_24h,
Diff_A_48h = A_48h - untreated_48h,
Diff_A_48h = A_48h - untreated_48h,
Diff_A_48h = A_48h - untreated_48h,
levels = design)
vfit <- lmFit(v, design)
vfit <- contrasts.fit(vfit, contrasts=contrasts_list)
efit <- eBayes(vfit)
# And to extract the DEGs per contrast
extract_DEGs <- function(x1){
contr_name <- colnames(efit$coefficients)[[x1]]
tt <- topTable(efit, x1, number = Inf) %>%
rownames_to_column("Gene_name") %>%
mutate(Contrast = contr_name) %>%
as_tibble()
tt
}
Coeffs <- dimnames(efit$coefficients)[["Contrasts"]]
Thank you so much for the quick reply and recommendations regarding data pre-processing!