Hi,
Im aiming to use aggregated single cell data to perform a pseudobulk analysis to assess differential expression between those with sarcopenia and those without, termed "status_binary" with the levels "yes" and "no".
# DESeq2 ------------------------------------------------------------------
dds <- DESeqDataSetFromMatrix(y$counts,
colData = y$samples,
design = ~Sex+age_scaled+status_binary)
# Transform counts for data visualization
rld <- vst(dds, blind=TRUE)
# Plot PCA
DESeq2::plotPCA(rld, intgroup = "status_binary")
# Extract the rlog matrix from the object and compute pairwise correlation values
rld_mat <- assay(rld)
rld_cor <- cor(rld_mat)
# Plot heatmap
pheatmap::pheatmap(rld_cor, annotation = y$samples[, c("status_binary"), drop=F])
# Run DESeq2 differential expression analysis
dds <- DESeq(dds)
# Plot dispersion estimates
plotDispEsts(dds)
# Check the coefficients for the comparison
resultsNames(dds)
[1] "Intercept" "Sex_Male_vs_Female" "age_scaled" "status_binary_Yes_vs_No"
res <- results(dds,
contrast = c("status_binary", "Yes", "No"), # For catagorical variables
alpha = 0.05)
# Shrink the log2 fold changes to be more appropriate using the apeglm method - should cite [paper]() when using this method
res <- lfcShrink(dds,
coef = "status_binary_Yes_vs_No",
res=res,
type = "apeglm")
hist(res$pvalue, xlab="p value", main="Histogram of nominal p values")
# Turn the results object into a tibble for use with tidyverse functions
res_tbl <-
res %>%
data.frame() %>%
rownames_to_column(var="gene") %>%
as_tibble() %>%
arrange(padj, pvalue)
The above code works great, however when following the advice from the most recent vignette 1 with respect to handling single-cell data with DESeq2, I get the below message when trying to run lfcShrink:
dds <- DESeqDataSetFromMatrix(y$counts,
colData = y$samples,
design = ~Sex+age_scaled+status_binary)
# Transform counts for data visualization
rld <- vst(dds, blind=TRUE)
# Plot PCA
DESeq2::plotPCA(rld, intgroup = "status_binary")
# Extract the rlog matrix from the object and compute pairwise correlation values
rld_mat <- assay(rld)
rld_cor <- cor(rld_mat)
# Plot heatmap
pheatmap::pheatmap(rld_cor, annotation = y$samples[, c("status_binary"), drop=F])
design <- model.matrix(~Sex+age_scaled+status_binary, y$samples)
reduced <- model.matrix(~Sex+age_scaled, y$samples)
design
reduced
dds <- scran::computeSumFactors(dds)
# Run DESeq2 differential expression analysis
dds <- DESeq(dds,
test="LRT",
fitType = "glmGamPoi",
useT=TRUE,
full = design,
reduced = reduced,
minmu=1e-6,
minReplicatesForReplace=Inf)
# Plot dispersion estimates
plotDispEsts(dds)
# Check the coefficients for the comparison
resultsNames(dds)
[1] "X.Intercept." "SexMale" "age_scaled" "status_binaryYes"
res <- results(dds,
contrast = list("status_binaryYes"), # For catagorical variables
alpha = 0.05)
res_img <- lfcShrink(dds,
coef = "status_binaryYes",
res=res,
type="apeglm"
)
Error in lfcShrink(dds, coef = "status_binaryYes", res = res, type = "apeglm") :
'coef' should specify same coefficient as in results 'res'
Is there a reason the name of the coeffficient is no longer working? Also im unclear on the role of lfcShrink - should it be used as our primary results when assesing logFC difference in expression or just to aid visulaisation?
Thanks, Matt
Perfect, thanks very much.