Question

Volcano plot of scRNA-seq data not "centered"

0

Entering edit mode

tjl • 0

@4e30bfdb

Last seen 11 months ago

United States

I analyzed a single-cell RNA-seq dataset including two groups using Seurat and then DESeq2 for differential expression. No problems with the standard workflow. However, the resulting log2FoldChanges for treated vs. untreated groups within clusters are almost all negative. This is clear in the corresponding volcano plots, which are not "centered" at a log2FoldChange of ~zero, but rather at about -0.5 (although still with a classic volcano shape). I know I can use "estimateSizeFactors" with "controlGenes" in DESeq2, but what is the most likely upstream explanation for this? It seems like a problem with normalization or scaling(?).

DESeq2 • 1.8k views

ADD COMMENT • link updated 11 months ago by ATpoint ★ 4.8k • written 11 months ago by tjl • 0

score 0 · Answer 1 · 2024-05-23

0

Entering edit mode

ATpoint ★ 4.8k

@atpoint-13662

Last seen 1 hour ago

Germany

Sounds like a normalization problem, but I am not sure how to help with that. Indeed you can use control genes, for example use the top 20% of genes with highest baseMean, or only genes that are expressed by at least 25% of cells. This could help enrich for more robust genes to properly center the bulk of genes along y=0 in the MA-plot (MA-plots are better fir diagnostics than Volcanos). Without more details on setup and code or plots I don't know what more to say.

ADD COMMENT • link 11 months ago ATpoint ★ 4.8k

0

Entering edit mode

Agree with ATpoint

Depending on the "groups" in single cell, the question of DE can be not well defined without inserting additional information (controlGenes). E.g. what defines LFC = 0 across disparate cell types with very different distribution across the transcriptome.

ADD REPLY • link 11 months ago Michael Love 43k

0

Entering edit mode

Thank you both. The code is below, simplified a bit for simplicity. ATpoint, your suggestion re: the top 20% of genes by baseMean makes me wonder if the shift in LFC=0 is an artifact of the experiment, which should include an enrichment for certain RNA types in the treated group (grey points in the volcano plot). If these are robustly increased in most cells whereas most other genes in the single-cell data are non-uniformly expressed, perhaps they are driving the normalization?

volcano plot

# Standard Seurat workflow
merged_obj <- merge(x = obj_1, y = c(obj_2, obj_3, obj_4, obj_5, obj_6, obj_7, obj_8, obj_9, obj_10), project = "projectID")
merged_obj$percent_mt <- PercentageFeatureSet(merged_obj, pattern="^mt-")
filtered_obj <- subset(merged_ob, subset = nFeature_RNA >= 200 & nFeature_RNA <= 2500 & percent_mt <= 5)
filtered_obj <- NormalizeData(filtered_obj)
filtered_obj <- FindVariableFeatures(filtered_obj)
filtered_obj <- ScaleData(filtered_obj)
filtered_obj <- RunPCA(filtered_obj)
int_obj <- IntegrateLayers(object = filtered_obj, method = CCAIntegration, orig.reduction = "pca", new.reduction = "integrated.cca", verbose = FALSE)
int_obj[["RNA"]] <- JoinLayers(int_obj[["RNA"]])
int_obj <- FindNeighbors(int_obj, reduction = "integrated.cca", dims = 1:30)
int_obj <- FindClusters(int_obj, resolution = 0.8, cluster.name = "integrated_clusters")
int_obj <- RunUMAP(int_obj, dims = 1:30, reduction = "integrated.cca")

# Annotated cells manually with RenameIdents()
ann_obj$cell_type <- Idents(ann_obj)
ann_obj$cell_type[Cells(ann_obj)] <- paste(Idents(ann_obj))

# Pseudobulk DEG analysis
pseudo_obj <- AggregateExpression(ann_obj, assays = "RNA", return.seurat = TRUE, group.by = c("celltype", "group", "sample"))
pseudo_obj$pseudo_celltype.group <- paste(pseudo_obj$cell_type, pseudo_obj$group, sep = "_")
Idents(pseudo_obj) <- "pseudo_celltype.group"
pseudobulk_DEG <- FindMarkers(object = pseudo_obj, ident.1 = "celltype_group2", ident.2 = "celltype_group1", test.use = "DESeq2")

ADD REPLY • link 11 months ago tjl • 0

0

Entering edit mode

My comment was generic. If your experiment causes drastic shifts then the situation might be different. The advise is the same: Try to use genes you think are non-DE as controlGenes. For more guidance I would need to actually see the data and do inspection myself.

ADD REPLY • link 11 months ago ATpoint ★ 4.8k