Question

Combining DESeq2 and Within-Group Variability to Identify Stochastic Genes

DL • 0

@0075aed2

United States

Hi DESeq2 Community,

I have a bulk RNA-seq dataset consisting of four groups; each group has n = 4 samples and is isogenic (genetically identical). All groups are at the same embryonic developmental stage. This setup minimizes the contribution of genetic and environmental variability, so I can focus on identifying genes whose high variability is due to stochasticity (noise).

My approach involves:

Using DESeq2 for differential expression (DE) analysis with each group as the reference:

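library(DESeq2)
library(dplyr)
library(tibble)  # rownames_to_column()

reference_groups <- c("PRENATAL_A", "PRENATAL_B", "PRENATAL_C", "PRENATAL_D")
de_results_list <- list()

for (ref_group in reference_groups) {
    dds <- DESeqDataSetFromMatrix(countData = filtCounts.pn, colData = mtd.pn, design = ~Group)
    # Make ref_group the reference level of the design variable
    # (assumes the Group column holds the PRENATAL_* labels)
    dds$Group <- relevel(factor(dds$Group), ref = ref_group)
    dds <- DESeq(dds)

    # With no arguments, results() returns only the last level of Group vs. ref_group
    de_res <- results(dds)
    de_res_df <- as.data.frame(de_res) %>% rownames_to_column(var = "Gene")
    # Keep genes passing the adjusted p-value threshold (rows with NA padj are dropped)
    de_res_df_sig <- de_res_df %>% filter(padj < 0.05)

    de_results_list[[ref_group]] <- de_res_df_sig
}

# Stack the per-reference tables, recording which reference group each row came from
combined_de_results <- bind_rows(de_results_list, .id = "Reference_Group")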

Calculating within-group variability (SD of VST counts) for each gene.
Combining DE results (padj < 0.05) with high variability (SD above 95th percentile).

With this approach, a gene is considered to have variable expression due to stochastic reasons if it is differentially expressed across groups and exhibits high variability within a group. I view DESeq2 as measuring horizontal differences (across groups) and standard deviation (SD) as measuring vertical differences (within groups), aiming to identify genes that are changing due to biological factors rather than technical noise.
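Steps 2 and 3 above can be sketched in base R roughly as follows. This is a minimal sketch under assumptions, not the code actually used: `within_group_sd`, `flag_candidates`, and the simulated matrix are hypothetical names for illustration. In the real analysis `vst_mat` would be something like `assay(vst(dds))` and `de_genes` would be `unique(combined_de_results$Gene)`. Summarizing each gene by its largest per-group SD is also an assumption, since the steps above do not say how the per-group SDs are combined.

```r
# Per-gene standard deviation within each group.
# vst_mat: genes x samples numeric matrix of transformed counts
# group:   vector of length ncol(vst_mat) giving each sample's group
within_group_sd <- function(vst_mat, group) {
  group <- factor(group)
  sds <- sapply(levels(group), function(g) {
    apply(vst_mat[, group == g, drop = FALSE], 1, sd)
  })
  rownames(sds) <- rownames(vst_mat)
  sds  # genes x groups matrix of SDs
}

# Genes that are both highly variable within a group and in the DE list.
flag_candidates <- function(sd_mat, de_genes, prob = 0.95) {
  # Assumption: summarize each gene by its most variable group
  max_sd <- apply(sd_mat, 1, max)
  cutoff <- quantile(max_sd, probs = prob)
  high_var <- names(max_sd)[max_sd > cutoff]
  intersect(high_var, de_genes)
}

# Simulated stand-in for VST counts: 200 genes, 4 groups of 4 samples
set.seed(1)
group <- rep(c("PRENATAL_A", "PRENATAL_B", "PRENATAL_C", "PRENATAL_D"), each = 4)
vst_mat <- matrix(
  rnorm(200 * 16, mean = 8, sd = 0.2),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("s", 1:16))
)
# Make gene1 noisy within PRENATAL_A only
vst_mat["gene1", 1:4] <- c(4, 12, 6, 10)

sd_mat <- within_group_sd(vst_mat, group)
# Hypothetical DE list standing in for the padj < 0.05 genes
candidates <- flag_candidates(sd_mat, de_genes = c("gene1", "gene2"))
```

Keeping the full genes x groups SD matrix makes it possible to check whether a flagged gene is noisy in one group only or in all of them.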

Question:

Is this strategy of combining DESeq2 with within-group variability valid for identifying genes with stochastic expression, or is it conceptually flawed?
Are there better methods within DESeq2 to integrate mean differences and variability?

I have limited feedback from my advisor and community, so any insights on refining this methodology would be greatly appreciated.

With appreciation of your time, DL

ADD COMMENT • link 7 months ago DL • 0

score 1 · Answer 1 · 2024-08-09

Michael Love 43k

@mikelove

United States

Another option instead of SD of VST would be to look at the plotDispEsts plot and think of high variability as a dispersion estimate that is far above the trend line for genes with the same mean.

You can obtain this with:

dist_from_fit <- with(mcols(dds), dispGeneEst - dispFit)

This kind of measure of high variance is often used in single-cell analysis.
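One way to turn the quantity above into a gene ranking is sketched below. The helper `rank_by_dispersion_excess`, the `min_mean` filter, and the toy table are hypothetical and only for illustration. Comparing on the log2 scale in addition to the raw difference is an assumption, not something stated above. In a real analysis the input would be `as.data.frame(mcols(dds))` after running `DESeq()`, which includes the `baseMean`, `dispGeneEst`, and `dispFit` columns.

```r
# Rank genes by how far the gene-wise dispersion sits above the fitted trend.
# disp_tbl: data frame with columns baseMean, dispGeneEst, dispFit
# min_mean: hypothetical filter to skip very low-count genes
rank_by_dispersion_excess <- function(disp_tbl, min_mean = 10) {
  keep <- !is.na(disp_tbl$dispGeneEst) & disp_tbl$baseMean >= min_mean
  out <- disp_tbl[keep, , drop = FALSE]
  # Raw difference, as in the one-liner above
  out$dist_from_fit <- out$dispGeneEst - out$dispFit
  # Assumption: a log2 ratio, so the excess is relative to the trend value
  out$log2_ratio <- log2(out$dispGeneEst / out$dispFit)
  out[order(out$log2_ratio, decreasing = TRUE), , drop = FALSE]
}

# Toy table standing in for as.data.frame(mcols(dds))
toy <- data.frame(
  row.names   = c("geneA", "geneB", "geneC", "geneD"),
  baseMean    = c(500, 500, 50, 5),
  dispGeneEst = c(0.40, 0.05, 0.20, 2.00),
  dispFit     = c(0.05, 0.05, 0.20, 0.50)
)

ranked <- rank_by_dispersion_excess(toy)
head(ranked)
```

Genes at the top of `ranked` have gene-wise dispersion estimates well above the trend for their mean expression, which makes them candidates for unusually high within-group variability.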

ADD COMMENT • link 7 months ago Michael Love 43k

That's interesting! I will explore this option. Thank you, Michael!

ADD REPLY • link 7 months ago DL • 0