Comparison of results obtained from a subset of the data versus the complete dataset for a few selected genes.
@hrishi27n-11821
United States

Hi,

I conducted a differential expression analysis with DESeq2 to investigate the effect of treatment in each sex. The dataset includes two treatment groups, A and B. To assess sex-specific treatment effects, I used two approaches: (1) fitting a single model to the full dataset, with sex and treatment combined into one grouping factor, and then extracting the sex-specific contrasts; and (2) subsetting the dataset by sex, running DESeq2 separately for each sex, and then extracting the treatment results. My PCA shows no separation by sex.

When I compared the log2 fold changes from the two approaches, the results were largely consistent, with most genes showing similar values. However, a few genes differed notably. One of them was the top hit in the full-dataset analysis, yet it was not even statistically significant in the subset-based analysis. Some discrepancy is expected, since p-values can vary between approaches because of differences in sample size and model structure. I also see that, for a few genes, the baseMean differs drastically between the two approaches.
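A likely contributor to the baseMean differences: DESeq2 reports baseMean as the mean of the normalized counts over all samples in the object, so the full-data value averages over both sexes while the subset value averages over males only. The base-R sketch below uses hypothetical counts for one gene (made up for illustration, not data from this study) and equal size factors to keep the arithmetic simple:

```r
# Hypothetical counts for one gene in 8 samples (4 female, 4 male);
# the values are illustrative only
counts <- c(F1 = 500, F2 = 520, F3 = 480, F4 = 500,
            M1 = 10,  M2 = 12,  M3 = 8,   M4 = 10)
sex <- c(rep("F", 4), rep("M", 4))

# Hypothetical size factors; equal values keep the arithmetic simple
size_factors <- rep(1, 8)

# Normalized counts = raw counts divided by each sample's size factor
normalized <- counts / size_factors

# baseMean: mean of normalized counts over the samples in the object
base_mean_full <- mean(normalized)              # full dataset, both sexes
base_mean_male <- mean(normalized[sex == "M"])  # male-only subset

base_mean_full  # 255
base_mean_male  # 10
```

A gene expressed mainly in one sex therefore gets a very different baseMean in the two analyses even if its male counts are identical in both.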

I would like to understand which approach should be used in such cases.

Whole Data

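```r
library(DESeq2)
library(dplyr)
library(tibble)  # rownames_to_column()

# Combine treatment cohort and sex into one grouping factor,
# e.g. "treated_A_M" or "treated_B_F"
pheno <- pheno %>%
  dplyr::mutate(sex_specific = paste(cohort, updated_sex, sep = '_')) %>%
  dplyr::mutate(sex_specific = as.factor(sex_specific))

# Group-means model fitted on all samples
dds_sex <- DESeqDataSetFromMatrix(countData = round(filtered_data),
                                  colData = pheno,
                                  design = ~0 + sex_specific)

dds_sex <- DESeq(dds_sex, parallel = TRUE)

# Treatment A vs. B within males, taken from the full model
male_compare <- results(dds_sex, contrast = c('sex_specific', 'treated_A_M', 'treated_B_M')) %>%
  data.frame() %>%
  dplyr::filter(!is.na(padj)) %>%
  dplyr::arrange(padj) %>%
  rownames_to_column(var = 'ensembl')
```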
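For reference, `contrast = c('sex_specific', 'treated_A_M', 'treated_B_M')` compares two coefficients of the group-means design `~0 + sex_specific`. A base-R sketch with a hypothetical sample table (`pheno_demo`, two samples per group, not the real `pheno`) shows the design-matrix columns and the equivalent numeric contrast vector:

```r
# Hypothetical sample table mirroring paste(cohort, updated_sex, sep = '_')
pheno_demo <- data.frame(
  cohort      = rep(c("treated_A", "treated_B"), each = 4),
  updated_sex = rep(c("F", "M"), times = 4)
)
pheno_demo$sex_specific <- factor(paste(pheno_demo$cohort,
                                        pheno_demo$updated_sex, sep = "_"))

# ~0 + sex_specific: one column (one log-scale group mean) per group
mm <- model.matrix(~0 + sex_specific, data = pheno_demo)
colnames(mm)

# The named contrast corresponds to +1 on the treated_A_M coefficient
# and -1 on the treated_B_M coefficient; females get weight 0
contrast_vec <- setNames(rep(0, ncol(mm)), colnames(mm))
contrast_vec["sex_specifictreated_A_M"] <- 1
contrast_vec["sex_specifictreated_B_M"] <- -1
contrast_vec
```

Because all four groups share one model, DESeq2 estimates a single dispersion per gene from all samples, females included, whereas the male-only fit uses the male samples alone. This is one reason p-values can differ between the two approaches even when the fold changes agree.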

Subset Data

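```r
# Model fitted on all samples; the size factors estimated here are
# carried into the male subset below
dds <- DESeqDataSetFromMatrix(countData = round(filtered_data),
                              colData = pheno,
                              design = ~0 + updated_sex + cohort)

dds <- DESeq(dds, parallel = TRUE)

# Keep only the male samples and refit with treatment as the only factor.
# DESeq() reuses the existing size factors but re-estimates the
# dispersions from the male samples alone.
dds_men_sub <- subset(dds, select = colData(dds)$updated_sex == 'M')
design(dds_men_sub) <- formula(~0 + cohort)

dds_men_sub <- DESeq(dds_men_sub, parallel = TRUE)

subset_method <- results(dds_men_sub, contrast = c('cohort', 'treated_A', 'treated_B')) %>%
  data.frame() %>%
  dplyr::filter(!is.na(padj)) %>%
  dplyr::arrange(padj)

# Reorder to match male_compare; genes missing from subset_method become NA rows
subset_method <- subset_method[male_compare$ensembl, ]

plot(subset_method$log2FoldChange, male_compare$log2FoldChange,
     xlab = 'log2FC, male subset model', ylab = 'log2FC, full model')
```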

(Figure: scatter plot of the log2 fold changes from the subset-based analysis versus the full-dataset analysis.)
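One way to line up the two results tables without row-name indexing (which returns `NA` rows for genes removed by the `!is.na(padj)` filter in only one table) is to join them on the gene ID and rank genes by how much their fold changes disagree. The helper `compare_lfc()` below is hypothetical and not part of DESeq2; the `lfc_tol` threshold of 1 and the toy tables are illustrative assumptions, not values from this analysis.

```r
# Hypothetical helper (not part of DESeq2): inner-join two results tables
# on the gene ID and flag genes whose log2 fold changes differ by > lfc_tol
compare_lfc <- function(full, sub, id_col = "ensembl", lfc_tol = 1) {
  keep <- c(id_col, "baseMean", "log2FoldChange", "padj")
  merged <- merge(full[, keep], sub[, keep],
                  by = id_col, suffixes = c("_full", "_sub"))
  merged$lfc_diff <- merged$log2FoldChange_full - merged$log2FoldChange_sub
  merged$discordant <- abs(merged$lfc_diff) > lfc_tol
  # Largest disagreements first
  merged[order(-abs(merged$lfc_diff)), ]
}

# Toy tables with made-up values, for illustration only
full_demo <- data.frame(ensembl = c("g1", "g2", "g3"),
                        baseMean = c(100, 255, 50),
                        log2FoldChange = c(1.0, 2.5, -0.5),
                        padj = c(0.01, 1e-6, 0.5))
sub_demo <- data.frame(ensembl = c("g1", "g2", "g3"),
                       baseMean = c(90, 10, 48),
                       log2FoldChange = c(1.1, 0.3, -0.4),
                       padj = c(0.05, 0.6, 0.7))

res <- compare_lfc(full_demo, sub_demo)
res
```

With the objects from the post, the subset table would first need its row names turned into an `ensembl` column (for example with `rownames_to_column()`, as done for `male_compare`), applied before the row-name reordering step. Genes present in only one table are dropped by the inner join rather than showing up as `NA` rows.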

@swbarnes2-14086
San Diego

In general, it is preferable to use the entire dataset and compare the subgroups you want within that single model.
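A minimal sketch of that approach, using DESeq2's simulated `makeExampleDESeqDataSet()` data in place of the real counts; the `sex`, `treatment`, and `group` columns and their values are hypothetical. It extracts the male A vs. B contrast from one model fitted on all samples and, as an optional extra, fits a sex × treatment interaction model whose interaction coefficient tests whether the treatment effect differs between sexes.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real counts: 500 genes x 16 samples
dds0 <- makeExampleDESeqDataSet(n = 500, m = 16)

# Hypothetical annotation: 4 samples in each sex x treatment group
dds0$sex       <- factor(rep(c("F", "M"), each = 8))
dds0$treatment <- factor(rep(rep(c("B", "A"), each = 4), times = 2),
                         levels = c("B", "A"))
dds0$group     <- factor(paste(dds0$treatment, dds0$sex, sep = "_"))

# Full-data group-means model, then the male A vs. B contrast
dds_group <- dds0
design(dds_group) <- ~0 + group
dds_group <- DESeq(dds_group, quiet = TRUE)
res_male <- results(dds_group, contrast = c("group", "A_M", "B_M"))

# Interaction model: "sexM.treatmentA" is the male treatment effect
# minus the female treatment effect (log2 scale)
dds_int <- dds0
design(dds_int) <- ~sex + treatment + sex:treatment
dds_int <- DESeq(dds_int, quiet = TRUE)
res_interaction <- results(dds_int, name = "sexM.treatmentA")

# Male treatment effect from the interaction model: main effect + interaction
res_male_int <- results(dds_int,
                        contrast = list(c("treatment_A_vs_B", "sexM.treatmentA")))

summary(res_male)
summary(res_interaction)
```

Both designs describe the same four group means, so `res_male_int` should give essentially the same fold changes as `res_male`; the interaction model additionally answers whether the treatment response differs by sex, which neither per-sex contrast tests on its own. The simulated data contain no real sex or treatment effect, so the sketch only shows the structure of the calls.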
