DESeq: Gene is not flagged as differentially expressed despite count data showing differences
1
0
Entering edit mode
ktz • 0
@04e46025
Last seen 50 minutes ago
United Kingdom

Hi everyone,

I am analysing an RNAseq dataset where despite the count data showing differences between groups of interest (and not much within group), a gene is not being flagged as differentially expressed. I would like to understand why this is and what I can do!

The coldata is as follows:

enter image description here

My code for finding DE genes between groups while controlling for replicates is as follows:


dds <- DESeqDataSetFromTximport(txi, colData = meta, 
                                             design = ~Rep + Group, 
                                             rowData = rowdata)

keep <- filterByExpr(dds, group = dds$Group)
dds <- dds[keep,]

dds <- estimateSizeFactors(dds)

deseq <- DESeq(dds)

res <- results(deseq, contrast = c("Group", "V_VC", "V_VC_CD45"), alpha = 0.05, tidy = TRUE)

sessionInfo( )
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8    LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] ashr_2.2-63                 edgeR_4.4.2                 limma_3.62.2                tximport_1.34.0             biomaRt_2.62.1             
 [6] pheatmap_1.0.12             DESeq2_1.46.0               SummarizedExperiment_1.36.0 Biobase_2.66.0              MatrixGenerics_1.18.1      
[11] matrixStats_1.5.0           GenomicRanges_1.58.0        GenomeInfoDb_1.42.3         IRanges_2.40.1              S4Vectors_0.44.0           
[16] BiocGenerics_0.52.0         here_1.0.1                  tibble_3.2.1                readr_2.1.5                 openxlsx_4.2.8             
[21] dplyr_1.1.4                

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1        farver_2.1.2            blob_1.2.4              filelock_1.0.3          Biostrings_2.74.1       fastmap_1.2.0          
 [7] BiocFileCache_2.14.0    digest_0.6.37           lifecycle_1.0.4         statmod_1.5.0           KEGGREST_1.46.0         invgamma_1.1           
[13] RSQLite_2.3.9           magrittr_2.0.3          compiler_4.4.2          rlang_1.1.5             progress_1.2.3          tools_4.4.2            
[19] labeling_0.4.3          prettyunits_1.2.0       S4Arrays_1.6.0          bit_4.5.0.1             curl_6.2.1              DelayedArray_0.32.0    
[25] xml2_1.3.6              RColorBrewer_1.1-3      abind_1.4-8             BiocParallel_1.40.0     purrr_1.0.4             withr_3.0.2            
[31] grid_4.4.2              colorspace_2.1-1        ggplot2_3.5.1           scales_1.3.0            cli_3.6.4               crayon_1.5.3           
[37] generics_0.1.3          httr_1.4.7              tzdb_0.4.0              DBI_1.2.3               cachem_1.1.0            stringr_1.5.1          
[43] zlibbioc_1.52.0         parallel_4.4.2          AnnotationDbi_1.68.0    XVector_0.46.0          vctrs_0.6.5             Matrix_1.7-1           
[49] jsonlite_1.9.0          hms_1.1.3               mixsqp_0.3-54           bit64_4.6.0-1           irlba_2.3.5.1           locfit_1.5-9.11        
[55] glue_1.8.0              codetools_0.2-20        stringi_1.8.4           gtable_0.3.6            UCSC.utils_1.2.0        munsell_0.5.1          
[61] pillar_1.10.1           rappdirs_0.3.3          truncnorm_1.0-9         GenomeInfoDbData_1.2.13 R6_2.6.1                dbplyr_2.5.0           
[67] httr2_1.1.0             rprojroot_2.0.4         vroom_1.6.5             lattice_0.22-6          png_0.1-8               SQUAREM_2021.1         
[73] memoise_2.0.1           renv_1.0.11             Rcpp_1.0.14             zip_2.3.2               SparseArray_1.6.1       pkgconfig_2.0.3

However, when I looked at my gene of interest I was surprised to see that it was not flagged as significantly differentially expressed. Here are the stats for the gene: enter image description here

This was surprising to me since the normalised count data for this gene looks like it is significantly DE between my groups of interest ("V_VC" vs "V_VC_CD45") and in a fairly consistent manner between the replicates (columns 1,2 and 7,8):

enter image description here

Here is the unnormalised count data just FYI:

enter image description here

Here is an example of a gene with similar count differences (both between and within groups) but has been flagged as significant:

enter image description here

normalised count data for this additional gene:

enter image description here

I saw this post ( enter link description here ) about a similar issue but the conclusion, if I understood correctly, was that the gene was interest was not flagged due to high within group count variability. In my case, since there doesn't seem to be as much count variability, I don't understand why that gene was not flagged as differentially expressed.

I would appreciate any help in understanding what's going on and what I can do about this.

Thank you!

DifferentialExpression DESeq2 • 746 views
ADD COMMENT
0
Entering edit mode

since there doesn't seem to be as much count variability,

No? Your CD45 samples range on normalized scale from 2 to 3255. That's 3 orders of magnitude. I would call that quite some heavy variability. I hope I typed that all down correctly, next time please use dput to provide data for reproduction:

d <- data.table::fread(
  text = "CD45  value   batch
n   18710   a
y   3255    a
n   23586   a
y   1277    a
y   5   a
y   27  a
n   18478   b
y   2905    b
y   40  b
n   13706   b
y   23  b
y   2   b"
)

library(tidyverse)

ggplot(data = d, aes(x = CD45, y = log2(value+1), color = batch)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(position = position_jitterdodge(.2, 0, .75, 1), size = 3)

enter image description here

ADD REPLY
1
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

The simple answer is that the within-group variability is much higher for the first gene as compared to the second. The Wald statistic is just the log fold change divided by the logSE, which heuristically is asking the question 'Are the differences between groups large as compared to the differences within groups?'. And in the case of the first statistic, the answer is no, not really.

Login before adding your answer.

Traffic: 461 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6