Question

Unexpectedly low number of DE genes?

0

Entering edit mode

mhalache • 0

@e91d524f

Last seen 2.3 years ago

United Kingdom

Hello!

This is my first attempt to do DESeq2 analysis. I followed the vignettes and tutorials I could find online as best, but got unexpectedly low number of DE genes. In the experiment, there are 2 subjects (X90 and X92) and 2 paired samples from each (Tom=control and Venus=treated). The counts were obtained using salmon, and for the DESeq2 analysis (DESeq2_1.36.0) I have used the salmon.merged.gene_counts_length_scaled.tsv, as recommended.

salmon_table<-read.table("results/star_salmon/salmon.merged.gene_counts_length_scaled.tsv",
                          sep="\t",header=TRUE,row.names="gene_id")
cts <- salmon_table[,c(2:5)]      # sub-setting to exclude a subject with problematic sequencing quality  
cts <- as.matrix(round(cts))

coldata<-read.csv("metadata.csv",header=TRUE,row.names=1)
coldata<-coldata[1:4,]              # sub-setting to exclude a subject with problematic sequencing quality

dds<- DESeqDataSetFromMatrix(countData = cts,
                                         colData = coldata,
                                         design= ~ mouse+genotype)

dds <- dds[ rowSums(counts(dds)) > 1, ]

vsd<-vst(dds, blind=TRUE)
pcaData <- plotPCA(vsd, intgroup=c("mouse","genotype"), returnData=TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=genotype, shape=mouse)) +
  geom_point(size=3) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) +
  coord_fixed() +
  geom_label_repel(aes(label = name), colour = I(alpha("black", 0.85)), 
                    size = 3,box.padding = 1.5,fill="white",max.overlaps = 30 );

enter image description here

dds<-DESeq(dds)

res_genotype_Venus_vs_Tom_noshrink<- results(dds, contrast=c("genotype","Venus","Tom"), independentFiltering=TRUE)

summary(res_genotype_Venus_vs_Tom_noshrink)
out of 30674 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 12, 0.039%
LFC < 0 (down)     : 57, 0.19%
outliers [1]       : 0, 0%
low counts [2]     : 16057, 52%
(mean count < 20)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

About half (52%) of the genes were filtered due to low counts (<20). Looking at the MA plot, it appears that there are some genes high-ish log2FC exhibiting such average counts.

plot(log2(res_genotype_Venus_vs_Tom_noshrink$baseMean), 
     res_genotype_Venus_vs_Tom_noshrink$log2FoldChange)

enter image description here

I had to plot it as above, since the default plotting command plotMA() produced

plotMA(res_genotype_Venus_vs_Tom_noshrink)

enter image description here

I also performed shrinking and it looks like the genes with < 2^^5 = 32 counts have shrunk log2FC set to ~0 (which, from what I read online can be due to either "low counts" or "high dispersion")

res_genotype_Venus_vs_Tom <- lfcShrink(dds, coef="genotype_Venus_vs_Tom", 
   type="apeglm",res=res_genotype_Venus_vs_Tom_noshrink)

plot(log2(res_genotype_Venus_vs_Tom$baseMean), 
  res_genotype_Venus_vs_Tom$log2FoldChange)

enter image description here

My questions are: am I doing something wrong? Am I missing some DE genes from the mean count < 20 region? And how DESeq2 arrived at the cut-off value of 20?

Thank you in advance for any insight/suggestions!

sessionInfo( )
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.5 (Nitrogen)

Matrix products: default
BLAS:   /exports/eddie3_apps_local/apps/community/roslin/R/4.2.0/lib64/R/lib/libRblas.so
LAPACK: /exports/eddie3_apps_local/apps/community/roslin/R/4.2.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] apeglm_1.18.0               PCAtools_2.8.0
 [3] tximport_1.24.0             readr_2.1.3
 [5] tzdb_0.3.0                  RColorBrewer_1.1-3
 [7] edgeR_3.38.4                limma_3.52.4
 [9] patchwork_1.1.2             viridis_0.6.2
[11] viridisLite_0.4.0           EnhancedVolcano_1.14.0
[13] ggrepel_0.9.1               cowplot_1.1.1
[15] org.Mm.eg.db_3.15.0         AnnotationDbi_1.58.0
[17] DOSE_3.22.1                 clusterProfiler_4.4.4
[19] pheatmap_1.0.12             ggplot2_3.3.6
[21] DESeq2_1.36.0               memoise_2.0.1
[23] SummarizedExperiment_1.26.1 Biobase_2.56.0
[25] MatrixGenerics_1.8.1        matrixStats_0.62.0
[27] GenomicRanges_1.48.0        GenomeInfoDb_1.32.4
[29] IRanges_2.30.0              S4Vectors_0.34.0
[31] BiocGenerics_0.42.0         Matrix_1.5-1

loaded via a namespace (and not attached):
  [1] shadowtext_0.1.2          fastmatch_1.1-3
  [3] plyr_1.8.7                igraph_1.3.5
  [5] lazyeval_0.2.2            splines_4.2.0
  [7] BiocParallel_1.30.3       digest_0.6.29
  [9] yulab.utils_0.0.5         GOSemSim_2.22.0
 [11] GO.db_3.15.0              fansi_1.0.3
 [13] magrittr_2.0.3            ScaledMatrix_1.4.1
 [15] Biostrings_2.64.1         annotate_1.74.0
 [17] graphlayouts_0.8.2        bdsmatrix_1.3-6
 [19] enrichplot_1.16.2         colorspace_2.0-3
 [21] blob_1.2.3                dplyr_1.0.10
 [23] crayon_1.5.1              RCurl_1.98-1.9
 [25] jsonlite_1.8.2            scatterpie_0.1.8
 [27] genefilter_1.78.0         survival_3.3-1
 [29] ape_5.6-2                 glue_1.6.2
 [31] polyclip_1.10-0           gtable_0.3.0
 [33] zlibbioc_1.42.0           XVector_0.36.0
 [35] DelayedArray_0.22.0       BiocSingular_1.12.0
 [37] scales_1.2.0              mvtnorm_1.1-3
 [39] DBI_1.1.3                 Rcpp_1.0.9
 [41] emdbook_1.3.12            xtable_1.8-4
 [43] gridGraphics_0.5-1        tidytree_0.4.1
 [45] dqrng_0.3.0               bit_4.0.4
 [47] rsvd_1.0.5                httr_1.4.4
 [49] fgsea_1.22.0              ellipsis_0.3.2
 [51] pkgconfig_2.0.3           XML_3.99-0.11
 [53] farver_2.1.1              locfit_1.5-9.6
 [55] utf8_1.2.2                labeling_0.4.2
 [57] ggplotify_0.1.0           tidyselect_1.1.2
 [59] rlang_1.0.4               reshape2_1.4.4
 [61] munsell_0.5.0             tools_4.2.0
 [63] cachem_1.0.6              downloader_0.4
 [65] cli_3.3.0                 generics_0.1.3
 [67] RSQLite_2.2.18            stringr_1.4.0
 [69] fastmap_1.1.0             ggtree_3.4.4
 [71] bit64_4.0.5               tidygraph_1.2.2
 [73] purrr_0.3.5               KEGGREST_1.36.3
 [75] ggraph_2.0.6              nlme_3.1-158
 [77] sparseMatrixStats_1.8.0   aplot_0.1.7
 [79] DO.db_2.9                 compiler_4.2.0
 [81] png_0.1-7                 treeio_1.20.2
 [83] tibble_3.1.7              tweenr_2.0.2
 [85] geneplotter_1.74.0        stringi_1.7.8
 [87] lattice_0.20-45           vctrs_0.4.1
 [89] pillar_1.8.0              lifecycle_1.0.1
 [91] data.table_1.14.2         bitops_1.0-7
 [93] irlba_2.3.5.1             qvalue_2.28.0
 [95] R6_2.5.1                  gridExtra_2.3
 [97] codetools_0.2-18          MASS_7.3-58
 [99] withr_2.5.0               GenomeInfoDbData_1.2.8
[101] parallel_4.2.0            hms_1.1.2
[103] grid_4.2.0                ggfun_0.0.7
[105] beachmat_2.12.0           coda_0.19-4
[107] tidyr_1.2.1               DelayedMatrixStats_1.18.1
[109] bbmle_1.0.25              ggforce_0.4.1
[111] numDeriv_2016.8-1.1

DESeq2 • 1.0k views

ADD COMMENT • link 2.3 years ago mhalache • 0

score 0 · Answer 1 · 2022-10-17

0

Entering edit mode

swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 2 days ago

San Diego

1) Looking at your PCA, I am not surprised that you don't have many DE genes. You have very little power with so few samples, and the first PC is clearly associated with mouse, not genotype. 2) I think your MAPlot looks like that because limma has an MAplot command as well, and R is using that, not DESeq's. Specify that you want to use DESeq's version, and it should work the way you expect.

I see no reason to think that DESeq is not giving you the correct results for the experiment you ran. You have very few samples, and your read counts are low.