I'm getting different p-values and fold changes from DESeq2 simply by changing the order of the columns in the count table, and I can't understand why. I made sure the columns of the count table match the rows of colData in both runs.
The differences with `betaPrior=TRUE` are small ([-0.003, 0.003] for p-value and [-0.002, 0.001] for FC). But if I use `betaPrior=FALSE` and then shrink the FC with `apeglm` in `lfcShrink`, the difference in log2FC can be up to 0.8 (and the baseMean for that gene is 300), which is concerning. The difference in log2FoldChange before shrinking is much smaller: [-4.5e-05, 4.5e-05].
Column `Genotype` in colData is the only variable tested in the model. It is a factor with levels `WT` and `Mutant`, with `WT` as the first level.
Interestingly, if I change column `Genotype` to character type and rerun DESeq2 (so DESeq2 converts it to a factor itself, with a warning message), I get identical results regardless of the column/row order of the inputs. So the difference seems somehow related to the factorization of the columns in colData, but I don't know why, or which results I should trust.
The input data (data.rds) can be downloaded here: https://github.com/weishwu/test_data/blob/main/data.rds?raw=true
data <- readRDS('data.rds')
dds1 <- DESeqDataSetFromMatrix(countData = data[['counts']], colData = data[['colData']], design = ~ Genotype)
dds1 <- dds1[rowSums(counts(dds1)) > 1, ]
dds1 <- DESeq(dds1, betaPrior = FALSE)
res1 <- as.data.frame(lfcShrink(dds1, coef='Genotype_Mutant_vs_WT', type='apeglm'))
# only reorder the inputs, but the columns of counts still match the rows of colData:
dds2 <- DESeqDataSetFromMatrix(countData = data[['counts']][,c(4:6,1:3)], colData = data[['colData']][c(4:6,1:3),], design = ~ Genotype)
dds2 <- dds2[rowSums(counts(dds2)) > 1, ]
dds2 <- DESeq(dds2, betaPrior = FALSE)
res2 <- as.data.frame(lfcShrink(dds2, coef='Genotype_Mutant_vs_WT', type='apeglm'))
# difference of log2FC after shrinking
range(res1$log2FoldChange-res2$log2FoldChange)
# difference of log2FC before shrinking
range(results(dds1)$log2FoldChange-results(dds2)$log2FoldChange)
sessionInfo()
[1] -0.82441013 0.01714274
[1] -4.515867e-05 4.509163e-05
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)
Matrix products: default
BLAS/LAPACK: /opt/conda/envs/WAT_diffex/lib/libopenblasp-r0.3.20.so
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.0
[3] dplyr_1.0.9 purrr_0.3.4
[5] readr_2.1.2 tidyr_1.2.0
[7] tibble_3.1.7 ggplot2_3.3.6
[9] tidyverse_1.3.1 DESeq2_1.34.0
[11] SummarizedExperiment_1.24.0 Biobase_2.54.0
[13] MatrixGenerics_1.6.0 matrixStats_0.62.0
[15] GenomicRanges_1.46.1 GenomeInfoDb_1.30.0
[17] IRanges_2.28.0 S4Vectors_0.32.3
[19] BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] bitops_1.0-7 fs_1.5.2 lubridate_1.8.0
[4] bit64_4.0.5 RColorBrewer_1.1-3 httr_1.4.3
[7] numDeriv_2016.8-1.1 tools_4.1.3 backports_1.4.1
[10] utf8_1.2.2 R6_2.5.1 DBI_1.1.2
[13] colorspace_2.0-3 apeglm_1.16.0 withr_2.5.0
[16] tidyselect_1.1.2 bit_4.0.4 compiler_4.1.3
[19] cli_3.3.0 rvest_1.0.2 xml2_1.3.3
[22] DelayedArray_0.20.0 scales_1.2.0 mvtnorm_1.1-3
[25] genefilter_1.76.0 XVector_0.34.0 pkgconfig_2.0.3
[28] bbmle_1.0.25 dbplyr_2.2.0 fastmap_1.1.0
[31] rlang_1.0.2 readxl_1.4.0 rstudioapi_0.13
[34] RSQLite_2.2.8 generics_0.1.2 jsonlite_1.8.0
[37] BiocParallel_1.28.3 RCurl_1.98-1.7 magrittr_2.0.3
[40] GenomeInfoDbData_1.2.7 Matrix_1.4-1 Rcpp_1.0.8.3
[43] munsell_0.5.0 fansi_1.0.3 lifecycle_1.0.1
[46] stringi_1.7.6 MASS_7.3-57 zlibbioc_1.40.0
[49] plyr_1.8.7 grid_4.1.3 blob_1.2.3
[52] parallel_4.1.3 bdsmatrix_1.3-6 crayon_1.5.1
[55] lattice_0.20-45 Biostrings_2.62.0 haven_2.5.0
[58] splines_4.1.3 annotate_1.72.0 hms_1.1.1
[61] KEGGREST_1.34.0 locfit_1.5-9.5 pillar_1.7.0
[64] geneplotter_1.72.0 reprex_2.0.1 XML_3.99-0.9
[67] glue_1.6.2 modelr_0.1.8 png_0.1-7
[70] vctrs_0.4.1 tzdb_0.3.0 cellranger_1.1.0
[73] gtable_0.3.0 assertthat_0.2.1 cachem_1.0.6
[76] emdbook_1.3.12 xtable_1.8-4 broom_0.8.0
[79] coda_0.19-4 survival_3.3-1 AnnotationDbi_1.56.1
[82] memoise_2.0.1 ellipsis_0.3.2
ATpoint Thanks a lot for your reply. I cannot reproduce the problem using `makeExampleDESeqDataSet` either. However, if I replace the count table with my real data, I get the differences again. Also, if you look at the difference in `log2FoldChange` between `dds1` and `dds2` even for the data produced by `makeExampleDESeqDataSet`, you will still see a tiny difference. The difference is on the scale of 1e-15 with `n=1000` (the default) in `makeExampleDESeqDataSet`, but goes up to 1e-9 with `n=50000` (which is closer to my real data). Somehow this difference gets much larger (1e-3) with my input data. Would it be possible for me to send you a count table?
By the way, I have other columns in addition to `Genotype` in my colData, so I didn't have to use `drop=F`. But I only used `Genotype` in the model.
I am not the DESeq2 maintainer, just a guy who answers questions to improve myself. But yes, please feel free to upload a reproducible example so the developer or others can have a look.
I edited my post to include the link to my input data (de-identified). I also changed the code to use `betaPrior=F` and then `lfcShrink(type='apeglm')` to shrink the FC. This produced up to a 0.8 difference in the shrunken log2FC between `dds1` and `dds2`. Thanks for your time.
I get only a small numerical difference on the shrunken LFCs with your code and data:
Michael Love Thanks a lot for helping me troubleshoot. I can see that we are both using version `1.34.0`, so I'm really puzzled about where the discrepancy could come from. I reinstalled DESeq2 and apeglm using conda (https://anaconda.org/bioconda/bioconductor-deseq2 and https://anaconda.org/bioconda/bioconductor-apeglm), ran my code with the same input, and got the same results as before. This is so weird.
What does the delta LFC ~0.8 gene look like in terms of its counts? Aside from the fact that we get different numbers, which I don't understand, can you just remove genes like this (maybe mostly 0 counts) at the top of your script?
It has normal counts:
I'm also using the same apeglm. No idea.
FWIW I get:
If it's not too troublesome, I wonder if you could make a docker/singularity image file containing your DESeq2 (if it's not from the same conda source) and share it with me?
I'm a little limited on bandwidth right now, so don't think I can put an image together. I did just run on a Linux machine and also I get low differences in LFC from convergence:
Why do the results differ in different OS?
Could you post the content of
to get more info on that gene? Maybe the OS dependency comes from different BLAS/LAPACK versions?
I tried the above (with `betaPrior=FALSE` and apeglm) on macOS, on two different Linux machines (Ubuntu and CentOS), both inside and outside of Docker (Ubuntu), and via conda, and the maximum differences I get are somewhere in the range of 1e-8 to 1e-13. I also tried disabling the implicit BLAS multithreading that is often the default on Linux machines with `RhpcBLASctl::blas_set_num_threads(1)` (one thread = disabled), which makes no difference; nor does setting seeds before executing all steps. Could this be some kind of machine-precision issue specific to your machine?
Thanks a lot for spending time trying it in so many ways. I also tried it on a local PC and got a 1e-8 scale difference. It seems I can only reproduce the issue on our server (RedHat 7.9, 3.10.0-1160.62.1.el7.x86_64). No matter which version of DESeq2 (1.26, 1.34, or 1.36) or which source (conda/Docker) I tried, I got the same up-to-0.8 difference as long as I ran it on our server. I had one of my colleagues run the same code with the same input, and she observed basically the same pattern. How this can be specific to the machine/OS is mysterious.
apeglm uses `optim_lbfgs` from RcppNumerical for optimization, and that could converge differently on different backends, but I'm still surprised that it seems limited to this one setup. Can you try something else: can you remove low count genes before running DESeq()?
Just to see if it has to do with sensitivity to these genes (the prior should not be sensitive to them, as we've constructed it...).
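A typical way to apply such a pre-filter (a sketch; the exact cutoff was not specified above, so the threshold of 10 here is illustrative):

```r
# Illustrative pre-filter: keep only genes with at least 10 total
# counts across samples, applied before running DESeq()
keep <- rowSums(counts(dds1)) >= 10
dds1 <- dds1[keep, ]
dds1 <- DESeq(dds1, betaPrior = FALSE)
```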
I added the filtering as you suggested, and the differences became smaller (both before and after shrinking using apeglm):
I would recommend this step. Ideally it wouldn't be necessary, but anyway it's a common first step for many statistical routines to remove the very low counts that have no power.
Thanks. I'm incorporating this filter in all my comparisons. Thanks for your help!