Hi,
Getting an unexpected error with DESeq2.
> dds = DESeq(dds)
using pre-existing normalization factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 792 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
Error in `rownames<-`(`*tmp*`, value = names(x)) :
duplicate rownames not allowed
Of course I checked rownames are unique:
> rn = rownames(dds.ed)
> rn[duplicated(rn)]
character(0)
I also tried setting new rownames with rownames(dds) = 1:length(dds), but I still get this error.
I've tried both installing the OS X binary and compiling from source, with the same result.
I must be missing something obvious. Any ideas?
thanks,
Ashley
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] splines parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] cqn_1.26.0 quantreg_5.36 SparseM_1.77 preprocessCore_1.42.0 nor1mix_1.2-3
[6] mclust_5.4.1 edgeR_3.22.5 limma_3.36.5 DESeq2_1.20.0 SummarizedExperiment_1.10.1
[11] DelayedArray_0.6.6 BiocParallel_1.14.2 matrixStats_0.54.0 Biobase_2.40.0 forcats_0.3.0
[16] dplyr_0.7.6 purrr_0.2.5 tidyr_0.8.1 tibble_1.4.2 ggplot2_3.0.0.9000
[21] tidyverse_1.2.1 readr_1.1.1 stringr_1.3.1 rtracklayer_1.40.6 GenomicRanges_1.32.7
[26] GenomeInfoDb_1.16.0 IRanges_2.14.12 S4Vectors_0.18.3 BiocGenerics_0.26.0 BiocInstaller_1.30.0
loaded via a namespace (and not attached):
[1] colorspace_1.3-2 htmlTable_1.12 XVector_0.20.0 base64enc_0.1-3 rstudioapi_0.8 MatrixModels_0.4-1
[7] bit64_0.9-7 AnnotationDbi_1.42.1 lubridate_1.7.4 xml2_1.2.0 geneplotter_1.58.0 knitr_1.20
[13] Formula_1.2-3 jsonlite_1.5 Rsamtools_1.32.3 broom_0.5.0 annotate_1.58.0 cluster_2.0.7-1
[19] compiler_3.5.1 httr_1.3.1 backports_1.1.2 assertthat_0.2.0 Matrix_1.2-14 lazyeval_0.2.1
[25] cli_1.0.1 acepack_1.4.1 htmltools_0.3.6 tools_3.5.1 bindrcpp_0.2.2 gtable_0.2.0
[31] glue_1.3.0 GenomeInfoDbData_1.1.0 Rcpp_0.12.19 cellranger_1.1.0 Biostrings_2.48.0 nlme_3.1-137
[37] rvest_0.3.2 XML_3.98-1.16 zlibbioc_1.26.0 scales_1.0.0 hms_0.4.2 RColorBrewer_1.1-2
[43] yaml_2.2.0 memoise_1.1.0 gridExtra_2.3 rpart_4.1-13 latticeExtra_0.6-28 stringi_1.2.4
[49] RSQLite_2.1.1 genefilter_1.62.0 checkmate_1.8.5 rlang_0.2.2 pkgconfig_2.0.2 bitops_1.0-6
[55] lattice_0.20-35 bindr_0.1.1 GenomicAlignments_1.16.0 htmlwidgets_1.3 bit_1.1-14 tidyselect_0.2.4
[61] plyr_1.8.4 magrittr_1.5 R6_2.3.0 Hmisc_4.1-1 DBI_1.0.0 pillar_1.3.0
[67] haven_1.1.2 foreign_0.8-71 withr_2.1.2 survival_2.42-6 RCurl_1.95-4.11 nnet_7.3-12
[73] modelr_0.1.2 crayon_1.3.4 locfit_1.5-9.1 grid_3.5.1 readxl_1.1.0 data.table_1.11.8
[79] blob_1.1.1 digest_0.6.17 xtable_1.8-3 munsell_0.5.0
I’m not sure if I can figure out what’s going on because it doesn’t throw this error in our tests. I’ll take a look at the code, but may not find the issue.
I’d say you can also just assess outliers by eye, looking at a few example peaks with large values of maxCooks in mcols(dds), rather than using the outlier replacement heuristic.
Can you show mcols(dds) before you run DESeq()? Are there any additional columns there?
Hello,
Just wanted to add that I had the same issue: https://www.biostars.org/p/343037/
Setting minRep=Inf (short for minReplicatesForReplace=Inf) in DESeq() also fixed the problem for me, and it does look like I had a few outliers in the post-DESeq dds.
When I tried to graph outliers in the pre-DESeq dds using the method described in that post (bottom), I got this error:
Error in apply(assays(dds_kal_agg)[["cooks"]], 1, max) :
dim(X) must have a positive length
Hope this helps.
Kristin
(edit - realizing that the error message is because Cook's distances have not been calculated for dds_kal_agg, since it is pre-DESeq. Is there another feature of mcols I should check out? It looks pretty empty:)
Can you send the dds to the address given by maintainer("DESeq2")? I’ll try to hunt down the bug.
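For anyone trying the workaround and the by-eye check mentioned above, here is a minimal sketch on simulated data. It assumes DESeq2 is installed; the dataset from makeExampleDESeqDataSet() and its sizes are purely illustrative and are not the poster's data.

```r
library(DESeq2)

# Simulated stand-in dataset (illustrative only): 200 genes, 14 samples
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 14)

# Turn off the outlier replacement step by requiring an infinite
# number of replicates before any count is replaced
dds <- DESeq(dds, minReplicatesForReplace = Inf)

# maxCooks only exists in mcols(dds) after DESeq() has run;
# list the rows with the largest values to inspect by eye
max_cooks <- mcols(dds)$maxCooks
top_rows <- head(order(max_cooks, decreasing = TRUE))
print(data.frame(row = rownames(dds)[top_rows], maxCooks = max_cooks[top_rows]))
```

The rows printed at the end are the ones worth plotting individually, e.g. with plotCounts().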
Thank you, I was able to reproduce with v1.20.
The problem is that you have duplicate column names in colData(dds), which breaks some code where replaceOutliers adds a column to colData(dds) and adds some metadata about that column.
sum(duplicated(colnames(colData(dds))))
"Line" and "DESeqAnalysisID" columns both have duplicates.
So a solution is to only have unique column names for colData(dds), which is probably a good idea anyway.
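A minimal sketch of that check and two possible fixes, using a plain data.frame as a toy stand-in for colData(dds). The object name cd and the "Batch" column are made up for illustration; only "Line" comes from this thread.

```r
# Toy stand-in for colData(dds): a table with a repeated column name.
# check.names = FALSE keeps the duplicate instead of silently renaming it.
cd <- data.frame(Line = c("a", "b"), Batch = c(1, 2), Line = c("a", "b"),
                 check.names = FALSE)

# Which column names occur more than once?
dup <- colnames(cd)[duplicated(colnames(cd))]
print(dup)

# Option 1: keep only the first occurrence of each column name
cd_unique <- cd[, !duplicated(colnames(cd)), drop = FALSE]

# Option 2: keep every column but make the names unique
cd_renamed <- cd
colnames(cd_renamed) <- make.unique(colnames(cd))
print(colnames(cd_renamed))

# On a real object, the same idea would presumably be applied as
# (assumption, not run here):
# colData(dds) <- colData(dds)[, !duplicated(colnames(colData(dds))), drop = FALSE]
```

Option 1 is only appropriate if the repeated columns really hold the same information; otherwise renaming is the safer choice.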
I noticed that the error isn't thrown anyway in the development version, which will be released in a few weeks as v1.22.