Hi all,
I am an undergraduate researcher using DESeq2 for the first time. I am running into an issue with DESeqDataSetFromMatrix, where R tells me that my count data needs to be numeric.
When I check the class of my countData, it is a matrix. When I try to coerce the matrix with as.numeric, I get NAs. I've included my code and the error message below. Does anyone have any ideas on how to troubleshoot this?
Error in DESeqDataSet(se, design = design, ignoreRank) : counts matrix should be numeric, currently it has mode: character
Thanks, Molly
library(plyr)    # provides ldply
library(DESeq2)  # provides DESeqDataSetFromMatrix
Samples <- c("MM-0017-RNA-T-07", "MM-0623-RNA-T-01", "MM-0039-RNA-T-06")
inputs<-list()
for (i in seq_along(Samples)) {
  inputs[[i]] <- paste0("/data1/users/molly/", "ciri_out/", Samples[i], ".ciri.output")
}
names(inputs) <- Samples
combined.df <- ldply(inputs, function(x) {
  # Read one CIRI output file and keep its first five columns
  read.table(file = x, sep = "\t", header = TRUE, comment.char = "",
             stringsAsFactors = FALSE)[, 1:5]
})
colnames(combined.df)[1] <- "sampleID"
counts <- as.matrix(combined.df)
coldata<- data.frame(sample_id=Samples, condition=c("B","B","B","M","M","M"))
coldata$sample_id <- paste0(coldata$sample_id,"-",coldata$condition) #MR
rownames(coldata) <- coldata$sample_id
coldata$condition <- factor(coldata$condition)
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = ~ condition)
# Error in DESeqDataSet(se, design = design, ignoreRank) :
#   counts matrix should be numeric, currently it has mode: character
sessionInfo( )
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/lapack/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.26.0 SummarizedExperiment_1.16.1 DelayedArray_0.12.3 BiocParallel_1.20.1
[5] matrixStats_0.62.0 Biobase_2.46.0 GenomicRanges_1.38.0 GenomeInfoDb_1.22.1
[9] IRanges_2.20.2 S4Vectors_0.24.4 BiocGenerics_0.32.0 data.table_1.14.4
[13] forcats_0.5.2 stringr_1.4.1 purrr_0.3.5 readr_2.1.3
[17] tidyr_1.2.1 tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2
[21] dplyr_1.0.10 plyr_1.8.8 BiocManager_1.30.19
loaded via a namespace (and not attached):
[1] googledrive_2.0.0 colorspace_2.0-3 deldir_1.0-6 ellipsis_0.3.2 htmlTable_2.4.1
[6] XVector_0.26.0 base64enc_0.1-3 fs_1.5.2 rstudioapi_0.14 bit64_4.0.5
[11] AnnotationDbi_1.48.0 fansi_1.0.3 lubridate_1.9.0 xml2_1.3.3 splines_3.6.3
[16] cachem_1.0.6 geneplotter_1.64.0 knitr_1.40 Formula_1.2-4 jsonlite_1.8.3
[21] broom_1.0.1 annotate_1.64.0 cluster_2.1.4 dbplyr_2.2.1 png_0.1-7
[26] compiler_3.6.3 httr_1.4.4 backports_1.4.1 assertthat_0.2.1 Matrix_1.5-3
[31] fastmap_1.1.0 gargle_1.2.1 cli_3.4.1 htmltools_0.5.3 tools_3.6.3
[36] gtable_0.3.1 glue_1.6.2 GenomeInfoDbData_1.2.2 Rcpp_1.0.9 cellranger_1.1.0
[41] vctrs_0.5.0 xfun_0.34 rvest_1.0.3 timechange_0.1.1 lifecycle_1.0.3
[46] XML_3.99-0.3 googlesheets4_1.0.1 zlibbioc_1.32.0 scales_1.2.1 hms_1.1.2
[51] RColorBrewer_1.1-3 yaml_2.3.6 memoise_2.0.1 gridExtra_2.3 rpart_4.1.19
[56] latticeExtra_0.6-30 stringi_1.7.8 RSQLite_2.2.18 genefilter_1.68.0 checkmate_2.1.0
[61] rlang_1.0.6 pkgconfig_2.0.3 bitops_1.0-7 lattice_0.20-45 htmlwidgets_1.5.4
[66] bit_4.0.4 tidyselect_1.2.0 magrittr_2.0.3 R6_2.5.1 generics_0.1.3
[71] Hmisc_4.7-1 DBI_1.1.3 pillar_1.8.1 haven_2.5.1 foreign_0.8-71
[76] withr_2.5.0 survival_3.4-0 RCurl_1.98-1.9 nnet_7.3-12 modelr_0.1.10
[81] crayon_1.5.2 interp_1.1-3 utf8_1.2.2 tzdb_0.3.0 jpeg_0.1-9
[86] locfit_1.5-9.4 grid_3.6.3 readxl_1.4.1 blob_1.2.3 reprex_2.0.2
[91] digest_0.6.30 xtable_1.8-4 munsell_0.5.0
Thanks for your reply! Just to clarify - which column are you suggesting I remove?
The non-numeric one, which seems to be the first one. What is the output of the following?
combined.df[1:3,1:3]
The output is:
Well, as said above, the count data must be a numeric matrix; for DESeq2 that means non-negative integer counts. Remove all non-numeric columns. All of the columns shown here are non-numeric.
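To make the advice above concrete, here is a minimal sketch on a toy data frame. The data frame toy, its column names and its values are all made up for illustration and are not the real CIRI output. It shows why as.matrix on a data frame with any text column gives a character matrix, why as.numeric then returns NAs, and one possible way to keep only the numeric columns.

```r
# Toy stand-in for combined.df; names and values are hypothetical.
toy <- data.frame(
  sampleID = c("s1", "s2", "s3"),
  feature  = c("f1", "f2", "f3"),
  reads_a  = c(10L, 0L, 25L),
  reads_b  = c(3L, 7L, 12L),
  stringsAsFactors = FALSE
)

# A single character column is enough to turn the whole matrix into text.
m_bad <- as.matrix(toy)
mode(m_bad)  # "character"

# Text that is not a number becomes NA under as.numeric() (with a warning).
# as.numeric() also drops the matrix dimensions and returns a plain vector.
na_check <- suppressWarnings(as.numeric(m_bad[, "sampleID"]))

# Check which columns are numeric before building the matrix.
is_num <- sapply(toy, is.numeric)
names(toy)[!is_num]  # the columns that would have to be removed

# Keep the numeric columns only and move the identifier into the row names.
m_good <- as.matrix(toy[, is_num, drop = FALSE])
rownames(m_good) <- toy$feature
mode(m_good)  # "numeric"
```

Note that DESeqDataSetFromMatrix also expects one column of countData per row of colData (samples as columns, features as rows), so a stacked table like the one ldply returns would probably still need reshaping after the non-numeric columns are dealt with.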