Hello,
I am trying to use the Bioconductor DESeq2 package, but I keep running into errors, and I would like to ask for advice on how to fix my code. The error message that I keep getting from the DESeqDataSetFromMatrix function is "Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed". I appreciate any advice you can provide.
# Loading libraries
library("DESeq2")
library("ggplot2")
# I am reading in the file I downloaded locally
library("readxl")
metadata_original <- read_excel("C:/Users/kevin/Downloads/MAYO_TCX_METADATA.xlsx")
TCX_original <- read_excel("C:/Users/kevin/Downloads/MAYO_TCX_Pipeline.xlsx")
# Here is how the metadata looks
individualID individualIdSource species sex race ethnicity yearsEducation ageDeath causeDeath mannerDeath apoeGenotype pmi pH brainWeight diagnosis diagnosisCriteria CERAD Braak thal
11492 MayoBrainBank Human male White NA NA 73 NA NA 33 1 NA NA progressive supranuclear palsy NA NA 3 0
6810 MayoBrainBank Human male White NA NA 74 NA NA 33 1 NA NA progressive supranuclear palsy NA NA 2 2
1046 MayoBrainBank Human female White NA NA 72 NA NA 33 2 NA NA Alzheimer Disease NA NA 6 5
1924 MayoBrainBank Human female White NA NA 90+ NA NA 33 2 NA NA control NA NA 2 NA
# Here is how some of the TCX data looks
ensembl_gene_id 11492 6810 1046 1924 1926 6913 892
ENSG00000227232 95 128 150 52 102 151 143
ENSG00000279457 242 407 204 367 409 510 196
ENSG00000228463 207 100 184 1 49 40 61
# The purpose of using tibble is to convert one column into row names.
# Some tutorials I read converted the data into this format, whereas others did not,
# so I am not sure if this is necessary. For example, it takes the entire gene
# column from the TCX file and makes it the row names.
library(tibble)
metadata <- data.frame(column_to_rownames(metadata_original, var = "individualID"))
TCX <- data.frame(column_to_rownames(TCX_original, var = "ensembl_gene_id"))
# For both TCX and metadata, this just turns one column into the row names
dds <- DESeqDataSetFromMatrix(countData = TCX,
                              colData = metadata,
                              design = ~sex, tidy = TRUE)
# The error I get is:
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘0’, ‘1’, ‘10’, ‘100’, ‘1000’, ‘10007’, ‘1001’, ‘1002’, ‘1003’, ‘1004’, ‘1005’, ‘10055’, ‘1007’, ‘10078’, ‘1008’, ‘1009’, ‘101’, ‘1010’, ‘1011’, ‘1012’, ‘1013’, ‘10135’, ‘1014’, ‘1016’, ‘10165’, ‘1017’, ‘10175’, ‘1018’, ‘1019’, ‘102’, ‘1020’, ‘1021’, ‘1022’, ‘1023’, ‘1024’, ‘1025’, ‘10262’, ‘10269’, ‘1027’, ‘1028’, ‘10289’, ‘1029’, ‘103’, ‘1030’, ‘1032’, ‘1034’, ‘1035’, ‘1036’, ‘1037’, ‘1038’, ‘1039’, ‘104’, ‘1040’, ‘1041’, ‘1042’, ‘1043’, ‘1044’, ‘1045’, ‘1046’, ‘10461’, ‘10466’, ‘1048’, ‘1049’, ‘105’, ‘1050’, ‘1051’, ‘1053’, ‘1054’, ‘1055’, ‘1056’, ‘1057’, ‘10585’, ‘1059’, ‘106’, ‘1061’, ‘1062’, ‘10636’, ‘1064’, ‘1065’, ‘1066’, ‘10660’, ‘1067’, ‘1068’, ‘10681’, ‘1069’, ‘107’, ‘1070’, ‘1071’, ‘1072’, ‘1073’, ‘1074’, ‘1075’, ‘1077’, ‘1078’, ‘10782’, ‘1079’, ‘108’, ‘1080’, ‘1081’, ‘10826’, ‘1083’, ‘1084’, ‘1085’, ‘10865’, ‘1087’, ‘1088’, ‘1089’, ‘109’, ‘1090’, ‘1091’, ‘1092’, ‘10926’, ‘1093’, ‘1094’, ‘1097’, ‘1099’, ‘11’, ‘110’, ‘1100’, ‘11004 [... truncated]
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
system code page: 936
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] tibble_3.1.6 readxl_1.4.0 ggplot2_3.3.5
[4] DESeq2_1.34.0 SummarizedExperiment_1.24.0 Biobase_2.54.0
[7] MatrixGenerics_1.6.0 matrixStats_0.61.0 GenomicRanges_1.46.1
[10] GenomeInfoDb_1.30.0 IRanges_2.28.0 S4Vectors_0.32.3
[13] BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] locfit_1.5-9.5 Rcpp_1.0.8.3 lattice_0.20-45 png_0.1-7
[5] Biostrings_2.62.0 assertthat_0.2.1 utf8_1.2.2 cellranger_1.1.0
[9] R6_2.5.1 RSQLite_2.2.11 httr_1.4.2 pillar_1.7.0
[13] zlibbioc_1.40.0 rlang_1.0.2 rstudioapi_0.13 annotate_1.72.0
[17] blob_1.2.2 Matrix_1.4-1 splines_4.1.2 BiocParallel_1.28.3
[21] geneplotter_1.72.0 RCurl_1.98-1.6 bit_4.0.4 munsell_0.5.0
[25] DelayedArray_0.20.0 compiler_4.1.2 pkgconfig_2.0.3 tidyselect_1.1.2
[29] KEGGREST_1.34.0 GenomeInfoDbData_1.2.7 XML_3.99-0.9 fansi_1.0.3
[33] withr_2.5.0 crayon_1.5.1 dplyr_1.0.8 bitops_1.0-7
[37] grid_4.1.2 xtable_1.8-4 gtable_0.3.0 lifecycle_1.0.1
[41] DBI_1.1.2 magrittr_2.0.2 scales_1.1.1 cli_3.2.0
[45] cachem_1.0.6 XVector_0.34.0 genefilter_1.76.0 ellipsis_0.3.2
[49] vctrs_0.3.8 generics_0.1.2 RColorBrewer_1.1-2 tools_4.1.2
[53] bit64_4.0.5 glue_1.6.2 purrr_0.3.4 parallel_4.1.2
[57] fastmap_1.1.0 survival_3.3-1 AnnotationDbi_1.56.2 colorspace_2.0-3
[61] memoise_2.0.1
Is
length(unique(rownames(TCX))) == length(rownames(TCX))
TRUE?
Thank you so much for the response. Yes, I just checked that length(unique(rownames(TCX))) and length(rownames(TCX)) both return 17011, so they are in fact equal. I'm not sure why I still keep getting the error "non-unique values when setting 'row.names'".
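One possible explanation, offered as a guess from the code in the post rather than a confirmed diagnosis: the values listed in the warning ('0', '1', '10', ...) look like counts, not Ensembl IDs. The DESeq2 documentation describes tidy = TRUE as meaning that the first column of countData holds the row names. Since column_to_rownames() has already moved ensembl_gene_id out of the table, the first column left would be one sample's counts, and counts repeat. The base-R sketch below imitates that step on a made-up toy table (the gene names, counts, and sex values are all hypothetical). It also shows that data.frame() rewrites numeric column names such as 11492 to X11492 unless check.names = FALSE is set, which would keep the count columns from matching the metadata row names. The final DESeq2 call runs only if the package is installed.

```r
# Toy stand-in for TCX after column_to_rownames(); all values are made up.
counts <- data.frame(
  `11492` = c(0L, 0L, 5L, 12L),
  `6810`  = c(3L, 7L, 1L, 9L),
  `1046`  = c(4L, 2L, 8L, 6L),
  `1924`  = c(1L, 5L, 0L, 3L),
  row.names = c("geneA", "geneB", "geneC", "geneD"),
  check.names = FALSE
)

# Imitate what tidy = TRUE is documented to do: use the first column as row names.
# Here that column is sample 11492's counts, which contain a repeated 0.
first_col <- as.character(counts[, 1])
tidy_result <- tryCatch(
  suppressWarnings({
    tmp <- counts[, -1, drop = FALSE]
    rownames(tmp) <- first_col
    "no error"
  }),
  error = function(e) conditionMessage(e)
)
print(tidy_result)

# data.frame() runs make.names() on column names by default,
# so a column called "11492" is renamed to "X11492".
renamed <- names(data.frame(counts))
kept    <- names(data.frame(counts, check.names = FALSE))
print(renamed)
print(kept)

# Toy metadata with sample IDs as row names, in the same order as the count columns.
meta <- data.frame(
  sex = factor(c("male", "male", "female", "female")),
  row.names = c("11492", "6810", "1046", "1924")
)
samples_match <- identical(colnames(counts), rownames(meta))
print(samples_match)

# With gene IDs already in the row names, tidy can stay at its default (FALSE).
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = meta,
                                        design = ~sex)
  print(dim(dds))
}
```

If this is indeed the cause, the equivalent change in the original code would be to build TCX with data.frame(column_to_rownames(TCX_original, var = "ensembl_gene_id"), check.names = FALSE) and to drop tidy = TRUE from the DESeqDataSetFromMatrix call. It is also worth checking that colnames(TCX) and rownames(metadata) list the same samples in the same order.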