Hello,
I am trying to annotate an SCESet using getBMFeatureAnnos, where the filter column contains values such as "MTCE.31" and "MTCE.23". When I create the SCESet, these are recognized as distinct row names. However, when getBMFeatureAnnos runs, the ".31" and ".23" suffixes are dropped and the names are treated as duplicate row.names. Is there another way to format the column to get around this?
Thank you for any advice you may have.
# TestData loaded from a .csv file (first column = gene names, then 8 count columns)
TestData <- read.csv("testdata.csv", colClasses = c("character", rep("numeric", 8)), row.names = 1)
TestData
# X cell.1a cell.1b cell.1c cell.2a cell.2b cell.3a cell.3b cell.3c
#1 2RSSE.1 866 1404 898 129 1053 141 33 70
#2 2RSSE.2 58 171 65 17 70 36 11 17
#3 MTCE.23 14911 27132 10405 82033 117449 57775 11544 14426
#4 MTCE.25 1888 3615 1453 5891 40047 9144 2396 2947
#5 MTCE.31 20818 38746 12289 235235 211993 109575 19117 20580
#6 cct-6 1488 2236 1274 487 6430 1006 2311 381
#7 cct-8 1113 1679 1099 530 3727 1012 1135 130
#8 CD4.3 58 70 64 45 122 19 59 70
#9 CD4.7 34 37 27 56 400 11 53 88
sce <- newSCESet(countData = TestData)
sce <- getBMFeatureAnnos(sce,
    filters = "external_gene_name",
    attributes = c("wormbase_gene", "ensembl_gene_id", "external_gene_name",
                   "chromosome_name", "transcript_biotype", "go_id",
                   "kegg_enzyme", "entrezgene"),
    feature_symbol = "external_gene_name",
    feature_id = "wormbase_gene",
    biomart = "ENSEMBL_MART_ENSEMBL",
    dataset = "celegans_gene_ensembl",
    host = "www.ensembl.org")
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘2RSSE’, ‘CD4’, ‘MTCE’
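The names in that warning (‘2RSSE’, ‘CD4’, ‘MTCE’) are exactly the dotted row names cut off at the first "."; names without a dot, such as cct-6 and cct-8, are not affected. A quick base-R check reproduces the same collisions. This only illustrates the symptom: the `sub()` call is my assumption about what happens to the names, not scater's actual code.

```r
# Row names of TestData, copied from the printed table above
ids <- c("2RSSE.1", "2RSSE.2", "MTCE.23", "MTCE.25", "MTCE.31",
         "cct-6", "cct-8", "CD4.3", "CD4.7")

# Drop everything from the first "." onwards (assumed reconstruction,
# inferred from the names listed in the warning message)
stripped <- sub("\\..*$", "", ids)

# Names that now occur more than once, i.e. the ones that break row.names<-
dups <- unique(stripped[duplicated(stripped)])
print(dups)
# [1] "2RSSE" "MTCE"  "CD4"

# The original, untruncated names are unique
anyDuplicated(ids)
# [1] 0
```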
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.30.0 scater_1.2.0 ggplot2_2.2.1 Biobase_2.34.0
[5] BiocGenerics_0.20.0 gplots_3.0.1 RColorBrewer_1.1-2 edgeR_3.16.5
[9] limma_3.30.13 openxlsx_4.0.17 BiocInstaller_1.24.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 locfit_1.5-9.1 lattice_0.20-34 GO.db_3.4.0
[5] gtools_3.5.0 assertthat_0.2.0 digest_0.6.12 mime_0.5
[9] R6_2.2.2 plyr_1.8.4 stats4_3.3.2 RSQLite_2.0
[13] zlibbioc_1.20.0 rlang_0.1.1 lazyeval_0.2.0 data.table_1.10.4
[17] gdata_2.18.0 blob_1.1.0 S4Vectors_0.12.2 stringr_1.2.0
[21] RCurl_1.95-4.8 bit_1.1-12 munsell_0.4.3 shiny_1.0.3
[25] httpuv_1.3.5 vipor_0.4.5 pkgconfig_2.0.1 ggbeeswarm_0.5.3
[29] htmltools_0.3.6 tximport_1.2.0 tibble_1.3.3 gridExtra_2.2.1
[33] IRanges_2.8.2 matrixStats_0.52.2 XML_3.98-1.9 viridisLite_0.2.0
[37] dplyr_0.7.1 bitops_1.0-6 grid_3.3.2 xtable_1.8-2
[41] gtable_0.2.0 DBI_0.7 magrittr_1.5 scales_0.4.1
[45] KernSmooth_2.23-15 stringi_1.1.5 reshape2_1.4.2 viridis_0.4.0
[49] bindrcpp_0.2 org.Ce.eg.db_3.4.0 rjson_0.2.15 tools_3.3.2
[53] bit64_0.9-7 glue_1.1.1 beeswarm_0.2.3 AnnotationDbi_1.36.2
[57] colorspace_1.3-2 rhdf5_2.18.0 caTools_1.17.1 shinydashboard_0.6.1
[61] memoise_1.1.0 bindr_0.1
Ah, thanks for finding that.