Question

Duplicate row.names when building SummarizedExperiment with make_se in DEP

hele7 ▴ 20

@hele7-15035

Last seen 3.8 years ago

Estonia

Hi,

I'm analyzing label-free proteomics data with DEP but have run into an error about duplicate row names.

Here's what I have done. First, I generated unique identifiers as described in the DEP tutorial:

> head(data)
# A tibble: 6 x 26
  Protein.IDs    Gene.names LNP34_i2_1 LNP34_i2_2 LNP34_i2_3 LNP34_n_1 LNP34_n_2
  <chr>          <chr>           <dbl>      <dbl>      <dbl>     <dbl>     <dbl>
1 P08226;A0A1B0~ Apoe          2.93e10    6.03e 9    2.79e10   3.60e10   2.63e10
2 P07724         Alb           6.36e 9    1.20e10    5.84e 9   6.28e 9   3.17e 9
3 A8DUK4;E9Q223~ Hbb-bs        2.20e 9    2.70e 9    2.85e 9   5.09e 9   2.13e 9
4 Q91VB8;P01942~ Hba-a1        1.51e 9    2.84e 9    2.00e 9   2.86e 9   2.12e 9
5 A0A075B5P6;A0~ Ighm          1.82e 9    1.28e 9    1.77e 9   4.15e 8   2.44e 9
6 Q921I1;F7BAE9~ Tf            5.06e 8    1.59e 9    5.44e 8   4.20e 8   2.53e 8
# ... with 19 more variables: LNP34_n_3 <dbl>, LNP35_i2_1 <dbl>,
#   LNP35_i2_2 <dbl>, LNP35_i2_3 <dbl>, LNP35_n_1 <dbl>, LNP35_n_2 <dbl>,
#   LNP35_n_3 <dbl>, LNP36_i2_1 <dbl>, LNP36_i2_2 <dbl>, LNP36_i2_3 <dbl>,
#   LNP36_n_1 <dbl>, LNP36_n_2 <dbl>, LNP36_n_3 <dbl>, LNP37_i2_1 <dbl>,
#   LNP37_i2_2 <dbl>, LNP37_i2_3 <dbl>, LNP37_n_1 <dbl>, LNP37_n_2 <dbl>,
#   LNP37_n_3 <dbl>

> data$Gene.names %>% duplicated() %>% any()
[1] TRUE

> data %>% group_by(Gene.names) %>% summarize(frequency = n()) %>% arrange(desc(frequency)) %>% filter(frequency > 1)
# A tibble: 6 x 2
  Gene.names frequency
  <chr>          <int>
1 _                 19
2 H2-K1              2
3 Itih4              2
4 Kng1               2
5 Sptb               2
6 Tpm3               2

> data_unique <- make_unique(data, "Gene.names", "Protein.IDs", delim = ";")

> data_unique$name %>% duplicated() %>% any()
[1] FALSE

Next, I would like to generate a SummarizedExperiment using my own experimental design, but I get an error about duplicate row names and a warning about non-unique values:

> experimental_design
        label condition LNPcondition replicate
1  LNP34_i2_1     LNP34     LNP34_i2         1
2  LNP34_i2_2     LNP34     LNP34_i2         2
3  LNP34_i2_3     LNP34     LNP34_i2         3
4   LNP34_n_1     LNP34      LNP34_n         1
5   LNP34_n_2     LNP34      LNP34_n         2
6   LNP34_n_3     LNP34      LNP34_n         3
7  LNP35_i2_1     LNP35     LNP35_i2         1
8  LNP35_i2_2     LNP35     LNP35_i2         2
9  LNP35_i2_3     LNP35     LNP35_i2         3
10  LNP35_n_1     LNP35      LNP35_n         1
11  LNP35_n_2     LNP35      LNP35_n         2
12  LNP35_n_3     LNP35      LNP35_n         3
13 LNP36_i2_1     LNP36     LNP36_i2         1
14 LNP36_i2_2     LNP36     LNP36_i2         2
15 LNP36_i2_3     LNP36     LNP36_i2         3
16  LNP36_n_1     LNP36      LNP36_n         1
17  LNP36_n_2     LNP36      LNP36_n         2
18  LNP36_n_3     LNP36      LNP36_n         3
19 LNP37_i2_1     LNP37     LNP37_i2         1
20 LNP37_i2_2     LNP37     LNP37_i2         2
21 LNP37_i2_3     LNP37     LNP37_i2         3
22  LNP37_n_1     LNP37      LNP37_n         1
23  LNP37_n_2     LNP37      LNP37_n         2
24  LNP37_n_3     LNP37      LNP37_n         3

> LFQ_columns <- grep("^LNP", colnames(data_unique))
> data_se <- make_se(data_unique, LFQ_columns, experimental_design)
Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘LNP34_1’, ‘LNP34_2’, ‘LNP34_3’, ‘LNP35_1’, ‘LNP35_2’, ‘LNP35_3’, ‘LNP36_1’, ‘LNP36_2’, ‘LNP36_3’, ‘LNP37_1’, ‘LNP37_2’, ‘LNP37_3’

I'm not sure where the error comes from (my guess is the "condition" column of experimental_design). However, when I check the row and column names, no duplicates are found:

> any(duplicated(rownames(experimental_design)))
[1] FALSE
> any(duplicated(rownames(data_unique)))
[1] FALSE
> any(duplicated(rownames(LFQ_columns)))
[1] FALSE
> any(duplicated(colnames(experimental_design)))
[1] FALSE
> any(duplicated(colnames(data_unique)))
[1] FALSE
> any(duplicated(colnames(LFQ_columns)))

Can someone please help me with this? I have limited knowledge of R, so details would be highly appreciated.

Session info below. Thanks!

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.5  DEP_1.10.0   readxl_1.3.1 readr_1.4.0 

loaded via a namespace (and not attached):
  [1] ProtGenerics_1.20.0         bitops_1.0-6               
  [3] matrixStats_0.58.0          doParallel_1.0.16          
  [5] RColorBrewer_1.1-2          GenomeInfoDb_1.24.2        
  [7] MSnbase_2.14.2              tools_4.0.2                
  [9] DT_0.17                     utf8_1.2.1                 
 [11] R6_2.5.0                    affyio_1.58.0              
 [13] tmvtnorm_1.4-10             BiocGenerics_0.34.0        
 [15] colorspace_2.0-0            GetoptLong_1.0.5           
 [17] tidyselect_1.1.0            compiler_4.0.2             
 [19] preprocessCore_1.50.0       cli_2.4.0                  
 [21] Biobase_2.48.0              DelayedArray_0.14.1        
 [23] sandwich_3.0-0              scales_1.1.1               
 [25] mvtnorm_1.1-1               affy_1.66.0                
 [27] digest_0.6.27               XVector_0.28.0             
 [29] htmltools_0.5.1.1           pkgconfig_2.0.3            
 [31] fastmap_1.1.0               limma_3.44.3               
 [33] htmlwidgets_1.5.3           rlang_0.4.10               
 [35] GlobalOptions_0.1.2         rstudioapi_0.13            
 [37] impute_1.62.0               shiny_1.6.0                
 [39] shape_1.4.5                 generics_0.1.0             
 [41] zoo_1.8-9                   mzID_1.26.0                
 [43] BiocParallel_1.22.0         RCurl_1.98-1.3             
 [45] magrittr_2.0.1              GenomeInfoDbData_1.2.3     
 [47] MALDIquant_1.19.3           Matrix_1.2-18              
 [49] Rcpp_1.0.6                  munsell_0.5.0              
 [51] S4Vectors_0.26.1            fansi_0.4.2                
 [53] imputeLCMD_2.0              lifecycle_1.0.0            
 [55] vsn_3.56.0                  MASS_7.3-51.6              
 [57] SummarizedExperiment_1.18.2 zlibbioc_1.34.0            
 [59] plyr_1.8.6                  grid_4.0.2                 
 [61] promises_1.2.0.1            parallel_4.0.2             
 [63] shinydashboard_0.7.1        crayon_1.4.1               
 [65] lattice_0.20-41             circlize_0.4.12            
 [67] hms_1.0.0                   mzR_2.22.0                 
 [69] ComplexHeatmap_2.4.3        pillar_1.5.1               
 [71] GenomicRanges_1.40.0        rjson_0.2.20               
 [73] codetools_0.2-16            stats4_4.0.2               
 [75] XML_3.99-0.6                glue_1.4.2                 
 [77] pcaMethods_1.80.0           BiocManager_1.30.12        
 [79] httpuv_1.5.5                png_0.1-7                  
 [81] vctrs_0.3.7                 foreach_1.5.1              
 [83] cellranger_1.1.0            tidyr_1.1.3                
 [85] gtable_0.3.0                purrr_0.3.4                
 [87] norm_1.0-9.5                clue_0.3-58                
 [89] assertthat_0.2.1            ggplot2_3.3.3              
 [91] xfun_0.22                   mime_0.10                  
 [93] xtable_1.8-4                later_1.1.0.1              
 [95] ncdf4_1.17                  tibble_3.1.0               
 [97] iterators_1.0.13            gmm_1.6-6                  
 [99] tinytex_0.31                IRanges_2.22.2             
[101] cluster_2.1.0               ellipsis_0.3.1

DEP • 1.7k views

ADD COMMENT • link updated 5 months ago by Anni • 0 • written 4.0 years ago by hele7 ▴ 20

Did you get an answer? I've used DEP several times, but the error has only started showing up now for some reason.

ADD REPLY • link 2.1 years ago Simran • 0

I encountered the same problem. The cause seems to be that make_se() does not use the "label" variable to bind colData to the assay; instead, it creates a new ID column by combining "condition" and "replicate". The resulting IDs are not unique if there is more than one condition variable, because the same condition and replicate pair then occurs in several rows. My solution was to create a single condition variable with all combinations of conditions as levels. It seems to me that your "LNPcondition" is already such a variable. Just rename "LNPcondition" to "condition" (after dropping or renaming the existing "condition" column) and it should work. I would have preferred to keep the condition columns separate, though.
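
To illustrate the point above, here is a minimal base R sketch that rebuilds the experimental design from the question and mimics the ID construction with paste(). That paste() call is an assumption based on the warning message (IDs such as 'LNP34_1'), not a copy of DEP's internals. The names fixed_design and LNP are made up for this example. Keeping the coarse grouping in an extra LNP column is one possible way to retain it alongside the combined condition; whether make_se() accepts every extra column is not verified here.

```r
# Rebuild the experimental design from the question (base R only; DEP is not needed).
lnp <- rep(c("LNP34", "LNP35", "LNP36", "LNP37"), each = 6)
treatment <- rep(rep(c("i2", "n"), each = 3), times = 4)
experimental_design <- data.frame(
  label = paste0(lnp, "_", treatment, "_", rep(1:3, times = 8)),
  condition = lnp,
  LNPcondition = paste0(lnp, "_", treatment),
  replicate = rep(1:3, times = 8),
  stringsAsFactors = FALSE
)

# Assumption: sample IDs are derived from condition and replicate, roughly like this.
ids_before <- paste(experimental_design$condition,
                    experimental_design$replicate, sep = "_")
# The duplicated IDs match the ones listed in the warning message.
unique(ids_before[duplicated(ids_before)])

# Fix: use the combined variable as "condition" and keep the coarse
# grouping under a different (hypothetical) column name, "LNP".
fixed_design <- experimental_design
fixed_design$LNP <- fixed_design$condition
fixed_design$condition <- fixed_design$LNPcondition
fixed_design$LNPcondition <- NULL

ids_after <- paste(fixed_design$condition, fixed_design$replicate, sep = "_")
anyDuplicated(ids_after)                  # 0 means no duplicates remain
identical(ids_after, fixed_design$label)  # IDs now coincide with the labels
```

With the design changed this way, the call from the question would become make_se(data_unique, LFQ_columns, fixed_design). That call is not run here.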

ADD REPLY • link 5 months ago Anni • 0