TPP and NPARC: A problem with converting the data frame
yangwanqi

# tppData <- readRDS("../data/tppData.Rds")

sessionInfo()

Please help us figure out how to convert the TPP result into a "tidy" format.

A first comment upfront: it would have been hard for me to understand your question without the email you sent earlier. To clarify, the problem concerns converting the ExpressionSet structure obtained after importing data from a thermal proteome profiling (TPP) experiment with the TPP package into a long-format data frame, which the NPARC package needs as input.

Second, it would help if you also posted the output that appears in your console when you run sessionInfo().

Assuming you have imported your data with the TPP package functions, similar to:

trData <- tpptrImport(configTable = hdacTR_config, data = hdacTR_data)

you can convert the resulting list of ExpressionSets into a long-format data frame with the Bioconductor package biobroom:

library(dplyr)
trTidyData <- bind_rows(lapply(names(trData), function(eset_name) {
  # tidy each ExpressionSet (adding phenotype columns) and label its dataset
  biobroom::tidy.ExpressionSet(trData[[eset_name]], addPheno = TRUE) %>%
    mutate(dataset = eset_name)
}))

This results in the following data frame:


# # A tibble: 20,340 x 7
#    gene   sample      label temperature normCoeff value dataset  
#    <chr>  <chr>       <chr>       <dbl> <lgl>     <dbl> <chr>    
#  1 AAK1   rel_fc_131L 131L           37 NA            1 Vehicle_1
#  2 AAMDC  rel_fc_131L 131L           37 NA            1 Vehicle_1
#  3 ACACA  rel_fc_131L 131L           37 NA            1 Vehicle_1
#  4 ACAP2  rel_fc_131L 131L           37 NA            1 Vehicle_1
#  5 ACBD6  rel_fc_131L 131L           37 NA            1 Vehicle_1
#  6 ACO2   rel_fc_131L 131L           37 NA            1 Vehicle_1
#  7 ACTR1B rel_fc_131L 131L           37 NA            1 Vehicle_1
#  8 ADI1   rel_fc_131L 131L           37 NA            1 Vehicle_1
#  9 AIMP1  rel_fc_131L 131L           37 NA            1 Vehicle_1
# 10 AIMP2  rel_fc_131L 131L           37 NA            1 Vehicle_1
# # … with 20,330 more rows
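To see what the biobroom step does, here is a base-R sketch of the same reshaping on hypothetical toy data. It does not use Biobase: each ExpressionSet is replaced by a plain list holding an `exprs` matrix (proteins × channels) and a `pheno` data frame of per-channel annotations, and the helper names `make_toy_set` and `tidy_one`, the sample labels and the values are made up for illustration. Each matrix is melted into one row per protein and channel, the channel annotations (here only `temperature`) are joined on, and a `dataset` column records which list element the rows came from, mirroring the `mutate(dataset = ...)` call above.

```r
# Hypothetical stand-in for one ExpressionSet: a protein x channel matrix of
# relative fold changes plus a data frame of per-channel annotations
make_toy_set <- function(values) {
  list(
    exprs = matrix(values, nrow = 2,
                   dimnames = list(c("AAK1", "AAMDC"), c("s1", "s2"))),
    pheno = data.frame(sample = c("s1", "s2"), temperature = c(37, 41))
  )
}
toy_sets <- list(Vehicle_1 = make_toy_set(c(1, 1, 0.9, 0.8)),
                 Treated_1 = make_toy_set(c(1, 1, 0.95, 0.9)))

# Melt one matrix into one row per protein/channel, join the channel
# annotations and record the dataset name
tidy_one <- function(set, dataset_name) {
  m <- set$exprs
  long <- data.frame(gene   = rep(rownames(m), times = ncol(m)),
                     sample = rep(colnames(m), each = nrow(m)),
                     value  = as.vector(m))  # as.vector() reads column by column
  long <- merge(long, set$pheno, by = "sample")
  long$dataset <- dataset_name
  long
}

# Base-R counterpart of bind_rows(lapply(...)) in the biobroom version
toy_tidy <- do.call(rbind, Map(tidy_one, toy_sets, names(toy_sets)))
rownames(toy_tidy) <- NULL
```

For real data, the biobroom call above is the simpler route; the sketch only makes the long-format layout explicit.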

The NPARC example data look like this:

# # A tibble: 307,080 x 7
#    dataset uniqueID relAbundance temperature compoundConcent… replicate
#    <chr>   <chr>           <dbl>       <dbl>            <dbl>     <int>
#  1 Stauro… 15 KDA …        1.00           40               20         1
#  2 Stauro… 15 KDA …        1.39           43               20         1
#  3 Stauro… 15 KDA …        0.987          46               20         1
#  4 Stauro… 15 KDA …        1.33           49               20         1
#  5 Stauro… 15 KDA …        0.959          52               20         1
#  6 Stauro… 15 KDA …        0.789          55               20         1
#  7 Stauro… 15 KDA …        0.807          58               20         1
#  8 Stauro… 15 KDA …        1.27           61               20         1
#  9 Stauro… 15 KDA …        0.688          64               20         1
# 10 Stauro… 15 KDA …        0.655          67               20         1
# # … with 307,070 more rows, and 1 more variable: uniquePeptideMatches <dbl>

To adapt the column names of trTidyData to match these, you can use dplyr:

trTidyData %>% dplyr::select(dataset, uniqueID = gene, relAbundance = value, temperature) # and so on

# # A tibble: 20,340 x 4
#    dataset   uniqueID relAbundance temperature
#    <chr>     <chr>           <dbl>       <dbl>
#  1 Vehicle_1 AAK1                1          37
#  2 Vehicle_1 AAMDC               1          37
#  3 Vehicle_1 ACACA               1          37
#  4 Vehicle_1 ACAP2               1          37
#  5 Vehicle_1 ACBD6               1          37
#  6 Vehicle_1 ACO2                1          37
#  7 Vehicle_1 ACTR1B              1          37
#  8 Vehicle_1 ADI1                1          37
#  9 Vehicle_1 AIMP1               1          37
# 10 Vehicle_1 AIMP2               1          37
# # … with 20,330 more rows
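The `# and so on` in the select() call leaves the remaining NPARC columns open. One possible way to fill them in is sketched below in base R, assuming the dataset names follow the `<condition>_<replicate>` pattern seen above (e.g. `Vehicle_1`). The condition name `Treated`, the values in `concentrations`, the experiment label and the helper `to_nparc_format()` are all hypothetical; take the real concentrations from your TPP configuration table. Following what the NPARC example appears to do, `dataset` holds a single experiment label, while treatment and replicate go into `compoundConcentration` and `replicate`. The `uniquePeptideMatches` column has no counterpart in the tidy TPP output shown above and is left out.

```r
# Hypothetical toy input with the relevant columns of trTidyData
tidy_in <- data.frame(
  gene        = rep("AAK1", 4),
  sample      = c("s1", "s2", "s1", "s2"),
  temperature = c(37, 41, 37, 41),
  value       = c(1, 0.9, 1, 0.95),
  dataset     = c("Vehicle_1", "Vehicle_1", "Treated_1", "Treated_1")
)

# Hypothetical concentrations per condition; use the values from your
# TPP configuration table instead
concentrations <- c(Vehicle = 0, Treated = 1)

# Hypothetical helper: rename columns to the NPARC layout and derive the
# concentration and replicate from dataset names of the form "<condition>_<n>"
to_nparc_format <- function(df, concentrations, experiment) {
  condition <- sub("_[0-9]+$", "", df$dataset)
  conc <- unname(concentrations[condition])
  if (anyNA(conc)) stop("no concentration given for some conditions")
  data.frame(
    dataset               = experiment,
    uniqueID              = df$gene,
    relAbundance          = df$value,
    temperature           = df$temperature,
    compoundConcentration = conc,
    replicate             = as.integer(sub("^.*_", "", df$dataset))
  )
}

nparc_in <- to_nparc_format(tidy_in, concentrations, experiment = "myExperiment")
```

With real data you would pass trTidyData instead of `tidy_in`; the same steps can also be written with `dplyr::mutate()` and `dplyr::select()` in the style used above.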

Based on the renamed data frame, you should be able to perform the NPARC analysis as described in the NPARC vignette.
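For orientation before turning to the vignette: the idea behind NPARC is to compare a null model, which fits a single melting curve to all conditions, against an alternative model with one curve per condition, and to score the improvement in fit with an F-statistic. The base-R sketch below shows that comparison for one protein on synthetic data, using the three-parameter sigmoid commonly used for TPP melting curves. The data, the helper names `melt_curve` and `fit_rss`, and the starting values are assumptions made for illustration. It uses the textbook degrees of freedom for nested models, whereas NPARC can estimate them empirically across all proteins, so use the package functions from the vignette for real analyses.

```r
# Three-parameter sigmoid commonly used for TPP melting curves:
# f(T) = (1 - pl) / (1 + exp(b - a / T)) + pl
melt_curve <- function(t, a, b, pl) (1 - pl) / (1 + exp(b - a / t)) + pl

# Hypothetical synthetic data for one protein: vehicle (Tm = 55) vs treated
# (Tm = 58), plus small deterministic noise so nls() does not see a perfect fit
temps <- rep(seq(37, 67, by = 3), times = 2)
group <- rep(c("vehicle", "treated"), each = 11)
a_true <- ifelse(group == "vehicle", 550, 580)
curves <- data.frame(
  temperature  = temps,
  relAbundance = melt_curve(temps, a_true, 10, 0.05) + 0.02 * sin(seq_along(temps)),
  group        = group
)

# Fit the sigmoid by nonlinear least squares; return the residual sum of squares
fit_rss <- function(d) {
  fit <- nls(relAbundance ~ melt_curve(temperature, a, b, pl),
             data = d, start = list(a = 550, b = 10, pl = 0))
  sum(residuals(fit)^2)
}

# Null model: one curve for all data; alternative: one curve per group
rss_null <- fit_rss(curves)
rss_alt  <- sum(sapply(split(curves, curves$group), fit_rss))

# Textbook F-test for nested models (3 vs 6 parameters)
n <- nrow(curves)
df1 <- 6 - 3
df2 <- n - 6
f_stat  <- ((rss_null - rss_alt) / df1) / (rss_alt / df2)
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)
```

On this synthetic data the per-group fits leave a much smaller residual sum of squares than the single curve, so the p-value is small, reflecting the simulated shift in melting temperature.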


Output of sessionInfo():

#R version 4.0.0 Patched (2020-05-04 r78358)
#Platform: x86_64-apple-darwin17.0 (64-bit)
#Running under: macOS Mojave 10.14.6
#Matrix products: default
#BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
#LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#locale:
#[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#attached base packages:
#[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
#other attached packages:
#[1] TPP_3.17.0          tidyr_1.1.0         magrittr_1.5        dplyr_1.0.0        
#[5] Biobase_2.48.0      BiocGenerics_0.34.0 NPARC_1.1.1        
#loaded via a namespace (and not attached):
# [1] pkgload_1.1.0        VGAM_1.1-3           splines_4.0.0       
# [4] foreach_1.5.0        assertthat_0.2.1     stats4_4.0.0        
# [7] nls2_0.2             cellranger_1.1.0     yaml_2.2.1          
# [10] remotes_2.1.1        sessioninfo_1.1.1    pillar_1.4.4        
# [13] backports_1.1.7      lattice_0.20-41      glue_1.4.1          
# [16] limma_3.44.1         digest_0.6.25        RColorBrewer_1.1-2  
# [19] colorspace_1.4-1     htmltools_0.5.0      plyr_1.8.6          
# [22] pkgconfig_2.0.3      devtools_2.3.0       broom_0.5.6         
# [25] purrr_0.3.4          scales_1.1.1         processx_3.4.2      
# [28] VennDiagram_1.6.20   openxlsx_4.1.5       BiocParallel_1.22.0 
# [31] tibble_3.0.1         generics_0.0.2       ggplot2_3.3.2       
# [34] usethis_1.6.1        ellipsis_0.3.1       withr_2.2.0         
# [37] cli_2.0.2            crayon_1.3.4         readxl_1.3.1        
# [40] evaluate_0.14        memoise_1.1.0        ps_1.3.3            
# [43] fs_1.4.1             fansi_0.4.1          doParallel_1.0.15   
# [46] nlme_3.1-148         MASS_7.3-51.6        pkgbuild_1.0.8      
# [49] tools_4.0.0          data.table_1.12.8    prettyunits_1.1.1   
# [52] formatR_1.7          lifecycle_0.2.0      stringr_1.4.0       
# [55] munsell_0.5.0        zip_2.0.4            lambda.r_1.2.4      
# [58] callr_3.4.3          compiler_4.0.0       rlang_0.4.6         
# [61] RCurl_1.98-1.2       futile.logger_1.4.3  grid_4.0.0          
# [64] iterators_1.0.12     rstudioapi_0.11      bitops_1.0-6        
# [67] rmarkdown_2.2        testthat_2.3.2       gtable_0.3.0        
# [70] codetools_0.2-16     reshape2_1.4.4       R6_2.4.1            
# [73] gridExtra_2.3        knitr_1.28           utf8_1.1.4          
# [76] rprojroot_1.3-2      futile.options_1.0.1 desc_1.2.0          
# [79] stringi_1.4.6        Rcpp_1.0.4.6         biobroom_1.20.0     
# [82] vctrs_0.3.0          tidyselect_1.1.0     xfun_0.14

