Errors when plotting data using heatmap_GO (GOexpress)
2
0
Entering edit mode
grolo • 0
@2a04504a
Last seen 2.1 years ago
Australia

I'm analysing a couple of RNA-Seq datasets using GOexpress. I can successfully run the analysis and peruse the results within R, however plotting the heatmap returns the following error:

heatmap_GO(go_id = "GO:0006796", result = bp, eSet=expressionSet,gene_names=FALSE)

Error in exprs(eSet)[gene_ids, ] : invalid subscript type 'list'
In addition: Warning message:
In brewer.pal(n = length(unique(pData(eSet)[, f])), name = row.col.palette) :
  minimal value for n is 3, returning requested palette with 3 different levels

The additional warning is a similar problem to this post and I've run the checks suggested by Kevin:

> list_genes(go_id="GO:0006796", result= bp, data.only=TRUE)
# A tibble: 14 × 0

and

colnames(bp$genes)
[1] "Score"              "Rank"               "external_gene_name" "description"

However setting the gene_names parameter to false (as suggested in Kevin's response) did not, in my case, fix the issue.

I am not sure how to interpret the first warning:

Error in exprs(eSet)[gene_ids, ] : invalid subscript type 'list'

Have I somehow stuffed up my ExpressionSet object?

str(expressionSet)
Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
  ..@ experimentData   :Formal class 'MIAME' [package "Biobase"] with 13 slots
  .. .. ..@ name             : chr ""
  .. .. ..@ lab              : chr ""
  .. .. ..@ contact          : chr ""
  .. .. ..@ title            : chr ""
  .. .. ..@ abstract         : chr ""
  .. .. ..@ url              : chr ""
  .. .. ..@ pubMedIds        : chr ""
  .. .. ..@ samples          : list()
  .. .. ..@ hybridizations   : list()
  .. .. ..@ normControls     : list()
  .. .. ..@ preprocessing    : list()
  .. .. ..@ other            : list()
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 2
  .. .. .. .. .. ..$ : int [1:3] 1 0 0
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  .. .. .. .. ..$ names: chr [1:2] "MIAxE" "MIAME"
  ..@ assayData        :<environment: 0x7fc3ffd9f500> 
  ..@ phenoData        :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 1 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr "Anti-pathogen treatment"
  .. .. ..@ data             :'data.frame': 10 obs. of  1 variable:
  .. .. .. ..$ condition: Factor w/ 2 levels "A","C": 1 1 1 1 1 2 2 2 2 2
  .. .. ..@ dimLabels        : chr [1:2] "sampleNames" "sampleColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  .. .. .. .. ..$ names: chr "AnnotatedDataFrame"
  ..@ featureData      :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 0 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr(0) 
  .. .. ..@ data             :'data.frame': 19981 obs. of  0 variables
  .. .. ..@ dimLabels        : chr [1:2] "featureNames" "featureColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  .. .. .. .. ..$ names: chr "AnnotatedDataFrame"
  ..@ annotation       : chr(0) 
  ..@ protocolData     :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 0 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr(0) 
  .. .. ..@ data             :'data.frame': 10 obs. of  0 variables
  .. .. ..@ dimLabels        : chr [1:2] "sampleNames" "sampleColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  .. .. .. .. ..$ names: chr "AnnotatedDataFrame"
  ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. ..@ .Data:List of 4
  .. .. .. ..$ : int [1:3] 4 1 1
  .. .. .. ..$ : int [1:3] 2 54 0
  .. .. .. ..$ : int [1:3] 1 3 0
  .. .. .. ..$ : int [1:3] 1 0 0
  .. .. ..$ names: chr [1:4] "R" "Biobase" "eSet" "ExpressionSet"

Any advice gratefully received!

Cheers

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.5.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GOexpress_1.28.0            BiocManager_1.30.18         biomaRt_2.50.3              readr_2.1.3                
 [5] Glimma_2.4.0                ViSEAGO_1.8.0               biclust_2.0.3               lattice_0.20-45            
 [9] colorspace_2.0-3            MASS_7.3-58.1               tidyr_1.2.1                 topGO_2.46.0               
[13] SparseM_1.81                GO.db_3.14.0                AnnotationDbi_1.56.2        graph_1.72.0               
[17] tximport_1.22.0             DESeq2_1.34.0               SummarizedExperiment_1.24.0 Biobase_2.54.0             
[21] MatrixGenerics_1.6.0        matrixStats_0.62.0          GenomicRanges_1.46.1        GenomeInfoDb_1.30.1        
[25] IRanges_2.28.0              S4Vectors_0.32.4            BiocGenerics_0.40.0        

loaded via a namespace (and not attached):
  [1] fastmatch_1.1-3         BiocFileCache_2.2.1     plyr_1.8.7              igraph_1.3.5           
  [5] lazyeval_0.2.2          splines_4.1.2           BiocParallel_1.28.3     ggplot2_3.3.6          
  [9] digest_0.6.30           foreach_1.5.2           htmltools_0.5.3         GOSemSim_2.20.0        
 [13] viridis_0.6.2           fansi_1.0.3             magrittr_2.0.3          memoise_2.0.1          
 [17] tzdb_0.3.0              limma_3.50.3            Biostrings_2.62.0       annotate_1.72.0        
 [21] vroom_1.6.0             R.utils_2.12.0          bdsmatrix_1.3-6         prettyunits_1.1.1      
 [25] blob_1.2.3              rappdirs_0.3.3          apeglm_1.16.0           dplyr_1.0.10           
 [29] crayon_1.5.2            RCurl_1.98-1.9          jsonlite_1.8.3          genefilter_1.76.0      
 [33] survival_3.4-0          iterators_1.0.14        glue_1.6.2              registry_0.5-1         
 [37] gtable_0.3.1            zlibbioc_1.40.0         XVector_0.34.0          webshot_0.5.4          
 [41] UpSetR_1.4.0            DelayedArray_0.20.0     additivityTests_1.1-4.1 scales_1.2.1           
 [45] mvtnorm_1.1-3           DBI_1.1.3               edgeR_3.36.0            Rcpp_1.0.9             
 [49] viridisLite_0.4.1       xtable_1.8-4            progress_1.2.2          emdbook_1.3.12         
 [53] bit_4.0.4               DT_0.26                 AnnotationForge_1.36.0  htmlwidgets_1.5.4      
 [57] httr_1.4.4              DiagrammeR_1.0.9        fgsea_1.20.0            gplots_3.1.3           
 [61] RColorBrewer_1.1-3      ellipsis_0.3.2          modeltools_0.2-23       pkgconfig_2.0.3        
 [65] XML_3.99-0.11           R.methodsS3_1.8.2       dbplyr_2.2.1            locfit_1.5-9.6         
 [69] utf8_1.2.2              dynamicTreeCut_1.63-1   tidyselect_1.2.0        rlang_1.0.6            
 [73] munsell_0.5.0           tools_4.1.2             visNetwork_2.1.2        cachem_1.0.6           
 [77] cli_3.4.1               generics_0.1.3          RSQLite_2.2.18          stringr_1.4.1          
 [81] fastmap_1.1.0           yaml_2.3.6              heatmaply_1.4.0         bit64_4.0.5            
 [85] caTools_1.18.2          randomForest_4.7-1.1    purrr_0.3.5             KEGGREST_1.34.0        
 [89] dendextend_1.16.0       R.oo_1.25.0             xml2_1.3.3              flexclust_1.4-1        
 [93] compiler_4.1.2          rstudioapi_0.14         plotly_4.10.0           filelock_1.0.2         
 [97] curl_4.3.3              png_0.1-7               tibble_3.1.8            geneplotter_1.72.0     
[101] stringi_1.7.8           Matrix_1.5-1            vctrs_0.5.0             pillar_1.8.1           
[105] lifecycle_1.0.3         data.table_1.14.4       bitops_1.0-7            seriation_1.4.0        
[109] R6_2.5.1                TSP_1.2-1               KernSmooth_2.23-20      gridExtra_2.3          
[113] codetools_0.2-18        gtools_3.9.3            assertthat_0.2.1        withr_2.5.0            
[117] GenomeInfoDbData_1.2.7  parallel_4.1.2          hms_1.1.2               coda_0.19-4            
[121] class_7.3-20            bbmle_1.0.25            numDeriv_2016.8-1.1
GOexpress • 1.4k views
ADD COMMENT
0
Entering edit mode
> list_genes(go_id="GO:0006796", result= bp, data.only=TRUE)
# A tibble: 14 × 0

It indicates that none of the genes participating in the GO term are present in your dataset, so this is normal the heatmap cannot be drawn because there is no gene to plot.

ADD REPLY
0
Entering edit mode

Thanks for the response. I believe that should be the top ranked filtered GO term, so I suppose I've got bigger problems than just plotting :/.

ADD REPLY
2
Entering edit mode
kevin.rue ▴ 350
@kevinrue-6757
Last seen 6 months ago
University of Oxford

Looking at ?list_genes it says

Given a Gene Ontology (GO) identifier represented in the dataset, returns a character vector listing the feature identifiers annotated to it.

Emphasis on _character vector_. You're getting a tibble, which under the hood is a list. Hence the error message:

Error in exprs(eSet)[gene_ids, ] : invalid subscript type 'list'

I'm somewhat confused as to why you get a tibble. I can only assume that you provided inputs as tibbles to the GO_analyse function. GOexpress does not check the type of inputs, and I wrote it with data.frame in mind, so I would not be surprised if it keeps input tibbles as they are, which likely works for several steps in the workflow, but ultimately causes issues with the different behaviour of tibbles and data.frames. In particular, I am thinking about indexing, where tibble[, 1] returns a tibble (i.e., does not drop dimensions), while data.frame[, 1] returns a vector (i.e., drops dimensions).

Bottom line being: before I add a fix/failsafe, can i instead ask you to convert you input data to base R classes where relevant (in particular data.frame instead of tibble), and run the analysis again? I have a feeling that might be enough to put you back on track.

As to why the genes seems to disappear, I'm not entirely sure yet. Might be related to tibble behaviour, but it's hard to predict without your data (or reproducible example) in my hands.

PS: GOexpress has been long in need of a complete rewrite to benefit of all the Bioconductor and package development best practices that I've learned over the years, but it's just such a huge task compared to my little time available that I can never find the contiguous span of time to do it properly. Sorry about that :/

PS2: OK, on top of the tibble issue, I think I guessed what's wrong. When you write "top ranked filtered GO term" make sure that you filter subset(results$GO, data_count > 1), because you want at least 2 rows to make a heatmap (the heatmap function crashes for a single row). I just ran into it while using GOexpress naively on the AlvMac example.

ADD COMMENT
0
Entering edit mode

can i instead ask you to convert you input data to base R classes where relevant (in particular data.frame instead of tibble), and run the analysis again?

Ok, that was my error. I was loading GO_genes and all_go from Rda files - they were importing as tibbles apparently. I converted both to data.frame and heatmap_go works perfectly :).

Many thanks for your help!

ADD REPLY
0
Entering edit mode

Awesome! Can I ask you to mark my answer as accepted then? Just in case others come to this post later, it will help them find the useful answer faster. Thanks!

ADD REPLY
0
Entering edit mode
kevin.rue ▴ 350
@kevinrue-6757
Last seen 6 months ago
University of Oxford

To be honest, I'm confused by the error message:

Error in exprs(eSet)[gene_ids, ] : invalid subscript type 'list'

Sounds like gene_ids is a list when it should be a character.

That seems more like an internal issue than anything you could control in the code. I'd have to check that myself, ideally on a reproducible example that fails for both you and me, but there's a chicken and egg issue here as I can't make such an example without knowing the source of the issue for you.

ADD COMMENT
0
Entering edit mode

If you have the time, I could send you the data?

ADD REPLY
0
Entering edit mode

Before we go down that road, please check my other answer below.

ADD REPLY

Login before adding your answer.

Traffic: 713 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6