GSEABase: how to extract shortDescription from GeneSetCollection object
Guido Hooiveld
Wageningen University, Wageningen, the …

Hi, I am using the package GSEABase to import a GMT file (a file with gene sets). This works fine, but I don't understand how to extract the 'shortDescription' of the gene sets from the GeneSetCollection object. I assumed this could be done with a function named shortDescription, but apparently this is not the case. Extracting the [ugly] setName, the geneIds, or the setIdentifier works fine.

Could someone please point me in the right direction? Thanks, G

Note: I know the first gene set (mmu02020) happens to be 'empty'.

    > library(GSEABase)
    > GeneSets <- getGmt("my.genesets.gmt")
    > class(GeneSets)
    [1] "GeneSetCollection"
    attr(,"package")
    [1] "GSEABase"
    > str(GeneSets)
    Formal class 'GeneSetCollection' [package "GSEABase"] with 1 slot
      ..@ .Data:List of 47
      .. ..$ :Formal class 'GeneSet' [package "GSEABase"] with 13 slots
      .. .. .. ..@ geneIdType      :Formal class 'NullIdentifier' [package "GSEABase"] with 2 slots
      .. .. .. .. .. ..@ type      : chr "Null"
      .. .. .. .. .. ..@ annotation: chr ""
      .. .. .. ..@ geneIds         : chr(0) 
      .. .. .. ..@ setName         : chr "mmu02020.Two.component.system.KEGG"
      .. .. .. ..@ setIdentifier   : chr "D0147357:13308:Wed Apr 24 17:16:14 2019:2"
      .. .. .. ..@ shortDescription: chr "KEGG: Two-component system"
      .. .. .. ..@ longDescription : chr ""
      .. .. .. ..@ organism        : chr ""
      .. .. .. ..@ pubMedIds       : chr(0) 
      .. .. .. ..@ urls            : chr(0) 
      .. .. .. ..@ contributor     : chr(0) 
      .. .. .. ..@ version         :Formal class 'Versions' [package "Biobase"] with 1 slot
      .. .. .. .. .. ..@ .Data:List of 1
      .. .. .. .. .. .. ..$ : int [1:3] 0 0 1
      .. .. .. ..@ creationDate    : chr(0) 
      .. .. .. ..@ collectionType  :Formal class 'NullCollection' [package "GSEABase"] with 1 slot
      .. .. .. .. .. ..@ type: chr "Null"
      .. ..$ :Formal class 'GeneSet' [package "GSEABase"] with 13 slots
      .. .. .. ..@ geneIdType      :Formal class 'NullIdentifier' [package "GSEABase"] with 2 slots
      .. .. .. .. .. ..@ type      : chr "Null"
      .. .. .. .. .. ..@ annotation: chr ""
      .. .. .. ..@ geneIds         : chr [1:294] "Gm5741" "Mapkapk3" "Arrb1" "Braf" ...
      .. .. .. ..@ setName         : chr "mmu04010.MAPK.signaling.pathway.KEGG"
      .. .. .. ..@ setIdentifier   : chr "D0147357:13308:Wed Apr 24 17:16:14 2019:3"
      .. .. .. ..@ shortDescription: chr "KEGG: MAPK signaling pathway"
      .. .. .. ..@ longDescription : chr ""
    > setName(GeneSets[[2]])
    [1] "mmu04010.MAPK.signaling.pathway.KEGG"
    > shortDescription(GeneSets[[2]])
    Error in shortDescription(GeneSets[[1]]) : 
      could not find function "shortDescription"
    > setIdentifier(GeneSets[[2]])
    [1] "D0147357:13308:Wed Apr 24 17:16:14 2019:3"
    > head(geneIds(GeneSets[[2]]))
    [1] "Gm5741"   "Mapkapk3" "Arrb1"    "Braf"     "Rap1a"    "Raf1"    

    > sessionInfo()
    R version 3.5.1 Patched (2018-11-24 r75665)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

    Matrix products: default

    locale:
    [1] LC_COLLATE=English_United States.1252 
    [2] LC_CTYPE=English_United States.1252   
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C                          
    [5] LC_TIME=English_United States.1252    

    attached base packages:
    [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
    [8] methods   base     

    other attached packages:
    [1] GSEABase_1.44.0      graph_1.60.0         annotate_1.60.1     
    [4] XML_3.98-1.19        AnnotationDbi_1.44.0 IRanges_2.16.0      
    [7] S4Vectors_0.20.1     Biobase_2.42.0       BiocGenerics_0.28.0 

    loaded via a namespace (and not attached):
     [1] Rcpp_1.0.1      digest_0.6.18   bitops_1.0-6    xtable_1.8-4   
     [5] DBI_1.0.0       RSQLite_2.1.1   blob_1.1.1      bit64_0.9-7    
     [9] RCurl_1.95-4.12 bit_1.1-14      compiler_3.5.1  memoise_1.1.0  
GSEABase
United States

Instead of looking at the structure of the object, I looked at the methods available (not completely foolproof...)

    methods(class = class(gss[[1]]))

and then found a method named description among them.

I was hoping that ?"description,GeneSet-method" would actually be helpful, but I guess the author of the package didn't do a good enough job on the documentation :(. The vignette led me to details(gss[[1]]), and there I see some further hints:

    > details(gss[[1]])
    setName: chr5q23
    geneIds: ZNF474, CCDC100, ..., LOC728586 (total: 86)
    geneIdType: Symbol
    collectionType: Broad
      bcCategory: c1 (Positional)
      bcSubCategory: NA
    setIdentifier: c1:101
    description: Genes in cytogenetic band chr5q23
    organism: Human
    urls: file://Users/ma38727/Library/R/3.6/Bioc/3.9/GSEABase/extdata/Broad.xml
    contributor: Broad Institute
    setVersion: 0.0.1

where each of the keys corresponds to a function.
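A minimal sketch of that key-to-accessor mapping, assuming GSEABase is installed. The gene set is built by hand with illustrative values rather than read from a file, and GeneSet() is called with the slot values passed by name; note that description() is what returns the shortDescription slot:

```r
suppressPackageStartupMessages(library(GSEABase))

# Hand-built GeneSet with illustrative values; slot values are passed by name
gs <- GeneSet(geneIds = c("Gm5741", "Mapkapk3", "Arrb1", "Braf"),
              setName = "mmu04010.MAPK.signaling.pathway.KEGG",
              shortDescription = "KEGG: MAPK signaling pathway")

# Keys shown by details() have accessor functions of the same name
setName(gs)          # the set name
geneIds(gs)          # character vector of gene identifiers
setIdentifier(gs)    # auto-generated unique identifier
description(gs)      # returns the shortDescription slot
longDescription(gs)  # empty here, since it was not set

# Reading the slot directly also works, but the accessors are the intended interface
gs@shortDescription
```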

Thanks, Martin, for your hints. Using the function description indeed does the trick!

    > description(GeneSets[[2]])
    [1] "KEGG: MAPK signaling pathway"
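To get the descriptions of all gene sets in the collection at once, description() can be applied to each element. This is a small sketch, assuming GSEABase is installed. The two-line GMT file (including the mTOR set) is made up for illustration, and it relies on getGmt() storing the second GMT column as the shortDescription, which is what the str() output in the question shows:

```r
suppressPackageStartupMessages(library(GSEABase))

# Hypothetical GMT file: set name <TAB> description <TAB> gene ids ...
gmt <- tempfile(fileext = ".gmt")
writeLines(c(
  "mmu04010.MAPK.signaling.pathway.KEGG\tKEGG: MAPK signaling pathway\tMapkapk3\tArrb1\tBraf",
  "mmu04150.mTOR.signaling.pathway.KEGG\tKEGG: mTOR signaling pathway\tMtor\tRptor"
), gmt)

GeneSets <- getGmt(gmt)

# description() on each GeneSet returns its shortDescription
descs <- vapply(GeneSets, description, character(1))
names(descs) <- names(GeneSets)  # label each description with its set name
descs

# Optional: a lookup table of set names and descriptions
data.frame(setName = names(GeneSets), description = unname(descs))
```

vapply() is used instead of sapply() so that a set with a missing or malformed description would raise an error instead of silently returning a list.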

