GOstats - subsetting termGraphs with terms from the html report and missing nodes
1
0
Entering edit mode
@matthew-thornton-5564
Last seen 10 weeks ago
USA, Los Angeles, USC

Hello! I am using GOstats to do gene set enrichment and I would like to make subset GO term graphs (graphNEL) to then import into Cytoscape. My problem is that when I select desirable nodes from the html report or a CSV file made from the summary I get an error that the nodes aren't present. The nodes could be in any of the other graphNEL made by termGraphs Maybe this is not the best way to do this. Is there a way to get the graphNEL for a group of nodes (GOBPIDs) in an easier way?

params1 <- new("GOHyperGParams", geneIds=q_selectedEntrezIds, universeGeneIds=chipEntrezUniverse, annotation="org.Rn.eg.db", ontology="BP", pvalueCutoff=hgCutoff, conditional=FALSE, testDirection="over")

hgOver1 <- hyperGTest(params1)

## Generate hmtl result files
htmlReport(hgOver1, file=paste(dt,"_MH_malevfemale1_hgOver_BP_q_v_all_over.html", sep=""))

# Get the tables out
dat <- as.data.frame(summary(hgOver1))

write.table(dat, file=paste(dt, "_MH_malevfemale1_hgOver_BP_q_v_all_over.txt", sep=""), sep="\t", col.names=T, row.names=T)

# Ok get the graph out
y1 = termGraphs(hgOver1, use.terms=FALSE, pvalue=hgCutoff)

a1 <- y1$`1`

sub_list <- read.csv("GOBP_list_subset.txt", sep="\t", header=T)

sub <- as.vector(unlist(sub_list$GOBPID))

## I've tried a number of ways to subset and the common error (if it works) is snodes missing. I can confirm that the nodes exist as results in the html report.

test <- subGraph(sub, a1)
Error in subGraph(sub, a1) : 
  'snodes' contains nodes not in graph: ‘GO:0051649’, ‘GO:0051654’, ‘GO:0051668’, ‘GO:0071294’, ‘GO:0071420’, ‘GO:0071702’, ‘GO:0071705’, ‘GO:0072384’, ‘GO:0072657’, ‘GO:0097106’, ‘GO:0097112’, ‘GO:0098693’, ‘GO:0098814’, ‘GO:0098930’, ‘GO:0099185’, ‘GO:0099538’, ‘GO:0099552’, ‘GO:0099553’, ‘GO:0140056’, ‘GO:0140238’, ‘GO:1900273’, ‘GO:1901385’, ‘GO:1902632’, ‘GO:1903421’, ‘GO:1905874’, ‘GO:1990778’, ‘GO:1903937’, ‘GO:1904071’, ‘GO:1990708’, ‘GO:0098953’, ‘GO:0098970’, ‘GO:0099628’, ‘GO:0007196’, ‘GO:0099011’, ‘GO:0099525’, ‘GO:0045162’, ‘GO:0098943’, ‘GO:0006682’, ‘GO:0019375’, ‘GO:0031632’, ‘GO:1901634’, ‘GO:1990926’, ‘GO:1990504’, ‘GO:0046684’, ‘GO:0099502’, ‘GO:0106020’, ‘GO:1902474’, ‘GO:0007158’, ‘GO:0048791’, ‘GO:2000969’, ‘GO:0014051’, ‘GO:1990709’, ‘GO:0021578’,

## OR this

testgr <- inducedTermGraph(hgOver1, id=sub)
Error in subGraph(wantedNodes, g) : 
  'snodes' contains nodes not in graph: ‘GO:0051654’, ‘GO:0071294’, ‘GO:0071420’, ‘GO:0072384’, ‘GO:0097112’, ‘GO:0098930’, ‘GO:0099185’, ‘GO:0099538’, ‘GO:0099552’, ‘GO:0099553’, ‘GO:1901385’, ‘GO:1903421’, ‘GO:1905874’, ‘GO:1903937’, ‘GO:1904071’, ‘GO:0099011’, ‘GO:0099525’, ‘GO:0006682’, ‘GO:0019375’, ‘GO:0031632’, ‘GO:1901634’, ‘GO:1990926’, ‘GO:1990504’, ‘GO:0099502’, ‘GO:0106020’, ‘GO:1902474’, ‘GO:0048791’, ‘GO:2000969’, ‘GO:1990709’, ‘GO:0046476’, ‘GO:0099541’, ‘GO:0099542’, ‘GO:0031630’, ‘GO:1901632’, ‘GO:0006681’, ‘GO:0019374’, ‘GO:0048790’, ‘GO:0061669’, ‘GO:0016082’, ‘GO:0014049’, ‘GO:0048172’, ‘GO:0098917’, ‘GO:0021516’, ‘GO:0034776’, ‘GO:0099151’, ‘GO:0031340’, ‘GO:0033604’, ‘GO:0045956’, ‘GO:0097091’

sessionInfo( )
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] Rgraphviz_2.40.0     xtable_1.8-4         genefilter_1.78.0   
 [4] GOstats_2.62.0       graph_1.74.0         Category_2.62.0     
 [7] Matrix_1.5-3         annotate_1.74.0      XML_3.99-0.12       
[10] GO.db_3.15.0         org.Rn.eg.db_3.15.0  AnnotationDbi_1.58.0
[13] IRanges_2.30.1       S4Vectors_0.34.0     Biobase_2.56.0      
[16] BiocGenerics_0.42.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9             compiler_4.2.1         GenomeInfoDb_1.32.4   
 [4] XVector_0.36.0         bitops_1.0-7           tools_4.2.1           
 [7] zlibbioc_1.42.0        bit_4.0.5              RSQLite_2.2.19        
[10] memoise_2.0.1          lattice_0.20-45        pkgconfig_2.0.3       
[13] png_0.1-7              rlang_1.0.6            DBI_1.1.3             
[16] cli_3.4.1              fastmap_1.1.0          GenomeInfoDbData_1.2.8
[19] httr_1.4.4             Biostrings_2.64.1      vctrs_0.5.1           
[22] bit64_4.0.5            GSEABase_1.58.0        R6_2.5.1              
[25] survival_3.4-0         RBGL_1.72.0            blob_1.2.3            
[28] splines_4.2.1          KEGGREST_1.36.3        AnnotationForge_1.38.1
[31] RCurl_1.98-1.9         cachem_1.0.6           crayon_1.5.2

Thanks!

GOstats • 1.6k views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 14 hours ago
United States

It's not clear to me what a 'subset GO term graph' is, but if you are trying to make plots for significant GO terms, it's straightforward.

## fake0
> library(org.Hs.eg.db)
> library(GOstats)
> univ <- keys(org.Hs.eg.db)
> egids <- univ[sample(1:length(univ), 200)]
> p <- new("GOHyperGParams", geneIds = egids, universeGeneIds = univ, ontology = "BP", conditional = FALSE, annotation = "org.Hs.eg.db")
> hyp <- hyperGTest(p)
> top <- head(summary(hyp, categorySize = 10, pvalue = 0.05))[,1:4]
> top
      GOBPID      Pvalue OddsRatio   ExpCount
1 GO:0097094 0.001021080 49.054688 0.04761149
2 GO:1900101 0.002844854 28.013393 0.07935248
3 GO:0033137 0.003035927 27.045977 0.08199757
4 GO:0003254 0.006312986 18.226744 0.11902873
5 GO:0033135 0.006408379  8.531763 0.37824684
6 GO:0043933 0.007763014  2.617356 4.88017775
> inducedTermGraph(hyp, top[1,1])
A graphNEL graph with directed edges
Number of Nodes = 7 
Number of Edges = 6 
> plot(inducedTermGraph(hyp, top[1,1]))
> termGraphs(hyp, top$GOBPID)
$`1`
A graphNEL graph with directed edges
Number of Nodes = 1 
Number of Edges = 0 

$`2`
A graphNEL graph with directed edges
Number of Nodes = 1 
Number of Edges = 0 

$`3`
A graphNEL graph with directed edges
Number of Nodes = 2 
Number of Edges = 1 

$`4`
A graphNEL graph with directed edges
Number of Nodes = 1 
Number of Edges = 0 

$`5`
A graphNEL graph with directed edges
Number of Nodes = 1 
Number of Edges = 0
ADD COMMENT
0
Entering edit mode

Hi James! Thank you! So interestingly if you make an html report or csv from the summary it will have GO terms that you would like as nodes in a directed acyclic graph. The subset comes from the summary. when I try to make a subset graph, I get 'contains nodes not in graph error' even though the nodes were taken from the summary. Is this not possible. Is there another way to make DAG from GO IDs using a bioconductor resource (GO.db)?

ADD REPLY
0
Entering edit mode

Did the code I provided not do what you are asking?

ADD REPLY
0
Entering edit mode

Hello! that is the code I used, except I am looking at Rat data. Its hard without real data. So there are GOBPID that are in the html reports and summary that are not accessible in the graphNEL. There can be a number of subgraphs 1 2 and so on. Which there isn't really a good explanation for them, but in the one with the largest numbers of nodes you would expect to have nodes with the largest odds ratio, but they are not there AND they are not in any of the subgraphs. Basically, I was wondering if there was a way to 'hack' it to have all of the nodes for GO BP in one graphNEL to then subset however. I really appreciate your help.

ADD REPLY
0
Entering edit mode

I still have no idea what you are talking about. It works identically for rat.

> library(GOstats)
> library(org.Rn.eg.db)

> univ <- keys(org.Rn.eg.db)
> set.seed(0xabeef)
> egids <- univ[sample(1:length(univ), 200)]
> p <- new("GOHyperGParams", geneIds = egids, universeGeneIds = univ, ontology = "BP", conditional = FALSE, annotation = "org.Rn.eg.db")
> hyp <- hyperGTest(p)
> top <- head(summary(hyp, categorySize = 10, pvalue = 0.05))[,1:6]
> top
      GOBPID       Pvalue OddsRatio    ExpCount Count  Size
1 GO:0001977 5.593472e-05 49.904541  0.07624113     3    16
2 GO:0003071 3.489971e-04 24.934198  0.13818706     3    29
3 GO:0008152 7.700477e-04  2.235504 52.70168440    67 11060
4 GO:2000785 1.048586e-03 16.610751  0.20013298     3    42
5 GO:0035437 1.435987e-03 42.742857  0.05718085     2    12
6 GO:0044088 1.455908e-03 14.719058  0.22395833     3    47
> inducedTermGraph(hyp,top[1,1])
A graphNEL graph with directed edges
Number of Nodes = 7 
Number of Edges = 6 
> termGraphs(hyp, top[,1])
$`1`
A graphNEL graph with directed edges
Number of Nodes = 2 
Number of Edges = 1 

$`2`
A graphNEL graph with directed edges
Number of Nodes = 1 
Number of Edges = 0 

$`3`
A graphNEL graph with directed edges
Number of Nodes = 2 
Number of Edges = 1 

$`4`
A graphNEL graph with directed edges
Number of Nodes = 1 
Number of Edges = 0 

>

Those are either the graphNEL for the first GO term, or the graphNEL for all of the GO terms in the 'top' object.

ADD REPLY
0
Entering edit mode

Hi James! I apologize for not being clear. Is there a way to get all of the potential nodes into a single graphNEL which can subsequently be subset? In your example above it splits the top 6 into a number of graphs.

ADD REPLY
0
Entering edit mode

OK, maybe I get what you want. I think you want all of the nodes from a summary object in one graph, right?

> gos <- top[,1]
> gos
[1] "GO:0001977" "GO:0003071" "GO:0008152" "GO:2000785" "GO:0035437"
[6] "GO:0044088"
> dag <- reverseEdgeDirections(goDag(hyp))
> wantedNodes <- c(gos, unlist(edges(goDag(hyp))[gos], use.names = FALSE), unlist(edges(dag)[gos], use.names = FALSE))
> wantedNodes <- unique(wantedNodes)
> gsub <- subGraph(wantedNodes, dag)
> plot(gsub)
ADD REPLY
0
Entering edit mode

The above assumes you have already run the code based on the Rat orgDb that I already supplied.

ADD REPLY
0
Entering edit mode

Hi James! yup that was it. thank you!

ADD REPLY

Login before adding your answer.

Traffic: 863 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6