ReactomePA error graph
1
0
Entering edit mode
camillab. ▴ 10
@camillab-23642
Last seen 15 months ago
London

Hi,

I am trying to perform GO analysis with reacotmePA package and I wanted to see the GO terms for both up (log2FC>1) and down regulated genes (log2Fc<= -1). I am able to get results for the downregulated genes but not for the upregulated genes (FC >2) and I don't understand why since in my dataset I have upregulated genes and when I perform enrich pathway analysis I get a list of GO terms. where is the error? I have also removed all rows where there was NA.

my dataset:

# A tibble: 5 x 4
  ENSEMBL          ENTREZID log2FC    FC
  <chr>               <dbl>  <dbl> <dbl>
1 ENSG00000265991 100616231  11.5  2852.
2 ENSG00000206979     26787  11.1  2226.
3 ENSG00000229686     26793  10.7  1635.
4 ENSG00000265706 109617012   9.65  801.
5 ENSG00000202515     56662   9.51  731.

My code:

#load packages
library(org.Hs.eg.db) #human genome
library(DOSE)
library(ReactomePA) # to perform enrichment analysis
library(enrichplot) # to plot the results
library(UpSetR)

#tidy the datasets
genes <- f[[4]] #numeric vector #selected fc
names(genes) <- f[[2]] #named vector
geneLIST <- sort(genes, decreasing = T) #decreasing order
head(geneLIST)

de <- names(geneLIST)[abs(geneLIST) > 2]
x <- enrichPathway(gene=de,pvalueCutoff=0.05, readable=T)
head(as.data.frame(x))
result <- x@result 
barplot(x, showCategory=10, title = " ") #plot

and I got the results but not the graph. when I do head(as.data.frame(x)) I got this (that would explain why I do not have a graph):

[1] ID          Description GeneRatio   BgRatio    
[5] pvalue      p.adjust    qvalue      geneID     
[9] Count      
<0 rows> (or 0-length row.names)

BUT if I export the results in excel file I have a list of GO terms:

ID  Description GeneRatio   BgRatio pvalue  p.adjust    qvalue  geneID  Count
R-HSA-211859    R-HSA-211859    Biological oxidations   13/234  222/10654   0.001229205 0.626327002 0.626327002 MGST3/CYP26A1/CYP4F3/EPHX1/FMO1/ADH6/SLC26A2/SULT1A1/MGST1/MGST2/MARC1/ALDH3A1/ACSM2A   13

Why? thank you for your help!

the sessionInfo() is

R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] parallel  stats4    stats     graphics 
[5] grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] UpSetR_1.4.0         enrichplot_1.8.1    
 [3] ReactomePA_1.32.0    DOSE_3.14.0         
 [5] org.Hs.eg.db_3.11.4  AnnotationDbi_1.50.0
 [7] IRanges_2.22.2       S4Vectors_0.26.1    
 [9] Biobase_2.48.0       BiocGenerics_0.34.0 
[11] readxl_1.3.1        

loaded via a namespace (and not attached):
 [1] bit64_0.9-7         RColorBrewer_1.1-2 
 [3] progress_1.2.2      httr_1.4.1         
 [5] backports_1.1.7     tools_4.0.0        
 [7] R6_2.4.1            DBI_1.1.0          
 [9] colorspace_1.4-1    graphite_1.34.0    
[11] tidyselect_1.1.0    gridExtra_2.3      
[13] prettyunits_1.1.1   bit_1.1-15.2       
[15] compiler_4.0.0      graph_1.66.0       
[17] scatterpie_0.1.4    xml2_1.3.2         
[19] triebeard_0.3.0     checkmate_2.0.0    
[21] scales_1.1.1        ggridges_0.5.2     
[23] rappdirs_0.3.1      stringr_1.4.0      
[25] digest_0.6.25       pkgconfig_2.0.3    
[27] rlang_0.4.6         rstudioapi_0.11    
[29] RSQLite_2.2.0       gridGraphics_0.5-0 
[31] farver_2.0.3        generics_0.0.2     
[33] jsonlite_1.6.1      BiocParallel_1.22.0
[35] GOSemSim_2.14.0     dplyr_1.0.0        
[37] magrittr_1.5        ggplotify_0.0.5    
[39] GO.db_3.11.4        Matrix_1.2-18      
[41] Rcpp_1.0.4.6        munsell_0.5.0      
[43] viridis_0.5.1       lifecycle_0.2.0    
[45] stringi_1.4.6       ggraph_2.0.3       
[47] MASS_7.3-51.5       plyr_1.8.6         
[49] qvalue_2.20.0       grid_4.0.0         
[51] blob_1.2.1          ggrepel_0.8.2      
[53] DO.db_2.9           crayon_1.3.4       
[55] lattice_0.20-41     graphlayouts_0.7.0 
[57] cowplot_1.0.0       splines_4.0.0      
[59] hms_0.5.3           pillar_1.4.4       
[61] fgsea_1.14.0        igraph_1.2.5       
[63] reshape2_1.4.4      fastmatch_1.1-0    
[65] glue_1.4.1          data.table_1.12.8  
[67] BiocManager_1.30.10 vctrs_0.3.1        
[69] tweenr_1.0.1        urltools_1.7.3     
[71] cellranger_1.1.0    gtable_0.3.0       
[73] purrr_0.3.4         polyclip_1.10-0    
[75] tidyr_1.1.0         ggplot2_3.3.1      
[77] xfun_0.14           ggforce_0.3.1      
[79] europepmc_0.4       tidygraph_1.2.0    
[81] reactome.db_1.70.0  viridisLite_0.3.0  
[83] tibble_3.0.1        rvcheck_0.1.8      
[85] tinytex_0.23        memoise_1.1.0      
[87] ellipsis_0.3.1

Camilla

r rnaseq bulk GO ReactomePA • 2.0k views
ADD COMMENT
0
Entering edit mode
ADD REPLY
0
Entering edit mode
Kevin Blighe ★ 4.0k
@kevin
Last seen 13 days ago
Republic of Ireland

Can you confirm the contents of de?

ADD COMMENT
0
Entering edit mode

Yes, and it works if I run the same code with all up and down together or if I have only the down regulated together but not with the up. it would be ok if when I export the file there was no results at all but I don't understand why I got GO terms but not the graph

ADD REPLY
0
Entering edit mode

Sure, but what are the contents of de for up-regulated? Please show as much output from each step as possible (for up-regulated)

ADD REPLY
0
Entering edit mode

sure. so I uploaded only upregulated genes (log2fc>= 1 and p value <= 0.05) and the dataset looks like:

# A tibble: 3 x 5
  ENSEMBL         ENTREZID  log2FC p_value    FC
  <chr>           <chr>      <dbl>   <dbl> <dbl>
1 ENSG00000236257 100129866   1.29  0.0136  2.44
2 ENSG00000267278 100133991   1.13  0.0433  2.19
3 ENSG00000236695 100287191   1.78  0.0206  3.43

then:

genes <- d[[5]] #numeric vector #selected fc
    head(genes)
     [1] 2.444807 2.185109 3.430162 4.061950 2.766760
[6] 2.157705

names(genes) <- d[[2]] #named vector
head(genes)
100129866 100133991 100287191 100289099 
 2.444807  2.185109  3.430162  4.061950 
100418943 100421451 
 2.766760  2.157705

geneLIST <- sort(genes, decreasing = T) #decreasing order
head(geneLIST)
     <NA>      <NA>      <NA>    150677 
90.269867 57.499633 22.920100 11.773691 

de <- names(geneLIST)
head(de)
[1] NA       NA       NA       "150677" NA      
[6] "11067" 
x <- enrichPathway(gene=de,pvalueCutoff=0.05, readable=T)
head(as.data.frame(x))
[1] ID          Description GeneRatio  
[4] BgRatio     pvalue      p.adjust   
[7] qvalue      geneID      Count      
<0 rows> (or 0-length row.names)
     <NA>     11067 
 8.953378  8.470331 

result <- x@result #to see the result
head(result,2)
                        ID
R-HSA-196854   R-HSA-196854
R-HSA-5635838 R-HSA-5635838
                                       Description
R-HSA-196854  Metabolism of vitamins and cofactors
R-HSA-5635838                    Activation of SMO
              GeneRatio   BgRatio      pvalue
R-HSA-196854       5/58 189/10654 0.003567628
R-HSA-5635838      2/58  18/10654 0.004213869
               p.adjust    qvalue
R-HSA-196854  0.4646278 0.4646278
R-HSA-5635838 0.4646278 0.4646278
                                      geneID
R-HSA-196854  AKR1B10/AKR1C3/IDH1/SLC2A1/BTD
R-HSA-5635838                     GAS1/PTCH1
              Count
R-HSA-196854      5
R-HSA-5635838     2
ADD REPLY
0
Entering edit mode

Also if I perform the same analysis on the upregulated genes that are log2fc>=1, value <0.05 RPKM >=6 I got the results (GO terms, excel, plot). and this doesn't make any sense: why should I get the results with a subset of gene and not with a bigger pool of genes?

ADD REPLY
0
Entering edit mode

It seems to me that the function is working but that the problem relates to how enrichPathway performs enrichment (?). Keeping in mind that there are so many ways to perform gene / pathway enrichment, the 'power' to detect an association with a signature / pathway may ironically be increased if your pool of input genes is small, and assuming that this small subset of genes are related in function. However, if these same genes are included in a large pool of genes, their signal is more diluted and 'lost'

ADD REPLY
0
Entering edit mode

but why I can still get the excel file with a bigger pool of genes ( when I run and export the "result" file result <- x@result ) but I am not able to plot them? I would expect that if I run head(as.data.frame(x)) and I got <0 rows> (or 0-length row.names) when I export the results I should have zero rows not a list of GO temrs with their associated genes.

ADD REPLY
0
Entering edit mode

It could very well be related to default thresholds used. For example, none of your enriched pathways pass FDR<0.2, which is the default for enrichPathway(): https://www.rdocumentation.org/packages/ReactomePA/versions/1.16.2/topics/enrichPathway

ADD REPLY

Login before adding your answer.

Traffic: 359 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6