Question

When doing the GO gene enrichment analysis of Populus euphratica, Annotation types can be converted, but enrichment analysis cannot be performed!

0

Entering edit mode

xuyao15927402897 • 0

@xuyao15927402897-20826

Last seen 3.7 years ago

Germany

Recently, I was doing RNA-Seq data analysis of Populus euphratica, but When doing the GO gene enrichment analysis of Populus euphratica, an error of No gene can be mapped ... was thrown！ The important thing is that the gene annotation types can be converted, but cannot be analyzed later.

According to the documentation of clusterProfiler, I used AnnotationHub to build a genome annotation set, the code is as follows:

library(AnnotationHub)
hub <- AnnotationHub::AnnotationHub()
query(hub, "Populus euphratica")
euphratica.OrgDb <- hub[["AH75896"]]`

Following are the extracted differential genes:

gene [1] "105138834" "105135191" "105132258" "105140044" "105134133" "105133575" "105141909" "105123270" "105109891" "105132930" "105124276" [12] "105113312" "105135724" "105136902" "105140249" "105114638" "105111984" "105110579" "105139159" "105142457" "105124308" "105122018" [23] "105115857" "105130255" "105123530" "105126014" "105128710" "105127221" "105124911" "105128353" "105137258" "105137322" "105124790" [34] "105122488" "105134690" "105133317" "105142048" "105139696" "105119632" "105131399" "105113266" "105128458" "105138530" "105137532" [45] "105137080" "105107906" "105115364" "105133132" "105135049" "105111831" "105141437" "105136170" "105142560" "105142561" "105127580" [56] "105130295" "105114049" "105110139" "105133472" "105139576" "105112511" "105139018" "105142810" "105113922" "105134758" "105130163" [67] "105133142" "105125965" "105132391" "105127885" "105109447" "105139648" "105124384" "105128925" "105129037" "105116503" "105132431"`

GO

ego <- enrichGO(
gene = gene, 
OrgDb= euphratica.OrgDb,
keyType = 'ENTREZID', 
ont = "BP",
pAdjustMethod = "BH",
pvalueCutoff  = 0.01,
qvalueCutoff  = 0.05
)

then,I got the error message：

--> No gene can be mapped....
--> Expected input gene ID: 20160501,20160469,20160501,20160510,20160473,20160415
--> return NULL...

Then I checked if the gene exists and the type matches, and the results are correct. such as the frist gene 105138834

> "105138834" %in% gene
[1] TRUE
> "105138834" %in% keys(euphratica.OrgDb, "ENTREZID")
[1] TRUE

And the type of gene annotation can change：

df <- bitr(genes, 
fromType = "ENTREZID",
toType = c("GENENAME","GID", "SYMBOL"),
OrgDb = euphratica.OrgDb
)

> head(df)
   ENTREZID                                           GENENAME       GID       SYMBOL
1 105138834                       uncharacterized LOC105138834 105138834 LOC105138834
2 105135191            U-box domain-containing protein 21-like 105135191 LOC105135191
3 105132258                       acid beta-fructofuranosidase 105132258 LOC105132258
4 105140044                           plasma membrane ATPase 4 105140044 LOC105140044
5 105134133     protein ASPARTIC PROTEASE IN GUARD CELL 2-like 105134133 LOC105134133
6 105133575 zinc finger CCCH domain-containing protein 47-like 105133575 LOC105133575

Thanks， Yao

clusterProfiler • 1.5k views

ADD COMMENT • link updated 5.3 years ago by James W. MacDonald 68k • written 5.3 years ago by xuyao15927402897 • 0

0

Entering edit mode

I am not sure, but could it be that is due to the fact no Gene Ontology information is present in euphratica.OrgD?

ADD REPLY • link 5.3 years ago Guido Hooiveld ★ 4.1k

score 0 · Answer 1 · 2020-01-20

There is some GO data in that package, just not much. And none of which is mapped to the Gene IDs that you are interested in.

> hub <- AnnotationHub()
> z <- hub[["AH75896"]]
> z
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Populus euphratica
| SPECIES: Populus euphratica
| CENTRALID: GID
| Taxonomy ID: 75702
| Db type: OrgDb
| Supporting package: AnnotationDbi

## note that presenting these genes using dput() would have been friendly. As it stands I had to do
## some copy/paste action to get this...

> gns <- c("105138834", "105135191", "105132258", "105140044", "105134133", "105133575", "105141909",
+          "105123270", "105109891", "105132930", "105124276", "105113312", "105135724", "105136902",
+          "105140249", "105114638", "105111984", "105110579", "105139159", "105142457", "105124308",
+          "105122018",  "105115857", "105130255", "105123530", "105126014", "105128710", "105127221",
+          "105124911", "105128353", "105137258", "105137322", "105124790","105122488", "105134690",
+          "105133317", "105142048", "105139696", "105119632", "105131399", "105113266", "105128458",
+          "105138530", "105137532",  "105137080", "105107906", "105115364", "105133132", "105135049",
+          "105111831", "105141437", "105136170", "105142560", "105142561", "105127580",  "105130295",
+          "105114049", "105110139", "105133472", "105139576", "105112511", "105139018", "105142810",
+          "105113922", "105134758", "105130163",  "105133142", "105125965", "105132391", "105127885",
+          "105109447", "105139648", "105124384", "105128925", "105129037", "105116503", "105132431")
> zz <- select(z, gns, "GOALL")
'select()' returned 1:1 mapping between keys and columns
> head(zz)
        GID GOALL
1 105138834  <NA>
2 105135191  <NA>
3 105132258  <NA>
4 105140044  <NA>
5 105134133  <NA>
6 105133575  <NA>
> allis.na(zz[,2]))
[1] TRUE

## So no mappings between your genes and GO terms
## let's try a direct connection to the underlying Db
> con <- dbconn(z)
> library(DBI)
> dbListTables(con)
 [1] "accessions"   "alias"        "chromosomes"  "entrez_genes" "gene_info"   
 [6] "genes"        "go"           "go_all"       "go_bp"        "go_bp_all"   
[11] "go_cc"        "go_cc_all"    "go_mf"        "go_mf_all"    "map_counts"  
[16] "map_metadata" "metadata"     "pubmed"       "refseq"      
> dbGetQuery(con, "select * from go_all limit 20;")
   _id      GOALL EVIDENCEALL ONTOLOGYALL
1    1 GO:0009772         IEA          BP
2    1 GO:0018298         IEA          BP
3    1 GO:0009635         IEA          BP
4    1 GO:0008152         IEA          BP
5    1 GO:0006091         IEA          BP
6    1 GO:0055114         IEA          BP
7    1 GO:0022900         IEA          BP
8    1 GO:0008150         IEA          BP
9    1 GO:0009767         IEA          BP
10   1 GO:0009987         IEA          BP
11   1 GO:0015979         IEA          BP
12   1 GO:0019684         IEA          BP
13   1 GO:0044237         IEA          BP
14   1 GO:0006464         IEA          BP
15   1 GO:0006807         IEA          BP
16   1 GO:0019538         IEA          BP
17   1 GO:0036211         IEA          BP
18   1 GO:0043170         IEA          BP
19   1 GO:0043412         IEA          BP
20   1 GO:0044238         IEA          BP

## So there are some mappings. But how many?

> nrow(dbGetQuery(con, "select distinct _id from go_all;"))
[1] 143
>

So there are only 143 Gene IDs that map to any GO term, out of the ~36k genes in that OrgDb object.