reactome.db is not updated?
Guangchuang Yu ★ 1.2k
@guangchuang-yu-5419
Last seen 7 weeks ago
China/Guangzhou/Southern Medical Univer…

Dear all,
 

A user of my package ReactomePA found that a pathway reported as enriched does not exist on the Reactome website: http://www.reactome.org/cgi-bin/link?SOURCE=Reactome&ID=1445148

The pathway does exist in reactome.db:
> require(reactome.db)
> get("1445148", reactomePATHID2NAME)
[1] "Homo sapiens: Translocation of GLUT4 to the plasma membrane"

This is why ReactomePA reports it.

After searching the website, I found that the pathway ID was changed to 147867:
http://www.reactome.org/content/detail/REACT_147867

It seems that the reactome.db package has not been updated.
 

Best Regards,

Guangchuang Yu

genesetenrichment pathways reactome.db • 4.3k views

Please post your sessionInfo() so we know what version of reactome.db you are looking at.


> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] reactome.db_1.50.0   RSQLite_1.0.0        DBI_0.3.1           
[4] AnnotationDbi_1.28.1 GenomeInfoDb_1.2.3   IRanges_2.0.0       
[7] S4Vectors_0.4.0      Biobase_2.26.0       BiocGenerics_0.12.1

 

Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.4 years ago
United States

Hi Pablo,

Yes that doc you were reading is out of date.  I have actually been updating reactome.db for a while now (with occasional input from Willem).  There has been an updated version every release and they are numbered to match the version of reactome that they contain.  But after the most recent update (merely a few weeks ago) the reactome folks pulled the denormalized db dumps from their site.  And now I can see why since it looks like the most recent denormalized dumps were not actually ever updated to reactome version 50.  That's a bummer since it basically means that the most recent package has some stale information in it (which I am working on getting updated).  This is the ultimate cause of both of the different problems that Guangchuang has mentioned.  These fields are all ones that originate in the denormalized database.

Anyhow, I am currently discussing a solution to this problem with them, but the conversation has been proceeding at a pace of about one reply per day, so it still hasn't wrapped up (yet).  It all looks hopeful though (the reactome guys are helping me to know more about what their long term future plans are for this data resource).  And we are currently discussing the best alternative routes for me to get the information that previously came from their denormalized database.  When the conversation has concluded I will make a new, updated reactome.db package, push it online, and post about it here.

 

 Marc

Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.4 years ago
United States

Hi Guangchuang,

Unfortunately, I am still waiting for some key replacement data from reactome.  On a positive note though, a few days ago the reactome guys emailed me again confirming that they planned to really do this very soon. 

 

 Marc

@willemligtenberg-6989
Last seen 7.2 years ago
Netherlands

I can finally say that we have found a solution.

It was a long journey, and we will still need a long term solution, since this time they had to generate the file at Reactome, instead of me being able to do it myself. We will continue working together for a better long term solution.

I have submitted the package, and I hope it will make it into the next release. In the meantime, you can download the latest version here:
https://share.openanalytics.eu/data/public/reactome.php


Why is the file downloaded from the above link only 14M in size, while the one at http://www.bioconductor.org/packages/3.1/data/annotation/html/reactome.db.html is more than 400M?

Both of them still have the issue:

> get("71593", reactomePATHID2EXTID)
[1] "178"  "2992" "8908"
> get("71593", reactomePATHID2NAME)
Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
  value for "71593" not found


OK so Willem was finally able to get the reactome people to give one of us an updated file.  For that he deserves some appreciation!  And yesterday he gave me a package which I then modified extensively.  This was needed so that things like select() will work.  I also (in order to be consistent with the past) included the full reactome database for release 52 (which is the most recent release and also the one that matches the important denormalized data that Willem secured for us).  This is the package that is in devel.  Unfortunately, we waited so darned long for these people to get us the data that we basically lost/skipped a release of reactome.  As was pointed out here before, the version that was in release was not really fully release 50, and the reactome people never helped us with that version.  So in order to keep things honest and transparent, I changed the version number of the older reactome that has been in release to correctly reflect that, which is why it will now say release 48 if you go looking for it.

As for the 'issue' that you are reporting, the trouble is that you are comparing two different bimaps.  So that means that even though both bimaps contain keys of the same type, they are not each a full set of all keys of that kind ever used.  So for example if you do this:

p2el <- as.list(reactomePATHID2EXTID)
head(names(p2el))
length(names(p2el))
## And compare that to this:
p2nl <- as.list(reactomePATHID2NAME)
head(names(p2nl))
length(names(p2nl))

What you will notice is that you get different sets of keys.  They are the same kind of key, but they are different sets of that kind of key.  If you look at the intersection of these two sets of keys you will find that they only mostly overlap.

table(names(p2nl) %in% names(p2el))

But they do not overlap completely since the mappings themselves represent very different relationships.  I hope that this helps explain things better.
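As a minimal sketch of the comparison described above, the following uses small toy lists in place of the real bimaps, so it runs without reactome.db installed. The key "71593" and its values, and the name for "1445148", are copied from examples quoted in this thread; the variable names and the gene IDs attached to "1445148" are placeholders invented for illustration.

```r
# Toy stand-ins for as.list(reactomePATHID2EXTID) and as.list(reactomePATHID2NAME).
# The real mappings are far larger; these only illustrate the key-set comparison.
p2el_toy <- list(
  "1445148" = c("hypothetical_gene_1", "hypothetical_gene_2"),  # placeholder values
  "71593"   = c("178", "2992", "8908")                          # values quoted in this thread
)
p2nl_toy <- list(
  "1445148" = "Homo sapiens: Translocation of GLUT4 to the plasma membrane"
)

# Keys that have external IDs but no name, and the reverse
ext_only  <- setdiff(names(p2el_toy), names(p2nl_toy))
name_only <- setdiff(names(p2nl_toy), names(p2el_toy))

# Keys present in both mappings
shared <- intersect(names(p2el_toy), names(p2nl_toy))

print(ext_only)
print(name_only)
print(shared)
```

With the real package, the same setdiff() and intersect() calls could presumably be applied to names(p2el) and names(p2nl) from the snippet above to see exactly which IDs fall outside the overlap.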

Also: I noticed while replying to you that the keys method for the "PATHID" keytype is not returning as many results as it should.  So I have patched that behavior and will be checking it in soon.  In the future, that result should be more complete than it is now.

 

 Marc


Thanks Marc. Now it's very clear.

I really appreciate both of your efforts in maintaining reactome.db.

Bests,

Guangchuang


The size difference is because the one that is published on Bioconductor also includes a full SQLite version of the entire Reactome database. Useful for people who want to go beyond the mappings provided in the package itself.

The other issue is caused by an error in the query that was run.
PATHID2EXTID also contains reactions, not only pathways, whereas PATHID2NAME actually only has pathways.
We will try to address this before release.


Thanks Willem.

Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.4 years ago
United States

Hi Guangchuang,

I actually did update this package to version 50 of the reactome database for the most recent release.  But now I am searching for the source of the issue that you mentioned. 

I will post again here when I work out what happened with these latest files from reactome.

 

 Marc

Guangchuang Yu ★ 1.2k
@guangchuang-yu-5419
Last seen 7 weeks ago
China/Guangzhou/Southern Medical Univer…

Dear Marc,

There is also another issue. Some valid pathway IDs in reactome.db don't have a pathway name.

    ## > get("5493857", reactomePATHID2EXTID)
    ##  [1] "510850" "523328" "282187" "282188" ...
    ## > get("5493857", reactomePATHID2NAME)
    ## Error in .checkKeys(value, Lkeys(x), x@ifnotfound) : 
    ##   value for "5493857" not found

 

I have a dirty hack that works around it by removing these pathway IDs.

https://github.com/GuangchuangYu/ReactomePA/commit/a507bf9bb398854ce251801de0dad4ebb76ab924
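A filter of this kind might look roughly like the sketch below. This is not the code from the linked commit; the function name `drop_unnamed_pathways` and the toy lists are made up for illustration, and the gene IDs under "1445148" are placeholders. The IDs under "5493857" are only the four shown (truncated) in the output above.

```r
# Keep only the pathway IDs that also have an entry in the ID-to-name mapping.
# Both arguments are named lists, e.g. what as.list() returns for a bimap.
drop_unnamed_pathways <- function(path2ext, path2name) {
  path2ext[names(path2ext) %in% names(path2name)]
}

# Toy data standing in for the real reactome.db mappings
path2ext_toy <- list(
  "1445148" = c("hypothetical_gene_1", "hypothetical_gene_2"),  # placeholder values
  "5493857" = c("510850", "523328", "282187", "282188")         # truncated in the thread
)
path2name_toy <- list(
  "1445148" = "Homo sapiens: Translocation of GLUT4 to the plasma membrane"
)

kept <- drop_unnamed_pathways(path2ext_toy, path2name_toy)
print(names(kept))
```

The trade-off is that any pathway without a name is silently dropped from the enrichment results, which is why this is only a stopgap until the mappings themselves are fixed.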

 

Can you also check this issue?

 

Bests,

Guangchuang

 

 

@pablo-moreno-7064
Last seen 19 months ago
University of Cambridge, UK

Hi,

The reactome.db SQLite database was generated from files that are no longer available for download from Reactome.org (a denormalized version of the database dating from 2010 according to the doc, though that might be inaccurate). Please see the following thread by Willem (the maintainer of reactome.db):

C: reactome.db: reactome IDs not mapped to pathway names

All the best,

Pablo

 

 

Guangchuang Yu ★ 1.2k
@guangchuang-yu-5419
Last seen 7 weeks ago
China/Guangzhou/Southern Medical Univer…

Is there any news on this?
