Using GOSemSim to calculate semantic similarity between orthologous genes of two species
@saadmurtazakhan-6782

Hi,

I know this question was asked a long time ago, but I cannot find an answer on the mailing list or in the GOSemSim vignette. What is the easiest way to calculate GO semantic similarity for orthologous gene pairs between two species, using GOSemSim or any other package you know of? I am not working with poorly annotated species: I need this for orthologous genes between human and mouse, both of which are well annotated IMHO. I would much appreciate it if anyone who has done this before could point me to a resource that has pre-calculated semantic similarity values for mouse and human orthologues, or that has built-in code to compute them.

Thanks & regards

Guangchuang Yu ★ 1.2k
@guangchuang-yu-5419

GOSemSim does not support this if you mean geneSim/mgeneSim. These functions map gene IDs to GO terms through a GOSemSimDATA object, which uses an OrgDb object internally, and if an IC-based method is used, the information content of each GO term must also be pre-calculated.

Cross-species semantic similarity measurement would become possible if I implemented a function to merge GOSemSimDATA objects (e.g. one from human and one from mouse; the function should be chainable, so that multiple objects can be merged sequentially). You could then use the merged object as background annotation to calculate semantic similarity among genes with geneSim/mgeneSim.

This is now on the TODO list; I may add this functionality in the next release.

Currently, it is still possible if you use the Wang method, which doesn't need pre-calculated IC. You can first map your input genes to GO terms and then use mgoSim to calculate their similarities via the Wang method, which uses only the GO structure.
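
The two steps just described can be sketched roughly as follows. This is only an illustration under stated assumptions: gene_to_go is a hypothetical helper (it is not part of GOSemSim), the Entrez IDs 7157 (human TP53) and 22059 (mouse Trp53) are an example orthologue pair picked for this sketch, and the org.Hs.eg.db and org.Mm.eg.db annotation packages are assumed to be installed. Calling godata(ont = "MF") without an OrgDb mirrors the call used in the example further down this thread.

```r
library(GOSemSim)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(org.Mm.eg.db)

# Hypothetical helper (not a GOSemSim function): return the GO terms of one
# ontology that are annotated to a single Entrez gene ID in the given OrgDb.
gene_to_go <- function(orgdb, entrez_id, ont = "MF") {
  ann <- AnnotationDbi::select(orgdb, keys = entrez_id,
                               columns = "GO", keytype = "ENTREZID")
  unique(ann$GO[!is.na(ann$GO) & ann$ONTOLOGY == ont])
}

# Only the ontology is specified; no species annotation is attached.
d <- godata(ont = "MF")

# Example orthologue pair (an assumption of this sketch).
human_go <- gene_to_go(org.Hs.eg.db, "7157")   # human TP53
mouse_go <- gene_to_go(org.Mm.eg.db, "22059")  # mouse Trp53

# Wang similarity between the two GO term sets, combined with BMA (the default).
sim <- mgoSim(human_go, mouse_go, semData = d, measure = "Wang", combine = "BMA")
sim
```

For a table of orthologue pairs, the same call could be wrapped in a loop or mapply over the two ID columns; genes with no GO terms in the chosen ontology would need to be handled separately.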

 

 


But mgoSim also takes semData as input. How do I bypass that?

> args(mgoSim)
function (GO1, GO2, semData, measure = "Wang", combine = "BMA")

I meant: what should I specify as semData? The vignette uses hsGO as semData. Since the data here come from two different species, what should semData be?


As I said, merging semData from two different species is on the TODO list.

Currently, what you can do is use mgoSim(measure="Wang"). See the following example:

> library(GOSemSim)
> d <- godata(ont="MF")
> go1 <- c("GO:0004022", "GO:0004024", "GO:0004023")
> mgoSim(go1, go1, semData=d, measure="Wang", combine=NULL)
           GO:0004022 GO:0004024 GO:0004023
GO:0004022      1.000      0.869      0.869
GO:0004024      0.869      1.000      0.747
GO:0004023      0.869      0.747      1.000
> mgoSim(go1, go1, semData=d, measure="Wang")
[1] 1

 


But in this example you still have to give semData as an argument. Should I give hsGO or mmGO as semData, or would both yield similar results?


In this case, mgoSim only needs to know which ontology is being used.

Using hsGO or mmGO as semData will generate identical results, since species information is not used in mgoSim(measure="Wang").
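
One way to check this claim yourself is to build both objects and compare the results. The sketch below assumes org.Hs.eg.db and org.Mm.eg.db are installed; the names sim_hs and sim_mm are just illustrative, and computeIC = FALSE is used only because the Wang method does not need information content.

```r
library(GOSemSim)
library(org.Hs.eg.db)
library(org.Mm.eg.db)

# Build one semData object per species for the same ontology.
# IC is skipped because the Wang method relies on the GO structure only.
hsGO <- godata("org.Hs.eg.db", ont = "MF", computeIC = FALSE)
mmGO <- godata("org.Mm.eg.db", ont = "MF", computeIC = FALSE)

# Same GO terms as in the example earlier in this thread.
go1 <- c("GO:0004022", "GO:0004024", "GO:0004023")

# Pairwise Wang similarity matrices, one per semData object.
sim_hs <- mgoSim(go1, go1, semData = hsGO, measure = "Wang", combine = NULL)
sim_mm <- mgoSim(go1, go1, semData = mmGO, measure = "Wang", combine = NULL)

# Expected to be TRUE if species information plays no role here.
all.equal(sim_hs, sim_mm)
```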

 


Thanks, that helps.


Some gene pairs where the numbers of GO terms are not equal always return 1. Is that an accurate value of functional similarity to use? What would be a possible workaround, other than dropping those genes altogether?


The number of GO terms is not a factor in the semantic similarity calculation.

See https://bioconductor.org/packages/devel/bioc/vignettes/GOSemSim/inst/doc/GOSemSim.html#combine-methods


So what explains so many gene pairs having a value of 1?


As you are comparing orthologous genes, this is expected.

Did you read the document? If you did, you may want to try the 'avg' combine method.
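
To see how a value of 1 can arise and what 'avg' changes, the two combine rules can be worked by hand on the 3 x 3 Wang matrix printed earlier in this thread. The formulas follow the definitions in the combine-methods section of the vignette (BMA averages the best match of every row and every column; avg is the mean of all pairwise scores). Treat this as a plain-R sketch of those definitions, not as the package's internal code; the variable names are made up for the illustration.

```r
# Pairwise Wang similarities copied from the mgoSim(..., combine=NULL)
# output shown earlier in this thread.
go1 <- c("GO:0004022", "GO:0004024", "GO:0004023")
sim <- matrix(c(1.000, 0.869, 0.869,
                0.869, 1.000, 0.747,
                0.869, 0.747, 1.000),
              nrow = 3, byrow = TRUE, dimnames = list(go1, go1))

# BMA (best-match average): best score in each row and each column, averaged.
row_best <- apply(sim, 1, max)
col_best <- apply(sim, 2, max)
bma <- (sum(row_best) + sum(col_best)) / (nrow(sim) + ncol(sim))

# avg: mean of all pairwise scores, including the non-matching pairs.
avg <- mean(sim)

bma  # 1
avg  # about 0.886
```

Because every term here has a perfect match in the other set, BMA is exactly 1, whereas avg also counts the imperfect pairs and comes out lower. With GOSemSim itself the corresponding switch is the combine argument, e.g. mgoSim(go1, go1, semData=d, measure="Wang", combine="avg").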

 

 
