How does the org.Dm.eg.db package deal with NOT annotation qualifiers when annotating genes to GO?
triZZla • 0
@ivozeller-12603

I am interested in gene ontology enrichment and/or depletion analysis in D. melanogaster. I therefore implemented a package in R that does the job, partly because I really want to learn R. For the direct annotation of Entrez IDs to GO terms I used the org.Dm.eg.db package. By accident I discovered that genes with a NOT qualifier (roughly 3% of the fly genome) are also annotated to the GO category in question, which makes them indistinguishable from genes that are truly associated with that category. Did I overlook something, or is there a reason for this?


Is the NOT qualifier not specific to D. melanogaster GO annotation? As someone working with human or mouse data, I have never heard of that GO qualifier before... A little more explanation and some sample code/examples would help, especially for the BioC core members who generate these types of annotation packages (and are likely less expert on Dm than you are...).

 

Edit: this link may be useful; after a quick read it seems to me the NOT qualifier is specific for FlyBase. Since the GO mappings for Dm are based on info directly derived from the GO consortium (and not FlyBase), this may be the cause of this apparent discrepancy?


The NOT qualifier is relevant for many databases:

"GO uses three qualifiers, contributes_to, colocalizes_with and NOT, to further refine annotations (see the GO annotation conventions). The NOT qualifier, which indicates the lack of a property, is most vital in data interpretation. This is used judiciously, only when there is potential for confusion or contradiction. For example, a gene product might have sequence similarity to protein kinases, but the curator can apply the NOT qualifier to indicate that, contrary to expectation, the gene product does not exhibit kinase activity based on published results. Although the total number of NOT annotations is minor, several databases have hundreds of these annotations (TABLE 3)" (Rhee et al., 2008, doi:10.1038/nrg2363)

To check this for any species: on the Gene Ontology website go to download -> annotations and download the annotation text file for the species of interest. Then read it into R:

read.delim(gzfile(path_to_annotation_textfile), na.strings = "", header = FALSE, comment.char = "!", sep = "\t")

Column "V4" holds the qualifier of each annotation. Check whether the annotations qualified with "NOT" are also present in the org.your_fav_species.eg.db package. In my opinion they should not be. In the case of org.Dm.eg.db it seems like they are.
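A minimal sketch of the extraction step described above, using a small inline table instead of a real annotation file so that it runs without a download. It assumes the column layout used in this thread (V2 = database object ID, V4 = qualifier, V5 = GO ID); all IDs and rows are invented for illustration.

```r
# Toy stand-in for an annotation file: tab-separated, "!" comment lines,
# no header. Every ID below is made up.
gaf_text <- paste(
  "!gaf-version: 2.1",
  "FB\tFBgn0000001\tgeneA\t\tGO:0000001\tFB:ref1\tIDA",
  "FB\tFBgn0000002\tgeneB\tNOT\tGO:0000002\tFB:ref2\tIMP",
  "FB\tFBgn0000003\tgeneC\tNOT|contributes_to\tGO:0000003\tFB:ref3\tIDA",
  "FB\tFBgn0000004\tgeneD\tcolocalizes_with\tGO:0000004\tFB:ref4\tIDA",
  sep = "\n"
)

# Same read.delim() arguments as above, reading from text instead of a file.
gaf <- read.delim(text = gaf_text, na.strings = "", header = FALSE,
                  comment.char = "!", sep = "\t", stringsAsFactors = FALSE)

# Keep every annotation whose qualifier contains NOT, including compound
# qualifiers such as "NOT|contributes_to".
is_not <- !is.na(gaf$V4) & grepl("NOT", gaf$V4, fixed = TRUE)
not_pairs <- unique(gaf[is_not, c("V2", "V5")])
names(not_pairs) <- c("gene_id", "go_id")
print(not_pairs)
```

The resulting gene/GO pairs are the ones that would then be looked up in the org package.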


For human, the NOT qualifier is also used for a few proteins, but only for a small percentage of entries (1269 out of 409697 carry a plain NOT, and 1285 if the NOT|colocalizes_with and NOT|contributes_to entries are counted as well). Whether this is taken into account in the annotation packages (org.Xx.eg.db or GO.db), and if so how, is a good question! I don't know....

 

> GAF <- read.delim(gzfile("goa_human.gaf.gz"), na.strings = "", header = FALSE, comment.char = "!", sep = "\t")
>
> table(GAF$V4)

    colocalizes_with       contributes_to                  NOT
                1159                 1144                 1269
NOT|colocalizes_with   NOT|contributes_to
                  14                    2
>
> length(GAF$V4)
[1] 409697
>
> sum(!is.na(GAF$V4))
[1] 3588
>
> sum(is.na(GAF$V4))
[1] 406109
>

http://www.geneontology.org/page/download-annotations
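As a small worked check of the proportions, the sketch below re-uses the counts printed by table(GAF$V4) above and sums every qualifier that contains NOT, alone or combined. The variable names are only illustrative.

```r
# Qualifier counts copied from the table(GAF$V4) output above.
counts <- c(colocalizes_with = 1159, contributes_to = 1144, NOT = 1269,
            "NOT|colocalizes_with" = 14, "NOT|contributes_to" = 2)
total_rows <- 409697

# Qualifiers that negate the annotation, alone or combined with another one.
negated <- grepl("NOT", names(counts), fixed = TRUE)
n_not <- sum(counts[negated])

n_not                               # 1285
round(100 * n_not / total_rows, 2)  # 0.31 (percent of all entries)
```

Matching on the exact string "NOT" alone would miss the 16 entries with compound qualifiers.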

 

triZZla • 0
@ivozeller-12603

I have now tested all NOT-qualified GO IDs in Dm (before, I had only checked individual cases by eye), and it seems to me that in the majority of cases the NOT qualifier is respected in the org.Dm.eg.db package.

Here is a link to an R data object containing the basis of the test: https://drive.google.com/file/d/0B2cadPb0HTwbZW5sZWFxMUZOLUU/view?usp=sharing

It holds two files:

one containing Entrez IDs associated with GO IDs (created using org.Dm.eg.db)

the other containing Entrez IDs paired with the GO IDs from which they are explicitly excluded by the "NOT" qualifier (these mappings should not appear in the org.Dm.eg.db package)

 

org.Dm.eg.db_3.4.0 was used and the latest annotation file from http://www.geneontology.org/page/download-annotations

 


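I get the same:

> library(org.Dm.eg.db)
> library(DBI)  # provides dbGetQuery()
> con <- org.Dm.eg_dbconn()
> annot <- read.delim("C:/Users/jmacdon/Downloads/gene_association.fb", na.strings = "", header = FALSE, comment.char = "!", sep = "\t")

> # FlyBase ID (column 2) and GO ID (column 5) of every annotation whose qualifier is exactly "NOT"
> tocheck <- annot[annot[,4] %in% "NOT", c(2,5)]

> # one query per pair: how many rows of go_all match this FlyBase ID / GO ID combination?
> checked <- lapply(1:nrow(tocheck), function(x) dbGetQuery(con, paste0("select flybase_id, go_id, evidence from go_all inner join flybase using(_id) where flybase_id='", tocheck[x,1],"' and go_id='", tocheck[x,2],"';")))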

> table(sapply(checked, nrow))

  0   1   2   3
404  75  17   7

So just under 20% of the NOT mappings (99 of 503) still exist in the database.
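The same kind of tally can be sketched without a database connection by counting matches on a composite gene/GO key rather than issuing one SQL query per pair. Both data frames below are hypothetical stand-ins (one for the NOT pairs from the annotation file, one for the mappings pulled from the package), and the IDs are invented.

```r
# Hypothetical NOT-qualified pairs taken from an annotation file.
not_pairs <- data.frame(
  flybase_id = c("FBgn0000001", "FBgn0000002", "FBgn0000003"),
  go_id      = c("GO:0000001", "GO:0000002", "GO:0000003"),
  stringsAsFactors = FALSE
)

# Hypothetical mappings as they might come out of the package database;
# one gene/term pair can appear several times with different evidence codes.
db_pairs <- data.frame(
  flybase_id = c("FBgn0000002", "FBgn0000002", "FBgn0000003", "FBgn0000009"),
  go_id      = c("GO:0000002", "GO:0000002", "GO:0000003", "GO:0000009"),
  evidence   = c("IDA", "IMP", "ISS", "IDA"),
  stringsAsFactors = FALSE
)

# Composite key so that a pair matches only if gene AND term agree.
key <- function(d) paste(d$flybase_id, d$go_id, sep = "|")

# Number of database rows per NOT pair (0 = the NOT mapping is absent).
n_hits <- as.integer(table(factor(key(db_pairs), levels = key(not_pairs))))

table(n_hits)
mean(n_hits > 0)  # fraction of NOT pairs still present in the database
```

With real data, db_pairs would be filled from a single query against the package database instead of by hand.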

 

 

@valerie-obenchain-4275

Thanks for the feedback. We're rebuilding the annotations over the next few weeks for the April 25 release and we won't include the GO terms with the NOT annotation.

Valerie


Hi,

I've just finished building the new db0 packages (versions 3.4.2). It turns out that we do filter out annotations with the NOT qualifier ... the catch is that some of these genes have a gene ID -> GO mapping that is both OK and a NOT. The output below is from an intermediate database used to build the final packages. You can see that some PubMed IDs support the relationship and some don't. We filter out the ones that don't and are left with the ones that do.

> dbGetQuery(con, "select gene_id,evidence,go_qualifier,pubmed_id from gene2go where gene_id=31625")
   gene_id evidence go_qualifier                  pubmed_id
1    31625      IDA            -                   20826458
2    31625      IMP            - 16177138|16199763|21317294
3    31625      IMP          NOT                   15965240
4    31625      NAS            -                   10908587
5    31625      ISS            -                          -
6    31625      ISS            -                          -
7    31625      ISS            -                          -
8    31625      IDA            -                   20826458
9    31625      IMP          NOT                   16177138
10   31625      NAS            -                   10908587
11   31625      ISS            -                          -
12   31625      IMP          NOT                   16177138
13   31625      ISS            -                          -
14   31625      IDA            -                   20826458
15   31625      IDA            -                   20826458

Thanks to Jim and Martin for getting to the bottom of this.

Valerie
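To illustrate why a NOT-qualified mapping can survive such a filter, here is a small sketch on rows shaped like the gene2go output above. It is only an assumption about the general idea, not the actual build code, and the go_id values are invented because the output above does not show them.

```r
# Rows shaped like the gene2go output above; the go_id values are invented.
gene2go <- data.frame(
  gene_id      = c(31625, 31625, 31625, 31625),
  go_id        = c("GO:0000001", "GO:0000001", "GO:0000002", "GO:0000003"),
  evidence     = c("IMP", "IMP", "IMP", "IDA"),
  go_qualifier = c("-", "NOT", "NOT", "-"),
  stringsAsFactors = FALSE
)

# Dropping rows with a NOT qualifier removes GO:0000002 entirely, but
# GO:0000001 survives because another row supports it.
kept <- gene2go[gene2go$go_qualifier != "NOT", ]
unique(kept$go_id)

# Gene/term pairs that are both asserted and negated, i.e. the cases
# where a NOT mapping still shows up after filtering.
pair <- paste(gene2go$gene_id, gene2go$go_id)
negated  <- unique(pair[gene2go$go_qualifier == "NOT"])
asserted <- unique(pair[gene2go$go_qualifier != "NOT"])
conflicting <- intersect(negated, asserted)
conflicting
```

Listing the conflicting pairs separately makes it possible to decide case by case whether they should count towards an enrichment test.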

 
