topGO::annFUN.org() problem due to case-sensitivity of OrgDb field names
psutton • 0
@psutton-20110
Last seen 5.8 years ago

topGO::annFUN.org() has the following lines:

## function annFUN.org() to work with the "org.XX.eg" annotations
annFUN.org <- function(whichOnto, feasibleGenes = NULL, mapping, ID = "entrez") {

     # [some lines have been omitted]

    geneID <- keyName[tolower(ID)]
    .sql <- paste("SELECT DISTINCT ", geneID, ", go_id FROM ", tableName[tolower(ID)],
                  " INNER JOIN ", paste("go", tolower(whichOnto), sep = "_"),
                  " USING(_id)", sep = "")
    retVal <- dbGetQuery(get(paste(mapping, "dbconn", sep = "_"))(), .sql)

    ## restric to the set of feasibleGenes
    if(!is.null(feasibleGenes))
        retVal <- retVal[retVal[[geneID]] %in% feasibleGenes, ]

    ## split the table into a named list of GOs
    return(split(retVal[[geneID]], retVal[["go_id"]]))
}

I created a custom OrgDb (for a non-model organism) using AnnotationForge, which I wanted to use with topGO.

The custom OrgDb didn't have tables like go_bp, so topGO wouldn't work at first (I posted about this in https://support.bioconductor.org/p/118713/), but I was able to create that table and get around the problem.

Using my custom OrgDb, topGO failed on the last line of annFUN.org(), at the call to split(), because there is no data in retVal[[geneID]].

I debugged this and found that custom OrgDb files created using AnnotationForge seem to have uppercase field names, like SYMBOL and ENSEMBL, in the SQLite tables. In contrast, standard OrgDb packages like org.Hs.eg.db report uppercase names from columns(), but have lowercase field names in the SQLite tables.

The odd thing is that SQL is case-insensitive about field names, so the query above returns data whether the field name stored in geneID is uppercase or lowercase. The returned data frame, however, seems to keep the uppercase field names from the SQLite table, while geneID is always lowercase because of the line geneID <- keyName[tolower(ID)]. Column lookup in R is case-sensitive, so retVal[[geneID]] returns NULL, which is why my custom OrgDb failed here.
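Here is a minimal reproduction of the lookup failure in base R. The data frame is a made-up stand-in for the real query result (the gene names and GO IDs are invented); only the uppercase column name matters.

```r
# Stand-in for the data frame returned by dbGetQuery() against a custom OrgDb.
# Values are invented for illustration; the uppercase column name is the point.
retVal <- data.frame(
    SYMBOL = c("geneA", "geneB", "geneA"),
    go_id  = c("GO:0000001", "GO:0000001", "GO:0000002"),
    stringsAsFactors = FALSE
)

geneID <- "symbol"   # lowercase, as produced by keyName[tolower(ID)]

# Column lookup on a data frame is case-sensitive, so this is NULL
lookup_is_null <- is.null(retVal[[geneID]])

# A case-insensitive match finds the column regardless of its case
col <- names(retVal)[match(tolower(geneID), tolower(names(retVal)))]
genes <- retVal[[col]]
```

With this data, lookup_is_null is TRUE and col resolves to "SYMBOL", so the case-insensitive match recovers the gene column that the direct lookup misses.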

My question is: would it be possible for topGO::annFUN.org to handle upper and lowercase field names more gracefully?
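For example, something along these lines might work. This is only a sketch of the idea, not a tested patch: splitGOByGene() is a hypothetical helper (it is not part of topGO) that mirrors the last few lines of annFUN.org() but lowercases the column names of the query result before using them. The example data is invented.

```r
# Hypothetical helper, NOT part of topGO: same logic as the tail of
# annFUN.org(), but with column names normalized to lowercase first.
splitGOByGene <- function(retVal, geneID, feasibleGenes = NULL) {
    names(retVal) <- tolower(names(retVal))
    geneID <- tolower(geneID)

    # Fail with a clear message instead of passing NULL on to split()
    if (!geneID %in% names(retVal))
        stop("column '", geneID, "' not found in query result")

    ## restrict to the set of feasibleGenes
    if (!is.null(feasibleGenes))
        retVal <- retVal[retVal[[geneID]] %in% feasibleGenes, ]

    ## split the table into a named list of GOs
    split(retVal[[geneID]], retVal[["go_id"]])
}

# Invented query result with uppercase field names
res <- data.frame(
    SYMBOL = c("geneA", "geneB", "geneA"),
    GO_ID  = c("GO:0000001", "GO:0000001", "GO:0000002"),
    stringsAsFactors = FALSE
)

all_genes  <- splitGOByGene(res, "symbol")
only_geneA <- splitGOByGene(res, "symbol", feasibleGenes = "geneA")
```

Another option might be to alias the fields in the SQL itself (SELECT ... AS ...), so that the column names of the result are fixed by the query rather than by how the table was created.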

Sorry, I am new to Bioconductor and annotation packages, and it is overwhelming at times, since it seems a lot more difficult to work with a non-model organism. It took me a long time to figure out why split() was throwing an error.

topGO AnnotationForge • 1.3k views