I am currently analysing ChIP-seq data of S. lycopersicum (tomato). My goal is to perform a GO enrichment analysis in R using ChIPpeakAnno. The problem here is that there are no annotation data packages for S. lycopersicum (complete list available here: http://www.bioconductor.org/packages/release/data/annotation/). Do you have any suggestions on how to call the getEnrichedGO and getEnrichedPATH functions in this case?
The issue with this should be that the annotatedPeak_H2O_K4me3_1 is a GRanges object and not a character vector. So going back to a step before calling the getEnrichedGO, it is necessary to add feature IDs to the annotated peaks (which I cannot do).
> addGeneIDs(annotatedPeak_H2O_K4me3_1, orgAnn = org.Slycopersicum.eg.db, IDs2Add = c("entrez_id"))
'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories", package =
"BiocManager")' for details.
Replacement repositories:
CRAN: https://cran.rstudio.com/
Bioconductor version 3.17 (BiocManager 1.30.22), R 4.3.0 (2023-04-21 ucrt)
Installing package(s) 'org.Slycopersicum.eg.db'
Error in addGeneIDs(annotatedPeak_H2O_K4me3_1, orgAnn = org.Slycopersicum.eg.db, :
Please refer
http://www.bioconductor.org/packages/release/data/annotation/
for available org.xx.eg.db packages
In addition: Warning messages:
1: package 'org.Slycopersicum.eg.db' is not available for Bioconductor version '3.17'
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
2: In library(orgAnn, character.only = TRUE, logical.return = TRUE) :
there is no package called 'org.Slycopersicum.eg.db'
The problem is that you didn't follow the code I provided you. Note that I said to use orgAnn = "org.Slycopersicum.eg.db", whereas you used orgAnn = org.Slycopersicum.eg.db. Those are not the same things!
You are also mis-diagnosing the source of the error, which has to do with the fact that getEnrichedGO expects orgAnn to be character so it can clip off the '.db' from the end.
> sub(".db", "", org.Slycopersicum.eg.db)
Error in as.vector(x, "character") :
cannot coerce type 'environment' to vector of type 'character'
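To make the difference concrete, here is a minimal base R sketch of the same `sub()` call on a package name versus on an object. The plain environment is only a stand-in for the OrgDb object (an assumption for illustration; the real object is not loaded here).

```r
# Passing the package *name* (a character string) works as intended:
pkg_name <- "org.Slycopersicum.eg.db"
stripped <- sub(".db", "", pkg_name)
print(stripped)

# Passing an object instead of its name fails. A plain environment is
# used here as a stand-in for the OrgDb object (assumption).
fake_orgdb <- new.env()
result <- tryCatch(
  sub(".db", "", fake_orgdb),
  error = function(e) e   # capture the error instead of stopping
)
print(conditionMessage(result))
```

So for functions that build the package name from a string, the quotes around the package name matter.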
But I don't think it's going to work anyway, because the function expects to find AnnDbBiMap objects in the orgDb, and there are none in a bare package like this. Ideally it would be refactored to use any SQLite-based orgDb package instead of BiMap objects, which were superseded by better tools over a decade ago, but it's not my package.
But as I already mentioned, it's simple enough to use existing tools to do the GO stuff. Here's an example using some random analysis I recently did.
Hi Jerina, I agree with James' comments on orgDb. Since getEnrichedGO relies on a complete orgDb with AnnDbBiMap objects, you can use other tools for the GO analysis. I am not sure which annotation file you used to obtain your annotated peaks. However, the tomato gene ID nomenclature does seem to differ from that of human and mouse. You may need to convert the IDs into Entrez IDs; see this post. Alternatively, it might be easier to use a tool specifically designed for plant GO analysis, like this one.
Right, org.Slycopersicum.eg.db comes with Entrez IDs. Then just ensure that the feature ID type in the annotated peaks is also Entrez ID, and James' solution should do the job.
Error in makeValidParams(.Object) : no geneIds in universeGeneIdsFALSE
In addition: Warning message:
In makeValidParams(.Object) : removing geneIds not in universeGeneIds
Then you probably need to convert the IDs into Entrez IDs; refer to the post in my previous reply. Or use the web tool designed for plant GO analysis, also mentioned in my previous reply.
Or just run getAnnotation using the org.Slycopersicum.eg.db object that you already have in your possession. Which seems the easier way to go, but ymmv.
Sorry, I meant addGeneIDs, which does what you might imagine. If you have used Ensembl-based annotations (e.g., you used a GTF or EnsDb for annotatePeakInBatch), then you might not want to use GOstats. Instead you could use topGO. I'll leave it to you to read/understand how that package works.
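For orientation, the over-representation test behind GOstats' hyperGTest (and the classic Fisher test in topGO) reduces to a hypergeometric tail probability for each GO term. Below is a minimal base R sketch with toy counts. All numbers are hypothetical and not taken from this dataset, and neither package is called here.

```r
# Toy counts (hypothetical, for illustration only)
universe_size <- 20  # genes in the universe
term_size     <- 5   # universe genes annotated to the GO term
selected_size <- 4   # genes of interest (e.g. genes near peaks)
overlap       <- 3   # genes of interest annotated to the GO term

# P(X >= overlap) when drawing selected_size genes without replacement
p_value <- phyper(overlap - 1,
                  term_size,
                  universe_size - term_size,
                  selected_size,
                  lower.tail = FALSE)
print(p_value)
```

The packages add the GO graph handling and ID bookkeeping on top of this, which is why the gene IDs and the universe must use the same ID type.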
It doesn't seem to work. Do you think there's a way to fix this using biomaRt instead?
Can you specify the command you used as well as the error message, and share a piece of your annotated peak file for further debugging? Thanks,
There's no need to change the IDs.
But do note that when using a bare package, you have to use the object itself rather than the object's name.
I used biomaRt to annotate my peaks.
When I run your code, I still get an error message (probably because my annotated peaks do not use Entrez ID).
[1] "Solyc01g005000.3" "Solyc01g005010.3" "Solyc01g005020.3" "Solyc01g005030.3" "Solyc01g005070.3" "Solyc01g005080.3"
Error in makeValidParams(.Object) : no geneIds in universeGeneIdsFALSE
In addition: Warning message:
In makeValidParams(.Object) : removing geneIds not in universeGeneIds
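The error says that none of the supplied gene IDs occur in the universe. A quick way to confirm an ID-type mismatch before running the test is to check the overlap directly. This is a minimal base R sketch: the first vector uses IDs reported in this thread, while `universe_ids` holds made-up placeholder values standing in for Entrez IDs (an assumption, not real tomato gene IDs).

```r
# Gene IDs as reported in the thread (locus ID plus version suffix)
peak_gene_ids <- c("Solyc01g005000.3", "Solyc01g005010.3", "Solyc01g005020.3")

# Hypothetical placeholder universe of Entrez-style IDs (not real IDs)
universe_ids <- c("1001", "1002", "1003")

# Number of submitted IDs that the test could actually use
n_shared <- length(intersect(peak_gene_ids, universe_ids))
if (n_shared == 0) {
  message("No overlap: gene IDs and universe use different ID types.")
}

# Stripping the trailing version suffix gives the bare locus ID,
# which may be the form an ID-mapping table expects (assumption).
locus_ids <- sub("\\.[0-9]+$", "", peak_gene_ids)
print(locus_ids)
```

If the overlap is zero, the IDs need to be mapped to the universe's ID type before the enrichment test can run.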
Do you mean like this:
Because if yes, then it doesn't look possible because a mart object is required for this.