Question

ChIPpeakAnno: S.lycopersicum annotation missing

0

Entering edit mode

Jerina ▴ 10

@e377d406

Last seen 15 months ago

Germany

Hello.

I am currently analysing ChIP-seq data of S. lycopersicum (tomato). My goal is to perform a GO enrichment analysis in R using ChIPpeakAnno. The problem here is that there are no annotation data packages for S. lycopersicum (complete list available here: http://www.bioconductor.org/packages/release/data/annotation/). Do you have any suggestions on how to call the getEnrichedGO and getEnrichedPATH functions in this case?

Thanks in advance for your help!

ChIPpeakAnno ChIPSeq • 1.7k views

ADD COMMENT • link updated 15 months ago by James W. MacDonald 67k • written 15 months ago by Jerina ▴ 10

James W. MacDonald · Answer 1 · 2023-08-22

0

Entering edit mode

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 11 hours ago

United States

This might work.

> library(AnnotationHub)
> hub <- AnnotationHub()
> org.Slycopersicum.eg.db <-  hub[["AH112192"]]
> getEnrichedGO(<your anno object goes here>, "org.Slycopersicum.eg.db", feature_id_type = "entrez_id")

Where I am assuming that it will accept a bare package from AnnotationHub. Otherwise you could use GOstats or topGO to do the analysis directly.

ADD COMMENT • link 15 months ago James W. MacDonald 67k

0

Entering edit mode

It doesn't seem to work. Do you think there's a way to fix this using biomaRt instead?

ADD REPLY • link 15 months ago Jerina ▴ 10

0

Entering edit mode

Can you specify command you used as well as the error info, and share a piece of your annotated peak file for further debugging? Thanks,

ADD REPLY • link 15 months ago Kai Hu ▴ 70

0

Entering edit mode

> ah <- AnnotationHub()

snapshotDate(): 2023-04-24

> query(ah, c("Solanum"))
AnnotationHub with 11 records
# snapshotDate(): 2023-04-24
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, Inparanoid8, WikiPathways
# $species: Solanum lycopersicum, Solanum tuberosum, Solanum verrucosum, Solanum stenotomum, Solanum pen...
# $rdataclass: OrgDb, Inparanoid8Db, Tibble
# additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer,
#   rdatadateadded, preparerclass, tags, rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH10593"]]' 
             title                                             
  AH10593  | hom.Solanum_lycopersicum.inp8.sqlite              
  AH10606  | hom.Solanum_tuberosum.inp8.sqlite                 
  AH91815  | wikipathways_Solanum_lycopersicum_metabolites.rda 
  AH111863 | org.Solanum_stenotomum.eg.sqlite                  
  AH111980 | org.Solanum_verrucosum.eg.sqlite                  
  ...        ...                                               
  AH111995 | org.Solanum_pennellii.eg.sqlite                   
  AH112060 | org.Solanum_tuberosum.eg.sqlite                   
  AH112190 | org.Solanum_esculentum.eg.sqlite                  
  AH112191 | org.Solanum_lycopersicum.eg.sqlite                
  AH112192 | org.Solanum_lycopersicum_var._humboldtii.eg.sqlite

> org.Slycopersicum.eg.db <-  ah[["AH112192"]]

> org.Slycopersicum.eg.db 

OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Solanum lycopersicum_var._humboldtii
| SPECIES: Solanum lycopersicum_var._humboldtii
| CENTRALID: GID
| Taxonomy ID: 4081
| Db type: OrgDb
| Supporting package: AnnotationDbi

#### annotatedPeak_H2O_K4me3_1 is a GRanges object

> getEnrichedGO(annotatedPeak_H2O_K4me3_1, orgAnn = org.Slycopersicum.eg.db, feature_id_type = "entrez_id", maxP = 0.01, minGOterm = 10, multiAdjMethod = "BH", condense = FALSE)

Error in as.vector(x, "character") : 
  cannot coerce type 'environment' to vector of type 'character'

ADD REPLY • link updated 15 months ago by James W. MacDonald 67k • written 15 months ago by Jerina ▴ 10

0

Entering edit mode

The issue with this should be that the annotatedPeak_H2O_K4me3_1 is a GRanges object and not a character vector. So going back to a step before calling the getEnrichedGO, it is necessary to add feature IDs to the annotated peaks (which I cannot do).

> addGeneIDs(annotatedPeak_H2O_K4me3_1, orgAnn = org.Slycopersicum.eg.db, IDs2Add = c("entrez_id"))

'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories", package =
"BiocManager")' for details.
Replacement repositories:
    CRAN: https://cran.rstudio.com/
Bioconductor version 3.17 (BiocManager 1.30.22), R 4.3.0 (2023-04-21 ucrt)
Installing package(s) 'org.Slycopersicum.eg.db'
Error in addGeneIDs(annotatedPeak_H2O_K4me3_1, orgAnn = org.Slycopersicum.eg.db,  : 
  Please refer 
                 http://www.bioconductor.org/packages/release/data/annotation/ 
                 for available org.xx.eg.db packages
In addition: Warning messages:
1: package 'org.Slycopersicum.eg.db' is not available for Bioconductor version '3.17'

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages 
2: In library(orgAnn, character.only = TRUE, logical.return = TRUE) :
  there is no package called 'org.Slycopersicum.eg.db'

ADD REPLY • link updated 15 months ago by James W. MacDonald 67k • written 15 months ago by Jerina ▴ 10

1

Entering edit mode

The problem is that you didn't follow the code I provided you. Note that I said to use orgAnn = "org.Slycopersicum.eg.db", whereas you used orgAnn = org.Slycopersicum.eg.db. Those are not the same things!

You are also mis-diagnosing the source of the error, which has to do with the fact that getEnrichedGO expects orgAnn to be character so it can clip off the '.db' from the end.

> sub(".db", "", org.Slycopersicum.eg.db)
Error in as.vector(x, "character") : 
  cannot coerce type 'environment' to vector of type 'character'

But I don't think it's going to work anyway, because the function is expecting to find AnnDbBiMap objects in the orgDb, and there are none in a bare package like this. Ideally it would be refactored to use any SQLite-based orgDb package instead of using BiMap objects that were superceded with better tools over a decade ago, but not my package.

But as I already mentioned, it's simple enough to use existing tools to do the GO stuff. Here's an example using some random analysis I recently did.

## output from annotatePeaksInBatch
> anno
GRanges object with 1387 ranges and 13 metadata columns:
         seqnames              ranges strand |     logFC   AveExpr         t
            <Rle>           <IRanges>  <Rle> | <numeric> <numeric> <numeric>
      X1    chr16   56667298-56670309      * |  -2.60460  1.918220  -9.37267
      X1    chr16   56667298-56670309      * |  -2.60460  1.918220  -9.37267
      X3     chr9 101435364-101435970      * |  -3.32757 -1.470352  -9.08833
      X4    chr16   89613215-89614540      * |  -2.59921  0.558632  -9.03299
      X9    chr14 100324929-100326203      * |  -2.61166  0.508296  -8.88097
     ...      ...                 ...    ... .       ...       ...       ...
  X12592     chr8   90216567-90217366      * |  1.583590 -0.981918   2.97212
  X12611    chr11 130318176-130318701      * | -1.235087 -0.158639  -2.95110
  X12616     chr3     3798668-3801306      * | -1.266596  1.873638  -2.95231
  X12616     chr3     3798668-3801306      * | -1.266596  1.873638  -2.95231
  X12647     chr1 220689936-220691860      * | -0.950788  2.047241  -2.93706
             P.Value   adj.P.Val        peak     feature      feature.ranges
           <numeric>   <numeric> <character> <character>           <IRanges>
      X1 1.60123e-08 0.000547409          X1        4495   56666730-56668065
      X1 1.60123e-08 0.000547409          X1        4496   56669814-56671129
      X3 2.63031e-08 0.000547409          X3         229 101420560-101435774
      X4 2.75393e-08 0.000547409          X4        1800   89613308-89638433
      X9 3.52892e-08 0.000547885          X9      283600 100323339-100330421
     ...         ...         ...         ...         ...                 ...
  X12592  0.00448576   0.0496826      X12592   100874052   90221488-90387959
  X12611  0.00449890   0.0497532      X12611       29068 130226679-130314917
  X12616  0.00450224   0.0497704      X12616   100130207     3700814-3800875
  X12616  0.00450224   0.0497704      X12616       57633     3799431-3849834
  X12647  0.00453369   0.0499937      X12647       79762 220690363-220699153
         feature.strand  distance insideFeature distanceToSite       symbol
                  <Rle> <integer>   <character>      <integer>  <character>
      X1              -         0  overlapStart              0         MT1G
      X1              +         0  overlapStart              0         MT1H
      X3              -         0  overlapStart              0        ALDOB
      X4              +         0  overlapStart              0        DPEP1
      X9              +         0        inside           1589     SLC25A47
     ...            ...       ...           ...            ...          ...
  X12592              +      4121      upstream           4121    LINC00534
  X12611              -      3258      upstream           3258       ZBTB44
  X12616              -         0  overlapStart              0 LOC100130207
  X12616              +         0  overlapStart              0        LRRN1
  X12647              +         0  overlapStart              0     C1orf115
  -------

> egids <- unique(anno$feature)
> univ <- keys(org.Hs.eg.db)
> p <- new("GOHyperGParams", geneIds = egids, universeGeneIds = univ, annotation = "org.Hs.eg.db", ontology = "BP", conditional = TRUE)
Warning message:
In makeValidParams(.Object) : removing geneIds not in universeGeneIds
> hyp <- hyperGTest(p)
> z <- summary(hyp, categorySize = 10)
> head(z)
      GOBPID       Pvalue OddsRatio   ExpCount Count Size
1 GO:0015849 1.910002e-10  3.315413  15.095520    44  353
2 GO:0015711 2.019091e-09  2.918192  18.045254    47  425
3 GO:0015718 3.738815e-08  3.950649   7.569142    26  177
4 GO:0071280 1.069559e-06 11.308767   1.154615     9   27
5 GO:0015698 2.488057e-06  3.368710   7.654669    23  179
6 GO:0051234 4.362749e-06  1.428335 199.962179   255 4676
                             Term
1          organic acid transport
2         organic anion transport
3   monocarboxylic acid transport
4 cellular response to copper ion
5       inorganic anion transport
6   establishment of localization
>

ADD REPLY • link 15 months ago James W. MacDonald 67k

0

Entering edit mode

Hi Jerina, I agree with James' comments on orgDb. Since getEnrichedGO relies on a complete orgDb with AnnDbBiMap, you can use other tools for the GO analysis. I am not sure which annotation file did you use to obtain your annotated peaks. However, the tomato gene id nomenclature does seem different compared to human and mouse. You may need to convert them into Entrez id, see this post. Alternatively, it might be easier to use tools specifically designed for plant GO analysis like this one.

Best,

ADD REPLY • link 15 months ago Kai Hu ▴ 70

0

Entering edit mode

There's no need to change the IDs.

> head(keys(org.Slycopersicum.eg.db))
[1] "543501" "543502" "543503" "543504" "543506" "543507"

But do note that when using a bare package, you have to use the object itself rather than the object's name

## just use random IDs
> geneIds <- head(keys(org.Slycopersicum.eg.db), 250)
> univ <- keys(org.Slycopersicum.eg.db)
> p <- new("GOHyperGParams", geneIds = geneIds, universeGeneIds = univ, annotation = org.Slycopersicum.eg.db, ontology = "BP", conditional = TRUE)
> hyp <- hyperGTest(p)
'select()' returned 1:many mapping between keys and columns
> head(summary(hyp, categorySize = 10))
      GOBPID       Pvalue OddsRatio  ExpCount Count Size
1 GO:0009664 4.455525e-10 23.537068 0.6661198    10   29
2 GO:0000079 2.748413e-07 12.490809 0.9417555     9   41
3 GO:0042325 5.488095e-07  9.482956 1.3092699    10   57
4 GO:0009653 7.286253e-07  6.366417 2.4118130    13  105
5 GO:0051726 1.259179e-06  6.647343 2.1361772    12   93
6 GO:0045859 1.643755e-06  9.738359 1.1484824     9   50
                                                                     Term
1                                       plant-type cell wall organization
2 regulation of cyclin-dependent protein serine/threonine kinase activity
3                                           regulation of phosphorylation
4                                      anatomical structure morphogenesis
5                                                regulation of cell cycle
6                                   regulation of protein kinase activity

ADD REPLY • link 15 months ago James W. MacDonald 67k

0

Entering edit mode

Right, the org.Slycopersicum.eg.db comes with Entrez ID. Then, just ensure that the feature id type in the annotated peak is also using Entrez ID and James' solution should do the job.

ADD REPLY • link 15 months ago Kai Hu ▴ 70

0

Entering edit mode

I used biomaRt to annotate my peaks.

ensembl <- useMart("plants_mart", host = "https://plants.ensembl.org")
ensembl <- useMart("plants_mart", dataset = "slycopersicum_eg_gene",
                   host = "https://plants.ensembl.org")

annoDataMart_TSS <- getAnnotation(ensembl, featureType = "TSS")

annotatedPeak_H2O_K4me3_1 <- annotatePeakInBatch(peak_H2O_K4me3_1_filtered,
                                         AnnotationData = annoDataMart_TSS)

When I run your code, I still get an error message (probably because my annotated peaks do not use Entrez ID).

geneIds <- unique(annotatedPeak_H2O_K4me3_1$feature)

head(geneIds)

[1] "Solyc01g005000.3" "Solyc01g005010.3" "Solyc01g005020.3" "Solyc01g005030.3" "Solyc01g005070.3" "Solyc01g005080.3"

univ <- keys(org.Slycopersicum.eg.db)

p <- new("GOHyperGParams", geneIds = geneIds, universeGeneIds = univ, annotation = org.Slycopersicum.eg.db, ontology = "BP", conditional = TRUE)

Error in makeValidParams(.Object) : no geneIds in universeGeneIdsFALSE In addition: Warning message: In makeValidParams(.Object) : removing geneIds not in universeGeneIds

ADD REPLY • link 15 months ago Jerina ▴ 10

1

Entering edit mode

Then, you probably need to convert the IDs into Entrez ID, refer to the post in my previous reply. Or use the web tool designed for plant GO analysis also mentioned in my previous reply.

ADD REPLY • link 15 months ago Kai Hu ▴ 70

0

Entering edit mode

Or just run getAnnotation using the org.Slycopersicum.eg.db object that you already have in your possession. Which seems the easier way to go, but ymmv.

ADD REPLY • link 15 months ago James W. MacDonald 67k

0

Entering edit mode

Do you mean like this:

getAnnotation(org.Slycopersicum.eg.db, c("TSS", "miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr", "transcript")))

Because if yes, then it doesn't look possible because a mart object is required for this.

Error in getAnnotation(org.Slycopersicum.eg.db, c("TSS", "miRNA", "Exon",  : 
  No valid mart object is passed in!

ADD REPLY • link 15 months ago Jerina ▴ 10

0

Entering edit mode

Sorry. I meant addGeneIDs, which does what you might imagine. If you have used Ensembl based annotations (e.g., you used a GTF or EnsDb for annotatePeakInBatch), then you might not want to use GOstats. Instead you could use topGO. I'll leave it to you to read/understand how that package works.

ADD REPLY • link 15 months ago James W. MacDonald 67k