Question

Trying to use enrichGO

fernanda.backsouza ▴ 30

Brazil

Hey!

I'm trying to run an analysis using enrichGO from clusterProfiler, but I don't know what I'm doing wrong. Here is my script:

    enrichGO(
      gene          = ortologos_filtrados$Name,
      OrgDb         = org.Cs.eg.db,
      keyType       = "ENTREZID",
      ont           = "MF",
      pvalueCutoff  = 0.05,
      pAdjustMethod = "BH",
      universe      = universe_genes,
      qvalueCutoff  = 0.2,
      minGSSize     = 10,
      maxGSSize     = 500,
      readable      = FALSE,
      pool          = FALSE
    )

I'm passing a vector to both "gene" and "universe".

When I run it, this error appears:

    --> No gene can be mapped....
    --> Expected input gene ID: 23630793,24573831,24573748,23630741,27215463,23630752
    --> return NULL...
    NULL

Thank you.

clusterProfiler GenomicDistributionsData KEGG • 1.9k views

ADD COMMENT • link 7 months ago fernanda.backsouza ▴ 30

You seem to be passing names (ortologos_filtrados$Name) instead of the Entrez IDs that you declare with keyType = "ENTREZID".
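As a quick sanity check before calling enrichGO, something like the sketch below can flag values that cannot be Entrez Gene IDs. The helper name `looks_like_entrez` is made up for this illustration, and the test only checks that a value consists of digits; it says nothing about whether the ID exists in a given OrgDb.

```r
# Hypothetical helper: TRUE for values made up only of digits,
# which is what Entrez Gene IDs look like.
looks_like_entrez <- function(x) {
  grepl("^[0-9]+$", as.character(x))
}

# One value from the Name column and one numeric-looking ID, for comparison.
ids <- c("1-deoxy-D-xylulose 5-phosphate reductoisomerase, chloroplastic",
         "115709372")
looks_like_entrez(ids)
```

If most of the input fails this check, the column is probably not the one that holds the gene IDs.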

ADD REPLY • link 7 months ago Lluís Revilla Sancho ▴ 760

Sure, I changed it, but the result is still NULL:

    G0 <- enrichGO(
      gene          = ortologos_filtrados$Gene.ID,
      OrgDb         = org.Cs.eg.db,
      keyType       = "ENTREZID",
      ont           = "ALL",
      pvalueCutoff  = 0.05,
      pAdjustMethod = "BH",
      universe      = universe_genes,
      qvalueCutoff  = 0.2,
      minGSSize     = 10,
      maxGSSize     = 500,
      readable      = FALSE,
      pool          = FALSE
    )

    NULL

What am I doing wrong? gene is a vector, org.Cs.eg.db works, the keyType is ENTREZID because all the values are numbers, and universe is the NCBI database for cannabis.

ADD REPLY • link 7 months ago fernanda.backsouza ▴ 30

Please provide more information! Your first post seems to report a different error from your second post: "No gene can be mapped..." versus NULL.

So:

1) What happens if you set pvalueCutoff = 1, so that effectively no cutoff is applied?
2) Show the output of str(ortologos_filtrados) and head(ortologos_filtrados).
3) How did you obtain org.Cs.eg.db?
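One further diagnostic that may be worth trying, given that the error message lists "expected" IDs such as 23630793: compare the query IDs with the keys the annotation object actually contains. The sketch below is only an illustration; `id_overlap` is a made-up helper, the query IDs are placeholder values, and a short stand-in vector replaces the real keys so the code runs without any annotation package.

```r
# Hypothetical helper: report how many query IDs occur among the valid keys.
id_overlap <- function(query, valid_keys) {
  query <- unique(as.character(query))
  found <- query %in% as.character(valid_keys)
  list(n_query = length(query), n_found = sum(found), missing = query[!found])
}

# With a real OrgDb the valid keys could be obtained with
#   valid_keys <- AnnotationDbi::keys(org.Cs.eg.db, keytype = "ENTREZID")
# Here three IDs from the error message stand in for them.
valid_keys <- c("23630793", "24573831", "24573748")

# Placeholder query IDs, supplied as integers on purpose.
res <- id_overlap(c(115709372L, 115716460L), valid_keys)
res$n_found
res$missing
```

If none of the query IDs are found, the gene list and the OrgDb probably use different identifier sets, and no cutoff setting will change the outcome.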

ADD REPLY • link 7 months ago Guido Hooiveld ★ 4.1k

Thank you for your time, Guido.

1) With pvalueCutoff = 1, nothing changes.

2) Output of str(ortologos_filtrados):

    'data.frame': 14 obs. of 34 variables:
     $ Name                 : chr "1-deoxy-D-xylulose 5-phosphate reductoisomerase, chloroplastic" "2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase, chloroplastic" "2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase, chloroplastic" "4-diphosphocytidyl-2-C-methyl-D-erythritol kinase, chloroplastic" ...
     $ Orthogroup           : chr "OG0012904" "OG0015551" "OG0014953" "OG0016911" ...
     $ humulus_protein      : chr "XP_062116274.1" "XP_062074424.1" "XP_062095393.1" "XP_062082389.1" ...
     $ cannabis_protein     : chr "XP_030493319.2" "XP_030501126.2" "XP_030499547.2" "XP_030506248.2" ...
     $ Accession            : chr "NC_083603.1" "NC_083605.1" "NC_083604.1" "NC_083602.1" ...
     $ Begin                : int 48729217 73995929 51383576 14285717 89821944 89821944 89821944 7453761 43653134 33524926 ...
     $ End                  : int 48734093 73998084 51387823 14289934 89828179 89828179 89828179 7456681 43657173 33529159 ...
     $ Chromosome           : chr "3" "5" "4" "2" ...
     $ Orientation          : chr "minus" "minus" "minus" "plus" ...
     $ Symbol               : chr "LOC115709372" "LOC115716460" "LOC115714928" "LOC115721136" ...
     $ Gene.ID              : int 115709372 115716460 115714928 115721136 115720893 115720893 115720893 115703163 115707261 115699135 ...
     $ Gene.Type            : chr "protein-coding" "protein-coding" "protein-coding" "protein-coding" ...
     $ Transcripts.accession: chr "XM_030637459.2" "XM_030645266.2" "XM_030643687.2" "XM_030650388.2" ...
     $ Protein.length       : int 473 245 305 398 742 742 742 460 415 406 ...
     $ Locus.tag            : chr "" "" "" "" ...
     $ name                 : chr NA NA NA NA ...
     $ type                 : chr NA NA NA NA ...
     $ reaction             : chr NA NA NA NA ...
     $ graphics_name        : chr NA NA NA NA ...
     $ x                    : num NA NA NA NA NA NA NA NA NA NA ...
     $ y                    : num NA NA NA NA NA NA NA NA NA NA ...
     $ width                : num NA NA NA NA NA NA NA NA NA NA ...
     $ height               : num NA NA NA NA NA NA NA NA NA NA ...
     $ fgcolor              : chr NA NA NA NA ...
     $ bgcolor              : chr NA NA NA NA ...
     $ graphics_type        : chr NA NA NA NA ...
     $ coords               : chr NA NA NA NA ...
     $ xmin                 : num NA NA NA NA NA NA NA NA NA NA ...
     $ xmax                 : num NA NA NA NA NA NA NA NA NA NA ...
     $ ymin                 : num NA NA NA NA NA NA NA NA NA NA ...
     $ ymax                 : num NA NA NA NA NA NA NA NA NA NA ...
     $ orig.id              : chr NA NA NA NA ...
     $ pathway_id           : chr NA NA NA NA ...
     $ showname             : chr NA NA NA NA ...

Output of head(ortologos_filtrados):

        Name                                                                             Orthogroup humulus_protein
    69  1-deoxy-D-xylulose 5-phosphate reductoisomerase, chloroplastic                   OG0012904  XP_062116274.1
    137 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase, chloroplastic             OG0015551  XP_062074424.1
    138 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase, chloroplastic          OG0014953  XP_062095393.1
    320 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase, chloroplastic                 OG0016911  XP_062082389.1
    321 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase (ferredoxin), chloroplastic OG0001837  XP_062082072.1
    322 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase (ferredoxin), chloroplastic OG0001837  XP_062082070.1

        cannabis_protein Accession   Begin    End      Chromosome Orientation Symbol       Gene.ID   Gene.Type
    69  XP_030493319.2   NC_083603.1 48729217 48734093 3          minus       LOC115709372 115709372 protein-coding
    137 XP_030501126.2   NC_083605.1 73995929 73998084 5          minus       LOC115716460 115716460 protein-coding
    138 XP_030499547.2   NC_083604.1 51383576 51387823 4          minus       LOC115714928 115714928 protein-coding
    320 XP_030506248.2   NC_083602.1 14285717 14289934 2          plus        LOC115721136 115721136 protein-coding
    321 XP_060964884.1   NC_083602.1 89821944 89828179 2          plus        LOC115720893 115720893 protein-coding
    322 XP_060964884.1   NC_083602.1 89821944 89828179 2          plus        LOC115720893 115720893 protein-coding

        Transcripts.accession Protein.length Locus.tag name type reaction graphics_name x  y  width height fgcolor
    69  XM_030637459.2        473                      <NA> <NA> <NA>     <NA>          NA NA NA    NA     <NA>
    137 XM_030645266.2        245                      <NA> <NA> <NA>     <NA>          NA NA NA    NA     <NA>
    138 XM_030643687.2        305                      <NA> <NA> <NA>     <NA>          NA NA NA    NA     <NA>
    320 XM_030650388.2        398                      <NA> <NA> <NA>     <NA>          NA NA NA    NA     <NA>
    321 XM_061108901.1        742                      <NA> <NA> <NA>     <NA>          NA NA NA    NA     <NA>
    322 XM_061108901.1        742                      <NA> <NA> <NA>     <NA>          NA NA NA    NA     <NA>

        bgcolor graphics_type coords xmin xmax ymin ymax orig.id pathway_id showname
    69  <NA>    <NA>          <NA>   NA   NA   NA   NA   <NA>    <NA>       <NA>
    137 <NA>    <NA>          <NA>   NA   NA   NA   NA   <NA>    <NA>       <NA>
    138 <NA>    <NA>          <NA>   NA   NA   NA   NA   <NA>    <NA>       <NA>
    320 <NA>    <NA>          <NA>   NA   NA   NA   NA   <NA>    <NA>       <NA>
    321 <NA>    <NA>          <NA>   NA   NA   NA   NA   <NA>    <NA>       <NA>
    322 <NA>    <NA>          <NA>   NA   NA   NA   NA   <NA>    <NA>       <NA>

3) I know I'm still struggling to get a correct org.Cs.eg.db; in the meantime I used this script:

    library(AnnotationHub)
    hub <- AnnotationHub()
    query(hub, c("cannabis","orgdb"))
    org.Cs.eg.db <- hub[["AH114845"]]
    org.Cs.eg.db
    columns(org.Cs.eg.db)

ADD REPLY • link 7 months ago fernanda.backsouza ▴ 30

To increase readability, could you please reformat your post? Select the R code and click the CODE button (the 5th button).

ADD REPLY • link 7 months ago Guido Hooiveld ★ 4.1k

That did NOT really improve things... I probably wasn't clear enough: by R code I meant both the command you typed and the output returned in the R console. Please try again; just edit your last post and apply the CODE box.

ADD REPLY • link 7 months ago Guido Hooiveld ★ 4.1k

I'm new here, but is it better now?

ADD REPLY • link 7 months ago fernanda.backsouza ▴ 30

It is indeed better.

Anyway, did you notice this part in the output from str(ortologos_filtrados)?

    $ Gene.ID : int 115709372 115716460 115714928 115721136 115720893 115720893 115720893 115703163 115707261 115699135 ...

This means the input is stored as integers (= numbers), but enrichGO requires a character vector as input!

So change the 2nd line of your code to: gene = as.character( ortologos_filtrados$Gene.ID ),

I also noticed that your input consists of only 14 genes (14 rows, and some Gene.ID values are repeated). That is not a lot.
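A minimal sketch of that conversion, using a toy data frame in place of ortologos_filtrados (the Gene.ID values are copied from the str() output; the variable name `gene_ids` is only for illustration). Dropping repeated IDs with unique() is an extra step assumed here because the same Gene.ID occurs in several rows.

```r
# Toy stand-in for ortologos_filtrados, with one Gene.ID repeated as in the real data.
ortologos_filtrados <- data.frame(
  Gene.ID = c(115709372L, 115716460L, 115720893L, 115720893L)
)

# Convert the integer IDs to character and keep each gene once.
gene_ids <- unique(as.character(ortologos_filtrados$Gene.ID))

is.character(gene_ids)
length(gene_ids)
```

The resulting `gene_ids` vector is what would then be passed as the gene argument.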

ADD REPLY • link 7 months ago Guido Hooiveld ★ 4.1k

Your answer about the Cannabis OrgDb solved my enrichGO problems. I'm an undergraduate biotechnology student learning bioinformatics by myself, and your help saved me from wasting a lot of time. Thank you.

ADD REPLY • link 7 months ago fernanda.backsouza ▴ 30

Yes, I'm working only with genes from the terpenoid pathway (14 genes). Does that mean I can't use enrichGO? Can you recommend a package or something similar? I just want to make a graph using betweenness centrality from the GO annotation of these 14 genes.

ADD REPLY • link 7 months ago fernanda.backsouza ▴ 30

Nothing prevents you from using the enrichGO function (or any other function for over-representation analysis [ORA]) with only 14 genes as input, but you should ask yourself what the (biological) relevance of such an analysis is with so few input genes.
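To make the ORA idea concrete: the test behind it is commonly a one-sided hypergeometric test on four counts. The sketch below uses base R's phyper(); the function name `ora_pvalue` and all numbers are invented for illustration and are not taken from the cannabis data.

```r
# One-sided hypergeometric p-value, P(X >= k), as commonly used for ORA.
#   k: input genes annotated to the gene set
#   n: number of input genes
#   M: universe genes annotated to the gene set
#   N: universe size
ora_pvalue <- function(k, n, M, N) {
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Invented counts: 3 of 14 input genes fall in a set of 50 genes,
# drawn from a universe of 20000 genes.
p <- ora_pvalue(k = 3, n = 14, M = 50, N = 20000)
p
```

Note that when the input genes were selected because they belong to one pathway, terms describing that pathway will likely come out as enriched by construction, which is one reason to question what such a result would mean.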

Based on your comment and goal, the functionality provided by the clusterProfiler package does not seem to be what you are looking for. clusterProfiler contains a set of functions that help interpret the biology represented in lists of genes, using the (biological) information available in the Gene Ontology, KEGG, or WikiPathways databases (or any other collection of gene sets), by means of statistical analyses (i.e. over-representation analysis [ORA] or gene set enrichment analysis [GSEA]).

You apparently would like to perform a kind of network analysis on a network represented by a specific pathway or GO category. That is something completely different! I don't have any hands-on experience with that myself. Based on the type of metric you mention, I suggest you have a look at the functions available through the igraph package (link); through your other post you already got some pointers on how to import a KEGG pathway. For import/analysis of GO data/networks you may, for example, want to check the GOxploreR package (link). Good luck!
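As a rough idea of what the igraph side could look like, here is a minimal sketch that computes betweenness centrality on a tiny undirected graph. The edge list is entirely hypothetical (the node names are placeholders, not real genes or GO terms); how to build the actual network from a pathway or from GO is not covered here.

```r
library(igraph)

# Hypothetical edge list; node names are placeholders.
edges <- data.frame(
  from = c("A", "B", "B", "D"),
  to   = c("B", "C", "D", "E")
)

# Build an undirected graph and compute betweenness centrality per node.
g  <- graph_from_data_frame(edges, directed = FALSE)
bc <- betweenness(g, directed = FALSE)

# Nodes that lie on the most shortest paths come first.
sort(bc, decreasing = TRUE)
```

In this toy graph node B sits on most shortest paths, so it gets the highest score; in a real network the nodes would be whatever entities the imported pathway or GO structure defines.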

ADD REPLY • link 7 months ago Guido Hooiveld ★ 4.1k

With all my love, thank you, Guido. GOxploreR is being a big ally for me. I don't know how to express my gratitude right now.

ADD REPLY • link 7 months ago fernanda.backsouza ▴ 30