Question

Suggestion for FGSEA and GSEA creation for C57BL/6J Mice

Gordon • 0

Hi all,

I am trying to run GSEA and FGSEA for C57BL/6J mice. I had written code for human data and tried swapping all the human inputs for their mouse equivalents. However, I ran into an issue:

res <- res %>%
  mutate(rank = rank(log2FoldChange, ties.method = "random")) 
ens2symbol <- AnnotationDbi::select(org.Mm.eg.db,
                                    key=res$row, 
                                    columns="SYMBOL",
                                    keytype="ENTREZID")

Error in .testForValidKeys(x, keys, keytype, fks) : 'keys' must be a character vector

The row column contains ENSMUSG* IDs, and the GMT pathway file is mh.all.v2024.1.Mm.symbols.gmt.

Any suggestions?

gsea mice org.Mm.eg.db • 1.2k views

ADD COMMENT • link updated 5 months ago by James W. MacDonald 68k • written 5 months ago by Gordon • 0

What are the class and values of res$row?

ADD REPLY • link 5 months ago shepherl 4.1k

It appeared as "NULL"

The columns of res are: "row","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj". The data has 7 variables, with row as the first column.

ENSMUSG00000101249 23365.54446 1.411235 0.3365968 4.192657 2.757065e-05 1.026079e-02

ADD REPLY • link 5 months ago Gordon • 0

If you get NULL for something in R, it means it's not there. You don't show the head of res, but instead one row, so my guess would be the column is actually named something different like 'Row', or maybe those are the rownames rather than a column.

ADD REPLY • link 5 months ago James W. MacDonald 68k

As an example,

> d.f <- data.frame(Row = letters, values = 1:26)
> d.f$row
NULL
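A minimal sketch of how one might check for this, assuming the IDs ended up in the rownames. The `res_df` object is a hypothetical stand-in, not the actual results table from the question:

```r
# Hypothetical stand-in for a results table whose IDs live in the rownames
res_df <- data.frame(
  log2FoldChange = c(1.41, -1.96),
  row.names = c("ENSMUSG00000101249", "ENSMUSG00000024610")
)

names(res_df)              # lists the column names that actually exist
"row" %in% names(res_df)   # FALSE: there is no 'row' column
is.null(res_df$row)        # TRUE: `$` returns NULL for a missing column

# If the IDs are rownames, copy them into a real character column
res_df$row <- rownames(res_df)
is.character(res_df$row)   # TRUE
```

If `names()` shows a differently cased column such as 'Row' instead, using that name is enough.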

ADD REPLY • link 5 months ago James W. MacDonald 68k

> head(res)
# A tibble: 6 × 9

  row                baseMean log2FoldChange lfcSE  stat        pvalue      padj  rank SYMBOL
  <chr>                 <dbl>          <dbl> <dbl> <dbl>         <dbl>     <dbl> <int> <chr> 
1 ENSMUSG00000101249   23366.           1.41 0.337  4.19 0.0000276     0.0103      123 NA    
2 ENSMUSG00000024610   12719.          -1.96 0.506 -3.87 0.000109      0.0262       91 Cd74  
3 ENSMUSG00000069516    7123.          -2.22 0.374 -5.94 0.00000000281 0.0000115    77 Lyz2  
4 ENSMUSG00000076617    7631.          -1.94 0.355 -5.48 0.0000000437  0.0000813    92 Ighm  
5 ENSMUSG00000076609    7438.          -2.31 0.406 -5.68 0.0000000135  0.0000459    68 Igkc  
6 ENSMUSG00000060586    5376.          -1.86 0.473 -3.92 0.0000871     0.0217       96 H2-Eb1

This is the output of head(res).

ADD REPLY • link updated 5 months ago by James W. MacDonald 68k • written 5 months ago by Gordon • 0

score 0 · Answer 1 · 2024-10-22

James W. MacDonald 68k

Oh. You used 'key' as the argument name when it's actually 'keys'. Also, those are Ensembl IDs, so the keytype needs to be 'ENSEMBL', not 'ENTREZID'.
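With both changes applied, the call might look like the sketch below. The `ids` vector is an illustrative subset of the IDs shown in head(res), standing in for res$row, and the `requireNamespace()` guard is an addition so the snippet is skipped when org.Mm.eg.db is not installed:

```r
# Two of the Ensembl gene IDs from head(res), used here in place of res$row
ids <- c("ENSMUSG00000024610", "ENSMUSG00000069516")

if (requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
  ens2symbol <- AnnotationDbi::select(
    org.Mm.eg.db::org.Mm.eg.db,
    keys    = ids,         # 'keys', not 'key'; must be a character vector
    columns = "SYMBOL",
    keytype = "ENSEMBL"    # the keys are Ensembl gene IDs
  )
  print(ens2symbol)        # data.frame with ENSEMBL and SYMBOL columns
}
```

On the real data, `keys = res$row` would take the place of `ids`.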

ADD COMMENT • link 5 months ago James W. MacDonald 68k

Ah, I see. Thank you.

For this particular strain, I used the GSEA GMT file for mouse. Would it be better to use the genome assembly C57BL_6NJ_v1 (GCA_001632555.1)?

ADD REPLY • link 5 months ago Gordon • 0

It's not a genome assembly question. It's an annotation question, and a question of whom you get the annotation from. NCBI and EBI/EMBL use different methods to define transcripts, and unsurprisingly come up with different results (they have been working for years now to come up with just one transcript per gene they can agree on, so you can imagine how complicated it is).

There is no profit in dealing with that complexity, so my longstanding recommendation is to pick one and stick with it for an analysis. Don't try to map from NCBI to Ensembl IDs, because the disagreements between NCBI and EBI/EMBL will leave some genes unannotated, which is pointless. As an example,

> ids <- paste0("ENSMUSG", sprintf("%011d", c(101249,24610,69516,76617,76609,60586)))
> ids
[1] "ENSMUSG00000101249" "ENSMUSG00000024610" "ENSMUSG00000069516" "ENSMUSG00000076617" "ENSMUSG00000076609"
[6] "ENSMUSG00000060586"
> library(org.Mm.eg.db)
> library(AnnotationHub)
> hub <- AnnotationHub()
snapshotDate(): 2024-04-30
> query(hub, c("ensdb","mus","musculus"))
AnnotationHub with 98 records
# snapshotDate(): 2024-04-30
# $dataprovider: Ensembl
# $species: Mus musculus, Balaenoptera musculus, Mus musculus musculus, Mus musculus domesticus, Mus musculus c...
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer, rdatadateadded,
#   preparerclass, tags, rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH53222"]]' 

             title                                        
  AH53222  | Ensembl 87 EnsDb for Mus Musculus            
  AH53726  | Ensembl 88 EnsDb for Mus Musculus            
  AH56691  | Ensembl 89 EnsDb for Mus Musculus            
  AH57770  | Ensembl 90 EnsDb for Mus Musculus            
  AH60788  | Ensembl 91 EnsDb for Mus Musculus            
  ...        ...                                          
  AH116905 | Ensembl 112 EnsDb for Mus musculus           
  AH116906 | Ensembl 112 EnsDb for Mus musculus           
  AH116907 | Ensembl 112 EnsDb for Mus musculus musculus  
  AH116908 | Ensembl 112 EnsDb for Mus musculus domesticus
  AH116909 | Ensembl 112 EnsDb for Mus musculus           
Warning message:
call dbDisconnect() when finished working with a connection 
> ensdb <- hub[["AH116909"]]
loading from cache
> select(org.Mm.eg.db, ids, "SYMBOL", "ENSEMBL")
'select()' returned 1:1 mapping between keys and columns
             ENSEMBL SYMBOL
1 ENSMUSG00000101249   <NA>
2 ENSMUSG00000024610   Cd74
3 ENSMUSG00000069516   Lyz2
4 ENSMUSG00000076617   Ighm
5 ENSMUSG00000076609   Igkc
6 ENSMUSG00000060586 H2-Eb1
> select(ensdb, ids, "GENENAME", "GENEID")
              GENEID GENENAME
1 ENSMUSG00000101249  Gm29216
2 ENSMUSG00000024610     Cd74
3 ENSMUSG00000069516     Lyz2
4 ENSMUSG00000076617     Ighm
5 ENSMUSG00000076609     Igkc
6 ENSMUSG00000060586   H2-Eb1

As the number of genes increases, the number of NA symbols you get from the OrgDb package will increase.
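To see how many IDs are affected and still end up with symbol-named input for a symbols GMT file, something like the following could work. This is only a sketch: `res_annot` is a hypothetical table modeled on the first rows of head(res), and ranking by the `stat` column (rather than log2FoldChange) is an assumption.

```r
# Hypothetical annotated results, modeled on the first rows of head(res)
res_annot <- data.frame(
  row    = c("ENSMUSG00000101249", "ENSMUSG00000024610", "ENSMUSG00000069516"),
  stat   = c(4.19, -3.87, -5.94),
  SYMBOL = c(NA, "Cd74", "Lyz2")
)

# How many IDs failed to map to a symbol?
n_unmapped <- sum(is.na(res_annot$SYMBOL))

# Keep mapped genes only, one row per symbol
mapped <- res_annot[!is.na(res_annot$SYMBOL), ]
mapped <- mapped[!duplicated(mapped$SYMBOL), ]

# Named numeric vector, sorted in decreasing order; a vector of this shape
# is what fgsea::fgsea() takes as its `stats` argument
ranks <- sort(setNames(mapped$stat, mapped$SYMBOL), decreasing = TRUE)
ranks
```

Any gene dropped at the `is.na()` step is invisible to the enrichment test, which is the practical cost of mixing annotation sources.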

ADD REPLY • link 5 months ago James W. MacDonald 68k