Question

What are these genes? (How to get Ensembl IDs for them)

prabin.dm

Hi, I need to get the Ensembl IDs for the genes in my dataset. I believe these are gene symbols, but I cannot figure out what these gene names are or how to convert them to Ensembl IDs.

> dge[[1]][grep("-",x = dge[[1]]$Gene),]
# A tibble: 25,989 × 5
   Gene          baseMean pvalue              padj  FoldChange         
   <chr>            <dbl> <chr>               <chr> <chr>              
 1 RP24-342J3.4     0.101 0.92788346559582402 NA    -1.3769054206920972
 2 CH36-217G15.3    0     NA                  NA    NA                 
 3 RP23-280C13.3    0     NA                  NA    NA                 
 4 RP24-235N21.3    0     NA                  NA    NA                 
 5 RP23-434H18.4    0     NA                  NA    NA                 
 6 RP23-4K22.1      0     NA                  NA    NA                 
 7 RP24-117F20.3    0     NA                  NA    NA                 
 8 RP23-363J15.4    0     NA                  NA    NA                 
 9 RP23-112D14.1    0     NA                  NA    NA                 
10 RP23-132K20.4    0.672 0.80928984414463001 NA    -1.7080714557114776
# … with 25,979 more rows

I have tried both AnnotationDbi and biomaRt, assuming these are symbols, but clearly they are not.

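library(dplyr)
library(purrr)
library(AnnotationDbi)
library(org.Mm.eg.db)
# For each table in the list, look up the Gene column as SYMBOL keys and keep the first Ensembl ID
dge2 <- dge %>% map(
    mutate,
    "ensemble_gene_id" = mapIds(org.Mm.eg.db,
                                keys = Gene, keytype = "SYMBOL",
                                column = "ENSEMBL",
                                multiVals = "first"))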

> dge2[[1]][grep("-",x = dge2[[1]]$Gene),]
# A tibble: 25,989 × 6
   Gene        baseMean pvalue           padj  FoldChange       ensemble_gene_id
   <chr>          <dbl> <chr>            <chr> <chr>            <chr>           
 1 RP24-342J3…    0.101 0.9278834655958… NA    -1.376905420692… NA              
 2 CH36-217G1…    0     NA               NA    NA               NA              
 3 RP23-280C1…    0     NA               NA    NA               NA              
 4 RP24-235N2…    0     NA               NA    NA               NA              
 5 RP23-434H1…    0     NA               NA    NA               NA              
 6 RP23-4K22.1    0     NA               NA    NA               NA              
 7 RP24-117F2…    0     NA               NA    NA               NA              
 8 RP23-363J1…    0     NA               NA    NA               NA              
 9 RP23-112D1…    0     NA               NA    NA               NA              
10 RP23-132K2…    0.672 0.8092898441446… NA    -1.708071455711… NA
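To see how many rows are affected, one can separate names of this clone-style form from ordinary mouse symbols before (or after) calling `mapIds()`. A minimal base-R sketch; the helper `is_clone_name` and its regular expression are assumptions inferred from the names printed above, not an official naming rule.

```r
# Flag gene names that look like clone-based identifiers (e.g. "RP24-342J3.4",
# "CH36-217G15.3"): two capital letters, two digits, a hyphen, an
# alphanumeric clone coordinate, then a dot and a numeric suffix.
# The pattern is an assumption inferred from the names in the output above.
is_clone_name <- function(genes) {
  grepl("^[A-Z]{2}[0-9]{2}-[0-9A-Z]+\\.[0-9]+$", genes)
}

genes <- c("RP24-342J3.4", "CH36-217G15.3", "RP23-4K22.1", "Actb", "H2-K1")
is_clone_name(genes)
# [1]  TRUE  TRUE  TRUE FALSE FALSE
```

Applied to the mapped table, something like `table(clone_like = is_clone_name(dge2[[1]]$Gene), mapped = !is.na(dge2[[1]]$ensemble_gene_id))` would show whether the missing IDs are concentrated in these names.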

Any suggestions will be appreciated.

Tags: AnnotationDbi, biomaRt

Written 3.7 years ago by prabin.dm; last updated 3.7 years ago by abf.


I guess these are TPF sequences. But surely you must know where you got this dataset from and hence what the data refers to??

Reply by Gordon Smyth, 3.7 years ago


Mouse gene symbols are not typically all capitals, but I was able to find a reference in NCBI to a mouse gene, CH36-217G15. Maybe this is not Ensembl data? Without knowing how the data were generated, it will be very difficult to determine what these gene symbols mean.
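If you can find out which annotation (GTF file) the counts were generated with, one option is to build the name-to-ID table directly from that file rather than from org.Mm.eg.db. A minimal base-R sketch, assuming an Ensembl/GENCODE-style GTF whose `gene` rows carry `gene_id` and `gene_name` attributes; the helper `gtf_name_to_id` and the toy records below are hypothetical, and the IDs in them are placeholders, not real Ensembl IDs.

```r
# Hypothetical helper: build a gene_name -> gene_id table from GTF text lines.
gtf_name_to_id <- function(gtf_lines) {
  # Drop header/comment lines (they start with "#")
  gtf_lines <- gtf_lines[!startsWith(gtf_lines, "#")]
  fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
  # Keep one record per gene: rows whose feature type (column 3) is "gene"
  is_gene <- vapply(fields, function(f) length(f) >= 9 && f[3] == "gene",
                    logical(1))
  attrs <- vapply(fields[is_gene], function(f) f[9], character(1))
  # Pull the quoted value of one attribute, e.g. gene_id "XYZ";
  get_attr <- function(key) {
    pattern <- paste0("(^|;)[[:space:]]*", key, " \"([^\"]+)\"")
    hits <- regmatches(attrs, regexec(pattern, attrs))
    vapply(hits, function(h) if (length(h) == 3) h[3] else NA_character_,
           character(1))
  }
  data.frame(gene_id = get_attr("gene_id"), gene_name = get_attr("gene_name"),
             stringsAsFactors = FALSE)
}

# Toy input with placeholder IDs (not real annotation)
toy_gtf <- c(
  '#!genome-build toy',
  '1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "ID_1"; gene_name "RP24-342J3.4";',
  '1\ttoy\ttranscript\t100\t900\t.\t+\t.\tgene_id "ID_1"; transcript_id "TX_1";',
  '2\ttoy\tgene\t50\t500\t.\t-\t.\tgene_id "ID_2"; gene_name "Actb";'
)
lookup <- gtf_name_to_id(toy_gtf)
lookup$gene_id[match(c("RP24-342J3.4", "Actb", "Xyz1"), lookup$gene_name)]
```

With a real file, `gtf_lines` would come from `readLines()` on the GTF, and `match(dge[[1]]$Gene, lookup$gene_name)` would then index into `lookup$gene_id`. Names absent from that GTF still come back as `NA`, which would itself suggest the table was built against a different annotation.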

Reply by abf, 3.7 years ago