Hi, I need to get the Ensembl IDs for the genes in my dataset. I believe I have gene symbols, but I cannot figure out what these gene names are or how to convert them to Ensembl IDs.
> dge[[1]][grep("-",x = dge[[1]]$Gene),]
# A tibble: 25,989 × 5
Gene baseMean pvalue padj FoldChange
<chr> <dbl> <chr> <chr> <chr>
1 RP24-342J3.4 0.101 0.92788346559582402 NA -1.3769054206920972
2 CH36-217G15.3 0 NA NA NA
3 RP23-280C13.3 0 NA NA NA
4 RP24-235N21.3 0 NA NA NA
5 RP23-434H18.4 0 NA NA NA
6 RP23-4K22.1 0 NA NA NA
7 RP24-117F20.3 0 NA NA NA
8 RP23-363J15.4 0 NA NA NA
9 RP23-112D14.1 0 NA NA NA
10 RP23-132K20.4 0.672 0.80928984414463001 NA -1.7080714557114776
# … with 25,979 more rows
I have tried using AnnotationDbi as well as biomaRt, assuming these are symbols, but clearly they are not.
dge2 <- dge %>% map(
  mutate,
  "ensemble_gene_id" = mapIds(org.Mm.eg.db,
                              keys = Gene, keytype = "SYMBOL",
                              column = "ENSEMBL",
                              multiVals = "first"))
> dge2[[1]][grep("-",x = dge2[[1]]$Gene),]
# A tibble: 25,989 × 6
Gene baseMean pvalue padj FoldChange ensemble_gene_id
<chr> <dbl> <chr> <chr> <chr> <chr>
1 RP24-342J3… 0.101 0.9278834655958… NA -1.376905420692… NA
2 CH36-217G1… 0 NA NA NA NA
3 RP23-280C1… 0 NA NA NA NA
4 RP24-235N2… 0 NA NA NA NA
5 RP23-434H1… 0 NA NA NA NA
6 RP23-4K22.1 0 NA NA NA NA
7 RP24-117F2… 0 NA NA NA NA
8 RP23-363J1… 0 NA NA NA NA
9 RP23-112D1… 0 NA NA NA NA
10 RP23-132K2… 0.672 0.8092898441446… NA -1.708071455711… NA
Any suggestions will be appreciated.
I guess these are TPF sequences. But surely you must know where you got this dataset from, and hence what the data refers to?
Mouse gene symbols are not typically all capitals, but I was able to find a reference in NCBI to a mouse gene for CH36-217G15. Maybe this is not Ensembl data? Without knowing how the data was generated, it's going to be very difficult to determine what these gene names mean.
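If it helps with looking names up by hand, here is a minimal base R sketch that flags names shaped like the ones in the question and strips the trailing ".N", which is how the CH36-217G15 search term above was formed. The pattern is only a guess from the ten rows shown (prefix, dash, number-letter-number, dot, number), and "Actb" is a hypothetical ordinary symbol added for contrast. It does not map anything to Ensembl IDs.

```r
# Names copied from the question, plus "Actb" as a hypothetical
# ordinary symbol for contrast (not from the original data).
genes <- c("RP24-342J3.4", "CH36-217G15.3", "RP23-4K22.1", "Actb")

# Assumed shape, guessed from the examples only:
# prefix (letters + digits), "-", digits + letter + digits, ".", digits
clone_pattern <- "^[A-Z]+[0-9]+-[0-9]+[A-Z][0-9]+\\.[0-9]+$"

# TRUE where a name has the assumed shape
is_clone_like <- grepl(clone_pattern, genes)

# Drop the trailing ".N" to get the part used for searching,
# e.g. "CH36-217G15.3" becomes "CH36-217G15"; NA for other names
search_term <- ifelse(is_clone_like,
                      sub("\\.[0-9]+$", "", genes),
                      NA_character_)

result <- data.frame(Gene = genes, is_clone_like, search_term)
print(result)
```

Running the same `grepl()` call on `dge[[1]]$Gene` would show how many of the 25,989 hyphenated names follow this shape and how many are something else.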