Question

Converting gene symbol list to Entrez IDs

imalumberjack ▴ 10

@imalumberjack-15042

Hello all,

I'm not very experienced with Bioconductor and R, and I am struggling to convert a list of gene symbols, read into R from a .csv file, into their corresponding Entrez ID(s). Does anyone have any tips on how to approach this? The code I have been trying is the following:

> prog <- read.csv(file="mydata.csv", header=TRUE, sep="/")

> gns<-select(org.Hs.eg.db, prog, c("ENTREZID","GENENAME"))

Error in .testForValidKeys(x, keys, keytype, fks) :

None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.

Many thanks for your help!

org.hs.eg.db entrez gene identifiers genesymbols • 15k views

updated 7.2 years ago by mat149 • written 7.2 years ago by imalumberjack

Where did you get the list of gene symbols from? From a published paper? I ask because many published sources include gene symbols that are no longer the current official symbols.

Your file has a "csv" extension, suggesting that it is a comma-separated file, but then you specify sep="/". What gives with that? Can you show us the first few lines of your file? Does your data file have a column containing gene symbols?

What will you do with the Entrez Gene Ids when you get them? What will be the next step?

Comment by Gordon Smyth, 7.2 years ago

score 1 · Answer 1 · 2018-02-17

James W. MacDonald 68k

@james-w-macdonald-5106

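You are passing a data.frame to select, rather than a character vector. Presumably one of the columns of prog contains the gene symbols, so you should subset to that column. Because the keys are symbols rather than Entrez Gene IDs, you also need to tell select so with keytype = "SYMBOL"; the error message shows that the keys were being checked against 'ENTREZID'. Also note that the default of read.csv (in R versions before 4.0.0) is to convert strings to factors, so you should probably include stringsAsFactors = FALSE in your call to read.csv.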
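A minimal sketch of the fix described above, assuming the file has a header row and one column of gene symbols. The column name `symbol` and the inline CSV text are hypothetical stand-ins for the real mydata.csv, and the annotation lookup only runs if org.Hs.eg.db and AnnotationDbi are installed.

```r
# Hypothetical stand-in for mydata.csv: a single column of gene symbols.
csv_text <- "symbol\nTP53\nBRCA1\nEGFR"

# stringsAsFactors = FALSE keeps the symbols as character strings
# (this only became the default in R 4.0.0).
prog <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# select() needs a character vector of keys, not the whole data.frame.
symbols <- unique(prog$symbol)

if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  # keytype = "SYMBOL" declares that the keys are gene symbols; in the
  # question's call the keys were checked against 'ENTREZID' instead.
  gns <- AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db,
                               keys = symbols,
                               columns = c("ENTREZID", "GENENAME"),
                               keytype = "SYMBOL")
  print(gns)
}
```

With a real file, replace `text = csv_text` with `file = "mydata.csv"` and use whatever column actually holds the symbols.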


score 1 · Answer 2 · 2018-02-18

Here is a code chunk that I use to convert zebrafish gene symbols to Entrez gene IDs:

("t" in this case is a character vector of a few genes that I'm interested in, but you can use the gene symbol column of your "read.csv" object)

library(org.Dr.eg.db)
library(clusterProfiler)

# list the identifier types this annotation package can map between
keytypes(org.Dr.eg.db)

# zebrafish gene symbols to convert
t <- c("lepa","lepr","lepb","leprot")
et <- bitr(t, fromType="SYMBOL", toType=c("ENTREZID","PATH","GO","ALIAS","GENENAME"), OrgDb="org.Dr.eg.db")
head(et)

and the reverse:

tt<-c("100150233","567241","564348","550484")
ett <- bitr(tt, fromType="ENTREZID", toType="SYMBOL", OrgDb="org.Dr.eg.db")
head(ett)
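One thing to watch for with the first call: asking for columns such as GO or ALIAS alongside ENTREZID typically gives one row per gene-annotation combination, so head() may show only the first gene. If only the symbol-to-ID mapping is needed, the result can be collapsed afterwards. A small base R sketch, where the data frame is a made-up stand-in for the shape of such a result and its gene names, IDs and GO values are placeholders, not real annotations:

```r
# Made-up stand-in for a multi-column mapping result (placeholder values).
et_demo <- data.frame(
  SYMBOL   = c("geneA", "geneA", "geneA", "geneB", "geneB"),
  ENTREZID = c("1001", "1001", "1001", "1002", "1002"),
  GO       = c("GO:0000001", "GO:0000002", "GO:0000003",
               "GO:0000001", "GO:0000004"),
  stringsAsFactors = FALSE
)

# Keep only the symbol and ID columns, then drop the repeated rows.
id_map <- unique(et_demo[, c("SYMBOL", "ENTREZID")])
rownames(id_map) <- NULL

# Symbols that still map to more than one Entrez ID need a manual look.
n_ids <- table(id_map$SYMBOL)
ambiguous <- names(n_ids)[n_ids > 1]
```

Alternatively, passing only toType="ENTREZID" in the conversion call avoids the extra rows in the first place.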