I'm not very experienced with Bioconductor and R, and I'm struggling to convert a list of gene symbols, which I've read into R from a .csv file, into their corresponding Entrez IDs. Does anyone have tips on how to approach this? The code I'd been attempting to use was the following:
Where did you get the list of gene symbols from? From a published paper? I ask because many published sources include gene symbols that are no longer current official symbols.
Your file has a "csv" extension, suggesting that it is a comma-separated file, but then you specify sep="/". What gives with that? Can you show us the first few lines of your file? Does your data file have a column containing gene symbols?
What will you do with the Entrez Gene IDs when you get them? What will be the next step?
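On the delimiter question, one quick way to check is to look at the raw text of the file before parsing it. The following is only a sketch in base R: the temporary file and its columns ("symbol", "logFC") are made-up stand-ins for the real data, which hasn't been shown.

```r
# Hypothetical stand-in for the real file: a small comma-separated table.
path <- tempfile(fileext = ".csv")
writeLines(c("symbol,logFC", "TP53,1.2", "BRCA1,-0.8"), path)

# Look at the first few raw lines to see which delimiter is actually used.
readLines(path, n = 3)

# Commas here, so read.csv's default sep = "," is the right choice.
# stringsAsFactors = FALSE keeps the symbols as character strings.
prog <- read.csv(path, stringsAsFactors = FALSE)
str(prog)
```

If the raw lines turn out to use tabs or another separator, read.delim or an explicit sep argument would be the thing to change.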
You are passing a data.frame to select, rather than a character vector. Presumably one of the columns of prog contains the gene symbols, so you should subset to that column. Also note that in R versions before 4.0.0 the default of read.csv is to convert strings to factors, so you should probably add stringsAsFactors = FALSE to your call to read.csv.
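As a rough illustration of that fix, here is a minimal sketch. It assumes human genes (org.Hs.eg.db) and a column called "symbol"; neither is known from the original post, so swap in the right annotation package and column name for your data. The small data.frame stands in for the result of read.csv.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)  # assumption: human; use the package for your organism

# Hypothetical stand-in for the object returned by read.csv().
prog <- data.frame(symbol = c("TP53", "BRCA1", "EGFR"),
                   stringsAsFactors = FALSE)

# Pull out the one column of symbols as a plain character vector.
symbols <- unique(as.character(prog$symbol))

# Map symbols to Entrez Gene IDs. The AnnotationDbi:: prefix avoids
# clashes with select() from other packages such as dplyr.
res <- AnnotationDbi::select(org.Hs.eg.db,
                             keys = symbols,
                             keytype = "SYMBOL",
                             columns = "ENTREZID")
res
```

Symbols that are not current official symbols come back with NA in the ENTREZID column, which is one reason the provenance of the list matters.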
Here is a code chunk that I use to convert zebrafish gene symbols to Entrez Gene IDs:
(Here "t" is a character vector of a few genes that I'm interested in, but you can use the symbol column of your "read.csv" object instead.)
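library(org.Dr.eg.db)     # zebrafish annotation package
keytypes(org.Dr.eg.db)    # list the ID types usable as fromType/toType
library(clusterProfiler)  # provides bitr()
t <- c("lepa", "lepr", "lepb", "leprot")
# map symbols to Entrez Gene IDs plus some extra annotation columns
et <- bitr(t, fromType = "SYMBOL", toType = c("ENTREZID", "PATH", "GO", "ALIAS", "GENENAME"), OrgDb = "org.Dr.eg.db")
head(et)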
and the reverse:
tt <- c("100150233", "567241", "564348", "550484")
ett <- bitr(tt, fromType="ENTREZID", toType="SYMBOL", OrgDb="org.Dr.eg.db")
head(ett)