How to get gene information
Kay Jaja ▴ 90
@kay-jaja-3481
Last seen 10.2 years ago
I have a list of 80 genes in a txt file and I would like to use a database, for example NCBI, to get information on each of these genes. I need the start and end base-pair positions for each gene listed in my file. Any idea how to get started or what to use? Your help is greatly appreciated.
@saroj-k-mohapatra-3419
Last seen 10.2 years ago
You can do some of the work within Bioconductor with the org.* annotation packages. Suppose you have a list of 3 human gene symbols:

> glist
[1] "A1BG" "A2M"  "A2MP"

Using the corresponding org.* package:

> library("org.Hs.eg.db")

you can map the gene symbols to Entrez gene ids:

> mget(glist, revmap(org.Hs.egSYMBOL))
$A1BG
[1] "1"

$A2M
[1] "2"

$A2MP
[1] "3"

There are many other mappings available. Look at:

> ls("package:org.Hs.eg.db")

If the organism is something else, use the appropriate org.* package, e.g., org.Mm.eg.db. The second term (Mm) is a short form combining the first letter of the genus name and the first letter of the species name. The full list of annotation packages is available at http://www.bioconductor.org/packages/release/data/annotation/

Saroj
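The naming rule described above can be sketched in a few lines of base R. This is only an illustration: `org_pkg_name` is a hypothetical helper, not a Bioconductor function, and it assumes the Entrez-gene-based `.eg.db` suffix. Some organisms use a different suffix (for example org.Sc.sgd.db for yeast), so the result should be checked against the annotation package list.

```r
# Hypothetical helper (not part of Bioconductor): build the conventional
# org package name from a binomial species name such as "Homo sapiens".
# Assumes the Entrez-gene-based ".eg.db" suffix, which not every organism uses.
org_pkg_name <- function(species) {
  # Split "Genus species" on whitespace
  parts <- strsplit(trimws(species), "\\s+")[[1]]
  stopifnot(length(parts) >= 2)
  # First letter of the genus (upper case) + first letter of the species (lower case)
  code <- paste0(toupper(substr(parts[1], 1, 1)),
                 tolower(substr(parts[2], 1, 1)))
  paste0("org.", code, ".eg.db")
}

print(org_pkg_name("Homo sapiens"))
print(org_pkg_name("Mus musculus"))
```

The returned string could then be passed to `library(..., character.only = TRUE)`, provided the package is actually installed.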
@saroj-k-mohapatra-3419
Last seen 10.2 years ago
Hi,

I might have misunderstood your question the first time. Is it that you have a list of gene ids and you need to find their start and end locations on the chromosome? If so, I show an example below.

I have a list with three genes:

> glist
[1] "CRIPAK" "CAND2"  "STK25"

I get the Entrez gene ids:

> eglist = as.character(unlist(mget(glist, revmap(org.Hs.egSYMBOL))))
> eglist
[1] "285464" "23066"  "10494"

I find out which chromosomes these belong to:

> mget(eglist, org.Hs.egCHR)
$`285464`
[1] "4"

$`23066`
[1] "3"

$`10494`
[1] "2"

Find the start positions:

> mget(eglist, org.Hs.egCHRLOC)
$`285464`
      4
1375339

$`23066`
       3
12813170

$`10494`
         2
-242083104

And the end positions:

> mget(eglist, org.Hs.egCHRLOCEND)
$`285464`
      4
1379782

$`23066`
       3
12851301

$`10494`
         2
-242096707

Is this what you are looking for?

Best,
Saroj
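The third gene in the output above has negative start and end values. In the org.* CHRLOC mappings a leading minus sign marks a gene on the antisense strand, so the position itself is the absolute value. A minimal base-R sketch of turning those results into one table follows. The numbers are copied from the output in this thread, and `tidy_locations` is a hypothetical helper name, not a Bioconductor function.

```r
# Values copied from the mget() output shown in this thread.
loc <- data.frame(
  symbol = c("CRIPAK", "CAND2", "STK25"),
  entrez = c("285464", "23066", "10494"),
  chr    = c("4", "3", "2"),
  start  = c(1375339, 12813170, -242083104),
  end    = c(1379782, 12851301, -242096707),
  stringsAsFactors = FALSE
)

# Hypothetical helper: move the sign into a strand column and keep
# the coordinates as positive numbers.
tidy_locations <- function(x) {
  x$strand <- ifelse(x$start < 0, "-", "+")
  x$start  <- abs(x$start)
  x$end    <- abs(x$end)
  x
}

tidy <- tidy_locations(loc)
print(tidy)
```

For a list of 80 genes the same table could be written out with `write.table(tidy, "gene_positions.txt", sep = "\t", quote = FALSE, row.names = FALSE)`, where the file name is only an example.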