Convert Chromosome positions to hgnc_symbol using R
joyk2a • 0
@joyk2a-22539

Hi,

I am trying to convert "chromosome positions" to "hgnc_symbol" using R. I followed the site below: http://statisticalrecipes.blogspot.com/2012/08/biomart-find-gene-name-using-chromosome.html

The code is below.

# Load the library

library(biomaRt)

# Define biomart object

mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")

# Gives a list of all possible annotations; currently there are 1668 listed

listAttributes(mart)

# I chose to filter by: chromosome_name, start, end

listFilters(mart)

# Read in tab-delimited file with three columns: chromosome number, start position and end position

positions <- read.table("positions.txt")

# Extract HGNC gene symbol

results <- getBM(attributes = c("hgnc_symbol", "chromosome_name", "start_position", "end_position"), filters = c("chromosome_name", "start", "end"), values = list(positions[,1], positions[,2], positions[,3]), mart = mart)

================================================================================

After running the code, I got this error message:

"Error in getBM(attributes = c("hgnc_symbol", "chromosome_name", "start_position", : Query ERROR: caught BioMart::Exception::Usage: Wrong format value for Start "

My input file (positions.txt) starts like this:

chromosome_name start end
1 62920 16855942
1 16863509 16932568
....

Could you advise how to fix this error? I would appreciate your help a lot.

Thanks in advance!

Kelly

Mike Smith ★ 6.6k
@mike-smith
EMBL Heidelberg

This seems to work for me if I run the following code based on your examples:

## Create example data.frame with genomic positions
positions <- data.frame(chromosome = c(1,1),
                        start = c(62920, 16863509),
                        end = c(16855942, 16932568))

## load package and set biomaRt dataset
library(biomaRt)
ensembl = useEnsembl(biomart='ensembl', 
                     dataset="hsapiens_gene_ensembl") 

## run query
results <- getBM(attributes = c("hgnc_symbol", "chromosome_name", "start_position", "end_position"), 
                 filters = c("chromosome_name", "start", "end"),
                 values = list(positions[,1], positions[,2], positions[,3]),
                 mart = ensembl)

I think that for your output you're running a much longer query than this, with many more positions, and I wonder if biomaRt is doing something odd here.
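Before changing the query, it may be worth ruling out a parsing problem in the input. This is an assumption, not something confirmed in this thread: if positions.txt begins with a header row, as the sample in the question suggests, read.table() with default arguments reads that row as data, so the coordinate columns are no longer numeric and the word "start" ends up among the filter values. A minimal sketch, using made-up inline text in place of the real file:

```r
# Made-up stand-in for positions.txt, laid out like the sample in the question
example_lines <- c("chromosome_name\tstart\tend",
                   "1\t62920\t16855942",
                   "1\t16863509\t16932568")

# Default arguments: the header row is treated as data, so no column is numeric
no_header <- read.table(text = example_lines, sep = "\t")

# header = TRUE keeps the column names out of the data
positions <- read.table(text = example_lines, sep = "\t", header = TRUE)

# Check the coordinate columns before handing them to getBM()
stopifnot(is.numeric(positions$start), is.numeric(positions$end))
str(positions)
```

If the check fails on the real file, the header or separator settings are the first thing to look at.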

Can you try running these commands, which will concatenate the chromosomal positions and use that in the query:

positions_combined <- apply(as.matrix(positions), 1, paste, collapse = ":")

results2 <- getBM(attributes = c("hgnc_symbol", "chromosome_name", "start_position", "end_position"),
                 filters = c("chromosomal_region"),
                 values = positions_combined,
                 mart = ensembl)
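One caveat when pasting numeric columns directly: R may render some values in scientific notation (for example, as.character(100000) gives "1e+05"), which would not be a valid coordinate. Below is a minimal sketch of a more defensive way to build the chromosome:start:end strings. make_regions is a hypothetical helper name, not part of biomaRt:

```r
# Hypothetical helper: build "chromosome:start:end" strings for the
# chromosomal_region filter without scientific notation or padding
make_regions <- function(chromosome, start, end) {
  paste(as.character(chromosome),
        format(start, scientific = FALSE, trim = TRUE),
        format(end,   scientific = FALSE, trim = TRUE),
        sep = ":")
}

# Same example positions as in the answer above
positions <- data.frame(chromosome = c(1, 1),
                        start = c(62920, 16863509),
                        end = c(16855942, 16932568))

regions <- make_regions(positions$chromosome, positions$start, positions$end)
print(regions)
```

The resulting character vector can be passed as values together with filters = "chromosomal_region", as in the query above.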

Mike, it works perfectly. Thanks a lot!

One more question: how can I check the reference genome version? If I need a different reference genome version, can I specify it with a command in this R script?

I appreciate your help!


The currently used reference sequence versions for all vertebrates can be found on this page: https://www.ensembl.org/info/about/species.html

By default, when you don't provide a host to the useEnsembl() function, it will use www.ensembl.org, which serves the most recent reference genome. This means that your results could change over time, as Ensembl makes new releases every 3 months, in which some fraction of the data will change.

You can pin a specific version by passing the URL for the version you want to the host argument, e.g. https://sep2019.archive.ensembl.org

You can view a full list of all available archives with listEnsemblArchives():

             name     date                                url version
1  Ensembl GRCh37 Feb 2014          http://grch37.ensembl.org  GRCh37
2      Ensembl 98 Sep 2019 http://sep2019.archive.ensembl.org      98
3      Ensembl 97 Jul 2019 http://jul2019.archive.ensembl.org      97
4      Ensembl 96 Apr 2019 http://apr2019.archive.ensembl.org      96
5      Ensembl 95 Jan 2019 http://jan2019.archive.ensembl.org      95
6      Ensembl 94 Oct 2018 http://oct2018.archive.ensembl.org      94
7      Ensembl 93 Jul 2018 http://jul2018.archive.ensembl.org      93
...
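As a sketch of pinning a release inside a script, the archive URL can be passed to the host argument of useEnsembl(). connect_archive is a hypothetical wrapper name used only for illustration, and calling it needs the biomaRt package and network access:

```r
# Hypothetical wrapper: connect to one fixed Ensembl archive instead of the
# current release, so results do not change when Ensembl publishes an update
connect_archive <- function(host = "https://sep2019.archive.ensembl.org") {
  biomaRt::useEnsembl(biomart = "ensembl",
                      dataset = "hsapiens_gene_ensembl",
                      host    = host)
}

# Example calls (not run here, they contact the Ensembl servers):
# ensembl_98 <- connect_archive()
# datasets   <- biomaRt::listDatasets(ensembl_98)
# The "version" column of this table is expected to name the assembly:
# datasets[datasets$dataset == "hsapiens_gene_ensembl", ]
```

Recording the host URL alongside the results makes it possible to repeat the query against the same release later.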

I am trying to find modifications at genomic locations. The output I am getting gives the chromosome and position of each modification (attached file). I am not sure how to convert these into gene names using biomart or any other tool. Could you please help me with this?


Please don't add comments to three-year-old posts. Instead, ask a new question.


I have asked a new question. If possible, please help me figure it out. Thanks

Convert Chromosome postion to gene symbol
