"associated_gene" attribute from biomaRt returns too few values (SNP, rsID dataset)
aroso491
@aroso491-23952

I am trying to use biomaRt to retrieve the gene symbols for a small list (126 entries) of SNPs, identified by rsID, chromosome, and bp position. I want the result as a data frame of SNP rsIDs and gene symbols. Because I will want to do some manual checking, my code also retrieves the Ensembl gene ID alongside those two columns.

My query uses the SNP mart because of its 'snp_filter' (rsID) filtering option, and because I thought I had identified the gene attributes that return exactly what I was looking for ('ensembl_gene_stable_id' and 'associated_gene'). The results turn out to be quite disappointing, though: there are very few hits in the associated_gene column.

At first I thought this might be due to missing gene symbols in the Ensembl database, but I checked a few of the Ensembl IDs that return no associated_gene value and they do have a gene name.

Because it is a fairly small list, and because I am already somewhat familiar with biomaRt and have some time constraints, I was hoping to do this with biomaRt. But if anyone can suggest an alternative way to get the list of gene symbols, I am listening!

Here is my bit of code to provide some context:

library(biomaRt)

# SNP mart on the GRCh37 archive
snp <- useMart("ENSEMBL_MART_SNP", host = "grch37.ensembl.org",
               path = "/biomart/martservice", dataset = "hsapiens_snp")

# One getBM() request per rsID (column 1 of trim_SSNP_W holds the rsIDs)
results <- c()
for (i in seq_len(nrow(trim_SSNP_W))) {
  temp <- getBM(attributes = c('refsnp_id', 'ensembl_gene_stable_id', 'associated_gene'),
                filters = c('snp_filter'),
                values = list(trim_SSNP_W[i, 1]),
                mart = snp,
                uniqueRows = TRUE)
  results <- rbind(results, temp)  # append this SNP's rows to the running table
}

I know a for loop is not exactly idiomatic R, but I am just learning, I am more used to other languages (Java and Python), and I am struggling a bit with R's compact coding style.

Thanks in advance; any suggestions are hugely welcome!
Alejandra

Tags: r, biomart, SNP, annotation, HUGO

Hey Alejandra (I also saw your post on Biostars: https://www.biostars.org/p/336724/#453535), could you share some of the rsIDs that return no associated gene? I am not certain, but running this as a for loop could get your IP address blacklisted because of the repeated, rapid requests to the Ensembl servers. You can pass all the IDs as a vector to values and then run getBM() just once.
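
A minimal sketch of the single-request version, assuming the snp mart object and the trim_SSNP_W data frame from the question (column 1 holding the rsIDs). The wrapper name fetch_snp_genes is hypothetical; its body is one getBM() call whose values argument receives the whole rsID vector, so Ensembl gets one request instead of one per SNP.

```r
# Hypothetical wrapper: query all rsIDs in a single getBM() request.
# biomaRt:: is used explicitly so the function can be defined before the
# package is attached.
fetch_snp_genes <- function(rsids, mart) {
  rsids <- unique(as.character(rsids))  # drop duplicates (and factor levels)
  biomaRt::getBM(
    attributes = c("refsnp_id", "ensembl_gene_stable_id", "associated_gene"),
    filters    = "snp_filter",
    values     = rsids,   # the whole vector at once, not one ID per call
    mart       = mart,
    uniqueRows = TRUE
  )
}
```

Usage, with the objects from the question: results <- fetch_snp_genes(trim_SSNP_W[, 1], snp). No rbind() loop is needed because getBM() already returns one data frame covering every rsID.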

Oh, sorry, I did not know that was a thing =S I will look into how to do that...

Here is the output I obtain for some of the rsIDs:

rs9919670   ENSG00000149294
rs35175834  ENSG00000259221
rs10226228  ENSG00000154678
rs986391    ENSG00000145934 
rs3742365   ENSG00000100711

Those appear to be rsIDs that do return an associated gene. It would be helpful to have a vector of those that do not.

I think these are examples that are missing a value for the associated_gene attribute.

Ah, I get it. But associated_gene is the 'Associated gene with phenotype' attribute, which appears to link a given variant to a particular gene based on a particular study, unless I misunderstand the Phenotype Annotation section in BioMart.

If the OP is actually trying to get the HUGO gene symbol, I am not sure you can get that from the SNP mart, can you?

Or you have to do it the hard way:

> mart <- useMart("ENSEMBL_MART_SNP", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_snp")
> z <- getBM(c('refsnp_id', 'ensembl_gene_stable_id', 'associated_gene'), "snp_filter", rsids, mart)
> mart2 <- useEnsembl("ensembl","hsapiens_gene_ensembl")
> zz <- getBM(c("ensembl_gene_id","hgnc_symbol"), "ensembl_gene_id", z[,2], mart2)
> merge(z, zz, by.x = "ensembl_gene_stable_id", by.y = "ensembl_gene_id")
  ensembl_gene_stable_id  refsnp_id associated_gene hgnc_symbol
1        ENSG00000137872 rs35175834              NA      SEMA6D
2        ENSG00000149294  rs9919670              NA       NCAM1
3        ENSG00000163046 rs10206008              NA   ANKRD30BL
4        ENSG00000259221 rs35175834              NA            
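
Note that merge() performs an inner join by default (all = FALSE), so a row of z whose ensembl_gene_stable_id has no match in zz silently disappears. For example, if the SNP mart returns an empty gene ID for a SNP that does not overlap any gene (an assumption worth checking on your own output), that rsID would drop out of the final table. A minimal sketch of a left join that keeps every SNP row and recodes empty hgnc_symbol strings (like the one for ENSG00000259221 above) as NA; merge_symbols is a hypothetical helper name, and the column names follow the session above.

```r
# Hypothetical helper: left-join SNP -> gene rows (z) to gene -> symbol rows (zz).
merge_symbols <- function(z, zz) {
  out <- merge(z, zz,
               by.x  = "ensembl_gene_stable_id",
               by.y  = "ensembl_gene_id",
               all.x = TRUE)  # keep every row of z, matched or not
  # A gene without an HGNC symbol comes back as "" (blank in the output above),
  # while unmatched rows get NA from merge(); recode "" so both look the same.
  out$hgnc_symbol[out$hgnc_symbol %in% ""] <- NA
  out
}
```

Usage: merge_symbols(z, zz) returns the same columns as the merge() call above, but SNPs without a matching gene stay in the table with NA in hgnc_symbol.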

Yep, I think this is an example where the attribute name is pretty ambiguous taken on its own, and you really need the full text name to get a little more insight.

This two-step approach is what I'd do too.

Oh, I see the issue now, thank you so much for that!! I completely misunderstood what associated_gene does! I thought it was somewhat analogous to hgnc_symbol... I will take the hard way, as that gives exactly the output I want. I was not sure whether I could extract the HUGO gene symbol (I am pretty new at this and only learnt today that it's called HUGO...)

Thanks!!

Mike Smith
@mike-smith
EMBL Heidelberg

Just to expand on James' comment above, I don't think the associated_gene attribute represents what you think it does. The full name for the attribute (which I got by browsing the Ensembl BioMart web interface) is "Associated gene with phenotype" which might be slightly clearer. This attribute is only populated if there's some evidence from a study linking the SNP and a gene, but it doesn't necessarily mean the SNP is located within that gene.
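
The same full names can be looked up from R rather than the web interface. A minimal sketch, assuming biomaRt's listAttributes(), which returns a data frame with (at least) name and description columns; describe_attributes is a hypothetical helper that only filters that table, so it works on any attribute listing.

```r
# Hypothetical helper: given the attribute table from biomaRt::listAttributes(),
# return the human-readable description of the named attributes.
describe_attributes <- function(attr_table, attr_names) {
  hits <- attr_table[attr_table$name %in% attr_names, c("name", "description")]
  rownames(hits) <- NULL  # renumber rows for tidier printing
  hits
}
```

Usage, with the snp mart from the question: describe_attributes(listAttributes(snp), c("associated_gene", "ensembl_gene_stable_id")) should show the longer description for each attribute, which for associated_gene is the "Associated gene with phenotype" wording quoted above.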

An example of a SNP with an associated gene is rs1277203. If you click the "Phenotype Data" link on its Ensembl variant page, you'll see it's associated with AKNAD1.

I don't think you can get the gene symbol directly from the SNP mart, you'll probably have to work with the Ensembl IDs and then convert in a second step (see https://support.bioconductor.org/p/132901/#132915 ). These conversions are always a little messy, so if it's possible to stick with only one ID type during an analysis then I would recommend doing so.
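
One source of that messiness is visible in James's output: rs35175834 appears twice, once via ENSG00000137872 (SEMA6D) and once via ENSG00000259221, which has no HGNC symbol. A minimal sketch, assuming the merged data frame has refsnp_id and hgnc_symbol columns as in that output, that collapses such one-to-many mappings to one row per SNP and records how many distinct symbols each SNP had, so the ambiguous cases are easy to pick out for manual checking. collapse_symbols is a hypothetical helper name.

```r
# Hypothetical helper: one row per rsID, with all distinct non-empty symbols
# joined by ";" and a count of how many there were.
collapse_symbols <- function(merged) {
  by_snp <- lapply(split(merged$hgnc_symbol, merged$refsnp_id),
                   function(s) sort(unique(s[!is.na(s) & s != ""])))  # drop NA and ""
  data.frame(
    refsnp_id    = names(by_snp),
    n_symbols    = lengths(by_snp, use.names = FALSE),
    hgnc_symbols = vapply(by_snp, paste, character(1), collapse = ";",
                          USE.NAMES = FALSE),  # "" when a SNP has no symbol
    stringsAsFactors = FALSE
  )
}
```

For example, applied to the data frame returned by the merge() call in James's session, this should give rs35175834 a single row with SEMA6D; rows with n_symbols greater than 1 (or equal to 0) are the ones worth checking by hand.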

Yep, absolutely right, I had it wrong from the start so that is why I was not getting what I expected. Thank you so much for taking the time to develop the answer, I can continue from here!
