Hi Kay,
Kay Jaja wrote:
> Hi ,
>
> I have a list of SNPs (rs numbers) and I am interested in pulling
> the functional data corresponding to each SNP from a database like
> Ensembl, i.e. the gene name if the SNP is in a gene, and whether it is
> intronic, exonic, non-synonymous, or synonymous.
> Is it possible to do this in R using biomaRt or any other packages?
Do you mean to ask if it is possible, or is it easy? It is certainly
possible, although it depends on exactly what you want. Your question
is not as complete as it could be. In the future, you should try to
explain exactly what you are trying to do rather than asking open-ended
questions.
You can get information about SNPs using biomaRt, but the available
information looks pretty sparse to me when compared to the small list
of interests you seem to have. But you can look to see what is available
easily enough:
library(biomaRt)
mart <- useMart("snp","hsapiens_snp")
listAttributes(mart)
There are one or two vignettes that come with biomaRt that should help
you get started if you like what you see.
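The attribute listing can be long, so it may help to filter it. Below is a minimal sketch, assuming the result of listAttributes() is a data frame with name and description columns; find_attributes is a hypothetical helper, and the rows in attrs are made-up stand-ins so that the example runs without a mart connection.

```r
## Hypothetical helper (not part of biomaRt): keep the rows of a
## listAttributes() result whose name or description matches a pattern.
find_attributes <- function(attrs, pattern) {
  hit <- grepl(pattern, attrs$name, ignore.case = TRUE) |
    grepl(pattern, attrs$description, ignore.case = TRUE)
  attrs[hit, , drop = FALSE]
}

## Made-up stand-in for listAttributes(mart); the real names and
## descriptions depend on the mart and its release.
attrs <- data.frame(
  name = c("refsnp_id", "chr_name", "example_consequence"),
  description = c("Variant name", "Chromosome name",
                  "Consequence to transcript"),
  stringsAsFactors = FALSE
)

find_attributes(attrs, "consequence")
```

With a live connection, the same call would be find_attributes(listAttributes(mart), "consequence").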
I generally don't use biomaRt for this sort of thing, instead preferring
to hit the UCSC database directly. Note that what I show below might be
done as easily using the rtracklayer package; you might explore the
vignettes for that package as well. Anyway, I would use the RMySQL
package and query directly:
library(RMySQL)
con <- dbConnect(MySQL(), host = "genome-mysql.cse.ucsc.edu",
                 dbname = "hg18", user = "genome")
## what type of info is available?
> dbGetQuery(con, "select * from snp129 where name='rs25';")
  bin chrom chromStart chromEnd name score strand refNCBI refUCSC observed
1 673  chr7   11550666 11550667 rs25     0      -       T       T      A/G
  molType  class                             valid    avHet  avHetSE   func
1 genomic single by-cluster,by-frequency,by-hapmap 0.499586 0.014383 intron
  locType weight
1   exact      1
Note two things here. First, you don't know the return order, so you
should always ask for the database to return what you are querying on
(this is true of biomaRt as well). Second, if you are querying lots of
SNPs, just do it in one big query instead of one by one. Repeatedly
querying an online database will get you banned. So say your rs IDs are
in a vector rsid, and you want the chromosome, the position, the bases,
and the function (intron, exon, intragenic, etc).
sql <- paste("select name, chrom, chromEnd, observed, func from snp129 ",
             "where name in ('", paste(rsid, collapse = "','"), "');",
             sep = "")
There are a lot of ' and " in there, because we want something that
looks like this:
select name, chrom, chromEnd, observed, func from snp129 where name in
('rs25','rs26','rs27','rs28');
So make sure the SQL statement looks OK first. Then just do
dat <- dbGetQuery(con, sql)
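One way to make sure the statement looks OK is to build it in a small function that also checks the IDs before pasting them in. This is only a sketch: build_snp_query is a hypothetical helper, not part of RMySQL, and the assumption that every ID is "rs" followed by digits may be too strict for some inputs.

```r
## Hypothetical helper: build the IN-list query for a vector of rs IDs.
build_snp_query <- function(rsid, table = "snp129") {
  rsid <- unique(as.character(rsid))
  if (length(rsid) == 0) {
    stop("no rs IDs supplied")
  }
  ## Assumption: valid IDs are "rs" followed by digits. Anything else
  ## (including NA) is rejected so that nothing unexpected ends up
  ## inside the quoted list.
  bad <- rsid[!grepl("^rs[0-9]+$", rsid)]
  if (length(bad) > 0) {
    stop("not valid rs IDs: ", paste(bad, collapse = ", "))
  }
  paste("select name, chrom, chromEnd, observed, func from ", table,
        " where name in ('", paste(rsid, collapse = "','"), "');",
        sep = "")
}

sql <- build_snp_query(c("rs25", "rs26", "rs27", "rs28"))
cat(sql, "\n")
```

The resulting string can then be passed to dbGetQuery(con, sql) as above.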
> rsid <- c("rs25","rs26","rs27","rs28")
> rsid
[1] "rs25" "rs26" "rs27" "rs28"
> sql <- paste("select name, chrom, chromEnd, observed, func from snp129 ",
+ "where name in ('", paste(rsid, collapse = "','"), "');", sep = "")
> sql
[1] "select name, chrom, chromEnd, observed, func from snp129 where name in ('rs25','rs26','rs27','rs28');"
> z <- dbGetQuery(con, sql)
> z
  name chrom chromEnd observed   func
1 rs25  chr7 11550667      A/G intron
2 rs26  chr7 11549996    -/A/G intron
3 rs27  chr7 11549750      C/G intron
4 rs28  chr7 11562590      A/G intron
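Because the return order is not guaranteed, the result can be lined up with the input vector afterwards. The sketch below uses a shuffled mock of the result above so that it runs without a database connection; "rs0" is a made-up placeholder for an ID that the table does not contain.

```r
## Mock of the query result, deliberately shuffled.
z <- data.frame(name = c("rs27", "rs25", "rs28", "rs26"),
                chrom = "chr7",
                chromEnd = c(11549750, 11550667, 11562590, 11549996),
                observed = c("C/G", "A/G", "A/G", "-/A/G"),
                func = "intron",
                stringsAsFactors = FALSE)

## Input IDs in the order we care about; "rs0" is a placeholder.
rsid <- c("rs25", "rs26", "rs27", "rs28", "rs0")

## Position of each requested ID in the result (NA if not returned).
idx <- match(rsid, z$name)
missing_ids <- rsid[is.na(idx)]

## Result rows in input order, dropping IDs that were not found.
ordered <- z[idx[!is.na(idx)], ]
rownames(ordered) <- NULL

ordered
missing_ids
```

Note that match() keeps only the first row per ID, so if a SNP could appear on several rows, merge() on the name column may be the better choice.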
Best,
Jim
>
> I appreciate your help,
> thanks
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826