Hi Praveen,
Praveen Surendran wrote:
> Dear Herve Pages,
>
> I am working on the identification of non-synonymous SNPs in humans
> from an Affymetrix data source.
> Currently I am using a version of a Bioconductor package that fetches
> information on these SNPs, including the variation and whether a SNP
> is non-synonymous or not.
> But I just found that the database does not have enough information on
> all the non-synonymous SNPs, and I would like to use this
> Bioconductor package.
>
> Please let me know whether I will be able to use this package to find
> out from dbSNP whether a SNP is non-synonymous.
Please post to the Bioconductor mailing list (I'm cc'ing it right now).
You'll benefit from a wider audience, and the answers you get will
be archived so other people can find them and refer to them in the
future.
If I understand correctly, you want to be able to determine whether
the SNPs stored in SNPlocs.Hsapiens.dbSNP.20071016 are synonymous or
not.
Note that you give very little information about which BioC package you
are currently using to fetch the information for the Human SNPs, where
the package is fetching them from, and why it "does not have enough
information".
SNPlocs.Hsapiens.dbSNP.20071016 only provides the location and
alleles of each SNP (see this recent thread for the details:
https://stat.ethz.ch/pipermail/bioconductor/2009-February/026231.html),
so it's unlikely that you will get more information by using this
package than by "fetching SNPs" directly from a public database like
dbSNP.
The information in SNPlocs.Hsapiens.dbSNP.20071016 was retrieved
from dbSNP, from this location to be precise:
ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat/
(note that the content of this folder has been updated since
SNPlocs.Hsapiens.dbSNP.20071016 was made).
My understanding is that in order to determine whether a SNP is
synonymous or not you need to know the context of the SNP, i.e. does
it occur in a gene? If yes, which strand does the gene belong to? Does
it occur in a codon, and where in the codon, i.e. at position 1, 2 or 3?
Also, a SNP can have more than 1 alternate allele; some of them can be
synonymous to the reference allele, others not.
For example, the SNP with RefSNP id 6474828 (on chr9) has alleles C, G
and T:
> library(Biostrings)
> library(SNPlocs.Hsapiens.dbSNP.20071016)
> chr9snps <- getSNPlocs("chr9")
> subset(chr9snps, RefSNP_id=="6474828")
      RefSNP_id alleles_as_ambig      loc
61589   6474828                B 14279138
> IUPAC_CODE_MAP[["B"]]
[1] "CGT"
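For readers without Biostrings at hand, the same lookup can be sketched in base R. This is only an illustration: `iupac_map` and `expand_ambiguity` are hypothetical names, and the table is a hand-written copy of the standard IUPAC nucleotide ambiguity codes rather than anything shipped with the SNPlocs package.

```r
# Hand-written copy of the IUPAC nucleotide ambiguity codes.
# (Hypothetical name; Biostrings provides the same mapping as IUPAC_CODE_MAP.)
iupac_map <- c(A = "A", C = "C", G = "G", T = "T",
               M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT",
               V = "ACG", H = "ACT", D = "AGT", B = "CGT", N = "ACGT")

# Split an ambiguity letter into the individual bases it stands for.
expand_ambiguity <- function(code) {
  strsplit(iupac_map[[code]], "")[[1]]
}

# Alleles encoded by the ambiguity letter reported for RefSNP id 6474828.
alleles <- expand_ambiguity("B")
print(alleles)
```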
The reference allele (T) can be determined by looking at the reference
genome:
> library(BSgenome.Hsapiens.UCSC.hg18)
> dna <- subseq(unmasked(Hsapiens$chr9), 14279138-2, 14279138+2)
> dna
5-letter "DNAString" instance
seq: TATAC
The UCSC genome browser will confirm this and will also show that this
chromosome location is inside a gene (NFIB) that belongs to the minus
strand. So the coding DNA is:
> codingdna <- reverseComplement(dna)
> codingdna
5-letter "DNAString" instance
seq: GTATA
Note that the letters in this short sequence are on the minus strand of
chr9, now at positions 14279138+2 to 14279138-2, in this order. The SNP
is at position 14279138 (the middle letter, A), and the set of alleles
originally reported for the plus strand (C, G, T) now becomes G, C and A.
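The strand flip described above can be sketched in base R, without Biostrings. The helper names `complement` and `reverse_complement` are hypothetical and only handle plain A/C/G/T strings; Biostrings' `reverseComplement()` is the more general tool.

```r
# Complement each base; chartr() maps characters one-to-one.
complement <- function(x) chartr("ACGT", "TGCA", x)

# Reverse complement of a plain character string (A/C/G/T only).
reverse_complement <- function(x) {
  paste(rev(strsplit(complement(x), "")[[1]]), collapse = "")
}

plus_window  <- "TATAC"           # plus-strand bases around the SNP (from above)
plus_alleles <- c("C", "G", "T")  # alleles as reported on the plus strand

minus_window  <- reverse_complement(plus_window)  # window as read on the minus strand
minus_alleles <- complement(plus_alleles)         # single bases: complement only

print(minus_window)
print(minus_alleles)
```

Single-base alleles only need complementing; reversing matters for the window because the minus strand is read in the opposite direction.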
I don't know if the SNP belongs to a codon (this would need to be
checked), but if it does, I would also need to know its position in
the codon.
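As a rough sketch of what "position in the codon" means in practice: if the 1-based offset of the SNP within the spliced coding sequence were known (counted from the first base of the start codon, along the transcript's own strand, with introns removed), the codon position would follow from simple modular arithmetic. The function name `codon_position` and the `cds_offset` argument are hypothetical; obtaining that offset requires transcript annotation that SNPlocs does not provide.

```r
# Codon position (1, 2 or 3) of a base, given its 1-based offset within the
# spliced coding sequence. 'cds_offset' is an assumed input that would have
# to be derived from gene/transcript annotation.
codon_position <- function(cds_offset) {
  stopifnot(all(cds_offset >= 1))
  (cds_offset - 1) %% 3 + 1
}

positions <- codon_position(c(1, 2, 3, 4, 300))
print(positions)
```

Since I don't have that offset for this SNP, the three possible cases are considered in turn.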
If it's at position 1:
GTATA
  123
> GENETIC_CODE[c("ATA", "CTA", "GTA")]
ATA CTA GTA
"I" "L" "V"
so no alternate allele is synonymous to the reference allele for this
SNP.
If it's at position 2:
GTATA
 123
> GENETIC_CODE[c("TAT", "TCT", "TGT")]
TAT TCT TGT
"Y" "S" "C"
same conclusion.
But if it's at position 3:
GTATA
123
> GENETIC_CODE[c("GTA", "GTC", "GTG")]
GTA GTC GTG
"V" "V" "V"
then all alleles are synonymous.
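The three cases above can be folded into one small base R function. This is a minimal sketch under several assumptions: the standard genetic code applies, the whole codon lies inside the 5-letter window (no exon boundary in between), and the alleles are already expressed on the coding strand with the reference allele first. The names `genetic_code` and `classify_alleles` are hypothetical; with Biostrings loaded, `GENETIC_CODE` could replace the hand-built table.

```r
# Build the standard genetic code as a named vector (codon -> amino acid).
# Codons are enumerated in TCAG order, third base varying fastest.
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)
aa <- "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
genetic_code <- setNames(strsplit(aa, "")[[1]], codons)

# Translate the codon containing the SNP once per allele.
#   window    : coding-strand sequence around the SNP
#   snp_index : 1-based index of the SNP within 'window'
#   codon_pos : assumed position of the SNP in its codon (1, 2 or 3)
#   alleles   : coding-strand alleles, reference allele first
classify_alleles <- function(window, snp_index, codon_pos, alleles) {
  start <- snp_index - codon_pos + 1          # index of the codon's first base
  codon <- substr(window, start, start + 2)
  variants <- vapply(alleles, function(a) {
    v <- codon
    substr(v, codon_pos, codon_pos) <- a      # swap in this allele
    v
  }, character(1))
  genetic_code[variants]
}

# Try each possible codon position for the SNP in the middle of GTATA.
results <- lapply(1:3, function(pos) {
  classify_alleles("GTATA", snp_index = 3, codon_pos = pos,
                   alleles = c("A", "C", "G"))
})

# An alternate allele is synonymous if it yields the reference amino acid.
synonymous <- lapply(results, function(r) unname(r[-1] == r[1]))
print(results)
print(synonymous)
```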
Hope this helps,
H.
>
> Appreciate your kind attention on this query.
>
> Kind Regards,
>
> Praveen Surendran
> Shields Lab.
> School of Medicine & Medical Science.
> Complex & Adaptive Systems Laboratory (CASL).
> 8 Belfield Office.
> University College Dublin (UCD).
> Dublin 4, Ireland.
> Mob : +353 8793 13071
> Off : +353 171 65334
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319