http://www.bioconductor.org/packages/2.12/data/annotation/html/FDb.UCSC.snp137common.hg19.html,
http://www.bioconductor.org/packages/2.13/data/annotation/html/FDb.UCSC.snp135common.hg19.html,
or
http://www.bioconductor.org/packages/release/data/annotation/html/SNPlocs.Hsapiens.dbSNP.20120608.html
may be handy. The first two can simply be overlapped, but are 'common'
(MAF > 0.01) SNPs only. If you want all of the SNPs that have been
submitted to dbSNP, you need the SNPlocs package.

The UCSC snp13[5|7]common packages are compiled from newer builds of dbSNP
than the manifest, which had some bizarre inclusions (SNPs which are > 1bp
3' to the targeted locus, for example) when we looked. I personally screen
out common SNPs that overlap the targeted or extension base, using the most
recent build available to me, but that's just my preference. There are
arguments to be made for SNPs anywhere from 1 to 49 bases 5' to the target
based on melting temperature of the oligos, and there are arguments to be
made for genotyping all of your subjects and screening individually for
SNPs.

Anyways, it is straightforward to dump out the probes that get hit by
common SNPs:

library(FDb.UCSC.snp137common.hg19)
commonSNPs <- features(FDb.UCSC.snp137common.hg19)

## load the data: a SummarizedExperiment is like an eSet, but with a
## GRanges describing the features
my.SE <- readRDS('my.SummarizedExperiment.rds')
dim(my.SE)
## [1] 485577 11

## mask common SNPs that overlap the targeted CpG (or CpH, or SNP) site
my.SE.noCpgSNPs <- my.SE[ countOverlaps(my.SE, commonSNPs) < 1, ]
dim(my.SE.noCpgSNPs)
## [1] 468211 11

## retain only CpG probes, and only those that do not overlap a common SNP
my.SE.noCpgSnps.onlyCpGs <-
  my.SE.noCpgSNPs[which(substr(rownames(my.SE.noCpgSNPs), 1, 2) == 'cg'), ]
dim(my.SE.noCpgSnps.onlyCpGs)
## [1] 465130 11

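If you would rather screen a window 5' of the target instead of just the
targeted base, something along these lines might work. This is only a
sketch on made-up toy data: the probe names, coordinates, and the 5-base
window are invented for illustration, and it assumes the strand stored in
the rowRanges reflects probe orientation, which you should check for your
own annotation. promoters() is used here purely as a strand-aware way to
extend a 1-bp range upstream.

```r
library(GenomicRanges)
library(SummarizedExperiment)

## toy stand-ins: 1-bp target sites and a handful of "common SNPs"
## (all names and coordinates are hypothetical)
probes <- GRanges("chr1",
                  IRanges(start = c(100, 200, 300, 400), width = 1),
                  strand = c("+", "-", "+", "+"))
names(probes) <- c("cg0001", "cg0002", "cg0003", "ch0004")
snps <- GRanges("chr1", IRanges(start = c(100, 205, 395), width = 1))

betas <- matrix(seq(0.1, 0.8, by = 0.1), nrow = 4,
                dimnames = list(names(probes), c("s1", "s2")))
toy.SE <- SummarizedExperiment(assays = list(betas = betas),
                               rowRanges = probes)

## 1) drop probes with a SNP exactly on the targeted base
hitsTarget <- countOverlaps(rowRanges(toy.SE), snps) > 0
toy.noTargetSNPs <- toy.SE[!hitsTarget, ]

## 2) drop probes with a SNP on the target or within 5 bases 5' of it;
##    promoters() extends upstream relative to each range's strand
window <- promoters(rowRanges(toy.SE), upstream = 5, downstream = 1)
hitsWindow <- countOverlaps(window, snps) > 0
toy.noWindowSNPs <- toy.SE[!hitsWindow, ]

## 3) keep only 'cg' probes from the window-filtered set
toy.cgOnly <- toy.noWindowSNPs[grepl("^cg", rownames(toy.noWindowSNPs)), ]

rownames(toy.noTargetSNPs)
rownames(toy.cgOnly)
```

The window width is the knob to argue about; setting upstream = 0 should
reduce step 2 to the exact-overlap filter in step 1.
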
I prefer to work on SummarizedExperiments (hence the .SE), as it makes life
a bit easier; it also happens to be the parent class for GenomicMethylSet,
GenomicRatioSet, etc. in minfi, so the steps are the same for those.
Working on genomic coordinates is (almost?) universally preferable in this
respect.

YMMV...

On Fri, Jul 5, 2013 at 9:31 AM, Victoria Svinti <victoria.svinti@igmm.ed.ac.uk> wrote:

> Hi there,
>
> I decided to post after searching the forums for a few days, in hope that
> somebody can point me in the right direction.
>
> I am analysing a 450k methylation array to look for differentially
> methylated sites, and got as far as having normalised data. Various
> resources suggest that I need to drop probes with known SNPs residing in
> the sequence, microsatellites, those that anneal to multiple genomic
> locations, etc.
>
> I have looked into the FDb.InfiniumMethylation.hg19 package (get450k), but
> I don't see the annotation regarding SNPs (could be due to my unfamiliarity
> with GRanges). I have finally acquired a list of these from GEO
> (Illumina GPL13534), but wonder if it's outdated and if there is a better
> way of doing this.
>
> Does someone know of a good/any tutorial for this workflow?
>
> Many thanks,
> Victoria
>
> --
> Victoria Svinti
> Colon Cancer Genetics Group
> MRC Human Genetics Unit, IGMM
> University of Edinburgh, Western General Hospital,
> Crewe Road, Edinburgh, EH4 2XU
>
--
*A model is a lie that helps you see the truth.*
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>