Distance from TSS and CPG
@khadeeja-ismail-4711
Last seen 8.8 years ago
Hi, I have a list of probes from IlluminaHumanMethylation450k array, and I need to find the distance from TSS and also the distance from CpG island for each. Is there a simple way to do this? Thanks in advance, Khadeeja
Tim Triche ★ 4.2k
@tim-triche-3561
Last seen 4.2 years ago
United States
I wrote this up as an example in the IlluminaHumanMethylation450kprobe package... which seemingly disappeared into thin air after I uploaded it! Oh well. See IlluminaHumanMethylation450kprobe for this and several other common use cases; otherwise here's the man page and data.frame... hopefully it makes sense. (There is a similar object in the .db package, but without any sequences.)

For what you want, you could just do the following (even with the crufty old 1.4.6 .db package):

> library(IlluminaHumanMethylation450k.db)
> sites <- toTable(IlluminaHumanMethylation450kCPG37) # or CPG36 if using hg18
> chrs <- toTable(IlluminaHumanMethylation450kCHR37) # or CHR36 if using hg18
> coords <- merge(sites, chrs, by='Probe_ID')
> names(coords) <- c('probe','site','chr')
> head(coords)
       probe      site chr
1 cg00000029  53468112  16
2 cg00000108  37459206   3
3 cg00000109 171916037   3
4 cg00000165  91194674   1
5 cg00000236  42263294   8
6 cg00000289  69341139  14
> library(GenomicFeatures)
> CpGs.unstranded <- with(coords,
                          GRanges(paste('chr',chr,sep=''),
                                  IRanges(start=site, width=1, names=probe)))
> refgene.TxDb = makeTranscriptDbFromUCSC('refGene', genome='hg19')
> TSS.forward = transcripts(refgene.TxDb,
                            vals=list(tx_strand='+'),
                            columns='gene_id')
> nearest.fwd = precede(CpGs.unstranded, TSS.forward)
> nearest.fwd.eg = nearest.fwd # to keep dimensions right
> notfound = which(is.na(nearest.fwd)) # track for later
> nearest.fwd.eg[-notfound] =
    as.character(elementMetadata(TSS.forward)$gene_id[nearest.fwd[-notfound]])
> TSSs.fwd = start(TSS.forward[nearest.fwd[-notfound]])
> distToTSS.fwd = nearest.fwd # to keep dimensions right
> distToTSS.fwd[-notfound] = start(CpGs.unstranded)[-notfound] - TSSs.fwd

And likewise with vals=list(tx_strand='-') for the reverse strand.

For CpG island distance you will need to decide which CpG island definition to use. Personally I like Irizarry's. Once you have constructed a GRanges object with the start and end coordinates of the CpG islands, most of it will be equally straightforward.

(In reply to Khadeeja Ismail, Wed, Dec 7, 2011 at 2:25 AM.)
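A minimal sketch of the CpG island step, assuming the GenomicRanges package is installed. The island and probe coordinates below are made up for illustration, and the names islands, probes and distToIsland are hypothetical; in practice islands would hold whichever island definition you chose and probes would be the CpGs.unstranded object built above. distanceToNearest() reports 0 for a probe that falls inside an island and returns no hit for a probe on a chromosome with no islands.

```r
library(GenomicRanges)

# Toy CpG island coordinates (invented for this example, not a real annotation)
islands <- GRanges(c("chr1", "chr1", "chr2"),
                   IRanges(start = c(1000, 5000, 200),
                           end   = c(1500, 5600, 900)))

# Toy probe positions, built the same way as CpGs.unstranded (width 1, named)
probes <- GRanges(c("chr1", "chr1", "chr2", "chr3"),
                  IRanges(start = c(1200, 3000, 1000, 50), width = 1,
                          names = c("p1", "p2", "p3", "p4")))

# For each probe, find the nearest island and the gap between them
hits <- distanceToNearest(probes, islands)

# Probes without any island on their chromosome get no hit, so start from NA
# to keep the result the same length as probes
distToIsland <- rep(NA_integer_, length(probes))
names(distToIsland) <- names(probes)
distToIsland[queryHits(hits)] <- mcols(hits)$distance

distToIsland
```

Note that this distance is unsigned, so it does not say whether the island lies upstream or downstream of the probe; if that matters, subjectHits(hits) gives the index of the matched island and its start can be compared with the probe position.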
Thanks. That's very helpful.

Regards,
Khadeeja

(In reply to Tim Triche, Jr., Wednesday, December 14, 2011 9:51 PM; subject: Re: [BioC] Distance from TSS and CPG.)