Genome Browser and R
3
0
Entering edit mode
@khadeeja-ismail-4711
Last seen 8.8 years ago
Hi, I am working  with some 450k array probes which I need to look up in Geneome browser to see in which type of areas these probes are located in.  For example, if the CpG site (+/- 100kb) overlaps with any of the following in the GM12878 track. Layered H3K27Ac Layered H3K4Me1 Layered H3K4Me3 Transcription DNase Clusters DNase Clusters V1 Txn Fac ChIP V3 Txn Factor ChIP I would like to do it as batch and not one by one since the list of probes is long. I have tried querying the GenomeBrowser database and also the rtracklayer package in R but have not been successful. Would be great if anyone can give me any ideas on how it can be done. Thanking you, Khadeeja [[alternative HTML version deleted]]
rtracklayer rtracklayer • 2.9k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States
Please give a little more detail about your problem. If the number of probes is large enough, you might want to summarize the overlaps and plot the summaries, rather than looking at every region individually. If you did want to go that route, it would be best to use ggbio and/or GViz to automate the plotting. The UCSC browser is simply not designed for batch plotting. Michael On Thu, Oct 24, 2013 at 9:37 AM, khadeeja ismail <hajjja@yahoo.com> wrote: > > > Hi, > I am working with some 450k array probes which I need to look up in > Geneome browser to see in which type of areas these probes are located in. > For example, if the CpG site (+/- 100kb) overlaps with any of the following > in the GM12878 track. > > > Layered H3K27Ac > Layered H3K4Me1 > Layered H3K4Me3 > Transcription > DNase Clusters > DNase Clusters V1 > Txn Fac ChIP V3 > Txn Factor ChIP > > > I would like to do it as batch and not one by one since the list of probes > is long. I have tried querying the GenomeBrowser database and also the > rtracklayer package in R but have not been successful. Would be great if > anyone can give me any ideas on how it can be done. > > Thanking you, > Khadeeja > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 4 months ago
United States
On 10/24/2013 09:37 AM, khadeeja ismail wrote: > > > Hi, > I am working with some 450k array probes which I need to look up in Geneome browser to see in which type of areas these probes are located in. For example, if the CpG site (+/- 100kb) overlaps with any of the following in the GM12878 track. > > > Layered H3K27Ac > Layered H3K4Me1 > Layered H3K4Me3 > Transcription > DNase Clusters > DNase Clusters V1 > Txn Fac ChIP V3 > Txn Factor ChIP These tracks are available in AnnotationHub library(AnnotationHub) hub = AnnotationHub() m = metadata(hub) and then > head(m$Description[grep("H3k27Ac", m$Description, ignore.case=TRUE)]) [1] "wgEncodeBroadHistoneHsmmtH3k27acStdPk" [2] "wgEncodeBroadHistoneNhaH3k27acStdPk" [3] "wgEncodeBroadHistoneA549H3k27acEtoh02Pk" [4] "wgEncodeBroadHistoneK562H3k27acStdPk" [5] "wgEncodeBroadHistoneGm12878H3k27acStdPk" [6] "wgEncodeSydhHistoneMcf7H3k27acUcdPk" > xx = hub$goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHiston eGm12878H3k27acStdPk.broadPeak_0.0.1.RData Retrieving 'goldenpath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHistoneGm 12878H3k27acStdPk.broadPeak_0.0.1.RData' > head(xx) GRanges with 6 ranges and 5 metadata columns: seqnames ranges strand | name score signalValue <rle> <iranges> <rle> | <character> <integer> <numeric> [1] chr22 [17091048, 17091199] * | . 579 11.651761 [2] chr22 [17305774, 17306441] * | . 531 10.111585 [3] chr22 [17517314, 17517945] * | . 527 9.991400 [4] chr22 [17518132, 17518819] * | . 837 19.847850 pValue qValue <numeric> <numeric> [1] 2.4 -1 [2] 15.4 -1 [3] 100.0 -1 [4] 15.3 -1 [ reached getOption("max.print") -- omitted 2 rows ] and then ready for findOverlaps or other GRanges operations. There's a vignette in AnnotationHub http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html and it is mentioned in the work flow on annotation and AnnotatingRanges work flows are relevant http://bioconductor.org/help/workflows/annotation/annotation/ http://bioconductor.org/help/workflows/annotation/AnnotatingRanges/ It would be interesting and useful to have this as a stand-alone work flow, so if you do pursue this root and are interested in writing up a workflow then let me know... Martin > > > I would like to do it as batch and not one by one since the list of probes is long. I have tried querying the GenomeBrowser database and also the rtracklayer package in R but have not been successful. Would be great if anyone can give me any ideas on how it can be done. > > Thanking you, > Khadeeja > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
Thanks much for the help. Will have a go and let you know. I have about 80 probes, from many different genes. I'm not sure if they can be summarized, but sure it's worth having a look. BR, Khadeeja On Thursday, October 24, 2013 8:53 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote: On 10/24/2013 09:37 AM, khadeeja ismail wrote: > > > Hi, > I am working  with some 450k array probes which I need to look up in Geneome browser to see in which type of areas these probes are located in.  For example, if the CpG site (+/- 100kb) overlaps with any of the following in the GM12878 track. > > > Layered H3K27Ac > Layered H3K4Me1 > Layered H3K4Me3 > Transcription > DNase Clusters > DNase Clusters V1 > Txn Fac ChIP V3 > Txn Factor ChIP These tracks are available in AnnotationHub   library(AnnotationHub)   hub = AnnotationHub()   m = metadata(hub) and then > head(m$Description[grep("H3k27Ac", m$Description, ignore.case=TRUE)]) [1] "wgEncodeBroadHistoneHsmmtH3k27acStdPk" [2] "wgEncodeBroadHistoneNhaH3k27acStdPk" [3] "wgEncodeBroadHistoneA549H3k27acEtoh02Pk" [4] "wgEncodeBroadHistoneK562H3k27acStdPk" [5] "wgEncodeBroadHistoneGm12878H3k27acStdPk" [6] "wgEncodeSydhHistoneMcf7H3k27acUcdPk" > xx = hub$goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHiston eGm12878H3k27acStdPk.broadPeak_0.0.1.RData Retrieving 'goldenpath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHistoneGm 12878H3k27acStdPk.broadPeak_0.0.1.RData' > head(xx) GRanges with 6 ranges and 5 metadata columns:       seqnames              ranges strand |        name    score signalValue           <rle>            <iranges>  <rle> | <character> <integer> <numeric>   [1]    chr22 [17091048, 17091199]      * |          .      579 11.651761   [2]    chr22 [17305774, 17306441]      * |          .      531 10.111585   [3]    chr22 [17517314, 17517945]      * |          .      527 9.991400   [4]    chr22 [17518132, 17518819]      * |          .      837 19.847850           pValue    qValue       <numeric> <numeric>   [1]      2.4        -1   [2]      15.4        -1   [3]    100.0        -1   [4]      15.3        -1   [ reached getOption("max.print") -- omitted 2 rows ] and then ready for findOverlaps or other GRanges operations. There's a vignette in AnnotationHub http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html and it is mentioned in the work flow on annotation and AnnotatingRanges work flows are relevant   http://bioconductor.org/help/workflows/annotation/annotation/   http://bioconductor.org/help/workflows/annotation/AnnotatingRanges/ It would be interesting and useful to have this as a stand-alone work flow, so if you do pursue this root and are interested in writing up a workflow then let me know... Martin > > > I would like to do it as batch and not one by one since the list of probes is long. I have tried querying the GenomeBrowser database and also the rtracklayer package in R but have not been successful. Would be great if anyone can give me any ideas on how it can be done. > > Thanking you, > Khadeeja >     [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793 [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
If I'm guessing right, something like this... ? grset <- readRDS("grset.rds") show(grset) ## ## class: GenomicRatioSet ## dim: 468211 32 ## exptData(0): ## assays(2): M CN ## ... ## highVar <- names(which(rowData(grset)$varByGroupQval < 0.05)) ## ## about 50 probes, here ## ## could also use FDb.InfiniumMethylation.hg19 if not already mapped grow <- function(x, y) resize(x, width(x) + (2*y)) probes <- grow(granges(grset)[highVar], 10e5) ## +/- 100kb require(AnnotationHub) hub = AnnotationHub() m = metadata(hub) ## ## ...time passes... ## histoneMarks <- c('k27ac','k4me1','k4me3') names(histoneMarks) <- histoneMarks pre <- 'goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHistone' post <- 'StdPk.broadPeak_0.0.1.RData' gm12878 <- GRangesList(lapply(histoneMarks, function(x) hub[[paste0(pre, 'Gm12878H3', x, post)]])) lapply(gm12878, function(x) names(subsetByOverlaps(probes, x))) ## $k27ac ## [1] "cg07238657" "cg06431905" "cg14555649" "cg00031967" "cg10311020" ## ... ## ## $k4me1 ## [1] "cg25243082" "cg06431905" "cg00031967" "cg10311020" "cg05482956" ## ... ## ## $k4me3 ## [1] "cg16220844" "cg24991732" "cg07238657" "cg06431905" "cg14555649" ## ... Is that pretty similar to what you were thinking? The rest will be an issue of hunt-and-peck; you could also use countOverlaps, though it won't make it as easy to e.g. intersect h3k27ac and h3k4me1 to find active enhancers. hope this helps, --t *He that would live in peace and at ease, * *Must not speak all he knows, nor judge all he sees.* * * Benjamin Franklin, Poor Richard's Almanack<http: archive.org="" details="" poorrichardsalma00franrich=""> On Thu, Oct 24, 2013 at 11:21 AM, khadeeja ismail <hajjja@yahoo.com> wrote: > Thanks much for the help. Will have a go and let you know. > I have about 80 probes, from many different genes. I'm not sure if they > can be summarized, but sure it's worth having a look. > > BR, > Khadeeja > > > > > > > On Thursday, October 24, 2013 8:53 PM, Martin Morgan <mtmorgan@fhcrc.org> > wrote: > > On 10/24/2013 09:37 AM, khadeeja ismail wrote: > > > > > > Hi, > > I am working with some 450k array probes which I need to look up in > Geneome browser to see in which type of areas these probes are located in. > For example, if the CpG site (+/- 100kb) overlaps with any of the following > in the GM12878 track. > > > > > > Layered H3K27Ac > > Layered H3K4Me1 > > Layered H3K4Me3 > > Transcription > > DNase Clusters > > DNase Clusters V1 > > Txn Fac ChIP V3 > > Txn Factor ChIP > > These tracks are available in AnnotationHub > > library(AnnotationHub) > hub = AnnotationHub() > m = metadata(hub) > > and then > > > head(m$Description[grep("H3k27Ac", m$Description, ignore.case=TRUE)]) > [1] "wgEncodeBroadHistoneHsmmtH3k27acStdPk" > [2] "wgEncodeBroadHistoneNhaH3k27acStdPk" > [3] "wgEncodeBroadHistoneA549H3k27acEtoh02Pk" > [4] "wgEncodeBroadHistoneK562H3k27acStdPk" > [5] "wgEncodeBroadHistoneGm12878H3k27acStdPk" > [6] "wgEncodeSydhHistoneMcf7H3k27acUcdPk" > > > xx = > > hub$goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHist oneGm12878H3k27acStdPk.broadPeak_0.0.1.RData > Retrieving > > 'goldenpath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHistone Gm12878H3k27acStdPk.broadPeak_0.0.1.RData' > > > head(xx) > GRanges with 6 ranges and 5 metadata columns: > seqnames ranges strand | name score > signalValue > <rle> <iranges> <rle> | <character> <integer> > <numeric> > [1] chr22 [17091048, 17091199] * | . 579 > 11.651761 > [2] chr22 [17305774, 17306441] * | . 531 > 10.111585 > [3] chr22 [17517314, 17517945] * | . 527 > 9.991400 > [4] chr22 [17518132, 17518819] * | . 837 > 19.847850 > pValue qValue > <numeric> <numeric> > [1] 2.4 -1 > [2] 15.4 -1 > [3] 100.0 -1 > [4] 15.3 -1 > [ reached getOption("max.print") -- omitted 2 rows ] > > and then ready for findOverlaps or other GRanges operations. There's a > vignette > in AnnotationHub > > http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html > > and it is mentioned in the work flow on annotation and AnnotatingRanges > work > flows are relevant > > http://bioconductor.org/help/workflows/annotation/annotation/ > http://bioconductor.org/help/workflows/annotation/AnnotatingRanges/ > > It would be interesting and useful to have this as a stand-alone work > flow, so > if you do pursue this root and are interested in writing up a workflow > then let > me know... > > Martin > > > > > > > > I would like to do it as batch and not one by one since the list of > probes is long. I have tried querying the GenomeBrowser database and also > the rtracklayer package in R but have not been successful. Would be great > if anyone can give me any ideas on how it can be done. > > > > Thanking you, > > Khadeeja > > [[alternative HTML version deleted]] > > > > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
ps. Why +/- 100kb? That's an awful lot of padding given that tons of the genome falls into h3k4me1 peaks *He that would live in peace and at ease, * *Must not speak all he knows, nor judge all he sees.* * * Benjamin Franklin, Poor Richard's Almanack<http: archive.org="" details="" poorrichardsalma00franrich=""> On Thu, Oct 24, 2013 at 2:52 PM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > If I'm guessing right, something like this... ? > > grset <- readRDS("grset.rds") > show(grset) > ## > ## class: GenomicRatioSet > ## dim: 468211 32 > ## exptData(0): > ## assays(2): M CN > ## ... > ## > highVar <- names(which(rowData(grset)$varByGroupQval < 0.05)) > ## > ## about 50 probes, here > ## > ## could also use FDb.InfiniumMethylation.hg19 if not already mapped > > grow <- function(x, y) resize(x, width(x) + (2*y)) > probes <- grow(granges(grset)[highVar], 10e5) ## +/- 100kb > > require(AnnotationHub) > hub = AnnotationHub() > m = metadata(hub) > ## > ## ...time passes... > ## > > histoneMarks <- c('k27ac','k4me1','k4me3') > names(histoneMarks) <- histoneMarks > > pre <- > 'goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHistone' > post <- 'StdPk.broadPeak_0.0.1.RData' > gm12878 <- GRangesList(lapply(histoneMarks, > function(x) > hub[[paste0(pre, 'Gm12878H3', x, post)]])) > > lapply(gm12878, function(x) names(subsetByOverlaps(probes, x))) > ## $k27ac > ## [1] "cg07238657" "cg06431905" "cg14555649" "cg00031967" "cg10311020" > ## ... > ## > ## $k4me1 > ## [1] "cg25243082" "cg06431905" "cg00031967" "cg10311020" "cg05482956" > ## ... > ## > ## $k4me3 > ## [1] "cg16220844" "cg24991732" "cg07238657" "cg06431905" "cg14555649" > ## ... > > Is that pretty similar to what you were thinking? The rest will be an > issue of hunt-and-peck; you could also use countOverlaps, though it won't > make it as easy to e.g. intersect h3k27ac and h3k4me1 to find active > enhancers. > > hope this helps, > > --t > > > > *He that would live in peace and at ease, * > *Must not speak all he knows, nor judge all he sees.* > * > * > Benjamin Franklin, Poor Richard's Almanack<http: archive.org="" details="" poorrichardsalma00franrich=""> > > > On Thu, Oct 24, 2013 at 11:21 AM, khadeeja ismail <hajjja@yahoo.com>wrote: > >> Thanks much for the help. Will have a go and let you know. >> I have about 80 probes, from many different genes. I'm not sure if they >> can be summarized, but sure it's worth having a look. >> >> BR, >> Khadeeja >> >> >> >> >> >> >> On Thursday, October 24, 2013 8:53 PM, Martin Morgan <mtmorgan@fhcrc.org> >> wrote: >> >> On 10/24/2013 09:37 AM, khadeeja ismail wrote: >> > >> > >> > Hi, >> > I am working with some 450k array probes which I need to look up in >> Geneome browser to see in which type of areas these probes are located in. >> For example, if the CpG site (+/- 100kb) overlaps with any of the following >> in the GM12878 track. >> > >> > >> > Layered H3K27Ac >> > Layered H3K4Me1 >> > Layered H3K4Me3 >> > Transcription >> > DNase Clusters >> > DNase Clusters V1 >> > Txn Fac ChIP V3 >> > Txn Factor ChIP >> >> These tracks are available in AnnotationHub >> >> library(AnnotationHub) >> hub = AnnotationHub() >> m = metadata(hub) >> >> and then >> >> > head(m$Description[grep("H3k27Ac", m$Description, ignore.case=TRUE)]) >> [1] "wgEncodeBroadHistoneHsmmtH3k27acStdPk" >> [2] "wgEncodeBroadHistoneNhaH3k27acStdPk" >> [3] "wgEncodeBroadHistoneA549H3k27acEtoh02Pk" >> [4] "wgEncodeBroadHistoneK562H3k27acStdPk" >> [5] "wgEncodeBroadHistoneGm12878H3k27acStdPk" >> [6] "wgEncodeSydhHistoneMcf7H3k27acUcdPk" >> >> > xx = >> >> hub$goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHis toneGm12878H3k27acStdPk.broadPeak_0.0.1.RData >> Retrieving >> >> 'goldenpath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHiston eGm12878H3k27acStdPk.broadPeak_0.0.1.RData' >> >> > head(xx) >> GRanges with 6 ranges and 5 metadata columns: >> seqnames ranges strand | name score >> signalValue >> <rle> <iranges> <rle> | <character> <integer> >> <numeric> >> [1] chr22 [17091048, 17091199] * | . 579 >> 11.651761 >> [2] chr22 [17305774, 17306441] * | . 531 >> 10.111585 >> [3] chr22 [17517314, 17517945] * | . 527 >> 9.991400 >> [4] chr22 [17518132, 17518819] * | . 837 >> 19.847850 >> pValue qValue >> <numeric> <numeric> >> [1] 2.4 -1 >> [2] 15.4 -1 >> [3] 100.0 -1 >> [4] 15.3 -1 >> [ reached getOption("max.print") -- omitted 2 rows ] >> >> and then ready for findOverlaps or other GRanges operations. There's a >> vignette >> in AnnotationHub >> >> http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html >> >> and it is mentioned in the work flow on annotation and AnnotatingRanges >> work >> flows are relevant >> >> http://bioconductor.org/help/workflows/annotation/annotation/ >> http://bioconductor.org/help/workflows/annotation/AnnotatingRanges/ >> >> It would be interesting and useful to have this as a stand-alone work >> flow, so >> if you do pursue this root and are interested in writing up a workflow >> then let >> me know... >> >> Martin >> >> >> > >> > >> > I would like to do it as batch and not one by one since the list of >> probes is long. I have tried querying the GenomeBrowser database and also >> the rtracklayer package in R but have not been successful. Would be great >> if anyone can give me any ideas on how it can be done. >> > >> > Thanking you, >> > Khadeeja >> > [[alternative HTML version deleted]] >> > >> > >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor@r-project.org >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >> >> >> -- >> Computational Biology / Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> Location: Arnold Building M1 B861 >> Phone: (206) 667-2793 >> [[alternative HTML version deleted]] >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
On Thu, Oct 24, 2013 at 2:54 PM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > ps. Why +/- 100kb? That's an awful lot of padding given that tons of the > genome falls into h3k4me1 peaks > > > > *He that would live in peace and at ease, * > *Must not speak all he knows, nor judge all he sees.* > * > * > Benjamin Franklin, Poor Richard's Almanack<http: archive.org="" details="" poorrichardsalma00franrich=""> > > > On Thu, Oct 24, 2013 at 2:52 PM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > >> If I'm guessing right, something like this... ? >> >> grset <- readRDS("grset.rds") >> show(grset) >> ## >> ## class: GenomicRatioSet >> ## dim: 468211 32 >> ## exptData(0): >> ## assays(2): M CN >> ## ... >> ## >> highVar <- names(which(rowData(grset)$varByGroupQval < 0.05)) >> ## >> ## about 50 probes, here >> ## >> ## could also use FDb.InfiniumMethylation.hg19 if not already mapped >> >> grow <- function(x, y) resize(x, width(x) + (2*y)) >> probes <- grow(granges(grset)[highVar], 10e5) ## +/- 100kb >> >> This grow function is currently implemented as: granges(grset)[highVar] + 1e5 If people like an alias like "grow" or "widen", we should consider adding it. require(AnnotationHub) >> hub = AnnotationHub() >> m = metadata(hub) >> ## >> ## ...time passes... >> ## >> >> histoneMarks <- c('k27ac','k4me1','k4me3') >> names(histoneMarks) <- histoneMarks >> >> pre <- >> 'goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHistone' >> post <- 'StdPk.broadPeak_0.0.1.RData' >> gm12878 <- GRangesList(lapply(histoneMarks, >> function(x) >> hub[[paste0(pre, 'Gm12878H3', x, post)]])) >> >> I kind of think that GenomicRangesList() should be used here instead of GRangesList, for efficiency, generality and perhaps semantics. > lapply(gm12878, function(x) names(subsetByOverlaps(probes, x))) >> ## $k27ac >> ## [1] "cg07238657" "cg06431905" "cg14555649" "cg00031967" "cg10311020" >> ## ... >> ## >> ## $k4me1 >> ## [1] "cg25243082" "cg06431905" "cg00031967" "cg10311020" "cg05482956" >> ## ... >> ## >> ## $k4me3 >> ## [1] "cg16220844" "cg24991732" "cg07238657" "cg06431905" "cg14555649" >> ## ... >> >> Is that pretty similar to what you were thinking? The rest will be an >> issue of hunt-and-peck; you could also use countOverlaps, though it won't >> make it as easy to e.g. intersect h3k27ac and h3k4me1 to find active >> enhancers. >> >> hope this helps, >> >> --t >> >> >> >> *He that would live in peace and at ease, * >> *Must not speak all he knows, nor judge all he sees.* >> * >> * >> Benjamin Franklin, Poor Richard's Almanack<http: archive.org="" details="" poorrichardsalma00franrich=""> >> >> >> On Thu, Oct 24, 2013 at 11:21 AM, khadeeja ismail <hajjja@yahoo.com>wrote: >> >>> Thanks much for the help. Will have a go and let you know. >>> I have about 80 probes, from many different genes. I'm not sure if they >>> can be summarized, but sure it's worth having a look. >>> >>> BR, >>> Khadeeja >>> >>> >>> >>> >>> >>> >>> On Thursday, October 24, 2013 8:53 PM, Martin Morgan <mtmorgan@fhcrc.org> >>> wrote: >>> >>> On 10/24/2013 09:37 AM, khadeeja ismail wrote: >>> > >>> > >>> > Hi, >>> > I am working with some 450k array probes which I need to look up in >>> Geneome browser to see in which type of areas these probes are located in. >>> For example, if the CpG site (+/- 100kb) overlaps with any of the following >>> in the GM12878 track. >>> > >>> > >>> > Layered H3K27Ac >>> > Layered H3K4Me1 >>> > Layered H3K4Me3 >>> > Transcription >>> > DNase Clusters >>> > DNase Clusters V1 >>> > Txn Fac ChIP V3 >>> > Txn Factor ChIP >>> >>> These tracks are available in AnnotationHub >>> >>> library(AnnotationHub) >>> hub = AnnotationHub() >>> m = metadata(hub) >>> >>> and then >>> >>> > head(m$Description[grep("H3k27Ac", m$Description, ignore.case=TRUE)]) >>> [1] "wgEncodeBroadHistoneHsmmtH3k27acStdPk" >>> [2] "wgEncodeBroadHistoneNhaH3k27acStdPk" >>> [3] "wgEncodeBroadHistoneA549H3k27acEtoh02Pk" >>> [4] "wgEncodeBroadHistoneK562H3k27acStdPk" >>> [5] "wgEncodeBroadHistoneGm12878H3k27acStdPk" >>> [6] "wgEncodeSydhHistoneMcf7H3k27acUcdPk" >>> >>> > xx = >>> >>> hub$goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHi stoneGm12878H3k27acStdPk.broadPeak_0.0.1.RData >>> Retrieving >>> >>> 'goldenpath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHisto neGm12878H3k27acStdPk.broadPeak_0.0.1.RData' >>> >>> > head(xx) >>> GRanges with 6 ranges and 5 metadata columns: >>> seqnames ranges strand | name score >>> signalValue >>> <rle> <iranges> <rle> | <character> <integer> >>> <numeric> >>> [1] chr22 [17091048, 17091199] * | . 579 >>> 11.651761 >>> [2] chr22 [17305774, 17306441] * | . 531 >>> 10.111585 >>> [3] chr22 [17517314, 17517945] * | . 527 >>> 9.991400 >>> [4] chr22 [17518132, 17518819] * | . 837 >>> 19.847850 >>> pValue qValue >>> <numeric> <numeric> >>> [1] 2.4 -1 >>> [2] 15.4 -1 >>> [3] 100.0 -1 >>> [4] 15.3 -1 >>> [ reached getOption("max.print") -- omitted 2 rows ] >>> >>> and then ready for findOverlaps or other GRanges operations. There's a >>> vignette >>> in AnnotationHub >>> >>> http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html >>> >>> and it is mentioned in the work flow on annotation and AnnotatingRanges >>> work >>> flows are relevant >>> >>> http://bioconductor.org/help/workflows/annotation/annotation/ >>> http://bioconductor.org/help/workflows/annotation/AnnotatingRanges/ >>> >>> It would be interesting and useful to have this as a stand-alone work >>> flow, so >>> if you do pursue this root and are interested in writing up a workflow >>> then let >>> me know... >>> >>> Martin >>> >>> >>> > >>> > >>> > I would like to do it as batch and not one by one since the list of >>> probes is long. I have tried querying the GenomeBrowser database and also >>> the rtracklayer package in R but have not been successful. Would be great >>> if anyone can give me any ideas on how it can be done. >>> > >>> > Thanking you, >>> > Khadeeja >>> > [[alternative HTML version deleted]] >>> > >>> > >>> > >>> > _______________________________________________ >>> > Bioconductor mailing list >>> > Bioconductor@r-project.org >>> > https://stat.ethz.ch/mailman/listinfo/bioconductor >>> > Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> > >>> >>> >>> -- >>> Computational Biology / Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N. >>> PO Box 19024 Seattle, WA 98109 >>> >>> Location: Arnold Building M1 B861 >>> Phone: (206) 667-2793 >>> [[alternative HTML version deleted]] >>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >> > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
> GenomicRangesList() should be used here instead of GRangesList, for efficiency, generality and perhaps semantics. What makes GenomicRangesList() more general or efficient? I did not realize that I should be doing this. Thanks, --t *He that would live in peace and at ease, * *Must not speak all he knows, nor judge all he sees.* * * Benjamin Franklin, Poor Richard's Almanack<http: archive.org="" details="" poorrichardsalma00franrich=""> On Thu, Oct 24, 2013 at 3:36 PM, Michael Lawrence <lawrence.michael@gene.com> wrote: > > > > On Thu, Oct 24, 2013 at 2:54 PM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > >> ps. Why +/- 100kb? That's an awful lot of padding given that tons of >> the genome falls into h3k4me1 peaks >> >> >> >> *He that would live in peace and at ease, * >> *Must not speak all he knows, nor judge all he sees.* >> * >> * >> Benjamin Franklin, Poor Richard's Almanack<http: archive.org="" details="" poorrichardsalma00franrich=""> >> >> >> On Thu, Oct 24, 2013 at 2:52 PM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: >> >>> If I'm guessing right, something like this... ? >>> >>> grset <- readRDS("grset.rds") >>> show(grset) >>> ## >>> ## class: GenomicRatioSet >>> ## dim: 468211 32 >>> ## exptData(0): >>> ## assays(2): M CN >>> ## ... >>> ## >>> highVar <- names(which(rowData(grset)$varByGroupQval < 0.05)) >>> ## >>> ## about 50 probes, here >>> ## >>> ## could also use FDb.InfiniumMethylation.hg19 if not already mapped >>> >>> grow <- function(x, y) resize(x, width(x) + (2*y)) >>> probes <- grow(granges(grset)[highVar], 10e5) ## +/- 100kb >>> >>> > This grow function is currently implemented as: > granges(grset)[highVar] + 1e5 > > If people like an alias like "grow" or "widen", we should consider adding > it. > > require(AnnotationHub) >>> hub = AnnotationHub() >>> m = metadata(hub) >>> ## >>> ## ...time passes... >>> ## >>> >>> histoneMarks <- c('k27ac','k4me1','k4me3') >>> names(histoneMarks) <- histoneMarks >>> >>> pre <- >>> 'goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHistone' >>> post <- 'StdPk.broadPeak_0.0.1.RData' >>> gm12878 <- GRangesList(lapply(histoneMarks, >>> function(x) >>> hub[[paste0(pre, 'Gm12878H3', x, >>> post)]])) >>> >>> > I kind of think that GenomicRangesList() should be used here instead of > GRangesList, for efficiency, generality and perhaps semantics. > > >> lapply(gm12878, function(x) names(subsetByOverlaps(probes, x))) >>> ## $k27ac >>> ## [1] "cg07238657" "cg06431905" "cg14555649" "cg00031967" "cg10311020" >>> ## ... >>> ## >>> ## $k4me1 >>> ## [1] "cg25243082" "cg06431905" "cg00031967" "cg10311020" "cg05482956" >>> ## ... >>> ## >>> ## $k4me3 >>> ## [1] "cg16220844" "cg24991732" "cg07238657" "cg06431905" "cg14555649" >>> ## ... >>> >>> Is that pretty similar to what you were thinking? The rest will be an >>> issue of hunt-and-peck; you could also use countOverlaps, though it won't >>> make it as easy to e.g. intersect h3k27ac and h3k4me1 to find active >>> enhancers. >>> >>> hope this helps, >>> >>> --t >>> >>> >>> >>> *He that would live in peace and at ease, * >>> *Must not speak all he knows, nor judge all he sees.* >>> * >>> * >>> Benjamin Franklin, Poor Richard's Almanack<http: archive.org="" details="" poorrichardsalma00franrich=""> >>> >>> >>> On Thu, Oct 24, 2013 at 11:21 AM, khadeeja ismail <hajjja@yahoo.com>wrote: >>> >>>> Thanks much for the help. Will have a go and let you know. >>>> I have about 80 probes, from many different genes. I'm not sure if they >>>> can be summarized, but sure it's worth having a look. >>>> >>>> BR, >>>> Khadeeja >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Thursday, October 24, 2013 8:53 PM, Martin Morgan < >>>> mtmorgan@fhcrc.org> wrote: >>>> >>>> On 10/24/2013 09:37 AM, khadeeja ismail wrote: >>>> > >>>> > >>>> > Hi, >>>> > I am working with some 450k array probes which I need to look up in >>>> Geneome browser to see in which type of areas these probes are located in. >>>> For example, if the CpG site (+/- 100kb) overlaps with any of the following >>>> in the GM12878 track. >>>> > >>>> > >>>> > Layered H3K27Ac >>>> > Layered H3K4Me1 >>>> > Layered H3K4Me3 >>>> > Transcription >>>> > DNase Clusters >>>> > DNase Clusters V1 >>>> > Txn Fac ChIP V3 >>>> > Txn Factor ChIP >>>> >>>> These tracks are available in AnnotationHub >>>> >>>> library(AnnotationHub) >>>> hub = AnnotationHub() >>>> m = metadata(hub) >>>> >>>> and then >>>> >>>> > head(m$Description[grep("H3k27Ac", m$Description, ignore.case=TRUE)]) >>>> [1] "wgEncodeBroadHistoneHsmmtH3k27acStdPk" >>>> [2] "wgEncodeBroadHistoneNhaH3k27acStdPk" >>>> [3] "wgEncodeBroadHistoneA549H3k27acEtoh02Pk" >>>> [4] "wgEncodeBroadHistoneK562H3k27acStdPk" >>>> [5] "wgEncodeBroadHistoneGm12878H3k27acStdPk" >>>> [6] "wgEncodeSydhHistoneMcf7H3k27acUcdPk" >>>> >>>> > xx = >>>> >>>> hub$goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadH istoneGm12878H3k27acStdPk.broadPeak_0.0.1.RData >>>> Retrieving >>>> >>>> 'goldenpath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHist oneGm12878H3k27acStdPk.broadPeak_0.0.1.RData' >>>> >>>> > head(xx) >>>> GRanges with 6 ranges and 5 metadata columns: >>>> seqnames ranges strand | name score >>>> signalValue >>>> <rle> <iranges> <rle> | <character> <integer> >>>> <numeric> >>>> [1] chr22 [17091048, 17091199] * | . 579 >>>> 11.651761 >>>> [2] chr22 [17305774, 17306441] * | . 531 >>>> 10.111585 >>>> [3] chr22 [17517314, 17517945] * | . 527 >>>> 9.991400 >>>> [4] chr22 [17518132, 17518819] * | . 837 >>>> 19.847850 >>>> pValue qValue >>>> <numeric> <numeric> >>>> [1] 2.4 -1 >>>> [2] 15.4 -1 >>>> [3] 100.0 -1 >>>> [4] 15.3 -1 >>>> [ reached getOption("max.print") -- omitted 2 rows ] >>>> >>>> and then ready for findOverlaps or other GRanges operations. There's a >>>> vignette >>>> in AnnotationHub >>>> >>>> http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html >>>> >>>> and it is mentioned in the work flow on annotation and AnnotatingRanges >>>> work >>>> flows are relevant >>>> >>>> http://bioconductor.org/help/workflows/annotation/annotation/ >>>> http://bioconductor.org/help/workflows/annotation/AnnotatingRanges/ >>>> >>>> It would be interesting and useful to have this as a stand-alone work >>>> flow, so >>>> if you do pursue this root and are interested in writing up a workflow >>>> then let >>>> me know... >>>> >>>> Martin >>>> >>>> >>>> > >>>> > >>>> > I would like to do it as batch and not one by one since the list of >>>> probes is long. I have tried querying the GenomeBrowser database and also >>>> the rtracklayer package in R but have not been successful. Would be great >>>> if anyone can give me any ideas on how it can be done. >>>> > >>>> > Thanking you, >>>> > Khadeeja >>>> > [[alternative HTML version deleted]] >>>> > >>>> > >>>> > >>>> > _______________________________________________ >>>> > Bioconductor mailing list >>>> > Bioconductor@r-project.org >>>> > https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> > Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> > >>>> >>>> >>>> -- >>>> Computational Biology / Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N. >>>> PO Box 19024 Seattle, WA 98109 >>>> >>>> Location: Arnold Building M1 B861 >>>> Phone: (206) 667-2793 >>>> [[alternative HTML version deleted]] >>>> >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>> >> > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
It's the implementation: GRangesList is compressed; GenomicRangesList is not. Efficient: GRangesList compresses the data, i.e., it concatenates all the objects. Your code proceeds to loop over the elements, which needs to extract them. GenomicRangesList would keep everything as an ordinary list internally. General: If you're grabbing arbitrary tracks from AnnotationHub or anywhere else, they might not all have the same mcols; GRangesList requires that the set of mcols is the same for all objects (because they all end up in the same GRanges). On Thu, Oct 24, 2013 at 9:30 PM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > > GenomicRangesList() should be used here instead of GRangesList, for > efficiency, generality and perhaps semantics. > > What makes GenomicRangesList() more general or efficient? I did not > realize that I should be doing this. > > Thanks, > > --t > > > *He that would live in peace and at ease, * > *Must not speak all he knows, nor judge all he sees.* > * > * > Benjamin Franklin, Poor Richard's Almanack<http: archive.org="" details="" poorrichardsalma00franrich=""> > > > On Thu, Oct 24, 2013 at 3:36 PM, Michael Lawrence < > lawrence.michael@gene.com> wrote: > >> >> >> >> On Thu, Oct 24, 2013 at 2:54 PM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: >> >>> ps. Why +/- 100kb? That's an awful lot of padding given that tons of >>> the genome falls into h3k4me1 peaks >>> >>> >>> >>> *He that would live in peace and at ease, * >>> *Must not speak all he knows, nor judge all he sees.* >>> * >>> * >>> Benjamin Franklin, Poor Richard's Almanack<http: archive.org="" details="" poorrichardsalma00franrich=""> >>> >>> >>> On Thu, Oct 24, 2013 at 2:52 PM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: >>> >>>> If I'm guessing right, something like this... ? >>>> >>>> grset <- readRDS("grset.rds") >>>> show(grset) >>>> ## >>>> ## class: GenomicRatioSet >>>> ## dim: 468211 32 >>>> ## exptData(0): >>>> ## assays(2): M CN >>>> ## ... >>>> ## >>>> highVar <- names(which(rowData(grset)$varByGroupQval < 0.05)) >>>> ## >>>> ## about 50 probes, here >>>> ## >>>> ## could also use FDb.InfiniumMethylation.hg19 if not already mapped >>>> >>>> grow <- function(x, y) resize(x, width(x) + (2*y)) >>>> probes <- grow(granges(grset)[highVar], 10e5) ## +/- 100kb >>>> >>>> >> This grow function is currently implemented as: >> granges(grset)[highVar] + 1e5 >> >> If people like an alias like "grow" or "widen", we should consider adding >> it. >> >> require(AnnotationHub) >>>> hub = AnnotationHub() >>>> m = metadata(hub) >>>> ## >>>> ## ...time passes... >>>> ## >>>> >>>> histoneMarks <- c('k27ac','k4me1','k4me3') >>>> names(histoneMarks) <- histoneMarks >>>> >>>> pre <- >>>> 'goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHistone' >>>> post <- 'StdPk.broadPeak_0.0.1.RData' >>>> gm12878 <- GRangesList(lapply(histoneMarks, >>>> function(x) >>>> hub[[paste0(pre, 'Gm12878H3', x, >>>> post)]])) >>>> >>>> >> I kind of think that GenomicRangesList() should be used here instead of >> GRangesList, for efficiency, generality and perhaps semantics. >> >> >>> lapply(gm12878, function(x) names(subsetByOverlaps(probes, x))) >>>> ## $k27ac >>>> ## [1] "cg07238657" "cg06431905" "cg14555649" "cg00031967" "cg10311020" >>>> ## ... >>>> ## >>>> ## $k4me1 >>>> ## [1] "cg25243082" "cg06431905" "cg00031967" "cg10311020" "cg05482956" >>>> ## ... >>>> ## >>>> ## $k4me3 >>>> ## [1] "cg16220844" "cg24991732" "cg07238657" "cg06431905" "cg14555649" >>>> ## ... >>>> >>>> Is that pretty similar to what you were thinking? The rest will be an >>>> issue of hunt-and-peck; you could also use countOverlaps, though it won't >>>> make it as easy to e.g. intersect h3k27ac and h3k4me1 to find active >>>> enhancers. >>>> >>>> hope this helps, >>>> >>>> --t >>>> >>>> >>>> >>>> *He that would live in peace and at ease, * >>>> *Must not speak all he knows, nor judge all he sees.* >>>> * >>>> * >>>> Benjamin Franklin, Poor Richard's Almanack<http: archive.org="" details="" poorrichardsalma00franrich=""> >>>> >>>> >>>> On Thu, Oct 24, 2013 at 11:21 AM, khadeeja ismail <hajjja@yahoo.com>wrote: >>>> >>>>> Thanks much for the help. Will have a go and let you know. >>>>> I have about 80 probes, from many different genes. I'm not sure if >>>>> they can be summarized, but sure it's worth having a look. >>>>> >>>>> BR, >>>>> Khadeeja >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Thursday, October 24, 2013 8:53 PM, Martin Morgan < >>>>> mtmorgan@fhcrc.org> wrote: >>>>> >>>>> On 10/24/2013 09:37 AM, khadeeja ismail wrote: >>>>> > >>>>> > >>>>> > Hi, >>>>> > I am working with some 450k array probes which I need to look up in >>>>> Geneome browser to see in which type of areas these probes are located in. >>>>> For example, if the CpG site (+/- 100kb) overlaps with any of the following >>>>> in the GM12878 track. >>>>> > >>>>> > >>>>> > Layered H3K27Ac >>>>> > Layered H3K4Me1 >>>>> > Layered H3K4Me3 >>>>> > Transcription >>>>> > DNase Clusters >>>>> > DNase Clusters V1 >>>>> > Txn Fac ChIP V3 >>>>> > Txn Factor ChIP >>>>> >>>>> These tracks are available in AnnotationHub >>>>> >>>>> library(AnnotationHub) >>>>> hub = AnnotationHub() >>>>> m = metadata(hub) >>>>> >>>>> and then >>>>> >>>>> > head(m$Description[grep("H3k27Ac", m$Description, ignore.case=TRUE)]) >>>>> [1] "wgEncodeBroadHistoneHsmmtH3k27acStdPk" >>>>> [2] "wgEncodeBroadHistoneNhaH3k27acStdPk" >>>>> [3] "wgEncodeBroadHistoneA549H3k27acEtoh02Pk" >>>>> [4] "wgEncodeBroadHistoneK562H3k27acStdPk" >>>>> [5] "wgEncodeBroadHistoneGm12878H3k27acStdPk" >>>>> [6] "wgEncodeSydhHistoneMcf7H3k27acUcdPk" >>>>> >>>>> > xx = >>>>> >>>>> hub$goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroad HistoneGm12878H3k27acStdPk.broadPeak_0.0.1.RData >>>>> Retrieving >>>>> >>>>> 'goldenpath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHis toneGm12878H3k27acStdPk.broadPeak_0.0.1.RData' >>>>> >>>>> > head(xx) >>>>> GRanges with 6 ranges and 5 metadata columns: >>>>> seqnames ranges strand | name score >>>>> signalValue >>>>> <rle> <iranges> <rle> | <character> <integer> >>>>> <numeric> >>>>> [1] chr22 [17091048, 17091199] * | . 579 >>>>> 11.651761 >>>>> [2] chr22 [17305774, 17306441] * | . 531 >>>>> 10.111585 >>>>> [3] chr22 [17517314, 17517945] * | . 527 >>>>> 9.991400 >>>>> [4] chr22 [17518132, 17518819] * | . 837 >>>>> 19.847850 >>>>> pValue qValue >>>>> <numeric> <numeric> >>>>> [1] 2.4 -1 >>>>> [2] 15.4 -1 >>>>> [3] 100.0 -1 >>>>> [4] 15.3 -1 >>>>> [ reached getOption("max.print") -- omitted 2 rows ] >>>>> >>>>> and then ready for findOverlaps or other GRanges operations. There's a >>>>> vignette >>>>> in AnnotationHub >>>>> >>>>> >>>>> http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html >>>>> >>>>> and it is mentioned in the work flow on annotation and >>>>> AnnotatingRanges work >>>>> flows are relevant >>>>> >>>>> http://bioconductor.org/help/workflows/annotation/annotation/ >>>>> http://bioconductor.org/help/workflows/annotation/AnnotatingRanges/ >>>>> >>>>> It would be interesting and useful to have this as a stand-alone work >>>>> flow, so >>>>> if you do pursue this root and are interested in writing up a workflow >>>>> then let >>>>> me know... >>>>> >>>>> Martin >>>>> >>>>> >>>>> > >>>>> > >>>>> > I would like to do it as batch and not one by one since the list of >>>>> probes is long. I have tried querying the GenomeBrowser database and also >>>>> the rtracklayer package in R but have not been successful. Would be great >>>>> if anyone can give me any ideas on how it can be done. >>>>> > >>>>> > Thanking you, >>>>> > Khadeeja >>>>> > [[alternative HTML version deleted]] >>>>> > >>>>> > >>>>> > >>>>> > _______________________________________________ >>>>> > Bioconductor mailing list >>>>> > Bioconductor@r-project.org >>>>> > https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> > Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> > >>>>> >>>>> >>>>> -- >>>>> Computational Biology / Fred Hutchinson Cancer Research Center >>>>> 1100 Fairview Ave. N. >>>>> PO Box 19024 Seattle, WA 98109 >>>>> >>>>> Location: Arnold Building M1 B861 >>>>> Phone: (206) 667-2793 >>>>> [[alternative HTML version deleted]] >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor@r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>>> >>>> >>> >> > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Got it. Thanks for the implementation details -- I think this can speed up a number of other functions of mine too. Much obliged! --t > On Oct 25, 2013, at 4:49 AM, Michael Lawrence <lawrence.michael@gene.com> wrote: > > It's the implementation: GRangesList is compressed; GenomicRangesList is not. > > Efficient: GRangesList compresses the data, i.e., it concatenates all the objects. Your code proceeds to loop over the elements, which needs to extract them. GenomicRangesList would keep everything as an ordinary list internally. > > General: If you're grabbing arbitrary tracks from AnnotationHub or anywhere else, they might not all have the same mcols; GRangesList requires that the set of mcols is the same for all objects (because they all end up in the same GRanges). > > > > >> On Thu, Oct 24, 2013 at 9:30 PM, Tim Triche, Jr. <tim.triche@gmail.com> wrote: >> > GenomicRangesList() should be used here instead of GRangesList, for efficiency, generality and perhaps semantics. >> >> What makes GenomicRangesList() more general or efficient? I did not realize that I should be doing this. >> >> Thanks, >> >> --t >> >> >> He that would live in peace and at ease, >> Must not speak all he knows, nor judge all he sees. >> >> Benjamin Franklin, Poor Richard's Almanack >> >> >>> On Thu, Oct 24, 2013 at 3:36 PM, Michael Lawrence <lawrence.michael@gene.com> wrote: >>> >>> >>> >>>> On Thu, Oct 24, 2013 at 2:54 PM, Tim Triche, Jr. <tim.triche@gmail.com> wrote: >>>> ps. Why +/- 100kb? That's an awful lot of padding given that tons of the genome falls into h3k4me1 peaks >>>> >>>> >>>> >>>> He that would live in peace and at ease, >>>> Must not speak all he knows, nor judge all he sees. >>>> >>>> Benjamin Franklin, Poor Richard's Almanack >>>> >>>> >>>>> On Thu, Oct 24, 2013 at 2:52 PM, Tim Triche, Jr. <tim.triche@gmail.com> wrote: >>>>> If I'm guessing right, something like this... ? >>>>> >>>>> grset <- readRDS("grset.rds") >>>>> show(grset) >>>>> ## >>>>> ## class: GenomicRatioSet >>>>> ## dim: 468211 32 >>>>> ## exptData(0): >>>>> ## assays(2): M CN >>>>> ## ... >>>>> ## >>>>> highVar <- names(which(rowData(grset)$varByGroupQval < 0.05)) >>>>> ## >>>>> ## about 50 probes, here >>>>> ## >>>>> ## could also use FDb.InfiniumMethylation.hg19 if not already mapped >>>>> >>>>> grow <- function(x, y) resize(x, width(x) + (2*y)) >>>>> probes <- grow(granges(grset)[highVar], 10e5) ## +/- 100kb >>> >>> This grow function is currently implemented as: >>> granges(grset)[highVar] + 1e5 >>> >>> If people like an alias like "grow" or "widen", we should consider adding it. >>> >>>>> require(AnnotationHub) >>>>> hub = AnnotationHub() >>>>> m = metadata(hub) >>>>> ## >>>>> ## ...time passes... >>>>> ## >>>>> >>>>> histoneMarks <- c('k27ac','k4me1','k4me3') >>>>> names(histoneMarks) <- histoneMarks >>>>> >>>>> pre <- 'goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHistone' >>>>> post <- 'StdPk.broadPeak_0.0.1.RData' >>>>> gm12878 <- GRangesList(lapply(histoneMarks, >>>>> function(x) >>>>> hub[[paste0(pre, 'Gm12878H3', x, post)]])) >>> >>> I kind of think that GenomicRangesList() should be used here instead of GRangesList, for efficiency, generality and perhaps semantics. >>> >>>>> lapply(gm12878, function(x) names(subsetByOverlaps(probes, x))) >>>>> ## $k27ac >>>>> ## [1] "cg07238657" "cg06431905" "cg14555649" "cg00031967" "cg10311020" >>>>> ## ... >>>>> ## >>>>> ## $k4me1 >>>>> ## [1] "cg25243082" "cg06431905" "cg00031967" "cg10311020" "cg05482956" >>>>> ## ... >>>>> ## >>>>> ## $k4me3 >>>>> ## [1] "cg16220844" "cg24991732" "cg07238657" "cg06431905" "cg14555649" >>>>> ## ... >>>>> >>>>> Is that pretty similar to what you were thinking? The rest will be an issue of hunt-and-peck; you could also use countOverlaps, though it won't make it as easy to e.g. intersect h3k27ac and h3k4me1 to find active enhancers. >>>>> >>>>> hope this helps, >>>>> >>>>> --t >>>>> >>>>> >>>>> >>>>> He that would live in peace and at ease, >>>>> Must not speak all he knows, nor judge all he sees. >>>>> >>>>> Benjamin Franklin, Poor Richard's Almanack >>>>> >>>>> >>>>>> On Thu, Oct 24, 2013 at 11:21 AM, khadeeja ismail <hajjja@yahoo.com> wrote: >>>>>> Thanks much for the help. Will have a go and let you know. >>>>>> I have about 80 probes, from many different genes. I'm not sure if they can be summarized, but sure it's worth having a look. >>>>>> >>>>>> BR, >>>>>> Khadeeja >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Thursday, October 24, 2013 8:53 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote: >>>>>> >>>>>> On 10/24/2013 09:37 AM, khadeeja ismail wrote: >>>>>> > >>>>>> > >>>>>> > Hi, >>>>>> > I am working with some 450k array probes which I need to look up in Geneome browser to see in which type of areas these probes are located in. For example, if the CpG site (+/- 100kb) overlaps with any of the following in the GM12878 track. >>>>>> > >>>>>> > >>>>>> > Layered H3K27Ac >>>>>> > Layered H3K4Me1 >>>>>> > Layered H3K4Me3 >>>>>> > Transcription >>>>>> > DNase Clusters >>>>>> > DNase Clusters V1 >>>>>> > Txn Fac ChIP V3 >>>>>> > Txn Factor ChIP >>>>>> >>>>>> These tracks are available in AnnotationHub >>>>>> >>>>>> library(AnnotationHub) >>>>>> hub = AnnotationHub() >>>>>> m = metadata(hub) >>>>>> >>>>>> and then >>>>>> >>>>>> > head(m$Description[grep("H3k27Ac", m$Description, ignore.case=TRUE)]) >>>>>> [1] "wgEncodeBroadHistoneHsmmtH3k27acStdPk" >>>>>> [2] "wgEncodeBroadHistoneNhaH3k27acStdPk" >>>>>> [3] "wgEncodeBroadHistoneA549H3k27acEtoh02Pk" >>>>>> [4] "wgEncodeBroadHistoneK562H3k27acStdPk" >>>>>> [5] "wgEncodeBroadHistoneGm12878H3k27acStdPk" >>>>>> [6] "wgEncodeSydhHistoneMcf7H3k27acUcdPk" >>>>>> >>>>>> > xx = >>>>>> hub$goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroa dHistoneGm12878H3k27acStdPk.broadPeak_0.0.1.RData >>>>>> Retrieving >>>>>> 'goldenpath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHi stoneGm12878H3k27acStdPk.broadPeak_0.0.1.RData' >>>>>> >>>>>> > head(xx) >>>>>> GRanges with 6 ranges and 5 metadata columns: >>>>>> seqnames ranges strand | name score signalValue >>>>>> <rle> <iranges> <rle> | <character> <integer> <numeric> >>>>>> [1] chr22 [17091048, 17091199] * | . 579 11.651761 >>>>>> [2] chr22 [17305774, 17306441] * | . 531 10.111585 >>>>>> [3] chr22 [17517314, 17517945] * | . 527 9.991400 >>>>>> [4] chr22 [17518132, 17518819] * | . 837 19.847850 >>>>>> pValue qValue >>>>>> <numeric> <numeric> >>>>>> [1] 2.4 -1 >>>>>> [2] 15.4 -1 >>>>>> [3] 100.0 -1 >>>>>> [4] 15.3 -1 >>>>>> [ reached getOption("max.print") -- omitted 2 rows ] >>>>>> >>>>>> and then ready for findOverlaps or other GRanges operations. There's a vignette >>>>>> in AnnotationHub >>>>>> >>>>>> http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html >>>>>> >>>>>> and it is mentioned in the work flow on annotation and AnnotatingRanges work >>>>>> flows are relevant >>>>>> >>>>>> http://bioconductor.org/help/workflows/annotation/annotation/ >>>>>> http://bioconductor.org/help/workflows/annotation/AnnotatingRanges/ >>>>>> >>>>>> It would be interesting and useful to have this as a stand- alone work flow, so >>>>>> if you do pursue this root and are interested in writing up a workflow then let >>>>>> me know... >>>>>> >>>>>> Martin >>>>>> >>>>>> >>>>>> > >>>>>> > >>>>>> > I would like to do it as batch and not one by one since the list of probes is long. I have tried querying the GenomeBrowser database and also the rtracklayer package in R but have not been successful. Would be great if anyone can give me any ideas on how it can be done. >>>>>> > >>>>>> > Thanking you, >>>>>> > Khadeeja >>>>>> > [[alternative HTML version deleted]] >>>>>> > >>>>>> > >>>>>> > >>>>>> > _______________________________________________ >>>>>> > Bioconductor mailing list >>>>>> > Bioconductor@r-project.org >>>>>> > https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>> > >>>>>> >>>>>> >>>>>> -- >>>>>> Computational Biology / Fred Hutchinson Cancer Research Center >>>>>> 1100 Fairview Ave. N. >>>>>> PO Box 19024 Seattle, WA 98109 >>>>>> >>>>>> Location: Arnold Building M1 B861 >>>>>> Phone: (206) 667-2793 >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor@r-project.org >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi, Thanks for a lot of help and an interesting discussion. IT has been very useful. I ran the code and was able to get the overlaps for the histone markers, but still need to try the same for the tnx, and dnase etc. Hope no problems pop up. Btw, I meant 100bp, and not 100kb. Hector, I had some problem with the browser, but will try again later and let you know. Need to spend some more time on this. Thanks again, and BR, Khadeeja On Friday, October 25, 2013 5:54 PM, "Tim Triche, Jr." <tim.triche@gmail.com> wrote: Got it. Thanks for the implementation details -- I think this can speed up a number of other functions of mine too. Much obliged! --t On Oct 25, 2013, at 4:49 AM, Michael Lawrence <lawrence.michael@gene.com> wrote: It's the implementation: GRangesList is compressed; GenomicRangesList is not. > > >Efficient: GRangesList compresses the data, i.e., it concatenates all the objects. Your code proceeds to loop over the elements, which needs to extract them. GenomicRangesList would keep everything as an ordinary list internally. > >General: If you're grabbing arbitrary tracks from AnnotationHub or anywhere else, they might not all have the same mcols; GRangesList requires that the set of mcols is the same for all objects (because they all end up in the same GRanges). > > > > > > >On Thu, Oct 24, 2013 at 9:30 PM, Tim Triche, Jr. <tim.triche@gmail.com> wrote: > >> GenomicRangesList() should be used here instead of GRangesList, for efficiency, generality and perhaps semantics. >> >> >>What makes GenomicRangesList() more general or efficient?  I did not realize that I should be doing this. >> >> >>Thanks, >> >> >>--t >> >> >> >> >>He that would live in peace and at ease, >>Must not speak all he knows, nor judge all he sees. >> >> >> >>Benjamin Franklin, Poor Richard's Almanack >> >> >>On Thu, Oct 24, 2013 at 3:36 PM, Michael Lawrence <lawrence.michael@gene.com> wrote: >> >> >>> >>> >>> >>> >>>On Thu, Oct 24, 2013 at 2:54 PM, Tim Triche, Jr. <tim.triche@gmail.com> wrote: >>> >>>ps.  Why +/- 100kb?  That's an awful lot of padding given that tons of the genome falls into h3k4me1 peaks >>>> >>>> >>>> >>>> >>>> >>>> >>>>He that would live in peace and at ease, >>>>Must not speak all he knows, nor judge all he sees. >>>> >>>> >>>> >>>>Benjamin Franklin, Poor Richard's Almanack >>>> >>>> >>>>On Thu, Oct 24, 2013 at 2:52 PM, Tim Triche, Jr. <tim.triche@gmail.com> wrote: >>>> >>>>If I'm guessing right, something like this... ? >>>>> >>>>> >>>>>grset <- readRDS("grset.rds") >>>>>show(grset) >>>>>## >>>>>## class: GenomicRatioSet >>>>>## dim: 468211 32 >>>>>## exptData(0): >>>>>## assays(2): M CN >>>>>## ... >>>>>## >>>>>highVar <- names(which(rowData(grset)$varByGroupQval < 0.05)) >>>>>## >>>>>## about 50 probes, here >>>>>## >>>>>## could also use FDb.InfiniumMethylation.hg19 if not already mapped >>>>> >>>>> >>>>>grow <- function(x, y) resize(x, width(x) + (2*y)) >>>>> >>>>>probes <- grow(granges(grset)[highVar], 10e5) ## +/- 100kb >>>>> >>>>> >>> >>> >>>This grow function is currently implemented as: >>> >>>granges(grset)[highVar] + 1e5 >>> >>> >>>If people like an alias like "grow" or "widen", we should consider adding it. >>> >>> >>>require(AnnotationHub) >>>>>hub = AnnotationHub() >>>>>m = metadata(hub) >>>>>## >>>>>## ...time passes... >>>>>## >>>>> >>>>> >>>>>histoneMarks <- c('k27ac','k4me1','k4me3') >>>>>names(histoneMarks) <- histoneMarks >>>>> >>>>> >>>>>pre <- 'goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHistone' >>>>>post <- 'StdPk.broadPeak_0.0.1.RData' >>>>>gm12878 <- GRangesList(lapply(histoneMarks, >>>>>                              function(x) >>>>>                                hub[[paste0(pre, 'Gm12878H3', x, post)]])) >>>>> >>>>> >>> >>> >>>I kind of think that GenomicRangesList() should be used here instead of GRangesList, for efficiency, generality and perhaps semantics. >>> >>> >>>lapply(gm12878, function(x) names(subsetByOverlaps(probes, x))) >>>>>## $k27ac >>>>>## [1] "cg07238657" "cg06431905" "cg14555649" "cg00031967" "cg10311020" >>>>>## ... >>>>>## >>>>>## $k4me1 >>>>>## [1] "cg25243082" "cg06431905" "cg00031967" "cg10311020" "cg05482956" >>>>>## ... >>>>>## >>>>>## $k4me3 >>>>>## [1] "cg16220844" "cg24991732" "cg07238657" "cg06431905" "cg14555649" >>>>>## ... >>>>> >>>>> >>>>>Is that pretty similar to what you were thinking?  The rest will be an issue of hunt-and-peck; you could also use countOverlaps, though it won't make it as easy to e.g. intersect h3k27ac and h3k4me1 to find active enhancers. >>>>> >>>>> >>>>>hope this helps, >>>>> >>>>> >>>>>--t >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>He that would live in peace and at ease, >>>>>Must not speak all he knows, nor judge all he sees. >>>>> >>>>> >>>>> >>>>>Benjamin Franklin, Poor Richard's Almanack >>>>> >>>>> >>>>> >>>>>On Thu, Oct 24, 2013 at 11:21 AM, khadeeja ismail <hajjja@yahoo.com> wrote: >>>>> >>>>>Thanks much for the help. Will have a go and let you know. >>>>>>I have about 80 probes, from many different genes. I'm not sure if they can be summarized, but sure it's worth having a look. >>>>>> >>>>>>BR, >>>>>>Khadeeja >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>On Thursday, October 24, 2013 8:53 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote: >>>>>> >>>>>>On 10/24/2013 09:37 AM, khadeeja ismail wrote: >>>>>>> >>>>>>> >>>>>>> Hi, >>>>>>> I am working  with some 450k array probes which I need to look up in Geneome browser to see in which type of areas these probes are located in.  For example, if the CpG site (+/- 100kb) overlaps with any of the following in the GM12878 track. >>>>>>> >>>>>>> >>>>>>> Layered H3K27Ac >>>>>>> Layered H3K4Me1 >>>>>>> Layered H3K4Me3 >>>>>>> Transcription >>>>>>> DNase Clusters >>>>>>> DNase Clusters V1 >>>>>>> Txn Fac ChIP V3 >>>>>>> Txn Factor ChIP >>>>>> >>>>>>These tracks are available in AnnotationHub >>>>>> >>>>>>   library(AnnotationHub) >>>>>>   hub = AnnotationHub() >>>>>>   m = metadata(hub) >>>>>> >>>>>>and then >>>>>> >>>>>>> head(m$Description[grep("H3k27Ac", m$Description, ignore.case=TRUE)]) >>>>>>[1] "wgEncodeBroadHistoneHsmmtH3k27acStdPk" >>>>>>[2] "wgEncodeBroadHistoneNhaH3k27acStdPk" >>>>>>[3] "wgEncodeBroadHistoneA549H3k27acEtoh02Pk" >>>>>>[4] "wgEncodeBroadHistoneK562H3k27acStdPk" >>>>>>[5] "wgEncodeBroadHistoneGm12878H3k27acStdPk" >>>>>>[6] "wgEncodeSydhHistoneMcf7H3k27acUcdPk" >>>>>> >>>>>>> xx = >>>>>>hub$goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroad HistoneGm12878H3k27acStdPk.broadPeak_0.0.1.RData >>>>>>Retrieving >>>>>>'goldenpath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHis toneGm12878H3k27acStdPk.broadPeak_0.0.1.RData' >>>>>> >>>>>>> head(xx) >>>>>>GRanges with 6 ranges and 5 metadata columns: >>>>>>       seqnames               ranges strand |        name score signalValue >>>>>>          <rle>            <iranges>  <rle> | <character> <integer>   <numeric> >>>>>>   [1]    chr22 [17091048, 17091199]      * |           . 579   11.651761 >>>>>>   [2]    chr22 [17305774, 17306441]      * |           . 531   10.111585 >>>>>>   [3]    chr22 [17517314, 17517945]      * |           . 527    9.991400 >>>>>>   [4]    chr22 [17518132, 17518819]      * |           . 837   19.847850 >>>>>>          pValue    qValue >>>>>>       <numeric> <numeric> >>>>>>   [1]       2.4        -1 >>>>>>   [2]      15.4        -1 >>>>>>   [3]     100.0        -1 >>>>>>   [4]      15.3        -1 >>>>>>  [ reached getOption("max.print") -- omitted 2 rows ] >>>>>> >>>>>>and then ready for findOverlaps or other GRanges operations. There's a vignette >>>>>>in AnnotationHub >>>>>> >>>>>> http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html >>>>>> >>>>>>and it is mentioned in the work flow on annotation and AnnotatingRanges work >>>>>>flows are relevant >>>>>> >>>>>>  http://bioconductor.org/help/workflows/annotation/annotation/ >>>>>> http://bioconductor.org/help/workflows/annotation/AnnotatingRanges/ >>>>>> >>>>>>It would be interesting and useful to have this as a stand-alone work flow, so >>>>>>if you do pursue this root and are interested in writing up a workflow then let >>>>>>me know... >>>>>> >>>>>>Martin >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> I would like to do it as batch and not one by one since the list of probes is long. I have tried querying the GenomeBrowser database and also the rtracklayer package in R but have not been successful. Would be great if anyone can give me any ideas on how it can be done. >>>>>>> >>>>>>> Thanking you, >>>>>>> Khadeeja >>>>>>>     [[alternative HTML version deleted]] >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioconductor mailing list >>>>>>> Bioconductor@r-project.org >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>> >>>>>> >>>>>> >>>>>>-- >>>>>>Computational Biology / Fred Hutchinson Cancer Research Center >>>>>>1100 Fairview Ave. N. >>>>>>PO Box 19024 Seattle, WA 98109 >>>>>> >>>>>>Location: Arnold Building M1 B861 >>>>>>Phone: (206) 667-2793 >>>>>>        [[alternative HTML version deleted]] >>>>>> >>>>>> >>>>>>_______________________________________________ >>>>>>Bioconductor mailing list >>>>>>Bioconductor@r-project.org >>>>>>https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>> >>>>>> >>>>> >>>> >>> >> > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Khadeeja, You can try the 'epivizr' package as well. Following Tim's code you can try require(epivizr) mgr <- startEpiviz() devices <- lapply(seq(along=gm12978), function(i) mgr$addDevice(gm128778[[i]], histoneMarks[i])) mgr$slideshow(probes) This will open a browser window showing these tracks as regions and allow you to browse through these regions in the order in the probes GRanges object. Check the man pages to see how to display these data using other track types. Let me know how it goes. Hector On Thu, Oct 24, 2013 at 5:52 PM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > If I'm guessing right, something like this... ? > > grset <- readRDS("grset.rds") > show(grset) > ## > ## class: GenomicRatioSet > ## dim: 468211 32 > ## exptData(0): > ## assays(2): M CN > ## ... > ## > highVar <- names(which(rowData(grset)$varByGroupQval < 0.05)) > ## > ## about 50 probes, here > ## > ## could also use FDb.InfiniumMethylation.hg19 if not already mapped > > grow <- function(x, y) resize(x, width(x) + (2*y)) > probes <- grow(granges(grset)[highVar], 10e5) ## +/- 100kb > > require(AnnotationHub) > hub = AnnotationHub() > m = metadata(hub) > ## > ## ...time passes... > ## > > histoneMarks <- c('k27ac','k4me1','k4me3') > names(histoneMarks) <- histoneMarks > > pre <- > 'goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHistone' > post <- 'StdPk.broadPeak_0.0.1.RData' > gm12878 <- GRangesList(lapply(histoneMarks, > function(x) > hub[[paste0(pre, 'Gm12878H3', x, post)]])) > > lapply(gm12878, function(x) names(subsetByOverlaps(probes, x))) > ## $k27ac > ## [1] "cg07238657" "cg06431905" "cg14555649" "cg00031967" "cg10311020" > ## ... > ## > ## $k4me1 > ## [1] "cg25243082" "cg06431905" "cg00031967" "cg10311020" "cg05482956" > ## ... > ## > ## $k4me3 > ## [1] "cg16220844" "cg24991732" "cg07238657" "cg06431905" "cg14555649" > ## ... > > Is that pretty similar to what you were thinking? The rest will be an > issue of hunt-and-peck; you could also use countOverlaps, though it won't > make it as easy to e.g. intersect h3k27ac and h3k4me1 to find active > enhancers. > > hope this helps, > > --t > > > > *He that would live in peace and at ease, * > *Must not speak all he knows, nor judge all he sees.* > * > * > Benjamin Franklin, Poor Richard's > Almanack<http: archive.org="" details="" poorrichardsalma00franrich=""> > > > On Thu, Oct 24, 2013 at 11:21 AM, khadeeja ismail <hajjja@yahoo.com> > wrote: > > > Thanks much for the help. Will have a go and let you know. > > I have about 80 probes, from many different genes. I'm not sure if they > > can be summarized, but sure it's worth having a look. > > > > BR, > > Khadeeja > > > > > > > > > > > > > > On Thursday, October 24, 2013 8:53 PM, Martin Morgan <mtmorgan@fhcrc.org> > > > wrote: > > > > On 10/24/2013 09:37 AM, khadeeja ismail wrote: > > > > > > > > > Hi, > > > I am working with some 450k array probes which I need to look up in > > Geneome browser to see in which type of areas these probes are located > in. > > For example, if the CpG site (+/- 100kb) overlaps with any of the > following > > in the GM12878 track. > > > > > > > > > Layered H3K27Ac > > > Layered H3K4Me1 > > > Layered H3K4Me3 > > > Transcription > > > DNase Clusters > > > DNase Clusters V1 > > > Txn Fac ChIP V3 > > > Txn Factor ChIP > > > > These tracks are available in AnnotationHub > > > > library(AnnotationHub) > > hub = AnnotationHub() > > m = metadata(hub) > > > > and then > > > > > head(m$Description[grep("H3k27Ac", m$Description, ignore.case=TRUE)]) > > [1] "wgEncodeBroadHistoneHsmmtH3k27acStdPk" > > [2] "wgEncodeBroadHistoneNhaH3k27acStdPk" > > [3] "wgEncodeBroadHistoneA549H3k27acEtoh02Pk" > > [4] "wgEncodeBroadHistoneK562H3k27acStdPk" > > [5] "wgEncodeBroadHistoneGm12878H3k27acStdPk" > > [6] "wgEncodeSydhHistoneMcf7H3k27acUcdPk" > > > > > xx = > > > > > hub$goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHist oneGm12878H3k27acStdPk.broadPeak_0.0.1.RData > > Retrieving > > > > > 'goldenpath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHistone Gm12878H3k27acStdPk.broadPeak_0.0.1.RData' > > > > > head(xx) > > GRanges with 6 ranges and 5 metadata columns: > > seqnames ranges strand | name score > > signalValue > > <rle> <iranges> <rle> | <character> <integer> > > <numeric> > > [1] chr22 [17091048, 17091199] * | . 579 > > 11.651761 > > [2] chr22 [17305774, 17306441] * | . 531 > > 10.111585 > > [3] chr22 [17517314, 17517945] * | . 527 > > 9.991400 > > [4] chr22 [17518132, 17518819] * | . 837 > > 19.847850 > > pValue qValue > > <numeric> <numeric> > > [1] 2.4 -1 > > [2] 15.4 -1 > > [3] 100.0 -1 > > [4] 15.3 -1 > > [ reached getOption("max.print") -- omitted 2 rows ] > > > > and then ready for findOverlaps or other GRanges operations. There's a > > vignette > > in AnnotationHub > > > > http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html > > > > and it is mentioned in the work flow on annotation and AnnotatingRanges > > work > > flows are relevant > > > > http://bioconductor.org/help/workflows/annotation/annotation/ > > http://bioconductor.org/help/workflows/annotation/AnnotatingRanges/ > > > > It would be interesting and useful to have this as a stand-alone work > > flow, so > > if you do pursue this root and are interested in writing up a workflow > > then let > > me know... > > > > Martin > > > > > > > > > > > > > I would like to do it as batch and not one by one since the list of > > probes is long. I have tried querying the GenomeBrowser database and also > > the rtracklayer package in R but have not been successful. Would be great > > if anyone can give me any ideas on how it can be done. > > > > > > Thanking you, > > > Khadeeja > > > [[alternative HTML version deleted]] > > > > > > > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor@r-project.org > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > -- > > Computational Biology / Fred Hutchinson Cancer Research Center > > 1100 Fairview Ave. N. > > PO Box 19024 Seattle, WA 98109 > > > > Location: Arnold Building M1 B861 > > Phone: (206) 667-2793 > > [[alternative HTML version deleted]] > > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
@khadeeja-ismail-4711
Last seen 8.8 years ago
Hi Michael, We obtained a list of about 80 probes from differential methylation analysis, mapping to genes associated with the phenotype we are studying. One of my colleagues had been looking at the probes one by one on genome browser, to see if the CpG site was anywhere near a regulatory region, but the task turns out too tedious. So we wondered if we can get the locations for the regulatory regions and find overlaps with the CpG sites. There are probably better ways to do it. I had a look at ggbio and GViz on BC website, and I'm definitely going to try them asap. Meanwhile I hope I have made my problem a bit more clearer, and if you have any other suggestions I'd be more than happy to know. Thanks a lot! I might be coming back with more questions soon. Regards, Khadeeja On Sunday, October 27, 2013 12:20 AM, Michael Lawrence <lawrence.michael@gene.com> wrote: Please give a little more detail about your problem. If the number of probes is large enough, you might want to summarize the overlaps and plot the summaries, rather than looking at every region individually. If you did want to go that route, it would be best to use ggbio and/or GViz to automate the plotting. The UCSC browser is simply not designed for batch plotting. Michael On Thu, Oct 24, 2013 at 9:37 AM, khadeeja ismail <hajjja@yahoo.com> wrote: > > > Hi, > I am working  with some 450k array probes which I need to look up in > Geneome browser to see in which type of areas these probes are located in. > For example, if the CpG site (+/- 100kb) overlaps with any of the following > in the GM12878 track. > > > Layered H3K27Ac > Layered H3K4Me1 > Layered H3K4Me3 > Transcription > DNase Clusters > DNase Clusters V1 > Txn Fac ChIP V3 > Txn Factor ChIP > > > I would like to do it as batch and not one by one since the list of probes > is long. I have tried querying the GenomeBrowser database and also the > rtracklayer package in R but have not been successful. Would be great if > anyone can give me any ideas on how it can be done. > > Thanking you, > Khadeeja >        [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >     [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
If it's in UCSC, you can probably find the right track in AnnotationHub. Michael On Sat, Oct 26, 2013 at 5:02 PM, khadeeja ismail <hajjja@yahoo.com> wrote: > Hi Michael, > > We obtained a list of about 80 probes from differential methylation > analysis, mapping to genes associated with the phenotype we are studying. > One of my colleagues had been looking at the probes one by one on genome > browser, to see if the CpG site was anywhere near a regulatory region, but > the task turns out too tedious. So we wondered if we can get the locations > for the regulatory regions and find overlaps with the CpG sites. There are > probably better ways to do it. I had a look at ggbio and GViz on BC > website, and I'm definitely going to try them asap. Meanwhile I hope I have > made my problem a bit more clearer, and if you have any other suggestions > I'd be more than happy to know. Thanks a lot! I might be coming back with > more questions soon. > > Regards, > Khadeeja > > > On Sunday, October 27, 2013 12:20 AM, Michael Lawrence < > lawrence.michael@gene.com> wrote: > Please give a little more detail about your problem. If the number of > probes is large enough, you might want to summarize the overlaps and plot > the summaries, rather than looking at every region individually. If you did > want to go that route, it would be best to use ggbio and/or GViz to > automate the plotting. The UCSC browser is simply not designed for batch > plotting. > > Michael > > > On Thu, Oct 24, 2013 at 9:37 AM, khadeeja ismail <hajjja@yahoo.com> wrote: > > > > > > > Hi, > > I am working with some 450k array probes which I need to look up in > > Geneome browser to see in which type of areas these probes are located > in. > > For example, if the CpG site (+/- 100kb) overlaps with any of the > following > > in the GM12878 track. > > > > > > Layered H3K27Ac > > Layered H3K4Me1 > > Layered H3K4Me3 > > Transcription > > DNase Clusters > > DNase Clusters V1 > > Txn Fac ChIP V3 > > Txn Factor ChIP > > > > > > I would like to do it as batch and not one by one since the list of > probes > > is long. I have tried querying the GenomeBrowser database and also the > > rtracklayer package in R but have not been successful. Would be great if > > anyone can give me any ideas on how it can be done. > > > > Thanking you, > > Khadeeja > > > [[alternative HTML version deleted]] > > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Thanks for the information. It had been very helpful. Best regards, Khadeeja On Sunday, October 27, 2013 7:05 AM, Michael Lawrence <lawrence.michael@gene.com> wrote: If it's in UCSC, you can probably find the right track in AnnotationHub. Michael On Sat, Oct 26, 2013 at 5:02 PM, khadeeja ismail <hajjja@yahoo.com> wrote: Hi Michael, > > >We obtained a list of about 80 probes from differential methylation analysis, mapping to genes associated with the phenotype we are studying. One of my colleagues had been looking at the probes one by one on genome browser, to see if the CpG site was anywhere near a regulatory region, but the task turns out too tedious. So we wondered if we can get the locations for the regulatory regions and find overlaps with the CpG sites. There are probably better ways to do it. I had a look at ggbio and GViz on BC website, and I'm definitely going to try them asap. Meanwhile I hope I have made my problem a bit more clearer, and if you have any other suggestions I'd be more than happy to know. Thanks a lot! I might be coming back with more questions soon. > > > >Regards, >Khadeeja > > > > >On Sunday, October 27, 2013 12:20 AM, Michael Lawrence <lawrence.michael@gene.com> wrote: > >Please give a little more detail about your problem. If the number of >probes is large enough, you might want to summarize the overlaps and plot >the summaries, rather than looking at every region individually. If you did >want to go that route, it would be best to use ggbio and/or GViz to >automate the plotting. The UCSC browser is simply not designed for batch >plotting. > >Michael > > > >On Thu, Oct 24, 2013 at 9:37 AM, khadeeja ismail <hajjja@yahoo.com> wrote: > >> >> >> Hi, >> I am working  with some 450k array probes which I need to look up in >> Geneome browser to see in which type of areas these probes are located in. >> For example, if the CpG site (+/- 100kb) overlaps with any of the following >> in the GM12878 track. >> >> >> Layered H3K27Ac >> Layered H3K4Me1 >> Layered H3K4Me3 >> Transcription >> DNase Clusters >> DNase Clusters V1 >> Txn Fac ChIP V3 >> Txn Factor ChIP >> >> >> I would like to do it as batch and not one by one since the list of probes >> is long. I have tried querying the GenomeBrowser database and also the >> rtracklayer package in R but have not been successful. Would be great if >> anyone can give me any ideas on how it can be done. >> >> Thanking you, >> Khadeeja >>        [[alternative HTML version deleted]] >> >> > >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >    [[alternative HTML version deleted]] > >_______________________________________________ >Bioconductor mailing list >Bioconductor@r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 779 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6