assignChromosomeRegion in 2.6.0 ChIPpeakAnno

0

Entering edit mode

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 18 months ago

United States

Stefan, Could you please send me the sessionInfo? I just want to make sure you installed version 2.6.0. Thanks! I noticed that when I try to install ChIPpeakAnno with the following code, I get 2.4 version instead. source("http://bioconductor.org/biocLite.R") biocLite("ChIPpeakAnno") Best regards, Julie On 3/14/13 8:02 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > Dear Julie, > > I apologize for emailing you about this directly, but I'm in a bit of > time-crunch, otherwise I'd post this to the mailing list. I'm using > ChIPpeakAnno (see installed packages below), and while the other functions > I've tested (BED2Ranges, annotatePeakInBatch, getAnnotation) in your package > (2.6.0) work, assignChromosomeRegion gives me a "could not find error". > >> assignChromosomeRegion(peaks.RD=rda.dRF8,exon=Exon,TSS=TSS,utr5=utr 5,utr3=utr >> 3,proximal.promoter.cutoff=1000,immediate.downstream.cutoff=1000) > Error: could not find function "assignChromosomeRegion" > > Below I've listed my libraries, so that you can see what I have loaded in R > (2.15). I'm running R in Rstudio (0.97.332). Although the function isn't > found, when I enter by hand: "assignChromosomeRegion(" and hit tab, it does > give me the parameter options for the function assignChromosomeRegion. But > when I execute, it says it couldn't find the function. Can you help? > Thank you and Best wishes, > > - Stefan. > > > Packages in library > ?/Library/Frameworks/R.framework/Versions/2.15/Resources/library?: > > AnnotationDbi Annotation Database Interface > base The R Base Package > Biobase Biobase: Base functions for Bioconductor > BiocGenerics Generic functions for Bioconductor > BiocInstaller Install/Update Bioconductor and CRAN Packages > biomaRt Interface to BioMart databases (e.g. Ensembl, > COSMIC > ,Wormbase and Gramene) > Biostrings String objects representing biological > sequences, and > matching algorithms > bitops Functions for Bitwise operations > boot Bootstrap Functions (originally by Angelo Canty > for S) > BSgenome Infrastructure for Biostrings-based genome data > packages > BSgenome.Ecoli.NCBI.20080805 > Escherichia coli full genomes > caTools Tools: moving window statistics, GIF, Base64, > ROC AUC, etc. > ChIPpeakAnno Batch annotation of the peaks identified from > either > ChIP-seq, ChIP-chip experiments or any > experiments resulted > in large number of chromosome ranges. > class Functions for Classification > cluster Cluster Analysis Extended Rousseeuw et al. > codetools Code Analysis Tools for R > colorspace Color Space Manipulation > compiler The R Compiler Package > datasets The R Datasets Package > DBI R Database Interface > dichromat Color Schemes for Dichromats > digest Create cryptographic hash digests of R objects > foreign Read Data Stored by Minitab, S, SAS, SPSS, > Stata, Systat, > dBase, ... > gdata Various R programming tools for data > manipulation > GenomicRanges Representation and manipulation of genomic > intervals > ggplot2 An implementation of the Grammar of Graphics > GO.db A set of annotation maps describing the entire > Gene > Ontology > gplots Various R programming tools for plotting data > graphics The R Graphics Package > grDevices The R Graphics Devices and Support for Colours > and Fonts > grid The Grid Graphics Package > gtable Arrange grobs in tables. > gtools Various R programming tools > IRanges Infrastructure for manipulating intervals on > sequences > KernSmooth Functions for kernel smoothing for Wand & Jones > (1995) > labeling Axis Labeling > lattice Lattice Graphics > limma Linear Models for Microarray Data > MASS Support Functions and Datasets for Venables and > Ripley's > MASS > Matrix Sparse and Dense Matrix Classes and Methods > methods Formal Methods and Classes > mgcv Mixed GAM Computation Vehicle with GCV/AIC/REML > smoothness > estimation > multtest Resampling-based multiple hypothesis testing > munsell Munsell colour system > nlme Linear and Nonlinear Mixed Effects Models > nnet Feed-forward Neural Networks and Multinomial > Log-Linear > Models > org.Hs.eg.db Genome wide annotation for Human > parallel Support for Parallel computation in R > plyr Tools for splitting, applying and combining data > proto Prototype object-based programming > RColorBrewer ColorBrewer palettes > RCurl General network (HTTP/FTP/...) client interface > for R > reshape2 Flexibly reshape data: a reboot of the reshape > package. > rpart Recursive Partitioning > RSQLite SQLite interface for R > scales Scale functions for graphics. > spatial Functions for Kriging and Point Pattern Analysis > splines Regression Spline Functions and Classes > stats The R Stats Package > stats4 Statistical Functions using S4 Classes > stringr Make it easier to work with strings. > survival Survival Analysis > tcltk Tcl/Tk Interface > tools Tools for Package Development > utils The R Utils Package > VennDiagram Generate high-resolution Venn and Euler plots > XML Tools for parsing and generating XML within R > and S-Plus. > > Packages in library ?/Applications/RStudio.app/Contents/Resources/R/library?: > > manipulate Interactive Plots for RStudio > rstudio Tools and Utilities for RStudio

Microarray Annotation GO Network Regression Survival Infrastructure Escherichia coli ROC • 2.9k views

ADD COMMENT • link 12.1 years ago Julie Zhu ★ 4.3k

0

Entering edit mode

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 18 months ago

United States

Stefan, Thanks for reporting this! Actually the function assignChromosomeRegion is not exported. Could you please try the following to see if you can use it? Thanks! ChIPpeakAnno:::assignChromosomeRegion Best regards, Julie On 3/14/13 7:36 PM, "Lihua Julie Zhu" <julie.zhu at="" umassmed.edu=""> wrote: > Stefan, > > Could you please send me the sessionInfo? I just want to make sure you > installed version 2.6.0. Thanks! > > I noticed that when I try to install ChIPpeakAnno with the following code, I > get 2.4 version instead. > > source("http://bioconductor.org/biocLite.R") > biocLite("ChIPpeakAnno") > > Best regards, > > Julie > > > On 3/14/13 8:02 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > >> Dear Julie, >> >> I apologize for emailing you about this directly, but I'm in a bit of >> time-crunch, otherwise I'd post this to the mailing list. I'm using >> ChIPpeakAnno (see installed packages below), and while the other functions >> I've tested (BED2Ranges, annotatePeakInBatch, getAnnotation) in your package >> (2.6.0) work, assignChromosomeRegion gives me a "could not find error". >> >>> assignChromosomeRegion(peaks.RD=rda.dRF8,exon=Exon,TSS=TSS,utr5=utr5,u tr3=ut>>> r >>> 3,proximal.promoter.cutoff=1000,immediate.downstream.cutoff=1000) >> Error: could not find function "assignChromosomeRegion" >> >> Below I've listed my libraries, so that you can see what I have loaded in R >> (2.15). I'm running R in Rstudio (0.97.332). Although the function isn't >> found, when I enter by hand: "assignChromosomeRegion(" and hit tab, it does >> give me the parameter options for the function assignChromosomeRegion. But >> when I execute, it says it couldn't find the function. Can you help? >> Thank you and Best wishes, >> >> - Stefan. >> >> >> Packages in library >> ?/Library/Frameworks/R.framework/Versions/2.15/Resources/library?: >> >> AnnotationDbi Annotation Database Interface >> base The R Base Package >> Biobase Biobase: Base functions for Bioconductor >> BiocGenerics Generic functions for Bioconductor >> BiocInstaller Install/Update Bioconductor and CRAN Packages >> biomaRt Interface to BioMart databases (e.g. Ensembl, >> COSMIC >> ,Wormbase and Gramene) >> Biostrings String objects representing biological >> sequences, and >> matching algorithms >> bitops Functions for Bitwise operations >> boot Bootstrap Functions (originally by Angelo Canty >> for S) >> BSgenome Infrastructure for Biostrings-based genome data >> packages >> BSgenome.Ecoli.NCBI.20080805 >> Escherichia coli full genomes >> caTools Tools: moving window statistics, GIF, Base64, >> ROC AUC, etc. >> ChIPpeakAnno Batch annotation of the peaks identified from >> either >> ChIP-seq, ChIP-chip experiments or any >> experiments resulted >> in large number of chromosome ranges. >> class Functions for Classification >> cluster Cluster Analysis Extended Rousseeuw et al. >> codetools Code Analysis Tools for R >> colorspace Color Space Manipulation >> compiler The R Compiler Package >> datasets The R Datasets Package >> DBI R Database Interface >> dichromat Color Schemes for Dichromats >> digest Create cryptographic hash digests of R objects >> foreign Read Data Stored by Minitab, S, SAS, SPSS, >> Stata, Systat, >> dBase, ... >> gdata Various R programming tools for data >> manipulation >> GenomicRanges Representation and manipulation of genomic >> intervals >> ggplot2 An implementation of the Grammar of Graphics >> GO.db A set of annotation maps describing the entire >> Gene >> Ontology >> gplots Various R programming tools for plotting data >> graphics The R Graphics Package >> grDevices The R Graphics Devices and Support for Colours >> and Fonts >> grid The Grid Graphics Package >> gtable Arrange grobs in tables. >> gtools Various R programming tools >> IRanges Infrastructure for manipulating intervals on >> sequences >> KernSmooth Functions for kernel smoothing for Wand & Jones >> (1995) >> labeling Axis Labeling >> lattice Lattice Graphics >> limma Linear Models for Microarray Data >> MASS Support Functions and Datasets for Venables and >> Ripley's >> MASS >> Matrix Sparse and Dense Matrix Classes and Methods >> methods Formal Methods and Classes >> mgcv Mixed GAM Computation Vehicle with GCV/AIC/REML >> smoothness >> estimation >> multtest Resampling-based multiple hypothesis testing >> munsell Munsell colour system >> nlme Linear and Nonlinear Mixed Effects Models >> nnet Feed-forward Neural Networks and Multinomial >> Log-Linear >> Models >> org.Hs.eg.db Genome wide annotation for Human >> parallel Support for Parallel computation in R >> plyr Tools for splitting, applying and combining >> data >> proto Prototype object-based programming >> RColorBrewer ColorBrewer palettes >> RCurl General network (HTTP/FTP/...) client interface >> for R >> reshape2 Flexibly reshape data: a reboot of the reshape >> package. >> rpart Recursive Partitioning >> RSQLite SQLite interface for R >> scales Scale functions for graphics. >> spatial Functions for Kriging and Point Pattern >> Analysis >> splines Regression Spline Functions and Classes >> stats The R Stats Package >> stats4 Statistical Functions using S4 Classes >> stringr Make it easier to work with strings. >> survival Survival Analysis >> tcltk Tcl/Tk Interface >> tools Tools for Package Development >> utils The R Utils Package >> VennDiagram Generate high-resolution Venn and Euler plots >> XML Tools for parsing and generating XML within R >> and S-Plus. >> >> Packages in library ?/Applications/RStudio.app/Contents/Resources/R/library?: >> >> manipulate Interactive Plots for RStudio >> rstudio Tools and Utilities for RStudio > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 12.1 years ago Julie Zhu ★ 4.3k

0

Entering edit mode

Hi Julie, yes, that worked! Thank you for the quick help, I very much appreciate it! Thank you and Best wishes, - Stefan. ----- Original Message ----- From: "Lihua Zhu (Julie)" <julie.zhu@umassmed.edu> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu="">, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> Sent: Thursday, March 14, 2013 8:50:51 PM Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno Stefan, Thanks for reporting this! Actually the function assignChromosomeRegion is not exported. Could you please try the following to see if you can use it? Thanks! ChIPpeakAnno:::assignChromosomeRegion Best regards, Julie On 3/14/13 7:36 PM, "Lihua Julie Zhu" <julie.zhu at="" umassmed.edu=""> wrote: > Stefan, > > Could you please send me the sessionInfo? I just want to make sure you > installed version 2.6.0. Thanks! > > I noticed that when I try to install ChIPpeakAnno with the following code, I > get 2.4 version instead. > > source("http://bioconductor.org/biocLite.R") > biocLite("ChIPpeakAnno") > > Best regards, > > Julie

ADD REPLY • link 12.1 years ago Stefan Pinter ▴ 40

0

Entering edit mode

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 18 months ago

United States

Stefan, You can download 2.6.1 to directly use assignChromosomeRegion without the package prefix (I updated last night and it might take sometime to propagate to the installation site). To find out what parameters supported by the function, in R, please type help( ChIPpeakAnno:::assignChromosomeRegion). You will see that indeed PeakLocForDistance is not supported. If needed, I can add such parameter. Also regarding your previous suggestion of adding enrichment status of feature length, do you mean enrichment of peaks in certain category of chromosome region? For example, a significant enrichment score with 90% peaks in promoter region would be interesting. Could you please keep the thread in the Bioc for others to contribute/benefit? Thanks! Best regards, Julie On 3/14/13 10:10 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > PPS. I think PeakLocForDistance is not working in that function: > >> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, >> utr5, utr3, proximal.promoter.cutoff=100000, >> immediate.downstream.cutoff=100000, PeakLocForDistance="middle") > Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : > unused argument(s) (PeakLocForDistance = "middle") >> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, >> utr5, utr3, proximal.promoter.cutoff=100000, >> immediate.downstream.cutoff=100000, PeakLocForDistance= middle) > Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : > unused argument(s) (PeakLocForDistance = middle) >> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, >> utr5, utr3, proximal.promoter.cutoff=100000, >> immediate.downstream.cutoff=100000, PeakLocForDistance = c("middle")) > Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : > unused argument(s) (PeakLocForDistance = c("middle")) > > ----- Original Message ----- > From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> > To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> > Sent: Thursday, March 14, 2013 9:39:10 PM > Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno > > PS. if I might add a simple feature addition to that function - it would be a > TRUE/FALSE trigger for whether to also report enrichment/depletion stats based > on feature lengths, i.e. 50% peaks labeled as intergenic (or enhancers) is > less interesting than 50% of peaks reported as exonic (much greater enrichment > as total exonic feature length is smaller). Thanks, Best...S > > ----- Original Message ----- > From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> > To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> > Cc: "bioconductor" <bioconductor at="" r-project.org=""> > Sent: Thursday, March 14, 2013 9:32:33 PM > Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno > > Hi Julie, > > yes, that worked! Thank you for the quick help, I very much appreciate it! > Thank you and Best wishes, > - Stefan. > > ----- Original Message ----- > From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> > To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu="">, "Stefan Pinter" > <pinter at="" molbio.mgh.harvard.edu=""> > Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> > Sent: Thursday, March 14, 2013 8:50:51 PM > Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno > > Stefan, > > Thanks for reporting this! Actually the function assignChromosomeRegion is > not exported. > > Could you please try the following to see if you can use it? Thanks! > > ChIPpeakAnno:::assignChromosomeRegion > > Best regards, > > Julie > > > On 3/14/13 7:36 PM, "Lihua Julie Zhu" <julie.zhu at="" umassmed.edu=""> wrote: > >> Stefan, >> >> Could you please send me the sessionInfo? I just want to make sure you >> installed version 2.6.0. Thanks! >> >> I noticed that when I try to install ChIPpeakAnno with the following code, I >> get 2.4 version instead. >> >> source("http://bioconductor.org/biocLite.R") >> biocLite("ChIPpeakAnno") >> >> Best regards, >> >> Julie

ADD COMMENT • link 12.1 years ago Julie Zhu ★ 4.3k

0

Entering edit mode

Sure, thank you Julie. Yes, that is what I meant. Since all the necessary information (peak locations, feature locations, total count, total length) are already available in RangedData & Annotation, it would be very convenient to calculate enrichment/depletion and possibly even significance scores (by permutation) on the fly. The significance score may be overkill, but if the function at least reported enrichment/depletion scores, the user could always supply a number of shuffled ranges to build a random model of enrichment scores and calculate significance after. In addition, would it be possible to add another definition for PeakLocForDistance in annotatePeakInBatch? PeakLocForDistance = "start means using start of the peak to calculate the distance to feature" It would be helpful to have "closest", meaning distance to feature measured from peak start or peak end, whichever is closer. That would help with broad peaks, which using "middle" for isn't very helpful. Thank you and Best wishes, - Stefan. ----- Original Message ----- From: "Lihua Zhu (Julie)" <julie.zhu@umassmed.edu> To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> Sent: Friday, March 15, 2013 8:06:03 AM Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno Stefan, You can download 2.6.1 to directly use assignChromosomeRegion without the package prefix (I updated last night and it might take sometime to propagate to the installation site). To find out what parameters supported by the function, in R, please type help( ChIPpeakAnno:::assignChromosomeRegion). You will see that indeed PeakLocForDistance is not supported. If needed, I can add such parameter. Also regarding your previous suggestion of adding enrichment status of feature length, do you mean enrichment of peaks in certain category of chromosome region? For example, a significant enrichment score with 90% peaks in promoter region would be interesting. Could you please keep the thread in the Bioc for others to contribute/benefit? Thanks! Best regards, Julie On 3/14/13 10:10 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > PPS. I think PeakLocForDistance is not working in that function: > >> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, >> utr5, utr3, proximal.promoter.cutoff=100000, >> immediate.downstream.cutoff=100000, PeakLocForDistance="middle") > Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : > unused argument(s) (PeakLocForDistance = "middle") >> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, >> utr5, utr3, proximal.promoter.cutoff=100000, >> immediate.downstream.cutoff=100000, PeakLocForDistance= middle) > Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : > unused argument(s) (PeakLocForDistance = middle) >> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, >> utr5, utr3, proximal.promoter.cutoff=100000, >> immediate.downstream.cutoff=100000, PeakLocForDistance = c("middle")) > Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : > unused argument(s) (PeakLocForDistance = c("middle")) > > ----- Original Message ----- > From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> > To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> > Sent: Thursday, March 14, 2013 9:39:10 PM > Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno > > PS. if I might add a simple feature addition to that function - it would be a > TRUE/FALSE trigger for whether to also report enrichment/depletion stats based > on feature lengths, i.e. 50% peaks labeled as intergenic (or enhancers) is > less interesting than 50% of peaks reported as exonic (much greater enrichment > as total exonic feature length is smaller). Thanks, Best...S > > ----- Original Message ----- > From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> > To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> > Cc: "bioconductor" <bioconductor at="" r-project.org=""> > Sent: Thursday, March 14, 2013 9:32:33 PM > Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno > > Hi Julie, > > yes, that worked! Thank you for the quick help, I very much appreciate it! > Thank you and Best wishes, > - Stefan. > > ----- Original Message ----- > From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> > To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu="">, "Stefan Pinter" > <pinter at="" molbio.mgh.harvard.edu=""> > Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> > Sent: Thursday, March 14, 2013 8:50:51 PM > Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno > > Stefan, > > Thanks for reporting this! Actually the function assignChromosomeRegion is > not exported. > > Could you please try the following to see if you can use it? Thanks! > > ChIPpeakAnno:::assignChromosomeRegion > > Best regards, > > Julie > > > On 3/14/13 7:36 PM, "Lihua Julie Zhu" <julie.zhu at="" umassmed.edu=""> wrote: > >> Stefan, >> >> Could you please send me the sessionInfo? I just want to make sure you >> installed version 2.6.0. Thanks! >> >> I noticed that when I try to install ChIPpeakAnno with the following code, I >> get 2.4 version instead. >> >> source("http://bioconductor.org/biocLite.R") >> biocLite("ChIPpeakAnno") >> >> Best regards, >> >> Julie

ADD REPLY • link 12.1 years ago Stefan Pinter ▴ 40

0

Entering edit mode

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 18 months ago

United States

Stefan, Regarding the enrichment/depletion score, Does the following toy example illustrate how to calculate such score? If not, could you please give a toy example? Thanks! For example, if the total number of peaks = 100, number of peaks assigned to promoter = 90, number of peaks assigned to enhancer = 10, then the enrichment score = 90% and 10% for promoter region and enhancer region respectively. We will add closest to annotatePeakInBatch in the dev version. Thanks for the great suggestion! Please cc bioconductor at r-project.org. Thanks! Best regards, Julie On 3/15/13 12:26 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > Sure, thank you Julie. > > Yes, that is what I meant. Since all the necessary information (peak > locations, feature locations, total count, total length) are already available > in RangedData & Annotation, it would be very convenient to calculate > enrichment/depletion and possibly even significance scores (by permutation) on > the fly. The significance score may be overkill, but if the function at least > reported enrichment/depletion scores, the user could always supply a number of > shuffled ranges to build a random model of enrichment scores and calculate > significance after. > > In addition, would it be possible to add another definition for > PeakLocForDistance in annotatePeakInBatch? > PeakLocForDistance = "start means using start of the peak to calculate the > distance to feature" > > It would be helpful to have "closest", meaning distance to feature measured > from peak start or peak end, whichever is closer. That would help with broad > peaks, which using "middle" for isn't very helpful. > > Thank you and Best wishes, > - Stefan. > > ----- Original Message ----- > From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> > To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> > Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> > Sent: Friday, March 15, 2013 8:06:03 AM > Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno > > Stefan, > > You can download 2.6.1 to directly use assignChromosomeRegion without the > package prefix (I updated last night and it might take sometime to propagate > to the installation site). To find out what parameters supported by the > function, in R, please type help( ChIPpeakAnno:::assignChromosomeRegion). > You will see that indeed PeakLocForDistance is not supported. If needed, I > can add such parameter. > > Also regarding your previous suggestion of adding enrichment status of > feature length, do you mean enrichment of peaks in certain category of > chromosome region? For example, a significant enrichment score with 90% > peaks in promoter region would be interesting. > > Could you please keep the thread in the Bioc for others to > contribute/benefit? Thanks! > > Best regards, > > Julie > > > On 3/14/13 10:10 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > >> PPS. I think PeakLocForDistance is not working in that function: >> >>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>> Exon, >>> utr5, utr3, proximal.promoter.cutoff=100000, >>> immediate.downstream.cutoff=100000, PeakLocForDistance="middle") >> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : >> unused argument(s) (PeakLocForDistance = "middle") >>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>> Exon, >>> utr5, utr3, proximal.promoter.cutoff=100000, >>> immediate.downstream.cutoff=100000, PeakLocForDistance= middle) >> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : >> unused argument(s) (PeakLocForDistance = middle) >>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>> Exon, >>> utr5, utr3, proximal.promoter.cutoff=100000, >>> immediate.downstream.cutoff=100000, PeakLocForDistance = c("middle")) >> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : >> unused argument(s) (PeakLocForDistance = c("middle")) >> >> ----- Original Message ----- >> From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >> Sent: Thursday, March 14, 2013 9:39:10 PM >> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >> >> PS. if I might add a simple feature addition to that function - it would be a >> TRUE/FALSE trigger for whether to also report enrichment/depletion stats >> based >> on feature lengths, i.e. 50% peaks labeled as intergenic (or enhancers) is >> less interesting than 50% of peaks reported as exonic (much greater >> enrichment >> as total exonic feature length is smaller). Thanks, Best...S >> >> ----- Original Message ----- >> From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >> Cc: "bioconductor" <bioconductor at="" r-project.org=""> >> Sent: Thursday, March 14, 2013 9:32:33 PM >> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >> >> Hi Julie, >> >> yes, that worked! Thank you for the quick help, I very much appreciate it! >> Thank you and Best wishes, >> - Stefan. >> >> ----- Original Message ----- >> From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu="">, "Stefan Pinter" >> <pinter at="" molbio.mgh.harvard.edu=""> >> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> >> Sent: Thursday, March 14, 2013 8:50:51 PM >> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >> >> Stefan, >> >> Thanks for reporting this! Actually the function assignChromosomeRegion is >> not exported. >> >> Could you please try the following to see if you can use it? Thanks! >> >> ChIPpeakAnno:::assignChromosomeRegion >> >> Best regards, >> >> Julie >> >> >> On 3/14/13 7:36 PM, "Lihua Julie Zhu" <julie.zhu at="" umassmed.edu=""> wrote: >> >>> Stefan, >>> >>> Could you please send me the sessionInfo? I just want to make sure you >>> installed version 2.6.0. Thanks! >>> >>> I noticed that when I try to install ChIPpeakAnno with the following code, I >>> get 2.4 version instead. >>> >>> source("http://bioconductor.org/biocLite.R") >>> biocLite("ChIPpeakAnno") >>> >>> Best regards, >>> >>> Julie

ADD COMMENT • link 12.1 years ago Julie Zhu ★ 4.3k

0

Entering edit mode

Hi Julie, well, I was more thinking of something along the lines of either: a.) Over/Under-representation: Percentage of exons in genome = 2% Percentage of peaks overlapping exons = 10% Exons are 5-fold over-represented But I realize now, that will only work with a full annotation, not a custom annotation. b.) something like the jaccard index used in bedtools: total length of intersection / total length of union http://en.wikipedia.org/wiki/Jaccard_index Just suggestions, there are probably many more possible ways to calculate a score. Thank you, - Stefan. ----- Original Message ----- From: "Lihua Zhu (Julie)" <julie.zhu@umassmed.edu> To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> Sent: Friday, March 15, 2013 1:28:09 PM Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno Stefan, Regarding the enrichment/depletion score, Does the following toy example illustrate how to calculate such score? If not, could you please give a toy example? Thanks! For example, if the total number of peaks = 100, number of peaks assigned to promoter = 90, number of peaks assigned to enhancer = 10, then the enrichment score = 90% and 10% for promoter region and enhancer region respectively. We will add closest to annotatePeakInBatch in the dev version. Thanks for the great suggestion! Please cc bioconductor at r-project.org. Thanks! Best regards, Julie On 3/15/13 12:26 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > Sure, thank you Julie. > > Yes, that is what I meant. Since all the necessary information (peak > locations, feature locations, total count, total length) are already available > in RangedData & Annotation, it would be very convenient to calculate > enrichment/depletion and possibly even significance scores (by permutation) on > the fly. The significance score may be overkill, but if the function at least > reported enrichment/depletion scores, the user could always supply a number of > shuffled ranges to build a random model of enrichment scores and calculate > significance after. > > In addition, would it be possible to add another definition for > PeakLocForDistance in annotatePeakInBatch? > PeakLocForDistance = "start means using start of the peak to calculate the > distance to feature" > > It would be helpful to have "closest", meaning distance to feature measured > from peak start or peak end, whichever is closer. That would help with broad > peaks, which using "middle" for isn't very helpful. > > Thank you and Best wishes, > - Stefan. > > ----- Original Message ----- > From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> > To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> > Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> > Sent: Friday, March 15, 2013 8:06:03 AM > Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno > > Stefan, > > You can download 2.6.1 to directly use assignChromosomeRegion without the > package prefix (I updated last night and it might take sometime to propagate > to the installation site). To find out what parameters supported by the > function, in R, please type help( ChIPpeakAnno:::assignChromosomeRegion). > You will see that indeed PeakLocForDistance is not supported. If needed, I > can add such parameter. > > Also regarding your previous suggestion of adding enrichment status of > feature length, do you mean enrichment of peaks in certain category of > chromosome region? For example, a significant enrichment score with 90% > peaks in promoter region would be interesting. > > Could you please keep the thread in the Bioc for others to > contribute/benefit? Thanks! > > Best regards, > > Julie > > > On 3/14/13 10:10 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > >> PPS. I think PeakLocForDistance is not working in that function: >> >>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>> Exon, >>> utr5, utr3, proximal.promoter.cutoff=100000, >>> immediate.downstream.cutoff=100000, PeakLocForDistance="middle") >> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : >> unused argument(s) (PeakLocForDistance = "middle") >>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>> Exon, >>> utr5, utr3, proximal.promoter.cutoff=100000, >>> immediate.downstream.cutoff=100000, PeakLocForDistance= middle) >> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : >> unused argument(s) (PeakLocForDistance = middle) >>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>> Exon, >>> utr5, utr3, proximal.promoter.cutoff=100000, >>> immediate.downstream.cutoff=100000, PeakLocForDistance = c("middle")) >> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : >> unused argument(s) (PeakLocForDistance = c("middle")) >> >> ----- Original Message ----- >> From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >> Sent: Thursday, March 14, 2013 9:39:10 PM >> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >> >> PS. if I might add a simple feature addition to that function - it would be a >> TRUE/FALSE trigger for whether to also report enrichment/depletion stats >> based >> on feature lengths, i.e. 50% peaks labeled as intergenic (or enhancers) is >> less interesting than 50% of peaks reported as exonic (much greater >> enrichment >> as total exonic feature length is smaller). Thanks, Best...S >> >> ----- Original Message ----- >> From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >> Cc: "bioconductor" <bioconductor at="" r-project.org=""> >> Sent: Thursday, March 14, 2013 9:32:33 PM >> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >> >> Hi Julie, >> >> yes, that worked! Thank you for the quick help, I very much appreciate it! >> Thank you and Best wishes, >> - Stefan. >> >> ----- Original Message ----- >> From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu="">, "Stefan Pinter" >> <pinter at="" molbio.mgh.harvard.edu=""> >> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> >> Sent: Thursday, March 14, 2013 8:50:51 PM >> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >> >> Stefan, >> >> Thanks for reporting this! Actually the function assignChromosomeRegion is >> not exported. >> >> Could you please try the following to see if you can use it? Thanks! >> >> ChIPpeakAnno:::assignChromosomeRegion >> >> Best regards, >> >> Julie >> >> >> On 3/14/13 7:36 PM, "Lihua Julie Zhu" <julie.zhu at="" umassmed.edu=""> wrote: >> >>> Stefan, >>> >>> Could you please send me the sessionInfo? I just want to make sure you >>> installed version 2.6.0. Thanks! >>> >>> I noticed that when I try to install ChIPpeakAnno with the following code, I >>> get 2.4 version instead. >>> >>> source("http://bioconductor.org/biocLite.R") >>> biocLite("ChIPpeakAnno") >>> >>> Best regards, >>> >>> Julie

ADD REPLY • link 12.1 years ago Stefan Pinter ▴ 40

0

Entering edit mode

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 18 months ago

United States

Stefan, Could you please send me a simple dataset to see what went wrong? Thanks for letting know! Thanks for hashing out the chromosome region enrichment score calculation! Looks like you are looking for something very similar to GO enrichment analysis (getEnrichedGO function in ChIPpeakAnno). For option a, as you mentioned, we will need the genome level estimation of the distribution for each category, which could be tricky. Once we have these estimation, then Hypergeometric test can be applied to find whether a category is enriched/depleted. Jaccard index (option b) is an interesting alternative. We could implement Jaccard index at the peak level first and add options for computing Jaccard index at the nucleotide level later. Best regards, Julie On 3/16/13 7:41 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > Julie, I compared my results from ChipPeakAnno with a simple overlap in > bedtools: the % of peaks inside genes (exon + intron + 3utr + 5utr) reported > by ChIPpeakAnno is over an order of magnitude lower than what I learned from > bedtools. Browsing the data indicates that bedtools is right. Something is > off, do you have a test.data set you can check against? > Thanks, > -S. > > ----- Original Message ----- > From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> > To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> > Cc: "bioconductor" <bioconductor at="" r-project.org=""> > Sent: Saturday, March 16, 2013 1:30:55 PM > Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno > > Hi Julie, > > well, I was more thinking of something along the lines of either: > > a.) Over/Under-representation: > Percentage of exons in genome = 2% > Percentage of peaks overlapping exons = 10% > Exons are 5-fold over-represented > > But I realize now, that will only work with a full annotation, not a custom > annotation. > > b.) something like the jaccard index used in bedtools: > > total length of intersection / total length of union > > http://en.wikipedia.org/wiki/Jaccard_index > > Just suggestions, there are probably many more possible ways to calculate a > score. > Thank you, > - Stefan. > > ----- Original Message ----- > From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> > To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> > Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> > Sent: Friday, March 15, 2013 1:28:09 PM > Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno > > Stefan, > > Regarding the enrichment/depletion score, Does the following toy example > illustrate how to calculate such score? If not, could you please give a toy > example? Thanks! > > For example, if the total number of peaks = 100, number of peaks assigned to > promoter = 90, number of peaks assigned to enhancer = 10, then the > enrichment score = 90% and 10% for promoter region and enhancer region > respectively. > > We will add closest to annotatePeakInBatch in the dev version. Thanks for > the great suggestion! > > Please cc bioconductor at r-project.org. Thanks! > > Best regards, > > Julie > > > On 3/15/13 12:26 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > >> Sure, thank you Julie. >> >> Yes, that is what I meant. Since all the necessary information (peak >> locations, feature locations, total count, total length) are already >> available >> in RangedData & Annotation, it would be very convenient to calculate >> enrichment/depletion and possibly even significance scores (by permutation) >> on >> the fly. The significance score may be overkill, but if the function at least >> reported enrichment/depletion scores, the user could always supply a number >> of >> shuffled ranges to build a random model of enrichment scores and calculate >> significance after. >> >> In addition, would it be possible to add another definition for >> PeakLocForDistance in annotatePeakInBatch? >> PeakLocForDistance = "start means using start of the peak to calculate the >> distance to feature" >> >> It would be helpful to have "closest", meaning distance to feature measured >> from peak start or peak end, whichever is closer. That would help with broad >> peaks, which using "middle" for isn't very helpful. >> >> Thank you and Best wishes, >> - Stefan. >> >> ----- Original Message ----- >> From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >> To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> >> Sent: Friday, March 15, 2013 8:06:03 AM >> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >> >> Stefan, >> >> You can download 2.6.1 to directly use assignChromosomeRegion without the >> package prefix (I updated last night and it might take sometime to propagate >> to the installation site). To find out what parameters supported by the >> function, in R, please type help( ChIPpeakAnno:::assignChromosomeRegion). >> You will see that indeed PeakLocForDistance is not supported. If needed, I >> can add such parameter. >> >> Also regarding your previous suggestion of adding enrichment status of >> feature length, do you mean enrichment of peaks in certain category of >> chromosome region? For example, a significant enrichment score with 90% >> peaks in promoter region would be interesting. >> >> Could you please keep the thread in the Bioc for others to >> contribute/benefit? Thanks! >> >> Best regards, >> >> Julie >> >> >> On 3/14/13 10:10 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: >> >>> PPS. I think PeakLocForDistance is not working in that function: >>> >>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>>> Exon, >>>> utr5, utr3, proximal.promoter.cutoff=100000, >>>> immediate.downstream.cutoff=100000, PeakLocForDistance="middle") >>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : >>> unused argument(s) (PeakLocForDistance = "middle") >>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>>> Exon, >>>> utr5, utr3, proximal.promoter.cutoff=100000, >>>> immediate.downstream.cutoff=100000, PeakLocForDistance= middle) >>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : >>> unused argument(s) (PeakLocForDistance = middle) >>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>>> Exon, >>>> utr5, utr3, proximal.promoter.cutoff=100000, >>>> immediate.downstream.cutoff=100000, PeakLocForDistance = c("middle")) >>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : >>> unused argument(s) (PeakLocForDistance = c("middle")) >>> >>> ----- Original Message ----- >>> From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >>> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>> Sent: Thursday, March 14, 2013 9:39:10 PM >>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>> >>> PS. if I might add a simple feature addition to that function - it would be >>> a >>> TRUE/FALSE trigger for whether to also report enrichment/depletion stats >>> based >>> on feature lengths, i.e. 50% peaks labeled as intergenic (or enhancers) is >>> less interesting than 50% of peaks reported as exonic (much greater >>> enrichment >>> as total exonic feature length is smaller). Thanks, Best...S >>> >>> ----- Original Message ----- >>> From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >>> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>> Cc: "bioconductor" <bioconductor at="" r-project.org=""> >>> Sent: Thursday, March 14, 2013 9:32:33 PM >>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>> >>> Hi Julie, >>> >>> yes, that worked! Thank you for the quick help, I very much appreciate it! >>> Thank you and Best wishes, >>> - Stefan. >>> >>> ----- Original Message ----- >>> From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu="">, "Stefan Pinter" >>> <pinter at="" molbio.mgh.harvard.edu=""> >>> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> >>> Sent: Thursday, March 14, 2013 8:50:51 PM >>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>> >>> Stefan, >>> >>> Thanks for reporting this! Actually the function assignChromosomeRegion is >>> not exported. >>> >>> Could you please try the following to see if you can use it? Thanks! >>> >>> ChIPpeakAnno:::assignChromosomeRegion >>> >>> Best regards, >>> >>> Julie >>> >>> >>> On 3/14/13 7:36 PM, "Lihua Julie Zhu" <julie.zhu at="" umassmed.edu=""> wrote: >>> >>>> Stefan, >>>> >>>> Could you please send me the sessionInfo? I just want to make sure you >>>> installed version 2.6.0. Thanks! >>>> >>>> I noticed that when I try to install ChIPpeakAnno with the following code, >>>> I >>>> get 2.4 version instead. >>>> >>>> source("http://bioconductor.org/biocLite.R") >>>> biocLite("ChIPpeakAnno") >>>> >>>> Best regards, >>>> >>>> Julie

ADD COMMENT • link 12.1 years ago Julie Zhu ★ 4.3k

0

Entering edit mode

Hi Julie, maybe I'm misunderstanding something about this function. Below is what I've done as a positive control (using refseq genes as "peaks" compared to ensembl annotation), can you tell me whether these numbers make sense? I was expecting to see close to 100% overlap (according to bedtools closest on refseq genes against ensembl genes, 958/967 unique refseq X-linked genes are overlapping an ensembl gene (distance = 0)). But with prox. promoter cutoff = 0, I get only get 42% prox. promoter and 45% "enhancer", which I believe could also be called intergenic, right? Even with default of 1k cutoff, only 81% would be counted as "genic" (sum of everything - "Enhancer"). Thanks, - Stefan. First, I pulled in the ensembl gene set from archive, because my data are in mm9: ensembl67=useMart(host='may2012.archive.ensembl.org', biomart='ENSEMBL_MART_ENSEMBL', dataset='mmusculus_gene_ensembl') > TSS <- getAnnotation(ensembl67, featureType ="TSS") > Exon <- getAnnotation(ensembl67, featureType ="Exon") > utr5 <- getAnnotation(ensembl67, featureType ="5utr") > utr3 <- getAnnotation(ensembl67, featureType ="3utr") Then, I used all X-linked genes from the conservative mm9 refseq gene set (merged to contain unique names only) as my "peak" list [as a positive control]. > bed.rGm<-read.table("genes/chrX.rGm4.bed", header = FALSE) > rd.rGm<-BED2RangedData(bed.rGm, header=FALSE) > rd.rGm RangedData with 967 rows and 2 value columns across 1 space Finally, I annotated these "peaks" relative to TSS and used assignChromosomeRegion with 0k or 1k cutoff: > rda.rGm = annotatePeakInBatch(rd.rGm, AnnotationData=TSS, PeakLocForDistance = "middle") > l.feat0k.rGm <-ChIPpeakAnno:::assignChromosomeRegion(rda.rGm, TSS, Exon, utr5, utr3, proximal.promoter.cutoff=0, immediate.downstream.cutoff=0) > l.feat0k.rGm $Exon [1] 3.205791 $Intron [1] 0 $`5UTR` [1] 3.516029 $`3UTR` [1] 6.30817 $`Proximal Promoter%` [1] 41.98552 $`Immediate Downstream` [1] 0 $Enhancer [1] 44.98449 > l.feat1k.rGm <-ChIPpeakAnno:::assignChromosomeRegion(rda.rGm, TSS, Exon, utr5, utr3) > l.feat1k.rGm $Exon [1] 3.205791 $Intron [1] 0 $`5UTR` [1] 3.516029 $`3UTR` [1] 6.30817 $`Proximal Promoter%` [1] 80.97208 $`Immediate Downstream` [1] 0.1034126 $Enhancer [1] 5.894519 ----- Original Message ----- From: "Lihua Zhu (Julie)" <julie.zhu@umassmed.edu> To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu="">, "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> Cc: "Jianhong Ou" <jianhong.ou at="" umassmed.edu=""> Sent: Saturday, March 16, 2013 8:25:44 PM Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno Stefan, Could you please send me a simple dataset to see what went wrong? Thanks for letting know! Thanks for hashing out the chromosome region enrichment score calculation! Looks like you are looking for something very similar to GO enrichment analysis (getEnrichedGO function in ChIPpeakAnno). For option a, as you mentioned, we will need the genome level estimation of the distribution for each category, which could be tricky. Once we have these estimation, then Hypergeometric test can be applied to find whether a category is enriched/depleted. Jaccard index (option b) is an interesting alternative. We could implement Jaccard index at the peak level first and add options for computing Jaccard index at the nucleotide level later. Best regards, Julie On 3/16/13 7:41 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > Julie, I compared my results from ChipPeakAnno with a simple overlap in > bedtools: the % of peaks inside genes (exon + intron + 3utr + 5utr) reported > by ChIPpeakAnno is over an order of magnitude lower than what I learned from > bedtools. Browsing the data indicates that bedtools is right. Something is > off, do you have a test.data set you can check against? > Thanks, > -S. > > ----- Original Message ----- > From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> > To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> > Cc: "bioconductor" <bioconductor at="" r-project.org=""> > Sent: Saturday, March 16, 2013 1:30:55 PM > Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno > > Hi Julie, > > well, I was more thinking of something along the lines of either: > > a.) Over/Under-representation: > Percentage of exons in genome = 2% > Percentage of peaks overlapping exons = 10% > Exons are 5-fold over-represented > > But I realize now, that will only work with a full annotation, not a custom > annotation. > > b.) something like the jaccard index used in bedtools: > > total length of intersection / total length of union > > http://en.wikipedia.org/wiki/Jaccard_index > > Just suggestions, there are probably many more possible ways to calculate a > score. > Thank you, > - Stefan. > > ----- Original Message ----- > From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> > To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> > Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> > Sent: Friday, March 15, 2013 1:28:09 PM > Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno > > Stefan, > > Regarding the enrichment/depletion score, Does the following toy example > illustrate how to calculate such score? If not, could you please give a toy > example? Thanks! > > For example, if the total number of peaks = 100, number of peaks assigned to > promoter = 90, number of peaks assigned to enhancer = 10, then the > enrichment score = 90% and 10% for promoter region and enhancer region > respectively. > > We will add closest to annotatePeakInBatch in the dev version. Thanks for > the great suggestion! > > Please cc bioconductor at r-project.org. Thanks! > > Best regards, > > Julie > > > On 3/15/13 12:26 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > >> Sure, thank you Julie. >> >> Yes, that is what I meant. Since all the necessary information (peak >> locations, feature locations, total count, total length) are already >> available >> in RangedData & Annotation, it would be very convenient to calculate >> enrichment/depletion and possibly even significance scores (by permutation) >> on >> the fly. The significance score may be overkill, but if the function at least >> reported enrichment/depletion scores, the user could always supply a number >> of >> shuffled ranges to build a random model of enrichment scores and calculate >> significance after. >> >> In addition, would it be possible to add another definition for >> PeakLocForDistance in annotatePeakInBatch? >> PeakLocForDistance = "start means using start of the peak to calculate the >> distance to feature" >> >> It would be helpful to have "closest", meaning distance to feature measured >> from peak start or peak end, whichever is closer. That would help with broad >> peaks, which using "middle" for isn't very helpful. >> >> Thank you and Best wishes, >> - Stefan. >> >> ----- Original Message ----- >> From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >> To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> >> Sent: Friday, March 15, 2013 8:06:03 AM >> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >> >> Stefan, >> >> You can download 2.6.1 to directly use assignChromosomeRegion without the >> package prefix (I updated last night and it might take sometime to propagate >> to the installation site). To find out what parameters supported by the >> function, in R, please type help( ChIPpeakAnno:::assignChromosomeRegion). >> You will see that indeed PeakLocForDistance is not supported. If needed, I >> can add such parameter. >> >> Also regarding your previous suggestion of adding enrichment status of >> feature length, do you mean enrichment of peaks in certain category of >> chromosome region? For example, a significant enrichment score with 90% >> peaks in promoter region would be interesting. >> >> Could you please keep the thread in the Bioc for others to >> contribute/benefit? Thanks! >> >> Best regards, >> >> Julie >> >> >> On 3/14/13 10:10 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: >> >>> PPS. I think PeakLocForDistance is not working in that function: >>> >>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>>> Exon, >>>> utr5, utr3, proximal.promoter.cutoff=100000, >>>> immediate.downstream.cutoff=100000, PeakLocForDistance="middle") >>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : >>> unused argument(s) (PeakLocForDistance = "middle") >>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>>> Exon, >>>> utr5, utr3, proximal.promoter.cutoff=100000, >>>> immediate.downstream.cutoff=100000, PeakLocForDistance= middle) >>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : >>> unused argument(s) (PeakLocForDistance = middle) >>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>>> Exon, >>>> utr5, utr3, proximal.promoter.cutoff=100000, >>>> immediate.downstream.cutoff=100000, PeakLocForDistance = c("middle")) >>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, : >>> unused argument(s) (PeakLocForDistance = c("middle")) >>> >>> ----- Original Message ----- >>> From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >>> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>> Sent: Thursday, March 14, 2013 9:39:10 PM >>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>> >>> PS. if I might add a simple feature addition to that function - it would be >>> a >>> TRUE/FALSE trigger for whether to also report enrichment/depletion stats >>> based >>> on feature lengths, i.e. 50% peaks labeled as intergenic (or enhancers) is >>> less interesting than 50% of peaks reported as exonic (much greater >>> enrichment >>> as total exonic feature length is smaller). Thanks, Best...S >>> >>> ----- Original Message ----- >>> From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >>> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>> Cc: "bioconductor" <bioconductor at="" r-project.org=""> >>> Sent: Thursday, March 14, 2013 9:32:33 PM >>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>> >>> Hi Julie, >>> >>> yes, that worked! Thank you for the quick help, I very much appreciate it! >>> Thank you and Best wishes, >>> - Stefan. >>> >>> ----- Original Message ----- >>> From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu="">, "Stefan Pinter" >>> <pinter at="" molbio.mgh.harvard.edu=""> >>> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> >>> Sent: Thursday, March 14, 2013 8:50:51 PM >>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>> >>> Stefan, >>> >>> Thanks for reporting this! Actually the function assignChromosomeRegion is >>> not exported. >>> >>> Could you please try the following to see if you can use it? Thanks! >>> >>> ChIPpeakAnno:::assignChromosomeRegion >>> >>> Best regards, >>> >>> Julie >>> >>> >>> On 3/14/13 7:36 PM, "Lihua Julie Zhu" <julie.zhu at="" umassmed.edu=""> wrote: >>> >>>> Stefan, >>>> >>>> Could you please send me the sessionInfo? I just want to make sure you >>>> installed version 2.6.0. Thanks! >>>> >>>> I noticed that when I try to install ChIPpeakAnno with the following code, >>>> I >>>> get 2.4 version instead. >>>> >>>> source("http://bioconductor.org/biocLite.R") >>>> biocLite("ChIPpeakAnno") >>>> >>>> Best regards, >>>> >>>> Julie

ADD REPLY • link 12.1 years ago Stefan Pinter ▴ 40

0

Entering edit mode

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 18 months ago

United States

Stefan, Many thanks for the detailed information on your testing procedure and results! AssignChromosomeRegion is peak centric and it considers the the precedence of each region. Since peaks are usually in the promoter region, it got the highest precedence. The reason to have precedence is to ensure the peak number assigned to all different regions adds up to total peak number, which is different from nucleotide centric view implemented by bed tools. Thanks for bringing up a very good point! We probably need to add a parameter to have an option to choose between peak centric view and nucleotide centric view. Best regards, Julie On 3/17/13 3:47 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > Hi Julie, > > maybe I'm misunderstanding something about this function. Below is what I've > done as a positive control (using refseq genes as "peaks" compared to ensembl > annotation), can you tell me whether these numbers make sense? I was expecting > to see close to 100% overlap (according to bedtools closest on refseq genes > against ensembl genes, 958/967 unique refseq X-linked genes are overlapping an > ensembl gene (distance = 0)). > > But with prox. promoter cutoff = 0, I get only get 42% prox. promoter and 45% > "enhancer", which I believe could also be called intergenic, right? Even with > default of 1k cutoff, only 81% would be counted as "genic" (sum of everything > - "Enhancer"). > Thanks, > - Stefan. > > > First, I pulled in the ensembl gene set from archive, because my data are in > mm9: > > ensembl67=useMart(host='may2012.archive.ensembl.org', > biomart='ENSEMBL_MART_ENSEMBL', dataset='mmusculus_gene_ensembl') >> TSS <- getAnnotation(ensembl67, featureType ="TSS") >> Exon <- getAnnotation(ensembl67, featureType ="Exon") >> utr5 <- getAnnotation(ensembl67, featureType ="5utr") >> utr3 <- getAnnotation(ensembl67, featureType ="3utr") > > Then, I used all X-linked genes from the conservative mm9 refseq gene set > (merged to contain unique names only) as my "peak" list [as a positive > control]. > >> bed.rGm<-read.table("genes/chrX.rGm4.bed", header = FALSE) >> rd.rGm<-BED2RangedData(bed.rGm, header=FALSE) >> rd.rGm > RangedData with 967 rows and 2 value columns across 1 space > > Finally, I annotated these "peaks" relative to TSS and used > assignChromosomeRegion with 0k or 1k cutoff: > >> rda.rGm = annotatePeakInBatch(rd.rGm, AnnotationData=TSS, PeakLocForDistance >> = "middle") >> l.feat0k.rGm <-ChIPpeakAnno:::assignChromosomeRegion(rda.rGm, TSS, Exon, >> utr5, utr3, proximal.promoter.cutoff=0, immediate.downstream.cutoff=0) >> l.feat0k.rGm > $Exon > [1] 3.205791 > > $Intron > [1] 0 > > $`5UTR` > [1] 3.516029 > > $`3UTR` > [1] 6.30817 > > $`Proximal Promoter%` > [1] 41.98552 > > $`Immediate Downstream` > [1] 0 > > $Enhancer > [1] 44.98449 > >> l.feat1k.rGm <-ChIPpeakAnno:::assignChromosomeRegion(rda.rGm, TSS, Exon, >> utr5, utr3) >> l.feat1k.rGm > $Exon > [1] 3.205791 > > $Intron > [1] 0 > > $`5UTR` > [1] 3.516029 > > $`3UTR` > [1] 6.30817 > > $`Proximal Promoter%` > [1] 80.97208 > > $`Immediate Downstream` > [1] 0.1034126 > > $Enhancer > [1] 5.894519 > > ----- Original Message ----- > From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> > To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu="">, > "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> > Cc: "Jianhong Ou" <jianhong.ou at="" umassmed.edu=""> > Sent: Saturday, March 16, 2013 8:25:44 PM > Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno > > Stefan, > > Could you please send me a simple dataset to see what went wrong? Thanks for > letting know! > > Thanks for hashing out the chromosome region enrichment score calculation! > Looks like you are looking for something very similar to GO enrichment > analysis (getEnrichedGO function in ChIPpeakAnno). > > For option a, as you mentioned, we will need the genome level estimation of > the distribution for each category, which could be tricky. Once we have > these estimation, then Hypergeometric test can be applied to find whether a > category is enriched/depleted. > > Jaccard index (option b) is an interesting alternative. We could implement > Jaccard index at the peak level first and add options for computing Jaccard > index at the nucleotide level later. > > Best regards, > > Julie > > > On 3/16/13 7:41 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > >> Julie, I compared my results from ChipPeakAnno with a simple overlap in >> bedtools: the % of peaks inside genes (exon + intron + 3utr + 5utr) reported >> by ChIPpeakAnno is over an order of magnitude lower than what I learned from >> bedtools. Browsing the data indicates that bedtools is right. Something is >> off, do you have a test.data set you can check against? >> Thanks, >> -S. >> >> ----- Original Message ----- >> From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >> Cc: "bioconductor" <bioconductor at="" r-project.org=""> >> Sent: Saturday, March 16, 2013 1:30:55 PM >> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >> >> Hi Julie, >> >> well, I was more thinking of something along the lines of either: >> >> a.) Over/Under-representation: >> Percentage of exons in genome = 2% >> Percentage of peaks overlapping exons = 10% >> Exons are 5-fold over-represented >> >> But I realize now, that will only work with a full annotation, not a custom >> annotation. >> >> b.) something like the jaccard index used in bedtools: >> >> total length of intersection / total length of union >> >> http://en.wikipedia.org/wiki/Jaccard_index >> >> Just suggestions, there are probably many more possible ways to calculate a >> score. >> Thank you, >> - Stefan. >> >> ----- Original Message ----- >> From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >> To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> >> Sent: Friday, March 15, 2013 1:28:09 PM >> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >> >> Stefan, >> >> Regarding the enrichment/depletion score, Does the following toy example >> illustrate how to calculate such score? If not, could you please give a toy >> example? Thanks! >> >> For example, if the total number of peaks = 100, number of peaks assigned to >> promoter = 90, number of peaks assigned to enhancer = 10, then the >> enrichment score = 90% and 10% for promoter region and enhancer region >> respectively. >> >> We will add closest to annotatePeakInBatch in the dev version. Thanks for >> the great suggestion! >> >> Please cc bioconductor at r-project.org. Thanks! >> >> Best regards, >> >> Julie >> >> >> On 3/15/13 12:26 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: >> >>> Sure, thank you Julie. >>> >>> Yes, that is what I meant. Since all the necessary information (peak >>> locations, feature locations, total count, total length) are already >>> available >>> in RangedData & Annotation, it would be very convenient to calculate >>> enrichment/depletion and possibly even significance scores (by permutation) >>> on >>> the fly. The significance score may be overkill, but if the function at >>> least >>> reported enrichment/depletion scores, the user could always supply a number >>> of >>> shuffled ranges to build a random model of enrichment scores and calculate >>> significance after. >>> >>> In addition, would it be possible to add another definition for >>> PeakLocForDistance in annotatePeakInBatch? >>> PeakLocForDistance = "start means using start of the peak to calculate the >>> distance to feature" >>> >>> It would be helpful to have "closest", meaning distance to feature measured >>> from peak start or peak end, whichever is closer. That would help with broad >>> peaks, which using "middle" for isn't very helpful. >>> >>> Thank you and Best wishes, >>> - Stefan. >>> >>> ----- Original Message ----- >>> From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>> To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >>> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> >>> Sent: Friday, March 15, 2013 8:06:03 AM >>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>> >>> Stefan, >>> >>> You can download 2.6.1 to directly use assignChromosomeRegion without the >>> package prefix (I updated last night and it might take sometime to propagate >>> to the installation site). To find out what parameters supported by the >>> function, in R, please type help( ChIPpeakAnno:::assignChromosomeRegion). >>> You will see that indeed PeakLocForDistance is not supported. If needed, I >>> can add such parameter. >>> >>> Also regarding your previous suggestion of adding enrichment status of >>> feature length, do you mean enrichment of peaks in certain category of >>> chromosome region? For example, a significant enrichment score with 90% >>> peaks in promoter region would be interesting. >>> >>> Could you please keep the thread in the Bioc for others to >>> contribute/benefit? Thanks! >>> >>> Best regards, >>> >>> Julie >>> >>> >>> On 3/14/13 10:10 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: >>> >>>> PPS. I think PeakLocForDistance is not working in that function: >>>> >>>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>>>> Exon, >>>>> utr5, utr3, proximal.promoter.cutoff=100000, >>>>> immediate.downstream.cutoff=100000, PeakLocForDistance="middle") >>>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, >>>> : >>>> unused argument(s) (PeakLocForDistance = "middle") >>>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>>>> Exon, >>>>> utr5, utr3, proximal.promoter.cutoff=100000, >>>>> immediate.downstream.cutoff=100000, PeakLocForDistance= middle) >>>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, >>>> : >>>> unused argument(s) (PeakLocForDistance = middle) >>>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>>>> Exon, >>>>> utr5, utr3, proximal.promoter.cutoff=100000, >>>>> immediate.downstream.cutoff=100000, PeakLocForDistance = c("middle")) >>>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, >>>> : >>>> unused argument(s) (PeakLocForDistance = c("middle")) >>>> >>>> ----- Original Message ----- >>>> From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >>>> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>>> Sent: Thursday, March 14, 2013 9:39:10 PM >>>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>>> >>>> PS. if I might add a simple feature addition to that function - it would be >>>> a >>>> TRUE/FALSE trigger for whether to also report enrichment/depletion stats >>>> based >>>> on feature lengths, i.e. 50% peaks labeled as intergenic (or enhancers) is >>>> less interesting than 50% of peaks reported as exonic (much greater >>>> enrichment >>>> as total exonic feature length is smaller). Thanks, Best...S >>>> >>>> ----- Original Message ----- >>>> From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >>>> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>>> Cc: "bioconductor" <bioconductor at="" r-project.org=""> >>>> Sent: Thursday, March 14, 2013 9:32:33 PM >>>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>>> >>>> Hi Julie, >>>> >>>> yes, that worked! Thank you for the quick help, I very much appreciate it! >>>> Thank you and Best wishes, >>>> - Stefan. >>>> >>>> ----- Original Message ----- >>>> From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>>> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu="">, "Stefan Pinter" >>>> <pinter at="" molbio.mgh.harvard.edu=""> >>>> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> >>>> Sent: Thursday, March 14, 2013 8:50:51 PM >>>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>>> >>>> Stefan, >>>> >>>> Thanks for reporting this! Actually the function assignChromosomeRegion is >>>> not exported. >>>> >>>> Could you please try the following to see if you can use it? Thanks! >>>> >>>> ChIPpeakAnno:::assignChromosomeRegion >>>> >>>> Best regards, >>>> >>>> Julie >>>> >>>> >>>> On 3/14/13 7:36 PM, "Lihua Julie Zhu" <julie.zhu at="" umassmed.edu=""> wrote: >>>> >>>>> Stefan, >>>>> >>>>> Could you please send me the sessionInfo? I just want to make sure you >>>>> installed version 2.6.0. Thanks! >>>>> >>>>> I noticed that when I try to install ChIPpeakAnno with the following code, >>>>> I >>>>> get 2.4 version instead. >>>>> >>>>> source("http://bioconductor.org/biocLite.R") >>>>> biocLite("ChIPpeakAnno") >>>>> >>>>> Best regards, >>>>> >>>>> Julie

ADD COMMENT • link 12.1 years ago Julie Zhu ★ 4.3k

0

Entering edit mode

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 18 months ago

United States

Stefan, FYI, Jianhong will modify the function with two new parameters, one is NucleotideLevel (TRUE or FALSE) to allow both peak centric and nucleotide centric view. Another parameter will be precedence, which is only applicable to peak centric view (NucleotideLevel = FALSE). If no precedence specified, double count will be enabled, which means that if a peak overlap with both promoter and 5'UTR, then both promoter and 5'UTR will be incremented. If a precedence order is specified, for example, if promoter is specified before 5'UTR, then only promoter will be incremented for the same example. In addition, Jaccard index will be computed. Please let us know your thoughts on this. Many thanks for your brilliant ideas! Best regards, Julie On 3/18/13 9:11 AM, "Lihua Julie Zhu" <julie.zhu at="" umassmed.edu=""> wrote: > Stefan, > > Many thanks for the detailed information on your testing procedure and > results! > > AssignChromosomeRegion is peak centric and it considers the the precedence > of each region. Since peaks are usually in the promoter region, it got the > highest precedence. The reason to have precedence is to ensure the peak > number assigned to all different regions adds up to total peak number, which > is different from nucleotide centric view implemented by bed tools. > > Thanks for bringing up a very good point! We probably need to add a > parameter to have an option to choose between peak centric view and > nucleotide centric view. > > Best regards, > > Julie > > > On 3/17/13 3:47 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: > >> Hi Julie, >> >> maybe I'm misunderstanding something about this function. Below is what I've >> done as a positive control (using refseq genes as "peaks" compared to ensembl >> annotation), can you tell me whether these numbers make sense? I was >> expecting >> to see close to 100% overlap (according to bedtools closest on refseq genes >> against ensembl genes, 958/967 unique refseq X-linked genes are overlapping >> an >> ensembl gene (distance = 0)). >> >> But with prox. promoter cutoff = 0, I get only get 42% prox. promoter and >> 45% >> "enhancer", which I believe could also be called intergenic, right? Even with >> default of 1k cutoff, only 81% would be counted as "genic" (sum of everything >> - "Enhancer"). >> Thanks, >> - Stefan. >> >> >> First, I pulled in the ensembl gene set from archive, because my data are in >> mm9: >> >> ensembl67=useMart(host='may2012.archive.ensembl.org', >> biomart='ENSEMBL_MART_ENSEMBL', dataset='mmusculus_gene_ensembl') >>> TSS <- getAnnotation(ensembl67, featureType ="TSS") >>> Exon <- getAnnotation(ensembl67, featureType ="Exon") >>> utr5 <- getAnnotation(ensembl67, featureType ="5utr") >>> utr3 <- getAnnotation(ensembl67, featureType ="3utr") >> >> Then, I used all X-linked genes from the conservative mm9 refseq gene set >> (merged to contain unique names only) as my "peak" list [as a positive >> control]. >> >>> bed.rGm<-read.table("genes/chrX.rGm4.bed", header = FALSE) >>> rd.rGm<-BED2RangedData(bed.rGm, header=FALSE) >>> rd.rGm >> RangedData with 967 rows and 2 value columns across 1 space >> >> Finally, I annotated these "peaks" relative to TSS and used >> assignChromosomeRegion with 0k or 1k cutoff: >> >>> rda.rGm = annotatePeakInBatch(rd.rGm, AnnotationData=TSS, PeakLocForDistance >>> = "middle") >>> l.feat0k.rGm <-ChIPpeakAnno:::assignChromosomeRegion(rda.rGm, TSS, Exon, >>> utr5, utr3, proximal.promoter.cutoff=0, immediate.downstream.cutoff=0) >>> l.feat0k.rGm >> $Exon >> [1] 3.205791 >> >> $Intron >> [1] 0 >> >> $`5UTR` >> [1] 3.516029 >> >> $`3UTR` >> [1] 6.30817 >> >> $`Proximal Promoter%` >> [1] 41.98552 >> >> $`Immediate Downstream` >> [1] 0 >> >> $Enhancer >> [1] 44.98449 >> >>> l.feat1k.rGm <-ChIPpeakAnno:::assignChromosomeRegion(rda.rGm, TSS, Exon, >>> utr5, utr3) >>> l.feat1k.rGm >> $Exon >> [1] 3.205791 >> >> $Intron >> [1] 0 >> >> $`5UTR` >> [1] 3.516029 >> >> $`3UTR` >> [1] 6.30817 >> >> $`Proximal Promoter%` >> [1] 80.97208 >> >> $`Immediate Downstream` >> [1] 0.1034126 >> >> $Enhancer >> [1] 5.894519 >> >> ----- Original Message ----- >> From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >> To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu="">, >> "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> >> Cc: "Jianhong Ou" <jianhong.ou at="" umassmed.edu=""> >> Sent: Saturday, March 16, 2013 8:25:44 PM >> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >> >> Stefan, >> >> Could you please send me a simple dataset to see what went wrong? Thanks for >> letting know! >> >> Thanks for hashing out the chromosome region enrichment score calculation! >> Looks like you are looking for something very similar to GO enrichment >> analysis (getEnrichedGO function in ChIPpeakAnno). >> >> For option a, as you mentioned, we will need the genome level estimation of >> the distribution for each category, which could be tricky. Once we have >> these estimation, then Hypergeometric test can be applied to find whether a >> category is enriched/depleted. >> >> Jaccard index (option b) is an interesting alternative. We could implement >> Jaccard index at the peak level first and add options for computing Jaccard >> index at the nucleotide level later. >> >> Best regards, >> >> Julie >> >> >> On 3/16/13 7:41 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: >> >>> Julie, I compared my results from ChipPeakAnno with a simple overlap in >>> bedtools: the % of peaks inside genes (exon + intron + 3utr + 5utr) reported >>> by ChIPpeakAnno is over an order of magnitude lower than what I learned from >>> bedtools. Browsing the data indicates that bedtools is right. Something is >>> off, do you have a test.data set you can check against? >>> Thanks, >>> -S. >>> >>> ----- Original Message ----- >>> From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >>> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>> Cc: "bioconductor" <bioconductor at="" r-project.org=""> >>> Sent: Saturday, March 16, 2013 1:30:55 PM >>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>> >>> Hi Julie, >>> >>> well, I was more thinking of something along the lines of either: >>> >>> a.) Over/Under-representation: >>> Percentage of exons in genome = 2% >>> Percentage of peaks overlapping exons = 10% >>> Exons are 5-fold over-represented >>> >>> But I realize now, that will only work with a full annotation, not a custom >>> annotation. >>> >>> b.) something like the jaccard index used in bedtools: >>> >>> total length of intersection / total length of union >>> >>> http://en.wikipedia.org/wiki/Jaccard_index >>> >>> Just suggestions, there are probably many more possible ways to calculate a >>> score. >>> Thank you, >>> - Stefan. >>> >>> ----- Original Message ----- >>> From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>> To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >>> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> >>> Sent: Friday, March 15, 2013 1:28:09 PM >>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>> >>> Stefan, >>> >>> Regarding the enrichment/depletion score, Does the following toy example >>> illustrate how to calculate such score? If not, could you please give a toy >>> example? Thanks! >>> >>> For example, if the total number of peaks = 100, number of peaks assigned to >>> promoter = 90, number of peaks assigned to enhancer = 10, then the >>> enrichment score = 90% and 10% for promoter region and enhancer region >>> respectively. >>> >>> We will add closest to annotatePeakInBatch in the dev version. Thanks for >>> the great suggestion! >>> >>> Please cc bioconductor at r-project.org. Thanks! >>> >>> Best regards, >>> >>> Julie >>> >>> >>> On 3/15/13 12:26 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: >>> >>>> Sure, thank you Julie. >>>> >>>> Yes, that is what I meant. Since all the necessary information (peak >>>> locations, feature locations, total count, total length) are already >>>> available >>>> in RangedData & Annotation, it would be very convenient to calculate >>>> enrichment/depletion and possibly even significance scores (by permutation) >>>> on >>>> the fly. The significance score may be overkill, but if the function at >>>> least >>>> reported enrichment/depletion scores, the user could always supply a number >>>> of >>>> shuffled ranges to build a random model of enrichment scores and calculate >>>> significance after. >>>> >>>> In addition, would it be possible to add another definition for >>>> PeakLocForDistance in annotatePeakInBatch? >>>> PeakLocForDistance = "start means using start of the peak to calculate the >>>> distance to feature" >>>> >>>> It would be helpful to have "closest", meaning distance to feature measured >>>> from peak start or peak end, whichever is closer. That would help with >>>> broad >>>> peaks, which using "middle" for isn't very helpful. >>>> >>>> Thank you and Best wishes, >>>> - Stefan. >>>> >>>> ----- Original Message ----- >>>> From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>>> To: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >>>> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> >>>> Sent: Friday, March 15, 2013 8:06:03 AM >>>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>>> >>>> Stefan, >>>> >>>> You can download 2.6.1 to directly use assignChromosomeRegion without the >>>> package prefix (I updated last night and it might take sometime to >>>> propagate >>>> to the installation site). To find out what parameters supported by the >>>> function, in R, please type help( ChIPpeakAnno:::assignChromosomeRegion). >>>> You will see that indeed PeakLocForDistance is not supported. If needed, I >>>> can add such parameter. >>>> >>>> Also regarding your previous suggestion of adding enrichment status of >>>> feature length, do you mean enrichment of peaks in certain category of >>>> chromosome region? For example, a significant enrichment score with 90% >>>> peaks in promoter region would be interesting. >>>> >>>> Could you please keep the thread in the Bioc for others to >>>> contribute/benefit? Thanks! >>>> >>>> Best regards, >>>> >>>> Julie >>>> >>>> >>>> On 3/14/13 10:10 PM, "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> wrote: >>>> >>>>> PPS. I think PeakLocForDistance is not working in that function: >>>>> >>>>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>>>>> Exon, >>>>>> utr5, utr3, proximal.promoter.cutoff=100000, >>>>>> immediate.downstream.cutoff=100000, PeakLocForDistance="middle") >>>>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, >>>>> : >>>>> unused argument(s) (PeakLocForDistance = "middle") >>>>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>>>>> Exon, >>>>>> utr5, utr3, proximal.promoter.cutoff=100000, >>>>>> immediate.downstream.cutoff=100000, PeakLocForDistance= middle) >>>>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, >>>>> : >>>>> unused argument(s) (PeakLocForDistance = middle) >>>>>> l.feat.dRF8.100k <-ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, >>>>>> Exon, >>>>>> utr5, utr3, proximal.promoter.cutoff=100000, >>>>>> immediate.downstream.cutoff=100000, PeakLocForDistance = c("middle")) >>>>> Error in ChIPpeakAnno:::assignChromosomeRegion(rda.dRF8, TSS, Exon, utr5, >>>>> : >>>>> unused argument(s) (PeakLocForDistance = c("middle")) >>>>> >>>>> ----- Original Message ----- >>>>> From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >>>>> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>>>> Sent: Thursday, March 14, 2013 9:39:10 PM >>>>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>>>> >>>>> PS. if I might add a simple feature addition to that function - it would >>>>> be >>>>> a >>>>> TRUE/FALSE trigger for whether to also report enrichment/depletion stats >>>>> based >>>>> on feature lengths, i.e. 50% peaks labeled as intergenic (or enhancers) is >>>>> less interesting than 50% of peaks reported as exonic (much greater >>>>> enrichment >>>>> as total exonic feature length is smaller). Thanks, Best...S >>>>> >>>>> ----- Original Message ----- >>>>> From: "Stefan Pinter" <pinter at="" molbio.mgh.harvard.edu=""> >>>>> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>>>> Cc: "bioconductor" <bioconductor at="" r-project.org=""> >>>>> Sent: Thursday, March 14, 2013 9:32:33 PM >>>>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>>>> >>>>> Hi Julie, >>>>> >>>>> yes, that worked! Thank you for the quick help, I very much appreciate it! >>>>> Thank you and Best wishes, >>>>> - Stefan. >>>>> >>>>> ----- Original Message ----- >>>>> From: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu=""> >>>>> To: "Lihua Zhu (Julie)" <julie.zhu at="" umassmed.edu="">, "Stefan Pinter" >>>>> <pinter at="" molbio.mgh.harvard.edu=""> >>>>> Cc: "<bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> >>>>> Sent: Thursday, March 14, 2013 8:50:51 PM >>>>> Subject: Re: [BioC] assignChromosomeRegion in 2.6.0 ChIPpeakAnno >>>>> >>>>> Stefan, >>>>> >>>>> Thanks for reporting this! Actually the function assignChromosomeRegion >>>>> is >>>>> not exported. >>>>> >>>>> Could you please try the following to see if you can use it? Thanks! >>>>> >>>>> ChIPpeakAnno:::assignChromosomeRegion >>>>> >>>>> Best regards, >>>>> >>>>> Julie >>>>> >>>>> >>>>> On 3/14/13 7:36 PM, "Lihua Julie Zhu" <julie.zhu at="" umassmed.edu=""> wrote: >>>>> >>>>>> Stefan, >>>>>> >>>>>> Could you please send me the sessionInfo? I just want to make sure you >>>>>> installed version 2.6.0. Thanks! >>>>>> >>>>>> I noticed that when I try to install ChIPpeakAnno with the following >>>>>> code, >>>>>> I >>>>>> get 2.4 version instead. >>>>>> >>>>>> source("http://bioconductor.org/biocLite.R") >>>>>> biocLite("ChIPpeakAnno") >>>>>> >>>>>> Best regards, >>>>>> >>>>>> Julie >

ADD COMMENT • link 12.1 years ago Julie Zhu ★ 4.3k

Login before adding your answer.