Add extra columns to GRanges Metadata

0

Entering edit mode

Paul Leo ▴ 970

@paul-leo-2092

Last seen 10.5 years ago

I want to add extra column(s) to the elementMetaData of a GRanges object is there a short hand way of doing that? Other than extracting the meta data as a data frame ... adding what I want... and then reassigning: THAT IS NOT: extra.blank<-matrix(data=NA,nrow=length(a.grs),ncol=length(missing.met a.cols)) old.meta<-as.data.frame(values(a.grs)) new.meta<-cbind(old.meta,extra.blank) values(a.grs)<-new.meta I'm missing something obvious ...! Thanks Paul Wish LIST: Other wise it would be extremely helpful to have a native c(..., .interesction) c(..., .union) where you could combine GRanges with different metaData , where intersection just kept the metadata in common and union just added NA for missing metavalues. .... for use in combining vcf files for say snp, and deletion adat where the meta data might have different attributes and it's not convenient to use a list..

SNP SNP • 12k views

ADD COMMENT • link updated 13.0 years ago by Tim Triche ★ 4.2k • written 13.0 years ago by Paul Leo ▴ 970

0

Entering edit mode

Tim Triche ★ 4.2k

@tim-triche-3561

Last seen 4.4 years ago

United States

setMethod("$", "GRanges", function(x, name) { # {{{ elementMetadata(x)[, name] }) # }}} setMethod("$<-", "GRanges", function(x, name, value) { # {{{ elementMetadata(x)[[ name ]] <- value return(x) }) # }}} problem solved :-) Of course you might want to play with SummarizedExperiment as well. On Sun, Feb 19, 2012 at 10:26 PM, Paul Leo <p.leo@uq.edu.au> wrote: > > I want to add extra column(s) to the elementMetaData of a GRanges object > is there a short hand way of doing that? > > Other than extracting the meta data as a data frame ... adding what I > want... and then reassigning: > > THAT IS NOT: > > extra.blank<-matrix(data=NA,nrow=length(a.grs),ncol=length(missing.m eta.cols)) > old.meta<-as.data.frame(values(a.grs)) > new.meta<-cbind(old.meta,extra.blank) > values(a.grs)<-new.meta > > > I'm missing something obvious ...! > > Thanks > Paul > > > Wish LIST: > Other wise it would be extremely helpful to have a native > > c(..., .interesction) > c(..., .union) > > where you could combine GRanges with different metaData , where > intersection just kept the metadata in common and union just added NA > for missing metavalues. > > .... for use in combining vcf files for say snp, and deletion adat where > the meta data might have different attributes and it's not convenient to > use a list.. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]

ADD COMMENT • link 13.0 years ago Tim Triche ★ 4.2k

0

Entering edit mode

Also handy, IMHO: # intersection setGeneric('%i%', # {{{ function(x, y) standardGeneric('%i%') ) # }}} setMethod('%i%', c('ANY','ANY'), function(x, y) { # {{{ intersect(x, y) }) # }}} # union setGeneric('%u%', # {{{ function(x, y) standardGeneric('%u%') ) # }}} setMethod('%u%', c('ANY','ANY'), function(x, y) { # {{{ union(x, y) }) # }}} # setdiff setGeneric('%d%', # {{{ function(x, y) standardGeneric('%d%') ) # }}} setMethod('%d%', c('ANY','ANY'), function(x, y) { # {{{ setdiff(x, y) }) # }}} # subsetByOverlaps setGeneric('%s%', # {{{ function(x, y) standardGeneric('%s%') ) # }}} setMethod('%s%', c('GRanges','GRanges'), function(x, y) { # {{{ subsetByOverlaps(x, y) }) # }}} Reason being that something like ( hESC.H3K4ME1 %s% hESC.P300 ) %u% ( hESC.H3K4ME3 %s% hESC.H3K27ME3) can be very handy at times, and all of these operators are rather naturally associative IMHO. On Sun, Feb 19, 2012 at 11:03 PM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > setMethod("$", "GRanges", function(x, name) { # {{{ > elementMetadata(x)[, name] > }) # }}} > > setMethod("$<-", "GRanges", function(x, name, value) { # {{{ > elementMetadata(x)[[ name ]] <- value > return(x) > }) # }}} > > problem solved :-) > > Of course you might want to play with SummarizedExperiment as well. > > > > On Sun, Feb 19, 2012 at 10:26 PM, Paul Leo <p.leo@uq.edu.au> wrote: > >> >> I want to add extra column(s) to the elementMetaData of a GRanges object >> is there a short hand way of doing that? >> >> Other than extracting the meta data as a data frame ... adding what I >> want... and then reassigning: >> >> THAT IS NOT: >> >> extra.blank<-matrix(data=NA,nrow=length(a.grs),ncol=length(missing. meta.cols)) >> old.meta<-as.data.frame(values(a.grs)) >> new.meta<-cbind(old.meta,extra.blank) >> values(a.grs)<-new.meta >> >> >> I'm missing something obvious ...! >> >> Thanks >> Paul >> >> >> Wish LIST: >> Other wise it would be extremely helpful to have a native >> >> c(..., .interesction) >> c(..., .union) >> >> where you could combine GRanges with different metaData , where >> intersection just kept the metadata in common and union just added NA >> for missing metavalues. >> >> .... for use in combining vcf files for say snp, and deletion adat where >> the meta data might have different attributes and it's not convenient to >> use a list.. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > > -- > *A model is a lie that helps you see the truth.* > * > * > Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> > > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]

ADD REPLY • link 13.0 years ago Tim Triche ★ 4.2k

0

Entering edit mode

Hi Tim, These functions are quite handy ... definitely going to poach them. Also ... you need an editor with better code folding support ;-) Pual, Aslo, to do what you want to do, namely: > extra.blank<-matrix(data=NA,nrow=length(a.grs),ncol=length(missing.m eta.cols)) > old.meta<-as.data.frame(values(a.grs)) > new.meta<-cbind(old.meta,extra.blank) > values(a.grs)<-new.meta Using vanilla GenomicRanges functionality, you just do this: R> values(a.grs) <- cbind(values(a.grs), DataFrame(extra.blank)) HTH, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD REPLY • link 13.0 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

On Mon, Feb 20, 2012 at 10:36 AM, Steve Lianoglou < mailinglist.honeypot@gmail.com> wrote: > Hi Tim, > > These functions are quite handy ... definitely going to poach them. > I wonder if we should keep poaching these in the wild or if Bioc Core will ever bow to popular pressure and allow their inclusion in the base packages? ;-) > Also ... you need an editor with better code folding support ;-) > > Pual, > > Aslo, to do what you want to do, namely: > > > > extra.blank<-matrix(data=NA,nrow=length(a.grs),ncol=length(missing.m eta.cols)) > > old.meta<-as.data.frame(values(a.grs)) > > new.meta<-cbind(old.meta,extra.blank) > > values(a.grs)<-new.meta > > Using vanilla GenomicRanges functionality, you just do this: > > R> values(a.grs) <- cbind(values(a.grs), DataFrame(extra.blank)) > > HTH, > > -steve > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > | Memorial Sloan-Kettering Cancer Center > | Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD REPLY • link 13.0 years ago Michael Lawrence ★ 11k

0

Entering edit mode

if these don't belong in BiocGenerics, I don't know what does :-D On Mon, Feb 20, 2012 at 11:04 AM, Michael Lawrence < lawrence.michael@gene.com> wrote: > > > On Mon, Feb 20, 2012 at 10:36 AM, Steve Lianoglou < > mailinglist.honeypot@gmail.com> wrote: > >> Hi Tim, >> >> These functions are quite handy ... definitely going to poach them. >> > > > I wonder if we should keep poaching these in the wild or if Bioc Core will > ever bow to popular pressure and allow their inclusion in the base > packages? ;-) > > >> Also ... you need an editor with better code folding support ;-) >> >> Pual, >> >> Aslo, to do what you want to do, namely: >> >> > >> extra.blank<-matrix(data=NA,nrow=length(a.grs),ncol=length(missing. meta.cols)) >> > old.meta<-as.data.frame(values(a.grs)) >> > new.meta<-cbind(old.meta,extra.blank) >> > values(a.grs)<-new.meta >> >> Using vanilla GenomicRanges functionality, you just do this: >> >> R> values(a.grs) <- cbind(values(a.grs), DataFrame(extra.blank)) >> >> HTH, >> >> -steve >> >> -- >> Steve Lianoglou >> Graduate Student: Computational Systems Biology >> | Memorial Sloan-Kettering Cancer Center >> | Weill Medical College of Cornell University >> Contact Info: http://cbio.mskcc.org/~lianos/contact >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]

ADD REPLY • link 13.0 years ago Tim Triche ★ 4.2k

0

Entering edit mode

+1 for Tim's methods.....especially the $ and $<- method, it's always in a wishlist for me :) well, maybe "$ " could be sometimes confusing when people trying to treat GRanges as a general format similar to data.frame, and do something like gr$seqnames <- blabla, the good part about current API is that it remind me the difference bettween elementMetadata and three required fields... but it will be convenient if general $ method support replacement? even for seqnames, strand and ranges part? with default checking? other methods mentioned by Tim are also handy if default of those function is under assumption. On Mon, Feb 20, 2012 at 1:04 PM, Michael Lawrence <lawrence.michael@gene.com> wrote: > On Mon, Feb 20, 2012 at 10:36 AM, Steve Lianoglou < > mailinglist.honeypot@gmail.com> wrote: > > > Hi Tim, > > > > These functions are quite handy ... definitely going to poach them. > > > > > I wonder if we should keep poaching these in the wild or if Bioc Core will > ever bow to popular pressure and allow their inclusion in the base > packages? ;-) > > > > Also ... you need an editor with better code folding support ;-) > > > > Pual, > > > > Aslo, to do what you want to do, namely: > > > > > > > > extra.blank<-matrix(data=NA,nrow=length(a.grs),ncol=length(missing.m eta.cols)) > > > old.meta<-as.data.frame(values(a.grs)) > > > new.meta<-cbind(old.meta,extra.blank) > > > values(a.grs)<-new.meta > > > > Using vanilla GenomicRanges functionality, you just do this: > > > > R> values(a.grs) <- cbind(values(a.grs), DataFrame(extra.blank)) > > > > HTH, > > > > -steve > > > > -- > > Steve Lianoglou > > Graduate Student: Computational Systems Biology > > | Memorial Sloan-Kettering Cancer Center > > | Weill Medical College of Cornell University > > Contact Info: http://cbio.mskcc.org/~lianos/contact > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Tengfei Yin MCDB PhD student 1620 Howe Hall, 2274, Iowa State University Ames, IA,50011-2274 Homepage: www.tengfei.name [[alternative HTML version deleted]]

ADD REPLY • link 13.0 years ago Tengfei Yin ▴ 420

0

Entering edit mode

On 02/20/2012 11:50 AM, Tengfei Yin wrote: > +1 for Tim's methods.....especially the $ and $<- method, it's always in a > wishlist for me :) > > well, maybe "$ " could be sometimes confusing when people trying to treat > GRanges as a general format similar to data.frame, and do something like > gr$seqnames<- blabla, the good part about current API is that it remind me > the difference bettween elementMetadata and three required fields... but GRanges extends Vector and > gr = GRanges("A", IRanges(1:10, 20), X=1:10, Y=1:10) > length(gr) [1] 10 > length(values(gr)) [1] 2 shows the geometry -- one would expect gr$foo to access the range named 'foo' rather than the column of elementMetadata named 'foo'. I think this is the crux of the resistance. When Tim says On 02/20/2012 11:45 AM, Tim Triche, Jr. wrote: > if these don't belong in BiocGenerics, I don't know what does :-D I'm not sure if the reference is to "intersect" (which is already in BiocGenerics), "%i%" etc., or to $ and $<-. For $ and $<-, they don't need to be in BiocGenerics because they are already made generics by the methods package -- BiocGenerics is meant to provide a place to define generics that would otherwise be defined independently in multiple packages. Martin > it will be convenient if general $ method support replacement? even for > seqnames, strand and ranges part? with default checking? > > other methods mentioned by Tim are also handy if default of those function > is under assumption. > > On Mon, Feb 20, 2012 at 1:04 PM, Michael Lawrence<lawrence.michael at="" gene.com="">> wrote: > >> On Mon, Feb 20, 2012 at 10:36 AM, Steve Lianoglou< >> mailinglist.honeypot at gmail.com> wrote: >> >>> Hi Tim, >>> >>> These functions are quite handy ... definitely going to poach them. >>> >> >> >> I wonder if we should keep poaching these in the wild or if Bioc Core will >> ever bow to popular pressure and allow their inclusion in the base >> packages? ;-) >> >> >>> Also ... you need an editor with better code folding support ;-) >>> >>> Pual, >>> >>> Aslo, to do what you want to do, namely: >>> >>>> >>> >> extra.blank<-matrix(data=NA,nrow=length(a.grs),ncol=length(missing. meta.cols)) >>>> old.meta<-as.data.frame(values(a.grs)) >>>> new.meta<-cbind(old.meta,extra.blank) >>>> values(a.grs)<-new.meta >>> >>> Using vanilla GenomicRanges functionality, you just do this: >>> >>> R> values(a.grs)<- cbind(values(a.grs), DataFrame(extra.blank)) >>> >>> HTH, >>> >>> -steve >>> >>> -- >>> Steve Lianoglou >>> Graduate Student: Computational Systems Biology >>> | Memorial Sloan-Kettering Cancer Center >>> | Weill Medical College of Cornell University >>> Contact Info: http://cbio.mskcc.org/~lianos/contact >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793

ADD REPLY • link 13.0 years ago Martin Morgan 25k

0

Entering edit mode

On Mon, Feb 20, 2012 at 4:33 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote: > On 02/20/2012 11:50 AM, Tengfei Yin wrote: > >> +1 for Tim's methods.....especially the $ and $<- method, it's always in a >> wishlist for me :) >> >> well, maybe "$ " could be sometimes confusing when people trying to treat >> GRanges as a general format similar to data.frame, and do something like >> gr$seqnames<- blabla, the good part about current API is that it remind me >> the difference bettween elementMetadata and three required fields... but >> > > GRanges extends Vector and > > > gr = GRanges("A", IRanges(1:10, 20), X=1:10, Y=1:10) > > length(gr) > [1] 10 > > length(values(gr)) > [1] 2 > > shows the geometry -- one would expect gr$foo to access the range named > 'foo' rather than the column of elementMetadata named 'foo'. I think this > is the crux of the resistance. > > Right, thanks for reiterating this. The question is whether the convenience of the shortcut outweighs the conceptual inconsistency. Probably depends heavily on one's perspective. I would favor the perspective of the frequent user. > When Tim says > > > On 02/20/2012 11:45 AM, Tim Triche, Jr. wrote: > > if these don't belong in BiocGenerics, I don't know what does :-D > > I'm not sure if the reference is to "intersect" (which is already in > BiocGenerics), "%i%" etc., or to $ and $<-. For $ and $<-, they don't need > to be in BiocGenerics because they are already made generics by the methods > package -- BiocGenerics is meant to provide a place to define generics that > would otherwise be defined independently in multiple packages. > > Martin > > > it will be convenient if general $ method support replacement? even for >> seqnames, strand and ranges part? with default checking? >> >> other methods mentioned by Tim are also handy if default of those function >> is under assumption. >> >> On Mon, Feb 20, 2012 at 1:04 PM, Michael Lawrence<lawrence.michael@**>> gene.com <lawrence.michael@gene.com> >> >>> wrote: >>> >> >> On Mon, Feb 20, 2012 at 10:36 AM, Steve Lianoglou< >>> mailinglist.honeypot@gmail.com**> wrote: >>> >>> Hi Tim, >>>> >>>> These functions are quite handy ... definitely going to poach them. >>>> >>>> >>> >>> I wonder if we should keep poaching these in the wild or if Bioc Core >>> will >>> ever bow to popular pressure and allow their inclusion in the base >>> packages? ;-) >>> >>> >>> Also ... you need an editor with better code folding support ;-) >>>> >>>> Pual, >>>> >>>> Aslo, to do what you want to do, namely: >>>> >>>> >>>>> >>>> extra.blank<-matrix(data=NA,**nrow=length(a.grs),ncol=** >>> length(missing.meta.cols)) >>> >>>> old.meta<-as.data.frame(**values(a.grs)) >>>>> new.meta<-cbind(old.meta,**extra.blank) >>>>> values(a.grs)<-new.meta >>>>> >>>> >>>> Using vanilla GenomicRanges functionality, you just do this: >>>> >>>> R> values(a.grs)<- cbind(values(a.grs), DataFrame(extra.blank)) >>>> >>>> HTH, >>>> >>>> -steve >>>> >>>> -- >>>> Steve Lianoglou >>>> Graduate Student: Computational Systems Biology >>>> | Memorial Sloan-Kettering Cancer Center >>>> | Weill Medical College of Cornell University >>>> Contact Info: http://cbio.mskcc.org/~lianos/**contact<http: cbio="" .mskcc.org="" %7elianos="" contact=""> >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives: >>>> http://news.gmane.org/gmane.**science.biology.informatics.**condu ctor<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>>> >>>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: >>> http://news.gmane.org/gmane.**science.biology.informatics.**conduc tor<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>> >>> >> >> >> > > -- > Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > Location: M1-B861 > Telephone: 206 667-2793 > [[alternative HTML version deleted]]

ADD REPLY • link 13.0 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Well in the end I did this the way I always do it: extracted the elementMetaData modified it and reassigned it in it's entirety. It works fine but it seems a little ham fisted hence my original post. I had to do this in the end ANYWAY because c(,) , it appears, requires the elementMetadata of the GRanges in the same order. see example below. Something I seemed to have avoided until now. Given this I don't *think* that the other solns will work anyway. But there are indeed other issues that I did not appreciate that you point out. Many thanks . I guess I'm after this functionality since for combining mutation data from various sources/programs , SNP , indels, large indels, CNV etc, the quality,genotype, annotation data I put in the elementMetaData. To take along for the analysis and filtering. I could have used GrangeList... but in the final disease modelling is not convenient to use a Granges object So the ordering thing is a little bit of a pain. But the problem is solvable with the current data structures (which are just fab- thanks). Though it would be handy to add extra metadata in a computationally rapid way and not to worry about their order. Cheers Paul PS: Grange combning when metaData in different orders example: gr1 <-GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), ranges = IRanges(1:10, end = 7:16, names = head(letters, 10)), strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)), score = 1:10, GC = seq(1, 0, length=10)) gr2<-GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), ranges = IRanges(1:10, end = 7:16, names = head(letters, 10)), strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)), GC = seq(1, 0, length=10), score = 1:10 ) gr1 gr2 gr1 <-GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), ranges = IRanges(1:10, end = 7:16, names = head(letters, 10)), strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)), score = 1:10, GC = seq(1, 0, length=10)) gr2<-GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), ranges = IRanges(1:10, end = 7:16, names = head(letters, 10)), strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)), GC = seq(1, 0, length=10), score = 1:10 ) gr1 gr2 gr.all<-c(gr1,gr2) Error in .Method(..., deparse.level = deparse.level) : column names for arg 2 do not match those of first arg gr2<-GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), ranges = IRanges(1:10, end = 7:16, names = head(letters, 10)), strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)), score = 1:10,GC = seq(1, 0, length=10) ) gr.all<-c(gr1,gr2) > sessionInfo() R Under development (unstable) (2011-11-23 r57740) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8 [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] multicore_0.1-7 biomaRt_2.11.1 Rsamtools_1.7.23 [4] Biostrings_2.23.6 GenomicFeatures_1.7.10 AnnotationDbi_1.17.14 [7] Biobase_2.15.3 GenomicRanges_1.7.15 IRanges_1.13.20 [10] BiocGenerics_0.1.4 loaded via a namespace (and not attached): [1] bitops_1.0-4.1 BSgenome_1.23.2 DBI_0.2-5 RCurl_1.9-5 [5] RSQLite_0.11.1 rtracklayer_1.15.6 tools_2.15.0 XML_3.9-2 [9] zlibbioc_1.1.0 > -----Original Message----- From: Michael Lawrence <lawrence.michael@gene.com> To: Martin Morgan <mtmorgan at="" fhcrc.org=""> Cc: Michael Lawrence <lawrence.michael at="" gene.com="">, bioconductor <bioconductor at="" r-project.org=""> Subject: Re: [BioC] Add extra columns to GRanges Metadata Date: Mon, 20 Feb 2012 16:38:13 -0800 On Mon, Feb 20, 2012 at 4:33 PM, Martin Morgan <mtmorgan at="" fhcrc.org=""> wrote: > On 02/20/2012 11:50 AM, Tengfei Yin wrote: > >> +1 for Tim's methods.....especially the $ and $<- method, it's always in a >> wishlist for me :) >> >> well, maybe "$ " could be sometimes confusing when people trying to treat >> GRanges as a general format similar to data.frame, and do something like >> gr$seqnames<- blabla, the good part about current API is that it remind me >> the difference bettween elementMetadata and three required fields... but >> > > GRanges extends Vector and > > > gr = GRanges("A", IRanges(1:10, 20), X=1:10, Y=1:10) > > length(gr) > [1] 10 > > length(values(gr)) > [1] 2 > > shows the geometry -- one would expect gr$foo to access the range named > 'foo' rather than the column of elementMetadata named 'foo'. I think this > is the crux of the resistance. > > Right, thanks for reiterating this. The question is whether the convenience of the shortcut outweighs the conceptual inconsistency. Probably depends heavily on one's perspective. I would favor the perspective of the frequent user. > When Tim says > > > On 02/20/2012 11:45 AM, Tim Triche, Jr. wrote: > > if these don't belong in BiocGenerics, I don't know what does :-D > > I'm not sure if the reference is to "intersect" (which is already in > BiocGenerics), "%i%" etc., or to $ and $<-. For $ and $<-, they don't need > to be in BiocGenerics because they are already made generics by the methods > package -- BiocGenerics is meant to provide a place to define generics that > would otherwise be defined independently in multiple packages. > > Martin > > > it will be convenient if general $ method support replacement? even for >> seqnames, strand and ranges part? with default checking? >> >> other methods mentioned by Tim are also handy if default of those function >> is under assumption. >> >> On Mon, Feb 20, 2012 at 1:04 PM, Michael Lawrence<lawrence.michael@**>> gene.com <lawrence.michael at="" gene.com=""> >> >>> wrote: >>> >> >> On Mon, Feb 20, 2012 at 10:36 AM, Steve Lianoglou< >>> mailinglist.honeypot at gmail.com**> wrote: >>> >>> Hi Tim, >>>> >>>> These functions are quite handy ... definitely going to poach them. >>>> >>>> >>> >>> I wonder if we should keep poaching these in the wild or if Bioc Core >>> will >>> ever bow to popular pressure and allow their inclusion in the base >>> packages? ;-) >>> >>> >>> Also ... you need an editor with better code folding support ;-) >>>> >>>> Pual, >>>> >>>> Aslo, to do what you want to do, namely: >>>> >>>> >>>>> >>>> extra.blank<-matrix(data=NA,**nrow=length(a.grs),ncol=** >>> length(missing.meta.cols)) >>> >>>> old.meta<-as.data.frame(**values(a.grs)) >>>>> new.meta<-cbind(old.meta,**extra.blank) >>>>> values(a.grs)<-new.meta >>>>> >>>> >>>> Using vanilla GenomicRanges functionality, you just do this: >>>> >>>> R> values(a.grs)<- cbind(values(a.grs), DataFrame(extra.blank)) >>>> >>>> HTH, >>>> >>>> -steve >>>> >>>> -- >>>> Steve Lianoglou >>>> Graduate Student: Computational Systems Biology >>>> | Memorial Sloan-Kettering Cancer Center >>>> | Weill Medical College of Cornell University >>>> Contact Info: http://cbio.mskcc.org/~lianos/**contact<http: cbio="" .mskcc.org="" %7elianos="" contact=""> >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives: >>>> http://news.gmane.org/gmane.**science.biology.informatics.**condu ctor<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>>> >>>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: >>> http://news.gmane.org/gmane.**science.biology.informatics.**conduc tor<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>> >>> >> >> >> > > -- > Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > Location: M1-B861 > Telephone: 206 667-2793 > [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 13.0 years ago Paul Leo ▴ 970

0

Entering edit mode

On 02/20/2012 10:35 PM, Paul Leo wrote: > Well in the end I did this the way I always do it: extracted the > elementMetaData modified it and reassigned it in it's entirety. > > It works fine but it seems a little ham fisted hence my original post. > > I had to do this in the end ANYWAY because c(,) , it appears, requires > the elementMetadata of the GRanges in the same order. see example below. > Something I seemed to have avoided until now. > > Given this I don't *think* that the other solns will work anyway. But > there are indeed other issues that I did not appreciate that you point > out. Many thanks . > > I guess I'm after this functionality since for combining mutation data > from various sources/programs , SNP , indels, large indels, CNV etc, the > quality,genotype, annotation data I put in the elementMetaData. To take > along for the analysis and filtering. > > I could have used GrangeList... but in the final disease modelling is > not convenient to use a Granges object > > > So the ordering thing is a little bit of a pain. But the problem is > solvable with the current data structures (which are just fab- thanks). > > Though it would be handy to add extra metadata in a computationally > rapid way and not to worry about their order. Yes c() is a little bit rigid as it requires the colnames() of elementMetadata(gr1) and elementMetadata(gr2) to be identical. Maybe we could make it a little more forgiving by allowing the colnames to be the same but not necessarily in the same order. Then c(gr1, gr2) would reorder the cols in gr2 like in gr1 before doing the combining. Does that sound reasonable? Also I'm with Martin on why we should resist the temptation to make 'gr$foo' a convenience for 'elementMetadata(gr)$foo'. Yes elementMetadata is a long and ugly accessor name (and also a long and ugly slot name) and this is why values() was added as an alias for elementMetadata() a couple of years ago on popular demand. If that's not short enough, we can add another alias like emd(): emd(gr) <- cbind(emd(gr), DataFrame(colA, colB)) I would actually prefer this alias (or anything else that is short enough) over values(). The values in a GRanges object are its elements (the ranges), like the values in a numeric vector are the numbers in that vector. Cheers, H. > > Cheers > Paul > > > > > > > PS: Grange combning when metaData in different orders example: > > gr1<-GRanges(seqnames = > Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), > ranges = > IRanges(1:10, end = 7:16, names = head(letters, 10)), > strand = > Rle(strand(c("-", "+", "*", "+", "-")), > c(1, 2, 2, 3, 2)), > score = 1:10, > GC = seq(1, 0, length=10)) > gr2<-GRanges(seqnames = > Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), > ranges = > IRanges(1:10, end = 7:16, names = head(letters, 10)), > strand = > Rle(strand(c("-", "+", "*", "+", "-")), > c(1, 2, 2, 3, 2)), > GC = seq(1, 0, length=10), > score = 1:10 > ) > > gr1 > gr2 > > gr1<-GRanges(seqnames = > Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), > ranges = > IRanges(1:10, end = 7:16, names = head(letters, 10)), > strand = > Rle(strand(c("-", "+", "*", "+", "-")), > c(1, 2, 2, 3, 2)), > score = 1:10, > GC = seq(1, 0, length=10)) > gr2<-GRanges(seqnames = > Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), > ranges = > IRanges(1:10, end = 7:16, names = head(letters, 10)), > strand = > Rle(strand(c("-", "+", "*", "+", "-")), > c(1, 2, 2, 3, 2)), > GC = seq(1, 0, length=10), > score = 1:10 > ) > gr1 > gr2 > > gr.all<-c(gr1,gr2) > Error in .Method(..., deparse.level = deparse.level) : > column names for arg 2 do not match those of first arg > > gr2<-GRanges(seqnames = > Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), > ranges = > IRanges(1:10, end = 7:16, names = head(letters, 10)), > strand = > Rle(strand(c("-", "+", "*", "+", "-")), > c(1, 2, 2, 3, 2)), > score = 1:10,GC = seq(1, 0, length=10) > ) > gr.all<-c(gr1,gr2) > > >> sessionInfo() > R Under development (unstable) (2011-11-23 r57740) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8 > [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods > base > > other attached packages: > [1] multicore_0.1-7 biomaRt_2.11.1 > Rsamtools_1.7.23 > [4] Biostrings_2.23.6 GenomicFeatures_1.7.10 > AnnotationDbi_1.17.14 > [7] Biobase_2.15.3 GenomicRanges_1.7.15 > IRanges_1.13.20 > [10] BiocGenerics_0.1.4 > > loaded via a namespace (and not attached): > [1] bitops_1.0-4.1 BSgenome_1.23.2 DBI_0.2-5 > RCurl_1.9-5 > [5] RSQLite_0.11.1 rtracklayer_1.15.6 tools_2.15.0 > XML_3.9-2 > [9] zlibbioc_1.1.0 >> > > -----Original Message----- > From: Michael Lawrence<lawrence.michael at="" gene.com=""> > To: Martin Morgan<mtmorgan at="" fhcrc.org=""> > Cc: Michael Lawrence<lawrence.michael at="" gene.com="">, bioconductor > <bioconductor at="" r-project.org=""> > Subject: Re: [BioC] Add extra columns to GRanges Metadata > Date: Mon, 20 Feb 2012 16:38:13 -0800 > > On Mon, Feb 20, 2012 at 4:33 PM, Martin Morgan<mtmorgan at="" fhcrc.org=""> wrote: > >> On 02/20/2012 11:50 AM, Tengfei Yin wrote: >> >>> +1 for Tim's methods.....especially the $ and $<- method, it's always in a >>> wishlist for me :) >>> >>> well, maybe "$ " could be sometimes confusing when people trying to treat >>> GRanges as a general format similar to data.frame, and do something like >>> gr$seqnames<- blabla, the good part about current API is that it remind me >>> the difference bettween elementMetadata and three required fields... but >>> >> >> GRanges extends Vector and >> >>> gr = GRanges("A", IRanges(1:10, 20), X=1:10, Y=1:10) >>> length(gr) >> [1] 10 >>> length(values(gr)) >> [1] 2 >> >> shows the geometry -- one would expect gr$foo to access the range named >> 'foo' rather than the column of elementMetadata named 'foo'. I think this >> is the crux of the resistance. >> >> > Right, thanks for reiterating this. The question is whether the convenience > of the shortcut outweighs the conceptual inconsistency. Probably depends > heavily on one's perspective. I would favor the perspective of the frequent > user. > > >> When Tim says >> >> >> On 02/20/2012 11:45 AM, Tim Triche, Jr. wrote: >>> if these don't belong in BiocGenerics, I don't know what does :-D >> >> I'm not sure if the reference is to "intersect" (which is already in >> BiocGenerics), "%i%" etc., or to $ and $<-. For $ and $<-, they don't need >> to be in BiocGenerics because they are already made generics by the methods >> package -- BiocGenerics is meant to provide a place to define generics that >> would otherwise be defined independently in multiple packages. >> >> Martin >> >> >> it will be convenient if general $ method support replacement? even for >>> seqnames, strand and ranges part? with default checking? >>> >>> other methods mentioned by Tim are also handy if default of those function >>> is under assumption. >>> >>> On Mon, Feb 20, 2012 at 1:04 PM, Michael Lawrence<lawrence.michael@**>>> gene.com<lawrence.michael at="" gene.com=""> >>> >>>> wrote: >>>> >>> >>> On Mon, Feb 20, 2012 at 10:36 AM, Steve Lianoglou< >>>> mailinglist.honeypot at gmail.com**> wrote: >>>> >>>> Hi Tim, >>>>> >>>>> These functions are quite handy ... definitely going to poach them. >>>>> >>>>> >>>> >>>> I wonder if we should keep poaching these in the wild or if Bioc Core >>>> will >>>> ever bow to popular pressure and allow their inclusion in the base >>>> packages? ;-) >>>> >>>> >>>> Also ... you need an editor with better code folding support ;-) >>>>> >>>>> Pual, >>>>> >>>>> Aslo, to do what you want to do, namely: >>>>> >>>>> >>>>>> >>>>> extra.blank<-matrix(data=NA,**nrow=length(a.grs),ncol=** >>>> length(missing.meta.cols)) >>>> >>>>> old.meta<-as.data.frame(**values(a.grs)) >>>>>> new.meta<-cbind(old.meta,**extra.blank) >>>>>> values(a.grs)<-new.meta >>>>>> >>>>> >>>>> Using vanilla GenomicRanges functionality, you just do this: >>>>> >>>>> R> values(a.grs)<- cbind(values(a.grs), DataFrame(extra.blank)) >>>>> >>>>> HTH, >>>>> >>>>> -steve >>>>> >>>>> -- >>>>> Steve Lianoglou >>>>> Graduate Student: Computational Systems Biology >>>>> | Memorial Sloan-Kettering Cancer Center >>>>> | Weill Medical College of Cornell University >>>>> Contact Info: http://cbio.mskcc.org/~lianos/**contact<http: cbi="" o.mskcc.org="" %7elianos="" contact=""> >>>>> >>>>> ______________________________**_________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: sta="" t.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.**science.biology.informatics.**cond uctor<http: news.gmane.org="" gmane.science.biology.informatics.conducto="" r=""> >>>>> >>>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives: >>>> http://news.gmane.org/gmane.**science.biology.informatics.**condu ctor<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>>> >>>> >>> >>> >>> >> >> -- >> Computational Biology >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >> >> Location: M1-B861 >> Telephone: 206 667-2793 >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD REPLY • link 13.0 years ago Hervé Pagès 16k

0

Entering edit mode

Hi Herve, My opinion: I am happy to continue with values(gr) <- cbind(vales(gr), DataFrame(colA, colB)) .... now it is clear it is best practice (I was originally not certain on that). I'm just fine with "values" no other alias seems necessary. I would be delighted however not to have to bother with ordering down the track, that was confusing when I encountered it. Your solution for that sounds very reasonable. Thanks Paul -----Original Message----- From: Hervé Pagès <hpages@fhcrc.org> To: p.leo at uq.edu.au Cc: bioconductor <bioconductor at="" r-project.org=""> Subject: Re: [BioC] Add extra columns to GRanges Metadata Date: Tue, 21 Feb 2012 21:14:19 -0800 On 02/20/2012 10:35 PM, Paul Leo wrote: > Well in the end I did this the way I always do it: extracted the > elementMetaData modified it and reassigned it in it's entirety. > > It works fine but it seems a little ham fisted hence my original post. > > I had to do this in the end ANYWAY because c(,) , it appears, requires > the elementMetadata of the GRanges in the same order. see example below. > Something I seemed to have avoided until now. > > Given this I don't *think* that the other solns will work anyway. But > there are indeed other issues that I did not appreciate that you point > out. Many thanks . > > I guess I'm after this functionality since for combining mutation data > from various sources/programs , SNP , indels, large indels, CNV etc, the > quality,genotype, annotation data I put in the elementMetaData. To take > along for the analysis and filtering. > > I could have used GrangeList... but in the final disease modelling is > not convenient to use a Granges object > > > So the ordering thing is a little bit of a pain. But the problem is > solvable with the current data structures (which are just fab- thanks). > > Though it would be handy to add extra metadata in a computationally > rapid way and not to worry about their order. Yes c() is a little bit rigid as it requires the colnames() of elementMetadata(gr1) and elementMetadata(gr2) to be identical. Maybe we could make it a little more forgiving by allowing the colnames to be the same but not necessarily in the same order. Then c(gr1, gr2) would reorder the cols in gr2 like in gr1 before doing the combining. Does that sound reasonable? Also I'm with Martin on why we should resist the temptation to make 'gr$foo' a convenience for 'elementMetadata(gr)$foo'. Yes elementMetadata is a long and ugly accessor name (and also a long and ugly slot name) and this is why values() was added as an alias for elementMetadata() a couple of years ago on popular demand. If that's not short enough, we can add another alias like emd(): emd(gr) <- cbind(emd(gr), DataFrame(colA, colB)) I would actually prefer this alias (or anything else that is short enough) over values(). The values in a GRanges object are its elements (the ranges), like the values in a numeric vector are the numbers in that vector. Cheers, H. > > Cheers > Paul > > > > > > > PS: Grange combning when metaData in different orders example: > > gr1<-GRanges(seqnames = > Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), > ranges = > IRanges(1:10, end = 7:16, names = head(letters, 10)), > strand = > Rle(strand(c("-", "+", "*", "+", "-")), > c(1, 2, 2, 3, 2)), > score = 1:10, > GC = seq(1, 0, length=10)) > gr2<-GRanges(seqnames = > Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), > ranges = > IRanges(1:10, end = 7:16, names = head(letters, 10)), > strand = > Rle(strand(c("-", "+", "*", "+", "-")), > c(1, 2, 2, 3, 2)), > GC = seq(1, 0, length=10), > score = 1:10 > ) > > gr1 > gr2 > > gr1<-GRanges(seqnames = > Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), > ranges = > IRanges(1:10, end = 7:16, names = head(letters, 10)), > strand = > Rle(strand(c("-", "+", "*", "+", "-")), > c(1, 2, 2, 3, 2)), > score = 1:10, > GC = seq(1, 0, length=10)) > gr2<-GRanges(seqnames = > Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), > ranges = > IRanges(1:10, end = 7:16, names = head(letters, 10)), > strand = > Rle(strand(c("-", "+", "*", "+", "-")), > c(1, 2, 2, 3, 2)), > GC = seq(1, 0, length=10), > score = 1:10 > ) > gr1 > gr2 > > gr.all<-c(gr1,gr2) > Error in .Method(..., deparse.level = deparse.level) : > column names for arg 2 do not match those of first arg > > gr2<-GRanges(seqnames = > Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), > ranges = > IRanges(1:10, end = 7:16, names = head(letters, 10)), > strand = > Rle(strand(c("-", "+", "*", "+", "-")), > c(1, 2, 2, 3, 2)), > score = 1:10,GC = seq(1, 0, length=10) > ) > gr.all<-c(gr1,gr2) > > >> sessionInfo() > R Under development (unstable) (2011-11-23 r57740) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8 > [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods > base > > other attached packages: > [1] multicore_0.1-7 biomaRt_2.11.1 > Rsamtools_1.7.23 > [4] Biostrings_2.23.6 GenomicFeatures_1.7.10 > AnnotationDbi_1.17.14 > [7] Biobase_2.15.3 GenomicRanges_1.7.15 > IRanges_1.13.20 > [10] BiocGenerics_0.1.4 > > loaded via a namespace (and not attached): > [1] bitops_1.0-4.1 BSgenome_1.23.2 DBI_0.2-5 > RCurl_1.9-5 > [5] RSQLite_0.11.1 rtracklayer_1.15.6 tools_2.15.0 > XML_3.9-2 > [9] zlibbioc_1.1.0 >> > > -----Original Message----- > From: Michael Lawrence<lawrence.michael at="" gene.com=""> > To: Martin Morgan<mtmorgan at="" fhcrc.org=""> > Cc: Michael Lawrence<lawrence.michael at="" gene.com="">, bioconductor > <bioconductor at="" r-project.org=""> > Subject: Re: [BioC] Add extra columns to GRanges Metadata > Date: Mon, 20 Feb 2012 16:38:13 -0800 > > On Mon, Feb 20, 2012 at 4:33 PM, Martin Morgan<mtmorgan at="" fhcrc.org=""> wrote: > >> On 02/20/2012 11:50 AM, Tengfei Yin wrote: >> >>> +1 for Tim's methods.....especially the $ and $<- method, it's always in a >>> wishlist for me :) >>> >>> well, maybe "$ " could be sometimes confusing when people trying to treat >>> GRanges as a general format similar to data.frame, and do something like >>> gr$seqnames<- blabla, the good part about current API is that it remind me >>> the difference bettween elementMetadata and three required fields... but >>> >> >> GRanges extends Vector and >> >>> gr = GRanges("A", IRanges(1:10, 20), X=1:10, Y=1:10) >>> length(gr) >> [1] 10 >>> length(values(gr)) >> [1] 2 >> >> shows the geometry -- one would expect gr$foo to access the range named >> 'foo' rather than the column of elementMetadata named 'foo'. I think this >> is the crux of the resistance. >> >> > Right, thanks for reiterating this. The question is whether the convenience > of the shortcut outweighs the conceptual inconsistency. Probably depends > heavily on one's perspective. I would favor the perspective of the frequent > user. > > >> When Tim says >> >> >> On 02/20/2012 11:45 AM, Tim Triche, Jr. wrote: >>> if these don't belong in BiocGenerics, I don't know what does :-D >> >> I'm not sure if the reference is to "intersect" (which is already in >> BiocGenerics), "%i%" etc., or to $ and $<-. For $ and $<-, they don't need >> to be in BiocGenerics because they are already made generics by the methods >> package -- BiocGenerics is meant to provide a place to define generics that >> would otherwise be defined independently in multiple packages. >> >> Martin >> >> >> it will be convenient if general $ method support replacement? even for >>> seqnames, strand and ranges part? with default checking? >>> >>> other methods mentioned by Tim are also handy if default of those function >>> is under assumption. >>> >>> On Mon, Feb 20, 2012 at 1:04 PM, Michael Lawrence<lawrence.michael@**>>> gene.com<lawrence.michael at="" gene.com=""> >>> >>>> wrote: >>>> >>> >>> On Mon, Feb 20, 2012 at 10:36 AM, Steve Lianoglou< >>>> mailinglist.honeypot at gmail.com**> wrote: >>>> >>>> Hi Tim, >>>>> >>>>> These functions are quite handy ... definitely going to poach them. >>>>> >>>>> >>>> >>>> I wonder if we should keep poaching these in the wild or if Bioc Core >>>> will >>>> ever bow to popular pressure and allow their inclusion in the base >>>> packages? ;-) >>>> >>>> >>>> Also ... you need an editor with better code folding support ;-) >>>>> >>>>> Pual, >>>>> >>>>> Aslo, to do what you want to do, namely: >>>>> >>>>> >>>>>> >>>>> extra.blank<-matrix(data=NA,**nrow=length(a.grs),ncol=** >>>> length(missing.meta.cols)) >>>> >>>>> old.meta<-as.data.frame(**values(a.grs)) >>>>>> new.meta<-cbind(old.meta,**extra.blank) >>>>>> values(a.grs)<-new.meta >>>>>> >>>>> >>>>> Using vanilla GenomicRanges functionality, you just do this: >>>>> >>>>> R> values(a.grs)<- cbind(values(a.grs), DataFrame(extra.blank)) >>>>> >>>>> HTH, >>>>> >>>>> -steve >>>>> >>>>> -- >>>>> Steve Lianoglou >>>>> Graduate Student: Computational Systems Biology >>>>> | Memorial Sloan-Kettering Cancer Center >>>>> | Weill Medical College of Cornell University >>>>> Contact Info: http://cbio.mskcc.org/~lianos/**contact<http: cbi="" o.mskcc.org="" %7elianos="" contact=""> >>>>> >>>>> ______________________________**_________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: sta="" t.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.**science.biology.informatics.**cond uctor<http: news.gmane.org="" gmane.science.biology.informatics.conducto="" r=""> >>>>> >>>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives: >>>> http://news.gmane.org/gmane.**science.biology.informatics.**condu ctor<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>>> >>>> >>> >>> >>> >> >> -- >> Computational Biology >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >> >> Location: M1-B861 >> Telephone: 206 667-2793 >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 13.0 years ago Paul Leo ▴ 970

0

Entering edit mode

Juicy! Thanks -----Original Message----- From: Tim Triche, Jr. <tim.triche@gmail.com> Reply-to: <ttriche at="" usc.edu=""> To: p.leo at uq.edu.au Cc: bioconductor <bioconductor at="" r-project.org=""> Subject: Re: [BioC] Add extra columns to GRanges Metadata Date: Sun, 19 Feb 2012 23:06:59 -0800 Also handy, IMHO: # intersection setGeneric('%i%', # {{{ function(x, y) standardGeneric('%i%') ) # }}} setMethod('%i%', c('ANY','ANY'), function(x, y) { # {{{ intersect(x, y) }) # }}} # union setGeneric('%u%', # {{{ function(x, y) standardGeneric('%u%') ) # }}} setMethod('%u%', c('ANY','ANY'), function(x, y) { # {{{ union(x, y) }) # }}} # setdiff setGeneric('%d%', # {{{ function(x, y) standardGeneric('%d%') ) # }}} setMethod('%d%', c('ANY','ANY'), function(x, y) { # {{{ setdiff(x, y) }) # }}} # subsetByOverlaps setGeneric('%s%', # {{{ function(x, y) standardGeneric('%s%') ) # }}} setMethod('%s%', c('GRanges','GRanges'), function(x, y) { # {{{ subsetByOverlaps(x, y) }) # }}} Reason being that something like ( hESC.H3K4ME1 %s% hESC.P300 ) %u% ( hESC.H3K4ME3 %s% hESC.H3K27ME3) can be very handy at times, and all of these operators are rather naturally associative IMHO. On Sun, Feb 19, 2012 at 11:03 PM, Tim Triche, Jr. <tim.triche at="" gmail.com=""> wrote: setMethod("$", "GRanges", function(x, name) { # {{{ elementMetadata(x)[, name] }) # }}} setMethod("$<-", "GRanges", function(x, name, value) { # {{{ elementMetadata(x)[[ name ]] <- value return(x) }) # }}} problem solved :-) Of course you might want to play with SummarizedExperiment as well. On Sun, Feb 19, 2012 at 10:26 PM, Paul Leo <p.leo at="" uq.edu.au=""> wrote: I want to add extra column(s) to the elementMetaData of a GRanges object is there a short hand way of doing that? Other than extracting the meta data as a data frame ... adding what I want... and then reassigning: THAT IS NOT: extra.blank<-matrix(data=NA,nrow=length(a.grs),ncol=le ngth(missing.meta.cols)) old.meta<-as.data.frame(values(a.grs)) new.meta<-cbind(old.meta,extra.blank) values(a.grs)<-new.meta I'm missing something obvious ...! Thanks Paul Wish LIST: Other wise it would be extremely helpful to have a native c(..., .interesction) c(..., .union) where you could combine GRanges with different metaData , where intersection just kept the metadata in common and union just added NA for missing metavalues. .... for use in combining vcf files for say snp, and deletion adat where the meta data might have different attributes and it's not convenient to use a list.. _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- A model is a lie that helps you see the truth. Howard Skipper -- A model is a lie that helps you see the truth. Howard Skipper

ADD REPLY • link 13.0 years ago Paul Leo ▴ 970

Login before adding your answer.