Subscripting GenomicRanges objects with [[ or $


Tim Yates ▴ 250


Hi again,

One of the really nice things about the RangedData object is that it could be treated (in general) the same way you would treat a data.frame, so it was possible to write methods that handled both object types the same way.

I have a method which currently accepts a data.frame or a RangedData object which I want to extend to allowing GRanges objects as well.

Without the [[ or $ subscript operators being implemented, would I need to have a switch based on the class of the parameter?

As the values(obj)[['field']] method only works for GRanges objects (for RangedData, this method does not cause an error, it just returns NULL), I guess I would need to write something like this:

    .get.column = function( obj, field ) {
      if( class( obj ) == 'GRanges' ) {
        values(obj)[[ field ]]
      }
      else {
        obj[[ field ]]
      }
    }

Then, call

    .get.column(obj,'name')

wherever I used to simply use

    obj[['name']]

before introducing GenomicRanges?

Tim

On 27/08/2010 15:02, "Martin Morgan" <mtmorgan at fhcrc.org> wrote:

> On 08/27/2010 03:03 AM, Tim Yates wrote:
>> Hi Richard,
>>
>> Ahhh..cool, yeah that works. Shame it's not a unified interface across all
>> three datatypes though.
>
> These were intentional design decisions to reduce ambiguities in which
> of the components of these complex arguments subscript operations were
> meant to apply, in the long run making it easier to write unambiguous
> and easy to read code. Martin
>
>> Thanks for pointing me in the right direction though :-)
>>
>> Tim
>>
>> On 27/08/2010 10:31, "Richard Pearson" <richard.pearson at well.ox.ac.uk>
>> wrote:
>>
>>> Hi Tim
>>>
>>> I think you need the values accessor method here:
>>>
>>> print( values(my.gr)[[ 'name' ]] )
>>>
>>> Cheers
>>>
>>> Richard
>>>
>>> Tim Yates wrote:
>>>> Hi all,
>>>>
>>>> I'm trying to move to using GRanges objects for storing my genomic features
>>>> rather than IRanges objects that I use currently.
>>>>
>>>> However, I cannot seem to subscript the Genomic Ranges object to extract a
>>>> single column from the meta-data of the object.
>>>>
>>>> Hopefully this code explains what I am trying to do, and someone can point
>>>> me in the right direction?
>>>>
>>>> Cheers,
>>>>
>>>> Tim
>>>>
>>>> > library(GenomicRanges)
>>>> Loading required package: IRanges
>>>>
>>>> Attaching package: 'IRanges'
>>>>
>>>> The following object(s) are masked from package:base :
>>>>
>>>> cbind, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int, rbind, rep.int, table
>>>>
>>>> > library(GenomicRanges)
>>>> > my.starts = c( 10, 100, 1000 )
>>>> > my.ends = c( 20, 200, 2000 )
>>>> > my.spaces = c( '1', '2', '3' )
>>>> > my.strands = c( '+', '+', '-' )
>>>> > my.names = c( 'seq1', 'seq2', 'seq3' )
>>>> > my.delta = c( 1.23, 2.34, 3.45 )
>>>> >
>>>> > my.df = data.frame( start=my.starts, end=my.ends, space=my.spaces,
>>>>     strand=my.strands, name=my.names, delta=my.delta )
>>>> > my.rd = as( my.df, 'RangedData' )
>>>> > my.gr = as( my.rd, 'GRanges' )
>>>>
>>>> # Extract the name field from each of these objects using [[
>>>>
>>>> > print( my.df[[ 'name' ]] )
>>>> [1] seq1 seq2 seq3
>>>> Levels: seq1 seq2 seq3
>>>> > print( my.rd[[ 'name' ]] )
>>>> [1] seq1 seq2 seq3
>>>> Levels: seq1 seq2 seq3
>>>> > print( my.gr[[ 'name' ]] )
>>>> Error in my.gr[["name"]] : missing '[[' method for Sequence class GRanges
>>>>
>>>> # Extract the name field from each of these objects using $
>>>>
>>>> > print( my.df$'name' )
>>>> [1] seq1 seq2 seq3
>>>> Levels: seq1 seq2 seq3
>>>> > print( my.rd$'name' )
>>>> [1] seq1 seq2 seq3
>>>> Levels: seq1 seq2 seq3
>>>> > print( my.gr$'name' )
>>>> Error in x[[name, exact = FALSE]] :
>>>> missing '[[' method for Sequence class GRanges
>>>> > sessionInfo()
>>>> R version 2.10.1 (2009-12-14)
>>>> x86_64-apple-darwin9.8.0
>>>>
>>>> locale:
>>>> [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] GenomicRanges_1.0.8 IRanges_1.6.15
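For readers landing here later: the class switch in .get.column can also be expressed as S4 dispatch, so each class gets its own small method instead of one if/else. The following is only a sketch under assumptions. The generic name getColumn is made up for illustration, it assumes a GenomicRanges release in which mcols() is available (the thread itself uses values()), and it only exercises data.frame and GRanges.

```r
library(methods)
library(GenomicRanges)  # assumed to be installed from Bioconductor

# Hypothetical generic; getColumn is not part of any package.
setGeneric("getColumn", function(obj, field) standardGeneric("getColumn"))

# Default: anything that already supports list-style [[ (e.g. data.frame).
setMethod("getColumn", "ANY", function(obj, field) obj[[field]])

# GRanges: metadata columns live in the DataFrame returned by mcols().
setMethod("getColumn", "GRanges", function(obj, field) mcols(obj)[[field]])

my.df <- data.frame(start = c(10, 100, 1000), end = c(20, 200, 2000),
                    space = c("1", "2", "3"), strand = c("+", "+", "-"),
                    name = c("seq1", "seq2", "seq3"),
                    delta = c(1.23, 2.34, 3.45),
                    stringsAsFactors = FALSE)

# Build the GRanges directly rather than going through RangedData.
my.gr <- GRanges(seqnames = my.df$space,
                 ranges = IRanges(start = my.df$start, end = my.df$end),
                 strand = my.df$strand,
                 name = my.df$name, delta = my.df$delta)

df.names <- getColumn(my.df, "name")
gr.names <- getColumn(my.gr, "name")
```

Because dispatch uses inheritance, subclasses of GRanges would pick up the GRanges method, which a test of the form class(obj) == 'GRanges' would miss.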


Updated 14.4 years ago by Michael Lawrence ★ 11k • written 14.4 years ago by Tim Yates ▴ 250


Michael Lawrence ★ 11k


On Wed, Sep 1, 2010 at 3:07 AM, Tim Yates <tyates@picr.man.ac.uk> wrote:

> Hi again,
>
> One of the really nice things about the RangedData object is that it could
> be treated (in general) the same way you would treat a data.frame, so it was
> possible to write methods that handled both object types the same way.

This was one of the design goals. Unfortunately, RangedData has some strange behavior due to its internal structure. For example, it is not possible to reorder rows across spaces (chromosomes). Usually, this is not a big deal, but it can bite you. GRanges takes a simpler, flatter approach, but it was designed as a set of ranges with formal treatment of spaces, strands + extra information, rather than as a data frame with formal treatment of spaces and ranges (RangedData).

> I have a method which currently accepts a data.frame or a RangedData object
> which I want to extend to allowing GRanges objects as well
>
> Without the [[ or $ subscript operators being implemented would I need to
> have a switch based on the class of the parameter?
>
> As the values(obj)[['field']] method only works for GRanges objects (for
> RangedData, this method does not cause an error, it just returns NULL),

Yes, there is an unfortunate conflict here. values() for RangedData returns the DataFrameList, so its names are the names of the chromosomes. I think you're better off adding a [[ method for GRanges objects, rather than a .get.column().

Michael

Written 14.4 years ago by Michael Lawrence ★ 11k


I am not sure where the design will lead, but another aspect of GRanges is that it has an accompanying GRangesList class for housing information such as the constituent exons in a transcript. There is a benefit for developers and script writers to having a similar mechanism for extracting these metadata columns for both class types. For a GRangesList, the [[/$ operators pull out a GRanges object for the selected transcript. So even if [[ and $ methods were added for GRanges, there would still be an issue for GRangesList objects.

Cheers,
Patrick
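To make the GRangesList point concrete, here is a small sketch. It assumes a GenomicRanges release that provides mcols(), and the transcript names, the exon_name column and the biotype column are invented for illustration.

```r
library(GenomicRanges)  # assumed to be installed from Bioconductor

# Two hypothetical transcripts, each a GRanges of exons carrying an
# exon-level metadata column 'exon_name'.
tx1 <- GRanges(seqnames = "1",
               ranges = IRanges(start = c(10, 100), end = c(20, 200)),
               strand = "+", exon_name = c("e1", "e2"))
tx2 <- GRanges(seqnames = "3",
               ranges = IRanges(start = 1000, end = 2000),
               strand = "-", exon_name = "e3")
grl <- GRangesList(tx1 = tx1, tx2 = tx2)

# [[ selects a list element: the result is a GRanges, not a metadata column.
first.tx <- grl[["tx1"]]

# Exon-level columns are reached through an element or the flattened object.
tx1.exons <- mcols(first.tx)[["exon_name"]]
all.exons <- mcols(unlist(grl, use.names = FALSE))[["exon_name"]]

# Transcript-level metadata is stored separately, on the list itself.
mcols(grl)$biotype <- c("coding", "noncoding")
tx.biotype <- mcols(grl)[["biotype"]]
```

Since [[ already means "pick a transcript" on the list, a column-extracting [[ on GRanges would give the same operator two different meanings across the two classes, which is the issue raised above.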

Reply written 14.4 years ago by Patrick Aboyoun ★ 1.6k


I agree that inconsistencies are undesirable, but there are already enough inconsistencies between GRangesList and GRanges that writing a method for their union would not be a trivial exercise. In this case, it would only be a short-cut that would need to be avoided. A warning to this effect in the documentation may be sufficient.

Michael

Reply written 14.4 years ago by Michael Lawrence ★ 11k


After sleeping on it overnight, I think I might go the .get.column route It would be possible for me to do something like: If( length( showMethods( '[[', classes='RangedData', inherited=F, showEmpty=F, printTo=F ) ) == 0 ) { setMethod("[[", "GRanges", function(x, i, j, ...) { ...code... } } But I worry that this would firstly pollute the GRanges namespace globally from an external location (which could result in bugs that are hard to track down, but will be blamed on the GRanges package), and secondly break if the '[[' method was defined for GRanges elsewhere with a different meaning than I am expecting in my package. Cheers, Tim On 02/09/2010 20:08, "Michael Lawrence" <lawrence.michael at="" gene.com=""> wrote: I agree that inconsistencies are undesirable, but there are already enough inconsistencies between GRangesList and GRanges that writing a method for their union would not be a trivial exercise. In this case, it would only be a short-cut that would need to be avoided. A warning to this effect in the documentation may be sufficient. Michael On Wed, Sep 1, 2010 at 9:27 AM, Patrick Aboyoun <paboyoun at="" fhcrc.org=""> wrote: I am not sure where the design will lead, but another aspect of GRanges is that it has an accompanying GRangesList class for housing information such as the constituent exons in a transcript. There is a benefit for developers and script writers to having a similar mechanism for extracting these metadata columns for both class types. For a GRangesList, the [[/$ operators pull out a GRanges object for the selected transcript. So even if [[ and $ methods were added for GRanges, there would still be an issue for GRangesList objects. 
Cheers, Patrick Quoting Michael Lawrence <lawrence.michael at="" gene.com="">: On Wed, Sep 1, 2010 at 3:07 AM, Tim Yates <tyates at="" picr.man.ac.uk=""> wrote: Hi again, One of the really nice things about the RangedData object is that it could be treated (in general) the same way you would treat a data.frame, so it was possible to write methods that handled both object types the same way. This was one of the design goals. Unfortunately, RangedData has some strange behavior due to its internal structure. For example, it is not possible to reorder rows across spaces (chromosomes). Usually, this is not a big deal, but it can bite you. GRanges takes a simpler, flatter approach, but it was designed as a set of ranges with formal treatment of spaces, strands + extra information, rather than as a data frame with formal treatment of spaces and ranges (RangedData). I have a method which currently accepts a data.frame or a RangedData object which I want to extend to allowing GRanges objects as well Without the [[ or $ subscript operators being implemented would I need to have a switch based on the class of the parameter? As the values(obj)[['field']] method only works for GRanges objects (for RangedData, this method does not cause an error, it just returns NULL), Yes, there is an unfortunate conflict here. values() for RangedData returns the DataFrameList, so its names are the names of the chromosomes. I think you're better off adding a [[ method for GRanges objects, rather than a .get.column(). Michael I guess I would need to write something like this: .get.column = function( obj, field ) { if( class( obj ) == 'GRanges' ) { values(obj)[[ field ]] } else { obj[[ field ]] } } Then, call .get.column(obj,'name') wherever I used to simply use obj[['name']] before introducing GenomicRanges? Tim On 27/08/2010 15:02, "Martin Morgan" <mtmorgan at="" fhcrc.org=""> wrote: > On 08/27/2010 03:03 AM, Tim Yates wrote: >> Hi Richard, >> >> Ahhh..cool, yeah that works. 
Shame it's not a unified interface across all
>> three datatypes though.
>
> These were intentional design decisions to reduce ambiguities in which
> of the components of these complex arguments subscript operations were
> meant to apply, in the long run making it easier to write unambiguous
> and easy to read code. Martin
>
>>
>> Thanks for pointing me in the right direction though :-)
>>
>> Tim
>>
>> On 27/08/2010 10:31, "Richard Pearson" <richard.pearson at well.ox.ac.uk>
>> wrote:
>>
>>> Hi Tim
>>>
>>> I think you need the values accessor method here:
>>>
>>> print( values(my.gr)[[ 'name' ]] )
>>>
>>> Cheers
>>>
>>> Richard
>>>
>>>
>>> Tim Yates wrote:
>>>> Hi all,
>>>>
>>>> I'm trying to move to using GRanges objects for storing my genomic features
>>>> rather than IRanges objects that I use currently.
>>>>
>>>> However, I cannot seem to subscript the Genomic Ranges object to extract a
>>>> single column from the meta-data of the object.
>>>>
>>>> Hopefully this code explains what I am trying to do, and someone can point
>>>> me in the right direction?
>>>>
>>>> Cheers,
>>>>
>>>> Tim
>>>>
>>>>> library(GenomicRanges)
>>>> Loading required package: IRanges
>>>>
>>>> Attaching package: 'IRanges'
>>>>
>>>>
>>>> The following object(s) are masked from package:base :
>>>>
>>>> cbind,
>>>> Map,
>>>> mapply,
>>>> order,
>>>> paste,
>>>> pmax,
>>>> pmax.int,
>>>> pmin,
>>>> pmin.int,
>>>> rbind,
>>>> rep.int,
>>>> table
>>>>
>>>>> library(GenomicRanges)
>>>>> my.starts = c( 10, 100, 1000 )
>>>>> my.ends = c( 20, 200, 2000 )
>>>>> my.spaces = c( '1', '2', '3' )
>>>>> my.strands = c( '+', '+', '-' )
>>>>> my.names = c( 'seq1', 'seq2', 'seq3' )
>>>>> my.delta = c( 1.23, 2.34, 3.45 )
>>>>>
>>>>> my.df = data.frame( start=my.starts, end=my.ends, space=my.spaces,
>>>> strand=my.strands, name=my.names, delta=my.delta )
>>>>> my.rd = as( my.df, 'RangedData' )
>>>>> my.gr = as( my.rd, 'GRanges' )
>>>>>
>>>>
>>>> # Extract the name field from each of these objects using [[
>>>>
>>>>> print( my.df[[ 'name' ]] )
>>>> [1] seq1 seq2 seq3
>>>> Levels: seq1 seq2 seq3
>>>>> print( my.rd[[ 'name' ]] )
>>>> [1] seq1 seq2 seq3
>>>> Levels: seq1 seq2 seq3
>>>>> print( my.gr[[ 'name' ]] )
>>>> Error in my.gr[["name"]] : missing '[[' method for Sequence class GRanges
>>>>
>>>> # Extract the name field from each of these objects using $
>>>>
>>>>> print( my.df$'name' )
>>>> [1] seq1 seq2 seq3
>>>> Levels: seq1 seq2 seq3
>>>>> print( my.rd$'name' )
>>>> [1] seq1 seq2 seq3
>>>> Levels: seq1 seq2 seq3
>>>>> print( my.gr$'name' )
>>>> Error in x[[name, exact = FALSE]] :
>>>> missing '[[' method for Sequence class GRanges
>>>>> sessionInfo()
>>>> R version 2.10.1 (2009-12-14)
>>>> x86_64-apple-darwin9.8.0
>>>>
>>>> locale:
>>>> [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] GenomicRanges_1.0.8 IRanges_1.6.15
>>>> --------------------------------------------------------
>>>> This email is confidential and intended solely for the u...{{dropped:15}}
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
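For anyone reading along, Richard's suggestion from the quoted thread can be written out roughly as follows. This is only a minimal sketch, assuming a GenomicRanges installation that provides the `GRanges()` constructor and the `values()` accessor; the object is built directly rather than via `RangedData` as in the original post, and the variable names `col.names` and `col.delta` are made up for illustration.

```r
library(GenomicRanges)

# Same toy data as in the original post, built directly as a GRanges
# (an assumption: the post went data.frame -> RangedData -> GRanges).
my.gr <- GRanges(
  seqnames = c("1", "2", "3"),
  ranges   = IRanges(start = c(10, 100, 1000), end = c(20, 200, 2000)),
  strand   = c("+", "+", "-"),
  name     = c("seq1", "seq2", "seq3"),   # extra arguments become metadata columns
  delta    = c(1.23, 2.34, 3.45)
)

# The metadata columns are held in a separate table returned by values();
# subscript that table rather than the GRanges object itself.
col.names <- values(my.gr)[["name"]]
col.delta <- values(my.gr)[["delta"]]

print(col.names)
print(col.delta)
```

The point is that `[[` is applied to the metadata table, not to the ranges object, which is why `my.gr[[ 'name' ]]` fails in the session above while `values(my.gr)[[ 'name' ]]` does not.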

14.4 years ago Tim Yates ▴ 250


On Fri, Sep 3, 2010 at 1:04 AM, Tim Yates <tyates@picr.man.ac.uk> wrote:

> After sleeping on it overnight, I think I might go the .get.column route
>
> It would be possible for me to do something like:
>
> if( length( showMethods( '[[', classes='RangedData', inherited=F,
>                          showEmpty=F, printTo=F ) ) == 0 ) {
>   setMethod("[[", "GRanges",
>             function(x, i, j, ...) { ...code... } )
> }
>
> But I worry that this would firstly pollute the GRanges namespace globally
> from an external location (which could result in bugs that are hard to track
> down, but will be blamed on the GRanges package), and secondly break if the
> '[[' method was defined for GRanges elsewhere with a different meaning than
> I am expecting in my package.

I think your two worries are the same worries shared by every R package.
Name collisions are always a possibility, even with namespaces, unless the
user avoids attaching packages and qualifies every function call with its
namespace (obviously unrealistic). Furthermore, the ability to extend another
package by adding methods on a class is a feature that is worth the added
complexity, in my opinion. That said, conceptually it would not make sense
for an arbitrary package to add a [[ method on GRanges, i.e., it would belong
in a package that primarily provides general extensions. You're always free
to keep the [[ within your package namespace (might not be possible due to
lack of granularity in export of methods).

> Cheers,
>
> Tim
>
> On 02/09/2010 20:08, "Michael Lawrence" <lawrence.michael@gene.com> wrote:
>
>> I agree that inconsistencies are undesirable, but there are already
>> enough inconsistencies between GRangesList and GRanges that writing a method
>> for their union would not be a trivial exercise. In this case, it would only
>> be a short-cut that would need to be avoided. A warning to this effect in
>> the documentation may be sufficient.
>>
>> Michael
>>
>> On Wed, Sep 1, 2010 at 9:27 AM, Patrick Aboyoun <paboyoun@fhcrc.org> wrote:
>>
>>> I am not sure where the design will lead, but another aspect of GRanges
>>> is that it has an accompanying GRangesList class for housing information
>>> such as the constituent exons in a transcript. There is a benefit for
>>> developers and script writers to having a similar mechanism for extracting
>>> these metadata columns for both class types. For a GRangesList, the [[/$
>>> operators pull out a GRanges object for the selected transcript. So even
>>> if [[ and $ methods were added for GRanges, there would still be an issue
>>> for GRangesList objects.
>>>
>>> Cheers,
>>> Patrick
>>>
>>> Quoting Michael Lawrence <lawrence.michael@gene.com>:
>>>
>>>> On Wed, Sep 1, 2010 at 3:07 AM, Tim Yates <tyates@picr.man.ac.uk> wrote:
>>>>
>>>>> Hi again,
>>>>>
>>>>> One of the really nice things about the RangedData object is that it
>>>>> could be treated (in general) the same way you would treat a data.frame,
>>>>> so it was possible to write methods that handled both object types the
>>>>> same way.
>>>>
>>>> This was one of the design goals. Unfortunately, RangedData has some
>>>> strange behavior due to its internal structure. For example, it is not
>>>> possible to reorder rows across spaces (chromosomes). Usually, this is
>>>> not a big deal, but it can bite you. GRanges takes a simpler, flatter
>>>> approach, but it was designed as a set of ranges with formal treatment of
>>>> spaces, strands + extra information, rather than as a data frame with
>>>> formal treatment of spaces and ranges (RangedData).
>>>>
>>>>> I have a method which currently accepts a data.frame or a RangedData
>>>>> object which I want to extend to allowing GRanges objects as well
>>>>>
>>>>> Without the [[ or $ subscript operators being implemented would I need
>>>>> to have a switch based on the class of the parameter?
>>>>>
>>>>> As the values(obj)[['field']] method only works for GRanges objects (for
>>>>> RangedData, this method does not cause an error, it just returns NULL),
>>>>
>>>> Yes, there is an unfortunate conflict here. values() for RangedData
>>>> returns the DataFrameList, so its names are the names of the chromosomes.
>>>> I think you're better off adding a [[ method for GRanges objects, rather
>>>> than a .get.column().
>>>>
>>>> Michael
>>>>
>>>>> I guess I would need to write something like this:
>>>>>
>>>>> .get.column = function( obj, field ) {
>>>>>   if( class( obj ) == 'GRanges' ) {
>>>>>     values(obj)[[ field ]]
>>>>>   }
>>>>>   else {
>>>>>     obj[[ field ]]
>>>>>   }
>>>>> }
>>>>>
>>>>> Then, call
>>>>>
>>>>> .get.column(obj,'name')
>>>>>
>>>>> wherever I used to simply use
>>>>>
>>>>> obj[['name']]
>>>>>
>>>>> before introducing GenomicRanges?
>>>>>
>>>>> Tim
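A possible middle ground between the two options discussed above, offered only as a sketch: keep the column lookup behind a generic that your own package owns, and let S4 dispatch replace the `class( obj ) == 'GRanges'` test (which can misbehave when `class()` returns more than one value or when a subclass is passed). Nothing here touches `[[` on a class another package defines. The class `ToyRanges` and the generic `getColumn` are hypothetical stand-ins so the example runs with base R alone; for a real GRanges the method body would presumably be `values(obj)[[ field ]]` as in the thread.

```r
library(methods)

# Hypothetical stand-in for GRanges: the metadata columns sit in a slot,
# not in the object itself.
setClass("ToyRanges", representation(start = "numeric", meta = "data.frame"))

# A generic owned by your package; the name getColumn is illustrative.
setGeneric("getColumn", function(obj, field) standardGeneric("getColumn"))

# Default method: anything that already supports [[ (data.frame, list, ...).
setMethod("getColumn", "ANY", function(obj, field) obj[[field]])

# Class-specific method: look the column up in the metadata slot instead.
setMethod("getColumn", "ToyRanges", function(obj, field) obj@meta[[field]])

df <- data.frame(start = c(10, 100), name = c("seq1", "seq2"),
                 stringsAsFactors = FALSE)
tr <- new("ToyRanges", start = df$start, meta = df["name"])

from.df <- getColumn(df, "name")   # dispatches to the default method
from.tr <- getColumn(tr, "name")   # dispatches to the ToyRanges method
print(from.df)
print(from.tr)
```

Calling code then uses `getColumn(obj, 'name')` everywhere, and supporting another container later means adding one method rather than another branch.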

14.4 years ago Michael Lawrence ★ 11k
