reducing RangedData

0

Entering edit mode

Robert Castelo ★ 3.4k

@rcastelo

Last seen 9 days ago

Barcelona/Universitat Pompeu Fabra

dear list, i'd like to project a set of exons on genomic space but keeping the strand information. let's assume for simplicity that no exons overlap in opposite strands so no conflicts need to be resolved regarding that case. in principle this is easy without keeping the strand using a RangeList and the reduce method. however i've been struggling for a while without success to figure out how can i "reduce" my coordinates without loosing the strand information. my guess is that to carry out the strand information i need to use a RangedData object: sta <- c(1, 1, 1) end <- c(3, 4, 5) str <- c("+", "+", "-") rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr) rd RangedData: 3 ranges by 1 columns columns(1): strand sequences(2): chr1 chr2 and then use rdapply to reduce on each of the chromosomes but i either don't know how to directly apply reduce: params <- RDApplyParams(rd, reduce) rdapply(params) Error in function (classes, fdef, mtable) : unable to find an inherited method for function "reduce", for signature "RangedData" or i loose the strand information: params <- RDApplyParams(rd, function(rd) reduce(ranges(rd))) rdapply(params) $chr1 RangesList: 1 range 1: 1:4 $chr2 RangesList: 1 range 1: 1:5 thanks for your help! robert. sessionInfo() R version 2.9.1 (2009-06-26) i386-apple-darwin8.11.1 locale: en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] rtracklayer_1.4.1 RCurl_0.98-1 [3] bitops_1.0-4.1 GOstats_2.10.0 [5] graph_1.22.2 Category_2.10.1 [7] geneplotter_1.22.0 lattice_0.17-25 [9] limma_2.18.2 bgSpJncArrayDm1.db_1.0.1 [11] org.Dm.eg.db_2.2.11 RSQLite_0.7-1 [13] DBI_0.2-4 BSgenome.Dmelanogaster.UCSC.dm1_1.0.0 [15] BSgenome_1.12.3 RColorBrewer_1.0-2 [17] Biostrings_2.12.8 IRanges_1.2.3 [19] annotate_1.22.0 AnnotationDbi_1.6.1 [21] Biobase_2.4.1 xtable_1.5-5 loaded via a namespace (and not attached): [1] genefilter_1.24.2 GO.db_2.2.11 grid_2.9.1 GSEABase_1.6.1 [5] RBGL_1.20.0 splines_2.9.1 survival_2.35-4 tools_2.9.1 [9] XML_2.5-3

GO GO • 1.5k views

ADD COMMENT • link 15.5 years ago Robert Castelo ★ 3.4k

0

Entering edit mode

Patrick Aboyoun ★ 1.6k

@patrick-aboyoun-6734

Last seen 10.6 years ago

United States

Robert, I wrote you some code that you can use below. I've seen other reduction operations where the user wanted to reduce a RangedData type object by more than one column (e.g. strand and score), so perhaps an appropriate signature for a reduce method for RangedData would be reduce(x, by, ...) If this method where written, the remaining question would be should it support reducing transformations of the remaining values columns or should non "by" columns be dropped from the output. I'm inclined towards the latter, since a reduce(x, by, valuesFUN) method wouldn't be much simpler than the rdapply method given below. Would a straight-forward reduce(x = rd, by = "strand") approach for RangedData meet your needs? > suppressMessages(library(IRanges)) > sta <- c(1, 2, 5, 6, 2, 3, 4, 5) > end <- c(3, 4, 5, 7, 2, 4, 6, 7) > str <- rep(c("+", "+", "-", "-"), 2) > chr <- rep(c("chr1", "chr2"), each = 4) > rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr) > rd RangedData: 8 ranges by 1 columns columns(1): strand sequences(2): chr1 chr2 > > # Interesting bit starts here > reduceStranded <- function(x) { + strand <- x$strand + rngs <- ranges(x)[[1]] + ans <- IRangesList("+" = reduce(rngs[strand == "+"]), + "-" = reduce(rngs[strand == "-"])) + RangedData(unlist(ans, use.names = FALSE), + strand = rep(c("+", "-"), elementLengths(ans)), space = names(x)) + } > params <- RDApplyParams(rd, reduceStranded, reducerFun = function(x) do.call(c, unname(x))) > rdapply(params) RangedData: 4 ranges by 1 columns columns(1): strand sequences(2): chr1 chr2 > as.data.frame(rdapply(params)) space start end width strand 1 chr1 1 4 4 + 2 chr1 5 7 3 - 3 chr2 2 4 3 + 4 chr2 4 7 4 - > sessionInfo() R version 2.9.2 Patched (2009-08-24 r49420) i386-apple-darwin9.8.0 locale: C/en_US.UTF-8/C/C/C/C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] IRanges_1.2.3 Robert Castelo wrote: > dear list, > > i'd like to project a set of exons on genomic space but keeping > the strand information. let's assume for simplicity that no exons > overlap in opposite strands so no conflicts need to be resolved > regarding that case. in principle this is easy without keeping the > strand using a RangeList and the reduce method. however i've been > struggling for a while without success to figure out how can i > "reduce" my coordinates without loosing the strand information. > > my guess is that to carry out the strand information i need to > use a RangedData object: > > sta <- c(1, 1, 1) > end <- c(3, 4, 5) > str <- c("+", "+", "-") > rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr) > rd > RangedData: 3 ranges by 1 columns > columns(1): strand > sequences(2): chr1 chr2 > > and then use rdapply to reduce on each of the chromosomes but i > either don't know how to directly apply reduce: > > params <- RDApplyParams(rd, reduce) > rdapply(params) > Error in function (classes, fdef, mtable) : > unable to find an inherited method for function "reduce", for > signature "RangedData" > > > or i loose the strand information: > > params <- RDApplyParams(rd, function(rd) reduce(ranges(rd))) > rdapply(params) > $chr1 > RangesList: 1 range > 1: 1:4 > > $chr2 > RangesList: 1 range > 1: 1:5 > > thanks for your help! > robert. > > > > sessionInfo() > R version 2.9.1 (2009-06-26) > i386-apple-darwin8.11.1 > > locale: > en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] rtracklayer_1.4.1 RCurl_0.98-1 > [3] bitops_1.0-4.1 GOstats_2.10.0 > [5] graph_1.22.2 Category_2.10.1 > [7] geneplotter_1.22.0 lattice_0.17-25 > [9] limma_2.18.2 bgSpJncArrayDm1.db_1.0.1 > [11] org.Dm.eg.db_2.2.11 RSQLite_0.7-1 > [13] DBI_0.2-4 > BSgenome.Dmelanogaster.UCSC.dm1_1.0.0 > [15] BSgenome_1.12.3 RColorBrewer_1.0-2 > [17] Biostrings_2.12.8 IRanges_1.2.3 > [19] annotate_1.22.0 AnnotationDbi_1.6.1 > [21] Biobase_2.4.1 xtable_1.5-5 > > loaded via a namespace (and not attached): > [1] genefilter_1.24.2 GO.db_2.2.11 grid_2.9.1 GSEABase_1.6.1 > [5] RBGL_1.20.0 splines_2.9.1 survival_2.35-4 tools_2.9.1 > [9] XML_2.5-3 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 15.5 years ago Patrick Aboyoun ★ 1.6k

0

Entering edit mode

Robert Castelo ★ 3.4k

@rcastelo

Last seen 9 days ago

Barcelona/Universitat Pompeu Fabra

hi Patrick, thanks a lot, the concise code snippet below is what i was looking for, i'd had written at least twice as much to do the same job :) regarding whether it would be nice to have a reduction operation with a signature that includes other stuff i think that for the particular case of the strand it is definitely useful to project (reduce in RangedData terminology) genomic coordinates on genomic space taking it into account because many of the functional elements in the genome are stranded and projecting doesn't imply always that one is not interested in the strand of the projected intervals. with respect to whether other information contained in additional columns should somehow be "reduced" i think it would suffice to enable the reduction of the "names" of a RangedData (taking, for instance, arbitrarily one name between two overlapping ranges) because those names would allow to go back to the original (un-reduced) data and pull whatever we're interested in (actually, including the strand). my two-cents about this, and thanks again for your quick help. robert. On Thu, 2009-10-22 at 00:58 -0700, Patrick Aboyoun wrote: > Robert, > I wrote you some code that you can use below. I've seen other reduction > operations where the user wanted to reduce a RangedData type object by > more than one column (e.g. strand and score), so perhaps an appropriate > signature for a reduce method for RangedData would be > > reduce(x, by, ...) > > If this method where written, the remaining question would be should it > support reducing transformations of the remaining values columns or > should non "by" columns be dropped from the output. I'm inclined towards > the latter, since a reduce(x, by, valuesFUN) method wouldn't be much > simpler than the rdapply method given below. Would a straight- forward > reduce(x = rd, by = "strand") approach for RangedData meet your needs? > > > > suppressMessages(library(IRanges)) > > sta <- c(1, 2, 5, 6, 2, 3, 4, 5) > > end <- c(3, 4, 5, 7, 2, 4, 6, 7) > > str <- rep(c("+", "+", "-", "-"), 2) > > chr <- rep(c("chr1", "chr2"), each = 4) > > rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr) > > rd > RangedData: 8 ranges by 1 columns > columns(1): strand > sequences(2): chr1 chr2 > > > > # Interesting bit starts here > > reduceStranded <- function(x) { > + strand <- x$strand > + rngs <- ranges(x)[[1]] > + ans <- IRangesList("+" = reduce(rngs[strand == "+"]), > + "-" = reduce(rngs[strand == "-"])) > + RangedData(unlist(ans, use.names = FALSE), > + strand = rep(c("+", "-"), elementLengths(ans)), > space = names(x)) > + } > > params <- RDApplyParams(rd, reduceStranded, reducerFun = function(x) > do.call(c, unname(x))) > > rdapply(params) > RangedData: 4 ranges by 1 columns > columns(1): strand > sequences(2): chr1 chr2 > > as.data.frame(rdapply(params)) > space start end width strand > 1 chr1 1 4 4 + > 2 chr1 5 7 3 - > 3 chr2 2 4 3 + > 4 chr2 4 7 4 - > > sessionInfo() > R version 2.9.2 Patched (2009-08-24 r49420) > i386-apple-darwin9.8.0 > > locale: > C/en_US.UTF-8/C/C/C/C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] IRanges_1.2.3 > > > > > Robert Castelo wrote: > > dear list, > > > > i'd like to project a set of exons on genomic space but keeping > > the strand information. let's assume for simplicity that no exons > > overlap in opposite strands so no conflicts need to be resolved > > regarding that case. in principle this is easy without keeping the > > strand using a RangeList and the reduce method. however i've been > > struggling for a while without success to figure out how can i > > "reduce" my coordinates without loosing the strand information. > > > > my guess is that to carry out the strand information i need to > > use a RangedData object: > > > > sta <- c(1, 1, 1) > > end <- c(3, 4, 5) > > str <- c("+", "+", "-") > > rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr) > > rd > > RangedData: 3 ranges by 1 columns > > columns(1): strand > > sequences(2): chr1 chr2 > > > > and then use rdapply to reduce on each of the chromosomes but i > > either don't know how to directly apply reduce: > > > > params <- RDApplyParams(rd, reduce) > > rdapply(params) > > Error in function (classes, fdef, mtable) : > > unable to find an inherited method for function "reduce", for > > signature "RangedData" > > > > > > or i loose the strand information: > > > > params <- RDApplyParams(rd, function(rd) reduce(ranges(rd))) > > rdapply(params) > > $chr1 > > RangesList: 1 range > > 1: 1:4 > > > > $chr2 > > RangesList: 1 range > > 1: 1:5 > > > > thanks for your help! > > robert. > > > > > > > > sessionInfo() > > R version 2.9.1 (2009-06-26) > > i386-apple-darwin8.11.1 > > > > locale: > > en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > > > > other attached packages: > > [1] rtracklayer_1.4.1 RCurl_0.98-1 > > [3] bitops_1.0-4.1 GOstats_2.10.0 > > [5] graph_1.22.2 Category_2.10.1 > > [7] geneplotter_1.22.0 lattice_0.17-25 > > [9] limma_2.18.2 bgSpJncArrayDm1.db_1.0.1 > > [11] org.Dm.eg.db_2.2.11 RSQLite_0.7-1 > > [13] DBI_0.2-4 > > BSgenome.Dmelanogaster.UCSC.dm1_1.0.0 > > [15] BSgenome_1.12.3 RColorBrewer_1.0-2 > > [17] Biostrings_2.12.8 IRanges_1.2.3 > > [19] annotate_1.22.0 AnnotationDbi_1.6.1 > > [21] Biobase_2.4.1 xtable_1.5-5 > > > > loaded via a namespace (and not attached): > > [1] genefilter_1.24.2 GO.db_2.2.11 grid_2.9.1 GSEABase_1.6.1 > > [5] RBGL_1.20.0 splines_2.9.1 survival_2.35-4 tools_2.9.1 > > [9] XML_2.5-3 > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > >

ADD COMMENT • link 15.5 years ago Robert Castelo ★ 3.4k

0

Entering edit mode

Robert, To address your case of data extraction from the original RangedData object, I would recommend using "[" with i = <<rangeslist>>, e.g. origRD[ranges(reducedRd),]. The situation becomes more involved when, for example, someone wants to sum up the "score" column values for all the ranges that overlapped. These data summary operations are open ended and don't lend themselves to simple interfaces, although we can provide targeted functions, or add arguments to the reduce function, that support specific data summary operations on ancillary columns. Patrick Robert Castelo wrote: > hi Patrick, > > thanks a lot, the concise code snippet below is what i was looking for, > i'd had written at least twice as much to do the same job :) > > regarding whether it would be nice to have a reduction operation with a > signature that includes other stuff i think that for the particular case > of the strand it is definitely useful to project (reduce in RangedData > terminology) genomic coordinates on genomic space taking it into account > because many of the functional elements in the genome are stranded and > projecting doesn't imply always that one is not interested in the strand > of the projected intervals. > > with respect to whether other information contained in additional > columns should somehow be "reduced" i think it would suffice to enable > the reduction of the "names" of a RangedData (taking, for instance, > arbitrarily one name between two overlapping ranges) because those names > would allow to go back to the original (un-reduced) data and pull > whatever we're interested in (actually, including the strand). > > my two-cents about this, and thanks again for your quick help. > > robert. > > On Thu, 2009-10-22 at 00:58 -0700, Patrick Aboyoun wrote: > >> Robert, >> I wrote you some code that you can use below. I've seen other reduction >> operations where the user wanted to reduce a RangedData type object by >> more than one column (e.g. strand and score), so perhaps an appropriate >> signature for a reduce method for RangedData would be >> >> reduce(x, by, ...) >> >> If this method where written, the remaining question would be should it >> support reducing transformations of the remaining values columns or >> should non "by" columns be dropped from the output. I'm inclined towards >> the latter, since a reduce(x, by, valuesFUN) method wouldn't be much >> simpler than the rdapply method given below. Would a straight- forward >> reduce(x = rd, by = "strand") approach for RangedData meet your needs? >> >> >> > suppressMessages(library(IRanges)) >> > sta <- c(1, 2, 5, 6, 2, 3, 4, 5) >> > end <- c(3, 4, 5, 7, 2, 4, 6, 7) >> > str <- rep(c("+", "+", "-", "-"), 2) >> > chr <- rep(c("chr1", "chr2"), each = 4) >> > rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr) >> > rd >> RangedData: 8 ranges by 1 columns >> columns(1): strand >> sequences(2): chr1 chr2 >> > >> > # Interesting bit starts here >> > reduceStranded <- function(x) { >> + strand <- x$strand >> + rngs <- ranges(x)[[1]] >> + ans <- IRangesList("+" = reduce(rngs[strand == "+"]), >> + "-" = reduce(rngs[strand == "-"])) >> + RangedData(unlist(ans, use.names = FALSE), >> + strand = rep(c("+", "-"), elementLengths(ans)), >> space = names(x)) >> + } >> > params <- RDApplyParams(rd, reduceStranded, reducerFun = function(x) >> do.call(c, unname(x))) >> > rdapply(params) >> RangedData: 4 ranges by 1 columns >> columns(1): strand >> sequences(2): chr1 chr2 >> > as.data.frame(rdapply(params)) >> space start end width strand >> 1 chr1 1 4 4 + >> 2 chr1 5 7 3 - >> 3 chr2 2 4 3 + >> 4 chr2 4 7 4 - >> > sessionInfo() >> R version 2.9.2 Patched (2009-08-24 r49420) >> i386-apple-darwin9.8.0 >> >> locale: >> C/en_US.UTF-8/C/C/C/C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] IRanges_1.2.3 >> >> >> >> >> Robert Castelo wrote: >> >>> dear list, >>> >>> i'd like to project a set of exons on genomic space but keeping >>> the strand information. let's assume for simplicity that no exons >>> overlap in opposite strands so no conflicts need to be resolved >>> regarding that case. in principle this is easy without keeping the >>> strand using a RangeList and the reduce method. however i've been >>> struggling for a while without success to figure out how can i >>> "reduce" my coordinates without loosing the strand information. >>> >>> my guess is that to carry out the strand information i need to >>> use a RangedData object: >>> >>> sta <- c(1, 1, 1) >>> end <- c(3, 4, 5) >>> str <- c("+", "+", "-") >>> rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr) >>> rd >>> RangedData: 3 ranges by 1 columns >>> columns(1): strand >>> sequences(2): chr1 chr2 >>> >>> and then use rdapply to reduce on each of the chromosomes but i >>> either don't know how to directly apply reduce: >>> >>> params <- RDApplyParams(rd, reduce) >>> rdapply(params) >>> Error in function (classes, fdef, mtable) : >>> unable to find an inherited method for function "reduce", for >>> signature "RangedData" >>> >>> >>> or i loose the strand information: >>> >>> params <- RDApplyParams(rd, function(rd) reduce(ranges(rd))) >>> rdapply(params) >>> $chr1 >>> RangesList: 1 range >>> 1: 1:4 >>> >>> $chr2 >>> RangesList: 1 range >>> 1: 1:5 >>> >>> thanks for your help! >>> robert. >>> >>> >>> >>> sessionInfo() >>> R version 2.9.1 (2009-06-26) >>> i386-apple-darwin8.11.1 >>> >>> locale: >>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] rtracklayer_1.4.1 RCurl_0.98-1 >>> [3] bitops_1.0-4.1 GOstats_2.10.0 >>> [5] graph_1.22.2 Category_2.10.1 >>> [7] geneplotter_1.22.0 lattice_0.17-25 >>> [9] limma_2.18.2 bgSpJncArrayDm1.db_1.0.1 >>> [11] org.Dm.eg.db_2.2.11 RSQLite_0.7-1 >>> [13] DBI_0.2-4 >>> BSgenome.Dmelanogaster.UCSC.dm1_1.0.0 >>> [15] BSgenome_1.12.3 RColorBrewer_1.0-2 >>> [17] Biostrings_2.12.8 IRanges_1.2.3 >>> [19] annotate_1.22.0 AnnotationDbi_1.6.1 >>> [21] Biobase_2.4.1 xtable_1.5-5 >>> >>> loaded via a namespace (and not attached): >>> [1] genefilter_1.24.2 GO.db_2.2.11 grid_2.9.1 GSEABase_1.6.1 >>> [5] RBGL_1.20.0 splines_2.9.1 survival_2.35-4 tools_2.9.1 >>> [9] XML_2.5-3 >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> > >

ADD REPLY • link 15.5 years ago Patrick Aboyoun ★ 1.6k

0

Entering edit mode

Robert, I just added a reduce method for RangedData objects to BioC 2.5 for R 2.10, which is the current development branch and will become the release branch on October 28. The signature for this method is reduce(x, by, with.inframe.attrib = FALSE) where by specifies the grouping variables from the values columns. So if you just have strand in your values columns, a simple reduce(rd) call will do. If you have columns in addition to your strand column that you don't want to use for grouping during the reduction operation, the call would become reduce(rd, "strand"). I also recently changed the show methods for many IRanges objects to display the first 10 or so values so the user feels more connected to their data. Here is what the code looks like now: > suppressMessages(library(IRanges)) > sta <- c(1, 2, 5, 6, 2, 3, 4, 5) > end <- c(3, 4, 5, 7, 2, 4, 6, 7) > str <- rep(c("+", "+", "-", "-"), 2) > chr <- rep(c("chr1", "chr2"), each = 4) > rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr) > reduce(rd) RangedData with 4 rows and 1 value column across 2 spaces space ranges | strand <character> <iranges> | <character> 1 chr1 [5, 7] | - 2 chr1 [1, 4] | + 3 chr2 [4, 7] | - 4 chr2 [2, 4] | + > # if strand is a factor variable, the results are > reduce(RangedData(IRanges(start=sta, end=end), strand=factor(str), space=chr)) RangedData with 4 rows and 1 value column across 2 spaces space ranges | strand <character> <iranges> | <factor> 1 chr1 [5, 7] | - 2 chr1 [1, 4] | + 3 chr2 [4, 7] | - 4 chr2 [2, 4] | + > sessionInfo() R version 2.10.0 RC (2009-10-18 r50160) i386-apple-darwin9.8.0 locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] IRanges_1.3.97 Patrick Aboyoun wrote: > Robert, > To address your case of data extraction from the original RangedData > object, I would recommend using "[" with i = <<rangeslist>>, e.g. > origRD[ranges(reducedRd),]. > > The situation becomes more involved when, for example, someone wants > to sum up the "score" column values for all the ranges that > overlapped. These data summary operations are open ended and don't > lend themselves to simple interfaces, although we can provide targeted > functions, or add arguments to the reduce function, that support > specific data summary operations on ancillary columns. > > > Patrick > > > Robert Castelo wrote: >> hi Patrick, >> >> thanks a lot, the concise code snippet below is what i was looking for, >> i'd had written at least twice as much to do the same job :) >> >> regarding whether it would be nice to have a reduction operation with a >> signature that includes other stuff i think that for the particular case >> of the strand it is definitely useful to project (reduce in RangedData >> terminology) genomic coordinates on genomic space taking it into account >> because many of the functional elements in the genome are stranded and >> projecting doesn't imply always that one is not interested in the strand >> of the projected intervals. >> >> with respect to whether other information contained in additional >> columns should somehow be "reduced" i think it would suffice to enable >> the reduction of the "names" of a RangedData (taking, for instance, >> arbitrarily one name between two overlapping ranges) because those names >> would allow to go back to the original (un-reduced) data and pull >> whatever we're interested in (actually, including the strand). >> >> my two-cents about this, and thanks again for your quick help. >> >> robert. >> >> On Thu, 2009-10-22 at 00:58 -0700, Patrick Aboyoun wrote: >> >>> Robert, >>> I wrote you some code that you can use below. I've seen other >>> reduction operations where the user wanted to reduce a RangedData >>> type object by more than one column (e.g. strand and score), so >>> perhaps an appropriate signature for a reduce method for RangedData >>> would be >>> >>> reduce(x, by, ...) >>> >>> If this method where written, the remaining question would be should >>> it support reducing transformations of the remaining values columns >>> or should non "by" columns be dropped from the output. I'm inclined >>> towards the latter, since a reduce(x, by, valuesFUN) method wouldn't >>> be much simpler than the rdapply method given below. Would a >>> straight-forward reduce(x = rd, by = "strand") approach for >>> RangedData meet your needs? >>> >>> >>> > suppressMessages(library(IRanges)) >>> > sta <- c(1, 2, 5, 6, 2, 3, 4, 5) >>> > end <- c(3, 4, 5, 7, 2, 4, 6, 7) >>> > str <- rep(c("+", "+", "-", "-"), 2) >>> > chr <- rep(c("chr1", "chr2"), each = 4) >>> > rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr) >>> > rd >>> RangedData: 8 ranges by 1 columns >>> columns(1): strand >>> sequences(2): chr1 chr2 >>> > >>> > # Interesting bit starts here >>> > reduceStranded <- function(x) { >>> + strand <- x$strand >>> + rngs <- ranges(x)[[1]] >>> + ans <- IRangesList("+" = reduce(rngs[strand == "+"]), >>> + "-" = reduce(rngs[strand == "-"])) >>> + RangedData(unlist(ans, use.names = FALSE), >>> + strand = rep(c("+", "-"), elementLengths(ans)), >>> space = names(x)) >>> + } > params <- RDApplyParams(rd, reduceStranded, reducerFun = >>> function(x) do.call(c, unname(x))) >>> > rdapply(params) >>> RangedData: 4 ranges by 1 columns >>> columns(1): strand >>> sequences(2): chr1 chr2 >>> > as.data.frame(rdapply(params)) >>> space start end width strand >>> 1 chr1 1 4 4 + >>> 2 chr1 5 7 3 - >>> 3 chr2 2 4 3 + >>> 4 chr2 4 7 4 - >>> > sessionInfo() >>> R version 2.9.2 Patched (2009-08-24 r49420) >>> i386-apple-darwin9.8.0 >>> >>> locale: >>> C/en_US.UTF-8/C/C/C/C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> other attached packages: >>> [1] IRanges_1.2.3 >>> >>> >>> >>> >>> Robert Castelo wrote: >>> >>>> dear list, >>>> >>>> i'd like to project a set of exons on genomic space but keeping >>>> the strand information. let's assume for simplicity that no exons >>>> overlap in opposite strands so no conflicts need to be resolved >>>> regarding that case. in principle this is easy without keeping the >>>> strand using a RangeList and the reduce method. however i've been >>>> struggling for a while without success to figure out how can i >>>> "reduce" my coordinates without loosing the strand information. >>>> >>>> my guess is that to carry out the strand information i need to >>>> use a RangedData object: >>>> >>>> sta <- c(1, 1, 1) >>>> end <- c(3, 4, 5) >>>> str <- c("+", "+", "-") >>>> rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr) >>>> rd >>>> RangedData: 3 ranges by 1 columns >>>> columns(1): strand >>>> sequences(2): chr1 chr2 >>>> >>>> and then use rdapply to reduce on each of the chromosomes but i >>>> either don't know how to directly apply reduce: >>>> >>>> params <- RDApplyParams(rd, reduce) >>>> rdapply(params) >>>> Error in function (classes, fdef, mtable) : >>>> unable to find an inherited method for function "reduce", for >>>> signature "RangedData" >>>> >>>> >>>> or i loose the strand information: >>>> >>>> params <- RDApplyParams(rd, function(rd) reduce(ranges(rd))) >>>> rdapply(params) >>>> $chr1 >>>> RangesList: 1 range >>>> 1: 1:4 >>>> >>>> $chr2 >>>> RangesList: 1 range >>>> 1: 1:5 >>>> >>>> thanks for your help! >>>> robert. >>>> >>>> >>>> >>>> sessionInfo() >>>> R version 2.9.1 (2009-06-26) >>>> i386-apple-darwin8.11.1 >>>> >>>> locale: >>>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] rtracklayer_1.4.1 RCurl_0.98-1 >>>> [3] bitops_1.0-4.1 GOstats_2.10.0 >>>> [5] graph_1.22.2 Category_2.10.1 >>>> [7] geneplotter_1.22.0 lattice_0.17-25 >>>> [9] limma_2.18.2 bgSpJncArrayDm1.db_1.0.1 >>>> [11] org.Dm.eg.db_2.2.11 RSQLite_0.7-1 >>>> [13] DBI_0.2-4 >>>> BSgenome.Dmelanogaster.UCSC.dm1_1.0.0 >>>> [15] BSgenome_1.12.3 RColorBrewer_1.0-2 >>>> [17] Biostrings_2.12.8 IRanges_1.2.3 >>>> [19] annotate_1.22.0 AnnotationDbi_1.6.1 >>>> [21] Biobase_2.4.1 xtable_1.5-5 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] genefilter_1.24.2 GO.db_2.2.11 grid_2.9.1 >>>> GSEABase_1.6.1 >>>> [5] RBGL_1.20.0 splines_2.9.1 survival_2.35-4 tools_2.9.1 >>>> [9] XML_2.5-3 >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >> >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 15.5 years ago Patrick Aboyoun ★ 1.6k

0

Entering edit mode

Robert Castelo ★ 3.4k

@rcastelo

Last seen 9 days ago

Barcelona/Universitat Pompeu Fabra

Patrick, it looks great to me, and i think is going to be useful to many people. thanks for handling this issue so quickly. best, robert. On Sat, 24 Oct 2009 07:09:07 +0200, Patrick Aboyoun <paboyoun at="" fhcrc.org=""> wrote: > Robert, > I just added a reduce method for RangedData objects to BioC 2.5 for R > 2.10, which is the current development branch and will become the > release branch on October 28. The signature for this method is > > reduce(x, by, with.inframe.attrib = FALSE) > > where by specifies the grouping variables from the values columns. So if > you just have strand in your values columns, a simple reduce(rd) call > will do. If you have columns in addition to your strand column that you > don't want to use for grouping during the reduction operation, the call > would become reduce(rd, "strand"). I also recently changed the show > methods for many IRanges objects to display the first 10 or so values so > the user feels more connected to their data. Here is what the code looks > like now: > > > suppressMessages(library(IRanges)) > > sta <- c(1, 2, 5, 6, 2, 3, 4, 5) > > end <- c(3, 4, 5, 7, 2, 4, 6, 7) > > str <- rep(c("+", "+", "-", "-"), 2) > > chr <- rep(c("chr1", "chr2"), each = 4) > > rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr) > > reduce(rd) > RangedData with 4 rows and 1 value column across 2 spaces > space ranges | strand > <character> <iranges> | <character> > 1 chr1 [5, 7] | - > 2 chr1 [1, 4] | + > 3 chr2 [4, 7] | - > 4 chr2 [2, 4] | + > > # if strand is a factor variable, the results are > > reduce(RangedData(IRanges(start=sta, end=end), strand=factor(str), > space=chr)) > RangedData with 4 rows and 1 value column across 2 spaces > space ranges | strand > <character> <iranges> | <factor> > 1 chr1 [5, 7] | - > 2 chr1 [1, 4] | + > 3 chr2 [4, 7] | - > 4 chr2 [2, 4] | + > > sessionInfo() > R version 2.10.0 RC (2009-10-18 r50160) > i386-apple-darwin9.8.0 > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > other attached packages: > [1] IRanges_1.3.97 > > > > Patrick Aboyoun wrote: >> Robert, >> To address your case of data extraction from the original RangedData >> object, I would recommend using "[" with i = <<rangeslist>>, e.g. >> origRD[ranges(reducedRd),]. >> >> The situation becomes more involved when, for example, someone wants to >> sum up the "score" column values for all the ranges that overlapped. >> These data summary operations are open ended and don't lend themselves >> to simple interfaces, although we can provide targeted functions, or >> add arguments to the reduce function, that support specific data >> summary operations on ancillary columns. >> >> >> Patrick >> >> >> Robert Castelo wrote: >>> hi Patrick, >>> >>> thanks a lot, the concise code snippet below is what i was looking for, >>> i'd had written at least twice as much to do the same job :) >>> >>> regarding whether it would be nice to have a reduction operation with a >>> signature that includes other stuff i think that for the particular >>> case >>> of the strand it is definitely useful to project (reduce in RangedData >>> terminology) genomic coordinates on genomic space taking it into >>> account >>> because many of the functional elements in the genome are stranded and >>> projecting doesn't imply always that one is not interested in the >>> strand >>> of the projected intervals. >>> >>> with respect to whether other information contained in additional >>> columns should somehow be "reduced" i think it would suffice to enable >>> the reduction of the "names" of a RangedData (taking, for instance, >>> arbitrarily one name between two overlapping ranges) because those >>> names >>> would allow to go back to the original (un-reduced) data and pull >>> whatever we're interested in (actually, including the strand). >>> >>> my two-cents about this, and thanks again for your quick help. >>> >>> robert. >>> >>> On Thu, 2009-10-22 at 00:58 -0700, Patrick Aboyoun wrote: >>> >>>> Robert, >>>> I wrote you some code that you can use below. I've seen other >>>> reduction operations where the user wanted to reduce a RangedData >>>> type object by more than one column (e.g. strand and score), so >>>> perhaps an appropriate signature for a reduce method for RangedData >>>> would be >>>> >>>> reduce(x, by, ...) >>>> >>>> If this method where written, the remaining question would be should >>>> it support reducing transformations of the remaining values columns >>>> or should non "by" columns be dropped from the output. I'm inclined >>>> towards the latter, since a reduce(x, by, valuesFUN) method wouldn't >>>> be much simpler than the rdapply method given below. Would a >>>> straight-forward reduce(x = rd, by = "strand") approach for >>>> RangedData meet your needs? >>>> >>>> >>>> > suppressMessages(library(IRanges)) >>>> > sta <- c(1, 2, 5, 6, 2, 3, 4, 5) >>>> > end <- c(3, 4, 5, 7, 2, 4, 6, 7) >>>> > str <- rep(c("+", "+", "-", "-"), 2) >>>> > chr <- rep(c("chr1", "chr2"), each = 4) >>>> > rd <- RangedData(IRanges(start=sta, end=end), strand=str, >>>> space=chr) >>>> > rd >>>> RangedData: 8 ranges by 1 columns >>>> columns(1): strand >>>> sequences(2): chr1 chr2 >>>> > >>>> > # Interesting bit starts here >>>> > reduceStranded <- function(x) { >>>> + strand <- x$strand >>>> + rngs <- ranges(x)[[1]] >>>> + ans <- IRangesList("+" = reduce(rngs[strand == "+"]), >>>> + "-" = reduce(rngs[strand == "-"])) >>>> + RangedData(unlist(ans, use.names = FALSE), >>>> + strand = rep(c("+", "-"), elementLengths(ans)), >>>> space = names(x)) >>>> + } > params <- RDApplyParams(rd, reduceStranded, reducerFun = >>>> function(x) do.call(c, unname(x))) >>>> > rdapply(params) >>>> RangedData: 4 ranges by 1 columns >>>> columns(1): strand >>>> sequences(2): chr1 chr2 >>>> > as.data.frame(rdapply(params)) >>>> space start end width strand >>>> 1 chr1 1 4 4 + >>>> 2 chr1 5 7 3 - >>>> 3 chr2 2 4 3 + >>>> 4 chr2 4 7 4 - >>>> > sessionInfo() >>>> R version 2.9.2 Patched (2009-08-24 r49420) >>>> i386-apple-darwin9.8.0 >>>> >>>> locale: >>>> C/en_US.UTF-8/C/C/C/C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods >>>> base other attached packages: >>>> [1] IRanges_1.2.3 >>>> >>>> >>>> >>>> >>>> Robert Castelo wrote: >>>> >>>>> dear list, >>>>> >>>>> i'd like to project a set of exons on genomic space but keeping >>>>> the strand information. let's assume for simplicity that no exons >>>>> overlap in opposite strands so no conflicts need to be resolved >>>>> regarding that case. in principle this is easy without keeping the >>>>> strand using a RangeList and the reduce method. however i've been >>>>> struggling for a while without success to figure out how can i >>>>> "reduce" my coordinates without loosing the strand information. >>>>> >>>>> my guess is that to carry out the strand information i need to >>>>> use a RangedData object: >>>>> >>>>> sta <- c(1, 1, 1) >>>>> end <- c(3, 4, 5) >>>>> str <- c("+", "+", "-") >>>>> rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr) >>>>> rd >>>>> RangedData: 3 ranges by 1 columns >>>>> columns(1): strand >>>>> sequences(2): chr1 chr2 >>>>> >>>>> and then use rdapply to reduce on each of the chromosomes but i >>>>> either don't know how to directly apply reduce: >>>>> >>>>> params <- RDApplyParams(rd, reduce) >>>>> rdapply(params) >>>>> Error in function (classes, fdef, mtable) : >>>>> unable to find an inherited method for function "reduce", for >>>>> signature "RangedData" >>>>> >>>>> >>>>> or i loose the strand information: >>>>> >>>>> params <- RDApplyParams(rd, function(rd) reduce(ranges(rd))) >>>>> rdapply(params) >>>>> $chr1 >>>>> RangesList: 1 range >>>>> 1: 1:4 >>>>> >>>>> $chr2 >>>>> RangesList: 1 range >>>>> 1: 1:5 >>>>> >>>>> thanks for your help! >>>>> robert. >>>>> >>>>> >>>>> >>>>> sessionInfo() >>>>> R version 2.9.1 (2009-06-26) >>>>> i386-apple-darwin8.11.1 >>>>> >>>>> locale: >>>>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >>>>> >>>>> attached base packages: >>>>> [1] stats graphics grDevices utils datasets methods base >>>>> >>>>> other attached packages: >>>>> [1] rtracklayer_1.4.1 RCurl_0.98-1 >>>>> [3] bitops_1.0-4.1 GOstats_2.10.0 >>>>> [5] graph_1.22.2 Category_2.10.1 >>>>> [7] geneplotter_1.22.0 lattice_0.17-25 >>>>> [9] limma_2.18.2 bgSpJncArrayDm1.db_1.0.1 >>>>> [11] org.Dm.eg.db_2.2.11 RSQLite_0.7-1 >>>>> [13] DBI_0.2-4 >>>>> BSgenome.Dmelanogaster.UCSC.dm1_1.0.0 >>>>> [15] BSgenome_1.12.3 RColorBrewer_1.0-2 >>>>> [17] Biostrings_2.12.8 IRanges_1.2.3 >>>>> [19] annotate_1.22.0 AnnotationDbi_1.6.1 >>>>> [21] Biobase_2.4.1 xtable_1.5-5 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] genefilter_1.24.2 GO.db_2.2.11 grid_2.9.1 >>>>> GSEABase_1.6.1 >>>>> [5] RBGL_1.20.0 splines_2.9.1 survival_2.35-4 tools_2.9.1 >>>>> [9] XML_2.5-3 >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at stat.math.ethz.ch >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>> >>> >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD COMMENT • link 15.5 years ago Robert Castelo ★ 3.4k

Login before adding your answer.