speed of runmed on SimpleRleList

0

Entering edit mode

Janet Young ▴ 740

@janet-young-2360

Last seen 5.5 years ago

Fred Hutchinson Cancer Research Center,…

Hi there, I've been using runmean on some coverage objects - nice. Runmed also seems useful, but also seems very slow (oddly so) - I'm wondering whether there's some easy improvement could be made there? Same issue with devel version and an older version. All should be explained in the code below (I hope). thanks very much, Janet ------------------------------------------------------------------- Dr. Janet Young Malik lab Fred Hutchinson Cancer Research Center 1100 Fairview Avenue N., C3-168, P.O. Box 19024, Seattle, WA 98109-1024, USA. tel: (206) 667 1471 fax: (206) 667 6524 email: jayoung ...at... fhcrc.org ------------------------------------------------------------------- library(GenomicRanges) ### a small GRanges object (example from ?GRanges) seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1") gr2 <-GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), ranges = IRanges(1:10, width = 10:1, names = head(letters,10)), strand = Rle(strand(c("-", "+", "*", "+", "-")),c(1, 2, 2, 3, 2)), score = 1:10, GC = seq(1, 0, length=10), seqinfo=seqinfo) gr2 cov <- coverage(gr2) ### runmed is slow! (for you Hutchies: this is on a rhino machine) ### I'm really trying to run this on some much bigger objects (whole genome coverage), where the slowness is more of an issue. system.time(runmed(cov, 51)) # user system elapsed # 1.120 0.016 1.518 ### runmean is fast system.time(runmean(cov, 51)) # user system elapsed # 0.008 0.000 0.005 sessionInfo() R Under development (unstable) (2012-10-03 r60868) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GenomicRanges_1.11.29 IRanges_1.17.32 BiocGenerics_0.5.6 loaded via a namespace (and not attached): [1] stats4_2.16.0 ## see a similar issue on my Mac, using older R R version 2.15.1 (2012-06-22) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] GenomicRanges_1.10.1 IRanges_1.16.2 BiocGenerics_0.4.0 loaded via a namespace (and not attached): [1] parallel_2.15.1 stats4_2.15.1

Coverage Cancer Coverage Cancer • 1.4k views

ADD COMMENT • link updated 12.2 years ago by Hervé Pagès 16k • written 12.2 years ago by Janet Young ▴ 740

0

Entering edit mode

Hervé Pagès 16k

@herve-pages-1542

Last seen 1 day ago

Seattle, WA, United States

Hi Janet, The culprit seems to be the call to smoothEnds() at the end of the "runmed" method for Rle objects: > selectMethod("runmed", "Rle") Method Definition: function (x, k, endrule = c("median", "keep", "drop", "constant"), algorithm = NULL, print.level = 0) { if (!all(is.finite(as.vector(x)))) stop("NA/NaN/Inf not supported in runmed,Rle-method") endrule <- match.arg(endrule) n <- length(x) k <- normargRunK(k = k, n = n, endrule = endrule) i <- (k + 1L)%/%2L ans <- runq(x, k = k, i = i) if (endrule == "constant") { runLength(ans)[1L] <- runLength(ans)[1L] + (i - 1L) runLength(ans)[nrun(ans)] <- runLength(ans)[nrun(ans)] + (i - 1L) } else if (endrule != "drop") { ans <- c(head(x, i - 1L), ans, tail(x, i - 1L)) if (endrule == "median") { ans <- smoothEnds(ans, k = k) } } ans } Not too surprisingly, the complexity of smoothEnds() doesn't depend on the length of its first arg: > system.time(smoothEnds(Rle(0L, 10), k=5)) user system elapsed 0.028 0.000 0.026 > system.time(smoothEnds(Rle(0L, 10000), k=5)) user system elapsed 0.028 0.000 0.026 but is linear w.r.t to the value of its 2nd ag: > system.time(smoothEnds(Rle(0L, 10000), k=51)) user system elapsed 0.280 0.000 0.282 > system.time(smoothEnds(Rle(0L, 10000), k=511)) user system elapsed 2.836 0.000 2.841 In other words, runmed() will be pretty fast on a full genome coverage, but you will pay an additional fixed price for each chromosome in the genome, and this fixed priced only depends on the length of the median window. Note that the "smoothEnds" method for Rle objects is just re-using the exact same code as stats::smoothEnds(), which is not very efficient on Rle's. So we'll look into ways to improve this. Cheers, H. On 02/15/2013 12:27 PM, Janet Young wrote: > Hi there, > > I've been using runmean on some coverage objects - nice. Runmed also seems useful, but also seems very slow (oddly so) - I'm wondering whether there's some easy improvement could be made there? Same issue with devel version and an older version. All should be explained in the code below (I hope). > > thanks very much, > > Janet > > ------------------------------------------------------------------- > > Dr. Janet Young > > Malik lab > > Fred Hutchinson Cancer Research Center > 1100 Fairview Avenue N., C3-168, > P.O. Box 19024, Seattle, WA 98109-1024, USA. > > tel: (206) 667 1471 fax: (206) 667 6524 > email: jayoung ...at... fhcrc.org > > ------------------------------------------------------------------- > > > > library(GenomicRanges) > > ### a small GRanges object (example from ?GRanges) > seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1") > gr2 <-GRanges(seqnames = > Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), > ranges = IRanges(1:10, width = 10:1, names = head(letters,10)), > strand = Rle(strand(c("-", "+", "*", "+", "-")),c(1, 2, 2, 3, 2)), > score = 1:10, > GC = seq(1, 0, length=10), > seqinfo=seqinfo) > gr2 > > cov <- coverage(gr2) > > ### runmed is slow! (for you Hutchies: this is on a rhino machine) > ### I'm really trying to run this on some much bigger objects (whole genome coverage), where the slowness is more of an issue. > system.time(runmed(cov, 51)) > # user system elapsed > # 1.120 0.016 1.518 > > ### runmean is fast > system.time(runmean(cov, 51)) > # user system elapsed > # 0.008 0.000 0.005 > > sessionInfo() > > > R Under development (unstable) (2012-10-03 r60868) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] GenomicRanges_1.11.29 IRanges_1.17.32 BiocGenerics_0.5.6 > > loaded via a namespace (and not attached): > [1] stats4_2.16.0 > > > ## see a similar issue on my Mac, using older R > > R version 2.15.1 (2012-06-22) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] GenomicRanges_1.10.1 IRanges_1.16.2 BiocGenerics_0.4.0 > > loaded via a namespace (and not attached): > [1] parallel_2.15.1 stats4_2.15.1 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD COMMENT • link 12.2 years ago Hervé Pagès 16k

0

Entering edit mode

Hi Herve (and Valerie), Interesting - thanks for looking into this. Funny that the ends make such a large difference, but I think I understand. Hmm - I guess that issue would go away if I use "drop" - that should be fine, as I don't care about medians in the partial windows at the start and end of each chromosome (although I do want to make sure I get the windows lined up correctly, but I can probably just add a windowsize/2 constant to each position). I found today that the timing issues are actually made less relevant for me because of memory issues - I'm finding that runmean and runmed give me way more information than I need, and they create objects that are huge in memory, because they're reporting at 1bp resolution. For my analysis, I don't need windows spaced by 1bp - e.g. I can look at 5kb windows sliding along the genome in 500bp increments. I've cooked up some code that seems to work (using Views and aggregate), but I wonder whether in the long run there could be some kind of built-in solution in IRanges. I know it would probably have to return different object types, to be able to keep track of window positions/sizes, but it seems like something a lot of folks will want to do (e.g. look at coverage over the genome, smoothing in some kind of running window, but not necessarily every 1bp). Does that sound doable? (maybe I'm missing some existing solution?) thanks, Janet On Feb 19, 2013, at 3:51 PM, Hervé Pagès wrote: > Hi Janet, > > The culprit seems to be the call to smoothEnds() at the end of the > "runmed" method for Rle objects: > > > selectMethod("runmed", "Rle") > Method Definition: > > function (x, k, endrule = c("median", "keep", "drop", "constant"), > algorithm = NULL, print.level = 0) > { > if (!all(is.finite(as.vector(x)))) > stop("NA/NaN/Inf not supported in runmed,Rle-method") > endrule <- match.arg(endrule) > n <- length(x) > k <- normargRunK(k = k, n = n, endrule = endrule) > i <- (k + 1L)%/%2L > ans <- runq(x, k = k, i = i) > if (endrule == "constant") { > runLength(ans)[1L] <- runLength(ans)[1L] + (i - 1L) > runLength(ans)[nrun(ans)] <- runLength(ans)[nrun(ans)] + > (i - 1L) > } > else if (endrule != "drop") { > ans <- c(head(x, i - 1L), ans, tail(x, i - 1L)) > if (endrule == "median") { > ans <- smoothEnds(ans, k = k) > } > } > ans > } > > Not too surprisingly, the complexity of smoothEnds() doesn't depend on > the length of its first arg: > > > system.time(smoothEnds(Rle(0L, 10), k=5)) > user system elapsed > 0.028 0.000 0.026 > > system.time(smoothEnds(Rle(0L, 10000), k=5)) > user system elapsed > 0.028 0.000 0.026 > > but is linear w.r.t to the value of its 2nd ag: > > > system.time(smoothEnds(Rle(0L, 10000), k=51)) > user system elapsed > 0.280 0.000 0.282 > > system.time(smoothEnds(Rle(0L, 10000), k=511)) > user system elapsed > 2.836 0.000 2.841 > > In other words, runmed() will be pretty fast on a full genome > coverage, but you will pay an additional fixed price for each > chromosome in the genome, and this fixed priced only depends > on the length of the median window. > > Note that the "smoothEnds" method for Rle objects is just > re-using the exact same code as stats::smoothEnds(), which is not > very efficient on Rle's. So we'll look into ways to improve this. > > Cheers, > H. > > > On 02/15/2013 12:27 PM, Janet Young wrote: >> Hi there, >> >> I've been using runmean on some coverage objects - nice. Runmed also seems useful, but also seems very slow (oddly so) - I'm wondering whether there's some easy improvement could be made there? Same issue with devel version and an older version. All should be explained in the code below (I hope). >> >> thanks very much, >> >> Janet >> >> ------------------------------------------------------------------- >> >> Dr. Janet Young >> >> Malik lab >> >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Avenue N., C3-168, >> P.O. Box 19024, Seattle, WA 98109-1024, USA. >> >> tel: (206) 667 1471 fax: (206) 667 6524 >> email: jayoung ...at... fhcrc.org >> >> ------------------------------------------------------------------- >> >> >> >> library(GenomicRanges) >> >> ### a small GRanges object (example from ?GRanges) >> seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1") >> gr2 <-GRanges(seqnames = >> Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), >> ranges = IRanges(1:10, width = 10:1, names = head(letters,10)), >> strand = Rle(strand(c("-", "+", "*", "+", "-")),c(1, 2, 2, 3, 2)), >> score = 1:10, >> GC = seq(1, 0, length=10), >> seqinfo=seqinfo) >> gr2 >> >> cov <- coverage(gr2) >> >> ### runmed is slow! (for you Hutchies: this is on a rhino machine) >> ### I'm really trying to run this on some much bigger objects (whole genome coverage), where the slowness is more of an issue. >> system.time(runmed(cov, 51)) >> # user system elapsed >> # 1.120 0.016 1.518 >> >> ### runmean is fast >> system.time(runmean(cov, 51)) >> # user system elapsed >> # 0.008 0.000 0.005 >> >> sessionInfo() >> >> >> R Under development (unstable) (2012-10-03 r60868) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] GenomicRanges_1.11.29 IRanges_1.17.32 BiocGenerics_0.5.6 >> >> loaded via a namespace (and not attached): >> [1] stats4_2.16.0 >> >> >> ## see a similar issue on my Mac, using older R >> >> R version 2.15.1 (2012-06-22) >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] GenomicRanges_1.10.1 IRanges_1.16.2 BiocGenerics_0.4.0 >> >> loaded via a namespace (and not attached): >> [1] parallel_2.15.1 stats4_2.15.1 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 [[alternative HTML version deleted]]

ADD REPLY • link 12.2 years ago Janet Young ▴ 740

0

Entering edit mode

Hi Janet, On 02/19/2013 06:10 PM, Janet Young wrote: > Hi Herve (and Valerie), > > Interesting - thanks for looking into this. Funny that the ends make > such a large difference, but I think I understand. It's because we are re-using stats::smoothEnds implementation, which contains things like 'for (i in 3:k) {...}', and therefore is inefficient when 'k' is not very small. > Hmm - I guess that > issue would go away if I use "drop" - that should be fine, as I don't > care about medians in the partial windows at the start and end of each > chromosome (although I do want to make sure I get the windows lined up > correctly, but I can probably just add a windowsize/2 constant to each > position). If you don't care about the truncated windows at the start and end of each chromosome, but do want to preserve the windows lined up correctly, you can use endrule="keep" or endrule="constant". I don't know what they do exactly with the truncated windows (it doesn't matter), but, unlike endrule="drop", they preserve the length of the input. And they don't have the cost of endrule="median". > > I found today that the timing issues are actually made less relevant for > me because of memory issues - I'm finding that runmean and runmed give > me way more information than I need, and they create objects that are > huge in memory, because they're reporting at 1bp resolution. An operation like runmean() almost systematically returns an Rle that is bigger in memory than its input, because it increases the nb of runs. One important number to keep an eye on is the mean of the run lengths (i.e. mean(runLength(x))). If it falls below the threshold of 2, then the integer-Rle representation is not worth it: an ordinary integer vector will use less space in memory. For a numeric-Rle, the threshold is 3/2. Unfortunately, runmean() tends to make that number drop badly: x <- Rle(sample(10L, 150, replace=TRUE), sample(50L, 150, replace=TRUE)) > mean(runLength(x)) [1] 27.58462 > mean(runLength(runmean(x, k=5))) [1] 5.843393 > mean(runLength(runmean(x, k=15))) [1] 2.228322 > mean(runLength(runmean(x, k=25))) [1] 1.537333 > mean(runLength(runmean(x, k=35))) [1] 1.28976 With k=35, keeping the result in a numeric-Rle is no more worth it. > For my > analysis, I don't need windows spaced by 1bp - e.g. I can look at 5kb > windows sliding along the genome in 500bp increments. > > I've cooked up some code that seems to work (using Views and aggregate), > but I wonder whether in the long run there could be some kind of > built-in solution in IRanges. I know it would probably have to return > different object types, to be able to keep track of window > positions/sizes, but it seems like something a lot of folks will want to > do (e.g. look at coverage over the genome, smoothing in some kind of > running window, but not necessarily every 1bp). Does that sound doable? > (maybe I'm missing some existing solution?) I don't think we have a built-in solution for this at the moment but it would certainly be doable. There are probably many ways to go but here is one that I find appealing. The starting point would be to have an easy way to switch between the 2 following representations of a numeric function F defined along a genome (e.g. coverage): (1) RleList representation (2) A GRanges made of fixed-width bins (that cover the entire genome), and with a metadata column that contains the average value of F for each bin. Because of the averaging, going from (1) to (2) would of course lose some information but would potentially reduce memory usage a lot. More precisely, (2) offers the advantage of using an amount of memory that is under control because it only depends on the width of the bins. For example, for a function F defined along the Human genome, if the bins are of width 500 (like in your case), the GRanges representation would have about 6 million ranges and occupy less than 100Mb. With the RleList representation, it can vary a lot: from a few hundreds of bytes (in the best case i.e. when coverage is constant), to 24Gb if all the runs are of length 1 (worst case). Once you have (2), you could do all the operations you want (e.g. runmean, runmed, etc...,) by operating directly on the metadata col. Unlike with the RleList, the size of the GRanges would not grow while you do this. Then, when you are done, you could always come back to (1) if you really miss the RleList representation. This RleList would actually be quite small: at most 72Mb in your use case (i.e. bins of width 500 on the Human genome). Of course this approach is only suitable if the user accepts the loss of information introduced by the first step i.e. by the conversion from (1) to (2). The only things we would need to add to support this approach are the functions to switch between (1) and (2). Would that be of any interest? Cheers, H. > > thanks, > > Janet > > > > > On Feb 19, 2013, at 3:51 PM, Hervé Pagès wrote: > >> Hi Janet, >> >> The culprit seems to be the call to smoothEnds() at the end of the >> "runmed" method for Rle objects: >> >> > selectMethod("runmed", "Rle") >> Method Definition: >> >> function (x, k, endrule = c("median", "keep", "drop", "constant"), >> algorithm = NULL, print.level = 0) >> { >> if (!all(is.finite(as.vector(x)))) >> stop("NA/NaN/Inf not supported in runmed,Rle-method") >> endrule <- match.arg(endrule) >> n <- length(x) >> k <- normargRunK(k = k, n = n, endrule = endrule) >> i <- (k + 1L)%/%2L >> ans <- runq(x, k = k, i = i) >> if (endrule == "constant") { >> runLength(ans)[1L] <- runLength(ans)[1L] + (i - 1L) >> runLength(ans)[nrun(ans)] <- runLength(ans)[nrun(ans)] + >> (i - 1L) >> } >> else if (endrule != "drop") { >> ans <- c(head(x, i - 1L), ans, tail(x, i - 1L)) >> if (endrule == "median") { >> ans <- smoothEnds(ans, k = k) >> } >> } >> ans >> } >> >> Not too surprisingly, the complexity of smoothEnds() doesn't depend on >> the length of its first arg: >> >> > system.time(smoothEnds(Rle(0L, 10), k=5)) >> user system elapsed >> 0.028 0.000 0.026 >> > system.time(smoothEnds(Rle(0L, 10000), k=5)) >> user system elapsed >> 0.028 0.000 0.026 >> >> but is linear w.r.t to the value of its 2nd ag: >> >> > system.time(smoothEnds(Rle(0L, 10000), k=51)) >> user system elapsed >> 0.280 0.000 0.282 >> > system.time(smoothEnds(Rle(0L, 10000), k=511)) >> user system elapsed >> 2.836 0.000 2.841 >> >> In other words, runmed() will be pretty fast on a full genome >> coverage, but you will pay an additional fixed price for each >> chromosome in the genome, and this fixed priced only depends >> on the length of the median window. >> >> Note that the "smoothEnds" method for Rle objects is just >> re-using the exact same code as stats::smoothEnds(), which is not >> very efficient on Rle's. So we'll look into ways to improve this. >> >> Cheers, >> H. >> >> >> On 02/15/2013 12:27 PM, Janet Young wrote: >>> Hi there, >>> >>> I've been using runmean on some coverage objects - nice. Runmed >>> also seems useful, but also seems very slow (oddly so) - I'm >>> wondering whether there's some easy improvement could be made there? >>> Same issue with devel version and an older version. All should be >>> explained in the code below (I hope). >>> >>> thanks very much, >>> >>> Janet >>> >>> ------------------------------------------------------------------- >>> >>> Dr. Janet Young >>> >>> Malik lab >>> >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Avenue N., C3-168, >>> P.O. Box 19024, Seattle, WA 98109-1024, USA. >>> >>> tel: (206) 667 1471 fax: (206) 667 6524 >>> email: jayoung ...at... fhcrc.org <http: fhcrc.org=""> >>> >>> ------------------------------------------------------------------- >>> >>> >>> >>> library(GenomicRanges) >>> >>> ### a small GRanges object (example from ?GRanges) >>> seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1") >>> gr2 <-GRanges(seqnames = >>> Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), >>> ranges = IRanges(1:10, width = 10:1, names = head(letters,10)), >>> strand = Rle(strand(c("-", "+", "*", "+", "-")),c(1, 2, 2, >>> 3, 2)), >>> score = 1:10, >>> GC = seq(1, 0, length=10), >>> seqinfo=seqinfo) >>> gr2 >>> >>> cov <- coverage(gr2) >>> >>> ### runmed is slow! (for you Hutchies: this is on a rhino machine) >>> ### I'm really trying to run this on some much bigger objects (whole >>> genome coverage), where the slowness is more of an issue. >>> system.time(runmed(cov, 51)) >>> # user system elapsed >>> # 1.120 0.016 1.518 >>> >>> ### runmean is fast >>> system.time(runmean(cov, 51)) >>> # user system elapsed >>> # 0.008 0.000 0.005 >>> >>> sessionInfo() >>> >>> >>> R Under development (unstable) (2012-10-03 r60868) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=C LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] GenomicRanges_1.11.29 IRanges_1.17.32 BiocGenerics_0.5.6 >>> >>> loaded via a namespace (and not attached): >>> [1] stats4_2.16.0 >>> >>> >>> ## see a similar issue on my Mac, using older R >>> >>> R version 2.15.1 (2012-06-22) >>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >>> >>> locale: >>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] GenomicRanges_1.10.1 IRanges_1.16.2 BiocGenerics_0.4.0 >>> >>> loaded via a namespace (and not attached): >>> [1] parallel_2.15.1 stats4_2.15.1 >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD REPLY • link 12.2 years ago Hervé Pagès 16k

Login before adding your answer.