Dear all,
I am trying to generate a plot (either a curve or a line graph) of the benchmark results for the Bioconductor package that I developed, to show its overall performance relative to another software tool (implemented in C#). I profiled all my code using the profvis package in RStudio, but I need a nice plot instead. How can I make this happen? Is there any way to get this sort of plot?
In my workflow, I implemented a list of functions that together determine the overall performance of my package. Briefly, here is the pipeline:
read peak file -> clean data -> find overlapping -> check overlapping requirement -> first level filtration -> Fisher method -> second level filtration -> export result -> visualize output -> THE END
Edit:
I tried the rbenchmark package to produce benchmark results in this way:
benchmark( s1=myFunc1, s2=myFunc2, s3=myFunc3, ... s10=myFunc10, order="elapsed", replications=2 )
which gives benchmark metrics for the run time of each function. Based on this result, how can I get a nice plot (either a line graph or a curve)?
Each pipeline step has a corresponding R function that accepts different parameters. I want a single plot where the X axis shows the number of input peak files and the Y axis shows the run time my package needs to analyze them. I am not sure how to generate a clear plot (line graph or curve) that shows the performance of my package, which accepts a list of peak files as input. Is there an easy way to do this? What is the starting point for evaluating the performance of an R package when it depends on the contribution of several R functions? How can I get the desired curve plot? Thanks in advance :)
Best regards,
Jurat
What do you mean by a nice plot? Do you want to see how much longer it takes as you increase your input? Then you need to run the benchmark with several inputs (maybe just sections of a file, or an increasing number of files) and plot time vs. length of input with ggplot2. You could also estimate the big-O of your package/algorithm and avoid running the benchmark.
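A minimal sketch of that approach, assuming a single entry point that takes the number of peak files. Here run_pipeline is a hypothetical stand-in (it only sorts random numbers), and the sizes and repetition count are arbitrary choices, so swap in the real package call and real inputs:

```r
# Hypothetical stand-in for the package's pipeline: it does a fixed amount
# of work per peak file. Replace with the real entry point of the package.
run_pipeline <- function(n_files) {
  for (i in seq_len(n_files)) {
    peaks <- sort(runif(2000))  # pretend to read and process one peak file
  }
  invisible(n_files)
}

# Input sizes to benchmark (number of peak files); values are arbitrary.
sizes <- c(1, 2, 4, 8, 16)

# Time one input size a few times and keep the median elapsed time.
time_one <- function(n, reps = 3) {
  runs <- vapply(seq_len(reps),
                 function(i) system.time(run_pipeline(n))[["elapsed"]],
                 numeric(1))
  median(runs)
}

timings <- data.frame(
  n_files = sizes,
  elapsed = vapply(sizes, time_one, numeric(1))
)

# Line plot of run time against input size; falls back to base graphics
# when ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(timings, ggplot2::aes(x = n_files, y = elapsed)) +
    ggplot2::geom_line() +
    ggplot2::geom_point() +
    ggplot2::labs(x = "Number of input peak files", y = "Elapsed time (s)")
  print(p)
} else {
  plot(timings$n_files, timings$elapsed, type = "b",
       xlab = "Number of input peak files", ylab = "Elapsed time (s)")
}
```

Taking the median of a few repetitions reduces the effect of one-off slow runs; for very fast functions, more repetitions or larger inputs are needed to get stable numbers.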
Dear Lluis,
Thanks for your helpful response. Yes, I want to plot run time against the number of features in each file. I did benchmark the functions with several input files, but I am not happy with the resulting plot. I did it this way.
This is the benchmark result data.frame from rbenchmark:
but the resulting plot is still not what I want. Could you illustrate your idea with an intuitive example that gives a clear plot? How do I estimate the big-O of an R/Bioconductor package? Thank you.
You can read about big-O notation on Wikipedia. You are plotting each function and the time it takes, not the number of features in each file. You could add a column that indicates the number of features used for each function and plot against it. But if each function is given an evenly increasing number of features, it seems that your run time grows much faster than linearly, for example quadratically, which would be O(n^2). If you want a better O you would need to modify your package/algorithm/function.
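One rough way to check the growth rate from benchmark results is to add the feature counts as a column and fit a straight line on a log-log scale. The sketch below uses made-up numbers: the n_features values and the elapsed times are illustrative only, constructed to grow quadratically, and should be replaced by the measured data.frame:

```r
# Illustrative numbers only: n_features and elapsed are made up so that the
# time grows quadratically. Substitute the measured benchmark results.
bench <- data.frame(
  test       = paste0("s", 1:5),
  n_features = c(1000, 2000, 4000, 8000, 16000)
)
bench$elapsed <- 1e-7 * bench$n_features^2

# Fit log(time) = a + b * log(n); the slope b approximates the exponent in
# time ~ n^b (b near 1 suggests linear growth, b near 2 quadratic growth).
fit <- lm(log(elapsed) ~ log(n_features), data = bench)
growth_exponent <- unname(coef(fit)[2])
print(round(growth_exponent, 2))

# Log-log plot: a power law appears as a straight line with slope b.
plot(bench$n_features, bench$elapsed, log = "xy", type = "b",
     xlab = "Number of features", ylab = "Elapsed time (s)")
```

With real timings the slope will be noisy, so treat it as a hint about the growth rate rather than a proof of the algorithm's complexity.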
Is it possible to illustrate your idea with a few simple examples? I looked on Stack Overflow for how to estimate the big-O of an R package, but there are not many helpful posts. Could you follow up with an example? Thanks
As you can see, when I double the input, the time for O(n) also doubles, while for O(n^2) the increase is much higher.
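A minimal sketch of this doubling behaviour, counting basic operations as a stand-in for run time so the result is reproducible. count_linear and count_quadratic are hypothetical toy functions, not part of any package:

```r
# O(n): one pass over the input.
count_linear <- function(n) {
  ops <- 0
  for (i in seq_len(n)) ops <- ops + 1
  ops
}

# O(n^2): every element is paired with every element.
count_quadratic <- function(n) {
  ops <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) ops <- ops + 1
  }
  ops
}

# Each size is double the previous one.
sizes <- c(100, 200, 400)
growth <- data.frame(
  n         = sizes,
  linear    = vapply(sizes, count_linear, numeric(1)),
  quadratic = vapply(sizes, count_quadratic, numeric(1))
)
print(growth)

# Ratio between consecutive rows: 2 for O(n), 4 for O(n^2).
linear_ratio <- growth$linear[-1] / growth$linear[-nrow(growth)]
quadratic_ratio <- growth$quadratic[-1] / growth$quadratic[-nrow(growth)]
```

Replacing the counters with system.time() around real function calls gives the same picture for measured run time, only with some noise.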