Guest User
Hi,
I am trying to write a package that will provide a few shortcuts for my
lazy coworkers. So far I have written a few bits of code that find their
primers in a fastq of multiplexed reads (e.g. 10-20).
Next I thought I would save them the trouble of copy-pasting Primers,
Chromosome, and Start into a shell script by autogenerating the script
instead. We have the excellent BSgenome and BSgenome.Mmusculus.UCSC.mm9
packages installed, so this seems a good starting point.
For the first primer this works well:
> system.time(vmatchPattern("CCAGCACTGTATAGCCGATC", Mmusculus))
   user  system elapsed
 45.853   2.702  50.273
This is fine for a single primer, but it seems from the docs (and from
testing) that looking up 15 primers takes 15 passes through the genome
and 15x as long, i.e. roughly 12.5 minutes at about 50 seconds per pass.
That is about the same time it would take them to just copy the
positions from their lab-books. I guess they could have a
coffee... still...
My first question: is there another function or package in BioC that I
have missed that might help me with this? Or are there low-level
functions I should look at to build a vectorised exact-match search
through Mmusculus?
My second is really a feature suggestion: why not allow matchPattern
to pass once through the genome, comparing a set of patterns (character
vector, DNAStringSet, etc.) to the subject? This seems to require little
extra computational load (I think).
And given the difficulty of using BLAST within R, it might be a very
useful extension.
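To make the single-pass idea concrete, here is a toy sketch in base R. It is only an illustration of the idea, not Biostrings code: match_set_once() is a made-up helper name, the second primer and the subject string are invented for the example, and it assumes all patterns have the same width. It materialises every window of the subject, so it would not scale to a whole chromosome as written.

```r
# Toy illustration only: a hypothetical helper, not part of Biostrings/BSgenome.
# Each window of the subject is extracted once and looked up against the
# whole pattern set, rather than scanning the subject once per pattern.
match_set_once <- function(patterns, subject) {
  # Assumption: all patterns share one width, so a single window size suffices.
  stopifnot(length(unique(nchar(patterns))) == 1L)
  w <- nchar(patterns[1])
  n <- nchar(subject)
  if (n < w) {
    return(data.frame(pattern = character(0), start = integer(0),
                      stringsAsFactors = FALSE))
  }
  starts <- seq_len(n - w + 1L)
  # All windows of width w, taken in one vectorised sweep over the subject.
  windows <- substring(subject, starts, starts + w - 1L)
  # match() hashes the pattern set, so each window costs one lookup.
  hit <- match(windows, patterns)
  keep <- !is.na(hit)
  data.frame(pattern = patterns[hit[keep]], start = starts[keep],
             stringsAsFactors = FALSE)
}

# First primer is the one from above; the second is invented for the example.
primers <- c("CCAGCACTGTATAGCCGATC", "ACGTACGTACGTACGTACGT")
subject <- paste0("TTTT", primers[1], "GG", primers[2], "A")
hits <- match_set_once(primers, subject)
print(hits)
```

On this toy subject the two primers are reported at starts 5 and 27. A real version would also need to handle the reverse strand and primers of different widths, which this sketch ignores.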
thx
Stephen
-- output of sessionInfo():
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 BSgenome_1.30.0 Biostrings_2.30.1
[4] GenomicRanges_1.14.4 XVector_0.2.0 IRanges_1.20.6
[7] BiocGenerics_0.8.0 data.table_1.8.10 dplyr_0.1
[10] hflights_0.1 Rcpp_0.10.6
loaded via a namespace (and not attached):
[1] assertthat_0.1 devtools_1.4.1 digest_0.6.4 evaluate_0.5.1 formatR_0.10 httr_0.2 knitr_1.5
[8] memoise_0.1 RCurl_1.95-4.1 stats4_3.0.2 stringr_0.6.2 tools_3.0.2 whisker_0.3-2
--
Sent via the guest posting facility at bioconductor.org.