Paul Shannon
I am new to QuasR, and also quite new to aligning short reads to
reference genomes more generally.
I cannot figure out how to use a pre-built indexed reference genome
file with QuasR. The examples supplied with the package work nicely.
Scaling up to using all of hg19 raises problems for me. I apologize
if I am missing the obvious.
To illustrate the problem, I call QuasR's qAlign method with just two
arguments (quoting from the man page):
  sampleFile: a text file listing input sequence files and sample
      names
  genome: the reference genome for primary alignments, one of:
      * a string referring to a "BSgenome" package (e.g.
        "BSgenome.Hsapiens.UCSC.hg19"), which will be
        downloaded automatically from Bioconductor if not
        present
      * the name of a fasta sequence file containing one or
        several sequences (chromosomes) to be used as a
        reference
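For concreteness, here is a minimal sketch of the kind of call I mean. The sample file name, the reads file "reads_1.fastq" and the sample name are placeholders made up for illustration; the tab-delimited FileName/SampleName layout is my understanding of the sample file format, and the alignment itself is guarded so it only runs if QuasR is installed and the reads file really exists.

```r
# Sketch only: all file names below are hypothetical placeholders.
sample_file <- file.path(tempdir(), "samples.txt")

# Assumed sample file layout: tab-delimited, columns FileName and SampleName.
samples <- data.frame(
  FileName   = "reads_1.fastq",  # hypothetical single-end reads file
  SampleName = "sample1",        # hypothetical sample name
  stringsAsFactors = FALSE
)
write.table(samples, sample_file, sep = "\t", quote = FALSE, row.names = FALSE)

# The genome argument: a BSgenome package name (or, alternatively, the
# path to a fasta file).
genome <- "BSgenome.Hsapiens.UCSC.hg19"

# Only attempt the alignment if QuasR is available and the reads exist;
# with the placeholder names above this block is skipped.
reads_path <- file.path(dirname(sample_file), samples$FileName)
if (requireNamespace("QuasR", quietly = TRUE) && file.exists(reads_path)) {
  proj <- QuasR::qAlign(sample_file, genome)
}
```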
QuasR apparently invokes the bowtie indexing program when supplied
with either of the two "genome" options: a BSgenome package or a fasta
file. But since indexing takes a long time (hours, apparently), I
hoped to supply a ready-made index file instead, and found some
described here:
http://bowtie-bio.sourceforge.net/tutorial.shtml
specifically
ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/hg19.ebwt.zip
All my attempts to pass this file, or any of its unzipped contents,
to QuasR as the genome fail with these messages:
Error: The specified genome /Users/pshannon/s/data/public/bowtie/indexes/hg19.1.ebwt does not have the extension of a fasta file (fa,fasta,fna)
Error: The specified genome has to be a file and not a directory: /Users/pshannon/s/data/public/bowtie/indexes
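As far as I can tell from the messages alone, the genome argument is being screened by file extension and by file-versus-directory. The helper below is a hypothetical base-R sketch of checks that would be consistent with those two errors; it is not QuasR's actual code, and the name check_genome_path is invented here.

```r
# Hypothetical helper, NOT part of QuasR: mimics the two checks that the
# error messages above seem to imply.
check_genome_path <- function(path) {
  # A directory is rejected outright.
  if (dir.exists(path)) {
    return("directory")
  }
  # Otherwise only the listed fasta extensions are accepted.
  ext <- tolower(tools::file_ext(path))
  if (ext %in% c("fa", "fasta", "fna")) "fasta" else "not fasta"
}

check_genome_path("hg19.1.ebwt")  # "not fasta": a bowtie index file
check_genome_path("hg19.fa")      # "fasta"
check_genome_path(tempdir())      # "directory"
```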
I'll be grateful for advice on how to do this properly.
Thanks,
- Paul
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rsamtools_1.13.14 BSgenome_1.29.0 Biostrings_2.29.2 QuasR_1.1.4 GenomicRanges_1.13.12 XVector_0.1.0
[7] IRanges_1.19.8 BiocGenerics_0.7.2 Rbowtie_1.1.3 BiocInstaller_1.11.1
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.23.11 Biobase_2.21.2 DBI_0.2-7 GenomicFeatures_1.13.8 RCurl_1.95-4.1
[6] RSQLite_0.11.3 ShortRead_1.19.3 XML_3.95-0.2 biomaRt_2.17.0 bitops_1.0-5
[11] compiler_3.0.0 grid_3.0.0 hwriter_1.3 lattice_0.20-15 rtracklayer_1.21.5
[16] stats4_3.0.0 tools_3.0.0 zlibbioc_1.7.0