Dear Jack and Sean,
I have tried to use the SRAdb package to retrieve FASTQ files for a study, e.g. run SRR2961981. Unfortunately, I run into an error when I try to retrieve the path to its FASTQ file:
This command succeeds:
listSRAfile("SRR2961981", sra_con, fileType = "sra")
         run     study     sample experiment
1 SRR2961981 SRP066728 SRS1180807 SRX1453139
  ftp
1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX145/SRX1453139/SRR2961981/SRR2961981.sra
But this command fails:
listSRAfile("SRR2961981", sra_con, fileType = "fastq")
Error in if (nchar(run1) < 10) { : missing value where TRUE/FALSE needed
Yet, the fastq file exists at the EBI:
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR296/001/SRR2961981
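As a stopgap, the directory can be derived from the run accession alone. The following is only a sketch: `ena_fastq_dir` is a hypothetical helper, not part of SRAdb, and the layout rule (first six characters, then the digits past the ninth character zero-padded to three) is my assumption based on the URL above and my understanding of the EBI FTP structure. It returns the directory only, not the file names inside it.

```r
# Hypothetical helper (not part of SRAdb): build the EBI FTP directory
# for a single run accession, assuming the layout seen in the URL above.
ena_fastq_dir <- function(run) {
  stopifnot(is.character(run), length(run) == 1, !is.na(run))
  base <- "ftp://ftp.sra.ebi.ac.uk/vol1/fastq"
  prefix <- substr(run, 1, 6)
  n <- nchar(run)
  if (n < 10) {
    # Assumption: short accessions have no intermediate directory.
    return(paste(base, prefix, run, sep = "/"))
  }
  # Assumption: characters past the ninth, zero-padded to three digits.
  extra <- substr(run, 10, n)
  sub <- formatC(as.integer(extra), width = 3, flag = "0")
  paste(base, prefix, sub, run, sep = "/")
}

ena_fastq_dir("SRR2961981")
# "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR296/001/SRR2961981"
```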
The error is thrown by the getFASTQinfo function, which is called by listSRAfile with fileType = "fastq":
getFASTQinfo("SRR2961981", sra_con)
Error in if (nchar(run1) < 10) { : missing value where TRUE/FALSE needed
because the SQL query
"SELECT * FROM fastq WHERE run_accession IN ('SRR2961981')"
returns no results (and the function doesn't check for that, hence the generic error message).
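To illustrate how an empty result can end up as this particular error, here is a small base-R sketch that needs no SRAdb connection. The way `run1` is derived is my guess at what happens inside the function, not a copy of its source, and `check_fastq_rows` is a hypothetical guard, not an existing SRAdb function.

```r
# Stand-in for the empty result of the SQL query (no database needed).
rs <- data.frame(run_accession = character(0), stringsAsFactors = FALSE)

# Assumption: the function takes the first accession from the result.
run1 <- rs$run_accession[1]   # NA_character_, because there is no first row
nchar(run1)                   # NA, so `nchar(run1) < 10` is NA and `if` fails

# Hypothetical guard: fail early with a message naming the missing runs.
check_fastq_rows <- function(rs, runs) {
  missing <- setdiff(runs, rs$run_accession)
  if (length(missing) > 0) {
    stop("No entry in the fastq table for: ", paste(missing, collapse = ", "))
  }
  invisible(rs)
}

msg <- tryCatch(check_fastq_rows(rs, "SRR2961981"), error = conditionMessage)
msg
# "No entry in the fastq table for: SRR2961981"
```

A check along these lines, placed right after the query, would turn the generic error into one that names the runs absent from the fastq table.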
I am a bit confused, because the SRAdb SQLite database clearly knows about the run, as listSRAfile succeeds with fileType = "sra". Is it possible that some runs are missing from its fastq table?
Thanks a lot for any hints,
Thomas
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] dplyr_0.5.0 argparse_1.0.1 proto_0.3-10 SRAdb_1.30.0
[5] RCurl_1.95-4.8 bitops_1.0-6 graph_1.50.0 BiocGenerics_0.18.0
[9] RSQLite_1.0.0 DBI_0.5-1 BiocInstaller_1.22.3

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7 XML_3.98-1.4 assertthat_0.1 R6_2.2.0 magrittr_1.5
 [6] stats4_3.3.1 httr_1.2.1 lazyeval_0.2.0 getopt_1.20.0 RMySQL_0.10.9
[11] rjson_0.2.15 tools_3.3.1 Biobase_2.32.0 findpython_1.0.1 tibble_1.2
[16] GEOquery_2.38.4
I am also getting this error for specific datasets. It seems somewhat random. SRP051830 is an example of one that doesn't work.
EDIT: What did work was setting fileType = 'sra', so perhaps it's an issue with fastq availability. The downside, obviously, is that you have to do the sra-to-fastq conversion yourself.