Note: this question is also posted on Biostars. The suggestion there was that my internet connection was failing, but there is no error message to indicate this, and I consistently get exactly 6,427 records, so I am on the fence about whether that is the reason. If it is, does anyone have advice on a fix or an alternative that isn't "get better internet"?
I want to query the GWAS Catalog using the gwascat package in R. I was surprised to see that makeCurrentGwascat() returns only 6,427 associations, when the GWAS Catalog contains many more. Is this expected, or is something going wrong here?
> cat1 <- makeCurrentGwascat()
running read.delim on http://www.ebi.ac.uk/gwas/api/search/downloads/alternative...
formatting gwaswloc instance...
NOTE: input data had non-ASCII characters replaced by '*'.
Warning message:
In gwdf2GRanges(tab, extractDate = as.character(Sys.Date())) :
NAs introduced by coercion
> cat1
gwasloc instance with 6427 records and 38 attributes per record.
Extracted: 2021-01-12
Genome: GRCh38
Excerpt:
GRanges object with 5 ranges and 3 metadata columns:
seqnames ranges strand | DISEASE/TRAIT SNPS P-VALUE
<Rle> <IRanges> <Rle> | <character> <character> <numeric>
[1] 22 41151150 * | General risk tolerance (MTAG) rs75843224 6e-14
[2] 1 207861610 * | General risk tolerance (MTAG) rs984983 6e-14
[3] 2 59787624 * | General risk tolerance (MTAG) rs6732097 6e-14
[4] 12 102069362 * | General risk tolerance (MTAG) rs17437668 9e-14
[5] 6 26173250 * | General risk tolerance (MTAG) rs34661691 9e-14
-------
seqinfo: 23 sequences from GRCh38 genome
Contrast this with the 2016 snapshot that ships with the package, which has far more associations (22,714 records):
> data(ebicat38)
> ebicat38
gwasloc instance with 22714 records and 36 attributes per record.
Extracted: 2016-01-18
Genome: GRCh38
Excerpt:
GRanges object with 5 ranges and 3 metadata columns:
seqnames ranges strand | DISEASE/TRAIT SNPS P-VALUE
<Rle> <IRanges> <Rle> | <character> <character> <numeric>
[1] 11 41798900 * | Post-traumatic stress disorder rs10768747 5e-06
[2] 15 34768262 * | Post-traumatic stress disorder rs12232346 2e-06
[3] 8 96500749 * | Post-traumatic stress disorder rs2437772 6e-06
[4] 9 98221544 * | Post-traumatic stress disorder rs7866350 1e-06
[5] 15 54423444 * | Post-traumatic stress disorder rs73419609 6e-06
-------
seqinfo: 23 sequences from GRCh38 genome
My session info:
> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] gwascat_2.18.0 Homo.sapiens_1.3.1 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 org.Hs.eg.db_3.10.0
[5] GO.db_3.10.0 OrganismDbi_1.28.0 GenomicFeatures_1.38.2 GenomicRanges_1.38.0
[9] GenomeInfoDb_1.22.1 AnnotationDbi_1.48.0 IRanges_2.20.2 S4Vectors_0.24.4
[13] Biobase_2.46.0 BiocGenerics_0.32.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 lattice_0.20-41 prettyunits_1.1.1 Rsamtools_2.2.3 Biostrings_2.54.0 assertthat_0.2.1
[7] digest_0.6.27 asreml_4.1.0.110 BiocFileCache_1.10.2 R6_2.5.0 RSQLite_2.2.2 httr_1.4.2
[13] ggplot2_3.3.3 pillar_1.4.7 zlibbioc_1.32.0 rlang_0.4.10 progress_1.2.2 curl_4.3
[19] rstudioapi_0.13 data.table_1.13.6 blob_1.2.1 Matrix_1.2-18 BiocParallel_1.20.1 stringr_1.4.0
[25] RCurl_1.98-1.2 bit_4.0.4 biomaRt_2.42.1 munsell_0.5.0 DelayedArray_0.12.3 compiler_3.6.2
[31] rtracklayer_1.46.0 pkgconfig_2.0.3 askpass_1.1 openssl_1.4.3 tidyselect_1.1.0 SummarizedExperiment_1.16.1
[37] tibble_3.0.4 GenomeInfoDbData_1.2.2 matrixStats_0.57.0 XML_3.99-0.3 crayon_1.3.4 dplyr_1.0.2
[43] dbplyr_2.0.0 GenomicAlignments_1.22.1 bitops_1.0-6 rappdirs_0.3.1 RBGL_1.62.1 grid_3.6.2
[49] gtable_0.3.0 lifecycle_0.2.0 DBI_1.1.0 magrittr_2.0.1 scales_1.1.1 graph_1.64.0
[55] stringi_1.5.3 XVector_0.26.0 ellipsis_0.3.1 generics_0.1.0 vctrs_0.3.6 tools_3.6.2
[61] bit64_4.0.5 glue_1.4.2 purrr_0.3.4 hms_0.5.3 colorspace_2.0-0 BiocManager_1.30.10
[67] memoise_1.1.0
Thanks all.
Update: updating R and reinstalling all of my packages solved the problem (the session info above shows I was still on R 3.6.2 with gwascat 2.18.0). Less of an internet issue, more of a me-being-lazy issue :) Thanks.
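For anyone hitting the same thing, here is a minimal sketch of the refresh described above. It assumes the Bioconductor-recommended BiocManager route and an R version supported by the current Bioconductor release. makeCurrentGwascat() is the function used in this thread, but newer gwascat releases may have changed or deprecated it, so check the vignette for your installed version. The record count depends on the catalog at download time, so no particular number is assumed here.

```r
# Sketch only: refresh a stale Bioconductor setup, then re-query the GWAS Catalog.
# Assumes network access and an R version the current Bioconductor release supports.
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

# Show which Bioconductor version is in use and flag out-of-date packages.
BiocManager::version()
BiocManager::valid()

# Update all installed packages without prompting, then make sure gwascat is installed.
BiocManager::install(ask = FALSE)
BiocManager::install("gwascat")

# Re-download the catalog (function name as used in this thread; may differ in newer gwascat).
library(gwascat)
cat1 <- makeCurrentGwascat()

# One range per association record, so length() gives the record count.
length(cat1)
```

If the count is still far below what the GWAS Catalog website reports, BiocManager::valid() is a quick way to see whether any packages are still out of date.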