Question

[gwascat] Updated GWAS catalog from file

1

enricoferrero ▴ 680

@enricoferrero-6037

Last seen 3.4 years ago

Switzerland

Hello,

Is there any way to get an up-to-date version of the GWAS catalog starting from an existing file? Currently the version accessible with data(ebicat38) is outdated (January 2016) and the function makeCurrentGwascat() only accepts a URL as argument, not an existing file.

I'm preparing some training material where students won't be able to rely on an internet connection, so I need a way to create a gwasloc object from a file on the hard drive. The file in question is exactly the same one that gets downloaded and parsed by makeCurrentGwascat(), i.e.: https://www.ebi.ac.uk/gwas/api/search/downloads/alternative

Thank you!

Code illustrating the problem:

> library(gwascat)
Loading required package: Homo.sapiens
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colMeans, colnames, colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff,
    sort, table, tapply, union, unique, unsplit, which, which.max, which.min

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: OrganismDbi
Loading required package: GenomicFeatures
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: GO.db

Loading required package: org.Hs.eg.db

Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
gwascat loaded.  Use data(ebicat38) for hg38 coordinates;
 data(ebicat37) for hg19 coordinates.

> packageVersion("gwascat")
[1] '2.10.0'

# this version is out of date, see date in 'Extracted'
> data(ebicat38)
> ebicat38
gwasloc instance with 22714 records and 36 attributes per record.
Extracted:  2016-01-18
Genome:  GRCh38
Excerpt:
GRanges object with 5 ranges and 3 metadata columns:
      seqnames               ranges strand |                  DISEASE/TRAIT        SNPS   P-VALUE
         <Rle>            <IRanges>  <Rle> |                    <character> <character> <numeric>
  [1]       11 [41798900, 41798900]      * | Post-traumatic stress disorder  rs10768747     5e-06
  [2]       15 [34768262, 34768262]      * | Post-traumatic stress disorder  rs12232346     2e-06
  [3]        8 [96500749, 96500749]      * | Post-traumatic stress disorder   rs2437772     6e-06
  [4]        9 [98221544, 98221544]      * | Post-traumatic stress disorder   rs7866350     1e-06
  [5]       15 [54423444, 54423444]      * | Post-traumatic stress disorder  rs73419609     6e-06
  -------
  seqinfo: 23 sequences from GRCh38 genome

# this fails because I'm not connected
> makeCurrentGwascat()
running read.delim on http://www.ebi.ac.uk/gwas/api/search/downloads/alternative...
Error in open.connection(file, "rt") : cannot open the connection

# this also fails because the function expects a URL, not a file
> makeCurrentGwascat("gwas.catalog.txt")
running read.delim on gwas.catalog.txt...
Error in url(table.url) : URL scheme unsupported by this method

gwascat • 2.4k views

ADD COMMENT • link 7.4 years ago enricoferrero ▴ 680

1

Robert Castelo ★ 3.4k

@rcastelo

Last seen 12 days ago

Barcelona/Universitat Pompeu Fabra

Assuming the downloaded file is called 'gwas_catalog_v1.0.1-associations_e90_r2017-09-12.tsv', what about

gwascat <- read.delim("gwas_catalog_v1.0.1-associations_e90_r2017-09-12.tsv", sep="\t", header=TRUE, stringsAsFactors=FALSE)

?

just look at the source code of 'makeCurrentGwascat()' and you'll find out the few instructions to build the object from this file.

cheers,

robert.

ADD COMMENT • link 7.4 years ago Robert Castelo ★ 3.4k

0

Thanks. Yes, this would probably work but it's not very elegant - especially considering this is training material for students.

I wonder whether it would be easier, and more straightforward to follow, to simply create a GRanges object at that point?

ADD REPLY • link 7.4 years ago enricoferrero ▴ 680

1

I see three options: write a wrapper function for your students, parse the gwascat file yourself and provide your students directly with the 'GRanges' object, or contact the package maintainer to request further functionality.
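As a rough sketch of the second option (parsing the file yourself into a plain GRanges object), something like the following might work. It assumes the downloaded file has CHR_ID and CHR_POS columns and that rows without a single numeric position can simply be dropped. keep_mapped_rows() and catalog_to_granges() are hypothetical helper names, not gwascat functions.

```r
# Hypothetical helper (not part of gwascat): keep only rows that have a
# chromosome and a single numeric position. The column names CHR_ID and
# CHR_POS are an assumption about the downloaded catalog file.
keep_mapped_rows <- function(tab) {
  pos <- suppressWarnings(as.numeric(tab[["CHR_POS"]]))
  chr <- as.character(tab[["CHR_ID"]])
  ok <- !is.na(pos) & !is.na(chr) & nzchar(chr)
  out <- tab[ok, , drop = FALSE]
  out[["CHR_POS"]] <- pos[ok]
  rownames(out) <- NULL
  out
}

# Hypothetical helper: convert the filtered table into a plain GRanges
# object, keeping the remaining columns as metadata columns.
catalog_to_granges <- function(tab, genome_build = "GRCh38") {
  if (!requireNamespace("GenomicRanges", quietly = TRUE)) {
    stop("The GenomicRanges package is required.")
  }
  tab <- keep_mapped_rows(tab)
  gr <- GenomicRanges::makeGRangesFromDataFrame(
    tab,
    keep.extra.columns = TRUE,
    seqnames.field = "CHR_ID",
    start.field = "CHR_POS",
    end.field = "CHR_POS"
  )
  # The genome build is an assumption; set it to match the file you downloaded.
  GenomeInfoDb::genome(gr) <- genome_build
  gr
}
```

The result is an ordinary GRanges object rather than a gwasloc instance, so gwascat functions that expect a gwasloc may not accept it.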

ADD REPLY • link 7.4 years ago Robert Castelo ★ 3.4k

score 2 · Accepted Answer · 2017-12-13

Based on Robert Castelo's answer, I ended up doing this:

download.file("http://www.ebi.ac.uk/gwas/api/search/downloads/alternative", destfile = "gwas_catalog_v1.0.1-associations_e90_r2017-12-04.tsv")
snps <- read.delim("gwas_catalog_v1.0.1-associations_e90_r2017-12-04.tsv", check.names = FALSE, stringsAsFactors = FALSE)
snps <- gwascat:::gwdf2GRanges(snps, extractDate = "2017-12-04")
genome(snps) <- "GRCh38"

This returns an object similar to what you would get with makeCurrentGwascat().
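
If students should not have to see the ::: call, the same steps could be wrapped in a small helper. This is only a sketch: read_gwascat_file() is a hypothetical name, and it relies on the unexported gwdf2GRanges() used above, which may change between gwascat versions.

```r
# Hypothetical wrapper (not part of gwascat): build the object from a local
# copy of the catalog file instead of a URL.
read_gwascat_file <- function(path, extract_date, genome_build = "GRCh38") {
  if (!file.exists(path)) {
    stop("GWAS catalog file not found: ", path)
  }
  if (!requireNamespace("gwascat", quietly = TRUE)) {
    stop("The gwascat package is required.")
  }
  # Keep the original column names such as "DISEASE/TRAIT" and "P-VALUE".
  tab <- read.delim(path, check.names = FALSE, stringsAsFactors = FALSE)
  # gwdf2GRanges() is internal to gwascat, so its interface is not guaranteed.
  snps <- gwascat:::gwdf2GRanges(tab, extractDate = extract_date)
  GenomeInfoDb::genome(snps) <- genome_build
  snps
}
```

Calling read_gwascat_file("gwas_catalog_v1.0.1-associations_e90_r2017-12-04.tsv", extract_date = "2017-12-04") ahead of the course and storing the result with saveRDS() would let students load it with readRDS() without a connection, provided gwascat is installed on their machines.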