recount3 - download bigwig files for TCGA data
Xuebing

How can I download all bigwig files for TCGA samples? I noticed an answer was provided previously for recount2 but it doesn’t work for recount3:

Recount2 Bigwigs for TCGA

Thanks!

@lcolladotor

Hi,

Thank you for your interest in recount3 and recount2. The easiest way to find the URLs for BigWig files in recount3 is to use the recount3::create_rse() function, whose output includes a colData() column called BigWigURL, as shown at https://github.com/LieberInstitute/recount3/issues/21#issuecomment-1074156958. Here's a short extract:

as.data.frame(colData(rse)[1, c("external_id", "study", "BigWigURL")])
#>                                                 external_id study
#> GTEX-T6MN-0011-R1A-SM-32QOY.1 GTEX-T6MN-0011-R1A-SM-32QOY.1 BRAIN
#>                                                                                                                                                             BigWigURL
#> GTEX-T6MN-0011-R1A-SM-32QOY.1 http://duffel.rail.bio/recount3/human/data_sources/gtex/base_sums/IN/BRAIN/OY/gtex.base_sums.BRAIN_GTEX-T6MN-0011-R1A-SM-32QOY.1.ALL.bw
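The extract above is for GTEx; the same approach for a TCGA project might look like the sketch below. It assumes recount3 and SummarizedExperiment are installed, that TCGA projects appear in available_projects() with file_source equal to "tcga", and that "ACC" is one of them. The helper name tcga_bigwig_urls() is made up for this example and is not part of recount3.

```r
# Sketch: list the BigWig URLs for one TCGA project via create_rse().
# tcga_bigwig_urls() is a hypothetical helper, not a recount3 function.
tcga_bigwig_urls <- function(tcga_project) {
    projects <- recount3::available_projects(organism = "human")

    # Assumption: TCGA projects are flagged with file_source == "tcga".
    keep <- projects$file_source == "tcga" & projects$project == tcga_project
    proj_info <- projects[keep, ]
    stopifnot(nrow(proj_info) == 1)

    # This downloads the gene-level counts as well as the metadata.
    rse <- recount3::create_rse(proj_info, type = "gene")

    as.data.frame(
        SummarizedExperiment::colData(rse)[, c("external_id", "study", "BigWigURL")]
    )
}

# Usage (needs network access):
# urls <- tcga_bigwig_urls("ACC")
# head(urls$BigWigURL)
```

Looping this over every TCGA project gives the full list of URLs, at the cost of downloading the gene counts for each project.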

You could also use recount3::locate_url(). However, as noted at https://github.com/LieberInstitute/recount3/issues/21#issuecomment-1074156958, that function doesn't guarantee that the result is a valid URL, for programmatic reasons on the data host side (IDIES at JHU).

Using recount3::create_rse() at the gene level might download more data than you need for a large project such as TCGA (which, like GTEx, is split by tissue). You might therefore prefer to dive into the internal code of recount3::create_rse_manual() and re-use it (https://github.com/LieberInstitute/recount3/blob/6eb14b844062ebdf45fe5a356577e3ea0483c97e/R/create_rse_manual.R#L156-L165) after downloading the TCGA metadata files.
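A rough sketch of that metadata-only route is below. It is my reading of what the linked lines do, not a copy of them. It assumes that "data_sources/tcga" is the project_home for TCGA, that locate_url() accepts a vector of sample IDs for type = "bw", and that the metadata table has an external_id column. The helper name is hypothetical. Since locate_url() only builds the URLs, they are not guaranteed to be valid.

```r
# Sketch: build BigWig URLs from the metadata only, skipping the counts.
# tcga_bigwig_urls_from_metadata() is a hypothetical helper.
tcga_bigwig_urls_from_metadata <- function(tcga_project,
                                           project_home = "data_sources/tcga",
                                           organism = "human") {
    # Locate and download the metadata files for this project.
    metadata_urls <- recount3::locate_url(
        project = tcga_project,
        project_home = project_home,
        type = "metadata",
        organism = organism
    )
    metadata <- recount3::read_metadata(
        recount3::file_retrieve(url = metadata_urls)
    )

    # Assumption: one URL is returned per sample ID passed in.
    metadata$BigWigURL <- recount3::locate_url(
        project = tcga_project,
        project_home = project_home,
        type = "bw",
        organism = organism,
        sample = metadata$external_id
    )

    metadata[, c("external_id", "study", "BigWigURL")]
}

# Usage (needs network access):
# acc_urls <- tcga_bigwig_urls_from_metadata("ACC")
```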

As you can see, there are a few different options, with different degrees of complexity.

Once you have located the URLs, you can use recount3::file_retrieve(), which internally uses BiocFileCache::bfcrpath() (https://github.com/LieberInstitute/recount3/blob/6eb14b844062ebdf45fe5a356577e3ea0483c97e/R/file_retrieve.R#L80), or download them some other way, such as recount::download_retry(), which internally uses downloader::download() (https://github.com/leekgroup/recount/blob/10f29f9d44906f798aa3a7655ae40ac269c36ae5/R/download_retry.R#L39).
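For the download step, a minimal sketch could loop over the URLs and keep going when one fails, which matters because some located URLs may not be valid. download_bigwigs() is a hypothetical helper; its retrieve argument defaults to recount3::file_retrieve() but can be swapped for any function that takes a url and returns a local path.

```r
# Sketch: download each BigWig file, recording NA for URLs that fail.
# download_bigwigs() is a hypothetical helper, not a recount3 function.
download_bigwigs <- function(urls, retrieve = recount3::file_retrieve) {
    vapply(urls, function(u) {
        tryCatch(
            # file_retrieve() caches the file and returns its local path.
            as.character(retrieve(url = u))[1],
            error = function(e) {
                message("Failed to download ", u, ": ", conditionMessage(e))
                NA_character_
            }
        )
    }, character(1))
}

# Usage (needs network access; `urls` is a character vector of BigWig URLs):
# local_paths <- download_bigwigs(urls)
# failed <- names(local_paths)[is.na(local_paths)]
```

The result is named by URL, so the failed downloads are easy to list and retry.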

Best, Leo
