Question

SummarizedExperiment object from GEOquery obtained GSE

rbronste ▴ 60

@rbronste-12189

Last seen 5.5 years ago

I was wondering whether there is a straightforward way to take a GSE downloaded with GEOquery, which I obtain as follows:

gse <- getGEO("GSE63137",GSEMatrix=FALSE)

and to create a SummarizedExperiment object with representative ranges? If the GSE has, for instance, 10 BED files, is there a quick way to make a single SummarizedExperiment object from them?

geoquery summarizedexperiment GEO matrix • 3.0k views

ADD COMMENT • link updated 7.3 years ago by Sean Davis 21k • written 7.3 years ago by rbronste ▴ 60

With 10 bed files, what would you want the ranges and the "assay" in the summarized experiment to contain?

ADD REPLY • link 7.3 years ago Sean Davis 21k

No, I don't mean that it should be a single SummarizedExperiment object (sorry, bad wording on my part), just that I would obtain 10 SE objects at once, each with its associated GRanges. In general, what is the most straightforward and direct way to do this?

ADD REPLY • link 7.3 years ago rbronste ▴ 60

score 4 · Accepted Answer · 2018-01-02

Sean Davis 21k

@sean-davis-490

Last seen 10 weeks ago

United States

I have just updated the GEOquery getGEOSuppFiles() function (version 2.47.16) to support filtering supplemental files by name (filter_regex='bed') and to return a listing of files without downloading them (fetch_files=FALSE). To give it a try immediately, install from GitHub:

biocLite('seandavi/GEOquery')

In 24-48 hours, the development version of GEOquery should be available.

After installation, you should be able to do something like:

library(GEOquery)
library(rtracklayer)
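# Download only the supplemental files whose names match 'bed';
# the result is a data.frame with the filenames as rownames
bedfiles = getGEOSuppFiles("GSE63137", filter_regex = 'bed')
# Import each BED file as a GRanges object
bedfiles_as_granges = lapply(rownames(bedfiles), import, format = "bed")
bedfiles_as_granges[[1]] # GRanges for the first file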

[[1]]
GRanges object with 103361 ranges and 0 metadata columns:
           seqnames               ranges strand
              <Rle>            <IRanges>  <Rle>
       [1]     chr1   [3094879, 3095533]      *
       [2]     chr1   [3119625, 3120840]      *
       [3]     chr1   [3121310, 3121944]      *
       [4]     chr1   [3292627, 3293590]      *
       [5]     chr1   [3322353, 3322979]      *
       ...      ...                  ...    ...
  [103357]     chrY [90808536, 90809176]      *
  [103358]     chrY [90810611, 90811697]      *
  [103359]     chrY [90812380, 90813359]      *
  [103360]     chrY [90828629, 90829131]      *
  [103361]     chrY [90838918, 90839418]      *
  -------
  seqinfo: 22 sequences from an unspecified genome; no seqlengths

I suspect that leaving these as GRanges objects may be the most useful form for analysis, but you could go ahead and convert to SummarizedExperiments if you like.
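If you do want SummarizedExperiment objects, here is a minimal sketch of one way to wrap each GRanges. It uses a small toy GRanges in place of an imported BED file so that it runs without a download. The helper name granges_to_se and the one-column "presence" assay are hypothetical placeholders rather than anything GEOquery provides; what the assay should actually hold depends on your analysis.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Toy stand-in for one imported BED file (first three ranges from the output above)
gr <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(3094879, 3119625, 3121310),
                   end   = c(3095533, 3120840, 3121944))
)
# In practice this would be bedfiles_as_granges, named by sample or file
granges_list <- list(sample1 = gr)

# Hypothetical helper: wrap one GRanges as a single-sample SummarizedExperiment.
# The assay is a placeholder column of 1s ("a range is present"); this is an
# assumption, so substitute whatever measurement you actually have per range.
granges_to_se <- function(gr, sample_name) {
  presence <- matrix(1L, nrow = length(gr), ncol = 1,
                     dimnames = list(NULL, sample_name))
  SummarizedExperiment(assays = list(presence = presence), rowRanges = gr)
}

# One SummarizedExperiment per input GRanges
se_list <- Map(granges_to_se, granges_list, names(granges_list))
se_list[["sample1"]]
```

rowRanges(se_list[["sample1"]]) gives back the original ranges, so nothing is lost by wrapping, but with only ranges and no per-range measurements the SummarizedExperiment adds little over the plain GRanges.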

ADD COMMENT • link 7.3 years ago Sean Davis 21k

Thanks, exactly what I was after!

ADD REPLY • link 7.3 years ago rbronste ▴ 60

Great. Let me know if you have any problems or have further suggestions on the GEOquery side of things.

ADD REPLY • link 7.3 years ago Sean Davis 21k

Actually, one other quick question: will filter_regex='bed' also exclude the .tar files that contain .bed files from retrieval, so that only the bed.gz files listed in the supplementary data are fetched?

ADD REPLY • link 7.3 years ago rbronste ▴ 60

The regex is matched directly against filenames, not file contents, so a .tar archive is retrieved only if its own filename matches the pattern. The default behavior when no regex is specified is to fetch all supplemental files, so the .tar file will be downloaded by default.
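To illustrate filename matching, here is a small base R sketch. The filenames are made up for illustration, and using grepl() to mimic the filter is an assumption about the matching semantics, not a copy of the GEOquery internals.

```r
# Hypothetical supplemental filenames (not the real listing for any GSE)
supp_files <- c(
  "GSE00000_RAW.tar",
  "GSM0000001_peaks.bed.gz",
  "GSM0000002_peaks.bed.gz",
  "GSE00000_bed_archive.tar"
)

# A loose pattern matches any filename containing "bed",
# including a tar archive that happens to have "bed" in its name
loose_matches <- supp_files[grepl("bed", supp_files)]

# A stricter pattern keeps only names ending in ".bed.gz"
strict_matches <- supp_files[grepl("\\.bed\\.gz$", supp_files)]

loose_matches
strict_matches
```

A tar file is left out only when its name does not match the pattern. Calling getGEOSuppFiles() with fetch_files=FALSE, as mentioned above, should let you check which names a pattern picks up before downloading anything.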

ADD REPLY • link 7.3 years ago Sean Davis 21k