GEO series matrix empty
gs.123 ▴ 20
@36e5f9f3
Canada

How can I load the data from GSE108497 into R?

I tried using GEOquery and the expression data is empty:

library(GEOquery)
gse108497 <- getGEO('GSE108497', GSEMatrix = TRUE)
show(gse108497)

$GSE108497_series_matrix.txt.gz
ExpressionSet (storageMode: lockedEnvironment)
assayData: 0 features, 512 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM2901826 GSM2901827 ... GSM2902337 (512 total)
  varLabels: title geo_accession ... tp:ch1 (74 total)
  varMetadata: labelDescription
featureData
  featureNames:
  fvarLabels: ID Species ... GB_ACC (30 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL10558

I've also tried using getGEOSuppFiles(), but could not find a solution. Thank you.

The submitters did not actually submit quantification results for the samples as part of the sample records. Therefore, GEOquery cannot load them automatically. You'll have to use getGEOSuppFiles() and then merge the raw data with the annotation data that you got with getGEO. Unfortunately, there is no standard approach for merging the supplied raw data with the metadata from GEOquery; each dataset will vary somewhat.

Thank you for your reply. I don't think the submitters submitted anything at all.

The supplementary files (if you want to look programmatically, see getGEOSuppFiles) contain what are described as "non-normalized" and "normalized" data that you can read using standard R tab-delimited text file readers. You'll then have to manipulate those data into a form that you can use in R/Bioconductor. But the data are submitted.
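As a rough illustration of the "read it with a tab-delimited reader and reshape it" step, here is a minimal base R sketch. It does not download anything: the inline text is a three-row stand-in built from the values quoted elsewhere in this thread, and the `ID_REF` header for the probe column is an assumption about the file, not something confirmed here. In practice you would pass the path of the downloaded file to `read.delim()` instead of `text =`.

```r
# Stand-in for the layout of the "normalized" supplementary file.
# The "ID_REF" header is an assumed name for the probe ID column.
txt <- paste(
  "ID_REF\t9269325021_A\tDetection Pval\t9269325021_B\tDetection Pval",
  "ILMN_1708238\t-3.3708\t0.6234\t-3.1623\t0.6364",
  "ILMN_1711886\t146.8245\t0.0000\t194.8972\t0.0000",
  "ILMN_1759828\t9.6286\t0.1208\t16.0719\t0.0260",
  sep = "\n"
)

# check.names = FALSE keeps the header exactly as written in the file,
# so array IDs that start with a digit are not prefixed with "X".
raw <- read.delim(text = txt, row.names = 1, check.names = FALSE)

# Expression values and detection p-values alternate column by column;
# split them into two matrices that share the same array IDs.
is_pval <- grepl("^Detection", colnames(raw))
expr <- as.matrix(raw[!is_pval])
pval <- as.matrix(raw[is_pval])
colnames(pval) <- colnames(expr)
```

This leaves `expr` and `pval` as probe-by-array matrices with identical dimnames, which is usually a convenient starting point for building a Bioconductor object.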

When I read in "GSE108497_normalized_data.txt", I get this:

> GSE[1:3, 1:4]
             X9269325021_A Detection.Pval X9269325021_B Detection.Pval.1
ILMN_1708238       -3.3708         0.6234       -3.1623           0.6364
ILMN_1711886      146.8245         0.0000      194.8972           0.0000
ILMN_1759828        9.6286         0.1208       16.0719           0.0260

However, I can't find the file that would let me match the sample IDs to these columns (i.e., GSMxxxxx to ILMN_xxxxxxx). Do you know where I can find this?

library(GEOquery)
g <- getGEO('GSE108497')[[1]]
pData(g)$description

Note that R prepends an "X" to the column names from the .txt file because they start with a digit (reading the file with check.names = FALSE keeps them unchanged). Apart from that "X", the column names match the values in "description". And, yes, you just have to poke around. This approach works for this dataset, but the next one will be different (columns, naming, etc.).
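To make the matching step concrete, here is a small sketch in base R. The `pheno` data frame stands in for `pData(g)`; its GSM accessions are placeholders and the pairing with array IDs is invented for illustration, so do not read it as the real mapping for GSE108497. The expression values are the ones quoted earlier in the thread.

```r
# Stand-in for pData(g): placeholder accessions, invented pairing.
pheno <- data.frame(
  geo_accession = c("GSM0000001", "GSM0000002"),
  description   = c("9269325021_B", "9269325021_A"),
  stringsAsFactors = FALSE
)

# Expression matrix as it looks after a default read, with the "X" prefix.
expr <- matrix(
  c(-3.3708, 146.8245, -3.1623, 194.8972),
  nrow = 2,
  dimnames = list(c("ILMN_1708238", "ILMN_1711886"),
                  c("X9269325021_A", "X9269325021_B"))
)

# Strip the leading "X" and look each array ID up in the description column.
array_id <- sub("^X", "", colnames(expr))
idx <- match(array_id, pheno$description)
stopifnot(!anyNA(idx))  # every column must map to a sample

# Rename columns to GSM accessions, then put them in phenotype-table order.
colnames(expr) <- pheno$geo_accession[idx]
expr <- expr[, pheno$geo_accession, drop = FALSE]
```

Using `match()` rather than relying on column order matters here, because nothing guarantees that the supplementary file and the sample records are sorted the same way.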

Thank you so much!! I've been working on this for days. Much appreciated.

@gordon-smyth
WEHI, Melbourne, Australia

It seems to me that there is a serious problem with this dataset. The GEO series lists 512 samples but the supplementary data files contain expression values for only 510 beadchips. I think one would have to write to the submitters.
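One way to pin down which samples are affected is to compare the two sets of array IDs directly. The sketch below uses short made-up vectors; with the real data, `gsm_arrays` would presumably come from `pData(g)$description` and `file_arrays` from the column names of the supplementary file (minus any "X" prefix and the detection p-value columns).

```r
# Made-up stand-ins: three sample records, but only two arrays in the file.
gsm_arrays  <- c("9269325021_A", "9269325021_B", "9269325021_C")
file_arrays <- c("9269325021_A", "9269325021_B")

# Samples listed in the series with no expression column in the file.
missing_from_file <- setdiff(gsm_arrays, file_arrays)

# Columns in the file that no sample record refers to.
extra_in_file <- setdiff(file_arrays, gsm_arrays)

# Duplicated IDs on either side would also make the counts disagree.
has_duplicates <- anyDuplicated(gsm_arrays) > 0 || anyDuplicated(file_arrays) > 0
```

A non-empty `missing_from_file` would give you the specific accessions to mention when writing to the submitters.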
