Problem with GEOquery R package in Windows when trying to download specific GEO processed data
svlachavas ▴ 840
@svlachavas-7225
Last seen 14 months ago
Germany/Heidelberg/German Cancer Resear…

Dear Community,

Because the raw data of a specific Affymetrix HTA 2.0 dataset in GEO (GSE88884) are very large, I used the following small code chunk to download the processed data instead:

library(GEOquery)
gseList = getGEO("GSE88884")

https://ftp.ncbi.nlm.nih.gov/geo/series/GSE88nnn/GSE88884/matrix/
OK
Found 1 file(s)
GSE88884_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE88nnn/GSE88884/matrix/GSE88884_series_matrix.txt.gz'
Content type 'application/x-gzip' length 103966 bytes (101 KB)
downloaded 101 KB

File stored at:
C:\Users\EFSTAT~1\AppData\Local\Temp\RtmpEPUvTf/GPL17586.soft
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  not all columns named in 'colClasses' exist

gse = gseList[[1]]
gse
ExpressionSet (storageMode: lockedEnvironment)
assayData: 0 features, 1820 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM2350873 GSM2350874 ... GSM2352692 (1820 total)
  varLabels: title geo_accession ... relation (48 total)
  varMetadata: labelDescription
featureData
  featureNames:
  fvarLabels: ID probeset_id ... SPOT_ID (15 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL17586

head(exprs(gse))

     GSM2350873 GSM2350874 GSM2350875 GSM2350876 GSM2350877 GSM2350878 GSM2350879 GSM2350880
     GSM2350881 GSM2350882 GSM2350883 GSM2350884 GSM2350885 GSM2350886 GSM2350887 GSM2350888
     GSM2350889 GSM2350890 GSM2350891 GSM2350892 GSM2350893 GSM2350894 GSM2350895 GSM2350896
     GSM2350897 GSM2350898 GSM2350899 GSM2350900 GSM2350901 GSM2350902.....

 

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=Greek_Greece.1253  LC_CTYPE=Greek_Greece.1253    LC_MONETARY=Greek_Greece.1253
[4] LC_NUMERIC=C                  LC_TIME=Greek_Greece.1253

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GEOquery_2.40.0     Biobase_2.34.0      BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] httr_1.2.1     R6_2.2.0       tools_3.3.1    RCurl_1.95-4.8 knitr_1.15.1   bitops_1.0-6
[7] XML_3.98-1.5

So what explains this strange result, in which the object contains no genes/probesets and no expression values?
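
A quick way to confirm that a series matrix carries no expression values is to look at the dimensions of the expression matrix. The following is only a minimal sketch: `has_no_features()` is a hypothetical helper name, and the toy matrix stands in for `exprs(gse)` so that the snippet runs without downloading anything.

```r
# Hypothetical helper: TRUE when an expression matrix has sample columns
# but no feature rows, as in "assayData: 0 features, 1820 samples".
has_no_features <- function(expr_matrix) {
  nrow(expr_matrix) == 0 && ncol(expr_matrix) > 0
}

# Toy stand-in for exprs(gse): 0 features, 3 samples.
toy <- matrix(
  numeric(0), nrow = 0, ncol = 3,
  dimnames = list(NULL, c("GSM2350873", "GSM2350874", "GSM2350875"))
)

has_no_features(toy)
```

With the real object, the equivalent call would be `has_no_features(exprs(gse))`.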

 

geoquery affymetrix microarrays windows8 getGEO • 1.8k views
@sean-davis-490
Last seen 4 months ago
United States

The GEO Series Matrix files are a reflection of what is submitted. Unfortunately, the submitters did not submit any normalized values; it appears they submitted only the .CEL files. So you are stuck with downloading the .CEL files and processing them as raw data. That isn't necessarily a bad thing, but it is a bit less convenient. You can use `getGEOSuppFiles()` to orchestrate the downloads. From there, you might find the oligo package useful for preprocessing. The GSE can still serve as the sample annotation, but building the rest of the ExpressionSet will take a bit of coding.
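
The workflow described above could be sketched roughly as follows. This is an untested outline under several assumptions, not a definitive recipe: the function names `gsm_from_cel()` and `process_gse_cel()` are hypothetical, the supplementary archive is assumed to be named `*_RAW.tar`, the CEL file names are assumed to start with the GSM accession, the matching platform annotation package for oligo is assumed to be installed, and `target = "core"` is assumed to be an appropriate summarization level for this array. With 1820 arrays, download size and memory use will be substantial.

```r
# Sketch only: defining these functions downloads nothing.
# process_gse_cel() needs GEOquery, oligo, oligoClasses and Biobase when called.

# Extract the GSM accession from a CEL file name, e.g. "GSM2350873_x.CEL.gz".
# Assumption: supplementary CEL files are named with a leading GSM accession.
gsm_from_cel <- function(cel_files) {
  sub("^(GSM[0-9]+).*$", "\\1", basename(cel_files))
}

# gse_id: series accession, e.g. "GSE88884"
# gse:    the ExpressionSet returned by getGEO(), used only for sample annotation
process_gse_cel <- function(gse_id, gse, base_dir = tempdir(), target = "core") {
  # Download the supplementary files into <base_dir>/<gse_id>/.
  GEOquery::getGEOSuppFiles(gse_id, baseDir = base_dir)
  gse_dir <- file.path(base_dir, gse_id)

  # Unpack the raw archive (assumed to be named *_RAW.tar).
  raw_tar <- list.files(gse_dir, pattern = "_RAW\\.tar$", full.names = TRUE)
  stopifnot(length(raw_tar) == 1)
  cel_dir <- file.path(gse_dir, "cel")
  utils::untar(raw_tar, exdir = cel_dir)

  # Read the CEL files and summarize with RMA.
  cel_files <- oligoClasses::list.celfiles(cel_dir, full.names = TRUE,
                                           listGzipped = TRUE)
  raw  <- oligo::read.celfiles(cel_files)
  eset <- oligo::rma(raw, target = target)

  # Rename samples to GSM accessions and reuse the GEO sample annotation.
  ids <- gsm_from_cel(Biobase::sampleNames(eset))
  pd  <- Biobase::pData(gse)
  idx <- match(ids, rownames(pd))
  stopifnot(!anyNA(idx))
  Biobase::sampleNames(eset) <- ids
  Biobase::pData(eset) <- pd[idx, , drop = FALSE]
  eset
}
```

A call would look like `eset <- process_gse_cel("GSE88884", gseList[[1]])`. Matching on the GSM accession rather than on file order is a deliberate choice here, since the order of the unpacked CEL files need not agree with the order of samples in the series matrix.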


Dear Sean,

Thank you for your comprehensive answer! I usually download the raw files, but this time, because of the very large size of the raw data, I noticed on the GEO page (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE88884) the file GSE88884_ILLUMINATE1and2_SLEbaselineVsHealthy_preprocessed.txt.gz and naively thought that getGEO() would download the processed data, which led me to the above problem.
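
Since getGEO() retrieved only the series matrix here, a supplementary file such as the one named above has to be fetched separately. The following is a minimal sketch under stated assumptions: `geo_suppl_url()` and `fetch_suppl_table()` are hypothetical helpers, the URL layout is assumed to mirror the `matrix/` URL shown in the download log with `suppl/` in its place, and the file is assumed to be a tab-delimited table with a header row, which has not been verified.

```r
# Hypothetical helper: build the supplementary-file URL for a GEO series.
# Assumption: the layout mirrors the matrix URL in the log above,
# i.e. .../geo/series/GSE88nnn/GSE88884/suppl/<file_name>.
geo_suppl_url <- function(gse_id, file_name) {
  stub <- sub("[0-9]{1,3}$", "nnn", gse_id)  # "GSE88884" becomes "GSE88nnn"
  paste0("https://ftp.ncbi.nlm.nih.gov/geo/series/",
         stub, "/", gse_id, "/suppl/", file_name)
}

# Hypothetical helper: download one supplementary file and read it as a table.
# Nothing is downloaded until this function is called.
fetch_suppl_table <- function(gse_id, file_name, dest_dir = tempdir()) {
  dest <- file.path(dest_dir, file_name)
  # mode = "wb" avoids corrupting the gzip file on Windows.
  utils::download.file(geo_suppl_url(gse_id, file_name), dest, mode = "wb")
  # read.delim() reads gzip-compressed files transparently.
  utils::read.delim(dest, check.names = FALSE)
}
```

For example, `fetch_suppl_table("GSE88884", "GSE88884_ILLUMINATE1and2_SLEbaselineVsHealthy_preprocessed.txt.gz")` would attempt the download. `getGEOSuppFiles("GSE88884")` is the package's own route to the same files, but it fetches every supplementary file of the series, including the large raw archive.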
