This problem is back, and it is somewhat acute for me as the edX course currently running (PH525.5x) asks that GEOquery be used for a certain dataset.
I am going to narrate fully a workaround. Students are welcome to attempt the workaround and notify staff if alternate approaches are needed. Errorfighting skills are a central component of true mastery.
First, getGEO fails.
> prob = getGEO("GSE34313") ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE34nnn/GSE34313/matrix/ Found 1 file(s) GSE34313_series_matrix.txt.gz Using locally cached version: /var/folders/5_/14ld0y7s0vbg_z0g2c9l8v300000gr/T//Rtmpq4jVn9/GSE34313_series_matrix.txt.gz Using locally cached version of GPL6480 found here: /var/folders/5_/14ld0y7s0vbg_z0g2c9l8v300000gr/T//Rtmpq4jVn9/GPL6480.soft Error in xj[i] : only 0's may be mixed with negative subscripts Enter a frame number, or 0 to exit 1: getGEO("GSE34313") 2: getAndParseGSEMatrices(GEO, destdir, AnnotGPL = AnnotGPL, getGPL = getGPL) 3: parseGSEMatrix(destfile, destdir = destdir, AnnotGPL = AnnotGPL, getGPL = g 4: getGEO(GPL, AnnotGPL = AnnotGPL, destdir = destdir) 5: parseGEO(filename, GSElimits, destdir, AnnotGPL = AnnotGPL, getGPL = getGPL 6: parseGPL(fname) 7: .parseGPLWithLimits(con) 8: new("GEODataTable", columns = cols, table = dat3[1:(nrow(dat3) - 1), ]) 9: initialize(value, ...) 10: initialize(value, ...) 11: dat3[1:(nrow(dat3) - 1), ] 12: `[.data.frame`(dat3, 1:(nrow(dat3) - 1), )
The caches are not the problem. There is something wrong with the pursuit of the annotation. I haven't had time to understand what.
But we can get the expression data successfully by setting one option to a non-default value.
> exonly = getGEO("GSE34313", getGPL=FALSE) ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE34nnn/GSE34313/matrix/ Found 1 file(s) GSE34313_series_matrix.txt.gz Using locally cached version: /var/folders/5_/14ld0y7s0vbg_z0g2c9l8v300000gr/T//Rtmpq4jVn9/GSE34313_series_matrix.txt.gz Warning message: closing unused connection 3 (/var/folders/5_/14ld0y7s0vbg_z0g2c9l8v300000gr/T//Rtmpq4jVn9/GPL6480.soft) > exonly[[1]] ExpressionSet (storageMode: lockedEnvironment) assayData: 41000 features, 10 samples element names: exprs protocolData: none phenoData sampleNames: GSM847200 GSM847201 ... GSM847209 (10 total) varLabels: title geo_accession ... data_row_count (36 total) varMetadata: labelDescription featureData: none experimentData: use 'experimentData(object)' Annotation: GPL6480
Can we get the annotation data? Yes, manually we can get it at
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL6480
clicking on the Annotation SOFT table button. Now we have
GPL6480.annot.gz on disk. We then use
anno = parseGPL("~/Downloads/GPL6480.annot.gz") # that's my download area; use yours
beware:
> warnings() Warning messages: 1: In readLines(con, 1) : seek on a gzfile connection returned an internal error 2: In readLines(con, 1) : seek on a gzfile connection returned an internal error
and 30 additional
Doesn't look promising, but press on.
> getClass(class(anno)) Class "GPL" [package "GEOquery"] Slots: Name: dataTable header Class: GEODataTable list Extends: "GEOData" > getClass("GEODataTable") Class "GEODataTable" [package "GEOquery"] Slots: Name: columns table Class: data.frame data.frame > dim(anno@dataTable@table) [1] 41108 22
That's a nice row number ... maybe this will work?
eset = exonly[[1]] annotab = anno@dataTable@table annotab = annotab[-which(is.na(annotab[,1])),] rownames(annotab) = as.character(annotab[,1]) fData(eset) = annotab[ rownames(exprs(eset)), ] > eset ExpressionSet (storageMode: lockedEnvironment) assayData: 41000 features, 10 samples element names: exprs protocolData: none phenoData sampleNames: GSM847200 GSM847201 ... GSM847209 (10 total) varLabels: title geo_accession ... data_row_count (36 total) varMetadata: labelDescription featureData featureNames: A_23_P100001 A_23_P100011 ... A_32_P99942 (41000 total) fvarLabels: ID Gene title ... Platform_SEQUENCE (22 total) fvarMetadata: labelDescription experimentData: use 'experimentData(object)' Annotation: GPL6480
Still not there: check experimentData(eset) ...
library(annotate) mi = pmid2MIAME("21257922")
Read 495 items
now i've got an eset i can believe in?
> experimentData(eset) = mi > eset ExpressionSet (storageMode: lockedEnvironment) assayData: 41000 features, 10 samples element names: exprs protocolData: none phenoData sampleNames: GSM847200 GSM847201 ... GSM847209 (10 total) varLabels: title geo_accession ... data_row_count (36 total) varMetadata: labelDescription featureData featureNames: A_23_P100001 A_23_P100011 ... A_32_P99942 (41000 total) fvarLabels: ID Gene title ... Platform_SEQUENCE (22 total) fvarMetadata: labelDescription experimentData: use 'experimentData(object)' pubMedIds: 21257922 Annotation: GPL6480 > abstract(eset) [1] "Glucocorticoids (GCs), which activate GC receptor (GR) signaling and thus modulate gene expression, are widely used to treat asthma. GCs exert their therapeutic effects...
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Hi Vince,
I can't reproduce:
Or on Windows: