The official GEO Platform (GPL) record, supplied by CodeLink, is:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL2895
The GPL record lists its associated samples and series. The following returns the series IDs associated with the CodeLink platform:
library(GEOquery)
gpl = getGEO("GPL2895")
Meta(gpl)$series_id
[1] "GSE3578" "GSE4106" "GSE4609" "GSE4812" "GSE4846" "GSE5108" "GSE5216" "GSE5350" "GSE6213"
[10] "GSE6304" "GSE6585" "GSE6630" "GSE6692" "GSE7330" "GSE8353" "GSE8604" "GSE9332" "GSE9490"
[19] "GSE10064" "GSE10123" "GSE10145" "GSE12530" "GSE13857" "GSE14797" "GSE14808" "GSE15829" "GSE16523"
[28] "GSE16717" "GSE16944" "GSE17470" "GSE18124" "GSE18464" "GSE19834" "GSE20167" "GSE22812" "GSE24519"
[37] "GSE24591" "GSE24807" "GSE25431" "GSE26326" "GSE27448" "GSE29002" "GSE29136" "GSE29763" "GSE31075"
[46] "GSE32191" "GSE32403" "GSE32902" "GSE33133" "GSE33651" "GSE35499" "GSE36007" "GSE37186" "GSE37187"
[55] "GSE38542" "GSE40007" "GSE44172" "GSE44187" "GSE44736" "GSE55768" "GSE56739" "GSE60602" "GSE79189"
[64] "GSE80347" "GSE94318"
That webpage also lists three additional, alternative GPLs supplied by other submitters. GEO stores this information in the GPL record as plain-text annotations (not ideal, but the information is there):
Meta(gpl)$relation
[1] "Alternative to: GPL11010"
[2] "Alternative to: GPL8060"
[3] "Alternative to: GPL18134 ([DISCOVERY PROBE_TYPE])"
Each of these GPL records can be treated the same way to get a complete list of GSEs (or GSMs, if that is the goal).
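Since the relations are plain text, the GPL accessions have to be pulled out with a pattern match before they can be passed back to getGEO. Below is a minimal sketch of that step, assuming the strings always contain an accession of the form "GPL" followed by digits. The names `extract_gpl_ids` and `series_for_platforms` are hypothetical helpers, not part of GEOquery, and the second one needs network access, so it is only defined here and not run.

```r
# Relation strings as printed by Meta(gpl)$relation above.
relations <- c(
  "Alternative to: GPL11010",
  "Alternative to: GPL8060",
  "Alternative to: GPL18134 ([DISCOVERY PROBE_TYPE])"
)

# Hypothetical helper: pull the first GPL accession out of each
# free-text annotation (strings without a match are dropped).
extract_gpl_ids <- function(relation) {
  hits <- regmatches(relation, regexpr("GPL[0-9]+", relation))
  unique(hits)
}

alt_gpls <- extract_gpl_ids(relations)
alt_gpls

# Hypothetical helper: collect the series IDs listed on several platforms.
# Requires GEOquery and network access, so it is defined but not called.
series_for_platforms <- function(gpl_ids) {
  ids <- lapply(gpl_ids, function(id) {
    GEOquery::Meta(GEOquery::getGEO(id))$series_id
  })
  unique(unlist(ids))
}
# all_gses <- series_for_platforms(c("GPL2895", alt_gpls))
```

Swapping `$series_id` for `$sample_id` in the helper should give GSMs instead, assuming the GPL metadata carries that field.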
Alternatively, each GSE record has an associated platform, stored in the annotation slot of an ExpressionSet. More concretely:
gse = getGEO('GSE3578')[[1]]
# gse is an ExpressionSet
gse
Note that the Annotation field below shows "GPL2895".
ExpressionSet (storageMode: lockedEnvironment)
assayData: 54359 features, 156 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM82284 GSM82285 ... GSM128604 (156 total)
  varLabels: title geo_accession ... data_row_count (31 total)
  varMetadata: labelDescription
featureData
  featureNames: 1001 1002 ... 504109 (54359 total)
  fvarLabels: ID LOGICAL_ROW ... GI_LIST (9 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL2895
Returning to the original question, checking whether a GSE belongs to a specific platform comes down to:
annotation(gse) == 'GPL2895'
[1] TRUE
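When many GSEs need checking, downloading each one just to read its annotation may be slow. A lighter alternative is to test membership against the series IDs already listed on the GPL record. The sketch below uses a small hard-coded subset of the Meta(gpl)$series_id output shown earlier so that it runs without network access. `on_platform` is a hypothetical helper name, and "GSE0000000" is a made-up accession included only to show a negative result.

```r
# A few of the series IDs printed from Meta(gpl)$series_id above.
# In practice this would be the full vector: Meta(gpl)$series_id
platform_series <- c("GSE3578", "GSE4106", "GSE4609")

# Hypothetical helper: TRUE where a GSE accession is listed for the platform.
on_platform <- function(gse_ids, series_ids) {
  gse_ids %in% series_ids
}

# "GSE0000000" is a made-up accession, not a real GEO series.
result <- on_platform(c("GSE3578", "GSE0000000"), platform_series)
result
# [1]  TRUE FALSE
```

This only reflects what the single GPL record lists, so series submitted against one of the alternative GPLs would need those records' series IDs included as well.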
EDIT: On rereading, the original question seems to focus on parsing text files, so this is
perhaps not a complete answer. Indeed, matching files to formats is a
challenging problem.