The following getGEO query retrieves data files and metadata for a recent GEO submission of mine, one that has been curated:
library(GEOquery)
GDS4252 <- getGEO("GDS4252")
Columns(GDS4252)
> str(Columns(GDS4252))
'data.frame': 16 obs. of 4 variables:
$ sample : Factor w/ 16 levels "GSM754979","GSM754980",..: 5 6 7 8 1 2 3 4 13 14 ...
$ genotype/variation: Factor w/ 2 levels "CFTR mutant",..: 1 1 1 1 1 1 1 1 2 2 ...
$ agent : Factor w/ 2 levels "PA01","unexposed": 1 1 1 1 2 2 2 2 1 1 ...
The folks at NCBI have correctly created two factors with two levels to describe the 16 samples in my experiment.
I am interested in retrieving similar information using GEOmetadb, but this has proved problematic.
library(GEOmetadb)
getSQLiteFile(destdir = getwd(), destfile = "GEOmetadb.sqlite.gz")
con <- dbConnect(SQLite(), "GEOmetadb.sqlite")
dat <- dbGetQuery(con, "select * from gds where gds = 'GDS4252'")
> dat
[1] ID gds title
[4] description type pubmed_id
[7] gpl platform_organism platform_technology_type
[10] feature_count sample_organism sample_type
[13] channel_count sample_count value_type
[16] gse order update_date
<0 rows> (or 0-length row.names)
It seems, for starters, that the GDS identifier for my submission is not in the current database.
Others are, so my syntax appears to be fine:
> dat <- dbGetQuery(con, "select gds from gds limit 10")
> dat
gds
1 GDS5
2 GDS6
3 GDS10
4 GDS12
5 GDS15
6 GDS16
7 GDS17
8 GDS18
9 GDS19
10 GDS20
There is also the question of where a set of fields (variable in number) describing sample factors and their levels would actually "live" in the SQLite database.
This information does not seem to be an attribute of the GDS in any case:
> dat <- dbGetQuery(con, "select fieldname from geodb_column_desc where TableName = 'gds'")
> dat
FieldName
1 ID
2 channel_count
3 description
4 feature_count
5 gds
6 order
7 platform
8 platform_organism
9 platform_technology_type
10 pubmed_id
11 reference_series
12 sample_count
13 sample_organism
14 sample_type
15 title
16 type
17 update_date
18 value_type
Nor does it seem to be a feature stored in the samples:
> dat <- dbGetQuery(con, "select fieldname from geodb_column_desc where TableName = 'gsm'")
> dat
FieldName
1 ID
2 channel_count
3 characteristics_ch1
4 characteristics_ch2
5 contact
6 data_processing
7 data_row_count
8 description
9 extract_protocol_ch1
10 extract_protocol_ch2
11 gpl
12 gse
13 gsm
14 hyb_protocol
15 label_ch1
16 label_ch2
17 label_protocol_ch1
18 label_protocol_ch2
19 last_update_date
20 molecule_ch1
21 molecule_ch2
22 organism_ch1
23 organism_ch2
24 source_name_ch1
25 source_name_ch2
26 status
27 submission_date
28 supplementary_file
29 title
30 treatment_protocol_ch1
31 treatment_protocol_ch2
32 type
Any advice greatly appreciated.
Tom
Hi, Tom.
Sorry to take so long to get back to you. See below.
On Thu, Jun 6, 2013 at 11:15 AM, Thomas H. Hampton <thomas.h.hampton at dartmouth.edu> wrote:
> The following getGEO query retrieves data files and metadata for a recent GEO submission of mine, one that has been curated:
>
> GDS4252 <- getGEO("GDS4252")
> Columns(GDS4252)
>> str(Columns(GDS4252))
> 'data.frame': 16 obs. of 4 variables:
> $ sample : Factor w/ 16 levels "GSM754979","GSM754980",..: 5 6 7 8 1 2 3 4 13 14 ...
> $ genotype/variation: Factor w/ 2 levels "CFTR mutant",..: 1 1 1 1 1 1 1 1 2 2 ...
> $ agent : Factor w/ 2 levels "PA01","unexposed": 1 1 1 1 2 2 2 2 1 1 ...
>
> The folks at NCBI have correctly created two factors with two levels to describe the 16 samples in my experiment.
>
> I am interested in retrieving similar information using GEOmetadb, but this has proved problematic.
>
> getSQLiteFile(destdir = getwd(), destfile = "GEOmetadb.sqlite.gz")
>
> con <- dbConnect(SQLite(), "GEOmetadb.sqlite")
> dat <- dbGetQuery(con, "select * from gds where gds = 'GDS4252'")
>
>> dat
> [1] ID gds title
> [4] description type pubmed_id
> [7] gpl platform_organism platform_technology_type
> [10] feature_count sample_organism sample_type
> [13] channel_count sample_count value_type
> [16] gse order update_date
> <0 rows> (or 0-length row.names)
>
> It seems, for starters, that this GDS identifier for my particular submission isn't accounted for in the current database.
>
> Others are, so it looks like my syntax and so forth is ok:
>
>> dat <- dbGetQuery(con, "select gds from gds limit 10")
>> dat
> gds
> 1 GDS5
> 2 GDS6
> 3 GDS10
> 4 GDS12
> 5 GDS15
> 6 GDS16
> 7 GDS17
> 8 GDS18
> 9 GDS19
> 10 GDS20
>
>
> There is also the question of where a set of fields (variable in number) describing sample factors and their levels would actually "live" in the SQLite database.
It does appear that our update script has a bug; GDS4252 is not
present, so we'll check on that.
> This information does not seem to be an attribute of the GDS in any case:
You'll want to check out the gds_subset table for details of the GDS
groups.
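As a rough sketch of what that lookup might look like, here is a toy example that runs against an in-memory SQLite database rather than the real GEOmetadb.sqlite. I am assuming gds_subset has one row per factor level, with the columns gds, type (the factor name), description (the level) and sample_id (a comma-separated list of GSM accessions); check dbListFields(con, "gds_subset") on your copy before relying on those names. The GDS0000 and GSM1..GSM4 identifiers and their levels are made up for illustration.

```r
library(DBI)
library(RSQLite)

# Toy stand-in for GEOmetadb.sqlite; every row below is invented.
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "gds_subset", data.frame(
  gds         = "GDS0000",
  description = c("mutant", "wild type", "treated", "untreated"),
  sample_id   = c("GSM1,GSM2", "GSM3,GSM4", "GSM1,GSM3", "GSM2,GSM4"),
  type        = c("genotype/variation", "genotype/variation", "agent", "agent"),
  stringsAsFactors = FALSE
))

# One row per (factor, level); column names are assumptions, see above.
sub <- dbGetQuery(con,
  "select type, description, sample_id from gds_subset where gds = ?",
  params = list("GDS0000"))

# Expand the comma-separated sample lists to one row per (sample, factor).
ids  <- strsplit(sub$sample_id, ",", fixed = TRUE)
long <- data.frame(
  sample = trimws(unlist(ids)),
  type   = rep(sub$type, lengths(ids)),
  level  = rep(sub$description, lengths(ids)),
  stringsAsFactors = FALSE
)

# Reshape to one row per sample and one column per factor.
pheno <- reshape(long, idvar = "sample", timevar = "type", direction = "wide")
names(pheno) <- sub("^level\\.", "", names(pheno))
pheno <- pheno[order(pheno$sample), ]
rownames(pheno) <- NULL

dbDisconnect(con)
```

If the real table is laid out this way, pointing the same query at GEOmetadb.sqlite with an actual GDS accession should give a sample-by-factor table comparable to Columns() from GEOquery.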
>> dat <- dbGetQuery(con, "select fieldname from geodb_column_desc where TableName = 'gds'")
>> dat
> FieldName
> 1 ID
> 2 channel_count
> 3 description
> 4 feature_count
> 5 gds
> 6 order
> 7 platform
> 8 platform_organism
> 9 platform_technology_type
> 10 pubmed_id
> 11 reference_series
> 12 sample_count
> 13 sample_organism
> 14 sample_type
> 15 title
> 16 type
> 17 update_date
> 18 value_type
>
> Nor does it seem to be a feature stored in the samples:
>
>> dat <- dbGetQuery(con, "select fieldname from geodb_column_desc where TableName = 'gsm'")
>> dat
> FieldName
> 1 ID
> 2 channel_count
> 3 characteristics_ch1
> 4 characteristics_ch2
> 5 contact
> 6 data_processing
> 7 data_row_count
> 8 description
> 9 extract_protocol_ch1
> 10 extract_protocol_ch2
> 11 gpl
> 12 gse
> 13 gsm
> 14 hyb_protocol
> 15 label_ch1
> 16 label_ch2
> 17 label_protocol_ch1
> 18 label_protocol_ch2
> 19 last_update_date
> 20 molecule_ch1
> 21 molecule_ch2
> 22 organism_ch1
> 23 organism_ch2
> 24 source_name_ch1
> 25 source_name_ch2
> 26 status
> 27 submission_date
> 28 supplementary_file
> 29 title
> 30 treatment_protocol_ch1
> 31 treatment_protocol_ch2
> 32 type
>
>
> Any advice greatly appreciated.