Manca Marco PATH
Dear James, dear Sean, and dear Bioconductors
good morning.
Thank you for your help up to now, I really appreciate it.
I am probably a bit thickheaded, and I apologize for this, but I am
still missing something from the picture. The work instructions from
James worked excellently in my case, and I am sincerely grateful for
the patience and support I have received.
I am nevertheless wondering how you gained all this insight into the
GSE structure and its handling...
I have read the following documents:
- An Introduction to Bioconductor's ExpressionSet Class
  ( http://www.bioconductor.org/packages/2.5/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf )
- GEOquery
  ( http://watson.nci.nih.gov/bioc_mirror/2.4/bioc/manuals/GEOquery/man/GEOquery.pdf )
- Using the GEOquery package
  ( http://www.bioconductor.org/packages/1.8/bioc/vignettes/GEOquery/inst/doc/GEOquery.pdf )
...and yet I am afraid that I would have terrible headaches trying to
do what James (and Sean) guided me to, on a new dataset all on my own.
Is there any source of information on these topics that I am missing?
Or is it just experience gathered through a painful process of
attempts, failures and successes?
My best regards,
yours Marco
--
Marco Manca, MD
University of Maastricht
Faculty of Health, Medicine and Life Sciences (FHML)
Cardiovascular Research Institute (CARIM)
E-mail: m.manca at path.unimaas.nl
Mobile: +31626441205
Twitter: @markomanka
________________________________________
From: James F. Reid [james.reid at ifom-ieo-campus.it]
Sent: Monday, 27 July 2009 13:26
To: Manca Marco (PATH)
Cc: sdavis2 at mail.nih.gov; bioconductor mailing list
Subject: Re: [BioC] R: How to use GEOquery to extract more than the
default information from a GSE
Hi Marco,
if you set GSEMatrix=FALSE and pick what you want you will have to
create an ExpressionSet de novo.
For extracting particular annotations of the samples, for example
'characteristics_ch1' and 'source_name_ch1' as you mention, you will
want to include these in an annotated phenoData data.frame which in
turn will be included in an ExpressionSet.
Here's a way of producing a reduced phenoData:
library("GEOquery")
gse <- getGEO('GSE9820', GSEMatrix=FALSE)

## one column per sample, one row per 'characteristics_ch1' entry
pD1 <- sapply(names(GSMList(gse)), function(gsm)
    GSMList(gse)[[gsm]]@header$characteristics_ch1)
## one 'source_name_ch1' value per sample
pD2 <- sapply(names(GSMList(gse)), function(gsm)
    GSMList(gse)[[gsm]]@header$source_name_ch1)

pD1[,1]
##[1] "patient"  "patient ID_REF: A10"  "age:58"  "sex:M"
pD2[1]
##    GSM247703
##"macrophages"

## now put things together
pD <- data.frame(type = pD1[1, ],
                 patientID = sub("patient ID_REF: ", "", pD1[2, ]),
                 age = sub("age:", "", pD1[3, ]),
                 sex = sub("sex:", "", pD1[4, ]),
                 cell = pD2)
phenoD <- new('AnnotatedDataFrame',
              data = pD,
              varMetadata = data.frame(labelDescription = colnames(pD)))
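
The sub() calls above rely on every sample listing its characteristics in the same order. As a hedged alternative, here is a minimal base R sketch that splits each entry on its first colon and uses the key as the column name. The helper parse_characteristics() and the two samples "sampleA" and "sampleB" are invented for illustration (only the first sample copies the strings printed above), so treat this as a starting point rather than a tested recipe for GSE9820.

```r
# Toy stand-in for the matrix returned by sapply() over the GSM headers:
# one column per sample, one row per characteristics_ch1 entry.
# "sampleA"/"sampleB" and the second column's values are made up.
pD1 <- cbind(
  sampleA = c("patient", "patient ID_REF: A10", "age:58", "sex:M"),
  sampleB = c("patient", "patient ID_REF: B07", "age:63", "sex:F")
)

# Hypothetical helper: split "key: value" entries on the first colon.
# Entries without a colon (here "patient") are collected under 'type'.
parse_characteristics <- function(x) {
  has_key <- grepl(":", x, fixed = TRUE)
  keys <- trimws(sub(":.*$", "", x[has_key]))
  vals <- trimws(sub("^[^:]*:", "", x[has_key]))
  c(type = paste(x[!has_key], collapse = ";"), setNames(vals, keys))
}

# Parse every sample (column) separately
parsed <- lapply(seq_len(ncol(pD1)), function(j) parse_characteristics(pD1[, j]))
names(parsed) <- colnames(pD1)

# Union of all keys, so a sample lacking a key gets NA instead of shifting
fields <- unique(unlist(lapply(parsed, names)))
m <- do.call(rbind, lapply(parsed, function(p) unname(p[fields])))
colnames(m) <- fields

pD <- as.data.frame(m, stringsAsFactors = FALSE)
names(pD) <- make.names(fields)   # "patient ID_REF" becomes "patient.ID_REF"
pD$age <- as.numeric(pD$age)
pD
```

Looking fields up by key rather than by position means a sample with a missing or reordered entry produces an NA in the right column instead of silently putting a value under the wrong label.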
When you create the 'exprs' slot in the ExpressionSet make sure that
the columns match the rows of this phenoData object.
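
To illustrate that last point, here is a small sketch of assembling the expression matrix and checking it against the phenoData before building the ExpressionSet. It runs on toy data, not on GSE9820: the list gsm_tables stands in for what Table() would return for each GSM, and the ID_REF/VALUE column names, probe IDs and sample names are assumptions you would need to check against your own series.

```r
library(Biobase)

# Toy stand-ins for the per-sample data tables (made-up values).
# Note that the probes are deliberately listed in a different order.
gsm_tables <- list(
  sampleA = data.frame(ID_REF = c("p1", "p2", "p3"),
                       VALUE = c(7.1, 8.4, 6.0),
                       stringsAsFactors = FALSE),
  sampleB = data.frame(ID_REF = c("p3", "p1", "p2"),
                       VALUE = c(6.2, 7.3, 8.1),
                       stringsAsFactors = FALSE)
)

# Use the first sample's probe order as reference and align the others
probes <- gsm_tables[[1]]$ID_REF
exprs_mat <- sapply(gsm_tables, function(tab)
    tab$VALUE[match(probes, tab$ID_REF)])
rownames(exprs_mat) <- probes

# Toy phenoData, deliberately in a different sample order
pD <- data.frame(sex = c("F", "M"),
                 row.names = c("sampleB", "sampleA"),
                 stringsAsFactors = FALSE)

# Reorder the phenoData rows to match the matrix columns, then verify
pD <- pD[colnames(exprs_mat), , drop = FALSE]
stopifnot(identical(rownames(pD), colnames(exprs_mat)))

phenoD <- new("AnnotatedDataFrame",
              data = pD,
              varMetadata = data.frame(labelDescription = colnames(pD),
                                       row.names = colnames(pD)))

eset <- ExpressionSet(assayData = exprs_mat, phenoData = phenoD)
eset
```

Reordering by name and then asserting with identical() is a cheap guard: ExpressionSet() will complain if the sample names differ, and doing the check yourself beforehand tends to make the cause easier to spot.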
HTH,
J.
Manca Marco (PATH) wrote:
> Dear James,
>
> thank you for your prompt and kind reply.
>
> I was doing the following and I wasn't able to see my annotation
> associated with the files:
> library("GEOquery")
> gse <- getGEO("GSE9820")
> gse
>
> ...following your suggestion I get exactly the same output as you.
>
> Nevertheless I would love to be able to build my own ExprSet from a
> GSE using GEOquery with the option GSEMatrix=FALSE and then selecting
> the variables I want to import/include. In GEOquery's vignette there
> is an example of this but I am not able to find a document listing the
> options and the language/naming I should use to personalize the final
> file (the vignette only mentions that personalizing everything is
> quite difficult, but possible anyway).
>
> Thank you.
>
> Best regards,
> Marco
> ________________________________________
> From: James F. Reid [james.reid at ifom-ieo-campus.it]
> Sent: Friday, 24 July 2009 15:38
> To: Manca Marco (PATH)
> Cc: sdavis2 at mail.nih.gov; bioconductor mailing list
> Subject: Re: [BioC] How to use GEOquery to extract more than the
> default information from a GSE
>
> Hi Marco,
>
> I'm not sure what you mean by 'more than default information'.
>
> Using GEOquery can be a bit complicated if the GEO series (GSE)
> contains multiple platforms, but in your case you're fine because
> there is only one.
>
> You can get a complete ExpressionSet, which stores sample annotation,
> platform annotation and expression values, by doing:
>
> library("GEOquery")
> gse <- getGEO("GSE9820")
> names(gse)
> ##[1] "GSE9820_series_matrix.txt.gz"
> gse[[1]]
>
> which prints out:
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 20589 features, 153 samples
> element names: exprs
> phenoData
> sampleNames: GSM247703, GSM247704, ..., GSM247855 (153 total)
> varLabels and varMetadata description:
> title: NA
> geo_accession: NA
> ...: ...
> data_row_count: NA
> (33 total)
> featureData
>     featureNames: ILMN_10000, ILMN_10001, ..., ILMN_9999 (20589 total)
> fvarLabels and fvarMetadata description:
> ID: NA
> GB_ACC: NA
> ...: ...
> SYNONYM: NA
> (6 total)
> additional fvarMetadata: Column, Description
> experimentData: use 'experimentData(object)'
> Annotation: GPL6255
>
> fvarLabels(gse[[1]])
> [1] "ID" "GB_ACC" "SYMBOL" "DEFINITION" "ONTOLOGY"
> [6] "SYNONYM"
>
> contains all the information for the platform, varLabels will give
> you the labels of the sample information and you can get to the
> expression values by means of exprs(gse[[1]]).
>
> HTH,
> J.
>
>
> Manca Marco (PATH) wrote:
>> Dear Sean and dear bioconductors,
>>
>> I am writing to ask for a source of inspiration (code pieces,
>> notes, references, whatever you might think appropriate) to import
>> array annotation and other data from the GSE I am trying to work with
>> (namely GSE9820) into my eset.
>>
>> I have read in GEOquery's vignette that this is actually possible,
>> despite being a bit tricky:
>>
>> "So, using a combination of lapply on the GSMList, one can extract
>> as many columns of interest as necessary to build the data structure
>> of choice. Because the GSM data from the GEO website are fully
>> downloaded and included in the GSE object, one can extract foreground
>> and background as well as quality for two-channel arrays, for example.
>> Getting array annotation is also a bit more complicated, but by
>> replacing "platform" in the lapply call to get platform information
>> for each array, one can get other information associated with each
>> array. Future work with this package will likely focus on better tools
>> for manipulating GSE data"
>> From http://www.bioconductor.org/packages/2.4/bioc/vignettes/GEOquery/inst/doc/GEOquery.pdf
>> Page 22 of 22
>>
>> ...but I can't find anywhere any hint.
>>
>> Thank you in advance for your patience and support.
>>
>> My best regards,
>> Marco
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>