Ochsner, Scott A
Jing,
Here is where you have to be very careful. The metadata do seem to
indicate that the data are log2 values and that RMA has been applied. As
this dataset is from Affymetrix, I would expect log2 values to be in
the range of 2 to 16. From what little you have shown us, this appears
to be the case, which suggests the values are already log2 transformed.
The safest bet is to import the .CEL files, if available, and normalize
them yourself. I've come across a few datasets archived in GEO in which
the normalization procedure described in the journal article is not
consistent with what is described in the GEO metadata, which in turn is
not consistent with the actual data. I have truly found that with
GEO data, it is buyer beware.
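As a rough illustration of the range check described above, here is a minimal R sketch. The helper `looks_log2` and its cutoff of 20 are assumptions made for illustration and are not part of GEOquery; the five values are the ones quoted in the original message below. A check like this is only a heuristic and does not replace re-normalizing from the .CEL files.

```r
# Hypothetical helper (not a GEOquery function): guess from the range of
# the values whether they are already on a log2 scale.
looks_log2 <- function(x, upper = 20) {
  x <- as.numeric(x)
  x <- x[is.finite(x)]
  # log2 Affymetrix intensities are expected to stay below roughly 16,
  # while raw intensities routinely run into the hundreds or thousands.
  # The cutoff of 20 is an assumed safety margin.
  length(x) > 0 && max(x) <= upper
}

# First five VALUE entries quoted from the original message
vals <- c(7.693888187, 8.571408272, 5.179812431, 7.468027592, 3.118550777)

range(vals)          # falls inside the expected 2 to 16 window
looks_log2(vals)     # TRUE: consistent with log2 data
looks_log2(2^vals)   # FALSE: back-transformed values exceed the cutoff
```

With a full dataset, the same check could be run on every column of the expression matrix rather than on a handful of probesets.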
Scott
Scott A. Ochsner, PhD
One Baylor Plaza BCM130, Houston, TX 77030
Voice: (713) 798-6227 Fax: (713) 790-1275
-----Original Message-----
From: bioconductor-bounces@r-project.org [mailto:bioconductor-bounces@r-project.org] On Behalf Of Jing Huang
Sent: Tuesday, August 30, 2011 10:36 AM
To: 'bioconductor at r-project.org'
Subject: [BioC] GEOquery package
Dear Sean and all members,
I am trying to extract GSE data from GEO and do analysis. I am
wondering if the GSE data has been normalized and log 2 transformed. R
scripts and output are copied below. Can somebody help me on this?
> Table(GSMList(gse)[[1]])[1:5, ]
     ID_REF       VALUE
1 1007_s_at 7.693888187
2   1053_at 8.571408272
3    117_at 5.179812431
4    121_at 7.468027592
5 1255_g_at 3.118550777
> Columns(GSMList(gse)[[1]])[1:5, ]
Column Description
1 ID_REF
2 VALUE log2 signal intensity, RMA <<<<< Does this mean
that the values are log2 transformed and the data were normalized
by RMA?
NA <NA> <NA>
NA.1 <NA> <NA>
NA.2 <NA> <NA>
According to the GEOquery package, I should follow these steps in order
to get the eset:
> probesets <- Table(GPLList(gse)[[1]])$ID
> data.matrix <- do.call("cbind", lapply(GSMList(gse), function(x) {
+ tab <- Table(x)
+ mymatch <- match(probesets, tab$ID_REF)
+ return(tab$VALUE[mymatch])
+ }))
> data.matrix <- apply(data.matrix, 2, function(x) {
+ as.numeric(as.character(x))
+ })
> data.matrix <- log2(data.matrix)
> data.matrix[1:5, ]
     GSM424759 GSM424760 GSM424761 GSM424762 GSM424763 GSM424764 GSM424765
[1,]  2.943713  2.917086  2.926155  2.983485  2.973219  2.962445  2.926030
[2,]  3.099532  3.136898  3.152696  3.217172  3.206948  3.198448  3.135146
[3,]  2.372900  2.309177  2.354380  2.373350  2.368464  2.381139  2.314555
[4,]  2.900727  2.873853  2.863911  2.879232  2.927384  2.913594  2.852870
[5,]  1.640876  1.645330  1.494274  1.792643  1.719597  1.648126  1.605055
Is the log2 transformation necessary for this dataset?
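A quick way to see what the log2 step did here is to apply it by hand to the five VALUE entries shown earlier. The sketch below uses base R only; `maybe_log2` and its cutoff of 20 are hypothetical names and assumptions for illustration, not part of GEOquery. It suggests that the first column of the matrix above is the log2 of values that were already on a log2 scale.

```r
# First five VALUE entries for the first sample, as printed above
vals <- c(7.693888187, 8.571408272, 5.179812431, 7.468027592, 3.118550777)

# Taking log2 again gives roughly 2.94, 3.10, 2.37, 2.90, 1.64,
# which is what the GSM424759 column of the printed matrix shows.
round(log2(vals), 6)

# Sketch of a guarded transform: only take log2 when the values look
# like raw intensities. The cutoff of 20 is an assumed heuristic.
maybe_log2 <- function(m, cutoff = 20) {
  if (max(m, na.rm = TRUE) > cutoff) log2(m) else m
}

m <- matrix(vals, ncol = 1, dimnames = list(NULL, "GSM424759"))
identical(maybe_log2(m), m)   # TRUE: data already on log2 scale, left alone
```

Under this assumption, the `data.matrix <- log2(data.matrix)` line would be skipped for this dataset.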
Many thanks
Jing
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor