getting normalized expression values from GEO GSE files


Maria Kesa ▴ 30

@maria-kesa-6668

Last seen 10.5 years ago

Hello :-),

My name is Maria and my goal is to get normalized gene expression values from this study: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE3398

I installed GEOquery and its dependencies, RCurl and XML.

I have three questions:

1. How do I resolve the error posted below, which I get when I try to use gse3398<-getGEO('GSE3398',GSEMatrix=TRUE)? (I tried installing and reinstalling RCurl and GEOquery.)
2. How should I normalize the data, considering that there are multiple platforms in the experiment?
3. If point 1 cannot be made to work, I found that it is possible to load the files manually using links like the following (replacing GPL2648 with the different platforms in the series): ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE3398/GSE3398-GPL2648_series_matrix.txt.gz. How do I process these files and put them into an eset in R? As in question 2, how do I get the normalized gene expression values out of the data, and how do I get the gene names?

Your help would be much appreciated! The error message that I get and the sessionInfo are below.
> gse3398<-getGEO('GSE3398',GSEMatrix=TRUE)
Found 7 file(s)
GSE3398-GPL2648_series_matrix.txt.gz
sh: 1: curl: not found
Error in file(con, "r") : cannot open the connection
In addition: Warning messages:
1: In download.file(sprintf("ftp://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s", :
  download had nonzero exit status
2: In file(con, "r") :
  cannot open file '/tmp/RtmppUAQIH/GSE3398-GPL2648_series_matrix.txt.gz': No such file or directory

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8
 [2] LC_NUMERIC=C
 [3] LC_TIME=et_EE.UTF-8
 [4] LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=et_EE.UTF-8
 [6] LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=et_EE.UTF-8
 [8] LC_NAME=C
 [9] LC_ADDRESS=C
[10] LC_TELEPHONE=C
[11] LC_MEASUREMENT=et_EE.UTF-8
[12] LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils
[6] datasets  methods   base

other attached packages:
[1] GEOquery_2.28.0    Biobase_2.22.0
[3] BiocGenerics_0.8.0 RCurl_1.95-4.3
[5] bitops_1.0-6

loaded via a namespace (and not attached):
[1] tools_3.1.1 XML_3.98-1.1

Thank you,

Maria
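For question 3, a series matrix file is tab-delimited text: metadata lines start with `!`, and the expression table sits between the `!series_matrix_table_begin` and `!series_matrix_table_end` markers. The following is a minimal base-R sketch of pulling that table into a numeric matrix. `read_series_matrix` is a hypothetical helper name, and the toy file only stands in for a real download.

```r
# Hypothetical helper: extract the expression table from a GEO series
# matrix file (plain or gzip-compressed) using base R only.
read_series_matrix <- function(path) {
  con <- gzfile(path, "r")          # gzfile() also reads uncompressed text
  on.exit(close(con))
  lines <- readLines(con)

  # The data table is delimited by these two marker lines.
  begin <- grep("^!series_matrix_table_begin", lines)
  end   <- grep("^!series_matrix_table_end", lines)
  stopifnot(length(begin) == 1, length(end) == 1, end > begin + 1)

  tab <- read.delim(text = lines[(begin + 1):(end - 1)],
                    row.names = 1,        # first column is ID_REF (probe IDs)
                    check.names = FALSE)  # keep GSM sample names unchanged
  as.matrix(tab)
}

# Toy file standing in for a downloaded *_series_matrix.txt.gz
demo <- c('!Series_title\t"toy example"',
          '!series_matrix_table_begin',
          '"ID_REF"\t"GSM1"\t"GSM2"',
          '"probeA"\t0.5\t-1.2',
          '"probeB"\t1.0\t0.3',
          '!series_matrix_table_end')
tmp <- tempfile(fileext = ".txt")
writeLines(demo, tmp)

mat <- read_series_matrix(tmp)
print(mat)
```

The row names are the platform's probe IDs (`ID_REF`); gene names come from the matching GPL annotation table, not from the series matrix itself. GEOquery's `getGEO()` also has a `filename` argument for parsing local files, which may be the more direct route to an ExpressionSet if it handles series matrix files in your version.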

PROcess GEOquery • 4.5k views

updated 10.5 years ago by Levi Waldron ▴ 80 • written 10.5 years ago by Maria Kesa ▴ 30


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 1 day ago

United States

Hi Maria,

Sometimes with online resources, there are momentary hiccups. I can currently download that dataset:

> gse3398<-getGEO('GSE3398')
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE3nnn/GSE3398/matrix/
Found 7 file(s)
GSE3398-GPL2648_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  153k  100  153k    0     0   100k      0  0:00:01  0:00:01 --:--:--  100k
File stored at:
/data3/tmp/RtmpOwnhbS/GPL2648.soft
GSE3398-GPL2778_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  206k  100  206k    0     0   133k      0  0:00:01  0:00:01 --:--:--  133k
File stored at:
/data3/tmp/RtmpOwnhbS/GPL2778.soft
GSE3398-GPL2832_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1060k  100 1060k    0     0   593k      0  0:00:01  0:00:01 --:--:--  593k
File stored at:
/data3/tmp/RtmpOwnhbS/GPL2832.soft
GSE3398-GPL2868_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  253k  100  253k    0     0   167k      0  0:00:01  0:00:01 --:--:--  167k
File stored at:
/data3/tmp/RtmpOwnhbS/GPL2868.soft
GSE3398-GPL2904_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  196k  100  196k    0     0   129k      0  0:00:01  0:00:01 --:--:--  129k
File stored at:
/data3/tmp/RtmpOwnhbS/GPL2904.soft
GSE3398-GPL2905_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1995k  100 1995k    0     0  1034k      0  0:00:01  0:00:01 --:--:-- 1034k
File stored at:
/data3/tmp/RtmpOwnhbS/GPL2905.soft
GSE3398-GPL2906_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  104k  100  104k    0     0  66850      0  0:00:01  0:00:01 --:--:-- 66867
File stored at:
/data3/tmp/RtmpOwnhbS/GPL2906.soft

As for point 2, I can't really help you with that one, as I know nothing about this experiment other than the cursory glance I just made at the GEO site. You might consider the GeneMeta package (http://www.bioconductor.org/packages/release/bioc/html/GeneMeta.html), which is intended for the analysis of data from various sources.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
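A note on the error itself: the line `sh: 1: curl: not found` in the question suggests that R shelled out to a `curl` command-line program that is not installed on that machine, which is separate from the RCurl package. The sketch below uses base R only to check for command-line downloaders. The `download.file.method.GEOquery` option named in the comment is an assumption based on what GEOquery reports setting at load time in some versions, so verify it against the installed version before relying on it.

```r
# Look for command-line downloaders on the PATH.
# Sys.which() returns "" for programs it cannot find.
downloaders <- Sys.which(c("curl", "wget"))
available <- names(downloaders)[nzchar(downloaders)]

if ("curl" %in% available) {
  message("curl found at: ", downloaders[["curl"]])
} else {
  message("No curl executable on the PATH; install it with the system ",
          "package manager, or pick another download method.")
  # Assumption: GEOquery reads this option to choose the method it passes
  # to download.file(). Inspect the current value first with
  # getOption("download.file.method.GEOquery").
  # options(download.file.method.GEOquery = "wget")
}
```

If `curl` turns out to be missing, installing it at the operating-system level and rerunning `getGEO('GSE3398')` is probably the simplest thing to try first.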

written 10.5 years ago by James W. MacDonald 68k


Levi Waldron ▴ 80

@levi-waldron-6357

Last seen 10.5 years ago

On Wed, Aug 27, 2014 at 4:18 PM, Maria Kesa <maria.kesa at gmail.com> wrote:

> 2. How should I normalize the data, considering that there are multiple
> platforms in the experiment?

It's not obvious from the GEO page or the paper why they used 7 platforms, but I believe they may be complementary and intended to be combined to provide something approaching whole-genome coverage. I don't envy you trying to normalize these spotted cDNA arrays, but I would apply a standard normalization such as loess and then do exploratory analysis, for example checking whether the different platforms show different batch effects. These could be apparent if some platforms have many more differentially expressed genes than others, or very different sample clustering patterns.
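As a rough illustration of the loess idea for a single two-channel array, the sketch below fits a loess curve of the log-ratio M against the average log-intensity A and subtracts it. It uses only base R and simulated intensities; `loess_normalize` and the simulated dye bias are made up for illustration and are not taken from this dataset. In practice, limma's `normalizeWithinArrays(method = "loess")` is the usual tool for this step.

```r
# Hypothetical helper: intensity-dependent loess normalization of one
# two-channel array, given raw red and green spot intensities.
loess_normalize <- function(red, green, span = 0.4) {
  stopifnot(length(red) == length(green), all(red > 0), all(green > 0))
  m <- log2(red) - log2(green)          # log-ratio
  a <- (log2(red) + log2(green)) / 2    # average log-intensity
  # Robust local linear fit of M on A; the fitted curve estimates dye bias.
  fit <- loess(m ~ a, span = span, degree = 1, family = "symmetric")
  data.frame(A = a, M = m, M_norm = m - fitted(fit))
}

# Simulated array with an intensity-dependent dye bias (illustration only).
set.seed(1)
green <- 2^runif(500, min = 6, max = 14)
bias  <- 0.3 * (log2(green) - 10)
red   <- green * 2^(bias + rnorm(500, sd = 0.2))

res <- loess_normalize(red, green)

# Correlation of M with A before and after removing the loess trend.
print(c(before = cor(res$A, res$M), after = cor(res$A, res$M_norm)))
```

Running the same fit per array and per platform, then comparing the distributions of `M_norm` across platforms with boxplots or clustering, is one simple way to start the exploratory check for platform-specific batch effects.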

written 10.5 years ago by Levi Waldron ▴ 80
