getGEO function to load files from other locations than GEO ?
@wolfgang-raffelsberger-1805
Dear list,

I'm trying to see whether the getGEO function (GEOquery package) can load files from locations other than GEO. In particular, I'd like to load data from a local directory (under Windows). I see that getGEOfile() passes the specific GEO URL to download.file, but I haven't been successful in changing this.

Any suggestions?

Wolfgang

Wolfgang Raffelsberger, PhD
Laboratoire de BioInformatique et Génomique Intégratives
IGBMC
1 rue Laurent Fries, 67404 Illkirch Strasbourg, France
Tel (+33) 388 65 3300, Fax (+33) 388 65 3276
wolfgang.raffelsberger at igbmc.u-strasbg.fr
@sean-davis-490
On Friday 01 June 2007 09:16, Wolfgang Raffelsberger wrote:
> Dear list,
>
> I'm trying to see if one could use the getGEO function (GEOquery
> package) to load files from other locations than GEO.
> In particular I'd like to load data from a local directory (under Windows).
> I see that getGEOfile() function passes the specific GEO url to
> download.file, but I haven't been successful in changing this ...
> Any suggestions ?

Hi, Wolfgang.

See the help for getGEO. There is a filename argument that does exactly what you are describing.

Sean
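
A minimal sketch of the workflow this answer points to, assuming the GEOquery package is installed and that the local file is a SOFT file delivered by GEO itself. The wrapper name `load_local_soft` and the default `destdir` are illustrative choices, not part of GEOquery; only `getGEOfile()` and `getGEO(filename = ...)` come from the package.

```r
# Sketch only: fetch the SOFT file once, then parse the local copy.
# load_local_soft() is a hypothetical helper name used for illustration.
load_local_soft <- function(accession, destdir = tempdir()) {
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("GEOquery is required for this sketch")
  }
  # getGEOfile() downloads the SOFT file and returns the path of the local copy
  soft_path <- GEOquery::getGEOfile(accession, destdir = destdir)
  # getGEO(filename = ...) parses a local SOFT file instead of contacting GEO
  GEOquery::getGEO(filename = soft_path)
}

# Usage (the first step needs network access):
# gsm <- load_local_soft("GSM180487")
```

Once the SOFT file exists locally, later sessions can skip the download and call `getGEO(filename = ...)` on the saved path directly.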
Dear list,

Sorry to bug you again about using the getGEO() function to load files from locations other than GEO.

Sean Davis wrote:
> See the help for getGEO. There is a filename argument that does exactly what
> you are describing.
>
> Sean

I tried to specify the file using the filename argument:

> in.file <- "GSM180487.txt"   # an example of an original file downloaded from GEO and decompressed
> deGEO1 <- getGEO(filename=in.file)   # i.e., from the directory with my file
Error in switch(as.character(first.entity[1]), sample = { :
  argument is missing, with no default

This happens on both Linux and Windows (see sessionInfo at the end of this message). The same error occurs when pasting the path and file name together:

> wdir <- getwd()
> deGEO1 <- getGEO(filename=paste(wdir,in.file,sep="/"))
Error in switch(as.character(first.entity[1]), sample = { :
  argument is missing, with no default

or with:

> deGEO1 <- getGEO(filename=paste("file:/",wdir,in.file,sep="/"))
Error in switch(as.character(first.entity[1]), sample = { :
  argument is missing, with no default

Taking the code of getGEO apart, I can open the connection myself:

> con <- file(paste(wdir,in.file,sep="/"), "r")   # seems to work OK
> ret <- parseGEO(con, GSElimits=NULL)   # as getGEO would call it
Error in switch(as.character(first.entity[1]), sample = { :
  argument is missing, with no default

The switch(...) expression cited in the error message is in parseGEO. Checking it further, I got stuck: when I try to see what findFirstEntity() does, I get:

> findFirstEntity(con)
Error: could not find function "findFirstEntity"

Bottom line: initially I thought the problem was improper syntax in the character string for the path and file to be read, so I tried many combinations of paths and '/'s. Now I'm no longer sure that this is really the problem.

Any ideas?
> sessionInfo()
R version 2.5.0 (2007-04-23)
x86_64-unknown-linux-gnu

locale:
C

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
[7] "base"

other attached packages:
GEOquery
"2.0.5"

and similarly:

> sessionInfo()
R version 2.5.0 (2007-04-23)
i386-pc-mingw32

locale:
LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252

attached base packages:
[1] "splines" "grid" "tools" "stats" "graphics" "grDevices" "utils" "datasets" "tcltk" "methods" "base"

other attached packages:
snapCGH      aCGH         sma          multtest     cluster      GLAD         aws          tilingArray  pixmap       geneplotter
"1.4.0"      "1.10.0"     "0.5.15"     "1.14.0"     "1.11.5"     "1.10.0"     "1.3-2"      "1.14.0"     "0.4-7"      "1.14.0"
lattice      annotate     genefilter   survival     vsn          strucchange  sandwich     zoo          RColorBrewer affy
"0.15-4"     "1.14.1"     "1.14.1"     "2.31"       "2.2.0"      "1.3-2"      "2.0-2"      "1.3-1"      "0.2-3"      "1.14.0"
affyio       Biobase      limma        GEOquery     svIO         R2HTML       svMisc       svSocket     svIDE
"1.4.0"      "1.14.0"     "2.10.0"     "2.0.5"      "0.9-5"      "1.58"       "0.9-5"      "0.9-5"      "0.9-5"

Wolfgang Raffelsberger
http://www-bio3d-igbmc.u-strasbg.fr/~wraff
Wolfgang Raffelsberger wrote:
> I tried to specify the file using the filename argument:
>
> > in.file <- "GSM180487.txt"   # just picking an example of an original GEO downloaded & decompressed file
> > deGEO1 <- getGEO(filename=in.file)   # i.e., from the directory with my file...
> Error in switch(as.character(first.entity[1]), sample = { :
>   argument is missing, with no default

Wolfgang,

It looks like GSM180487.txt might not be a SOFT format file. If you run this command:

readLines('GSM180487.txt',n=10)

you should get this:

 [1] "^SAMPLE = GSM180487"
 [2] "!Sample_title = ACC 1"
 [3] "!Sample_geo_accession = GSM180487"
 [4] "!Sample_status = Public on Apr 10 2007"
 [5] "!Sample_submission_date = Apr 04 2007"
 [6] "!Sample_last_update_date = Apr 09 2007"
 [7] "!Sample_type = genomic"
 [8] "!Sample_channel_count = 2"
 [9] "!Sample_source_name_ch1 = ACC Tumor Sample 1"
[10] "!Sample_organism_ch1 = Homo sapiens"

If not, then you most likely don't have a SOFT format file. Let me know if you need more direction.

Sean
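
The readLines() check above can be wrapped in a small test that runs before calling getGEO(filename = ...). This is a sketch in base R only; `looks_like_soft` is a hypothetical helper, not a GEOquery function, and it assumes nothing more than what the expected output shows: a SOFT file opens with an entity line such as "^SAMPLE = GSM180487".

```r
# Sketch only: does the first non-empty line look like a SOFT entity line?
# looks_like_soft() is a hypothetical helper, not part of GEOquery.
looks_like_soft <- function(path, n = 10) {
  lines <- readLines(path, n = n, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]   # ignore leading blank lines
  # SOFT entity lines start with a caret, e.g. "^SAMPLE = GSM180487"
  length(lines) > 0 && grepl("^\\^[A-Za-z]+ *= *", lines[1])
}

# Usage:
# if (!looks_like_soft("GSM180487.txt")) message("not a SOFT file")
```

A check like this would turn the "argument is missing, with no default" error into an explicit message about the file format.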
Hi Sean,

as you suggested, here is the output from readLines():

> readLines('GSM180487.txt',n=10)
 [1] "TYPE\ttext\ttext\ttext\ttext\tinteger\tfloat\tfloat\ttext\ttext\ttext\tinteger\
 [2] "FEPARAMS\tProtocol_Name\tProtocol_date\tScan_Date\tScan_ScannerName\tScan_NumCh
 [3] "DATA\t44k_CGH_0605 (Editable)\t30-Jan-2006 18:01\t06-09-2006 13:25:24\tAgilent
 [4] "*"
 [5] "TYPE\tfloat\tfloat\tfloat\tinteger\tfloat\tfloat\tfloat\tinteger\tfloat\tfloat\
 [6] "STATS\tgDarkOffsetAverage\tgDarkOffsetMedian\tgDarkOffsetStdDev\tgDarkOffsetNum
 [7] "DATA\t38.965\t39\t6.13591\t1000\t38.884\t39\t7.85039\t1000\t1.00937\t1.0098\t3\
 [8] "*"
 [9] "TYPE\tinteger\tinteger\tinteger\ttext\tinteger\ttext\tinteger\tinteger\ttext\tt
[10] "FEATURES\tFeatureNum\tRow\tCol\taccessions\tSubTypeMask\tSubTypeName\tProbeUID\

Most lines in the output above are very long (experiment metadata), so I truncated them, since I believe you mainly want to see what kind of output I get. Indeed, it doesn't look at all like the output you described. Does this mean that when I first download from GEO I get a different kind of format?

Amazingly, accessing GEO directly (without downloading first and then trying to access the local copy) works without any difficulty.

In the meantime I've managed to read the data using read.maimages() from limma, so there is no more urgency to find a solution to this issue. As I know too little about the various GEO formats, I'm afraid this may get too complicated, or I came across a bad example (here I'm not reading CGH data). The route via getGEO() might have been more elegant and flexible, though.

Thanks for your help anyway,
Wolfgang
Wolfgang Raffelsberger wrote:
> Most lines in the output above are very long (experiment meta-data), so
> I truncated since I believe you mainly want to see what kind of output I
> get...
> Indeed, it doesn't look at all like the output you described.
> Does this mean that when first downloading from GEO I get a different
> kind of format ?

Wolfgang,

I see where the confusion arises. GEO houses many formats of data in its supplemental files. If you use getGEO to download from GEO, you will always get the correct format for use by GEOquery. If you choose to download the supplemental files, the format can be anything. Indeed, you have downloaded an Agilent Feature Extraction file. There is no general way to determine which of the many possible formats a supplemental file from GEO uses. That is why the SOFT format was created and is used by the GEO group.

> Amazingly the direct way of accessing directly at GEO (without
> downloading first & trying to access the local copy) works without any
> difficulty...

That makes sense. Note the .soft extension of the file you get when you use getGEO(), which is different from the .txt file that you downloaded.

> In the meantime I've managed to read the data using read.maimages() from
> limma, so there's no more urgency to find a solution on this issue.
> As I know too little about various GEO formats I'm afraid this may get
> too complicated... or I got across some bad example (here I'm not reading
> CGH data).

The file that you downloaded is not in a GEO format, which is where the confusion arises. If you want to parse the supplemental files, you will need to determine the file type and the correct parser for it. If you stick to the GEO SOFT format, GEOquery will work just fine.

> The route via getGEO() might have been more elegant/flexible, though ...

As you note above, getGEO() works just fine. The confusion arises because NCBI GEO also stores supplemental files, which, of course, GEOquery cannot parse. A fully general parser for all microarray data formats is well beyond the scope of GEOquery. I will think about how best to modify the documentation to make this absolutely clear.

I hope that clarifies things a bit, and sorry for the confusion.

Sean
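
The advice to "determine the file type and the correct parser" could be sketched as below for the two formats that appear in this thread. Both function names are hypothetical, the rules are heuristics read off the two readLines() outputs shown earlier rather than a general detector, and `source = "agilent"` for limma's read.maimages() is an assumption about how the Agilent file was read, since the thread does not show that call.

```r
# Sketch only: guess the format of a file downloaded from GEO from its first lines.
# guess_geo_file_format() and suggest_reader() are hypothetical helpers.
guess_geo_file_format <- function(path, n = 3) {
  head_lines <- readLines(path, n = n, warn = FALSE)
  if (length(head_lines) == 0) return("unknown")
  # SOFT files open with an entity line such as "^SAMPLE = GSM180487"
  if (startsWith(head_lines[1], "^")) return("soft")
  # The Agilent Feature Extraction file in this thread opens with TYPE / FEPARAMS rows
  if (length(head_lines) >= 2 &&
      startsWith(head_lines[1], "TYPE\t") &&
      startsWith(head_lines[2], "FEPARAMS\t")) return("agilent_fe")
  "unknown"
}

# Map the guess to the reader mentioned in this thread for that format.
suggest_reader <- function(format) {
  switch(format,
         soft = "GEOquery::getGEO(filename = ...)",
         agilent_fe = "limma::read.maimages(..., source = \"agilent\")",  # assumed source value
         NA_character_)
}

# Usage:
# suggest_reader(guess_geo_file_format("GSM180487.txt"))
```

Anything that falls through to "unknown" still has to be identified by hand, which is consistent with the point that GEO supplemental files can be in any format.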