I get the following error when trying to use pData:
> eset2 <- getGEO(filename = "GSE15543_family.soft.gz")
Reading file....
Parsing....
Found 34 entities...
GPL570 (1 of 35 entities)
GSM388740 (2 of 35 entities)
GSM388741 (3 of 35 entities)
GSM388742 (4 of 35 entities)
GSM388743 (5 of 35 entities)
GSM388744 (6 of 35 entities)
GSM388745 (7 of 35 entities)
GSM388746 (8 of 35 entities)
GSM388747 (9 of 35 entities)
GSM388748 (10 of 35 entities)
GSM388749 (11 of 35 entities)
GSM388750 (12 of 35 entities)
GSM388751 (13 of 35 entities)
GSM388752 (14 of 35 entities)
GSM388753 (15 of 35 entities)
GSM388754 (16 of 35 entities)
GSM388755 (17 of 35 entities)
GSM388756 (18 of 35 entities)
GSM388757 (19 of 35 entities)
GSM388758 (20 of 35 entities)
GSM388759 (21 of 35 entities)
GSM388760 (22 of 35 entities)
GSM388761 (23 of 35 entities)
GSM388762 (24 of 35 entities)
GSM388763 (25 of 35 entities)
GSM388764 (26 of 35 entities)
GSM388765 (27 of 35 entities)
GSM388766 (28 of 35 entities)
GSM388767 (29 of 35 entities)
GSM388768 (30 of 35 entities)
GSM388769 (31 of 35 entities)
GSM388770 (32 of 35 entities)
GSM388771 (33 of 35 entities)
GSM388772 (34 of 35 entities)
> pData(eset2)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘pData’ for signature ‘"GSE"’
> pData(eset2[[1]])
Error in eset2[[1]] : this S4 class is not subsettable
The above command works fine when > eset2 <- getGEO(filename = "GSE15543_family.soft.gz") is used.
Any suggestions on how to resolve the error?
When I use `eset2 <- getGEO(filename = "GSE15543_family.soft.gz")`, the file is downloaded every time instead of being read from the cache. I run the script from the terminal with `Rscript file.R`.
I prefer using getGEO('GSE15543'). The trouble I face is that every time I run the R script (i.e. with Rscript file.R), GSE15543 gets downloaded again. Why does this happen? (To avoid this, I was trying to download and save the file.)
When I call getGEO('GSE15543') directly in the terminal, the cached version is fetched.
Excuse me for the naive questions.
See the help for getGEO. In particular, if you set `destdir` in your script, the files will be downloaded only once. The second time the cached file is used. If you do not set `destdir`, a temporary directory is used and that temporary directory only lasts as long as R runs.
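As a minimal sketch of the `destdir` advice above: the cache location `~/geo_cache` is only an illustrative assumption, and any directory you can write to should do. Note that fetching by accession returns a list of ExpressionSet objects, so one element has to be picked with `[[1]]` before calling pData.

```r
library(GEOquery)

# Hypothetical cache location; replace with any writable directory.
cache_dir <- file.path(path.expand("~"), "geo_cache")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# The first run downloads into cache_dir; later runs (including Rscript runs)
# should report "Using locally cached version" instead of downloading again.
gse <- getGEO("GSE15543", destdir = cache_dir)

# getGEO() by accession returns a list of ExpressionSets, one per platform.
eset <- gse[[1]]
head(pData(eset))
```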
Probably a really silly question,
I did as you advised,
eset <- getGEO('GSE15543', destdir = "/media/nat/entain/toy3/Bioconductor/data/")[[1]]
The cached file is fetched from the destination directory, but unfortunately I am not able to obtain the output in the terminal:
nat@nat-HP-Pavilion-Notebook:/media/nat/entain/toy3/Bioconductor$ Rscript new.R
Loading required package: BiocGenerics
Loading required package: methods
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, cbind, colMeans, colnames,
colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match,
mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
table, tapply, union, unique, unsplit, which, which.max, which.min
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
Found 1 file(s)
GSE15543_series_matrix.txt.gz
Using locally cached version: /media/nat/entain/toy3/Bioconductor/data//GSE15543_series_matrix.txt.gz
Parsed with column specification:
cols(
.default = col_double(),
ID_REF = col_character()
)
See spec(...) for full column specifications.
Using locally cached version of GPL570 found here:
/media/nat/entain/toy3/Bioconductor/data//GPL570.soft
^C
Execution halted
I had to exit after waiting for quite some time. When the destination directory is not specified, the script runs fine. Please find the link to my code here
I'm not sure what went wrong.
Many thanks
Looks like GEOquery is working correctly--not sure why it is slower. You could try using a smaller GSE (GSE20, maybe) to see if the process completes. If you are running in two different settings, file system performance and memory limitations may impact speed of processing, but now I am speculating.
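One way to run that experiment, as a rough sketch: `time_getGEO` is a hypothetical helper written for this post, not a GEOquery function. It times the same small series against the default temporary directory and against the directory used in the script above.

```r
library(GEOquery)

# Hypothetical helper: elapsed seconds for one getGEO() call.
time_getGEO <- function(accession, destdir = tempdir()) {
  system.time(getGEO(accession, destdir = destdir))[["elapsed"]]
}

# Small series, default temporary directory.
time_getGEO("GSE20")

# Same series, using the directory from the script above.
time_getGEO("GSE20", destdir = "/media/nat/entain/toy3/Bioconductor/data")
```

If only the larger series hangs, it may also be worth trying `getGEO("GSE15543", destdir = ..., getGPL = FALSE)`, since the platform file is the last thing logged before the hang. That skips the platform annotation, so treat it as a diagnostic rather than a fix.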
Sean, yes, the smaller file works!
I would like to ask for suggestions on what has to be done for larger files.
Many thanks for your time and attention.
The behavior you are seeing appears to be specific to your environment, so I do not really have a good set of suggestions other than to experiment. You might benefit from getting some local IT support to work with you.
What kind of a device is at /media/nat? Does it have enough space both for the file and for intermediate storage? Do you have full read / write permissions at that location? Please also post your sessionInfo(), and confirm that BiocInstaller::biocValid() reports that packages are up-to-date.
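The permission questions can be checked from within R. The following is only a sketch; `check_destdir` is a hypothetical helper, not part of GEOquery or Bioconductor, and it does not check free space.

```r
# Hypothetical helper: does the directory exist, is it writable,
# and can a file actually be created there?
check_destdir <- function(dir) {
  exists <- dir.exists(dir)
  writable <- exists && isTRUE(file.access(dir, mode = 2) == 0)
  can_create <- FALSE
  if (writable) {
    probe <- tempfile(tmpdir = dir)   # throwaway file name inside dir
    can_create <- file.create(probe)
    unlink(probe)                     # clean up the probe file
  }
  c(exists = exists, writable = writable, can_create = can_create)
}

check_destdir("/media/nat/entain/toy3/Bioconductor/data")
sessionInfo()
```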
/media/nat is a directory on my laptop (type inode/directory). I have full read and write permissions at that location. It has a total capacity of 212 GB, of which 20 GB is used.
I asked about /media/ because 'usually' this is where one mounts removable media like thumb drives and so on. I guess when you reported available space it was from a command like
df -h /media/nat/entain/toy3/Bioconductor/data
You could try and debug by using `debug()` and then stepping through individual functions until something goes wrong; here's my session... I start by debugging some likely functions, one exported the other not...
I then evaluate the function, where 'fl' is the file path to my cached file. The function runs until it calls parseGEO
I then look at the help page ?browser and type 'n' to go to the next line of code
until I get to the next place I'd put a debug call, and then step through some more, eventually exiting
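A hedged sketch of what such a session could look like. The choice of functions is an assumption: `parseGEO` is named above, while `parseGSEMatrix` is a guess at a relevant internal function. The path in `fl` is a placeholder for your own cached file. This is meant for an interactive R session, not for Rscript.

```r
library(GEOquery)

if (interactive()) {
  # Flag functions for debugging; ':::' reaches both exported and internal ones.
  debug(GEOquery:::parseGEO)
  debug(GEOquery:::parseGSEMatrix)   # assumed internal function

  # Placeholder path to the cached file.
  fl <- "/path/to/GSE15543_series_matrix.txt.gz"

  # Runs until a flagged function is entered, then drops into the browser.
  # At the Browser> prompt: 'n' = next line, 'c' = continue, 'Q' = quit.
  eset <- getGEO(filename = fl)

  # Remove the debug flags when done.
  undebug(GEOquery:::parseGEO)
  undebug(GEOquery:::parseGSEMatrix)
}
```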
At some point your own code will freeze. You can then break, debug() that function, and try again. Repeat until you get to what seems like a basic function (e.g., read_csv) that seems like it 'should' work; you can print out things like variables while at the Browser> prompt, so occasionally you will want to do a 'sanity check'.
I am following what you mentioned to debug,
It freezes at this point.
Sorry for being dumb, but I couldn't exactly understand how this "You can then break, debug() that function, and try again." has to be done.
So 'break' with Ctrl-C, or type Q when at the Browser[]> prompt. I'd then
and then start using the browser again. Here's the start of the function definition
I'd look at the variables that are defined and make sure they make sense
and then step through, checking as I go
etc. If that readLines() call doesn't look right, I'd start again (Browser[2]> Q, then invoke the function again) and this time instead of using 'n' to step through that line of code, I'd try to evaluate the line in the browser without the 'suppressWarnings()'
You'll have to use this approach to narrow down as much as possible what the problem is; if it is with readLines(), then I wouldn't go further -- readLines is from base R, and is likely to 'work'.
The file should be a plain text file, so you should be able to open it outside R using a plain text editor. It might help to do the same exercise but storing on a different part of your file system, e.g., '/tmp', and comparing things like file.info().
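As a sketch of that comparison: `inspect_file` is a hypothetical helper, and the path is the cached platform file reported in the output above. It collects file.info() and the first few lines for the original file and for a copy under R's temporary directory.

```r
# Hypothetical helper: file metadata plus the first n lines.
# gzfile() reads both gzip-compressed and plain text files.
inspect_file <- function(path, n = 5) {
  info <- file.info(path)[, c("size", "mode", "mtime")]
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  first_lines <- readLines(con, n = n)
  list(info = info, first_lines = first_lines)
}

cached <- "/media/nat/entain/toy3/Bioconductor/data/GPL570.soft"
if (file.exists(cached)) {
  # Copy to the temporary directory and compare the two locations.
  tmp_copy <- file.path(tempdir(), basename(cached))
  file.copy(cached, tmp_copy, overwrite = TRUE)
  str(inspect_file(cached))
  str(inspect_file(tmp_copy))
}
```

If the sizes differ, or readLines() is slow or fails on only one of the two copies, that points at the storage location rather than at GEOquery.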