Hi all.
I am trying to download a GSE file from the Gene Expression Omnibus and I am encountering a new error. Since the bug is still not fixed, I am attempting a "workaround".
library(GEOquery)
my_id <- "GSE12657"
readr::local_edition(1)
gse <- getGEO(my_id)
sessionInfo()
Error in readBin(inn, what = raw(0L), size = 1L, n = BFR.SIZE) :
  error reading from the connection
In addition: Warning message:
In readBin(inn, what = raw(0L), size = 1L, n = BFR.SIZE) :
  invalid or incomplete compressed data
I am using a Windows 10 machine with R 4.1.2, Bioconductor 3.14, and all installed packages are current.
Thank you.
Thank you James.
Unfortunately, the subsequent code does not work as expected.
sampledata <- pData(z)
Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function ‘pData’ for signature ‘"list"’
I am following the advice of a postdoc in the hope of completing a course project and this is the code she suggested.
Update: I am attempting to extract all of the gene symbols from each GSM object on the assumption that these would be the names of the genes identified in the experiment and that I could then use Cytoscape in some capacity with them.
sampleData <- pData(z[[1]])
featureData <- fData(z[[1]])
The code above appears to resolve the issue. Is there a convenient method to extract all of the gene symbols from each GSM listed in the sampleData object?
Heh. Postdocs... You should never listen to them ;-D
What you get from `getGEO` is a list, so really you should probably take its first element, in which case `pData` and `fData` will work as expected.

Oh, I missed your edit. I normally follow on email, so that's my excuse and I'm sticking with it.
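To illustrate the list-versus-ExpressionSet point without needing a GEO download, here is a minimal sketch using a made-up `ExpressionSet` wrapped in a list, which is roughly the shape `getGEO` returns for a GSE. The probe names, sample names, and group labels are invented for the example.

```r
library(Biobase)

# Toy stand-in for one GEO series matrix: 4 probes x 3 samples.
# The IDs and values are made up for illustration only.
m <- matrix(rnorm(12), nrow = 4,
            dimnames = list(paste0("probe", 1:4), paste0("GSM", 1:3)))
pd <- data.frame(group = c("tumor", "tumor", "control"),
                 row.names = colnames(m))
eset <- ExpressionSet(assayData = m,
                      phenoData = AnnotatedDataFrame(pd))

# getGEO() wraps each series matrix in a list; mimic that here
z <- list(eset)

# pData() has no method for a plain list, so this call errors
res <- try(pData(z), silent = TRUE)
inherits(res, "try-error")

# Extracting the ExpressionSet first makes the accessors work
z1 <- z[[1]]
sampleData  <- pData(z1)
featureData <- fData(z1)
```

With real data the only change should be replacing the toy list with the result of `getGEO(my_id)`.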
You could use the data from GEO, but it's a pain, because it is just a reformulation of the Affy CSV files, which themselves are a pain to deal with. I have a function in my affycoretools package that might be useful to you.
And now the SYMBOL column has the gene symbols.
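The code that went with this step is not preserved in this copy of the thread. As a rough sketch only, assuming the function meant is `annotateEset` from affycoretools and that the arrays are HG-U95Av2 (so the matching chip package would be `hgu95av2.db`; check this against `annotation(z)` for your data), it might look like the following. A small made-up `ExpressionSet` stands in for `z` so the example runs without downloading anything.

```r
library(Biobase)
library(affycoretools)
library(hgu95av2.db)   # assumed chip package; verify for your platform

# Stand-in ExpressionSet with a few real probe IDs and random values.
# In the thread, z would be the first element of the getGEO() result.
ids <- head(keys(hgu95av2.db, keytype = "PROBEID"), 5)
m <- matrix(rnorm(15), nrow = 5,
            dimnames = list(ids, c("s1", "s2", "s3")))
z <- ExpressionSet(assayData = m)

# annotateEset() fills the feature data with annotation columns,
# by default PROBEID, ENTREZID, SYMBOL and GENENAME
z <- annotateEset(z, hgu95av2.db)
head(fData(z))

# The gene symbols, one per probeset
symbols <- fData(z)$SYMBOL
```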
Or if you just want the symbols without dealing with all the other stuff, you can use the functions from `AnnotationDbi` directly.

So there is a decision you have to make as to which symbol you want for the probesets that map to more than one. The simplest and most naive is to take the first one.
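A minimal sketch of the direct `AnnotationDbi` route, again assuming `hgu95av2.db` is the right chip package for these arrays (an assumption, not something stated in the thread). It contrasts `select`, which can return several rows per probeset, with `mapIds`, where `multiVals = "first"` implements the naive take-the-first-one choice.

```r
library(hgu95av2.db)   # assumed chip package, for illustration

# A handful of probe IDs; with real data use featureNames(z) instead
ids <- head(keys(hgu95av2.db, keytype = "PROBEID"), 50)

# select() returns one row per probe/symbol pair, so a probeset that
# maps to several symbols shows up on several rows
long <- AnnotationDbi::select(hgu95av2.db, keys = ids,
                              columns = "SYMBOL", keytype = "PROBEID")

# mapIds() returns exactly one value per key; multiVals = "first"
# keeps the first symbol and silently drops any others
sym <- mapIds(hgu95av2.db, keys = ids, column = "SYMBOL",
              keytype = "PROBEID", multiVals = "first")
head(sym)
```

Other `multiVals` options such as `"list"` or `"filter"` exist if taking the first symbol is too crude for your purpose.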
Thank you James ;)
> I am attempting to extract all of the gene symbols from each GSM object on the assumption that these would be the names of the genes identified in the experiment and that I could then use Cytoscape in some capacity with them.
>
> Is there a convenient method to extract all of the gene symbols from each GSM listed in the sampleData object?
What you are calling GSM objects are just individual Affy arrays, and each one was used to infer the transcript abundance for a given sample. And by 'infer' what I really mean is that somebody processed some total RNA in such a way that it could be used to hybridize to a bunch of 25-mers that were grown on a silicon wafer, after which a fluorophore was attached and then quantified by measuring the intensity with a CCD camera. The pixel intensity from the picture was then used to generate an 'expression value', which is a unitless thing based on how bright a spot on the wafer was, and is meant to be proportional to the amount of mRNA that was originally in the sample.
That long-winded paragraph is meant to back me up when I say that there are no 'genes identified in the experiment'. All you get are some random numbers that partially reflect the amount of a given transcript that was in the original sample, and can only be used by comparing to the random numbers for the same gene that were generated (preferably) at the same time in the same lab using the same chip type. And the results for the comparison are log fold changes, which give you an estimate of how much the underlying transcript changed between the groups you compared. So you could use those GEO data to find the genes that appear to change expression between glioblastoma samples and controls, but you cannot do anything else, like 'identify genes'. That's not a thing.
If you want to compare groups, then you almost surely want to use the `limma` package, which has a user guide (google 'limma user guide') that is very long and very complete, and which you should read carefully.
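To give a feel for what such a comparison looks like, here is a bare-bones sketch on simulated data. It is not a substitute for the user guide. The matrix, the group labels, and the spiked-in shift are all made up; with the GEO data you would pass the `ExpressionSet` to `lmFit` and build the grouping factor from the appropriate `pData` column.

```r
library(limma)

set.seed(1)
# Simulated log2 expression: 100 probes x 6 samples (made-up data)
m <- matrix(rnorm(600, mean = 8), nrow = 100,
            dimnames = list(paste0("probe", 1:100), paste0("s", 1:6)))

# Hypothetical grouping: three controls followed by three tumors
group <- factor(rep(c("control", "tumor"), each = 3))

# Shift the first five probes upward in the tumor samples
m[1:5, group == "tumor"] <- m[1:5, group == "tumor"] + 3

# Design matrix with an intercept; column 2 is tumor minus control
design <- model.matrix(~ group)

# Fit a linear model per probe, then moderate the variances
fit <- eBayes(lmFit(m, design))

# logFC is the estimated tumor-minus-control difference on the log2 scale
tt <- topTable(fit, coef = 2, number = 10)
tt
```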