I'm trying to parse the probe set IDs from GSE15543; the GDS accession for the same study is GDS4027.
I run into a problem when parsing the probe IDs from the GSE.
GSE file:
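# Download the GSE series matrix; getGEO() returns a list of ExpressionSets
eset2 <- getGEO('GSE15543')[[1]]
# Inspect the last row of the feature (probe annotation) data
fData(eset2)[nrow(fData(eset2)),] #nrow = 54675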
Output:
ID GB_ACC SPOT_ID Species Scientific Name Annotation Date Sequence Type
NA.41561 <NA> <NA> <NA> <NA> <NA> <NA>
Sequence Source Target Description Representative Public ID Gene Title
NA.41561 <NA> <NA> <NA> <NA>
Gene Symbol ENTREZ_GENE_ID RefSeq Transcript ID Gene Ontology Biological Process
NA.41561 <NA> <NA> <NA> <NA>
Gene Ontology Cellular Component Gene Ontology Molecular Function
NA.41561
GDS file:
gds <- getGEO('GDS4027')
eset <- GDS2eSet(gds)
fData(eset)[nrow(fData(eset)),] #nrow = 54675
Output:
ID Gene title Gene symbol Gene ID UniGene title
AFFX-TrpnX-M_at AFFX-TrpnX-M_at
UniGene symbol UniGene ID Nucleotide Title GI GenBank Accession
AFFX-TrpnX-M_at NA
Platform_CLONEID Platform_ORF Platform_SPOTID Chromosome location
AFFX-TrpnX-M_at --Control
Chromosome annotation GO:Function GO:Process GO:Component GO:Function ID
AFFX-TrpnX-M_at
GO:Process ID GO:Component ID
AFFX-TrpnX-M_at
As the outputs above show, the ExpressionSet built from the GDS file gives me the probe ID for the last row (AFFX-TrpnX-M_at), but the one built from the GSE file does not.
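One way to confirm that only the GSE-derived object is affected is to compare its row names with those of the GDS-derived object. This is a minimal sketch; the two short vectors are hypothetical stand-ins built only from the IDs shown above, so with the real objects substitute `rownames(exprs(eset2))` and `rownames(exprs(eset))`.

```r
# Hypothetical stand-ins; in a real session use
# gse_ids <- rownames(exprs(eset2)); gds_ids <- rownames(exprs(eset))
gse_ids <- c("1007_s_at", "NA.41561")
gds_ids <- c("1007_s_at", "AFFX-TrpnX-M_at")

# IDs present in the GSE object but not in the GDS object
setdiff(gse_ids, gds_ids)

# IDs the GDS object has that the GSE object is missing
setdiff(gds_ids, gse_ids)
```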
In short, the trouble is:
> rownames(exprs(eset2))[1]
[1] "1007_s_at"
> rownames(exprs(eset2))[54675]
[1] "NA.41561"
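To see how many rows are affected, you can look for row names with the `NA.` prefix. This sketch assumes, based only on the output above, that every row whose probe ID was lost carries a placeholder of the form `NA.<number>`; the vector is a hypothetical stand-in for `rownames(exprs(eset2))`, and `NA.1` is an illustrative value, not real data.

```r
# Hypothetical stand-in for rownames(exprs(eset2)); in a real session use
# probe_ids <- rownames(exprs(eset2))
probe_ids <- c("1007_s_at", "NA.1", "AFFX-TrpnX-M_at", "NA.41561")

# Assumed placeholder pattern: "NA." followed by digits
is_placeholder <- grepl("^NA\\.[0-9]+$", probe_ids)

sum(is_placeholder)        # number of rows whose probe ID was lost
probe_ids[is_placeholder]  # the affected row names
```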
I can't recover all of the probe IDs (e.g. "NA.41561") and therefore can't map them to gene symbols.
I would like to parse all the probe set IDs from the GSE file and map them to gene symbols. Any help would be much appreciated.
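Once the feature data is parsed correctly, the mapping itself only needs the `ID` and `Gene Symbol` columns that appear in the GSE output above. This is a minimal sketch on a hypothetical two-row stand-in for `fData(eset2)`; the symbols `GENE1` and `GENE2` are made up, and the ` /// ` separator for probes annotated to several genes is an assumption about the platform annotation, so check it against your own data.

```r
# Hypothetical stand-in for fData(eset2); check.names = FALSE keeps the
# space in "Gene Symbol", matching the column name shown in the GSE output
fd <- data.frame(
  ID = c("1007_s_at", "AFFX-TrpnX-M_at"),
  `Gene Symbol` = c("GENE1 /// GENE2", ""),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Named character vector: probe ID -> gene symbol string
probe2symbol <- setNames(fd[["Gene Symbol"]], fd$ID)

# Keep only the first symbol when several are listed (assumed " /// " separator);
# sub() keeps the names of its input
probe2symbol <- sub(" /// .*$", "", probe2symbol)

# Treat empty annotations (e.g. control probes) as missing
probe2symbol[probe2symbol == ""] <- NA

probe2symbol
```

Keeping only the first symbol is one simple choice; if you need every gene a probe hits, split on the separator with `strsplit()` instead.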
I'm not sure what is going on. This is what I see:
I'll look into the issue on Windows when I get a chance.
On Linux, I get this error:
cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL570&form=text&view=full'
sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS
locale:
[1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
[4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
[7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GEOquery_2.36.0 Biobase_2.30.0 BiocGenerics_0.16.1
loaded via a namespace (and not attached):
[1] RCurl_1.95-4.11 bitops_1.0-6 XML_3.98-1.16
Any suggestions on how to resolve this error?
Yes. Use a current version of R/Bioconductor. You are using a version that is almost 3 years old, which is no longer supported. As Sean has already shown, the current version works fine.