Dear Obi,
thanks for the bug report. This problem is specific to the
output="list" option of getBM. Please try the following code, which
avoids the bug and works for me, using biomaRt from the Bioconductor
1.8 release (note that the script took about 5 minutes of wall-clock
time in my case):
##########################################################
library("biomaRt")
library("hgu133plus2")

# all probe set ids on the HG-U133 Plus 2 array
probeids = ls(hgu133plus2ACCNUM)
mart = useMart("ensembl", "hsapiens_gene_ensembl")

# same query as in the report, but without output="list",
# so that getBM returns a data.frame
print(system.time({
  annotations = getBM(
    attributes = c("affy_hg_u133_plus_2",
                   "ensembl_peptide_id", "entrezgene",
                   "unified_uniprot_accession",
                   "uniprot_swissprot_accession"),
    filter = "affy_hg_u133_plus_2",
    values = probeids, mart = mart,
    na.value = "NA")
}))
print(str(annotations))
##########################################################
> source("test.R")
Loading required package: XML
Loading required package: RCurl
Checking attributes and filters ... ok
[1] 27.934 0.516 319.800 0.000 0.000
`data.frame': 252222 obs. of 5 variables:
 $ affy_hg_u133_plus_2        : chr "232806_s_at" "221904_at" "232806_s_at" "232807_at" ...
 $ ensembl_peptide_id         : chr "ENSP00000340974" "ENSP00000340974" "ENSP00000373360" "ENSP00000373360" ...
 $ entrezgene                 : int NA NA NA NA 131408 131408 10752 10752 10752 27255 ...
 $ unified_uniprot_accession  : chr "Q8TA84" "Q8TA84" "Q6ZTF8" "Q6ZTF8" ...
 $ uniprot_swissprot_accession: chr "" "" "" "" ...
NULL
> length(unique(annotations$unified_uniprot_accession))
[1] 42193
> length(unique(annotations$affy_hg_u133_plus_2))
[1] 25188
#############################################################
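If you still need the per-probe list that output="list" was meant to
give you, one option is to build it from the data frame with split().
Below is a minimal sketch in base R. It uses a toy data frame holding
the first few values from the str() output above as a stand-in for the
real annotations object; the names annotations_toy and probe2uniprot
are only illustrative and not part of biomaRt.

```r
# Toy stand-in for the first rows of `annotations`
# (values taken from the str() output above).
annotations_toy <- data.frame(
  affy_hg_u133_plus_2       = c("232806_s_at", "221904_at",
                                "232806_s_at", "232807_at"),
  unified_uniprot_accession = c("Q8TA84", "Q8TA84", "Q6ZTF8", "Q6ZTF8"),
  stringsAsFactors = FALSE
)

# Keep only rows that actually carry an accession.
keep <- !is.na(annotations_toy$unified_uniprot_accession) &
        annotations_toy$unified_uniprot_accession != ""

# Group the accessions by probe id and drop duplicates within each probe.
probe2uniprot <- lapply(
  split(annotations_toy$unified_uniprot_accession[keep],
        annotations_toy$affy_hg_u133_plus_2[keep]),
  unique
)

print(probe2uniprot[["232806_s_at"]])
```

With the real data you would replace annotations_toy by annotations;
probes without any UniProt accession simply do not appear in the list.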
Best wishes
Wolfgang
------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK
http://www.ebi.ac.uk/huber
> Dear BioC list,
>
> I'm getting some very strange behaviour from biomaRt. The following
> script works perfectly if I supply a single probe id to the getBM
> function (i.e. values=probe_ids[1]). But when I supply the entire
> probe_ids list, I get the following error.
>
> Error in postForm(paste(mart@host, "?", sep = ""), query = xmlQuery) :
> <not set>
>
> Once I get the above error, I get a whole bunch of new errors for
> commands that would have worked before the error.
> examples:
>> annotations=getBM(attributes=c("affy_hg_u133_plus_2",
>   "ensembl_peptide_id","entrezgene", "unified_uniprot_accession",
>   "uniprot_swissprot_accession"), filter="affy_hg_u133_plus_2",
>   values=sample_gene, mart=mart, output="list", na.value="NA")
> Error in postForm(paste(mart@host, "?", sep = ""), query = xmlQuery) :
> Couldn't resolve host 'www.biomart.org'
>
> It seems to kill my connection to biomart until I quit R and start
> all over again. Can anyone help? Is there some kind of timeout for
> such a large query? Has anyone gotten this sort of thing to work
> before, or can you suggest an alternative way to map all affy probe
> ids to uniprot IDs in R?
>
> ####Start R script#####
> #Load the appropriate libraries
> library(affy)
> library(gcrma)
> library("annotate")
> library("biomaRt")
> library("hgu133plus2")
>
> setwd("/home/my_dir")
>
> #just.gcrma method
> #Get file list
> celfiles=list.files(path="/home/my_dir")
>
> #Do gcrma normalization
> gcrma_exprset=just.gcrma(filenames=celfiles,normalize=TRUE,
>   type="fullmodel",verbose=TRUE,fast=FALSE,optimize.by="memory")
>
> probe_ids=geneNames(gcrma_exprset)
>
> mart <- useMart("ensembl", "hsapiens_gene_ensembl")
>
> annotations=getBM(attributes=c("affy_hg_u133_plus_2",
>   "ensembl_peptide_id","entrezgene", "unified_uniprot_accession",
>   "uniprot_swissprot_accession"), filter="affy_hg_u133_plus_2",
>   values=probe_ids_nc, mart=mart, output="list", na.value="NA")
> ####End R script#####