Please keep dialogue on the list so others may learn. See below.
On Sun, May 22, 2011 at 8:58 PM, Stefanie Gerstberger
<stefanie.gerstberger at ymail.com> wrote:
> Hi Vincent,
> thanks for your reply. I had problems with biomaRt :
>> sessionInfo()
> R version 2.12.1 (2010-12-16)
This is out of date. External services can't be used reliably with
old versions of R. More below.
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> other attached packages:
> [1] biomaRt_2.6.0     Biostrings_2.18.2 IRanges_1.8.8
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0 RCurl_1.4-3    tools_2.12.1   XML_3.2-0
>>
>> library(biomaRt)
>> ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
>> protein = getSequence(id = "ENSG00000089280", type = "ensembl_gene_id",
>>                       seqType = "peptide", mart = ensembl)
> Error in getBM(c(seqType, type), filters = type, values = id, mart = mart, :
>   Query ERROR: caught BioMart::Exception::Database: Could not connect to
>   mysql database ensembl_mart_62: DBI
>   connect('database=ensembl_mart_62;host=dcc-qa-db.oicr.on.ca;port=3306','bm_web',...)
>   failed: Can't connect to MySQL server on 'dcc-qa-db.oicr.on.ca' (113) at
>   /srv/biomart_server/biomart.org/biomart-perl/lib/BioMart/Configuration/DBLocation.pm
>   line 98
I was unable to reproduce this error with a properly updated version of
R/biomaRt. See further below.
>> protein = getSequence(id = c(100, 5728), type = "entrezgene",
>>                       seqType = "peptide", mart = ensembl)
> Error in getBM(c(seqType, type), filters = type, values = id, mart = mart, :
>   Query ERROR: caught BioMart::Exception::Database: Could not connect to
>   mysql database ensembl_mart_62: DBI
>   connect('database=ensembl_mart_62;host=dcc-qa-db.oicr.on.ca;port=3306','bm_web',...)
>   failed: Can't connect to MySQL server on 'dcc-qa-db.oicr.on.ca' (113) at
>   /srv/biomart_server/biomart.org/biomart-perl/lib/BioMart/Configuration/DBLocation.pm
>   line 98
>>
> That's, I guess, an internal Ensembl problem.
> However, I tried to circumvent this problem by just manually downloading the
> mouse sequences at the Ensembl BioMart server - I found that the files only
> contained 4600 cDNA sequences or if downloading the peptide sequences I only
I don't know what to say about this. However, with an updated R/biomaRt I get:
> mens = useMart("ensembl", dataset = "mmusculus_gene_ensembl")
> p2 = getSequence(id = c(100, 5728), type = "entrezgene",
+                  seqType = "peptide", mart = mens)
> dim(p2)
[1] 0 2
> protein = getSequence(id = "ENSMUSG00000057573", type = "ensembl_gene_id",
+                       seqType = "peptide", mart = mens)
> dim(protein)
[1] 1 2
> protein = getSequence(id = "ENSMUSG00000066372", type = "ensembl_gene_id",
+                       seqType = "peptide", mart = mens)
> dim(protein)
[1] 1 2
> sessionInfo()
R version 2.13.0 Patched (2011-04-14 r55443)
Platform: x86_64-apple-darwin10.6.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods
[8] base
other attached packages:
[1] biomaRt_2.8.0 weaver_1.17.0 codetools_0.2-8 digest_0.4.2
loaded via a namespace (and not attached):
[1] RCurl_1.5-0 XML_3.2-0
> received 2300 sequences. I translated the 4600 sequences using Biostrings
> but quite a few sequences contain undefined nucleotides and no ATG start
> codon or end in a frameshift. But I'm very confused about receiving
> only 4600 cDNA sequences.
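The problems described above (undefined nucleotides, a missing ATG start,
a length that is not a whole number of codons) can be tallied before
translating. What follows is only a minimal sketch in base R, assuming the
cDNA sequences are held in a plain character vector; the function name
qc_cdna and its column names are made up for illustration, and a length
that is not a multiple of 3 is used as a rough stand-in for "ends in a
frameshift".

```r
# Sketch: flag cDNA sequences that are likely to translate badly.
# qc_cdna() and its column names are hypothetical, not part of any package.
qc_cdna <- function(seqs) {
  seqs <- toupper(unname(seqs))
  data.frame(
    length        = nchar(seqs),
    starts_atg    = substr(seqs, 1, 3) == "ATG",   # has an ATG start codon
    has_undefined = grepl("[^ACGT]", seqs),        # N or any other non-ACGT symbol
    whole_codons  = nchar(seqs) %% 3 == 0,         # length is a multiple of 3
    stringsAsFactors = FALSE
  )
}

# Toy input, invented for the example.
cdna <- c(ok = "ATGGCCTAA", no_start = "GCCATGTAA",
          undefined = "ATGNNNTAA", partial = "ATGGCCTA")
qc <- qc_cdna(cdna)
rownames(qc) <- names(cdna)
print(qc)

# Number of sequences that pass all three checks.
sum(qc$starts_atg & !qc$has_undefined & qc$whole_codons)
```

Running the same tally on the real download would show how many of the
4600 sequences are usable as complete coding sequences.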
> I know this part is not really for the Bioconductor list but I was hoping
> that someone with experience with the ensembl mouse genome knows why I'm
> encountering this - and whether there is a way in Bioconductor to download
> the sequences not using Biomart. I have found a way now around it -
> by simply ignoring ensembl and using refseq proteins downloaded from UCSC.
> Using BiomaRt in R seemed to me the simplest solution to obtain the
> sequences - I don't currently know any other option.
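For sequences obtained outside BioMart as a FASTA file (as with the RefSeq
download mentioned above), a quick sanity check is to count the records in
the file before doing anything else. The sketch below uses base R only and
assumes a local, uncompressed FASTA file; read_fasta is a hypothetical
helper written for this example, and the temporary file stands in for a
real download. Biostrings provides its own FASTA readers; base R is used
here only to avoid assuming a particular package version.

```r
# Sketch: read a FASTA file with base R and count its records.
# read_fasta() is a hypothetical helper, not a Bioconductor function.
read_fasta <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]                 # drop empty lines
  is_header <- grepl("^>", lines)
  n_records <- sum(is_header)
  record <- cumsum(is_header)                   # record index for every line
  keep <- !is_header & record > 0
  # Keep one group per header, even if a record has no sequence lines.
  groups <- split(lines[keep],
                  factor(record[keep], levels = seq_len(n_records)))
  seqs <- vapply(groups, paste, character(1), collapse = "")
  names(seqs) <- sub("^>", "", lines[is_header])
  seqs
}

# Small stand-in file, invented for the example.
tmp <- tempfile(fileext = ".fa")
writeLines(c(">pep1 example", "MKT", "AYI", ">pep2 example", "MSE"), tmp)

peptides <- read_fasta(tmp)
length(peptides)          # number of records in the file
nchar(peptides)           # length of each sequence
```

Comparing length(peptides) across species files gives the same kind of
count discussed in this thread without going through the BioMart server.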
> Thanks,
> Stefanie
>
> ________________________________
> From: Vincent Carey <stvjc at channing.harvard.edu>
> To: Stefanie Carola Gerstberger <scg74 at cornell.edu>
> CC: "Bioconductor at r-project.org" <bioconductor at r-project.org>
> Sent: Sunday, 22 May 2011, 19:29:08
> Subject: Re: [BioC] Ensembl mouse proteins
>
> What is the relationship of your question to Bioconductor? Are you
> using R to perform the download? What functions in what packages,
> with what version? Read the posting guide, please, and provide the
> result of sessionInfo().
>
> On Sun, May 22, 2011 at 6:12 PM, Stefanie Carola Gerstberger
> <scg74 at cornell.edu> wrote:
>> Hi,
>> I have tried to download the mouse protein sequences from Biomart Ensembl.
>> I only received 2203 protein sequences for mouse, including isoforms. The
>> same happens when downloading the Ensembl protein sequences through the UCSC
>> genome browser. I also encounter the problem for Xenopus tropicalis - only
>> 4700 protein sequences. As a reference point, S. cerevisiae has 6700 sequences
>> in Ensembl biomart, human 87,000, Drosophila 22,000. Does anyone know why
>> this is and how I can circumvent this problem to get a complete list of
>> protein sequences for mouse and Xenopus?
>> Thanks,
>> Stefanie
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>