mauede@alice.it
-----Original message-----
From: seandavi@gmail.com on behalf of Sean Davis
Sent: Mon 31/05/2010 23.47
To: mauede@alice.it
Cc: Steffen Durinck; michael watson (IAH-C); Stefano Rovetta; Giuseppe Russo; Bioconductor List
Subject: Re: [BioC] R: why biomaRt cannot extract 3UTR sequences for 1941 ENSGxxxxx ?
On Mon, May 31, 2010 at 11:07 AM, <mauede@alice.it> wrote:
> I reinstalled all Bioconductor packages.
> I ran my R script again to extract the 3UTR sequences of validated
> gene targets.
> Back to the "hsa-mir-1" gene targets ... I performed the following
> checks and tests:
>
> > is.list(genes_map)
> [1] TRUE
> > is.vector(genes_map[,"ensembl_transcript_id"])
> [1] TRUE
> > length(genes_map[,"ensembl_transcript_id"])
> [1] 1941
>
> > genes_seq <- getSequence(id=genes_map[,"ensembl_transcript_id"],type="ensembl_transcript_id",
> + seqType="3utr",mart=hmart)
> Error in value[[3L]](cond) :
> Request to BioMart web service failed. Verify if you are still connected
> to the internet. Alternatively the BioMart web service is temporarily down.
> > genes_seq <- getSequence(id=genes_map[1:100,"ensembl_transcript_id"],type="ensembl_transcript_id",
> + seqType="3utr",mart=hmart)
> > dim(genes_seq)
> [1] 100 2
> > genes_seq <- getSequence(id=genes_map[1:1000,"ensembl_transcript_id"],type="ensembl_transcript_id",
> + seqType="3utr",mart=hmart)
> Error in value[[3L]](cond) :
> Request to BioMart web service failed. Verify if you are still connected
> to the internet. Alternatively the BioMart web service is temporarily down.
> > genes_seq <- getSequence(id=genes_map[1:500,"ensembl_transcript_id"],type="ensembl_transcript_id",
> + seqType="3utr",mart=hmart)
> > dim(genes_seq)
> [1] 500 2
> > genes_seq <- getSequence(id=genes_map[1:800,"ensembl_transcript_id"],type="ensembl_transcript_id",
> + seqType="3utr",mart=hmart)
> > dim(genes_seq)
> [1] 800 2
> > genes_seq <- getSequence(id=genes_map[1:900,"ensembl_transcript_id"],type="ensembl_transcript_id",
> + seqType="3utr",mart=hmart)
> > dim(genes_seq)
> [1] 900 2
>
> The above results show that my query succeeds as long as the number of
> 3UTR sequences requested is less than 1000. How come? Is this a *magic number*?
>
I don't see that 1000 is a magic number in your example. Could you explain
how you came to that conclusion? With the exception of the first query,
which failed, your other queries worked. Perhaps if you tried your longer
query again, it would work. If not, I would follow the instructions in each
case in which your query fails and make sure that you are still connected to
the internet and that the BioMart web service is still working.

Also, I have to point out that you have been on this list long enough to
know that you MUST include the output of sessionInfo() and a reproducible
example in order to get the best help. Also, Steffen (the author of the
biomaRt package) has offered to take your list of ids and check it. Perhaps
you should try following up on some of the answers you receive before
proceeding. Just a thought.... And to be clear, everyone here is trying
hard to get you your answers as quickly as time permits. Help us to help
you by trying to do as folks suggest rather than simply following up with
more questions.

Sean
I ran some tests to find the upper bound on the length of the ENST list
and posted the results.
They show (at least this was my intention) that the query fails when I
ask for 1000 3UTR sequences, with the same error message I get when I ask
for all 1941 ENST sequences. It works fine, though, if I ask for only
100, 500, 800, or 900 ENST sequences ...
I did send the full list of 1941 ENST ids in a previous email. What else
can I provide to reproduce the error I am getting?
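
For reference, the two items requested earlier in the thread (the id list as an rda file and the sessionInfo() output) could be prepared roughly as follows. This is only a sketch: `enst_ids` is a made-up stand-in for `genes_map[,"ensembl_transcript_id"]`, and the file name is an assumption.

```r
# Stand-in for genes_map[, "ensembl_transcript_id"]; the two distinct IDs
# appear in the thread, the repeated one is added here for illustration.
enst_ids <- c("ENST00000262187", "ENST00000296271", "ENST00000262187")

# Summarise the input so readers can see its size and duplicate count.
cat("total IDs:", length(enst_ids), "\n")
cat("unique IDs:", length(unique(enst_ids)), "\n")

# Save the exact vector as an .rda file that can be attached to an email.
# (The file name is arbitrary.)
rda_file <- file.path(tempdir(), "enst_ids.rda")
save(enst_ids, file = rda_file)

# Capture the R and package versions as text for the message body.
session_txt <- capture.output(print(sessionInfo()))
writeLines(session_txt)
```

The recipient can then restore the vector with `load()` and rerun the failing call on exactly the same input.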
While it's true that I failed to include the sessionInfo() output, it's
also true that the answer I got boils down to "... it works for me".
Therefore, my further question is "why am I not so lucky?".
As I said, and proved, the same query seems to work for me only up to
a limited ENST list length.
I'd like to find out whether this upper limit depends on the local
network configuration, on my computer configuration, or on something else.
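
One way to tell a genuine size limit from an intermittent failure is to repeat each list size several times and tabulate the outcomes. The sketch below assumes a hypothetical wrapper `fetch` that takes a vector of ids and either returns a result or signals an error; the function name `probe_sizes`, the sizes, and the repeat count are illustrative, not part of biomaRt.

```r
# Try each list size several times and count how often the query succeeds.
# `fetch` is a hypothetical wrapper: it takes a character vector of IDs
# and either returns a result or signals an error.
probe_sizes <- function(ids, fetch, sizes = c(100, 500, 800, 900, 1000),
                        reps = 3) {
  rows <- lapply(sizes, function(n) {
    ok <- vapply(seq_len(reps), function(i) {
      out <- tryCatch(fetch(head(ids, n)), error = function(e) e)
      !inherits(out, "error")
    }, logical(1))
    data.frame(size = n, successes = sum(ok), attempts = reps)
  })
  do.call(rbind, rows)
}

# Possible use with biomaRt (needs network access; not run here):
# library(biomaRt)
# hmart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# probe_sizes(genes_map[, "ensembl_transcript_id"],
#             function(batch) getSequence(id = batch,
#                                         type = "ensembl_transcript_id",
#                                         seqType = "3utr", mart = hmart))
```

If a given size sometimes succeeds and sometimes fails, the cause is more likely load or timing than a fixed cutoff.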
I can work around this stumbling block by patching my script so that no
single query exceeds the limit.
Still, if I cannot figure out what causes the limit, I wonder whether it
may vary over time (network or traffic load, etc.).
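
The workaround could look something like the sketch below: split the ids into batches, query each batch separately, and retry a batch that fails. The helper names (`chunk_ids`, `fetch_in_batches`), the batch size of 500, and the retry settings are assumptions chosen for illustration; only `getSequence()` and its arguments come from the thread.

```r
# Split a vector of IDs into consecutive batches of at most `size` elements.
chunk_ids <- function(ids, size = 500) {
  stopifnot(size >= 1)
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Fetch each batch separately, retrying a failed batch a few times.
# `fetch` is any function that takes a character vector of IDs and
# returns a data frame; `tries` and `wait` are illustrative defaults.
fetch_in_batches <- function(ids, fetch, size = 500, tries = 3, wait = 5) {
  ids <- unique(ids)  # drop duplicate IDs before querying
  results <- lapply(chunk_ids(ids, size), function(batch) {
    for (attempt in seq_len(tries)) {
      out <- tryCatch(fetch(batch), error = function(e) e)
      if (!inherits(out, "error")) return(out)
      if (attempt < tries) Sys.sleep(wait)  # pause before retrying
    }
    stop("batch failed after ", tries, " attempts: ", conditionMessage(out))
  })
  do.call(rbind, results)
}

# Possible use with biomaRt (needs network access; not run here):
# library(biomaRt)
# hmart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# genes_seq <- fetch_in_batches(
#   genes_map[, "ensembl_transcript_id"],
#   function(batch) getSequence(id = batch, type = "ensembl_transcript_id",
#                               seqType = "3utr", mart = hmart)
# )
```

Passing the query as a function keeps the batching logic independent of biomaRt, so it can be checked offline with a dummy `fetch` before being pointed at the real service.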
As for the connection to BioMart: when I ran the long query on its own,
I could still successfully run commands like listAttributes() and
listMarts() after the error occurred, which *I think* would fail if the
connection were down.
Thank you,
Maura E
> -----Original message-----
> From: Steffen Durinck [mailto:sdurinck@lbl.gov]
> Sent: Fri 28/05/2010 23.16
> To: michael watson (IAH-C)
> Cc: mauede@alice.it; Bioconductor List
> Subject: Re: [BioC] why biomaRt cannot extract 3UTR sequences for 1941 ENSGxxxxx ?
>
> Hi Maura,
>
> This also works for me and duplicate transcript ids shouldn't give problems,
> you'll only get unique results back though.
> What version of biomaRt are you running?
> Would you be able to send me your complete transcript id list as an rda so I
> can try the complete list?
>
> Cheers,
> Steffen
>
> On Fri, May 28, 2010 at 1:54 PM, michael watson (IAH-C) <
> michael.watson@bbsrc.ac.uk> wrote:
>
> > The following (small) code works for me:
> >
> > library(biomaRt)
> > mart <- useMart("ensembl","hsapiens_gene_ensembl")
> > ids <- c("ENST00000262187","ENST00000296271")
> > seq <- getSequence(id=ids, type="ensembl_transcript_id", mart=mart,
> > seqType="3utr")
> > seq
> > ________________________________________
> > From: bioconductor-bounces@stat.math.ethz.ch [bioconductor-bounces@stat.math.ethz.ch] On Behalf Of mauede@alice.it [mauede@alice.it]
> > Sent: 28 May 2010 21:41
> > To: Bioconductor List
> > Subject: [BioC] why biomaRt cannot extract 3UTR sequences for 1941 ENSGxxxxx ?
> >
> > I executed the following lines several times, both from a script and by
> > pasting them into an R shell.
> > biomaRt fails systematically.
> > The problem is to extract the 3UTR sequences corresponding to a vector
> > containing 1941 Ensembl transcript IDs (some are duplicated ... is this a
> > problem?).
> > Please find the failing instructions below, including the ENST vector.
> >
> > Any suggestion is welcome. Thank you,
> > Maura
> >
> > > hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
> > Checking attributes ... ok
> > Checking filters ... ok
> >
> > > genes_map[,"ensembl_transcript_id"]
> > [1] "ENST00000262187" "ENST00000296271" "ENST00000346166" "ENST00000381570"
> >
> > <snip>
> >
> > [1937] "ENST00000400907" "ENST00000400908" "ENST00000440864" "ENST00000309042"
> > [1941] "ENST00000254325"
> >
> > > genes_seq <- getSequence(id=genes_map[,"ensembl_transcript_id"],type="ensembl_transcript_id",
> > + seqType="3utr",mart=hmart)
> > Error in value[[3L]](cond) :
> > Request to BioMart web service failed. Verify if you are still connected
> > to the internet. Alternatively the BioMart web service is temporarily down.
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor