michael watson IAH-C
Why do you need all the fields?
Don't you just need the miRNA name (e.g. hsa-let-7d) and the Ensembl transcript
ID (e.g. ENST000000012345)?
-----Original Message-----
From: mauede@alice.it [mailto:mauede@alice.it]
Sent: Mon 29/06/2009 8:26 AM
To: Sean Davis
Cc: michael watson (IAH-C); Steve Lianoglou; bioconductor List
Subject: R: R: [BioC] R: R: R: how to find the VALIDATED pair (miRNA,
gene-3'UTR-sequence)
Yes. I have opened and stared at the file
http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl many times.
I thought it would be possible to extract the content of all the fields in
there through BioMart queries.
So basically, the miRNAs from "mature.fa" have to be matched with their
respective target genes from
http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl
by scanning the two files manually (with basic R functions).
Then some of the information extracted from
http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl
can be used in BioMart queries to get the 3'UTR sequences.
Did I get it right?
I infer that not all the fields in the file
http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl
can be extracted through BioMart queries (TRUE/FALSE)?
Unfortunately our group's biology professor, who could have helped with
nomenclature and where to find what, is hospitalized in critical condition
after a heart attack.
Thank you for your patience and understanding,
Maura
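A minimal sketch of that matching step with basic R functions, assuming the targets download has been read into a data frame with one row per (miRNA, target transcript) pair. The column names `mirna` and `transcript` and the `ENST_*` values are made-up placeholders, not the real layout of the download file; check its header and adjust.

```r
# Toy stand-in for the targets download (tab-delimited text).
# Column names and ENST_* values are hypothetical placeholders.
targets_txt <- "mirna\ttranscript
hsa-miR-20a\tENST_A
hsa-miR-20a\tENST_B
hsa-let-7d\tENST_C
mmu-miR-20a\tENST_D"
targets <- read.delim(text = targets_txt, stringsAsFactors = FALSE)

# Human miRNA names, e.g. the "hsa" identifiers taken from mature.fa
human_mirnas <- c("hsa-miR-20a", "hsa-let-7d")

# Keep only the target rows whose miRNA is in the human list
hs_targets <- targets[targets$mirna %in% human_mirnas, ]

# Group target transcripts by miRNA: a named list, one vector per miRNA
targets_by_mirna <- split(hs_targets$transcript, hs_targets$mirna)
print(targets_by_mirna)
```

The transcript IDs collected this way are what would then be passed to BioMart as filter values.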
-----Original Message-----
From: Sean Davis [mailto:seandavi at gmail.com]
Sent: Mon 29/06/2009 04:58
To: mauede at alice.it
Cc: michael watson (IAH-C); Steve Lianoglou; bioconductor List
Subject: Re: R: [BioC] R: R: R: how to find the VALIDATED pair (miRNA,
gene-3'UTR-sequence)
On Sun, Jun 28, 2009 at 10:26 PM, <mauede at alice.it> wrote:
> Since "mature.fa" and "maturestar.fa" contain the EXPERIMENTALLY
> VALIDATED miRNAs (is it TRUE?), please assume I have read "mature.fa"
> into a list.
> I have to retain only the miRNAs from humans. Therefore I have erased all
> the list elements whose description does not start with "hsa". Am I
> mistaken?
>
That is correct, yes.
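The filtering step Maura describes can be sketched as follows, assuming each record in the list has a `desc` (header without ">") and a `seq` component, which is the shape readFASTA produced at the time (check with `str()` on your own object). The second record is a made-up non-human placeholder.

```r
# Toy stand-in for the list read from mature.fa: each record has a
# `desc` and a `seq` component. The second record is a placeholder.
mature <- list(
  list(desc = "hsa-miR-20a MIMAT0000075 Homo sapiens miR-20a",
       seq  = "UAAAGUGCUUAUAGUGCAGGUAG"),
  list(desc = "mmu-placeholder entry", seq = "ACGU")
)

# TRUE for records whose description starts with the human prefix "hsa-"
is_human <- vapply(mature, function(rec) startsWith(rec$desc, "hsa-"),
                   logical(1))
human <- mature[is_human]
length(human)
```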
>
> In our present emergency situation I have to prepare a text file containing
> blocks of data, described below.
> Each block contains a human VALIDATED miRNA identifier and sequence
> (example: "hsa-miR-20a" "UAAAGUGCUUAUAGUGCAGGUAG")
> followed by the identifier and 3'UTR sequence of ALL the genes that are
> targeted by that miRNA.
> Here is what my output file should look like. I have no idea what to pick
> as the target gene identifier, but I have to use the "hsa...." identifier
> for the human miRNAs.
>
> VALIDATED miRNA[1] identifier   miRNA[1] sequence   #BLOCK_1 start
> target-gene[1,1]   3'UTR sequence
> target-gene[1,2]   3'UTR sequence
> ...............................................
> target-gene[1,n]   3'UTR sequence
> #BLOCK_1 end
>
> VALIDATED miRNA[2] identifier   miRNA[2] sequence   #BLOCK_2 start
> target-gene[2,1]   3'UTR sequence
> target-gene[2,2]   3'UTR sequence
> ...............................................
> target-gene[2,m]   3'UTR sequence
> #BLOCK_2 end
>
> .....................................................................
> .....................................................................
>
> VALIDATED miRNA[k] identifier   miRNA[k] sequence   #BLOCK_k start
> target-gene[k,1]   3'UTR sequence
> target-gene[k,2]   3'UTR sequence
> ...............................................
> target-gene[k,j]   3'UTR sequence
> #BLOCK_k end
>
>
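Once the miRNAs, their targets and the 3'UTR sequences are in data frames, the block layout above can be written with base R connections. This is a minimal sketch: the function name `write_mirna_blocks` and the column names (`id`, `seq`, `mirna`, `gene`, `utr3`) are hypothetical, fields are separated by tabs as an assumption, and the input values are placeholders.

```r
# Write one block per miRNA in the layout sketched above.
# `mirnas` needs columns id, seq; `targets` needs columns mirna, gene, utr3.
# These column names are hypothetical placeholders.
write_mirna_blocks <- function(mirnas, targets, path) {
  con <- file(path, open = "w")
  on.exit(close(con))
  for (i in seq_len(nrow(mirnas))) {
    hits <- targets[targets$mirna == mirnas$id[i], , drop = FALSE]
    writeLines(sprintf("%s\t%s\t#BLOCK_%d start",
                       mirnas$id[i], mirnas$seq[i], i), con)
    if (nrow(hits) > 0) {
      # one "gene <TAB> 3'UTR" line per target
      writeLines(paste(hits$gene, hits$utr3, sep = "\t"), con)
    }
    writeLines(sprintf("#BLOCK_%d end", i), con)
    writeLines("", con)  # blank line between blocks
  }
  invisible(path)
}

# Usage with made-up placeholder values
mirnas <- data.frame(id = "hsa-miR-20a", seq = "UAAAGUGCUUAUAGUGCAGGUAG",
                     stringsAsFactors = FALSE)
targets <- data.frame(mirna = c("hsa-miR-20a", "hsa-miR-20a"),
                      gene  = c("GENE_1", "GENE_2"),
                      utr3  = c("ACGUACGU", "UUGGCCAA"),
                      stringsAsFactors = FALSE)
out <- tempfile(fileext = ".txt")
write_mirna_blocks(mirnas, targets, out)
cat(readLines(out), sep = "\n")
```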
> I understand I can get the gene data and 3'UTR sequences from Ensembl
> through BioMart.
> My problem is: given the VALIDATED miRNA description from "mature.fa",
> for instance "hsa-miR-20a MIMAT0000075 Homo sapiens miR-20a",
> which attributes shall I use to get the identifier and corresponding 3'UTR
> sequence of ALL the genes that are targets of the miRNA described above?
>
Again, Maura, this has been answered several times now.
http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl
> Someone has already told me there is no BioMart attribute returning the
> identifier "hsa-miR-20a".
> I am asking whether there is a BioMart attribute returning "MIMAT000007"
> or "miR-20a".
>
> In short, I am looking for the attributes that allow me to relate the
> miRNA data from "mature.fa" to the gene data from Ensembl.
>
This information is in the .txt file download from the site above.
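To relate "mature.fa" to that file, the FASTA header first has to be split into fields that can serve as join keys (the miRNA ID and the MIMAT accession). A minimal sketch, assuming the fields are separated by single spaces as in the example header quoted above; the function name `parse_mature_desc` is hypothetical, and which field the targets file actually uses as key should be checked against its header.

```r
# Split mature.fa header lines such as
# "hsa-miR-20a MIMAT0000075 Homo sapiens miR-20a"
# into the miRNA ID, the miRBase accession and the remaining text.
# Assumes single-space separators, as in this example.
parse_mature_desc <- function(desc) {
  parts <- strsplit(desc, " ", fixed = TRUE)
  data.frame(
    id        = vapply(parts, `[`, character(1), 1),
    accession = vapply(parts, `[`, character(1), 2),
    rest      = vapply(parts, function(p) paste(p[-(1:2)], collapse = " "),
                       character(1)),
    stringsAsFactors = FALSE
  )
}

res <- parse_mature_desc("hsa-miR-20a MIMAT0000075 Homo sapiens miR-20a")
print(res)
```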
>
> The reason I mentioned the VALIDATED file from miRecords is that
> its Excel file seems to contain miRNA identifiers that correspond to
> the Ensembl data returned by the attribute "hgnc_symbol"... if I am not
> mistaken.
>
> Sorry, I cannot answer your question "which attributes do you need..."
> because I do not know which attributes allow me to match
> the miRNA info from "mature.fa" with the gene info from Ensembl.
> I am proceeding by trial and error and bothering Bioconductor people!
>
> -----Original Message-----
> From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk]
> Sent: Sun 28/06/2009 23:55
> To: mauede at alice.it; Steve Lianoglou
> Cc: Sean Davis; bioconductor List
> Subject: RE: [BioC] R: R: R: how to find the VALIDATED pair (miRNA,
> gene-3'UTR-sequence)
>
> Yes, but what are you trying to do? BioMart has a very complex structure,
> I admit that; but why do you need/want all those attributes? Which
> attributes do you need?
>
> This works:
>
> library(biomaRt)
> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
> # GO molecular-function annotation for one Ensembl transcript.
> # (2009-era attribute names; listAttributes(hmart) shows the current ones.)
> getBM(attributes=c("go_molecular_function_description",
>                    "go_molecular_function_linkage_type",
>                    "ensembl_gene_id",
>                    "ensembl_transcript_id"),
>       filters='ensembl_transcript_id', values='ENST00000295228', mart=hmart)
>
> It gets the GO molecular function data for the Ensembl human transcript
> ENST00000295228. If that's what I want to do, then the code is right; if
> it's not, then the code is wrong.
>
> How does the query you specify below relate to your question on
> microRNAs?
>
> -----Original Message-----
> From: mauede at alice.it [mailto:mauede at alice.it]
> Sent: Sun 28/06/2009 6:29 PM
> To: michael watson (IAH-C); Steve Lianoglou
> Cc: Sean Davis; bioconductor List
> Subject: R: [BioC] R: R: R: how to find the VALIDATED pair (miRNA,
> gene-3'UTR-sequence)
>
> Sure. I have to do that. I am just struggling to get all the pieces
> together. To me most of those names have no meaning, as I do not have any
> biology background.
> Below I am pasting a weird error... maybe it is clear to you.
> I am proceeding by getting 10 consecutive attributes at a time to find
> the ones that I need, if any.
> So far I have successfully extracted the first 40 attributes from
> listAttributes(mart), but now ...
>
> > library(biomaRt)
> > hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
> Checking attributes ... ok
> Checking filters ... ok
> > getBM(attributes=c("go_molecular_function_description",
> + "go_molecular_function_linkage_type",
> + "clone_based_ensembl_gene_name",
> + "clone_based_ensembl_transcript_name",
> + "clone_based_vega_gene_name",
> + "clone_based_vega_transcript_name",
> + "ccds",
> + "embl",
> + "entrezgene",
> + "ottt"),
> +   filters='ensembl_transcript_id',value='ENST00000295228',mart=hmart)
> Error in getBM(attributes = c("go_molecular_function_description",
> "go_molecular_function_linkage_type", :
> Query ERROR: caught BioMart::Exception::Usage: Too many attributes
> selected for External References
>
>
>
> -----Original Message-----
> From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk]
> Sent: Sun 28/06/2009 16:50
> To: mauede at alice.it; Steve Lianoglou
> Cc: Sean Davis; bioconductor List
> Subject: RE: [BioC] R: R: R: how to find the VALIDATED pair (miRNA,
> gene-3'UTR-sequence)
>
> Hi Maura
>
> Well, you can get gene:target info from miRBase, read in using CORNA or
> just read.table.
> You can get miRNA sequences also from miRBase using readFASTA.
> You can get Ensembl gene sequences using biomaRt.
> You can read in miRecords data using RODBC.
>
> You can then link this all together using merge(), though I appreciate some
> work needs to be done on the list provided by readFASTA.
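A minimal sketch of that "work on the list" plus the merge() step, assuming the readFASTA records have `desc` and `seq` components and that the targets and 3'UTR tables have already been loaded (e.g. with read.table, and with biomaRt's getSequence() using seqType = "3utr"). All table contents and column names below are hypothetical placeholders.

```r
# Toy stand-in for readFASTA() output (list of records with desc and seq).
fas <- list(
  list(desc = "hsa-miR-20a MIMAT0000075 Homo sapiens miR-20a",
       seq  = "UAAAGUGCUUAUAGUGCAGGUAG")
)

# Flatten the list into a data frame: first header word = miRNA ID
mirna_df <- data.frame(
  mirna     = vapply(fas, function(r) strsplit(r$desc, " ", fixed = TRUE)[[1]][1],
                     character(1)),
  mirna_seq = vapply(fas, function(r) r$seq, character(1)),
  stringsAsFactors = FALSE
)

# Hypothetical targets table and 3'UTR table; IDs and sequences are placeholders.
targets <- data.frame(mirna      = c("hsa-miR-20a", "hsa-miR-20a"),
                      transcript = c("ENST_A", "ENST_B"),
                      stringsAsFactors = FALSE)
utrs <- data.frame(transcript = c("ENST_A", "ENST_B"),
                   utr3       = c("ACGUACGU", "UUGGCCAA"),
                   stringsAsFactors = FALSE)

# Join miRNA -> target transcript -> 3'UTR
combined <- merge(merge(mirna_df, targets, by = "mirna"), utrs, by = "transcript")
print(combined)
```

merge() performs an inner join by default, so miRNAs without targets (or targets without a retrieved 3'UTR) drop out; `all.x = TRUE` keeps them if that is preferred.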
>
> Other than actually doing the work for you, I'm not sure what else we can
> do.... :)
>
> Mick
>
> -----Original Message-----
> From: mauede at alice.it [mailto:mauede at alice.it]
> Sent: Sun 28/06/2009 3:35 PM
> To: michael watson (IAH-C); Steve Lianoglou
> Cc: Sean Davis; bioconductor List
> Subject: R: [BioC] R: R: R: how to find the VALIDATED pair (miRNA,
> gene-3'UTR-sequence)
>
> Thank you very much.
> I just realized the BioMart server is up and running again.
> Now I have learnt that BioMart can extract a lot of data from Ensembl
> (where I have been told to get the gene info)
> and can also download the validated miRNA compressed files.
>
> I stress that the main problem I am experiencing is still open, though.
> In fact, I have to find a piece of data that allows me to relate all the
> gene info I can get by querying Ensembl through BioMart
> to the downloaded miRNA info. This is because the miRNA identifier is not
> available through BioMart... I wish I were mistaken.
>
> However, some other (unique?) miRNA attribute that is available through
> BioMart is also present in the VALIDATED targets file that is downloadable
> in XLS format from miRecords. This piece of data would allow me to relate
> the gene 3'UTR sequence to the targeting miRNA.
> The issue is that I do not know how often the miRecords file is updated,
> and the download has to be performed outside the R environment.
> Maybe R could handle the download automatically through the R "system"
> function, and then the XLS file could be processed with the R package
> "RExcelInstaller"... just a speculation...
>
> Regards,
> Maura
>
>
> -----Original Message-----
> From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk]
> Sent: Sun 28/06/2009 10:15
> To: Steve Lianoglou
> Cc: mauede at alice.it; Sean Davis; bioconductor List
> Subject: RE: [BioC] R: R: R: how to find the VALIDATED pair (miRNA,
> gene-3'UTR-sequence)
>
> The power of Bioconductor :D
>
> So, some code would look like this:
>
> > library(Biostrings)  # provided readFASTA at the time
> > # mature and mature-star miRNA sequences from miRBase (gzipped FASTA over FTP)
> > mat <- gzcon(url("ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/mature.fa.gz"))
> > matfas <- readFASTA(mat, strip.descs=TRUE)
> > matstar <- gzcon(url("ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/maturestar.fa.gz"))
> > matstarfas <- readFASTA(matstar, strip.descs=TRUE)
>
>
> -----Original Message-----
> From: Steve Lianoglou [mailto:mailinglist.honeypot at gmail.com]
> Sent: Sun 28/06/2009 8:51 AM
> To: michael watson (IAH-C)
> Cc: mauede at alice.it; Sean Davis; bioconductor List
> Subject: Re: [BioC] R: R: R: how to find the VALIDATED pair (miRNA,
> gene-3'UTR-sequence)
>
> > They'll be in fasta format, and whether or not Bioconductor can read
> > them in I have no idea - I use Bioperl for all my sequence handling.
>
>
> Yes, Bioconductor can: the Biostrings package provides readFASTA and
> writeFASTA, which handle this for you.
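If Biostrings is not at hand, a FASTA file can also be parsed in base R. This is a minimal sketch of a plain line-based reader (the name `read_fasta_lines` is hypothetical and this is not how Biostrings implements it); for a gzipped remote file the input lines could come from something like `readLines(gzcon(url(...)))`, which is not exercised here.

```r
# Minimal base-R FASTA reader (an alternative, not the Biostrings API).
# Returns a named character vector: names = header lines without ">",
# values = concatenated sequence lines of each record.
read_fasta_lines <- function(lines) {
  is_header <- startsWith(lines, ">")
  rec <- cumsum(is_header)              # record number for every line
  headers <- sub("^>", "", lines[is_header])
  seq_lines <- lines[!is_header]
  # factor with one level per record, so records without sequence give ""
  groups <- factor(rec[!is_header], levels = seq_along(headers))
  seqs <- vapply(split(seq_lines, groups), paste, character(1), collapse = "")
  setNames(seqs, headers)
}

fa <- c(">hsa-miR-20a MIMAT0000075 Homo sapiens miR-20a",
        "UAAAGUGCUUAUAG",
        "UGCAGGUAG",
        ">placeholder_record",
        "ACGU")
seqs <- read_fasta_lines(fa)
print(seqs)
```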
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Physiology, Biophysics and Systems Biology
> Weill Medical College of Cornell University
>
> Contact Info: http://cbio.mskcc.org/~lianos
> Alice Messenger ;-) chat with your Windows Live Messenger friends and
> all TIM mobile phones too!
> Go to http://maileservizi.alice.it/alice_messenger/index.html?pmk=footer