Question

R: R: R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

mauede@alice.it ▴ 870

Thank you very much. I just realized the BioMart server is up and running again. I have now learnt that BioMart can extract a lot of data from Ensembl (which is where I have been told to get the gene information), and that the compressed files of validated miRNAs can also be downloaded.

I stress, though, that the main problem I am experiencing is still open. I have to find a piece of data that lets me relate the gene information I get by querying Ensembl through BioMart to the downloaded miRNA information. The difficulty is that the miRNA identifier is not available through BioMart... I wish I were mistaken. However, some other (unique?) miRNA attribute that is available through BioMart is also present in the VALIDATED targets file, which can be downloaded in XLS format from miRecords. That piece of data would allow me to relate the gene 3'UTR string to the targeting miRNA.

The issue is that I do not know how often the miRecords file is updated, and the download has to be performed outside the R environment. Maybe R could handle the download automatically through the "system" function, and the XLS file could then be processed with the R package "RExcelInstaller"... just a speculation.
Regards,
Maura

-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson@bbsrc.ac.uk]
Sent: Sun 28/06/2009 10:15
To: Steve Lianoglou
Cc: mauede@alice.it; Sean Davis; bioconductor List
Subject: RE: [BioC] R: R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

The power of Bioconductor :D

So, some code would look like this:

> mat <- gzcon(url("ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/mature.fa.gz"))
> matfas <- readFASTA(mat, strip.descs=TRUE)
> matstar <- gzcon(url("ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/maturestar.fa.gz"))
> matstarfas <- readFASTA(matstar, strip.descs=TRUE)

-----Original Message-----
From: Steve Lianoglou [mailto:mailinglist.honeypot@gmail.com]
Sent: Sun 28/06/2009 8:51 AM
To: michael watson (IAH-C)
Cc: mauede@alice.it; Sean Davis; bioconductor List
Subject: Re: [BioC] R: R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

> They'll be in fasta format, and whether or not Bioconductor can read
> them in I have no idea - I use Bioperl for all my sequence handling.

Yes, Bioconductor can: the Biostrings package provides readFASTA and writeFASTA that handle this for you.

-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos
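For anyone reading this thread later: as far as I know, readFASTA() was subsequently removed from Biostrings in favour of functions such as readRNAStringSet(), so the quoted code may not run on a current installation. The sketch below avoids the dependency altogether and parses FASTA lines with base R. The function name parse_fasta and the second demo record are made up for illustration, and the parser assumes headers of the form ">id accession description".

```r
# Minimal base-R FASTA parser (a sketch, not the Biostrings implementation).
# Assumes headers look like ">hsa-miR-20a MIMAT0000075 Homo sapiens miR-20a".
parse_fasta <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  # Each sequence line belongs to the most recent header line
  record <- cumsum(is_header)
  headers <- sub("^>", "", lines[is_header])
  grp <- factor(record[!is_header], levels = seq_along(headers))
  # Join multi-line sequences; a header without sequence lines yields ""
  seqs <- vapply(split(lines[!is_header], grp), paste, character(1),
                 collapse = "")
  fields <- strsplit(headers, " +")
  data.frame(
    id        = vapply(fields, `[`, character(1), 1),
    accession = vapply(fields, `[`, character(1), 2),
    sequence  = unname(seqs),
    stringsAsFactors = FALSE
  )
}

# First record is the example quoted in this thread; the second is hypothetical.
demo <- c(">hsa-miR-20a MIMAT0000075 Homo sapiens miR-20a",
          "UAAAGUGCUUAUAGUGCAGGUAG",
          ">xxx-miR-1 MIMAT9999999 Hypothetical entry",
          "ACGU",
          "ACGU")
mirna <- parse_fasta(demo)
print(mirna)
```

For the real file, the same function could presumably be fed from the connection used in the quoted code, e.g. parse_fasta(readLines(mat)), provided the server is still reachable at that address.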

miRNA Biophysics Biostrings biomaRt • 1.6k views

15.3 years ago • mauede@alice.it ▴ 870

score 0 · Answer 1 · 2009-06-28

Sure. I have to do that. I am just struggling to get all the pieces together. To me most of those names have no meaning, as I do not have any Biology background. Below I am pasting a weird error... maybe it is clear to you. I am fetching 10 consecutive attributes at a time to find the ones that I need, if any. So far I have successfully extracted the first 40 attributes from listAttributes(mart), but now...

> library(biomaRt)
> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
Checking attributes ... ok
Checking filters ... ok
> getBM(attributes=c("go_molecular_function_description",
+ "go_molecular_function_linkage_type",
+ "clone_based_ensembl_gene_name",
+ "clone_based_ensembl_transcript_name",
+ "clone_based_vega_gene_name",
+ "clone_based_vega_transcript_name",
+ "ccds",
+ "embl",
+ "entrezgene",
+ "ottt"),
+ filters='ensembl_transcript_id',value='ENST00000295228',mart=hmart)
Error in getBM(attributes = c("go_molecular_function_description", "go_molecular_function_linkage_type", :
  Query ERROR: caught BioMart::Exception::Usage: Too many attributes selected for External References

-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson@bbsrc.ac.uk]
Sent: Sun 28/06/2009 16:50
To: mauede@alice.it; Steve Lianoglou
Cc: Sean Davis; bioconductor List
Subject: RE: [BioC] R: R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

Hi Maura

Well, you can get gene:target info from miRBase, read in using CORNA or just read.table. You can get miRNA sequences also from miRBase using readFASTA. You can get ensembl gene sequences using biomaRt. You can read in miRecords data using RODBC. You can then link this all together using merge(), though I appreciate some work needs to be done on the list provided by readFASTA. Other than actually doing the work for you, I'm not sure what else we can do....
:) Mick
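On the "Too many attributes selected for External References" error above: my understanding is that BioMart caps how many attributes from the External References section one query may request. I believe the cap is three, but treat that number as an assumption. One possible workaround, sketched here, is to split the attribute list into small groups, query each group together with a key attribute, and merge the results on that key. The helper names chunk_attributes and get_in_chunks are hypothetical, and the attribute names are simply the ones from the post, which may differ in current Ensembl releases.

```r
# Split a vector of attribute names into groups of at most `size`.
chunk_attributes <- function(attrs, size = 3) {
  split(attrs, ceiling(seq_along(attrs) / size))
}

# Hypothetical helper: one getBM() call per chunk, merged on `key`.
# Needs the biomaRt package and network access, so it is defined but not
# called in this sketch. Many-to-many attributes can multiply rows on merge.
get_in_chunks <- function(attrs, key, filters, values, mart, size = 3) {
  parts <- lapply(chunk_attributes(attrs, size), function(a) {
    biomaRt::getBM(attributes = unique(c(key, a)), filters = filters,
                   values = values, mart = mart)
  })
  Reduce(function(x, y) merge(x, y, by = key, all = TRUE), parts)
}

# The external-reference attributes from the failing query above
attrs <- c("ccds", "embl", "entrezgene", "ottt")
chunks <- chunk_attributes(attrs, size = 3)
print(chunks)
```

With a live mart object, a call might look like get_in_chunks(attrs, key = "ensembl_transcript_id", filters = "ensembl_transcript_id", values = "ENST00000295228", mart = hmart).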

score 0 · Answer 2 · 2009-06-29

Since "mature.fa" and "maturestar.fa" contain the EXPERIMENTALLY VALIDATED miRNAs (is that true?), please assume I have read "mature.fa" into a list. I have to retain only the human miRNAs, so I have erased all the list elements whose description does not start with "hsa". Am I mistaken?

In our present emergency situation I have to prepare a text file containing blocks of data as described below. Each block contains a human VALIDATED miRNA identifier and sequence (example: "hsa-miR-20a" "UAAAGUGCUUAUAGUGCAGGUAG"), followed by the identifier and 3'UTR sequence of ALL the genes that are targeted by that miRNA. I have no idea what to pick as the target gene identifier, but I have to use the "hsa..." identifier for the human miRNAs. Here is what my output file should look like:

VALIDATED miRNA[1] identifier    miRNA[1] sequence    #BLOCK_1 start
target-gene[1,1]    3'UTR sequence
target-gene[1,2]    3'UTR sequence
...............................................
target-gene[1,n]    3'UTR sequence    #BLOCK_1 end
VALIDATED miRNA[2] identifier    miRNA[2] sequence    #BLOCK_2 start
target-gene[2,1]    3'UTR sequence
target-gene[2,2]    3'UTR sequence
...............................................
target-gene[2,m]    3'UTR sequence    #BLOCK_2 end
.....................................................................
.....................................................................
VALIDATED miRNA[k] identifier    miRNA[k] sequence    #BLOCK_k start
target-gene[k,1]    3'UTR sequence
target-gene[k,2]    3'UTR sequence
...............................................
target-gene[k,j]    3'UTR sequence    #BLOCK_k end

I understand I can get the gene data and 3'UTR sequences from Ensembl through BioMart. My problem is: given the VALIDATED miRNA description from "mature.fa", for instance "hsa-miR-20a MIMAT0000075 Homo sapiens miR-20a", which attributes shall I use to get the identifier and the corresponding 3'UTR sequence of ALL the genes that are targets of that miRNA?

Someone has already told me there is no BioMart attribute returning the identifier "hsa-miR-20a". Is there a BioMart attribute returning "MIMAT0000075" or "miR-20a"? In short, I am looking for the attributes that allow me to relate the miRNA data from "mature.fa" to the gene data from Ensembl. The reason I mentioned the VALIDATED file from miRecords is that this Excel file seems to contain identifiers that correspond to the Ensembl data returned by the attribute "hgnc_symbol"... if I am not mistaken.

Sorry, I cannot answer your question "which attributes do you need..." because I do not know which attributes allow me to match the miRNA information from "mature.fa" with the gene information from Ensembl. I am proceeding by trial and error and bothering the Bioconductor people!

Kind regards,
Maura

-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson@bbsrc.ac.uk]
Sent: Sun 28/06/2009 23:55
To: mauede@alice.it; Steve Lianoglou
Cc: Sean Davis; bioconductor List
Subject: RE: [BioC] R: R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

Yes, but what are you trying to do? Biomart has a very complex structure, I admit that; but why do you need/want all those attributes? What are the attributes you need?

This works:

library(biomaRt)
hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
getBM(attributes=c("go_molecular_function_description",
                   "go_molecular_function_linkage_type",
                   "ensembl_gene_id",
                   "ensembl_transcript_id"),
      filters='ensembl_transcript_id',value='ENST00000295228',mart=hmart)

It gets the GO molecular function data for ensembl human transcript ENST00000295228. If that's what I want to do, then the code is right; if it's not, then the code is wrong. How does the query you specify below relate to your question on microRNAs?
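The linking step asked about here can be sketched with toy tables and merge(), as suggested earlier in the thread. This is only an illustration under stated assumptions: the table of validated pairs is assumed to carry the miRNA identifier and a gene identifier (for example a gene symbol), and the 3'UTR table is assumed to be keyed by that same gene identifier. All values other than the hsa-miR-20a line are made up, and format_blocks is a hypothetical helper.

```r
# miRNA identifiers and sequences, e.g. as parsed from mature.fa
mirna <- data.frame(
  id       = c("hsa-miR-20a", "mmu-miR-1"),
  sequence = c("UAAAGUGCUUAUAGUGCAGGUAG", "ACGUACGU"),
  stringsAsFactors = FALSE
)
# Validated (miRNA, target gene) pairs; gene names are placeholders
validated <- data.frame(
  id   = c("hsa-miR-20a", "hsa-miR-20a"),
  gene = c("GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)
# 3'UTR sequences keyed by the same gene identifier. With biomaRt these
# could presumably come from something like
#   getSequence(id = genes, type = "hgnc_symbol", seqType = "3utr", mart = hmart)
utr <- data.frame(
  gene = c("GENE_A", "GENE_B"),
  utr3 = c("AAAA", "CCCC"),
  stringsAsFactors = FALSE
)

# Keep human miRNAs only, then join miRNA -> target gene -> 3'UTR
human  <- mirna[startsWith(mirna$id, "hsa-"), ]
linked <- merge(merge(human, validated, by = "id"), utr, by = "gene")

# One block per miRNA: a header line, then one line per target gene
format_blocks <- function(linked) {
  out <- character(0)
  for (block in split(linked, linked$id)) {
    out <- c(out,
             paste(block$id[1], block$sequence[1]),
             paste(block$gene, block$utr3))
  }
  out
}

blocks <- format_blocks(linked)
print(blocks)
# writeLines(blocks, "mirna_targets.txt")  # uncomment to write the file
```

Whichever column ends up being the shared key, the idea is the same: the validated-pairs table is what connects the miRNA names to the gene identifiers that Ensembl understands.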

score 0 · Answer 3 · 2009-06-29

I have preprocessed the FASTA miRNA files. I'd like to find an equivalent way to download and read in the file "http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl/arch.v5.txt.homo_sapiens.zip" without leaving R. Maybe I should download it first using a system call, then use R's unzip, and finally read.table? I doubt that read.table will work, because the file is not a matrix (its rows do not all have the same number of fields). Thank you in advance for your help.

Maura
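A system call should not be needed: base R's download.file() and unzip() can do both steps. The sketch below is untested against the actual archive. fetch_zip and read_ragged are hypothetical helper names, and the assumption that the file is tab-delimited with "#" comment lines is mine, not something confirmed in this thread. The demo runs on a small made-up file so that no network access is required.

```r
# Hypothetical helper: download a zip archive and extract it.
# Returns the paths of the extracted files. Not called here (needs network).
fetch_zip <- function(zip_url, exdir = tempdir()) {
  zip_path <- tempfile(fileext = ".zip")
  download.file(zip_url, zip_path, mode = "wb")  # "wb" keeps binary data intact
  unzip(zip_path, exdir = exdir)
}

# Read a tab-delimited file that has comment lines and uneven row lengths.
# fill = TRUE pads short rows instead of stopping with an error.
read_ragged <- function(path) {
  read.table(path, sep = "\t", header = FALSE, fill = TRUE,
             comment.char = "#", quote = "", stringsAsFactors = FALSE)
}

# Made-up demo file: one comment line, then rows of different lengths
demo_path <- tempfile(fileext = ".txt")
writeLines(c("## made-up comment line",
             "a\tb\tc",
             "d\te"), demo_path)
tab <- read_ragged(demo_path)
print(tab)
```

If the row lengths vary a lot, count.fields(path, sep = "\t") shows how many fields each line has, and readLines() followed by strsplit() is a fallback when read.table() is too rigid.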