R: R: BioMart error occurred again
@mauedealiceit-3511
Last seen 10.2 years ago
I read that message and asked for some guidelines on querying biomaRt in batch mode. The PDF file available from the biomaRt on-line pages shows a number of useful ways to extract data, but it does not mention any batch interrogation mode. I thought R CMD BATCH would be the way to do that. If so, it will take a while.

Basically I am trying to extract the 3' UTR sequence for each target gene transcript listed in the data set hsTargets. Since I have to save to a file the miRNA identifier and the miRNA sequence, followed by all its target gene transcripts with their 3' UTR sequences, my R script loops over each miRNA identifier, reads all its target gene transcript identifiers from hsTargets, and submits that ENST list to biomaRt to get the corresponding 3' UTR sequences:

    ## -------------------- GET 3UTR SEQUENCES FOR TARGET GENE TRANSCRIPTS
    gene_seq <- getSequence(id=tmp[,"target"], type="ensembl_transcript_id",
                            seqType="3utr", mart=hmart)

In addition, to identify the target transcripts in the output file, I also ask biomaRt for some other target identifiers, using the ENST identifier as the filter:

    gene_map <- getBM(attributes=c("hgnc_symbol","ensembl_gene_id","refseq_dna","ensembl_transcript_id"),
                      filters = "ensembl_transcript_id",
                      values=gene_seq[j,"ensembl_transcript_id"],
                      mart=hmart)

The typical output file looks like the example pasted at the bottom. My question is: how can I rewrite my R script so as to accomplish my task in batch mode? I hope I won't have to get all the 3' UTR sequences for all the target gene transcripts listed in hsTargets together.

Thank you,
Maura

>hsa-miR-7
UGGAAGACUAGUGAUUUUGUUGU UGGAAGACUAGUGAUUUUGUUGU
>GPRC5A|ENSG00000013588|ENST00000014914
CTCTGTCCTGAA..........................................................
......................................................................
......................................................................
>PSMA4|ENSG00000041357|ENST00000044462
AATCAGAGATTTTATTACTCATTTGGGGCACCATTTCAGTGTAAAAGCAGTCCTACTCTTCCACACTAGGAAGGCTTTAC
TTTTTTTAACTGGTGCAGTGGGAAAATA..........................................
......................................................................
......................................................................
>COPZ2|ENSG00000005243|ENST00000006101
AGGCTGTGGATTCAAGGCTCCCTGCCCCCCAGATCATTTCCCCAA.........................
......................................................................
......................................................................
>PIGB|ENSG00000069943|ENST00000164305
ACTTTCCTAGATAAATTAACATT...............................................
......................................................................
......................................................................
>ZNF275|ENSG00000063587|ENST00000095634
AAACGCCCTGTGGTCCCGCGGGACAGGGACGGAGTCCCCAGAGGGGATGGCAGAGTCAAAGGAGATGAACAGTTTT
GTAGCGCTTATATATTTTGT..................................................
......................................................................
......................................................................
miRNA biomaRt • 1.0k views
@steffenstatberkeleyedu-2907
Last seen 10.2 years ago
Hi Maura,

With "query in batch" I meant querying multiple IDs at once, not one at a time. You should be able to convert your script from querying every ID one by one to a single query for everything in batch, and then combine the results in R. For example:

1) Make a vector with all the target transcript IDs that are in your miRNA set and retrieve the 3' UTRs for all of them at once:

    library(biomaRt)
    hmart = useMart("ensembl", dataset="hsapiens_gene_ensembl")
    targets = c("ENST00000014914","ENST00000044462","ENST00000006101","ENST00000164305")
    targets3UTR = getSequence(id=targets, type="ensembl_transcript_id",
                              seqType="3utr", mart=hmart)

2) In a second query, retrieve the gene symbols and Ensembl gene IDs for this set:

    idmap = getBM(attributes=c("hgnc_symbol","ensembl_gene_id","refseq_dna","ensembl_transcript_id"),
                  filters = "ensembl_transcript_id", values=targets, mart=hmart)

Then, in a next step, you combine the information from targets3UTR and idmap in R. So all you need is two queries to biomaRt, followed by a loop over the results in R to combine the data.

Let me know if this solves your problem.

Cheers,
Steffen
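The combining step described above could be sketched roughly as follows. This is only an illustration, not the script from the thread: it uses small mock data frames in place of the live biomaRt results so it runs without a network connection, and it assumes that getSequence() returns a column named after seqType ("3utr") plus the ID column, and that getBM() returns one column per requested attribute. The `mirTargets` table (with assumed columns `name` and `target`), the `mirSeq` vector, the truncated sequences, and the helper `format_mirna` are all hypothetical stand-ins. The `refseq_dna` attribute is left out here because it can yield several rows per transcript; duplicates are dropped to one row per transcript instead.

```r
# Mock stand-in for the getSequence() result (sequences truncated for brevity).
targets3UTR <- data.frame(
  `3utr` = c("CTCTGTCCTGAA", "AATCAGAGATTT", "AGGCTGTGGATT"),
  ensembl_transcript_id = c("ENST00000014914", "ENST00000044462", "ENST00000006101"),
  check.names = FALSE, stringsAsFactors = FALSE
)
# Mock stand-in for the getBM() result.
idmap <- data.frame(
  hgnc_symbol = c("GPRC5A", "PSMA4", "COPZ2"),
  ensembl_gene_id = c("ENSG00000013588", "ENSG00000041357", "ENSG00000005243"),
  ensembl_transcript_id = c("ENST00000014914", "ENST00000044462", "ENST00000006101"),
  stringsAsFactors = FALSE
)
# Hypothetical miRNA-to-target table and miRNA sequences.
mirTargets <- data.frame(
  name = rep("hsa-miR-7", 3),
  target = c("ENST00000014914", "ENST00000044462", "ENST00000006101"),
  stringsAsFactors = FALSE
)
mirSeq <- c("hsa-miR-7" = "UGGAAGACUAGUGAUUUUGUUGU")

# Give the sequence column a syntactic name, then join the two results
# on the transcript ID and keep one row per transcript.
names(targets3UTR)[names(targets3UTR) == "3utr"] <- "utr3"
annotated <- merge(idmap, targets3UTR, by = "ensembl_transcript_id")
annotated <- annotated[!duplicated(annotated$ensembl_transcript_id), ]

# Build the FASTA-like block for one miRNA, keeping the target order
# from mirTargets and skipping targets with no annotation or sequence.
format_mirna <- function(mir) {
  ids <- mirTargets$target[mirTargets$name == mir]
  hits <- annotated[match(ids, annotated$ensembl_transcript_id), ]
  hits <- hits[!is.na(hits$ensembl_transcript_id), ]
  headers <- paste0(">", hits$hgnc_symbol, "|", hits$ensembl_gene_id,
                    "|", hits$ensembl_transcript_id)
  # Interleave header and sequence lines.
  c(paste0(">", mir), mirSeq[[mir]], as.vector(rbind(headers, hits$utr3)))
}

out <- unlist(lapply(unique(mirTargets$name), format_mirna))
outfile <- tempfile(fileext = ".txt")
writeLines(out, outfile)
```

With real data, `targets3UTR` and `idmap` would come from the two batch queries, using `unique(hsTargets$target)` (or a subset of it) as the ID vector, so the loop over miRNAs only touches local data frames and never calls biomaRt.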