Re: Why am I finding a mismatch between refseq_dna and ensembl_transcript_id?
@mauedealiceit-3511
Actually, I extracted the same information the old way, that is, with a loop that queried one refseq_dna at a time. I know this is not what one expects to do in a high-level language like R. However, I could see that some ENSTs correspond to two different HGNC symbols. Moreover, the 3'UTR sequence is not available for all the ENSTs I have.

Thank you for your answer.
Regards,
Maura

-----Original Message-----
From: Sean Davis [mailto:seandavi@gmail.com]
Sent: Wed 29/07/2009 7.46
To: mauede@alice.it
Cc: Bioconductor List
Subject: Re: [BioC] Why am I finding a mismatch between refseq_dna and ensembl_transcript_id ?

On Wed, Jul 29, 2009 at 12:01 AM, <mauede@alice.it> wrote:

> I downloaded the following file from miRDB:
> http://mirdb.org/miRDB/download/MirTarget2_v3.0_prediction_result.txt.gz
>
> I have checked that the miRDB Gene_Bank_Accession_Number (for human it is
> something like NM_xxxxx) corresponds to the BioMart "refseq_dna" attribute.
>
> I have a vector containing 253 Gene_Bank_Accession_Numbers:
>
> > length(tmp_miRNA_GB)
> [1] 253
> > tmp_miRNA_GB[1:5]
> [1] "NM_203390" "NM_024639" "NM_001017989" "NM_203331" "NM_001879"
>
> I use this vector as the input filter to getBM to obtain the respective
> ensembl_transcript_id.
> Surprisingly, only 246 ensembl_transcript_ids are found:
>
> > gene.map <- getBM(attributes =
> c("hgnc_symbol", "ensembl_gene_id", "refseq_dna", "ensembl_transcript_id"),
> filters = "refseq_dna", values = tmp_miRNA_GB, mart = hmart)
> > dim(gene.map)
> [1] 246 4
>
> I thought there would be a 1-1 correspondence between the two attributes
> "refseq_dna" and "ensembl_transcript_id".
> Am I mistaken?

Hi, Maura.

Yes, unfortunately, there is not a 1-1 correspondence. Ensembl and NCBI (the curator of RefSeq) are independent organizations, each with different build policies and annotation processes for transcripts. So, in general in this field (genomics/bioinformatics), there is RARELY a 1-1 correspondence between any two entities. I would suggest that 246/253 is actually quite a good result--I might have expected somewhat fewer a priori.

Sean
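A note on reading the numbers above: the 246 rows from `dim(gene.map)` do not by themselves say how many of the 253 accessions matched, because accessions with no Ensembl match drop out while accessions matching several transcripts add rows. A minimal base-R sketch of how the two effects could be separated, assuming `gene.map` has the four columns requested in the `getBM` call. The toy data frame and the `GENE_*`, `ENSG_*` and `ENST_*` values are hypothetical placeholders, not real query results:

```r
# Toy stand-ins for the real objects (hypothetical values, illustration only).
tmp_miRNA_GB <- c("NM_203390", "NM_024639", "NM_001017989", "NM_203331")
gene.map <- data.frame(
  hgnc_symbol           = c("GENE_A", "GENE_B", "GENE_B"),
  ensembl_gene_id       = c("ENSG_A", "ENSG_B", "ENSG_B"),
  refseq_dna            = c("NM_203390", "NM_024639", "NM_024639"),
  ensembl_transcript_id = c("ENST_A1", "ENST_B1", "ENST_B2"),
  stringsAsFactors = FALSE
)

# Input accessions that returned no row at all.
unmapped <- setdiff(tmp_miRNA_GB, gene.map$refseq_dna)

# Number of distinct Ensembl transcripts per RefSeq accession.
pairs <- unique(gene.map[, c("refseq_dna", "ensembl_transcript_id")])
n_enst <- table(pairs$refseq_dna)
one_to_many <- names(n_enst)[as.vector(n_enst) > 1]

print(unmapped)     # accessions missing from the result
print(one_to_many)  # accessions mapped to more than one transcript
```

With the real `tmp_miRNA_GB` and `gene.map` in place of the toy objects, `length(unmapped)` gives the number of accessions Ensembl could not match, which may differ from 253 minus the row count.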
Annotation biomaRt
@sean-davis-490
On Wed, Jul 29, 2009 at 3:35 AM, <mauede@alice.it> wrote:

> Actually, I extracted the same information the old way, that is, with a
> loop that queried one refseq_dna at a time.
> I know this is not what one expects to do in a high-level language like R.
> However, I could see that some ENSTs correspond to two different
> HGNC symbols. Moreover, the 3'UTR sequence is not available for all the
> ENSTs I have.

Not all transcripts have a 3'UTR. If you want to check your code, you can always go to the Ensembl browser to see what it shows for those transcripts for which the 3'UTR is missing.

Sean

> Thank you for your answer.
> Regards,
> Maura
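Before checking individual transcripts in the Ensembl browser, the two observations from the question (ENSTs with two HGNC symbols, and ENSTs without a 3'UTR) can be narrowed down to a short list in R. This is only a sketch under stated assumptions: the toy tables, the `utr3` column name and all `GENE_*` and `ENST_*` values are hypothetical, and the placeholder string used to recognise a missing sequence is an assumption about how the query reports it, so it should be adjusted to whatever the real result contains.

```r
# Hypothetical toy tables; only the attribute names come from the thread.
gene.map <- data.frame(
  ensembl_transcript_id = c("ENST_A1", "ENST_A1", "ENST_B1"),
  hgnc_symbol           = c("GENE_A", "GENE_A2", "GENE_B"),
  stringsAsFactors = FALSE
)
utr <- data.frame(
  ensembl_transcript_id = c("ENST_A1", "ENST_B1"),
  utr3                  = c("ACGTACGT", "Sequence unavailable"),
  stringsAsFactors = FALSE
)

# Transcripts annotated with more than one distinct, non-empty HGNC symbol.
sym <- unique(gene.map[gene.map$hgnc_symbol != "",
                       c("ensembl_transcript_id", "hgnc_symbol")])
n_sym <- tapply(sym$hgnc_symbol, sym$ensembl_transcript_id, length)
multi_symbol <- names(n_sym)[as.vector(n_sym) > 1]

# Transcripts whose 3'UTR is missing: NA, empty, or the assumed placeholder text.
missing_marks <- c("", "Sequence unavailable")
no_utr <- utr$ensembl_transcript_id[is.na(utr$utr3) | utr$utr3 %in% missing_marks]

print(multi_symbol)  # transcripts to inspect for conflicting symbols
print(no_utr)        # transcripts to look up in the Ensembl browser
```

The resulting `no_utr` vector is the list of transcripts worth looking up by hand, as suggested above.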