mauede@alice.it
Actually, I extracted the same information the old way, that is, using a
loop that queried one refseq_dna at a time.
I know this is not the expected approach in a high-level language like R. However,
I could see that some ENSTs correspond to two different
HGNC symbols. Moreover, the 3utr sequence is not available for all
the ENSTs I have.
Thank you for your answer.
Regards,
Maura
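A minimal sketch of how transcripts with more than one HGNC symbol could be listed in base R. The data frame here is a made-up stand-in for a getBM result: the transcript IDs and symbols are invented placeholders, not real annotations.

```r
# Hypothetical rows standing in for a getBM() result; IDs and symbols are invented.
gene.map <- data.frame(
  ensembl_transcript_id = c("ENST_A", "ENST_A", "ENST_B", "ENST_C"),
  hgnc_symbol           = c("SYM1", "SYM2", "SYM3", "SYM4"),
  stringsAsFactors = FALSE
)

# Count the distinct HGNC symbols attached to each transcript ID
n_symbols <- tapply(gene.map$hgnc_symbol, gene.map$ensembl_transcript_id,
                    function(x) length(unique(x)))

# Transcript IDs associated with more than one symbol
ambiguous <- names(n_symbols)[n_symbols > 1]
print(ambiguous)
```

With a real result, the same two steps would apply to the data frame returned by getBM, provided it contains both columns.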
-----Original Message-----
From: Sean Davis [mailto:seandavi@gmail.com]
Sent: Wed 29/07/2009 7.46
To: mauede@alice.it
Cc: Bioconductor List
Subject: Re: [BioC] Why am I finding a mismatch between refseq_dna and ensembl_transcript_id ?
On Wed, Jul 29, 2009 at 12:01 AM, <mauede@alice.it> wrote:
> I downloaded the following file from miRDB
> http://mirdb.org/miRDB/download/MirTarget2_v3.0_prediction_result.txt.gz
>
> I have checked that the miRDB Gene_Bank_Accession_Number (for human it is
> something like NM_xxxxx) corresponds to the BioMart "refseq_dna".
>
> I have a vector containing 253 Gene_Bank_Accession_Numbers:
> > length(tmp_miRNA_GB)
> [1] 253
> > tmp_miRNA_GB[1:5]
> [1] "NM_203390" "NM_024639" "NM_001017989" "NM_203331" "NM_001879"
>
> I use this vector as the input filter to getBM to obtain the respective
> ensembl_transcript_id.
> Surprisingly, only 246 ensembl_transcript_ids are found:
>
> > gene.map <- getBM(attributes = c("hgnc_symbol", "ensembl_gene_id", "refseq_dna", "ensembl_transcript_id"),
> +                   filters = "refseq_dna", values = tmp_miRNA_GB, mart = hmart)
>
> > dim(gene.map)
> [1] 246 4
>
> I thought there would be a 1-1 correspondence between the two attributes:
> "refseq_dna" and "ensembl_transcript_id".
> Am I mistaken?
>
Hi, Maura.
Yes, unfortunately, there is not a 1-1 correspondence. Ensembl and NCBI
(the curator of RefSeq) are independent organizations, each with different
build policies and annotation processes for transcripts. So, in general in
this field (genomics/bioinformatics), there is RARELY a 1-1 correspondence
between any two entities. I would suggest that 246/253 is actually quite a
good result--I might have expected a bit less a priori.
Sean
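A minimal sketch of how the mismatch could be inspected in base R: which accessions returned nothing, and which returned more than one Ensembl transcript. The `gene.map` data frame below is a small made-up stand-in for the getBM result (the ENST values are invented placeholders, not real mappings), and `tmp_miRNA_GB` is cut down to the five accessions printed in the question.

```r
# The five accessions shown in the question
tmp_miRNA_GB <- c("NM_203390", "NM_024639", "NM_001017989", "NM_203331", "NM_001879")

# Hypothetical stand-in for the getBM() result; the ENST IDs are invented placeholders.
gene.map <- data.frame(
  refseq_dna            = c("NM_203390", "NM_024639", "NM_024639", "NM_203331"),
  ensembl_transcript_id = c("ENST_A", "ENST_B", "ENST_C", "ENST_D"),
  stringsAsFactors = FALSE
)

# Accessions for which no row came back at all
unmapped <- setdiff(tmp_miRNA_GB, gene.map$refseq_dna)

# Number of Ensembl transcripts returned per RefSeq accession
n_per_refseq <- table(gene.map$refseq_dna)
multi_mapped <- names(n_per_refseq)[n_per_refseq > 1]

print(unmapped)
print(multi_mapped)
```

Note that the row count of the result on its own does not say how many accessions were matched, since one accession may contribute several rows while another contributes none.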