Hi Marco,
Ensembl maps everything to the transcript level, so when there are
multiple transcripts for one gene, a query will return multiple hits
for that gene.
To see this better you could add the "ensembl_transcript_id" to your
query:
probe.list <- getBM(attributes=c("ensembl_gene_id","ensembl_transcript_id",
                                 "affy_hg_u133_plus_2"),
                    filters="affy_hg_u133_plus_2", values=probes, mart=mart)
You'll see that each repeated row now has a different transcript, and that
on this level there is no redundancy.
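
If you only need the gene level, a minimal sketch of collapsing the repeated rows could look like the following. It uses a small toy data frame in place of a real getBM() result, and the transcript IDs in it are made up for illustration only.

```r
# Toy stand-in for a getBM() result; the transcript IDs are hypothetical.
probe.list <- data.frame(
  ensembl_gene_id       = c("ENSG00000184895", "ENSG00000184895",
                            "ENSG00000129824", "ENSG00000129824"),
  ensembl_transcript_id = c("ENST_A1", "ENST_A2", "ENST_B1", "ENST_B2"),
  affy_hg_u133_plus_2   = c("207893_at", "207893_at",
                            "201909_at", "201909_at"),
  stringsAsFactors = FALSE
)

# At the transcript level every row is distinct.
any(duplicated(probe.list))

# Dropping the transcript column reproduces the repeated gene/probe rows ...
gene.level <- probe.list[, c("ensembl_gene_id", "affy_hg_u133_plus_2")]
sum(duplicated(gene.level))

# ... and unique() collapses them to one row per gene/probe pair.
gene.probe <- unique(gene.level)
nrow(gene.probe)
```

The same unique() call should work on the real query result, assuming it has the same two columns.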
The mapping to the transcript level is a choice of the Ensembl team and
we cannot change this. It makes sense for other annotation information
such as protein domains: some alternatively spliced transcripts might
have a certain domain and other transcripts of the same gene might not
have this domain. Or if you query for 3'UTRs, mapping to the
transcript level lets you retrieve all the different UTRs associated with a
gene. Different transcripts of the same gene might even have different
functions, and the current strategy would allow transcript-specific GO
annotations...
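
Since one probe can come back with several rows, and the rows do not follow the input order (see the earlier message quoted below), here is a rough sketch of bringing the result back into the order of the input vector. The data are again a toy stand-in for a getBM() result, and the names first.gene, genes.by.probe and ordered are just illustrative.

```r
# Toy input vector and a toy result in a different order.
probes <- c("201909_at", "207893_at", "AFFX-BioB-5_at")
probe.list <- data.frame(
  ensembl_gene_id     = c("ENSG00000184895", "ENSG00000184895",
                          "ENSG00000129824"),
  affy_hg_u133_plus_2 = c("207893_at", "207893_at", "201909_at"),
  stringsAsFactors = FALSE
)

# match() gives the first matching row per input probe (NA if none),
# so this keeps only one gene per probe, in input order.
idx <- match(probes, probe.list$affy_hg_u133_plus_2)
first.gene <- probe.list$ensembl_gene_id[idx]

# To keep all hits, group the gene IDs by probe and drop repeats ...
genes.by.probe <- split(probe.list$ensembl_gene_id,
                        probe.list$affy_hg_u133_plus_2)
genes.by.probe <- lapply(genes.by.probe, unique)

# ... then index by the input vector; probes without a match become NULL.
ordered <- genes.by.probe[probes]
names(ordered) <- probes
```

The match() version is shorter but silently drops additional hits, so the split() version is probably the safer choice when a probe set can map to more than one gene.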
Best regards,
Steffen
marco zucchelli wrote:
> Hi Steffen,
>
> one more question: in the example I reported before, it seems that some
> probes are reported twice,
> i.e. 207893_at is listed 2 times, matched to the same gene ID.
> In total, the "probes" vector contains the probes from hgu133plus2 (54675)
> but the query returns 66565 rows.
>
> I do not really understand what this means.
>
> Regards
>
> Marco
>
> probe.list <- getBM(attributes=c("ensembl_gene_id","affy_hg_u133_plus_2"),
>     filters="affy_hg_u133_plus_2", values=probes, mart=mart)
>
> head(probe.list)
>
> ensembl_gene_id affy_hg_u133_plus_2
> 1 ENSG00000184895 207893_at
> 2 ENSG00000184895 207893_at
> 3 ENSG00000129824 201909_at
> 4 ENSG00000129824 201909_at
> 5 ENSG00000067646 207247_s_at
> 6 ENSG00000067646 207246_at
>
>
>
> On 4/3/07, Steffen Durinck <durincks at mail.nih.gov> wrote:
>
> Hi Marco,
>
> It matches the probes to the transcripts and then maps those transcripts
> to the genes, even if you don't include the transcript id in the query.
> To see this you could set attributes =
> c("ensembl_gene_id","ensembl_transcript_id","affy_hg_u133_plus_2") in
> your query. Also, if Ensembl didn't find a match for an affy probe,
> it won't be included in the output, and if they find multiple matches,
> all of them will be returned.
>
> For the second part of your question: no, the ordering is random, so
> you'll have to reorder the output, e.g. with the match function, or loop
> over it.
>
> Cheers,
> Steffen
>
> marco zucchelli wrote:
> > Steffen,
> >
> > Anyway, does this procedure match the affy ID to the specific
> > transcript(s) that the probeset is targeting, or does it match it
> > to a gene and then get all the available transcripts for the gene?
> >
> > Moreover, it seems that the values returned by getBM are not ordered
> > like the input values.
> > In fact, if I use:
> >
> > head(probes)
> > [1] "AFFX-BioB-5_at" "AFFX-BioB-M_at" "AFFX-BioB-3_at"
> > "AFFX-BioC-5_at" "AFFX-BioC-3_at" "AFFX-BioDn-5_at"
> >
> > probe.list <- getBM(attributes=c("ensembl_gene_id","affy_hg_u133_plus_2"),
> >     filters="affy_hg_u133_plus_2", values=probes, mart=mart)
> >
> > head(probe.list)
> >
> > ensembl_gene_id affy_hg_u133_plus_2
> > 1 ENSG00000184895 207893_at
> > 2 ENSG00000184895 207893_at
> > 3 ENSG00000129824 201909_at
> > 4 ENSG00000129824 201909_at
> > 5 ENSG00000067646 207247_s_at
> > 6 ENSG00000067646 207246_at
> >
> > Is there any rule by which getBM orders the probes?
> > Or am I doing something wrong?
> >
> >
> > Marco
> >
> >
> >
> > On 3/30/07, Steffen Durinck <durincks at mail.nih.gov> wrote:
> >
> > Hi Marco,
> >
> > You can do this with the biomaRt package (use the devel version,
> > 1.9.21 or later). Here's how:
> >
> > library(biomaRt)
> > mart=useMart("ensembl", dataset="hsapiens_gene_ensembl")
> > getBM(attributes=c("ensembl_gene_id","ensembl_transcript_id",
> >                    "synonymous_snp_count","non_synonymous_snp_count"),
> >       filters="affy_hg_u133_plus_2",
> >       values=c("201746_at","231640_at"), mart=mart)
> >
> > it will give:
> >
> >   ensembl_gene_id ensembl_transcript_id synonymous_snp_count non_synonymous_snp_count
> > 1 ENSG00000141510       ENST00000269305                    5                       20
> > 2 ENSG00000133703       ENST00000256078                    1                        1
> > 3 ENSG00000133703       ENST00000311936                    1                        1
> >
> >
> > Unfortunately you won't be able to get the affy id in the output,
> > but you can use biomaRt to map the Ensembl ids in the output back
> > to the affy ids.
> >
> > Cheers,
> > Steffen
> >
> >
> > marco zucchelli wrote:
> > > Hi,
> > >
> > > I was wondering if there is an annotation package for Affy
> > > 133plus2 reporting the number of synonymous & non-synonymous
> > > changes for the genes on the array.
> > >
> > > If it does not exist, does anybody have a good suggestion about
> > > how to retrieve this information from databases?
> > >
> > >
> > > Marco
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at stat.math.ethz.ch
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives:
> > > http://news.gmane.org/gmane.science.biology.informatics.conductor
> > >
> >
> >
>
>
> --
> Steffen Durinck, Ph.D.
>
> Oncogenomics Section
> Pediatric Oncology Branch
> National Cancer Institute, National Institutes of Health
> URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
>
> Phone: 301-402-8103
> Address:
> Advanced Technology Center,
> 8717 Grovemont Circle
> Gaithersburg, MD 20877
>
>