Tim,
While annotating a list of probesets to exons, transcripts and genes, I
noticed that more probesets (e.g., 4448480) map to genes than to
transcripts, and the fewest map to exons. Is this expected? I suppose that
if one probe aligns to multiple exons in a gene, the exon mapping is
removed while the gene mapping is kept. Could you please elaborate? Thanks
so much for your help!
Best regards,
Julie
> library(xmapcore)
> xmap.connect("mouse")
> probeset.to.transcript("4448480", as.vector=FALSE)
NULL
> probeset.to.exon("4448480", as.vector=FALSE)
NULL
> probeset.to.gene("4448480", as.vector=FALSE)
RangedData with 1 row and 9 value columns across 1 space
        space               ranges |         IN1          stable_id    strand
  <character>            <iranges> | <character>        <character> <integer>
1          13 [92020005, 92901611] |     4448480 ENSMUSG00000021708        -1
         biotype      status
     <character> <character>
1 protein_coding       KNOWN
  description
  <character>
1 RAS protein-specific guanine nucleotide-releasing factor 2 Gene [Source:MGI (curated);Acc:MGI:109137]
  db_display_name      symbol
      <character> <character>
1   MGI (curated)     Rasgrf2
  symbol_description
  <character>
1 RAS protein-specific guanine nucleotide-releasing factor 2 Gene
> temp = transcript.to.probeset(gene.to.transcript(probeset.to.gene("4448480",
+     as.vector=TRUE), as.vector=TRUE), as.vector=FALSE)
> temp[temp$stable_id == "4448480",]
[1] IN1 stable_id
[3] array_name probe_count
[5] hit_score gene_score
[7] transcript_score exon_score
[9] est_gene_score est_transcript_score
[11] est_exon_score prediction_transcript_score
[13] prediction_exon_score protein_score
[15] domain_score
<0 rows> (or 0-length row.names)
sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mouseexonpmcdf_1.1 xmapcore_1.2.8 digest_0.4.2
[4] IRanges_1.6.11 RMySQL_0.7-5 DBI_0.2-5
loaded via a namespace (and not attached):
[1] tools_2.11.1
Hi Tim,
Thanks so much for such a quick response!
Here is a probeset ID that maps to a gene but not to any exon or transcript.
library(xmapcore)
xmap.connect("mouse")
probeset.to.transcript("4448480", as.vector=FALSE)
#NULL
probeset.to.exon("4448480", as.vector=FALSE)
#NULL
probeset.to.gene("4448480", as.vector=TRUE)
#[1] "ENSMUSG00000021708"
I looked up the detailed information of this probeset as follows.
probeset.details("4448480")
  stable_id array_name probe_count hit_score gene_score transcript_score
1   4448480   MoEx-1_0           4         1          1                0
  exon_score est_gene_score est_transcript_score est_exon_score
1          0              0                    0              0
  prediction_transcript_score prediction_exon_score protein_score
1                           0                     0             0
  domain_score
1            0
It looks like one or more probes in this probeset miss the transcript/exon
targets but align uniquely to a gene. Is it correct that this probeset maps
to an un-transcribed region of the gene?
Here is an example of a probeset that maps to both a gene and a transcript
but not to any exon.
probeset.details("4305509")
  stable_id array_name probe_count hit_score gene_score transcript_score
1   4305509   MoEx-1_0           4         1          1                2
  exon_score est_gene_score est_transcript_score est_exon_score
1          0              1                    2              0
  prediction_transcript_score prediction_exon_score protein_score
1                           1                     0             0
  domain_score
1            0
Is it correct that this probeset is aligned to the intron region of
the transcript?
Thanks so much for your help!
Best regards,
Julie
On 12/13/10 12:29 PM, "Tim Yates" <tyates@picr.man.ac.uk> wrote:
>
> Hi there!
>
> How are you doing the mapping from probeset to gene, exon, transcript, etc?
>
> Do you have an example where you believe something is wrong?
>
> Cheers :-)
>
> Tim
>
>
>
> ----- Reply message -----
> From: "Zhu, Lihua \(Julie\)" <julie.zhu@umassmed.edu>
> Date: Mon, Dec 13, 2010 17:20
> Subject: Xmapcore package
> To: "bioconductor@r-project.org" <bioconductor@r-project.org>
> Cc: "Tim Yates" <tyates@picr.man.ac.uk>
>
> --------------------------------------------------------
> This email is confidential and intended solely for
the...{{dropped:19}}
Hi Julie,
I just had a quick look at that probeset on X:Map
(http://xmap.picr.man.ac.uk). There's quite a bit of info there, including
the hit location of each individual probe in the probeset. What comes back
is that those probesets land in an intron.
There are help pages on the website, but if you search for the probeset (you
might need to set the species first), it will appear in the 'Selection
Details' window in the middle, below the browser. Clicking on the '[+]' by
the probeset name expands the annotation tree to reveal each probe, and if
you then expand the tree under each probe, it shows the places where it
matches the genome. To the right of this window, you'll see some green
arrows. If you click on these, the browser will jump to the appropriate
position...
Sorry, that's a lot easier to do than to write, I think!
...anyway, it seems that the probesets are annotated as 'intronic'. This
means that one or more of the probes don't hit an exon, as defined by
ENSEMBL...
In R, the function call:
> is.intronic(c('4448480','4305509'))
4448480 4305509
TRUE TRUE
confirms this. (If this is the first time you've run this command, it might
take a few seconds while it builds a local cache to (ultimately) speed
things up. The second time you call it, it should be a lot quicker.)
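As a rough cross-check that needs no database connection, the score columns
from the two probeset.details() outputs earlier in the thread can be
tabulated by hand. This is only a sketch: the data frame `scores` and the
variables `no_exon_hit` and `flagged` are made up for illustration, and
reading a score of 0 as "no hit of that type" is an assumption rather than
documented xmapcore behaviour. is.intronic() remains the proper check.

```r
# Score columns copied by hand from the two probeset.details() outputs
# quoted in this thread (hypothetical data frame, not an xmapcore object).
scores <- data.frame(
  stable_id        = c("4448480", "4305509"),
  gene_score       = c(1, 1),
  transcript_score = c(0, 2),
  exon_score       = c(0, 0),
  stringsAsFactors = FALSE
)

# Assumption: a score of 0 means no hit of that type was recorded.
# Flag probesets that hit a gene but have no recorded exon hit.
no_exon_hit <- scores$gene_score > 0 & scores$exon_score == 0
flagged <- scores$stable_id[no_exon_hit]
print(flagged)
```

Both probesets are flagged, which agrees with the is.intronic() result
above, although the two differ in transcript_score.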
Crispin
On 13/12/2010 17:45, "Zhu, Lihua (Julie)" <julie.zhu@umassmed.edu>
wrote: