Hi,
I have run into the following problem. I created a probe ID to Entrez ID
mapping for the Affy mouse array from the cognate annotation package,
mouse4302.db. Unfortunately, about 10000 genes do not have a corresponding
Entrez ID. Many of these are genes with known functions. If I cannot map an
Entrez ID to these, then I cannot retrieve GO annotations, and consequently
I cannot do a Gene Set Enrichment analysis using GOstats.
Does anyone have an updated annotation file?
Many thanks in advance,
Anjan
--
===================================
anjan purkayastha, phd.
research associate
fas center for systems biology,
harvard university
52 oxford street
cambridge ma 02138
phone-703.740.6939
===================================
On 11/02/2010 11:14 AM, ANJAN PURKAYASTHA wrote:
> Does anyone have an updated annotation file?
Hi Anjan,
What is your sessionInfo() (otherwise, how could we know what an 'updated'
annotation file is?), and how did you perform the mapping (short,
hopefully reproducible, code)?
Martin
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
Hi Martin,
Session Info:
R version 2.11.1 (2010-05-31)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] affy_1.26.1          GOstats_2.14.0       graph_1.28.0
 [4] Category_2.14.0      mouse4302.db_2.4.1   org.Mm.eg.db_2.4.1
 [7] RSQLite_0.9-2        DBI_0.2-5            AnnotationDbi_1.10.2
[10] Biobase_2.8.0

loaded via a namespace (and not attached):
 [1] affyio_1.16.0         annotate_1.26.1       genefilter_1.30.0
 [4] GO.db_2.4.1           GSEABase_1.10.0       preprocessCore_1.10.0
 [7] RBGL_1.26.0           splines_2.11.1        survival_2.35-8
[10] tools_2.11.1          XML_3.1-1             xtable_1.5-6

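Commands used to create the mapping:
library(mouse4302.db)
# probe set IDs are the row names of the RMA expression data frame
id <- rownames(allMtb.rma.data.frame)
map <- mouse4302ENTREZID
# look up the Entrez ID for each probe set; unmapped probe sets come back as NA
probe_entrezid <- unlist(mget(id, map))
p <- as.data.frame(probe_entrezid)
# p now has the probeID_entrezID mappings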
Thanks,
Anjan
On 11/02/2010 11:20 AM, ANJAN PURKAYASTHA wrote:
> Commands used to create the mapping:
> library(mouse4302.db)
> id <- rownames(allMtb.rma.data.frame)
> map <- mouse4302ENTREZID
> probe_entrezid <- unlist(mget(id, map))
> p <- as.data.frame(probe_entrezid)
> p now has the probeID_entrezID mappings
With R-2-11 I see
> mouse4302()
[...snip...]
mouse4302ENTREZID has 37316 mapped keys (of 45101 keys)
[...snip...]
Date for NCBI data: 2010-Mar1
The current version of R / Bioconductor is R-2-12, where there are 37413
mapped probes from NCBI data of 2010-Sep7. Using biomaRt I get
> library(biomaRt)
> mart = useMart("ensembl", "mmusculus_gene_ensembl")
> attrs = listAttributes(mart)
> attrs[grep("(Entrez|Affy mouse)", attrs[[2]]),]
               name      description
47       entrezgene    EntrezGene ID
95  affy_mouse430_2  Affy mouse430 2
96 affy_mouse430a_2 Affy mouse430a 2
> filts = listFilters(mart)
> filts[grep("(Entrez|Affy mouse)", filts[[2]]),]
                name                              description
52   with_entrezgene                    with EntrezGene ID(s)
84        entrezgene        EntrezGene ID(s) [e.g. 100287163]
121  affy_mouse430_2  Affy mouse430 2 ID(s) [e.g. 1426088_at]
122 affy_mouse430a_2 Affy mouse430a 2 ID(s) [e.g. 1426088_at]
> res = getBM(c("affy_mouse430_2","entrezgene"), "with_entrezgene", TRUE, mart)
> head(res)
  affy_mouse430_2 entrezgene
1                     338371
2                     238944
3                     208431
4      1430582_at     268281
5      1458594_at     268281
6    1455882_x_at     319922
> head(table(table(res[[1]])))
    1     2     3     4     5     6
24627  1746   374    96    62    34
which tells me there are 24627 uniquely mapping probes, and some more
that could be retrieved with some work. (I haven't checked my biomaRt
work very carefully here, so I could have made mistakes, and I don't know
biomaRt well enough to get the provenance of the probes I have
identified, unlike with mouse4302.db, where ?mouse4302ENTREZID is
helpful.) I could also remap the probes using chromosome coordinates from
the mouse4302 package and BSgenome / Biostrings, and then use org.Mm.eg.db
to map coordinates to genes. So I think the best you can do easily is the
~37,000 probes that are already mapped.
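The table(table(...)) idiom in the session above can be hard to read at first, so here is a minimal sketch of what it counts, using made-up probe IDs. The data frame `res_toy` and all of its values are hypothetical (this is not biomaRt output), and dropping the rows with a blank probe ID before counting is an assumption about how those rows should be treated.

```r
# Hypothetical stand-in for a getBM() result: one row per probe/gene pair.
res_toy <- data.frame(
  affy_mouse430_2 = c("", "", "p1_at", "p2_at", "p2_at",
                      "p3_at", "p3_at", "p3_at", "p4_at"),
  entrezgene      = c(101, 102, 201, 202, 203, 204, 205, 206, 207),
  stringsAsFactors = FALSE
)

# Assumption: a blank probe ID means the gene has no probe on this array,
# so those rows are dropped before counting.
probes <- res_toy$affy_mouse430_2[res_toy$affy_mouse430_2 != ""]

# Inner table(): number of rows (Entrez IDs) seen for each probe.
per_probe <- table(probes)

# Outer table(): number of probes that have 1, 2, 3, ... rows.
multiplicity <- table(per_probe)
print(multiplicity)

# Probes that appear exactly once, i.e. map to a single Entrez ID.
unique_probes <- names(per_probe)[per_probe == 1]
print(unique_probes)
```

The first entry of `multiplicity` is the count of probes with exactly one Entrez ID, which is the figure read off the real output above.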
Martin
On Tue, Nov 2, 2010 at 2:14 PM, ANJAN PURKAYASTHA
<anjan.purkayastha at gmail.com> wrote:
> Unfortunately about 10000 genes do not have corresponding EntrezID.
What do you mean by "10000 genes"?
The following shows that 7688 probesets do not have Entrez ID mappings
(using current packages).
> length(ls(mouse4302ENTREZID))
[1] 45101
> length(setdiff(ls(mouse4302ENTREZID), mappedkeys(mouse4302ENTREZID)))
[1] 7688
That's just a fact of life.
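Since the question turns on probe sets versus genes, a small base-R sketch may help separate the two counts. The vector `probe_entrezid_toy` and its IDs are made up for illustration; a real vector would come from mget() on mouse4302ENTREZID, as in the commands posted earlier in the thread.

```r
# Hypothetical probe set -> Entrez ID mapping; NA marks an unmapped probe set.
probe_entrezid_toy <- c(
  "a_at" = "11111",
  "b_at" = "11111",   # a second probe set for the same gene
  "c_at" = "22222",
  "d_at" = NA,
  "e_at" = NA
)

# Unmapped probe sets, analogous to the setdiff(ls(), mappedkeys()) count.
unmapped   <- names(probe_entrezid_toy)[is.na(probe_entrezid_toy)]
n_unmapped <- length(unmapped)

# Distinct genes represented by the mapped probe sets.
mapped  <- probe_entrezid_toy[!is.na(probe_entrezid_toy)]
n_genes <- length(unique(mapped))

cat("probe sets:", length(probe_entrezid_toy),
    "unmapped probe sets:", n_unmapped,
    "distinct genes:", n_genes, "\n")
```

Because several probe sets can target one gene, the number of unmapped probe sets is not the same thing as a number of unmapped genes.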
> Many of these are genes with known functions. If I cannot map a EntrezID to
> these then I cannot retrieve GO annotations and consequently I cannot do a
> Gene Set Enrichment analysis using GOstats.
This is not really correct. You can use whatever groupings and
mappings you like with GOstats. See the GOstatsForUnsupportedOrganisms
vignette for extensive details on dealing with a somewhat more difficult
situation. When you say the genes have "known functions", perhaps you can
use that knowledge to provide GO associations for the unmapped genes or,
if the functions you refer to do not have names in GO, you can create
your own functional grouping of genes.
> Does anyone have an updated annotation file?
Your sessionInfo shows that you are not using the current version of
R, but that is not the main concern. If you have gene:GO mappings and
gene sets that you prefer to those available through the annotation
packages, you can use those mappings and sets to drive the GOstats
analysis.
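To make the idea of driving the analysis from your own gene sets concrete, here is a minimal base-R sketch of a hypergeometric over-representation test on a custom grouping, the kind of test that hyperGTest in GOstats is built on. It deliberately does not use the GOstats API (the GOstatsForUnsupportedOrganisms document mentioned above covers that). The gene IDs, the set names, and the helper `over_rep_p` are all hypothetical.

```r
# Hypothetical universe of genes on the array and a custom functional grouping.
universe <- paste0("g", 1:20)
gene_sets <- list(
  my_pathway   = paste0("g", 1:5),
  other_family = paste0("g", 12:20)
)

# Hypothetical list of selected (e.g. differentially expressed) genes.
selected <- c("g1", "g2", "g3", "g4", "g10", "g11")

# P(overlap >= observed) under the hypergeometric distribution.
over_rep_p <- function(set, selected, universe) {
  set      <- intersect(set, universe)
  selected <- intersect(selected, universe)
  overlap  <- length(intersect(set, selected))
  phyper(overlap - 1,
         m = length(set),                     # genes in the set
         n = length(universe) - length(set),  # genes not in the set
         k = length(selected),                # genes selected
         lower.tail = FALSE)
}

p_values <- sapply(gene_sets, over_rep_p,
                   selected = selected, universe = universe)
print(p_values)
```

Any identifier works here as long as the universe, the gene sets, and the selected list use the same one, so probe sets without an Entrez ID can still take part if you supply their groupings yourself.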
My sessionInfo:
R version 2.12.0 Patched (2010-10-15 r53331)
Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods
[8] base
other attached packages:
[1] mouse4302.db_2.4.5 org.Mm.eg.db_2.4.6 RSQLite_0.9-2
[4] DBI_0.2-5 AnnotationDbi_1.11.9 Biobase_2.10.0
[7] weaver_1.15.0 codetools_0.2-2 digest_0.4.2
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor