Dear list,
I have been trying to apply the MGSA method for gene set analysis to
my data using the mgsa package from the Bioconductor release, but so
far I have not been able to make it work.
When I use the package's readGAF() function to build gene sets for the
GO categories from the rat annotation files downloaded from the GO
website (http://www.geneontology.org/GO.downloads.annotations.shtml),
the resulting object looks like this (trimmed for brevity):
Object of class MgsaGoSets
16779 sets over 29266 unique items.
Set annotations:
term
GO:0000002 mitochondrial genome maintenan...
...
GO:0000014 Catalysis of the hydrolysis of...
... and 16774 other sets.
Item annotations:
symbol name
1302934 St8sia5 ST8 alpha-N-acetyl-neuraminide...
...
1302939 Eef1g eukaryotic translation elongat...
... and 29261 other items.
Calling mgsa() with my list of differentially expressed genes and these
gene sets does not work, because it matches the genes of interest
against the identifiers of the items in the gene sets. Those
identifiers (the numbers in the item annotations above) are RGD IDs
(from the Rat Genome Database, http://rgd.mcw.edu/), and I have not
found a way either to convert them to something else (Entrez ID, gene
symbol, etc.) or to get the RGD IDs for my genes of interest without
looking them up manually.
To apply MGSA to my data, I would appreciate help with any one of the
following three approaches:
1) Modify the MgsaGoSets object so that its items are identified by a
more common gene ID, such as the Entrez ID, instead of the RGD ID.
2) Map my differentially expressed genes from a more common gene ID
to their RGD IDs.
3) Create a named list of gene identifier vectors, with one element per
GO category holding the IDs of all the genes in that category, similar
to the procedure described in the third section of the package
author's Bioinformatics paper (PMID: 21561920).
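To make option 3 concrete, here is a minimal base-R sketch of the intended structure. It assumes the GO annotations are already available as a table of (GO ID, gene ID) pairs; where those pairs come from (a parsed GAF file, or an annotation package such as org.Rn.eg.db) is an assumption, and every gene ID and the `id_map` lookup below are made-up placeholders, not real annotations. It also shows how a named lookup vector could re-key the sets to another ID type (options 1 and 2) if such a mapping were available.

```r
# Placeholder (GO ID, gene ID) pairs; the gene IDs are invented for illustration.
go_gene_pairs <- data.frame(
  go   = c("GO:0000002", "GO:0000002", "GO:0000014", "GO:0000014"),
  gene = c("geneA", "geneB", "geneB", "geneB"),
  stringsAsFactors = FALSE
)

# split() groups the gene IDs by GO ID, giving a named list with one
# character vector per GO category; unique() drops repeated annotations.
go_sets <- lapply(split(go_gene_pairs$gene, go_gene_pairs$go), unique)
str(go_sets)

# Hypothetical lookup vector: names are the IDs used in the gene sets,
# values are the IDs used in the list of differentially expressed genes.
id_map <- c(geneA = "id1", geneB = "id2")

# Re-key every set; IDs absent from id_map become NA and are dropped.
go_sets_mapped <- lapply(go_sets, function(ids) {
  mapped <- unname(id_map[ids])
  mapped[!is.na(mapped)]
})
str(go_sets_mapped)
```

The list is keyed by GO ID, and the gene IDs inside each vector would need to use the same identifier type as the genes of interest passed to mgsa().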
I would welcome any suggestions, as I am keen to compare the results of
this analysis with those of other gene set analysis methods. Thanks in
advance for your help!
Juan
-- output of sessionInfo():
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] C/en_US.UTF-8/C/C/C/C
attached base packages:
[1] grid stats graphics grDevices utils datasets
[7] methods base
other attached packages:
[1] mgsa_1.6.0 gplots_2.11.0 MASS_7.3-22
[4] KernSmooth_2.23-8 caTools_1.14 gdata_2.12.0
[7] gtools_2.7.0 BiocInstaller_1.8.3 xtable_1.7-0
[10] GOstats_2.24.0 graph_1.36.1 Category_2.24.0
[13] rat2302cdf_2.11.0 genefilter_1.40.0 RColorBrewer_1.0-5
[16] affycoretools_1.30.0 KEGG.db_2.8.0 GO.db_2.8.0
[19] annotate_1.36.0 rat2302.db_2.8.1 org.Rn.eg.db_2.8.0
[22] RSQLite_0.11.2 DBI_0.2-5 AnnotationDbi_1.20.3
[25] limma_3.14.3 affy_1.36.0 Biobase_2.18.0
[28] BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] AnnotationForge_1.0.3 Biostrings_2.26.2 GSEABase_1.20.1
[4] IRanges_1.16.4 RBGL_1.34.0 RCurl_1.95-3
[7] XML_3.95-0.1 affyio_1.26.0 annaffy_1.30.0
[10] biomaRt_2.14.0 bitops_1.0-4.2 gcrma_2.30.0
[13] lattice_0.20-10 parallel_2.15.2 preprocessCore_1.20.0
[16] splines_2.15.2 stats4_2.15.2 survival_2.36-14
[19] tools_2.15.2 zlibbioc_1.4.0
--
Sent via the guest posting facility at bioconductor.org.