species in MsigDB of GSEA
Di Wu ▴ 190
@di-wu-1837
Last seen 10.3 years ago
Dear list, I am trying to use MSigDB, the gene set database from GSEA. I would like to know whether the gene sets come from human or mouse, particularly in C2. I know I can always click through the website to see how each set was obtained, but is there a programmatic way to get the source species for all the gene sets in C2, or in MSigDB as a whole? I appreciate your suggestions. Cheers, Di
@martin-morgan-1513
Last seen 5 months ago
United States
Hi Di --

If you're using the GSEABase package, then each gene set read by getBroadSets records the organism, so for example

    > library(GSEABase)
    > fl <- "/path/to/msigdb_v2.1.xml"
    > gss <- getBroadSets(fl)  # read entire msigdb
    > organism(gss[[1]])
    [1] "Human"
    > table(sapply(gss, organism))

             Chimpanzee             Generic               Human
                      1                 456                1769
    Human,Mouse,Rat,Dog               Mouse                 Pig
                    837                 248                  11
                    Rat              Rhesus          Zebra Fish
                      3                   4                   8

    > # retrieve a few sets from the web
    > gss <- getBroadSets(asBroadUri(c('chr16q', 'GNF2_ZAP70')))
    > organism(gss[[1]])
    [1] "Human"

As a 'closer to the metal' alternative, you could use the XML package

    > library(XML)
    > xml <- xmlTreeParse(fl, useInternal=TRUE)
    > query <- '//GENESET[@STANDARD_NAME="KENNY_WNT_UP"]/@ORGANISM'
    > xpathApply(xml, query, xmlValue)
    [[1]]
    [1] "Mouse"

    > table(unlist(xpathApply(xml, "//@ORGANISM", xmlValue)))

             Chimpanzee             Generic               Human
                      1                 456                1769
    Human,Mouse,Rat,Dog               Mouse                 Pig
                    837                 248                  11
                    Rat              Rhesus          Zebra Fish
                      3                   4                   8

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
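To make the XML route above easier to try without the full MSigDB file, here is a minimal sketch that pairs each set name with its organism. It assumes only the two attributes used in the answer (STANDARD_NAME and ORGANISM); the inline document and the EXAMPLE_SET_A / EXAMPLE_SET_B names are made up for illustration and are not real MSigDB sets.

```r
library(XML)

# Toy stand-in for the MSigDB XML file. Only the attributes used in the
# answer above are present; EXAMPLE_SET_A and EXAMPLE_SET_B are hypothetical.
txt <- '<MSIGDB>
  <GENESET STANDARD_NAME="KENNY_WNT_UP" ORGANISM="Mouse"/>
  <GENESET STANDARD_NAME="EXAMPLE_SET_A" ORGANISM="Human"/>
  <GENESET STANDARD_NAME="EXAMPLE_SET_B" ORGANISM="Human"/>
</MSIGDB>'

# Parse the string into an internal document so XPath queries can be used.
doc <- xmlParse(txt, asText = TRUE)

# Pull one attribute per GENESET node; both vectors follow document order.
species <- data.frame(
  set      = xpathSApply(doc, "//GENESET", xmlGetAttr, "STANDARD_NAME"),
  organism = xpathSApply(doc, "//GENESET", xmlGetAttr, "ORGANISM"),
  stringsAsFactors = FALSE
)

print(species)
print(table(species$organism))
```

With the real file, replacing the `xmlParse(txt, asText = TRUE)` call with `xmlParse(fl)` should give one row per gene set, which can then be filtered or merged with other annotation.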
Thank you, Martin. That's what I need. I have a basic follow-up question: how can I convert "collectionType" to a character value, such as "C2", in case I only want to work with the sets from C2? Cheers, Di
@martin-morgan-1513
Last seen 5 months ago
United States
I'm not quite sure what you're asking, but something like this

    > is_c2 <- sapply(gss, function(gs) bcCategory(collectionType(gs)) == "c2")

gives you a logical vector which is TRUE when the bcCategory of the collectionType of each gene set in gss is "c2". You can then

    > c2sets <- gss[is_c2]

to get just those gene sets belonging to c2. (I'm using hints from the display of the gene set to guess at how to get parts of it out, e.g.,

    > gss[[1]]
    [...]
    collectionType: Broad
      bcCategory: c1 (Positional)
      bcSubCategory: NA
    details: use 'details(object)'

suggests that I can use collectionType on gss[[1]], and bcCategory on the result of collectionType; I could also look in the help page, e.g., for GeneSet-class and BroadCollection-class.)

Also maybe worth pointing out that gene set collections can be subset by their set names, e.g.,

    > details(gss[["KENNY_WNT_UP"]])
    setName: KENNY_WNT_UP
    geneIds: CUGBP2, ARFGEF2, ..., CASKIN2 (total: 51)
    geneIdType: Symbol
    collectionType: Broad
      bcCategory: c2 (Curated)
      bcSubCategory: NA
    setIdentifier: c2:803
    description: Genes up-regulated by Wnt in HC11 (mammary epithelial cells)
      (longDescription available)
    organism: Mouse
    pubMedIds: 15642117
    urls: file://home/mtmorgan/tmp/msigdb_v2.1.xml
    contributor: Yujin Hoshida
    setVersion: 0.0.1
    creationDate: Tue Jul 15 20:31:53 2008

Hope that's on the right track for what you were looking for,

Martin
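Putting the two answers together, the species breakdown for C2 alone can be sketched as below. To keep the example self-contained it builds a tiny toy collection by hand instead of reading the MSigDB file; the set names, gene identifiers, and organisms are invented, and constructing sets this way (rather than with getBroadSets) is an assumption made purely for illustration.

```r
library(GSEABase)

# Hypothetical toy gene sets standing in for what getBroadSets() would return.
toy <- GeneSetCollection(list(
  GeneSet(geneIds = c("G1", "G2"), setName = "TOY_A",
          collectionType = BroadCollection(category = "c2"),
          organism = "Mouse"),
  GeneSet(geneIds = c("G3", "G4"), setName = "TOY_B",
          collectionType = BroadCollection(category = "c1"),
          organism = "Human"),
  GeneSet(geneIds = c("G5", "G6"), setName = "TOY_C",
          collectionType = BroadCollection(category = "c2"),
          organism = "Human")
))

# Same test as in the answer: TRUE for sets whose Broad category is "c2".
is_c2 <- sapply(toy, function(gs) bcCategory(collectionType(gs)) == "c2")
c2sets <- toy[is_c2]

# Tabulate the recorded organism for the C2 sets only.
c2_organisms <- table(unname(sapply(c2sets, organism)))
print(names(c2sets))
print(c2_organisms)
```

With the real data, `toy` would simply be the `gss` object read by getBroadSets, and the last three statements would apply unchanged.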