GSEA using Broad genesets

Roger Liu

Dear list,

I have a question about using Broad gene sets for GSEA analysis.

As we know, we can use

    gsc <- GeneSetCollection(bcrneg_filt1, setType=KEGGCollection())
    Am <- incidence(gsc)

to generate an incidence matrix for further analysis.

I have learned how to read a gene set file from Broad, for example:

    c3gsc2 <- getGmt("/path/to/c3.all.v2.5.symbols.gmt",
                     collectionType=BroadCollection(category="c3"),
                     geneIdType=SymbolIdentifier())

My question is: how can I use c3gsc2 and bcrneg_filt1 to create a new incidence matrix? Do I have to do this manually, or is there a command that does it?

Thanks.

Qiudao
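For readers unfamiliar with the Broad GMT layout that getGmt() reads: each line is one gene set, with tab-separated fields giving the set name, a description, and then the member gene symbols. The base-R sketch below parses such lines into a named list. readGmtSketch() is a hypothetical helper written for illustration, not part of GSEABase, and the set names are made up.

```r
# Hypothetical helper (not GSEABase::getGmt): parse GMT-formatted lines,
# where each line is "set name <TAB> description <TAB> gene1 <TAB> gene2 ...".
readGmtSketch <- function(lines) {
    fields <- strsplit(lines, "\t", fixed = TRUE)
    # Everything after the first two fields is a gene symbol
    sets <- lapply(fields, function(f) f[-(1:2)])
    # The first field of each line names the set
    names(sets) <- vapply(fields, function(f) f[1], character(1))
    sets
}

# Two made-up gene sets in GMT layout
gmtLines <- c("SET_A\tna\tDLC1\tPTGS1\tRORC",
              "SET_B\tna\tPTGS1\tVPRBP")
sets <- readGmtSketch(gmtLines)
str(sets)
```

With a real file, readGmtSketch(readLines(path)) would give the same structure; getGmt() additionally records the identifier type and collection type, which this sketch ignores.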

Written 15.2 years ago by Roger Liu; updated 15.2 years ago by Martin Morgan

Martin Morgan

On 02/06/2010 04:05 PM, zrl wrote:
> My question is how to use c3gsc2 and bcneg_filt1 to create a new
> incidence matrix ? Do I have to manually do this? or there is a
> command which can do this?

Hi Qiudao,

bcrneg_filt1 is a subset of an ExpressionSet, and is just another source for creating a gene set collection. Here you're using c3.all.v2.5.symbols.gmt as the source for your gene set collection. The incidence matrix is

    > m <- incidence(c3gsc2)
    > class(m)
    [1] "matrix"
    > dim(m)
    [1]   837 15718
    > m[1:5, 1:5]
                            DLC1 FLJ39378 PTGS1 RORC VPRBP
    RGAGGAARY_V$PU1_Q6         1        1     1    1     1
    KRCTCNNNNMANAGC_UNKNOWN    0        0     0    0     0
    AAAYWAACM_V$HFH4_01        0        0     0    0     0
    YYCATTCAWW_UNKNOWN         0        0     0    0     0
    CYTAGCAAY_UNKNOWN          0        0     0    0     0

with rows as set names and columns as symbols.

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
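To make the layout concrete without downloading the Broad file, here is a base-R sketch that builds the same kind of 0/1 matrix from a small named list of gene sets: one row per set, one column per distinct symbol, and a 1 wherever the symbol belongs to the set. toyIncidence() and the set names are hypothetical; this only mimics the orientation of incidence() output and does not call GSEABase.

```r
# Hypothetical helper: build a 0/1 incidence matrix from a named list of
# gene sets, with one row per set and one column per distinct gene symbol,
# the same orientation incidence() uses for a GeneSetCollection.
toyIncidence <- function(sets) {
    genes <- unique(unlist(sets, use.names = FALSE))
    m <- matrix(0L, nrow = length(sets), ncol = length(genes),
                dimnames = list(names(sets), genes))
    for (i in seq_along(sets)) {
        m[i, sets[[i]]] <- 1L   # mark each member gene of set i
    }
    m
}

sets <- list(SET_A = c("DLC1", "PTGS1", "RORC"),
             SET_B = c("PTGS1", "VPRBP"),
             SET_C = "FLJ39378")
m <- toyIncidence(sets)
m
```

Row sums give set sizes and column sums give how many sets each gene belongs to, which is why this matrix is convenient for downstream set-level statistics.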

Martin Morgan, 15.2 years ago

Hi Martin,

Thank you for answering my question. Sorry I didn't state my question clearly. In the case of

    gsc <- GeneSetCollection(bcrneg_filt1, setType=KEGGCollection())
    Am <- incidence(gsc)

we use KEGG as the reference to create gene sets from bcrneg_filt1, and then create an incidence matrix.

My question is: what if I use a downloaded gene set database such as c3.all.v2.5.symbols.gmt as the reference to create gene sets for the ExpressionSet bcrneg_filt1, and then create an incidence matrix? Do I have to do this manually (I mean, identify the genes in the eset, then match them against c3.all.v2.5.symbols.gmt to create gene sets), or is there a direct command that does this?

Thanks.

On Sun, Feb 7, 2010 at 9:11 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> Here you're using
> c3.all.v2.5.symbols.gmt as a source for your gene set collection.

Roger Liu, 15.2 years ago

On 02/07/2010 03:25 PM, zrl wrote:
> My question is what if I use a download geneset database such as
> "c3.all.v2.5.symbols.gmt" as reference to create gene set of
> ExpressionSet bcrneg_filt1, then create a incidence matrix. Do I have
> to manually do this? (I mean, identifying the genes in eset,then
> correlates them in c3.all.v2.5.symbols.gmt to create gene sets) or is
> there a direct command doing this?

Hi --

    > library(GSEABase)
    > c3gsc = getGmt("~/tmp/c3.all.v2.5.symbols.gmt",
    +                geneIdType=SymbolIdentifier())

It's possible to ask for the intersection of a gene set collection with specific gene identifiers, so

    > c3gsc & c("DLC1", "FLJ39378")

For an Affy array like bcrneg_filt1, a command like

    library(Biobase)
    library(annotate)    # provides getSYMBOL(); needs the hgu95av2.db package here
    data(sample.ExpressionSet)
    eset = sample.ExpressionSet[250:300,]
    symbolIds = getSYMBOL(featureNames(eset), annotation(eset))

gets the gene symbols, and

    c3gsc1 = c3gsc & symbolIds

does the subset. But it might be just as easy to

    m = incidence(c3gsc)
    m1 = m[, colnames(m) %in% symbolIds]    # keep only genes on the array
    m1 = m1[rowSums(m1) != 0, ]             # drop sets left with no genes

(the & operator alters the names of the gene sets and keeps empty sets, so further processing would probably be needed).

Hope that helps.

Martin
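The incidence-matrix route can be tried on toy data in base R. Note that the empty-set filter has to use the row sums of the already-subsetted matrix m1; rowSums(m) tests the original sets, which are never empty, so nothing would be dropped. The matrix, symbols, and the subsetIncidence() helper below are made up for illustration and stand in for incidence(c3gsc) and getSYMBOL() output.

```r
# Toy incidence matrix: gene sets as rows, gene symbols as columns
# (set and gene names are made up for illustration).
m <- rbind(SET_A = c(DLC1 = 1, PTGS1 = 1, RORC = 0, VPRBP = 0),
           SET_B = c(DLC1 = 0, PTGS1 = 0, RORC = 1, VPRBP = 1),
           SET_C = c(DLC1 = 0, PTGS1 = 1, RORC = 0, VPRBP = 1))

# Hypothetical helper: keep only genes present on the array, then drop
# gene sets left with no members.
subsetIncidence <- function(m, ids) {
    m1 <- m[, colnames(m) %in% ids, drop = FALSE]
    # Filter on the subsetted matrix m1; using rowSums(m) here would test
    # the original sets and keep ones that are now empty
    m1[rowSums(m1) != 0, , drop = FALSE]
}

# Symbols as getSYMBOL() might return them, with NA for an unmapped probe
symbolIds <- c("DLC1", "PTGS1", NA)
m1 <- subsetIncidence(m, symbolIds)
m1
```

Here SET_B is dropped because neither RORC nor VPRBP is among the array's symbols; drop = FALSE keeps the result a matrix even when only one row or column survives.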

Martin Morgan, 15.2 years ago

Thank you, Martin, this is what I wanted. I prefer the second method for creating the incidence matrix.

My last question: in GSEABase, when we do

    gsc <- GeneSetCollection(bcrneg_filt1, setType=KEGGCollection())

how does GSEABase collapse the Affy probes to gene symbols (max, mean, median, or not at all)? And if we use a downloaded database such as ****.symbols.gmt, how should we collapse the probes to symbols?

Sorry to bother you so much. Thank you very much.

Qiudao

On Wed, Feb 10, 2010 at 9:47 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> (the & operator alters the names of the gene sets, and keeps empty sets,
> so further processing would probably be needed).

Roger Liu, 15.2 years ago

On 02/10/2010 10:47 AM, zrl wrote:
> how does GSEABase collapse the affy probes to gene symbols?
> (max,mean,median or not at all)

Remember that a gene set is a collection of identifiers; expression doesn't have anything to do with its construction. GeneSetCollection() uses featureNames(bcrneg_filt1), and then the map between Affy probe ids and KEGG pathways provided by the relevant Bioconductor annotation package, e.g., hgu95av2.db, hgu95av2PATH. The issue that comes up is when a probeset id maps to several pathways:

    > featureNames(sample.ExpressionSet[201,])
    [1] "31440_at"
    > hgu95av2PATH[[featureNames(sample.ExpressionSet)[201]]]
     [1] "04310" "04520" "04916" "05200" "05210" "05213" "05215" "05216" "05217"
    [10] "05221" "05412"

and then the probeset id 31440_at is assigned to the 11 sets representing these different KEGG pathways:

    > GeneSetCollection(sample.ExpressionSet[201,], setType=KEGGCollection())
    GeneSetCollection
      names: 04310, 04520, ..., 05412 (11 total)
      unique identifiers: 31440_at (1 total)
      types in collection:
        geneIdType: AnnotationIdentifier (1 total)
        collectionType: KEGGCollection (1 total)

Martin
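The one-to-many mapping described above can be illustrated in base R without the annotation package: inverting a probe-to-pathway list yields one set per pathway, and a probe mapped to k pathways lands in k sets. This is a sketch of the idea only, not GSEABase's internal code; the probe ids are made up, while the pathway ids are taken from the output above.

```r
# Hypothetical probe-to-pathway map shaped like the per-probe vectors
# hgu95av2PATH returns; probe ids are made up, pathway ids come from
# the output above.
probe2path <- list("1000_at" = c("04310", "04520"),
                   "1001_at" = "04520",
                   "1002_at" = c("04310", "04916", "04520"))

# Invert the map: one gene set per pathway, listing every probe mapped to it.
# A probe mapped to k pathways therefore appears in k gene sets.
path2probe <- split(rep(names(probe2path), lengths(probe2path)),
                    unlist(probe2path, use.names = FALSE))
path2probe
```

No expression values are involved at any point, which is the sense in which the gene set construction does not "collapse" probes.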

Martin Morgan, 15.2 years ago

Thank you, Martin. If we calculate a statistic for each gene set, it is possible that several probes map to the same gene. How does GSEABase handle the calculation of a gene set statistic when multiple probes map to the same gene? (Or maybe this question should be directed to the Category package, since I always use its gseattperm.)

Thank you again for your detailed explanation and patience.

Qiudao

On Wed, Feb 10, 2010 at 3:05 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> The issue that comes up is
> when a probeset id maps to several pathways
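The thread does not answer this directly. One common option, shown here only as a sketch and not as something GSEABase or Category does for you, is to reduce the data to one probe per gene before computing set statistics, for example by keeping the probe with the largest interquartile range. collapseByIQR(), the expression values, and the probe-to-symbol map below are all hypothetical.

```r
# Toy expression matrix: rows are probes, columns are samples (made-up values).
expr <- rbind(p1 = c(5.0, 5.1, 4.9, 5.0),
              p2 = c(3.0, 6.0, 2.0, 7.0),
              p3 = c(8.0, 8.2, 7.9, 8.1))
# Hypothetical probe-to-symbol map: p1 and p2 both measure DLC1
symbols <- c(p1 = "DLC1", p2 = "DLC1", p3 = "PTGS1")

# Keep, for each symbol, the single probe with the largest interquartile range
collapseByIQR <- function(expr, symbols) {
    spread <- apply(expr, 1, IQR)
    byGene <- split(names(spread), symbols[names(spread)])
    keep <- vapply(byGene, function(ids) ids[which.max(spread[ids])],
                   character(1))
    out <- expr[keep, , drop = FALSE]
    rownames(out) <- names(keep)   # rows are now gene symbols
    out
}

collapsed <- collapseByIQR(expr, symbols)
collapsed
```

After collapsing, row names are symbols, so the rows line up directly with the columns of a symbol-based incidence matrix such as incidence(c3gsc). Taking the mean or median of duplicate probes would be an equally simple alternative; which choice is appropriate depends on the analysis.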

Roger Liu, 15.2 years ago

Dear All,

I am having problems installing the flow cytometry packages on Ubuntu Karmic 9.10. My R version is 2.9.2.

I can only install some of the packages mentioned on the following page:
http://bioconductor.org/docs/workflows/flowoverview/

I always get error messages like this one:

    > biocLite("flowMerge")
    Using R version 2.9.2, biocinstall version 2.4.13.
    Installing Bioconductor version 2.4 packages:
    [1] "flowMerge"
    Please wait...

    Warning in install.packages(pkgs = pkgs, repos = repos, dependencies = dependencies, :
      argument 'lib' is missing: using '/usr/local/lib/R/site-library'
    Warning message:
    In getDependencies(pkgs, dependencies, available, lib) :
      package 'flowMerge' is not available

I really appreciate your help! Thanks in advance!

Greetings,

Philipp

Philipp Meng, 15.2 years ago

Hi Philipp --

On 02/07/2010 02:19 PM, Philipp Meng wrote:
> I experience installation problems of the Flow Cytometry Packages on
> Ubuntu Karmic 9.10.
> My R version is 2.9.2

flowMerge wasn't introduced until Bioc 2.5 / R-2.10; you'll need to update your R before you can install flowMerge. To find out which packages are available for your release, visit http://bioconductor.org, follow Software > Downloads, and then choose Past BioC Releases in the left navigation panel.

Martin
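A quick way to check whether an R installation meets that minimum is to compare version objects rather than strings, since "2.9.2" sorts after "2.10.0" as text but is older numerically. canInstallFlowMerge() is a hypothetical helper; the 2.10.0 threshold comes from the reply above.

```r
# Per the reply above, flowMerge needs Bioconductor 2.5, which goes with R 2.10.
# Hypothetical helper: is the given R version new enough?
canInstallFlowMerge <- function(rVersion = getRversion()) {
    # package_version() compares component by component, so 2.9 < 2.10
    package_version(as.character(rVersion)) >= "2.10.0"
}

canInstallFlowMerge("2.9.2")    # the version in the question
canInstallFlowMerge()           # the running R
```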

Martin Morgan, 15.2 years ago

On 2/7/10 2:19 PM, Philipp Meng wrote:
> I experience installation problems of the Flow Cytometry Packages on
> Ubuntu Karmic 9.10.
> My R version is 2.9.2

First suggestion is to upgrade to the latest version of R and thus be able to use the most recent Bioconductor release.

+ seth

Seth Falcon, 15.2 years ago
