topTable (fit) annotation

Jing Huang ▴ 380

@jing-huang-4737

Sorry. My problem is that colnames(topTable(fit)) for the fit I generated from a GEO GSE dataset only provides a few items, such as P.Value. I would like to have a full table, just like the one a GEO GDS dataset generates. I described the problem in my original email: different GEO dataset types give different column patterns in topTable(fit).

This was generated with a GDS dataset:

> colnames(topTable(fit))
 [1] "ID"                  "Gene.title"          "Gene.symbol"
 [4] "Gene.ID"             "UniGene.title"       "UniGene.symbol"
 [7] "UniGene.ID"          "Nucleotide.Title"    "GI"
[10] "GenBank.Accession"   "Platform_CLONEID"    "Platform_ORF"
[13] "Platform_SPOTID"     "Chromosome.location" "Chromosome.annotation"
[16] "GO.Function"         "GO.Process"          "GO.Component"
[19] "GO.Function.1"       "GO.Process.1"        "GO.Component.1"
[22] "CTRL"                "HIF1a"               "HIF2a"
[25] "HIF1a2a"             "AveExpr"             "F"
[28] "P.Value"             "adj.P.Val"

This was generated with a GSE dataset:

> colnames(topTable(fit))
[1] "ID"        "mir210"    "CTRL2"     "AveExpr"   "F"         "P.Value"   "adj.P.Val"

I hope this is clear.

Jing

On 8/31/11 9:38 AM, "Freudenberg, Johannes (NIH/NIEHS) [E]" <johannes.freudenberg at nih.gov> wrote:

> Hi Jing,
>
> I'm not quite sure I understand your question. Why can't you use the data for further analysis? Maybe the issue is that getGEO() returns a list?
> Have you tried something like:
>
> > gse16962 <- getGEO("GSE16962")
> > eset <- gse16962[[1]]
> > head(fData(eset))
>                  ID GB_ACC SPOT_ID Species.Scientific.Name Annotation.Date
> 1007_s_at 1007_s_at U48705    <NA>            Homo sapiens    Mar 11, 2009
> 1053_at     1053_at M87338    <NA>            Homo sapiens    Mar 11, 2009
> 117_at       117_at X51757    <NA>            Homo sapiens    Mar 11, 2009
> 121_at       121_at X69699    <NA>            Homo sapiens    Mar 11, 2009
> 1255_g_at 1255_g_at L36861    <NA>            Homo sapiens    Mar 11, 2009
> 1294_at     1294_at L13852    <NA>            Homo sapiens    Mar 11, 2009
>               Sequence.Type                 Sequence.Source
> 1007_s_at Exemplar sequence Affymetrix Proprietary Database
> 1053_at   Exemplar sequence                         GenBank
> 117_at    Exemplar sequence Affymetrix Proprietary Database
> 121_at    Exemplar sequence                         GenBank
> 1255_g_at Exemplar sequence Affymetrix Proprietary Database
> 1294_at   Exemplar sequence                         GenBank
>
> Etc.
>
> --Johannes
>
> -----Original Message-----
> From: Jing Huang [mailto:huangji at ohsu.edu]
> Sent: Wednesday, August 31, 2011 12:21 PM
> To: Davis, Sean (NIH/NCI) [E]
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] topTable (fit) annotation
>
> Thank YOU Sean for responding to my question. I am not sure where I should add the platform annotation.
>
> Here is what I observed:
>
> If I extract "GDS2162" by typing in
>
> > gds=getGEO("GDS2162")
> > eset=GDS2eSet(gds,do.log2=T)
> > eset
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 45101 features, 16 samples
>   element names: exprs
> protocolData: none
> phenoData
>   sampleNames: GSM67339 GSM67343 ... GSM67352 (16 total)
>   varLabels: sample genotype/variation agent description
>   varMetadata: labelDescription
> featureData
>   featureNames: 1415670_at 1415671_at ... AFFX-TrpnX-M_at (45101 total)
>   fvarLabels: ID Gene.title ... GO.Component.1 (21 total)
>   fvarMetadata: Column labelDescription
> experimentData: use 'experimentData(object)'
>   pubMedIds: 16237459
> Annotation:
>
> Most of the annotation is in.
>
> If I extract "GSE16962" by typing in
>
> > gse=getGEO("GSE16962")
> > gse
> $GSE16962_series_matrix.txt.gz
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 54675 features, 12 samples
>   element names: exprs
> protocolData: none
> phenoData
>   sampleNames: GSM424759 GSM424760 ... GSM424770 (12 total)
>   varLabels: title geo_accession ... data_row_count (34 total)
>   varMetadata: labelDescription
> featureData
>   featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (54675 total)
>   fvarLabels: ID GB_ACC ... Gene.Ontology.Molecular.Function (16 total)
>   fvarMetadata: Column Description labelDescription
> experimentData: use 'experimentData(object)'
> Annotation: GPL570
>
> It looks to me like it includes annotation such as GO terms. But I can't use this eset to do the further analysis that I need (such as the fit table).
>
> If I extract "GSE16962" by typing in
>
> > gse=getGEO("GSE16962", GSEMatrix=F)
>
> then, following the GEOquery package, I can generate eset2, which I can use to do the analysis I need. But eset2 looks like this:
>
> > eset2
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 54675 features, 12 samples
>   element names: exprs
> protocolData: none
> phenoData
>   sampleNames: GSM424759 GSM424760 ...
>     GSM424770 (12 total)
>   varLabels: samples
>   varMetadata: labelDescription
> featureData: none
> experimentData: use 'experimentData(object)'
> Annotation:
>
> Here is the script that I use to generate eset2:
>
> > probesets <- Table(GPLList(gse)[[1]])$ID
> > data.matrix <- do.call("cbind", lapply(GSMList(gse), function(x) {
> +     tab <- Table(x)
> +     mymatch <- match(probesets,tab$ID_REF)
> +     return(tab$VALUE[mymatch])
> + }))
> > data.matrix <- apply(data.matrix, 2, function(x) {
> +     as.numeric(as.character(x))
> + })
> > require(Biobase)
> > rownames(data.matrix) <- probesets
> > colnames(data.matrix) <- names(GSMList(gse))
> > pdata <- data.frame(samples=names(GSMList(gse)))
> > rownames(pdata) <- names(GSMList(gse))
> > pheno <- as(pdata,"AnnotatedDataFrame")
> > eset2 <- new('ExpressionSet',exprs=data.matrix,phenoData=pheno)
>
> At which step should I add
>
> > gplannot = getGEO("GPL96", AnnotGPL=TRUE)
>
> Many Many thanks
>
> Jing
>
> On 8/30/11 6:01 PM, "Sean Davis" <sdavis2 at mail.nih.gov> wrote:
>
>> Hi, Jing.
>>
>> NCBI GEO maintains two types of GPL records. The normal variant is just supplied by the submitter. However, when a GEO Series is curated by NCBI GEO into a GEO DataSet (GDS), they create a so-called "Annotation GPL". These have a relatively standard set of columns. I have not made the change to GEOquery yet to grab this annotation GPL when getting Series Matrix files. But, you can get them yourself by specifying:
>>
>> gplannot = getGEO("GPL96", AnnotGPL=TRUE)
>>
>> You can always replace the feature data of the ExpressionSets with the information in the retrieved Annotation GPL.
>>
>> I hope that is clear.
>>
>> Sean
>>
>> On Tue, Aug 30, 2011 at 5:01 PM, Jing Huang <huangji at ohsu.edu> wrote:
>>> Dear All members,
>>>
>>> I have been extracting data from GEO (GEO package) and doing some analysis on them by using the limma package.
>>> What I discovered is that the columns of topTable(fit) differ between GDS and GSE datasets.
>>>
>>> If the data is from GDS, then the colnames of topTable(fit) look like this:
>>>
>>> > colnames(topTable(fit))
>>>  [1] "ID"                  "Gene.title"          "Gene.symbol"
>>>  [4] "Gene.ID"             "UniGene.title"       "UniGene.symbol"
>>>  [7] "UniGene.ID"          "Nucleotide.Title"    "GI"
>>> [10] "GenBank.Accession"   "Platform_CLONEID"    "Platform_ORF"
>>> [13] "Platform_SPOTID"     "Chromosome.location" "Chromosome.annotation"
>>> [16] "GO.Function"         "GO.Process"          "GO.Component"
>>> [19] "GO.Function.1"       "GO.Process.1"        "GO.Component.1"
>>> [22] "CTRL"                "HIF1a"               "HIF2a"
>>> [25] "HIF1a2a"             "AveExpr"             "F"
>>> [28] "P.Value"             "adj.P.Val"
>>>
>>> If the data is from GSE, then the colnames of topTable(fit) look like this:
>>>
>>> > colnames(topTable(fit))
>>> [1] "ID"        "mir210"    "CTRL2"     "AveExpr"   "F"         "P.Value"   "adj.P.Val"
>>>
>>> I am trying to add some terms to this table by doing the following one by one. The data was generated on the Affymetrix human U133 platform:
>>>
>>> > library(hgu133plus2.db)
>>> > x=hgu133plus2SYMBOL
>>> > y=topTable(fit)
>>> > y$SYMBOL=unlist(as.list(x[y$ID]))
>>>
>>> It works, but I need to add ENTREZID, SYMBOL, CHR, CHRloc, and GO annotations as well. I would like the topTable to look more like the topTable(fit) shown at the top, generated from GEO GDS data.
>>>
>>> I am wondering if there is an easy way to annotate everything at once.
>>>
>>> In addition, I am having trouble annotating GO terms.
>>>
>>> Many Thanks
>>>
>>> Jing
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
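
Sean's suggestion (replace the feature data of the ExpressionSet with the Annotation GPL) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the way GEOquery does it internally: `align_annotation` is a hypothetical helper, and the small data frames are toy stand-ins (with made-up symbol values) for `Table(getGEO(..., AnnotGPL=TRUE))` and for a topTable() result, so the block runs without network access. Note that the series printed in the thread reports `Annotation: GPL570`, so that platform, rather than the GPL96 from the generic example, is presumably the one to fetch.

```r
# Hypothetical helper (not part of GEOquery, Biobase or limma):
# reorder an annotation table so that row i describes probe ids[i].
align_annotation <- function(ids, annot, id_col = "ID") {
  stopifnot(id_col %in% colnames(annot))
  # match() gives NA for probes absent from the annotation table,
  # which yields an all-NA row instead of silently dropping the probe.
  aligned <- annot[match(ids, annot[[id_col]]), , drop = FALSE]
  aligned[[id_col]] <- ids   # keep the probe ID even for unmatched rows
  rownames(aligned) <- ids   # feature data row names must equal the probe IDs
  aligned
}

# Toy stand-in for Table(gpl); the symbol values are placeholders, not real annotation.
annot <- data.frame(
  ID          = c("1053_at", "1007_s_at", "117_at"),
  Gene.symbol = c("toy_symbol_A", "toy_symbol_B", "toy_symbol_C"),
  stringsAsFactors = FALSE
)

# Probe order as it would come from featureNames(eset); "121_at" is missing from annot.
ids     <- c("1007_s_at", "1053_at", "121_at")
aligned <- align_annotation(ids, annot)

# The same helper can annotate an existing topTable-like result after the fact.
tt <- data.frame(ID = c("1053_at", "121_at"), P.Value = c(0.01, 0.20),
                 stringsAsFactors = FALSE)
tt_annot <- cbind(align_annotation(tt$ID, annot),
                  tt[, setdiff(colnames(tt), "ID"), drop = FALSE])

# With the real objects (needs GEOquery, Biobase, limma and network access):
#   eset <- getGEO("GSE16962")[[1]]
#   gpl  <- getGEO(annotation(eset), AnnotGPL = TRUE)
#   fData(eset) <- align_annotation(featureNames(eset), Table(gpl))
#   fit  <- eBayes(lmFit(eset, design))   # `design` is your own design matrix
#   topTable(fit)
```

For the hand-built eset2, the same assignment would go right after the `new('ExpressionSet', ...)` call, before fitting. As far as I know, lmFit() carries the feature data of an ExpressionSet into the fit, which is why the annotation columns then appear in topTable(fit).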

Annotation GO Homo sapiens annotate limma GEOquery

13.5 years ago • Jing Huang ▴ 380
