Analysing Human Gene 1.0 ST arrays with oligo and oneChannelGUI yields different numbers of probesets
@javier-perez-florido-3121
Dear list,

Some time ago I analysed a set of Human Gene 1.0 ST arrays with oneChannelGUI. Now I'm trying to reproduce the results using the oligo package, but I am quite surprised by what I get. With oligo, after preprocessing with rma, there are 253002 probesets, while with oneChannelGUI there are 33297, and the CEL files are the same!

For oligo, before reading the CEL files I had to build the annotation package with pdInfoBuilder, since Affymetrix does not provide a supported CDF file for this array. For this purpose, I first downloaded the library files "Human Gene 1.0 ST Array, Analysis" from the Affymetrix website. The files needed to build the package are:
HuGene-1_0-st-v1.r4.pgf
HuGene-1_0-st-v1.r4.clf
HuGene-1_0-st-v1.na29.hg18.probeset.csv

Then I executed the following commands:

library(pdInfoBuilder)
baseDir <- "pathWhereTheFilesAre"
# Anchored regular expressions: a bare "." would match any character
(pgf <- list.files(baseDir, pattern = "\\.pgf$", full.names = TRUE))
(clf <- list.files(baseDir, pattern = "\\.clf$", full.names = TRUE))
(prob <- list.files(baseDir, pattern = "\\.probeset\\.csv$", full.names = TRUE))
seed <- new("AffyGenePDInfoPkgSeed",
            pgfFile = pgf, clfFile = clf, probeFile = prob,
            author = "Javier", email = "email",
            biocViews = "AnnotationData", genomebuild = "NCBI Build 36",
            organism = "Human", species = "Homo Sapiens", url = "")
makePdInfoPackage(seed, destDir = ".")

And I installed the package:

R CMD INSTALL pd.hugene.1.0.st.v1

The package installed fine, and I read and preprocessed the CEL files using RMA, but there are 253002 probesets! That is far more than oneChannelGUI gives.

Any comments on such a big difference?

Thanks,
Javier
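A small guard can make the file-lookup step above fail early instead of silently passing an empty or multi-element vector to `new("AffyGenePDInfoPkgSeed", ...)`. The helper `find_one()` below is hypothetical (it is not part of pdInfoBuilder), and the demo uses empty placeholder files in a temporary directory rather than the real Affymetrix library files. It also shows why an unanchored pattern such as ".pgf" is risky: the "." matches any character.

```r
# Hypothetical helper (not part of pdInfoBuilder): return the single file in
# `dir` whose name matches the regular expression `pattern`; stop otherwise.
find_one <- function(dir, pattern) {
  hits <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(hits) != 1L) {
    stop(sprintf("expected exactly one file matching '%s' in %s, found %d",
                 pattern, dir, length(hits)))
  }
  hits
}

# Demo with empty placeholder files in a fresh temporary directory.
demo_dir <- tempfile("hugene_lib")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c(
  "HuGene-1_0-st-v1.r4.pgf",
  "HuGene-1_0-st-v1.r4.clf",
  "HuGene-1_0-st-v1.na29.hg18.probeset.csv",
  "notes_pgf.txt"  # decoy: "_pgf" matches the unanchored regex ".pgf"
))))

# The unanchored pattern picks up the decoy as well.
print(basename(list.files(demo_dir, pattern = ".pgf")))

# Anchored patterns select exactly one file each.
pgf  <- find_one(demo_dir, "\\.pgf$")
clf  <- find_one(demo_dir, "\\.clf$")
prob <- find_one(demo_dir, "\\.probeset\\.csv$")
```

With real files, `baseDir` would replace `demo_dir`, and the three results can be passed to the seed object unchanged.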
Preprocessing cdf oligo oneChannelGUI
@benilton-carvalho-1375
Brazil/Campinas/UNICAMP
Dear Javier,

You have not provided the exact call to RMA you used, nor your sessionInfo() output.

If you're using the latest oligo (BioC 2.5), you can call:

results = rma(object, target="core")

to get the 33297 "probesets" you refer to.

Note that building the annotation package yourself is a nice exercise, but you could just download the prebuilt pd.hugene.1.0.st.v1 package via biocLite().

Cheers,
b

(In reply to Javier Pérez Florido, Oct 29, 2009, 5:42 PM.)
Dear Benilton,

Thanks for your quick reply. It works now with the target argument. However, I searched the web for the meaning of this argument and couldn't find anything. What is "target" for? And why does oligo's manual say: "The ExpressionSet returned when either Exon/Gene-FeatureSet objects are passed contain extra annotation on the featureData slot that the user should take into account for exon/gene-level analyses"? I haven't worked with Human Gene ST arrays before, so I'm quite new to this topic.

Thanks again,
Javier
That makes me think that I forgot an 'svn commit' sometime in the past... Apologies for that.

In the meantime, please use the following description.

Until BioC 2.4, oligo summarized only to the probeset level (as defined in the PGF file). Affymetrix made available meta-probeset (MPS) files that define "new probesets", which allow summarization to the gene level. For exon arrays, there are 3 MPS files, differing in the quality of the supporting annotation: core (best), extended and full. For gene arrays, there is only a "core" MPS file.

Therefore, gene-level summaries should use this additional annotation.

So, using the 'target' argument, you can choose the level to which the data are summarized: "probeset", "core", "extended" and "full" are the possible values (this is available starting with BioC 2.5).

I'll make sure the documentation is updated soon to reflect this change.

Once again, apologies.

b

(In reply to Javier Pérez Florido, Oct 29, 2009, 8:21 PM.)
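To make the probeset versus gene-level distinction concrete, here is a toy sketch in base R of the summarization step only. It assumes made-up log2 intensities that are already background-corrected and normalized, a hypothetical layout of 3 probesets with 4 probes each, and a hypothetical MPS-style table that groups those probesets into 2 "transcript clusters". Each group of probes is summarized with median polish (`stats::medpolish`), the kind of summarization RMA uses; this is an illustration, not oligo's implementation. Grouping by probeset gives one row per probeset, while grouping by meta-probeset gives fewer, larger groups, mirroring why target = "core" yields far fewer rows than target = "probeset".

```r
set.seed(1)

# Hypothetical layout: 3 probesets with 4 probes each (12 probes), 3 arrays.
probeset_of_probe <- rep(c("ps1", "ps2", "ps3"), each = 4)

# Hypothetical MPS-style table: ps1 + ps2 form cluster "tc1"; ps3 alone is "tc2".
mps <- c(ps1 = "tc1", ps2 = "tc1", ps3 = "tc2")

# Made-up log2 intensities, assumed already background-corrected and normalized.
log2int <- matrix(rnorm(12 * 3, mean = 8), nrow = 12,
                  dimnames = list(NULL, paste0("array", 1:3)))

# Summarize each group of probes with median polish (rows = probes,
# columns = arrays); the per-array estimate is overall + column effect.
summarize_by <- function(x, groups) {
  per_group <- lapply(split(seq_len(nrow(x)), groups), function(idx) {
    fit <- medpolish(x[idx, , drop = FALSE], maxiter = 100L, trace.iter = FALSE)
    fit$overall + fit$col
  })
  do.call(rbind, per_group)
}

probeset_level <- summarize_by(log2int, probeset_of_probe)       # like target = "probeset"
gene_level     <- summarize_by(log2int, mps[probeset_of_probe])  # like target = "core"
dim(probeset_level)  # 3 rows (one per probeset) x 3 arrays
dim(gene_level)      # 2 rows (one per meta-probeset) x 3 arrays
```

Because "tc2" contains only "ps3" in this toy table, its gene-level values equal the probeset-level values for "ps3"; "tc1" pools the 8 probes of "ps1" and "ps2" into a single summary.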
Dear Benilton,

Thanks for your help. I have more questions. What is summarization at the gene level? I thought that a probeset = a gene. Are the "new probesets" defined in the MPS file related to the experiment, or are they controls?

Two more things:
- I would like to perform an analysis without the control genes. How can I tell which genes are controls so I can remove them from the analysis?
- What is the best annotation package available? hugene10stprobeset.db? I suppose that, using the featureNames of the ExpressionSet, I can get the ENTREZID of the probesets through this annotation package.

Thanks again,
Javier
Javier,

The gene array is a "subset" of the exon array, therefore the probesets map to exons. The (core) MPS file groups (reliable) probesets into "meta-probesets", which map to genes and, AFAIK, do not include controls.

I'm not sure what you mean by "best package available for annotation". The "pd.hugene*" package is the one oligo uses to preprocess the data. The hugene10stprobeset.db package gives you information on the *probeset*. If you summarize to the gene level, you'll want the hugene10sttranscriptcluster.db package instead. In that case, say you want the ENTREZID for "7896759"; then you can just do:

library(hugene10sttranscriptcluster.db)
hugene10sttranscriptclusterENTREZID[["7896759"]]

Cheers,
b

(In reply to Javier Pérez Florido, Oct 30, 2009, 10:41 AM.)
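Extending the single lookup above to every feature of a gene-level ExpressionSet amounts to indexing a named ID-to-Entrez vector by the feature names. The sketch below uses a hypothetical helper `annotate_features()` and a toy lookup with made-up identifiers. With real data, such a vector could come from the annotation package (for example, in more recent Bioconductor releases, `AnnotationDbi::mapIds(hugene10sttranscriptcluster.db, keys = featureNames(eset), column = "ENTREZID", keytype = "PROBEID")`, where `eset` is the RMA output). Dropping features without an Entrez ID is only a crude filter and an assumption here; it is not a reliable way to identify control probesets.

```r
# Hypothetical helper: attach Entrez IDs to feature names using a named
# character vector (names = feature IDs, values = Entrez IDs or NA).
annotate_features <- function(ids, id2entrez, drop_unmapped = TRUE) {
  # Indexing by name returns NA for IDs that are absent from the lookup.
  entrez <- unname(id2entrez[ids])
  out <- data.frame(feature = ids, entrez = entrez, stringsAsFactors = FALSE)
  if (drop_unmapped) {
    # Keep only features with a non-missing Entrez ID.
    out <- out[!is.na(out$entrez), , drop = FALSE]
  }
  rownames(out) <- NULL
  out
}

# Toy lookup with made-up identifiers (not real HuGene annotation).
lookup <- c(tcA = "1001", tcB = NA, tcC = "1003")
feats  <- c("tcA", "tcB", "tcC", "tcD")

annotate_features(feats, lookup)
```

Here "tcB" (mapped to NA) and "tcD" (absent from the lookup) are both dropped; pass `drop_unmapped = FALSE` to keep every feature with an NA in the `entrez` column.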