Inconsistency in RMA results from 'affy' and results from 'oligo'

0

Entering edit mode

Peng Yu ▴ 950

@peng-yu-3586

Last seen 10.6 years ago

Hi Benilton, If both of scripts do not generate errors, does it mean that both of them are correct? But the RMA results are very different, so one of them must be wrong? Regards, Peng On Mon, Aug 10, 2009 at 7:54 PM, Benilton Carvalho<bcarvalh at="" jhsph.edu=""> wrote: > For the Mouse Gene ST Array, 'oligo' needs the "pd.mogene.1.0.st.v1" > package. If it didn't complain about it when reading the CEL files, it's > because you already had it installed on your system. The next version of > 'oligo' will be able to identify the required package (as it does currently) > and automatically download/install the required packages, if available on > the BioC website. > > As a side note, please observe that the annotation packages used by 'affy' > are not compatible with those used by 'oligo', which are built via > 'pdInfoBuilder'. > > b > > On Aug 10, 2009, at 9:27 PM, Peng Yu wrote: > >> Hi Benilton, >> >> One thing I don't understand isf why 'oligo' didin't require me to >> install the .cdf file, but 'affy' required me to install the .cdf >> file. I am curious that how 'oligo' figure how where the correct .cdf >> file is and use it? >> >> Regards, >> Peng >> >> On Mon, Aug 10, 2009 at 5:25 PM, Benilton Carvalho<bcarvalh at="" jhsph.edu=""> >> wrote: >>> >>> Dear Peng, >>> >>> I can speak for oligo and the annotation package used by it. >>> >>> The current release of oligo summarizes to the probeset level. The next >>> release of oligo and annotation packages will allow you to summarize to >>> the >>> gene level. In your particular case, the count you'll get is roughly 35K. >>> >>> The updated packages have already been submitted to BioC and soon should >>> show up on the devel branch. >>> >>> Best wishes, >>> >>> b >>> >>> On Aug 10, 2009, at 6:39 PM, Peng Yu wrote: >>> >>>> Hi, >>>> >>>> I have run the two different R script to do RMA. Neither of them gives >>>> me any error messages. However, the RMA results are very >>>> different---they have very different number of lines. I don't know >>>> which one I should believe. Or neither of them is correct. It might be >>>> due to the difference in the cdf file used. Would you please point to >>>> me how to figure out the problem? >>>> >>>> $ Rscript probe2expr_affy.R >>>>> >>>>> library(affy) >>>> >>>> Loading required package: Biobase >>>> Loading required package: methods >>>> >>>> Welcome to Bioconductor >>>> >>>> Vignettes contain introductory material. To view, type >>>> 'openVignette()'. To cite Bioconductor, see >>>> 'citation("Biobase")' and for packages 'citation(pkgname)'. >>>> >>>>> Data <- ReadAffy() >>>>> eset <- rma(Data) >>>> >>>> Background correcting >>>> Normalizing >>>> Calculating Expression >>>>> >>>>> write.exprs(eset, file="gene_expr_affy.txt", sep="\t") >>>>> >>>> >>>> $ Rscript probe2expr_oligo.R >>>>> >>>>> library(oligo) >>>> >>>> Loading required package: oligoClasses >>>> Loading required package: Biobase >>>> Loading required package: methods >>>> >>>> Welcome to Bioconductor >>>> >>>> Vignettes contain introductory material. To view, type >>>> 'openVignette()'. To cite Bioconductor, see >>>> 'citation("Biobase")' and for packages 'citation(pkgname)'. >>>> >>>> Loading required package: preprocessCore >>>> Welcome to oligo version 1.8.1 >>>>> >>>>> data<-read.celfiles(list.celfiles()) >>>> >>>> Loading required package: pd.mogene.1.0.st.v1 >>>> Loading required package: RSQLite >>>> Loading required package: DBI >>>> Platform design info loaded. >>>> Reading in : koA-mth_HZ_5238_MST1_19389.cel >>>> Reading in : koB-mth_HZ_5238_MST1_19390.cel >>>> Reading in : koC-mth_HZ_5238_MST1_19391.cel >>>> Reading in : koD-mth_HZ_5238_MST1_19392.cel >>>> Reading in : wt1-mth_HZ_5238_MST1_19385.cel >>>> Reading in : wt2-mth_HZ_5238_MST1_19386.cel >>>> Reading in : wt3-mth_HZ_5238_MST1_19387.cel >>>> Reading in : wt4-mth_HZ_5238_MST1_19388.cel >>>>> >>>>> eset<-rma(data) >>>> >>>> Background correcting >>>> Normalizing >>>> Calculating Expression >>>>> >>>>> write.exprs(eset, file="gene_expr_oligo.txt", sep="\t") >>>>> >>>> >>>> $ wc gene_expr_affy.txt gene_expr_oligo.txt >>>> 34761 ? 312848 ?5002519 gene_expr_affy.txt >>>> 234591 ?2111318 33763075 gene_expr_oligo.txt >>>> 269352 ?2424166 38765594 total >>>> >>>> BTW, before I run 'Rscript probe2expr_affy.R'. I downloaded >>>> MoGene-1_0-st-v1.r3.unsupported-cdf and I run the following script. >>>> Then, I install the generated package 'mogene10stv1cdf'. >>>> $ cat make_cdf_package.R >>>> library(makecdfenv) >>>> make.cdf.package("MoGene-1_0-st-v1.r3.cdf", >>>> packagename="mogene10stv1cdf", species="Mus_musculus") >>>> >>>> Regards, >>>> Peng >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >

Annotation cdf oligo Annotation cdf oligo • 1.2k views

ADD COMMENT • link updated 15.7 years ago by Steve Lianoglou ★ 13k • written 15.7 years ago by Peng Yu ▴ 950

0

Entering edit mode

Sean Davis 21k

@sean-davis-490

Last seen 10 weeks ago

United States

On Mon, Aug 10, 2009 at 9:30 PM, Peng Yu <pengyu.ut@gmail.com> wrote: > Hi Benilton, > > If both of scripts do not generate errors, does it mean that both of > them are correct? But the RMA results are very different, so one of > them must be wrong? > The probesets for the genest arrays are designed roughly against exons and as Benilton pointed out, the current behavior of oligo is to summarize the probesets. Given that there are >200k exons in the human genome, one should expect >200k rows in the rma-normalized data from oligo. Affy, on the other hand, uses the unofficial cdf that you downloaded and summarizes to transcripts. So, both are correct, but you are trying to compare apples to oranges. Sean > > Regards, > Peng > > On Mon, Aug 10, 2009 at 7:54 PM, Benilton Carvalho<bcarvalh@jhsph.edu> > wrote: > > For the Mouse Gene ST Array, 'oligo' needs the "pd.mogene.1.0.st.v1" > > package. If it didn't complain about it when reading the CEL files, it's > > because you already had it installed on your system. The next version of > > 'oligo' will be able to identify the required package (as it does > currently) > > and automatically download/install the required packages, if available on > > the BioC website. > > > > As a side note, please observe that the annotation packages used by > 'affy' > > are not compatible with those used by 'oligo', which are built via > > 'pdInfoBuilder'. > > > > b > > > > On Aug 10, 2009, at 9:27 PM, Peng Yu wrote: > > > >> Hi Benilton, > >> > >> One thing I don't understand isf why 'oligo' didin't require me to > >> install the .cdf file, but 'affy' required me to install the .cdf > >> file. I am curious that how 'oligo' figure how where the correct .cdf > >> file is and use it? > >> > >> Regards, > >> Peng > >> > >> On Mon, Aug 10, 2009 at 5:25 PM, Benilton Carvalho<bcarvalh@jhsph.edu> > >> wrote: > >>> > >>> Dear Peng, > >>> > >>> I can speak for oligo and the annotation package used by it. > >>> > >>> The current release of oligo summarizes to the probeset level. The next > >>> release of oligo and annotation packages will allow you to summarize to > >>> the > >>> gene level. In your particular case, the count you'll get is roughly > 35K. > >>> > >>> The updated packages have already been submitted to BioC and soon > should > >>> show up on the devel branch. > >>> > >>> Best wishes, > >>> > >>> b > >>> > >>> On Aug 10, 2009, at 6:39 PM, Peng Yu wrote: > >>> > >>>> Hi, > >>>> > >>>> I have run the two different R script to do RMA. Neither of them gives > >>>> me any error messages. However, the RMA results are very > >>>> different---they have very different number of lines. I don't know > >>>> which one I should believe. Or neither of them is correct. It might be > >>>> due to the difference in the cdf file used. Would you please point to > >>>> me how to figure out the problem? > >>>> > >>>> $ Rscript probe2expr_affy.R > >>>>> > >>>>> library(affy) > >>>> > >>>> Loading required package: Biobase > >>>> Loading required package: methods > >>>> > >>>> Welcome to Bioconductor > >>>> > >>>> Vignettes contain introductory material. To view, type > >>>> 'openVignette()'. To cite Bioconductor, see > >>>> 'citation("Biobase")' and for packages 'citation(pkgname)'. > >>>> > >>>>> Data <- ReadAffy() > >>>>> eset <- rma(Data) > >>>> > >>>> Background correcting > >>>> Normalizing > >>>> Calculating Expression > >>>>> > >>>>> write.exprs(eset, file="gene_expr_affy.txt", sep="\t") > >>>>> > >>>> > >>>> $ Rscript probe2expr_oligo.R > >>>>> > >>>>> library(oligo) > >>>> > >>>> Loading required package: oligoClasses > >>>> Loading required package: Biobase > >>>> Loading required package: methods > >>>> > >>>> Welcome to Bioconductor > >>>> > >>>> Vignettes contain introductory material. To view, type > >>>> 'openVignette()'. To cite Bioconductor, see > >>>> 'citation("Biobase")' and for packages 'citation(pkgname)'. > >>>> > >>>> Loading required package: preprocessCore > >>>> Welcome to oligo version 1.8.1 > >>>>> > >>>>> data<-read.celfiles(list.celfiles()) > >>>> > >>>> Loading required package: pd.mogene.1.0.st.v1 > >>>> Loading required package: RSQLite > >>>> Loading required package: DBI > >>>> Platform design info loaded. > >>>> Reading in : koA-mth_HZ_5238_MST1_19389.cel > >>>> Reading in : koB-mth_HZ_5238_MST1_19390.cel > >>>> Reading in : koC-mth_HZ_5238_MST1_19391.cel > >>>> Reading in : koD-mth_HZ_5238_MST1_19392.cel > >>>> Reading in : wt1-mth_HZ_5238_MST1_19385.cel > >>>> Reading in : wt2-mth_HZ_5238_MST1_19386.cel > >>>> Reading in : wt3-mth_HZ_5238_MST1_19387.cel > >>>> Reading in : wt4-mth_HZ_5238_MST1_19388.cel > >>>>> > >>>>> eset<-rma(data) > >>>> > >>>> Background correcting > >>>> Normalizing > >>>> Calculating Expression > >>>>> > >>>>> write.exprs(eset, file="gene_expr_oligo.txt", sep="\t") > >>>>> > >>>> > >>>> $ wc gene_expr_affy.txt gene_expr_oligo.txt > >>>> 34761 312848 5002519 gene_expr_affy.txt > >>>> 234591 2111318 33763075 gene_expr_oligo.txt > >>>> 269352 2424166 38765594 total > >>>> > >>>> BTW, before I run 'Rscript probe2expr_affy.R'. I downloaded > >>>> MoGene-1_0-st-v1.r3.unsupported-cdf and I run the following script. > >>>> Then, I install the generated package 'mogene10stv1cdf'. > >>>> $ cat make_cdf_package.R > >>>> library(makecdfenv) > >>>> make.cdf.package("MoGene-1_0-st-v1.r3.cdf", > >>>> packagename="mogene10stv1cdf", species="Mus_musculus") > >>>> > >>>> Regards, > >>>> Peng > >>>> > >>>> _______________________________________________ > >>>> Bioconductor mailing list > >>>> Bioconductor@stat.math.ethz.ch > >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>>> Search the archives: > >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >>> > >>> > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@stat.math.ethz.ch > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 15.7 years ago Sean Davis 21k

0

Entering edit mode

On Mon, Aug 10, 2009 at 9:22 PM, Sean Davis<seandavi at="" gmail.com=""> wrote: > > > On Mon, Aug 10, 2009 at 9:30 PM, Peng Yu <pengyu.ut at="" gmail.com=""> wrote: >> >> Hi Benilton, >> >> If both of scripts do not generate errors, does it mean that both of >> them are correct? But the RMA results are very different, so one of >> them must be wrong? > > The probesets for the genest arrays are designed roughly against exons and > as Benilton pointed out, the current behavior of oligo is to summarize the > probesets. ?Given that there are >200k exons in the human genome, one should > expect >200k rows in the rma-normalized data from oligo. ?Affy, on the other > hand, uses the unofficial cdf that you downloaded and summarizes to > transcripts. ?So, both are correct, but you are trying to compare apples to > oranges. I'm still confused what a probeset does not corresponds a transcript. Could you give me a formal definition on probeset? The book Bioinformatics and Computational Biology Solutions Using R and Bioconductor says "Affymetrix arrays typically use between 11 and 20 probe pairs, referred to as a probeset, for each gene." But I don't see a clear definition of probeset for all types of arrays listed on http://bioconductor.org/docs/workflows/oligoarrays/ Another question: since 'affy' can also process Affymetrix Gene ST Arrays, why only 'oligo' but not 'affy' is listed on the above workflow page? Regards, Peng

ADD REPLY • link 15.7 years ago Peng Yu ▴ 950

0

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 18 days ago

United States

Hi, On Aug 10, 2009, at 9:30 PM, Peng Yu wrote: > Hi Benilton, > > If both of scripts do not generate errors, does it mean that both of > them are correct? But the RMA results are very different, so one of > them must be wrong? Benilton, I think, has provided some hints which you can chew on for a bit. Sometimes it helps to digest some of the information and try to *really* figure out what it *means* instead of hoping to jump straight to an answer. Let's see if we can't do that with the information we have here: > On Mon, Aug 10, 2009 at 5:25 PM, Benilton Carvalho<bcarvalh at="" jhsph.edu=""> > wrote: >> >> Dear Peng, >> >> I can speak for oligo and the annotation package used by it. >> >> The current release of oligo summarizes to the probeset level. The >> next >> release of oligo and annotation packages will allow you to >> summarize to >> the >> gene level. In your particular case, the count you'll get is >> roughly 35K. He mentions summaries at "the probeset level," and then "the gene level." He also mentions that the upcoming release of oligo WILL allow you to summarize to THE GENE LEVEL, which I guess suggests that currently oligo is providing summaries at the probeset level. Once you can get GENE LEVEL summaries, you will get ROUGHLY 35K VALUES. Now, you say your files are very different, and that one (or both!) of these package must be wrong, since you've looked at this: $ wc gene_expr_affy.txt gene_expr_oligo.txt 34761 312848 5002519 gene_expr_affy.txt 234591 2111318 33763075 gene_expr_oligo.txt 269352 2424166 38765594 total It looks like your current oligo summary has many more numbers than your expr_affy file, which only has ROUGHLY 35K VALUES (oh wait, I think I've seen this number before). Now the question is: *what* are these two files telling you? Have you looked at their outputs, aside from the number of lines in each file? Do you see probes by the same names in each file? What is a probe set? How is it different from a gene? Since you have many more probeset level numbers than gene/transcript numbers, does that mean there are more than 1 probeset to a gene? Is that what this is saying here? http://www.affymetrix.com/support/help/faqs/mouse_430/faq_8.jsp Or this figure (and the rest of the publication), here?: http://www.biomedcentral.com/1471-2105/8/108/figure/F1 Has someone asked something like this on this mailing list before? https://stat.ethz.ch/pipermail/bioconductor/2009-January/025791.html I actually have no idea what your output files look like (I haven't used oligo before, actually), but I'm just trying to help put some pieces together. In the meantime, I see that Sean has provided you a direct answer, so kudos, but I'd just suggest taking a bit more of this work on yourself. I think it will help to make more sense of the answers you will inevitably get on this list. -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD COMMENT • link 15.7 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

On Mon, Aug 10, 2009 at 9:53 PM, Steve Lianoglou<mailinglist.honeypot at="" gmail.com=""> wrote: > Hi, > > On Aug 10, 2009, at 9:30 PM, Peng Yu wrote: > >> Hi Benilton, >> >> If both of scripts do not generate errors, does it mean that both of >> them are correct? But the RMA results are very different, so one of >> them must be wrong? > > Benilton, I think, has provided some hints which you can chew on for a bit. > Sometimes it helps to digest some of the information and try to *really* > figure out what it *means* instead of hoping to jump straight to an answer. > Let's see if we can't do that with the information we have here: > >> On Mon, Aug 10, 2009 at 5:25 PM, Benilton Carvalho<bcarvalh at="" jhsph.edu=""> >> wrote: >>> >>> Dear Peng, >>> >>> I can speak for oligo and the annotation package used by it. >>> >>> The current release of oligo summarizes to the probeset level. The next >>> release of oligo and annotation packages will allow you to summarize to >>> the >>> gene level. In your particular case, the count you'll get is roughly 35K. > > He mentions summaries at "the probeset level," and then "the gene level." He > also mentions that the upcoming release of oligo WILL allow you to summarize > to THE GENE LEVEL, which I guess suggests that currently oligo is providing > summaries at the probeset level. Once you can get GENE LEVEL summaries, you > will get ROUGHLY 35K VALUES. > > Now, you say your files are very different, and that one (or both!) of these > package must be wrong, since you've looked at this: > > $ wc gene_expr_affy.txt gene_expr_oligo.txt > 34761 ? 312848 ?5002519 gene_expr_affy.txt > 234591 ?2111318 33763075 gene_expr_oligo.txt > 269352 ?2424166 38765594 total > > It looks like your current oligo summary has many more numbers than your > expr_affy file, which only has ROUGHLY 35K VALUES (oh wait, I think I've > seen this number before). > > Now the question is: *what* are these two files telling you? > Have you looked at their outputs, aside from the number of lines in each > file? > Do you see probes by the same names in each file? Yes. I looked at these two files. I randomly pick a "Transcript ID" (for example 10571312) from the analysis output of the same set of CEL files from a third party software. I found that number in 'gene_expr_affy.txt' but not in 'gene_expr_oligo.txt'. And all the IDs in both files are numbers (no letters and underscore). > What is a probe set? How is it different from a gene? Since you have many > more probeset level numbers than gene/transcript numbers, does that mean > there are more than 1 probeset to a gene? Is that what this is saying here? > http://www.affymetrix.com/support/help/faqs/mouse_430/faq_8.jsp The above webpage says "Probes in a gene family probe set (_a set) all cross-hybridize to the same set of sequences that belong to the same gene family (i.e. having same name in the "geneCluster" column)." But I don't understand what the word "sequence" means in this context? Does it mean a transcript? > Or this figure (and the rest of the publication), here?: > http://www.biomedcentral.com/1471-2105/8/108/figure/F1 > > Has someone asked something like this on this mailing list before? > https://stat.ethz.ch/pipermail/bioconductor/2009-January/025791.html This thread does not provide very useful information. > I actually have no idea what your output files look like (I haven't used > oligo before, actually), but I'm just trying to help put some pieces > together. Thank you for collecting the links. > In the meantime, I see that Sean has provided you a direct answer, so kudos, > but I'd just suggest taking a bit more of this work on yourself. I think it > will help to make more sense of the answers you will inevitably get on this > list. What does "kudos" mean? Regards, Peng

ADD REPLY • link 15.7 years ago Peng Yu ▴ 950

0

Entering edit mode

On Aug 10, 2009, at 11:24 PM, Peng Yu wrote: > On Mon, Aug 10, 2009 at 9:53 PM, Steve > Lianoglou<mailinglist.honeypot at="" gmail.com=""> wrote: [snip] >> What is a probe set? How is it different from a gene? Since you >> have many >> more probeset level numbers than gene/transcript numbers, does that >> mean >> there are more than 1 probeset to a gene? Is that what this is >> saying here? >> http://www.affymetrix.com/support/help/faqs/mouse_430/faq_8.jsp > > The above webpage says "Probes in a gene family probe set (_a set) all > cross-hybridize to the same set of > sequences that belong to the same gene family (i.e. having same name > in the > "geneCluster" column)." But I don't understand what the word > "sequence" means in this context? Look further below: does the diagram make it any more clear? When the text says that the probes are (cross-)hybridizing to *some* sequence, the sequence being referred to is the input cDNA from your sample that you are testing on your microarray. > Does it mean a transcript? The text is explaining different ways of how a probe might hybridize uniquely (or not) to something in the genome that's under test. Unfortunately not every probe on a microarray hybridizes to exactly one place in the genome, so there is some ambiguity that arises in your signal that you should understand/appreciate. >> Or this figure (and the rest of the publication), here?: >> http://www.biomedcentral.com/1471-2105/8/108/figure/F1 >> >> Has someone asked something like this on this mailing list before? >> https://stat.ethz.ch/pipermail/bioconductor/2009-January/025791.html > > This thread does not provide very useful information. Are you sure? The poster is asking how to get probe level data (what you are getting from the oligo package) from using rma in the affy package. There's even code in a separate message in that thread: https://stat.ethz.ch/pipermail/bioconductor/attachments/20090112/cc275 d96/attachment.pl See where it says "If you need normalized but unsummarized probe level intensities ..." and the code that follows? If you use something similar to the code there, will the affy "rma" recapitulate something more like what you're seeing the oligo::rma? (I don't know, just asking). >> In the meantime, I see that Sean has provided you a direct answer, >> so kudos, >> but I'd just suggest taking a bit more of this work on yourself. I >> think it >> will help to make more sense of the answers you will inevitably get >> on this >> list. > > What does "kudos" mean? Sorry, it's a bit of an english colloquialism that means something like "congratulations" http://dictionary.reference.com/browse/kudos -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD REPLY • link 15.7 years ago Steve Lianoglou ★ 13k

Login before adding your answer.