Is GCRMA influenced by lexicographical order?

0

Entering edit mode

markus.boenn ▴ 50

@markusboenn-6784

Last seen 10.2 years ago

European Union

Dear all, I've made some experiment about GCRMA using bg.adjust.gcrma() contained in the gcrma package. I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I rename both files in a way, which changes the lexicographical order, say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II). Using bg.adjust.gcrma() and exprs() to obtain the expression values, for some probe sets I get different values for A.cel (in CASE I) than for BA.cel (in CASE II), although both files contain the same data. Of course, for B.cel and AB.cel the same phenomenon becomes obvious. How can this be possible? To me, it seems as if GCRMA normalizes the data with respect to the lexicographical order, but as far as I know, GCRMA treats each array independently. This effect is not reproducible using call.exprs() with argument algorithm="rma" or "mas5". But is reproducible for other arrays like HGU95A, for example. Please, can anybody try to explain me, if this effect is a bug or a feature (and why)? Best wishes, Markus ################################################################# sessionInfo() R version 2.12.0 (2010-10-15) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 [4] simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0 Biobase_2.10.0 loaded via a namespace (and not attached): [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2 affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0 survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6 Also tried on other machine with similar packages sessionInfo() R version 2.11.1 Patched (2010-09-16 r52943) Platform: i686-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8 [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1 [5] Biobase_2.8.0 loaded via a namespace (and not attached): [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2 [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17 [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1 [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6 ################################################################# Code example setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel CEL <- ReadAffy() bag.1 <- bg.adjust.gcrma(CEL) ebag.1 <- exprs(bag.1) setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel CEL <- ReadAffy() bag.2 <- bg.adjust.gcrma(CEL) ebag.2 <- exprs(bag.2) par(mfrow=c(2,1)) # differences should be zero plot(ebag.1[,1]-ebag.2[,2]) plot(ebag.1[,2]-ebag.2[,1])

probe gcrma probe gcrma • 1.9k views

ADD COMMENT • link updated 14.1 years ago by Wolfgang Huber ★ 13k • written 14.1 years ago by markus.boenn ▴ 50

0

Entering edit mode

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 3 months ago

EMBL European Molecular Biology Laborat…

Hi Markus, Jean or Rafa will be able to give a more competent response, but as far as I can see from the code, there are some places where the first array is treated specially: ~/madman/Rpacks/gcrma/R$ grep ",1]" * gcrma.R: index2=which(!is.na(anc[,1])) gcrma.engine2.R: index.affinities <- which(!is.na(pm.affinities[,1])) justGCRMA.R: mm <- read.probematrix(filenames=filenames[i], which="mm")$mm[,1] justGCRMA.R: mm <- read.probematrix(filenames=filenames[i], which="mm")$mm[,1] I agree that this is not the most desirable feature. Best wishes Wolfgang Il Nov/8/10 10:19 AM, Markus Boenn ha scritto: > > Dear all, > > I've made some experiment about GCRMA using bg.adjust.gcrma() contained > in the gcrma package. > > I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of > the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I > rename both files in a way, which changes the lexicographical order, > say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II). > > Using bg.adjust.gcrma() and exprs() to obtain the expression values, for > some probe sets I get different values for A.cel (in CASE I) than for > BA.cel (in CASE II), although both files contain the same data. Of > course, for B.cel and AB.cel the same phenomenon becomes obvious. > > How can this be possible? To me, it seems as if GCRMA normalizes the > data with respect to the lexicographical order, but as far as I know, > GCRMA treats each array independently. > > This effect is not reproducible using call.exprs() with argument > algorithm="rma" or "mas5". But is reproducible for other arrays like > HGU95A, for example. > > Please, can anybody try to explain me, if this effect is a bug or a > feature (and why)? > > Best wishes, > Markus > > > > ################################################################# > > sessionInfo() > R version 2.12.0 (2010-10-15) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > other attached packages: > [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 [4] > simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0 > Biobase_2.10.0 > loaded via a namespace (and not attached): > [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2 > affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0 > survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6 > > Also tried on other machine with similar packages > > sessionInfo() > R version 2.11.1 Patched (2010-09-16 r52943) > Platform: i686-pc-linux-gnu (32-bit) > > locale: > [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C > [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 > [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8 > [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1 > [5] Biobase_2.8.0 > > loaded via a namespace (and not attached): > [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2 > [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17 > [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1 > [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6 > > > ################################################################# > > Code example > > setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel > CEL <- ReadAffy() > bag.1 <- bg.adjust.gcrma(CEL) > ebag.1 <- exprs(bag.1) > > setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel > CEL <- ReadAffy() > bag.2 <- bg.adjust.gcrma(CEL) > ebag.2 <- exprs(bag.2) > > par(mfrow=c(2,1)) > # differences should be zero > plot(ebag.1[,1]-ebag.2[,2]) > plot(ebag.1[,2]-ebag.2[,1]) > > > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 14.1 years ago Wolfgang Huber ★ 13k

0

Entering edit mode

Hi, It seem to be more correct to say that it depends on the ordering of the arrays loaded, which is in turn by default loaded in lexicographic ordering. On Mon, Nov 8, 2010 at 7:50 AM, Wolfgang Huber <whuber at="" embl.de=""> wrote: > Hi Markus, > > Jean or Rafa will be able to give a more competent response, but as far as I > can see from the code, there are some places where the first array is > treated specially: > > ~/madman/Rpacks/gcrma/R$ grep ",1]" * > gcrma.R: ? ? ?index2=which(!is.na(anc[,1])) > gcrma.engine2.R: ? ?index.affinities <- which(!is.na(pm.affinities[,1])) > justGCRMA.R: ? ? ? mm <- read.probematrix(filenames=filenames[i], > which="mm")$mm[,1] > justGCRMA.R: ? ?mm <- read.probematrix(filenames=filenames[i], > which="mm")$mm[,1] The latter to lines only reads one array, so there [,1] is just to pull out the signals, I think. Note also that the GCRMA algorithm is using a random subset to estimate some the parameters, which could affect this too. However, looking at the devel code for GCRMA is see that there are "set.seed(1)" calls before each sample() call, i.e. they are no longer "random subsets". I though this was discussed a long time ago and all fixing of the seeds were dropped, but now it's back again. Is my bad memory playing me a trick? /H > > I agree that this is not the most desirable feature. > > Best wishes > ? ? ? ?Wolfgang > > > Il Nov/8/10 10:19 AM, Markus Boenn ha scritto: >> >> Dear all, >> >> I've made some experiment about GCRMA using bg.adjust.gcrma() contained >> in the gcrma package. >> >> I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of >> the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I >> rename both files in a way, which changes the lexicographical order, >> say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II). >> >> Using bg.adjust.gcrma() and exprs() to obtain the expression values, for >> some probe sets I get different values for A.cel (in CASE I) than for >> BA.cel (in CASE II), although both files contain the same data. Of >> course, for B.cel and AB.cel the same phenomenon becomes obvious. >> >> How can this be possible? To me, it seems as if GCRMA normalizes the >> data with respect to the lexicographical order, but as far as I know, >> GCRMA treats each array independently. >> >> This effect is not reproducible using call.exprs() with argument >> algorithm="rma" or "mas5". But is reproducible for other arrays like >> HGU95A, for example. >> >> Please, can anybody try to explain me, if this effect is a bug or a >> feature (and why)? >> >> Best wishes, >> Markus >> >> >> >> ################################################################# >> >> sessionInfo() >> R version 2.12.0 (2010-10-15) >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> locale: >> [1] C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> other attached packages: >> [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 [4] >> simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0 >> Biobase_2.10.0 >> loaded via a namespace (and not attached): >> [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2 >> affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0 >> survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6 >> >> Also tried on other machine with similar packages >> >> sessionInfo() >> R version 2.11.1 Patched (2010-09-16 r52943) >> Platform: i686-pc-linux-gnu (32-bit) >> >> locale: >> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 >> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8 >> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1 >> [5] Biobase_2.8.0 >> >> loaded via a namespace (and not attached): >> [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2 >> [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17 >> [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1 >> [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6 >> >> >> ################################################################# >> >> Code example >> >> setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel >> CEL <- ReadAffy() >> bag.1 <- bg.adjust.gcrma(CEL) >> ebag.1 <- exprs(bag.1) >> >> setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel >> CEL <- ReadAffy() >> bag.2 <- bg.adjust.gcrma(CEL) >> ebag.2 <- exprs(bag.2) >> >> par(mfrow=c(2,1)) >> # differences should be zero >> plot(ebag.1[,1]-ebag.2[,2]) >> plot(ebag.1[,2]-ebag.2[,1]) >> >> >> >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD REPLY • link 14.1 years ago Henrik Bengtsson ★ 2.4k

0

Entering edit mode

Hi Henrik It's not your bad memory. We had used a seed in estimating bg parameters but later avoided random sampling by sampling fixed quantiles from the affinities. On 11/8/2010 11:15 AM, Henrik Bengtsson wrote: > Hi, > > It seem to be more correct to say that it depends on the ordering of > the arrays loaded, which is in turn by default loaded in lexicographic > ordering. > > On Mon, Nov 8, 2010 at 7:50 AM, Wolfgang Huber<whuber at="" embl.de=""> wrote: >> Hi Markus, >> >> Jean or Rafa will be able to give a more competent response, but as far as I >> can see from the code, there are some places where the first array is >> treated specially: >> >> ~/madman/Rpacks/gcrma/R$ grep ",1]" * >> gcrma.R: index2=which(!is.na(anc[,1])) >> gcrma.engine2.R: index.affinities<- which(!is.na(pm.affinities[,1])) >> justGCRMA.R: mm<- read.probematrix(filenames=filenames[i], >> which="mm")$mm[,1] >> justGCRMA.R: mm<- read.probematrix(filenames=filenames[i], >> which="mm")$mm[,1] > > The latter to lines only reads one array, so there [,1] is just to > pull out the signals, I think. > > > Note also that the GCRMA algorithm is using a random subset to > estimate some the parameters, which could affect this too. However, > looking at the devel code for GCRMA is see that there are > "set.seed(1)" calls before each sample() call, i.e. they are no longer > "random subsets". > > I though this was discussed a long time ago and all fixing of the > seeds were dropped, but now it's back again. Is my bad memory playing > me a trick? > > /H > >> >> I agree that this is not the most desirable feature. >> >> Best wishes >> Wolfgang >> >> >> Il Nov/8/10 10:19 AM, Markus Boenn ha scritto: >>> >>> Dear all, >>> >>> I've made some experiment about GCRMA using bg.adjust.gcrma() contained >>> in the gcrma package. >>> >>> I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of >>> the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I >>> rename both files in a way, which changes the lexicographical order, >>> say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II). >>> >>> Using bg.adjust.gcrma() and exprs() to obtain the expression values, for >>> some probe sets I get different values for A.cel (in CASE I) than for >>> BA.cel (in CASE II), although both files contain the same data. Of >>> course, for B.cel and AB.cel the same phenomenon becomes obvious. >>> >>> How can this be possible? To me, it seems as if GCRMA normalizes the >>> data with respect to the lexicographical order, but as far as I know, >>> GCRMA treats each array independently. >>> >>> This effect is not reproducible using call.exprs() with argument >>> algorithm="rma" or "mas5". But is reproducible for other arrays like >>> HGU95A, for example. >>> >>> Please, can anybody try to explain me, if this effect is a bug or a >>> feature (and why)? >>> >>> Best wishes, >>> Markus >>> >>> >>> >>> ################################################################# >>> >>> sessionInfo() >>> R version 2.12.0 (2010-10-15) >>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >>> >>> locale: >>> [1] C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> other attached packages: >>> [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 [4] >>> simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0 >>> Biobase_2.10.0 >>> loaded via a namespace (and not attached): >>> [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2 >>> affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0 >>> survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6 >>> >>> Also tried on other machine with similar packages >>> >>> sessionInfo() >>> R version 2.11.1 Patched (2010-09-16 r52943) >>> Platform: i686-pc-linux-gnu (32-bit) >>> >>> locale: >>> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 >>> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8 >>> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1 >>> [5] Biobase_2.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2 >>> [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17 >>> [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1 >>> [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6 >>> >>> >>> ################################################################# >>> >>> Code example >>> >>> setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel >>> CEL<- ReadAffy() >>> bag.1<- bg.adjust.gcrma(CEL) >>> ebag.1<- exprs(bag.1) >>> >>> setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel >>> CEL<- ReadAffy() >>> bag.2<- bg.adjust.gcrma(CEL) >>> ebag.2<- exprs(bag.2) >>> >>> par(mfrow=c(2,1)) >>> # differences should be zero >>> plot(ebag.1[,1]-ebag.2[,2]) >>> plot(ebag.1[,2]-ebag.2[,1]) >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- ------------------------------------ Zhijin (Jean) Wu Assistant Professor of Biostatistics Brown University, Box G-S121 Providence, RI 02912 Tel: 401 863 1230 Fax: 401 863 9182 http://www.stat.brown.edu/zwu

ADD REPLY • link 14.1 years ago Zhijin Wu ▴ 260

0

Entering edit mode

On 11/8/2010 10:50 AM, Wolfgang Huber wrote: > Hi Markus, > > Jean or Rafa will be able to give a more competent response, but as far > as I can see from the code, there are some places where the first array > is treated specially: > > ~/madman/Rpacks/gcrma/R$ grep ",1]" * > gcrma.R: index2=which(!is.na(anc[,1])) > gcrma.engine2.R: index.affinities <- which(!is.na(pm.affinities[,1])) Those two "[,1] " in the code is actually "without loss of generality" reference. Affinities are missing if the probe sequence information is missing, and if it does it would be for all arrays. Thus by finding out which ones are missing in any array suffice -- and since there will always be at least one array, we did [,1] instead of [,10] or any other integer. I will test it and see if I can reproduce your problem and see where the discrepancy could arise. Best, Jean > Il Nov/8/10 10:19 AM, Markus Boenn ha scritto: >> >> Dear all, >> >> I've made some experiment about GCRMA using bg.adjust.gcrma() contained >> in the gcrma package. >> >> I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of >> the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I >> rename both files in a way, which changes the lexicographical order, >> say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II). >> >> Using bg.adjust.gcrma() and exprs() to obtain the expression values, for >> some probe sets I get different values for A.cel (in CASE I) than for >> BA.cel (in CASE II), although both files contain the same data. Of >> course, for B.cel and AB.cel the same phenomenon becomes obvious. >> >> How can this be possible? To me, it seems as if GCRMA normalizes the >> data with respect to the lexicographical order, but as far as I know, >> GCRMA treats each array independently. >> >> This effect is not reproducible using call.exprs() with argument >> algorithm="rma" or "mas5". But is reproducible for other arrays like >> HGU95A, for example. >> >> Please, can anybody try to explain me, if this effect is a bug or a >> feature (and why)? >> >> Best wishes, >> Markus >> >> >> >> ################################################################# >> >> sessionInfo() >> R version 2.12.0 (2010-10-15) >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> locale: >> [1] C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> other attached packages: >> [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 [4] >> simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0 >> Biobase_2.10.0 >> loaded via a namespace (and not attached): >> [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2 >> affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0 >> survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6 >> >> Also tried on other machine with similar packages >> >> sessionInfo() >> R version 2.11.1 Patched (2010-09-16 r52943) >> Platform: i686-pc-linux-gnu (32-bit) >> >> locale: >> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 >> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8 >> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1 >> [5] Biobase_2.8.0 >> >> loaded via a namespace (and not attached): >> [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2 >> [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17 >> [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1 >> [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6 >> >> >> ################################################################# >> >> Code example >> >> setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel >> CEL <- ReadAffy() >> bag.1 <- bg.adjust.gcrma(CEL) >> ebag.1 <- exprs(bag.1) >> >> setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel >> CEL <- ReadAffy() >> bag.2 <- bg.adjust.gcrma(CEL) >> ebag.2 <- exprs(bag.2) >> >> par(mfrow=c(2,1)) >> # differences should be zero >> plot(ebag.1[,1]-ebag.2[,2]) >> plot(ebag.1[,2]-ebag.2[,1]) >> >> >> >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- ------------------------------------ Zhijin (Jean) Wu Assistant Professor of Biostatistics Brown University, Box G-S121 Providence, RI 02912 Tel: 401 863 1230 Fax: 401 863 9182 http://www.stat.brown.edu/zwu

ADD REPLY • link 14.1 years ago Zhijin Wu ▴ 260

0

Entering edit mode

On Mon, Nov 8, 2010 at 8:13 AM, Zhijin Wu <zwu at="" stat.brown.edu=""> wrote: > > On 11/8/2010 10:50 AM, Wolfgang Huber wrote: >> >> Hi Markus, >> >> Jean or Rafa will be able to give a more competent response, but as far >> as I can see from the code, there are some places where the first array >> is treated specially: >> >> ~/madman/Rpacks/gcrma/R$ grep ",1]" * >> gcrma.R: index2=which(!is.na(anc[,1])) >> gcrma.engine2.R: index.affinities <- which(!is.na(pm.affinities[,1])) > > Those two "[,1] " in the code is actually "without loss of generality" > reference. Affinities are missing if the probe sequence information is > missing, and if it does it would be for all arrays. Thus by finding out > which ones are missing in any array suffice -- and since there will always > be at least one array, we did [,1] instead of [,10] or any other integer. A safer strategy would be to identify it across *all* arrays (using OR logic), in case the is.na() is not the same for each array (which could happen under certain cases). /Henrik > > I will test it and see if I can reproduce your problem and see where the > discrepancy could arise. > > Best, > Jean > > > >> Il Nov/8/10 10:19 AM, Markus Boenn ha scritto: >>> >>> Dear all, >>> >>> I've made some experiment about GCRMA using bg.adjust.gcrma() contained >>> in the gcrma package. >>> >>> I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of >>> the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I >>> rename both files in a way, which changes the lexicographical order, >>> say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II). >>> >>> Using bg.adjust.gcrma() and exprs() to obtain the expression values, for >>> some probe sets I get different values for A.cel (in CASE I) than for >>> BA.cel (in CASE II), although both files contain the same data. Of >>> course, for B.cel and AB.cel the same phenomenon becomes obvious. >>> >>> How can this be possible? To me, it seems as if GCRMA normalizes the >>> data with respect to the lexicographical order, but as far as I know, >>> GCRMA treats each array independently. >>> >>> This effect is not reproducible using call.exprs() with argument >>> algorithm="rma" or "mas5". But is reproducible for other arrays like >>> HGU95A, for example. >>> >>> Please, can anybody try to explain me, if this effect is a bug or a >>> feature (and why)? >>> >>> Best wishes, >>> Markus >>> >>> >>> >>> ################################################################# >>> >>> sessionInfo() >>> R version 2.12.0 (2010-10-15) >>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >>> >>> locale: >>> [1] C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> other attached packages: >>> [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 [4] >>> simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0 >>> Biobase_2.10.0 >>> loaded via a namespace (and not attached): >>> [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2 >>> affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0 >>> survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6 >>> >>> Also tried on other machine with similar packages >>> >>> sessionInfo() >>> R version 2.11.1 Patched (2010-09-16 r52943) >>> Platform: i686-pc-linux-gnu (32-bit) >>> >>> locale: >>> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 >>> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8 >>> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1 >>> [5] Biobase_2.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2 >>> [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17 >>> [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1 >>> [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6 >>> >>> >>> ################################################################# >>> >>> Code example >>> >>> setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel >>> CEL <- ReadAffy() >>> bag.1 <- bg.adjust.gcrma(CEL) >>> ebag.1 <- exprs(bag.1) >>> >>> setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel >>> CEL <- ReadAffy() >>> bag.2 <- bg.adjust.gcrma(CEL) >>> ebag.2 <- exprs(bag.2) >>> >>> par(mfrow=c(2,1)) >>> # differences should be zero >>> plot(ebag.1[,1]-ebag.2[,2]) >>> plot(ebag.1[,2]-ebag.2[,1]) >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > ------------------------------------ > Zhijin (Jean) Wu > Assistant Professor of Biostatistics > Brown University, Box G-S121 > Providence, RI ?02912 > > Tel: 401 863 1230 > Fax: 401 863 9182 > http://www.stat.brown.edu/zwu > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD REPLY • link 14.1 years ago Henrik Bengtsson ★ 2.4k

0

Entering edit mode

just set the random seed to the same value prior to running bg.correct.gcrma() and you should be fine... ## setwd(CASEI) set.seed(1) CEL = ReadAffy() ebag.1 = exprs(bg.adjust.gcrma(CEL)) ## setwd(CASEII) set.seed(1) CEL = ReadAffy() ebag.2 = exprs(bg.adjust.gcrma(CEL)) b On 8 November 2010 15:50, Wolfgang Huber <whuber at="" embl.de=""> wrote: > Hi Markus, > > Jean or Rafa will be able to give a more competent response, but as far as I > can see from the code, there are some places where the first array is > treated specially: > > ~/madman/Rpacks/gcrma/R$ grep ",1]" * > gcrma.R: ? ? ?index2=which(!is.na(anc[,1])) > gcrma.engine2.R: ? ?index.affinities <- which(!is.na(pm.affinities[,1])) > justGCRMA.R: ? ? ? mm <- read.probematrix(filenames=filenames[i], > which="mm")$mm[,1] > justGCRMA.R: ? ?mm <- read.probematrix(filenames=filenames[i], > which="mm")$mm[,1] > > I agree that this is not the most desirable feature. > > Best wishes > ? ? ? ?Wolfgang > > > Il Nov/8/10 10:19 AM, Markus Boenn ha scritto: >> >> Dear all, >> >> I've made some experiment about GCRMA using bg.adjust.gcrma() contained >> in the gcrma package. >> >> I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of >> the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I >> rename both files in a way, which changes the lexicographical order, >> say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II). >> >> Using bg.adjust.gcrma() and exprs() to obtain the expression values, for >> some probe sets I get different values for A.cel (in CASE I) than for >> BA.cel (in CASE II), although both files contain the same data. Of >> course, for B.cel and AB.cel the same phenomenon becomes obvious. >> >> How can this be possible? To me, it seems as if GCRMA normalizes the >> data with respect to the lexicographical order, but as far as I know, >> GCRMA treats each array independently. >> >> This effect is not reproducible using call.exprs() with argument >> algorithm="rma" or "mas5". But is reproducible for other arrays like >> HGU95A, for example. >> >> Please, can anybody try to explain me, if this effect is a bug or a >> feature (and why)? >> >> Best wishes, >> Markus >> >> >> >> ################################################################# >> >> sessionInfo() >> R version 2.12.0 (2010-10-15) >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> locale: >> [1] C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> other attached packages: >> [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 [4] >> simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0 >> Biobase_2.10.0 >> loaded via a namespace (and not attached): >> [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2 >> affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0 >> survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6 >> >> Also tried on other machine with similar packages >> >> sessionInfo() >> R version 2.11.1 Patched (2010-09-16 r52943) >> Platform: i686-pc-linux-gnu (32-bit) >> >> locale: >> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 >> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8 >> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1 >> [5] Biobase_2.8.0 >> >> loaded via a namespace (and not attached): >> [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2 >> [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17 >> [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1 >> [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6 >> >> >> ################################################################# >> >> Code example >> >> setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel >> CEL <- ReadAffy() >> bag.1 <- bg.adjust.gcrma(CEL) >> ebag.1 <- exprs(bag.1) >> >> setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel >> CEL <- ReadAffy() >> bag.2 <- bg.adjust.gcrma(CEL) >> ebag.2 <- exprs(bag.2) >> >> par(mfrow=c(2,1)) >> # differences should be zero >> plot(ebag.1[,1]-ebag.2[,2]) >> plot(ebag.1[,2]-ebag.2[,1]) >> >> >> >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD REPLY • link 14.1 years ago Benilton Carvalho ★ 4.3k

0

Entering edit mode

On Mon, Nov 8, 2010 at 8:10 AM, Benilton Carvalho <beniltoncarvalho at="" gmail.com=""> wrote: > just set the random seed to the same value prior to running > bg.correct.gcrma() and you should be fine... > > ## setwd(CASEI) > set.seed(1) > CEL = ReadAffy() > ebag.1 = exprs(bg.adjust.gcrma(CEL)) > ## setwd(CASEII) > set.seed(1) > CEL = ReadAffy() > ebag.2 = exprs(bg.adjust.gcrma(CEL)) I don't think that works, because of how subsets used for estimates are drawn from all arrays, e.g. Subset <- sample(1:length(as.matrix(pms)[index.affinities,]),25000) y <- log2(pms)[index.affinities,][Subset] It samples 25,000 signals out of all probes across all arrays - note, not the same probes across all arrays. Thus, the ordering of the arrays (the columns) in 'pms' matters. Basically, by design of the estimators, that is due to the random sampling in GCRMA, one cannot/should not treat GCRMA to give identical estimates and signals from run to run. Any comparison between multiple runs should be done allowing for some deviation where the assumption is that it gives almost the same results from run to run. The reintroduction of "set.seed(1)" in the internal GCRMA code, makes this less clear and a bit confusing. In aroma.affymetrix we have ported GCRMA partly by reimplementing GCRMA and partly by using wrappers. Part of the redundancy testing is to validate the correctness. See http://aroma-project.org/replication/gcRMA which show how closely we reproduce the results. Those results are without fixing the seed, meaning that they illustrate the amount of randomness you should expect in the estimates/GCRMA signals. /Henrik > > b > > On 8 November 2010 15:50, Wolfgang Huber <whuber at="" embl.de=""> wrote: >> Hi Markus, >> >> Jean or Rafa will be able to give a more competent response, but as far as I >> can see from the code, there are some places where the first array is >> treated specially: >> >> ~/madman/Rpacks/gcrma/R$ grep ",1]" * >> gcrma.R: ? ? ?index2=which(!is.na(anc[,1])) >> gcrma.engine2.R: ? ?index.affinities <- which(!is.na(pm.affinities[,1])) >> justGCRMA.R: ? ? ? mm <- read.probematrix(filenames=filenames[i], >> which="mm")$mm[,1] >> justGCRMA.R: ? ?mm <- read.probematrix(filenames=filenames[i], >> which="mm")$mm[,1] >> >> I agree that this is not the most desirable feature. >> >> Best wishes >> ? ? ? ?Wolfgang >> >> >> Il Nov/8/10 10:19 AM, Markus Boenn ha scritto: >>> >>> Dear all, >>> >>> I've made some experiment about GCRMA using bg.adjust.gcrma() contained >>> in the gcrma package. >>> >>> I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of >>> the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I >>> rename both files in a way, which changes the lexicographical order, >>> say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II). >>> >>> Using bg.adjust.gcrma() and exprs() to obtain the expression values, for >>> some probe sets I get different values for A.cel (in CASE I) than for >>> BA.cel (in CASE II), although both files contain the same data. Of >>> course, for B.cel and AB.cel the same phenomenon becomes obvious. >>> >>> How can this be possible? To me, it seems as if GCRMA normalizes the >>> data with respect to the lexicographical order, but as far as I know, >>> GCRMA treats each array independently. >>> >>> This effect is not reproducible using call.exprs() with argument >>> algorithm="rma" or "mas5". But is reproducible for other arrays like >>> HGU95A, for example. >>> >>> Please, can anybody try to explain me, if this effect is a bug or a >>> feature (and why)? >>> >>> Best wishes, >>> Markus >>> >>> >>> >>> ################################################################# >>> >>> sessionInfo() >>> R version 2.12.0 (2010-10-15) >>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >>> >>> locale: >>> [1] C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> other attached packages: >>> [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 [4] >>> simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0 >>> Biobase_2.10.0 >>> loaded via a namespace (and not attached): >>> [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2 >>> affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0 >>> survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6 >>> >>> Also tried on other machine with similar packages >>> >>> sessionInfo() >>> R version 2.11.1 Patched (2010-09-16 r52943) >>> Platform: i686-pc-linux-gnu (32-bit) >>> >>> locale: >>> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 >>> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8 >>> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1 >>> [5] Biobase_2.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2 >>> [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17 >>> [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1 >>> [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6 >>> >>> >>> ################################################################# >>> >>> Code example >>> >>> setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel >>> CEL <- ReadAffy() >>> bag.1 <- bg.adjust.gcrma(CEL) >>> ebag.1 <- exprs(bag.1) >>> >>> setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel >>> CEL <- ReadAffy() >>> bag.2 <- bg.adjust.gcrma(CEL) >>> ebag.2 <- exprs(bag.2) >>> >>> par(mfrow=c(2,1)) >>> # differences should be zero >>> plot(ebag.1[,1]-ebag.2[,2]) >>> plot(ebag.1[,2]-ebag.2[,1]) >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD REPLY • link 14.1 years ago Henrik Bengtsson ★ 2.4k

0

Entering edit mode

Henrik is correct that the non-identical result is due to the sampling of the subsets in adjusting for specific binding (GSB.adj). The discrepancy I get from re-ordering the arrays on some human arrays I have is ranged between -.004 and 0.006. However I can change the code so that one does get absolutely identical result on the same data set. Jean On 11/8/2010 11:44 AM, Henrik Bengtsson wrote: > On Mon, Nov 8, 2010 at 8:10 AM, Benilton Carvalho > <beniltoncarvalho at="" gmail.com=""> wrote: >> just set the random seed to the same value prior to running >> bg.correct.gcrma() and you should be fine... >> >> ## setwd(CASEI) >> set.seed(1) >> CEL = ReadAffy() >> ebag.1 = exprs(bg.adjust.gcrma(CEL)) >> ## setwd(CASEII) >> set.seed(1) >> CEL = ReadAffy() >> ebag.2 = exprs(bg.adjust.gcrma(CEL)) > > I don't think that works, because of how subsets used for estimates > are drawn from all arrays, e.g. > > Subset<- sample(1:length(as.matrix(pms)[index.affinities,]),25000) > y<- log2(pms)[index.affinities,][Subset] > > It samples 25,000 signals out of all probes across all arrays - note, > not the same probes across all arrays. Thus, the ordering of the > arrays (the columns) in 'pms' matters. > > Basically, by design of the estimators, that is due to the random > sampling in GCRMA, one cannot/should not treat GCRMA to give identical > estimates and signals from run to run. Any comparison between > multiple runs should be done allowing for some deviation where the > assumption is that it gives almost the same results from run to run. > The reintroduction of "set.seed(1)" in the internal GCRMA code, makes > this less clear and a bit confusing. > > In aroma.affymetrix we have ported GCRMA partly by reimplementing > GCRMA and partly by using wrappers. Part of the redundancy testing is > to validate the correctness. See > > http://aroma-project.org/replication/gcRMA > > which show how closely we reproduce the results. Those results are > without fixing the seed, meaning that they illustrate the amount of > randomness you should expect in the estimates/GCRMA signals. > > /Henrik > >> >> b >> >> On 8 November 2010 15:50, Wolfgang Huber<whuber at="" embl.de=""> wrote: >>> Hi Markus, >>> >>> Jean or Rafa will be able to give a more competent response, but as far as I >>> can see from the code, there are some places where the first array is >>> treated specially: >>> >>> ~/madman/Rpacks/gcrma/R$ grep ",1]" * >>> gcrma.R: index2=which(!is.na(anc[,1])) >>> gcrma.engine2.R: index.affinities<- which(!is.na(pm.affinities[,1])) >>> justGCRMA.R: mm<- read.probematrix(filenames=filenames[i], >>> which="mm")$mm[,1] >>> justGCRMA.R: mm<- read.probematrix(filenames=filenames[i], >>> which="mm")$mm[,1] >>> >>> I agree that this is not the most desirable feature. >>> >>> Best wishes >>> Wolfgang >>> >>> >>> Il Nov/8/10 10:19 AM, Markus Boenn ha scritto: >>>> >>>> Dear all, >>>> >>>> I've made some experiment about GCRMA using bg.adjust.gcrma() contained >>>> in the gcrma package. >>>> >>>> I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of >>>> the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I >>>> rename both files in a way, which changes the lexicographical order, >>>> say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II). >>>> >>>> Using bg.adjust.gcrma() and exprs() to obtain the expression values, for >>>> some probe sets I get different values for A.cel (in CASE I) than for >>>> BA.cel (in CASE II), although both files contain the same data. Of >>>> course, for B.cel and AB.cel the same phenomenon becomes obvious. >>>> >>>> How can this be possible? To me, it seems as if GCRMA normalizes the >>>> data with respect to the lexicographical order, but as far as I know, >>>> GCRMA treats each array independently. >>>> >>>> This effect is not reproducible using call.exprs() with argument >>>> algorithm="rma" or "mas5". But is reproducible for other arrays like >>>> HGU95A, for example. >>>> >>>> Please, can anybody try to explain me, if this effect is a bug or a >>>> feature (and why)? >>>> >>>> Best wishes, >>>> Markus >>>> >>>> >>>> >>>> ################################################################# >>>> >>>> sessionInfo() >>>> R version 2.12.0 (2010-10-15) >>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >>>> >>>> locale: >>>> [1] C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> other attached packages: >>>> [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 [4] >>>> simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0 >>>> Biobase_2.10.0 >>>> loaded via a namespace (and not attached): >>>> [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2 >>>> affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0 >>>> survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6 >>>> >>>> Also tried on other machine with similar packages >>>> >>>> sessionInfo() >>>> R version 2.11.1 Patched (2010-09-16 r52943) >>>> Platform: i686-pc-linux-gnu (32-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 >>>> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8 >>>> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1 >>>> [5] Biobase_2.8.0 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2 >>>> [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17 >>>> [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1 >>>> [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6 >>>> >>>> >>>> ################################################################# >>>> >>>> Code example >>>> >>>> setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel >>>> CEL<- ReadAffy() >>>> bag.1<- bg.adjust.gcrma(CEL) >>>> ebag.1<- exprs(bag.1) >>>> >>>> setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel >>>> CEL<- ReadAffy() >>>> bag.2<- bg.adjust.gcrma(CEL) >>>> ebag.2<- exprs(bag.2) >>>> >>>> par(mfrow=c(2,1)) >>>> # differences should be zero >>>> plot(ebag.1[,1]-ebag.2[,2]) >>>> plot(ebag.1[,2]-ebag.2[,1]) >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- ------------------------------------ Zhijin (Jean) Wu Assistant Professor of Biostatistics Brown University, Box G-S121 Providence, RI 02912 Tel: 401 863 1230 Fax: 401 863 9182 http://www.stat.brown.edu/zwu

ADD REPLY • link 14.1 years ago Zhijin Wu ▴ 260

0

Entering edit mode

On Mon, Nov 8, 2010 at 9:11 AM, Zhijin Wu <zwu at="" stat.brown.edu=""> wrote: > Henrik is correct that the non-identical result is due to the sampling of > the subsets in adjusting for specific binding (GSB.adj). > > The discrepancy I get from re-ordering the arrays on some human arrays I > have is ranged between -.004 and 0.006. However I can change the code so > that one does get absolutely identical result on the same data set. I would like to suggest that for backward compatibility you add some argument to the methods so that it is possible for users/developers to use older variants too. This lower the risk for surprises. For example: raw <- ReadAffy(filenames=getPathnames(cs)); es <- gcrma(raw, flavor="2010-11-08", verbose=TRUE); # The new one es <- gcrma(raw, flavor="2010-04-08", verbose=TRUE); # The current one (say) where the default es <- gcrma(raw, verbose=TRUE); is one of the "flavors" recommended (there may be better options than using dates to specify the variant). Argument 'flavor' need to be added to more low-level functions, e.g. gcrma.engine() etc. The above strategy allows you to add new variants of the estimators and when you find them be stable you can make one of them the new default, while still not breaking support for the old ones. Also, with an explicit argument 'flavor', it is more clear to the user/developer that there are/have been different variants of the algorithm. This would avoid surprises such as when gcRMA was significantly restructured a couple of years ago. Eventually you can deprecate some variants and drop them in the future. This makes the transition smoother for everyone. Being able to reproduce also old results is an important feature. Just some thoughts /Henrik > > Jean > On 11/8/2010 11:44 AM, Henrik Bengtsson wrote: >> >> On Mon, Nov 8, 2010 at 8:10 AM, Benilton Carvalho >> <beniltoncarvalho at="" gmail.com=""> ?wrote: >>> >>> just set the random seed to the same value prior to running >>> bg.correct.gcrma() and you should be fine... >>> >>> ## setwd(CASEI) >>> set.seed(1) >>> CEL = ReadAffy() >>> ebag.1 = exprs(bg.adjust.gcrma(CEL)) >>> ## setwd(CASEII) >>> set.seed(1) >>> CEL = ReadAffy() >>> ebag.2 = exprs(bg.adjust.gcrma(CEL)) >> >> I don't think that works, because of how subsets used for estimates >> are drawn from all arrays, e.g. >> >> ? ? Subset<- sample(1:length(as.matrix(pms)[index.affinities,]),25000) >> ? ? y<- log2(pms)[index.affinities,][Subset] >> >> It samples 25,000 signals out of all probes across all arrays - note, >> not the same probes across all arrays. ?Thus, the ordering of the >> arrays (the columns) in 'pms' matters. >> >> Basically, by design of the estimators, that is due to the random >> sampling in GCRMA, one cannot/should not treat GCRMA to give identical >> estimates and signals from run to run. ?Any comparison between >> multiple runs should be done allowing for some deviation where the >> assumption is that it gives almost the same results from run to run. >> The reintroduction of "set.seed(1)" in the internal GCRMA code, makes >> this less clear and a bit confusing. >> >> In aroma.affymetrix we have ported GCRMA partly by reimplementing >> GCRMA and partly by using wrappers. ?Part of the redundancy testing is >> to validate the correctness. ?See >> >> ? http://aroma-project.org/replication/gcRMA >> >> which show how closely we reproduce the results. ?Those results are >> without fixing the seed, meaning that they illustrate the amount of >> randomness you should expect in the estimates/GCRMA signals. >> >> /Henrik >> >>> >>> b >>> >>> On 8 November 2010 15:50, Wolfgang Huber<whuber at="" embl.de=""> ?wrote: >>>> >>>> Hi Markus, >>>> >>>> Jean or Rafa will be able to give a more competent response, but as far >>>> as I >>>> can see from the code, there are some places where the first array is >>>> treated specially: >>>> >>>> ~/madman/Rpacks/gcrma/R$ grep ",1]" * >>>> gcrma.R: ? ? ?index2=which(!is.na(anc[,1])) >>>> gcrma.engine2.R: ? ?index.affinities<- which(!is.na(pm.affinities[,1])) >>>> justGCRMA.R: ? ? ? mm<- read.probematrix(filenames=filenames[i], >>>> which="mm")$mm[,1] >>>> justGCRMA.R: ? ?mm<- read.probematrix(filenames=filenames[i], >>>> which="mm")$mm[,1] >>>> >>>> I agree that this is not the most desirable feature. >>>> >>>> Best wishes >>>> ? ? ? ?Wolfgang >>>> >>>> >>>> Il Nov/8/10 10:19 AM, Markus Boenn ha scritto: >>>>> >>>>> Dear all, >>>>> >>>>> I've made some experiment about GCRMA using bg.adjust.gcrma() contained >>>>> in the gcrma package. >>>>> >>>>> I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of >>>>> the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I >>>>> rename both files in a way, which changes the lexicographical order, >>>>> say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II). >>>>> >>>>> Using bg.adjust.gcrma() and exprs() to obtain the expression values, >>>>> for >>>>> some probe sets I get different values for A.cel (in CASE I) than for >>>>> BA.cel (in CASE II), although both files contain the same data. Of >>>>> course, for B.cel and AB.cel the same phenomenon becomes obvious. >>>>> >>>>> How can this be possible? To me, it seems as if GCRMA normalizes the >>>>> data with respect to the lexicographical order, but as far as I know, >>>>> GCRMA treats each array independently. >>>>> >>>>> This effect is not reproducible using call.exprs() with argument >>>>> algorithm="rma" or "mas5". But is reproducible for other arrays like >>>>> HGU95A, for example. >>>>> >>>>> Please, can anybody try to explain me, if this effect is a bug or a >>>>> feature (and why)? >>>>> >>>>> Best wishes, >>>>> Markus >>>>> >>>>> >>>>> >>>>> ################################################################# >>>>> >>>>> sessionInfo() >>>>> R version 2.12.0 (2010-10-15) >>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >>>>> >>>>> locale: >>>>> [1] C >>>>> >>>>> attached base packages: >>>>> [1] stats graphics grDevices utils datasets methods base >>>>> other attached packages: >>>>> [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 >>>>> [4] >>>>> simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0 >>>>> Biobase_2.10.0 >>>>> loaded via a namespace (and not attached): >>>>> [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2 >>>>> affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0 >>>>> survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6 >>>>> >>>>> Also tried on other machine with similar packages >>>>> >>>>> sessionInfo() >>>>> R version 2.11.1 Patched (2010-09-16 r52943) >>>>> Platform: i686-pc-linux-gnu (32-bit) >>>>> >>>>> locale: >>>>> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C >>>>> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 >>>>> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8 >>>>> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C >>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C >>>>> >>>>> attached base packages: >>>>> [1] stats graphics grDevices utils datasets methods base >>>>> >>>>> other attached packages: >>>>> [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1 >>>>> [5] Biobase_2.8.0 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2 >>>>> [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17 >>>>> [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1 >>>>> [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6 >>>>> >>>>> >>>>> ################################################################# >>>>> >>>>> Code example >>>>> >>>>> setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel >>>>> CEL<- ReadAffy() >>>>> bag.1<- bg.adjust.gcrma(CEL) >>>>> ebag.1<- exprs(bag.1) >>>>> >>>>> setwd("PATH TO CASE II DATA") #the directory containing BA.cel and >>>>> AB.cel >>>>> CEL<- ReadAffy() >>>>> bag.2<- bg.adjust.gcrma(CEL) >>>>> ebag.2<- exprs(bag.2) >>>>> >>>>> par(mfrow=c(2,1)) >>>>> # differences should be zero >>>>> plot(ebag.1[,1]-ebag.2[,2]) >>>>> plot(ebag.1[,2]-ebag.2[,1]) >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at stat.math.ethz.ch >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > ------------------------------------ > Zhijin (Jean) Wu > Assistant Professor of Biostatistics > Brown University, Box G-S121 > Providence, RI ?02912 > > Tel: 401 863 1230 > Fax: 401 863 9182 > http://www.stat.brown.edu/zwu > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD REPLY • link 14.1 years ago Henrik Bengtsson ★ 2.4k

0

Entering edit mode

I see your point... thanks for the clarification. b On 8 November 2010 16:44, Henrik Bengtsson <hb at="" biostat.ucsf.edu=""> wrote: > On Mon, Nov 8, 2010 at 8:10 AM, Benilton Carvalho > <beniltoncarvalho at="" gmail.com=""> wrote: >> just set the random seed to the same value prior to running >> bg.correct.gcrma() and you should be fine... >> >> ## setwd(CASEI) >> set.seed(1) >> CEL = ReadAffy() >> ebag.1 = exprs(bg.adjust.gcrma(CEL)) >> ## setwd(CASEII) >> set.seed(1) >> CEL = ReadAffy() >> ebag.2 = exprs(bg.adjust.gcrma(CEL)) > > I don't think that works, because of how subsets used for estimates > are drawn from all arrays, e.g. > > ? ?Subset <- sample(1:length(as.matrix(pms)[index.affinities,]),25000) > ? ?y <- log2(pms)[index.affinities,][Subset] > > It samples 25,000 signals out of all probes across all arrays - note, > not the same probes across all arrays. ?Thus, the ordering of the > arrays (the columns) in 'pms' matters. > > Basically, by design of the estimators, that is due to the random > sampling in GCRMA, one cannot/should not treat GCRMA to give identical > estimates and signals from run to run. ?Any comparison between > multiple runs should be done allowing for some deviation where the > assumption is that it gives almost the same results from run to run. > The reintroduction of "set.seed(1)" in the internal GCRMA code, makes > this less clear and a bit confusing. > > In aroma.affymetrix we have ported GCRMA partly by reimplementing > GCRMA and partly by using wrappers. ?Part of the redundancy testing is > to validate the correctness. ?See > > ?http://aroma-project.org/replication/gcRMA > > which show how closely we reproduce the results. ?Those results are > without fixing the seed, meaning that they illustrate the amount of > randomness you should expect in the estimates/GCRMA signals. > > /Henrik > >> >> b >> >> On 8 November 2010 15:50, Wolfgang Huber <whuber at="" embl.de=""> wrote: >>> Hi Markus, >>> >>> Jean or Rafa will be able to give a more competent response, but as far as I >>> can see from the code, there are some places where the first array is >>> treated specially: >>> >>> ~/madman/Rpacks/gcrma/R$ grep ",1]" * >>> gcrma.R: ? ? ?index2=which(!is.na(anc[,1])) >>> gcrma.engine2.R: ? ?index.affinities <- which(!is.na(pm.affinities[,1])) >>> justGCRMA.R: ? ? ? mm <- read.probematrix(filenames=filenames[i], >>> which="mm")$mm[,1] >>> justGCRMA.R: ? ?mm <- read.probematrix(filenames=filenames[i], >>> which="mm")$mm[,1] >>> >>> I agree that this is not the most desirable feature. >>> >>> Best wishes >>> ? ? ? ?Wolfgang >>> >>> >>> Il Nov/8/10 10:19 AM, Markus Boenn ha scritto: >>>> >>>> Dear all, >>>> >>>> I've made some experiment about GCRMA using bg.adjust.gcrma() contained >>>> in the gcrma package. >>>> >>>> I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of >>>> the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I >>>> rename both files in a way, which changes the lexicographical order, >>>> say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II). >>>> >>>> Using bg.adjust.gcrma() and exprs() to obtain the expression values, for >>>> some probe sets I get different values for A.cel (in CASE I) than for >>>> BA.cel (in CASE II), although both files contain the same data. Of >>>> course, for B.cel and AB.cel the same phenomenon becomes obvious. >>>> >>>> How can this be possible? To me, it seems as if GCRMA normalizes the >>>> data with respect to the lexicographical order, but as far as I know, >>>> GCRMA treats each array independently. >>>> >>>> This effect is not reproducible using call.exprs() with argument >>>> algorithm="rma" or "mas5". But is reproducible for other arrays like >>>> HGU95A, for example. >>>> >>>> Please, can anybody try to explain me, if this effect is a bug or a >>>> feature (and why)? >>>> >>>> Best wishes, >>>> Markus >>>> >>>> >>>> >>>> ################################################################# >>>> >>>> sessionInfo() >>>> R version 2.12.0 (2010-10-15) >>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >>>> >>>> locale: >>>> [1] C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> other attached packages: >>>> [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 [4] >>>> simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0 >>>> Biobase_2.10.0 >>>> loaded via a namespace (and not attached): >>>> [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2 >>>> affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0 >>>> survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6 >>>> >>>> Also tried on other machine with similar packages >>>> >>>> sessionInfo() >>>> R version 2.11.1 Patched (2010-09-16 r52943) >>>> Platform: i686-pc-linux-gnu (32-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 >>>> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8 >>>> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1 >>>> [5] Biobase_2.8.0 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2 >>>> [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17 >>>> [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1 >>>> [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6 >>>> >>>> >>>> ################################################################# >>>> >>>> Code example >>>> >>>> setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel >>>> CEL <- ReadAffy() >>>> bag.1 <- bg.adjust.gcrma(CEL) >>>> ebag.1 <- exprs(bag.1) >>>> >>>> setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel >>>> CEL <- ReadAffy() >>>> bag.2 <- bg.adjust.gcrma(CEL) >>>> ebag.2 <- exprs(bag.2) >>>> >>>> par(mfrow=c(2,1)) >>>> # differences should be zero >>>> plot(ebag.1[,1]-ebag.2[,2]) >>>> plot(ebag.1[,2]-ebag.2[,1]) >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >

ADD REPLY • link 14.1 years ago Benilton Carvalho ★ 4.3k

Login before adding your answer.