Question

Vsn transformation repeatability

0

Entering edit mode

Paolo Innocenti ▴ 320

@paolo-innocenti-2191

Last seen 10.4 years ago

Hi all, We are attempting to identify significantly differentially expressed genes between two treatments of female Drosophila melanogaster, using results from Affymetrix Drosophila2 microarray chip hybridization (8 chips, 4 per treatment). The end result is that the number of significantly differentially expressed genes for a particular FDR changes with each run, despite the input data and the program code remaining unchanged. We are using ?vsnrma? for pre-processing, which utilizes a search algorithm to find the maximum likelihood transformed values. The procedure produces slightly different calibrated GLOG2-transformed expression values on each run, despite the untransformed expression values and the R code being identical each time. Our first guess is that the search algorithm is unable to locate the likelihood peak and settles on a slightly different set of transformed values each time. Is this a possibility? If so, is there anything we can do to fix it? E.g. increase the number of iterations, or modify the search algorithm. Many thanks in advance. Paolo Innocenti > library(affy) Loading required package: Biobase Loading required package: tools Welcome to Bioconductor Vignettes contain introductory material. To view, type 'openVignette()'. To cite Bioconductor, see 'citation("Biobase")' and for packages 'citation(pkgname)'. > library(vsn) Loading required package: lattice Loading required package: limma > data <- ReadAffy(celfile.path="./celfiles") > sampleNames(data) <- c("z1","z2","z3","z4","m1","m2","m3","m4") > eset1 <- vsnrma(data) 100% done.4 x 8 matrix (1 stratum). 0% done. Please use 'meanSdPlot' to verify the fit. Calculating Expression > eset2 <- vsnrma(data) 100% done.4 x 8 matrix (1 stratum). 0% done. Please use 'meanSdPlot' to verify the fit. Calculating Expression > exprs(eset1)[1,] z1 z2 z3 z4 m1 m2 m3 m4 12.52630 12.23441 12.54579 12.35582 12.45190 12.35984 12.33933 12.35031 > exprs(eset2)[1,] z1 z2 z3 z4 m1 m2 m3 m4 12.52422 12.24059 12.53899 12.35949 12.44828 12.35460 12.34465 12.35277 > head(exprs(eset1))==head(exprs(eset2)) z1 z2 z3 z4 m1 m2 m3 m4 1616608_a_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 1622892_s_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 1622893_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 1622894_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 1622895_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 1622896_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > sessionInfo() R version 2.8.0 (2008-10-20) i686-pc-linux-gnu locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US .UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_N AME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTI FICATION=C attached base packages: [1] tools stats graphics grDevices utils datasets methods [8] base other attached packages: [1] drosophila2cdf_2.3.0 vsn_3.8.0 limma_2.16.2 [4] lattice_0.17-15 affy_1.20.0 Biobase_2.2.0 loaded via a namespace (and not attached): [1] affyio_1.10.0 grid_2.8.0 preprocessCore_1.4.0 -- Paolo Innocenti Department of Animal Ecology, EBC Uppsala University Norbyv?gen 18D 75236 Uppsala, Sweden

Microarray Drosophila melanogaster drosophila2 Microarray Drosophila melanogaster drosophila2 • 1.3k views

ADD COMMENT • link updated 16.3 years ago by Wolfgang Huber ★ 13k • written 16.3 years ago by Paolo Innocenti ▴ 320

score 0 · Answer 1 · 2008-10-29

Dear Paolo, thank you for your feedback! There are three issues: 1. You say "number of ... differentially expressed genes for a particular FDR changes with each run". By how much? The variation should be tiny (negligible?), otherwise I would be worried about the stability of your method to compute differential expressedness, and FDR. 2. "vsnrma" calls the "vsn2" method for "AffyBatch" objects, which by default estimates the transformation from a random subsample of the microarray features, see showMethods(f="vsn2", classes="AffyBatch", includeDefs=TRUE) Please see the code example below for how to deactivate the subsampling. This should be the only source of (pseudo)randomness, everything else (including the numerical optimisation) should be deterministic. 3. Of course, there is also "set.seed" (which is acceptable if you have convinced yourself about point 1 above.) Best wishes Wolfgang library("vsn") library("affydata") data("Dilution") u1 = vsnrma(Dilution) u2 = vsnrma(Dilution) identical(exprs(u1), exprs(u2)) # [1] FALSE u3 = vsnrma(Dilution, subsample=0) u4 = vsnrma(Dilution, subsample=0) identical(exprs(u3), exprs(u4)) # [1] TRUE sessionInfo() R version 2.9.0 Under development (unstable) (2008-10-28 r46793) x86_64-unknown-linux-gnu locale: LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT .UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_N AME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTI FICATION=C attached base packages: [1] tools stats graphics grDevices utils datasets methods [8] base other attached packages: [1] hgu95av2cdf_2.2.0 affydata_1.11.3 vsn_3.9.2 [4] limma_2.14.7 affy_1.18.2 preprocessCore_1.2.1 [7] affyio_1.11.0 Biobase_2.0.1 lattice_0.17-15 [10] fortunes_1.3-5 loaded via a namespace (and not attached): [1] grid_2.9.0 ------------------------------------------------------------------ Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber 29/10/2008 10:20 Paolo Innocenti scripsit > Hi all, > > We are attempting to identify significantly differentially expressed > genes between two treatments of female Drosophila melanogaster, using > results from Affymetrix Drosophila2 microarray chip hybridization (8 > chips, 4 per treatment). The end result is that the number of > significantly differentially expressed genes for a particular FDR > changes with each run, despite the input data and the program code > remaining unchanged. > > We are using ?vsnrma? for pre-processing, which utilizes a search > algorithm to find the maximum likelihood transformed values. The > procedure produces slightly different calibrated GLOG2-transformed > expression values on each run, despite the untransformed expression > values and the R code being identical each time. Our first guess is that > the search algorithm is unable to locate the likelihood peak and settles > on a slightly different set of transformed values each time. Is this a > possibility? If so, is there anything we can do to fix it? E.g. increase > the number of iterations, or modify the search algorithm. > > Many thanks in advance. > Paolo Innocenti > > >> library(affy) > Loading required package: Biobase > Loading required package: tools > > Welcome to Bioconductor > > Vignettes contain introductory material. To view, type > 'openVignette()'. To cite Bioconductor, see > 'citation("Biobase")' and for packages 'citation(pkgname)'. > >> library(vsn) > Loading required package: lattice > Loading required package: limma >> data <- ReadAffy(celfile.path="./celfiles") >> sampleNames(data) <- c("z1","z2","z3","z4","m1","m2","m3","m4") >> eset1 <- vsnrma(data) > 100% done.4 x 8 matrix (1 stratum). 0% done. > Please use 'meanSdPlot' to verify the fit. > Calculating Expression >> eset2 <- vsnrma(data) > 100% done.4 x 8 matrix (1 stratum). 0% done. > Please use 'meanSdPlot' to verify the fit. > Calculating Expression >> exprs(eset1)[1,] > z1 z2 z3 z4 m1 m2 m3 m4 > 12.52630 12.23441 12.54579 12.35582 12.45190 12.35984 12.33933 12.35031 >> exprs(eset2)[1,] > z1 z2 z3 z4 m1 m2 m3 m4 > 12.52422 12.24059 12.53899 12.35949 12.44828 12.35460 12.34465 12.35277 >> head(exprs(eset1))==head(exprs(eset2)) > z1 z2 z3 z4 m1 m2 m3 m4 > 1616608_a_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > 1622892_s_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > 1622893_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > 1622894_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > 1622895_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > 1622896_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE >> sessionInfo() > R version 2.8.0 (2008-10-20) > i686-pc-linux-gnu > > locale: > LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_ US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC _NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDEN TIFICATION=C > > > attached base packages: > [1] tools stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] drosophila2cdf_2.3.0 vsn_3.8.0 limma_2.16.2 > [4] lattice_0.17-15 affy_1.20.0 Biobase_2.2.0 > > loaded via a namespace (and not attached): > [1] affyio_1.10.0 grid_2.8.0 preprocessCore_1.4.0 > >