Question

VSN: minimum number of controls?


Eric E. Snyder ▴ 20

@eric-e-snyder-4010

Last seen 10.6 years ago

Hello,

In my first project with R and BioConductor, I am analyzing some small microarrays, starting with variance normalization with vsn. Using Wolfgang Huber's VSN.pdf tutorial I was able to do the exercise with the "kidney" dataset without trouble. However, when trying to run:

    > fit = vsn2( noDNAcontrols )
    Error in .local(x, reference, strata, ...) :
      One or more of the strata contain less than 42 elements.
      Please reduce the number of strata so that there is enough in each stratum.

using my own data, I got the error above. I finally got around the error by simulating a dataset containing 50 controls (my original data had only 6). Surprisingly, even 42 controls was insufficient.

A collaborator, using the same dataset, was able to run vsn successfully using an earlier version of R (2.9.0) and Bioconductor (version ?).

Is anyone familiar with this problem?

I see two ways forward:

1. Find the appropriate (old) version of Bioconductor and analyze with the original controls.

2. Use the current R/Bioconductor releases and either find a software patch or a work-around.

As for #2, maybe it is not unreasonable to use >42 controls on most microarrays. However, this particular dataset is from a series of small protein arrays (each probed with patient serum then visualized with labeled anti-IgG) that contain only 214 antigens and 6 no DNA (meaning "no protein") controls per patient (with a total 853 patients in the dataset). Consequently, it is not possible to run a huge number of controls, given the number of experimental cells per slide.

On a related note, in my effort to inflate the controls that I did have into a sufficiently large number, I used "rnorm" to simulate/synthesize the data. Here "noDNAstats" is a 2 x 853 matrix consisting of the mean and standard deviation from the patients' noDNAcontrols in the first and second rows, respectively.

    i=1
    noDNAsim50 = rnorm(50, noDNAstats[1,i], noDNAstats[2,i])
    for(i in c( 2:ncol(noDNAstats) ) ){
        noDNAsim50 = cbind(noDNAsim50, rnorm(50, noDNAstats[1,i], noDNAstats[2,i]))
    }

My understanding was that rnorm would create a dataset of the requested size with the requested mean and SD. The numbers I get are in the same ballpark but the means and SD are not the same. Am I missing something?

Thanks!
eesnyder

--
Eric E. Snyder, Ph.D.
Virginia Bioinformatics Institute
Virginia Polytechnic Institute and State University
Blacksburg, VA 24061-0447 USA
Email: eesnyder at vbi.vt.edu
Phone: (540) 231-5428
JDAM: N 37 13.248', W 80 25.551'
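
The simulation described in the question can be sketched without the explicit loop. This is only an illustration, assuming a small made-up `noDNAstats` (3 patients instead of 853, with invented means and SDs). It also shows why the summary statistics do not match: `rnorm` draws random values from a normal distribution with the requested mean and SD, so the mean and SD of 50 draws estimate those parameters rather than equal them.

```r
set.seed(42)  # make the random draws reproducible

# Hypothetical stand-in for noDNAstats: row 1 holds the means, row 2 the SDs,
# one column per patient (3 invented patients here instead of 853).
noDNAstats <- rbind(mean = c(100, 250, 400),
                    sd   = c(10, 25, 40))

n <- 50  # number of simulated controls per patient

# One column of n draws per patient; same shape as the cbind() loop result.
noDNAsim50 <- sapply(seq_len(ncol(noDNAstats)), function(i) {
  rnorm(n, mean = noDNAstats[1, i], sd = noDNAstats[2, i])
})

dim(noDNAsim50)  # rows = simulated controls, columns = patients

# Compare the requested parameters with the statistics of the draws.
rbind(requested_mean = noDNAstats[1, ],
      sample_mean    = colMeans(noDNAsim50),
      requested_sd   = noDNAstats[2, ],
      sample_sd      = apply(noDNAsim50, 2, sd))
```

With larger `n` the sample statistics tend to move closer to the requested values, but they are not expected to match exactly for any finite sample.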

Normalization vsn • 1.4k views

updated 15.1 years ago by Martin Morgan 25k • written 15.1 years ago by Eric E. Snyder ▴ 20

score 0 · Answer 1 · 2010-04-03

Hi Eric --

On 04/02/2010 02:41 PM, Eric E. Snyder wrote:
> Hello,
>
> In my first project with R and BioConductor, I am analyzing some small
> microarrays, starting with variance normalization with vsn. Using
> Wolfgang Huber's VSN.pdf tutorial I was able to do the exercise with the
> "kidney" dataset without trouble. However, when trying to run:
>
>> fit = vsn2( noDNAcontrols )
> Error in .local(x, reference, strata, ...) :
> One or more of the strata contain less than 42 elements.
> Please reduce the number of strata so that there is enough in each stratum.

Always good to provide sessionInfo() so that we know the details of the software you're using

    > library(vsn)
    > sessionInfo()
    R version 2.10.1 Patched (2010-03-27 r51570)
    x86_64-unknown-linux-gnu

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] vsn_3.14.0 Biobase_2.6.1

    loaded via a namespace (and not attached):
    [1] affy_1.24.2 affyio_1.14.0 grid_2.10.1
    [4] lattice_0.18-3 limma_3.2.3 preprocessCore_1.8.0

and then good to try for a reproducible example, or at least enough info for others to reproduce your error. I started with example(vsn2) and then

    > vsn2(kidney[1:20,])
    Error in vsnMatrix(exprs(x), reference, strata, ...) :
      One or more of the strata contain less than 42 elements.
      Please reduce the number of strata so that there is enough in each stratum.

My guess is that noDNAcontrols is a matrix-like object with rows and columns transposed, i.e., samples x features rather than features x samples. What is class(noDNAcontrols) and dim(noDNAcontrols)? Might as well copy and paste the output directly from R.

> using my own data, I got the error above. I finally got around the
> error by simulating a dataset containing 50 controls (my original data
> had only 6). Surprisingly, even 42 controls was insufficient.
>
> A collaborator, using the same dataset, was able to run vsn successfully
> using an earlier version of R (2.9.0) and Bioconductor (version ?).
>
> Is anyone familiar with this problem?
>
> I see two ways forward:
>
> 1, Find the appropriate (old) version of Bioconductor and analyze with
> the original controls.
>
> 2. Use the current R/Bioconductor releases and either find a software
> patch or a work-around.
>
> As for #2, maybe it is not unreasonable to use >42 controls on most
> microarrays. However, this particular dataset is from a series of small
> protein arrays (each probed with patient serum then visualized with
> labeled anti-IgG) that contain only 214 antigens and 6 no DNA (meaning
> "no protein") controls per patient (with a total 853 patients in the
> dataset). Consequently, it is not possible to run a huge number of
> controls, given the number of experimental cells per slide.
>
> On a related note, in my effort to inflate the controls that I did have
> into a sufficiently large number, I used "rnorm" to simulate/synthesize
> the data. Here "noDNAstats" is a 2 x 853 matrix consisting of the mean
> and standard deviation from the patients' noDNAcontrols in the first and
> second rows, respectively.
>
> i=1
> noDNAsim50 = rnorm(50, noDNAstats[1,i], noDNAstats[2,i])
> for(i in c( 2:ncol(noDNAstats) ) ){
> noDNAsim50 = cbind(noDNAsim50, rnorm(50, noDNAstats[1,i],
> noDNAstats[2,i]))
> }
>
> My understanding was that rnorm would create a dataset of the requested
> size with the requested mean and SD. The numbers I get are in the same
> ballpark but the means and SD are not the same. Am I missing something?

At one level this looks ok, but there isn't enough info to reproduce, or to see precisely what your problem is. Can you be more specific, maybe with a simpler example, say creating a matrix with two columns, where you specify mean and sd as numbers directly rather than 'hidden' in a matrix that we don't have access to?

Martin

>
> Thanks!
> eesnyder

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
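
The class and dimension checks requested above can be sketched in base R. This is an illustration only: the matrix below is a made-up stand-in for `noDNAcontrols` (6 control spots by 853 patients, random values), `describe_layout` and `min_rows` are hypothetical names that are not part of vsn, and the threshold of 42 is taken from the error message rather than from the package source.

```r
set.seed(1)

# Hypothetical stand-in for noDNAcontrols: 6 control spots (rows, features)
# measured on 853 patients (columns, samples), filled with random intensities.
noDNAcontrols <- matrix(rlnorm(6 * 853, meanlog = 6, sdlog = 0.5),
                        nrow = 6, ncol = 853)

class(noDNAcontrols)  # what kind of object is being passed to vsn2
dim(noDNAcontrols)    # rows (features) x columns (samples)

# Hypothetical helper: compare row and column counts with the threshold
# quoted in the error message. The original post reports that exactly 42
# rows still failed, so this sketch asks for strictly more than min_rows.
describe_layout <- function(x, min_rows = 42) {
  list(features             = nrow(x),
       samples              = ncol(x),
       enough_features      = nrow(x) > min_rows,
       enough_if_transposed = ncol(x) > min_rows)
}

str(describe_layout(noDNAcontrols))
```

Passing the row-count check after a transpose only says the call might run; whether `t(x)` is appropriate depends on what the rows and columns actually represent in the data.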