Eric E. Snyder
Hello,
In my first project with R and Bioconductor, I am analyzing some small
microarrays, starting with variance stabilization and normalization
with vsn. Using Wolfgang Huber's VSN.pdf tutorial, I was able to do the
exercise with the "kidney" dataset without trouble. However, when
trying to run:
> fit = vsn2( noDNAcontrols )
Error in .local(x, reference, strata, ...) :
One or more of the strata contain less than 42 elements.
Please reduce the number of strata so that there is enough in each
stratum.
using my own data, I got the error above. I finally got around the
error by simulating a dataset containing 50 controls (my original data
had only 6). Surprisingly, even 42 controls were insufficient.
A collaborator, using the same dataset, was able to run vsn
successfully using an earlier version of R (2.9.0) and Bioconductor
(version unknown).
Is anyone familiar with this problem?
I see two ways forward:
1. Find the appropriate (old) version of Bioconductor and analyze with
the original controls.
2. Use the current R/Bioconductor releases and either find a software
patch or a work-around.
As for #2, maybe it is not unreasonable to use more than 42 controls on
most microarrays. However, this particular dataset is from a series of
small protein arrays (each probed with patient serum, then visualized
with labeled anti-IgG) that contain only 214 antigens and 6 "no DNA"
(meaning "no protein") controls per patient, with a total of 853
patients in the dataset. Consequently, it is not possible to run a
large number of controls, given the number of experimental cells per
slide.
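To make the layout concrete, here is a minimal base R sketch using
made-up intensities. The object names "arrays" and "isControl", the
ordering of the spots, and the exponential intensities are assumptions
for illustration only; I am also assuming that noDNAcontrols is a
6 x 853 matrix with one column per patient. It only compares row counts
against the 42 quoted in the error message and does not call vsn.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
nAntigen <- 214
nControl <- 6
nPatient <- 853

# Made-up intensities: rows = spots on one array, columns = patients
arrays <- matrix(rexp((nAntigen + nControl) * nPatient, rate = 1 / 500),
                 nrow = nAntigen + nControl, ncol = nPatient)

# Hypothetical ordering: the 6 control spots are the last rows
isControl <- rep(c(FALSE, TRUE), times = c(nAntigen, nControl))
noDNAcontrols <- arrays[isControl, , drop = FALSE]

dim(arrays)                  # 220 853
dim(noDNAcontrols)           # 6 853
nrow(noDNAcontrols) >= 42    # FALSE
nrow(arrays) >= 42           # TRUE
```

With this layout, the controls-only matrix has just 6 rows per column,
while the full array has 220.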
On a related note, in my effort to inflate the controls that I did have
into a sufficiently large number, I used "rnorm" to simulate/synthesize
the data. Here "noDNAstats" is a 2 x 853 matrix consisting of the mean
and standard deviation from the patients' noDNAcontrols in the first
and second rows, respectively.
# one column of 50 simulated controls per patient
i = 1
noDNAsim50 = rnorm(50, noDNAstats[1,i], noDNAstats[2,i])
for (i in 2:ncol(noDNAstats)) {
  noDNAsim50 = cbind(noDNAsim50,
                     rnorm(50, noDNAstats[1,i], noDNAstats[2,i]))
}
My understanding was that rnorm would create a dataset of the requested
size with the requested mean and SD. The numbers I get are in the same
ballpark, but the means and SDs are not the same. Am I missing
something?
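Here is a minimal sketch of what I am seeing, using made-up values (a
target mean of 100 and SD of 15 are assumptions, not numbers from my
data). As far as I can tell, rnorm draws a random sample from a normal
distribution with those parameters, so the sample mean and SD only
approximate them. The second half shows one possible way to force an
exact match by standardizing the sample and rescaling it; this is an
illustration, not necessarily the right thing to do for vsn.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
targetMean <- 100   # made-up value
targetSD <- 15      # made-up value

# A random sample: its mean and SD are near the targets, not equal to them
x <- rnorm(50, mean = targetMean, sd = targetSD)
c(mean(x), sd(x))

# Standardize to sample mean 0 and sample SD 1, then rescale
z <- (x - mean(x)) / sd(x)
xExact <- targetMean + targetSD * z
c(mean(xExact), sd(xExact))   # matches the targets up to rounding error
```

The rescaled values are no longer an independent random sample, since
they are constrained to reproduce the two statistics exactly.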
Thanks!
eesnyder
--
Eric E. Snyder, Ph.D.
Virginia Bioinformatics Institute
Virginia Polytechnic Institute and State University
Blacksburg, VA 24061-0447
USA
Email: eesnyder at vbi.vt.edu
Phone: (540) 231-5428
JDAM: N 37 13.248', W 80 25.551'