siggenes permutation count problem


Paul Boutros ▴ 340

@paul-boutros-371

Last seen 10.2 years ago

Hello,

I'm having some trouble interpreting how/why siggenes performed a certain number of permutations on my dataset. This is an affy dataset that was normalized by:

    data <- ReadAffy(filenames=cel.files, phenoData="phenodata.txt");
    eset <- expresso(data, normalize.method="constant", bgcorrect.method="none",
                     pmcorrect.method="mas", summary.method="avgdiff");

I realize that the normalization is a bit unusual: this study is actually testing a range of normalization methods. This is a two-class experiment with 3 arrays in each group:

    > eset;
    Expression Set (exprSet) with
        22690 genes
        6 samples
    phenoData object with 1 variables and 6 cases
    varLabels
        Group: read from file
    > design;
    [1] 1 1 0 1 0 0

So to do a SAM-like analysis I used:

    SAM.data <- sam(data=eset, cl=design, var.equal=FALSE, B=1000);

And I expected there to be 6! = 720 total possible permutations. So I was surprised to get this output:

    > SAM.data <- sam(data=eset, cl=design, var.equal=FALSE, B=1000);
    We're doing 20 complete permutations

Why does siggenes think there are only 20 complete permutations to be used? Have I done something wrong, or is my understanding of how the permutations are done in error?

This is R 2.2.1 and siggenes 1.4.0 on WinXP.

Paul

Normalization affy siggenes • 1.3k views

ADD COMMENT • link updated 18.9 years ago by James W. MacDonald 67k • written 18.9 years ago by Paul Boutros ▴ 340


James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 3 days ago

United States

paul.boutros at utoronto.ca wrote:

> Why does siggenes think there are only 20 complete permutations to be used?
> Have I done something wrong, or is my understanding of how the permutations are
> done in error?

It's a combination of incorrect terminology and (possibly) a misunderstanding on your part. First, there *are* 720 possible permutations, but we don't care about the ordering within each group since we are simply comparing group means. What we really want here are combinations, and there are only 20 combinations when you have 6 samples and you are choosing three for each group (see ?choose). If you did all 720 permutations it would result in only 20 unique t-statistics with a lot of replication.

This terminology is a holdover from SAM, which AFAIK really did do the permutations rather than combinations. However, this is very wasteful of computing time, especially when the number of replicates gets large, so siggenes rightly does the combinations and abuses terminology by calling them 'complete permutations'.

Best,

Jim

--
James W. MacDonald
University of Michigan
Affymetrix and cDNA Microarray Core
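The counting argument in this answer can be checked in base R. What follows is a minimal sketch under stated assumptions, not siggenes' own code: it reuses the `design` vector from the question, and every other variable name (`n_perm`, `n_comb`, `labelings`, `per_labeling`) is made up for illustration. It counts the orderings of the six arrays, counts the distinct ways of assigning three of them to group 1, and enumerates those assignments with `combn()`.

```r
# Class labels as given in the question: three arrays per group.
design <- c(1, 1, 0, 1, 0, 0)
n  <- length(design)       # 6 arrays in total
n1 <- sum(design == 1)     # 3 arrays in group 1

# Number of orderings of all six arrays.
n_perm <- factorial(n)     # 6! = 720

# Number of distinct ways to choose which arrays form group 1.
n_comb <- choose(n, n1)    # choose(6, 3) = 20

# Enumerate the distinct group assignments (illustrative only).
idx <- combn(n, n1)        # 3 x 20 matrix of array indices for group 1
labelings <- apply(idx, 2, function(i) {
  cl <- integer(n)         # start with every array in group 0
  cl[i] <- 1L              # move the chosen arrays into group 1
  cl
})                         # 6 x 20 matrix, one labeling per column

# Each labeling is shared by 3! * 3! orderings, which differ only in
# the order of arrays within a group.
per_labeling <- factorial(n1) * factorial(n - n1)   # 36
stopifnot(n_perm == n_comb * per_labeling)          # 720 == 20 * 36
```

Because a comparison of group means depends only on which arrays are in which group, the 36 orderings behind each column of `labelings` all give the same statistic, which is why enumerating 20 labelings is enough.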

ADD COMMENT • link 18.9 years ago James W. MacDonald 67k


Hi Jim (and others who replied off-list),

Thank you -- when I saw the term "complete permutations", it didn't register in my head that it really meant combinations.

Paul

ADD REPLY • link 18.9 years ago Paul Boutros ▴ 340
