siggenes permutation count problem
Paul Boutros ▴ 340
@paul-boutros-371
Last seen 10.2 years ago
Hello,

I'm having some troubles interpreting how/why siggenes performed a certain number of permutations on my dataset. This is an affy dataset that was normalized by:

    data <- ReadAffy(filenames=cel.files, phenoData="phenodata.txt");
    eset <- expresso(data, normalize.method="constant", bgcorrect.method="none",
                     pmcorrect.method="mas", summary.method="avgdiff");

I realize that the normalization is a bit unusual: this study is actually testing a range of normalization methods. This is a two-class experiment with 3 arrays in each group:

    > eset;
    Expression Set (exprSet) with
        22690 genes
        6 samples
        phenoData object with 1 variables and 6 cases
        varLabels
            Group: read from file
    > design;
    [1] 1 1 0 1 0 0

So to do a SAM-like analysis I used:

    SAM.data <- sam(data=eset, cl=design, var.equal=FALSE, B=1000);

And I expected there to be 6! = 720 total possible permutations. So I was surprised to get this output:

    > SAM.data <- sam(data=eset, cl=design, var.equal=FALSE, B=1000);
    We're doing 20 complete permutations

Why does siggenes think there are only 20 complete permutations to be used? Have I done something wrong, or is my understanding of how the permutations are done in error?

This is R 2.2.1 and siggenes 1.4.0 on WinXP.

Paul
Normalization affy siggenes • 1.3k views
@james-w-macdonald-5106
Last seen 3 days ago
United States
paul.boutros at utoronto.ca wrote:

> Why does siggenes think there are only 20 complete permutations to be used?
> Have I done something wrong, or is my understanding of how the permutations are
> done in error?

It's a combination of incorrect terminology and (possibly) a misunderstanding on your part. First, there *are* 720 possible permutations, but we don't care about the ordering within each group since we are simply comparing group means. What we really want here are combinations, and there are only 20 combinations when you have 6 samples and you are choosing three for each group (see ?choose). If you did all 720 permutations, you would get only 20 unique t-statistics, with a lot of replication.

This terminology is a holdover from SAM, which AFAIK really did do the permutations rather than combinations. However, that is very wasteful of computing time, especially when the number of replicates gets large, so siggenes rightly does the combinations and abuses terminology by calling them 'complete permutations'.

Best,

Jim

--
James W. MacDonald
University of Michigan
Affymetrix and cDNA Microarray Core
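For anyone who wants to check the counting described above, here is a minimal base-R sketch. It reuses the `design` vector from the question; the expression values `x` are made-up numbers for a single hypothetical gene, and the enumeration with `combn()` is only an illustration of the idea, not a claim about how siggenes generates its permutations internally.

```r
# Class labels from the original post: three arrays per group.
design <- c(1, 1, 0, 1, 0, 0)
n  <- length(design)       # 6 samples
n1 <- sum(design == 1)     # 3 samples in group 1

# Orderings of all six samples versus distinct group assignments.
n_orderings <- factorial(n)     # 720
n_groupings <- choose(n, n1)    # 20

# Every grouping is repeated once per within-group reordering.
n_repeats <- factorial(n1) * factorial(n - n1)   # 36, and 720 / 36 = 20

# Enumerate the distinct groupings: each column of combn() holds the
# sample indices assigned to group 1, turned here into a 0/1 label vector.
idx <- combn(n, n1)
labelings <- apply(idx, 2, function(i) {
  cl <- integer(n)
  cl[i] <- 1L
  cl
})

# Hypothetical expression values for one gene (illustration only).
set.seed(1)
x <- rnorm(n)

# One Welch t-statistic per distinct grouping.
t_stats <- apply(labelings, 2, function(cl) {
  unname(t.test(x[cl == 1], x[cl == 0], var.equal = FALSE)$statistic)
})
length(t_stats)   # one statistic per grouping
```

Reordering samples within a group leaves both group means and variances unchanged, which is why the 720 orderings collapse to the 20 groupings reported in the `sam()` output.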
Hi Jim (and others who replied off-list),

Thank you -- when I saw the term "complete permutations", it didn't register in my head that it really meant combinations.

Paul