nsFilter cutoff


james perkins ▴ 300


Hi,

I am finding the nsFilter IQR cutoff somewhat confusing.

It says it is using IQR with a default cutoff of 0.5.

This gives the impression that if you line up the data and take the value between the 0.25 and 0.75 quantiles, you would keep the probeset if this value was < 0.5.

However, this is not the case, so I would like to know: how exactly does this work?

Regards,

James

written 16.3 years ago by james perkins • updated 16.3 years ago by James W. MacDonald


James W. MacDonald 67k


Hi James,

james perkins wrote:
> Hi,
>
> I am finding the nsFilter IQR cutoff somewhat confusing.
>
> It says it is using IQR with a default cutoff of 0.5.
>
> This gives the impression that if you line up the data and take the
> value between the 0.25 and 0.75 quantiles, you would keep the probeset
> if this value was < 0.5
>
> However this is not the case, so I would like to know how exactly does
> this work?

Actually it _is_ the case - perhaps you misunderstand something.

First, get all probesets with an IQR > 0.5

    > T1 <- apply(exprs(sample.ExpressionSet), 1, IQR) > 0.5

Now do the same using nsFilter()

    > T2 <- nsFilter(sample.ExpressionSet, FALSE, filterByQuantile = FALSE, feature.exclude="", remove.dupEntrez = FALSE)

Are they the same?

    > all.equal(featureNames(sample.ExpressionSet)[T1], featureNames(T2$eset))
    [1] TRUE

Best,

Jim

> Regards,
>
> James
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
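As a dependency-free illustration of the absolute-cutoff case, here is a minimal base R sketch. The matrix `expr` and its row names are made up for illustration and stand in for exprs(sample.ExpressionSet); nothing here calls nsFilter() itself. It checks that IQR() for a row is the 75% quantile minus the 25% quantile, and then keeps the rows whose IQR exceeds 0.5:

```r
# Hypothetical toy data: 6 probesets (rows) x 8 samples (columns).
set.seed(1)
expr <- matrix(rnorm(48, mean = 8), nrow = 6,
               dimnames = list(paste0("probe", 1:6), NULL))
# Multiply row i by the i-th factor so the rows have different spreads.
expr <- expr * c(0.1, 0.2, 0.5, 1, 2, 4)

# Per-probeset IQR, as in apply(exprs(eset), 1, IQR).
row_iqr <- apply(expr, 1, IQR)

# The same quantity by hand: 75% quantile minus 25% quantile.
by_hand <- apply(expr, 1, function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  q[2] - q[1]
})
stopifnot(isTRUE(all.equal(row_iqr, by_hand)))

# Absolute cutoff: keep probesets whose IQR is greater than 0.5.
keep_abs <- row_iqr > 0.5
rownames(expr)[keep_abs]
```

The last line lists the names of the probesets that pass, which is the same kind of comparison made against featureNames(T2$eset) above.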

written 16.3 years ago by James W. MacDonald


Hi James

I meant when we have filterByQuantile as TRUE. In this case it seems to behave differently, and I can't figure out why, and I don't want to guess!

Regards,

Jim

written 16.3 years ago by james perkins


Hi James,

OK. That's a different question. The details section of the help page explains this:

    Note that by default the numerical-filter cutoff is interpreted as
    a quantile, so leaving the default values intact would filter out
    50% of the genes remaining at this stage. If you prefer to set the
    cutoff at some absolute threshold, change the value of
    'varByQuantile' to 'FALSE', and modify 'var.cutoff' accordingly.

And looking at the code should help further:

    if (var.filter) {
        esetIqr <- apply(exprs(eset), 1, var.func)
        if (filterByQuantile) {
            if (0 < var.cutoff && var.cutoff < 1) {
                var.cutoff = quantile(esetIqr, var.cutoff)
            }
            else stop("Cutoff Quantile has to be between 0 and 1.")
        }
        selected <- esetIqr > var.cutoff

So if you leave filterByQuantile = TRUE (the argument the help text above calls 'varByQuantile'), then after you do the annotation-based filtering (GO, Entrez Gene, AFFX probesets, duplicates), you will take what remains and filter out 50%.

Does that help?

Best,

Jim
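The quantile branch of that code can be mimicked in a few lines of base R. This is only a sketch: `row_iqr` is simulated rather than computed from a real ExpressionSet, and `var_cutoff` is a stand-in for the 'var.cutoff' argument. The point is that 0.5 is turned into the median of the per-probeset IQRs before the comparison, so about half of the rows pass whatever the scale of the data:

```r
# Simulated per-probeset IQR values (hypothetical; not from real arrays).
set.seed(42)
row_iqr <- rexp(1000)

var_cutoff <- 0.5  # interpreted as a quantile, not as an IQR value

# Same range check as in the excerpt above.
stopifnot(0 < var_cutoff, var_cutoff < 1)

# Convert the quantile into a threshold on the IQR scale.
threshold <- quantile(row_iqr, var_cutoff, names = FALSE)

selected_quantile <- row_iqr > threshold  # keeps the upper half
selected_absolute <- row_iqr > 0.5        # direct comparison with 0.5

c(threshold = threshold,
  kept_by_quantile = sum(selected_quantile),
  kept_by_absolute = sum(selected_absolute))
```

The two counts generally differ, because the threshold derived from the quantile depends on the data while the absolute threshold does not.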

written 16.3 years ago by James W. MacDonald


Yes that makes perfect sense now. I thought this might be the case, but the additional filtering (by having Entrez id for example) meant that I didn't have half the number of initial probesets, which threw me a little.

Thanks and regards,

Jim
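For anyone else puzzled by the counts, the arithmetic can be sketched in base R. Everything below is made up for illustration: 1000 simulated probesets, of which a hypothetical 700 carry an Entrez ID, and the annotation step is assumed to run before the IQR step as described in the reply above. Half of the 700 survive, which is 35% of the starting set rather than 50%:

```r
# Hypothetical starting point: 1000 probesets with simulated IQR values.
set.seed(7)
n_probes <- 1000
row_iqr  <- rexp(n_probes)

# Hypothetical annotation flag: the first 700 probesets have an Entrez ID.
has_entrez <- rep(c(TRUE, FALSE), times = c(700, 300))

# Stage 1: annotation-based filtering.
iqr_left <- row_iqr[has_entrez]

# Stage 2: quantile-based IQR filter applied to what remains.
keep <- iqr_left > quantile(iqr_left, 0.5, names = FALSE)

counts <- c(initial = n_probes,
            annotated = length(iqr_left),
            kept = sum(keep))
counts
counts[["kept"]] / counts[["initial"]]  # fraction of the initial probesets
```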

written 16.3 years ago by james perkins
