Advice on setting up a non-specific filter for differential expression
Tobias Straub ▴ 430
@tobias-straub-2182
Last seen 10.2 years ago
Hi Lucia,

I am not sure I completely understand your problem. I just want to mention that I routinely apply non-specific filtering based on MAS5 calls with a very good outcome (judged on a prior-knowledge training set). I am less fond of the alternative approach, filtering based on variance or IQR, because it jeopardizes my preferred way of defining responders: applying a threshold on the local false discovery rate.

Could you expand a bit on how exactly you filter based on MAS5 calls, how you define responders and non-responders in qPCR, and what exactly your "FDR disaster" looks like?

What is your model system, by the way, and which arrays do you use?

Best regards,
T.

On Aug 13, 2010, at 7:11 PM, Lucia Peixoto wrote:
> Dear All,
> I want to set up a non-specific filter to eliminate genes that are just not expressed from further statistical analysis. I've previously tried a filter based on MAS5 presence/absence calls, which turned out to be a disaster for the FDR (as measured by lots of qPCRs), probably because 1/3 of the MM probes actually hybridize better than PM, who knows.
>
> In any case, my plan is to set up a filter based on both raw fluorescence intensity and IQR. I am trying to get as much sensitivity as possible without increasing my FDR too much.
> I was thinking that the intensity distributions and box plots of the raw data may be useful for determining the cutoffs that give the best sensitivity.
> Any advice on how to select appropriate cutoffs?
>
> Thank you very much in advance
> Lucia
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

----------------------------------------------------------------------
Dr. Tobias Straub ++4989218075439 Adolf-Butenandt-Institute, München D
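For readers unfamiliar with call-based filtering, the following is a minimal base-R sketch of the general idea, not the filter either poster actually ran. It assumes a character matrix of MAS5-style detection calls ("P", "M", "A"); the matrix here is simulated, and `min_present` is a hypothetical threshold chosen only for illustration. With real data, such calls would typically come from something like `mas5calls()` in the affy package.

```r
# Toy detection-call matrix: rows are probesets, columns are arrays.
# Values mimic MAS5-style calls: "P" (present), "M" (marginal), "A" (absent).
# The data are simulated; nothing here comes from the thread's experiment.
set.seed(1)
n_probesets <- 1000
n_arrays <- 17
calls <- matrix(
  sample(c("P", "M", "A"), n_probesets * n_arrays, replace = TRUE,
         prob = c(0.4, 0.1, 0.5)),
  nrow = n_probesets,
  dimnames = list(paste0("probeset_", seq_len(n_probesets)), NULL)
)

# Hypothetical threshold (an assumption): keep a probeset if it is called
# "P" on at least `min_present` arrays.
min_present <- 2
n_present <- rowSums(calls == "P")
keep <- n_present >= min_present

# Summary of how many probesets pass, and the IDs that survive.
table(keep)
kept_ids <- rownames(calls)[keep]
```

Whether marginal calls should count as present, and how many arrays should be required, are design decisions the sketch does not settle.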
qPCR qPCR • 1.3k views
Tobias Straub ▴ 430
@tobias-straub-2182
Last seen 10.2 years ago
Hi Lucia!

Did you already have a look at the nsFilter function in the genefilter package?

    nsFilter(eset, require.entrez = FALSE, require.symbol = FALSE,
             require.GOBP = FALSE, require.GOCC = FALSE, require.GOMF = FALSE,
             remove.dupEntrez = TRUE, var.func = IQR, var.cutoff = 0.5,
             var.filter = TRUE)

would simply remove half of the genes based on low IQR across the arrays, assuming that a) non-expressed genes have a low IQR and b) frequently not more than 50% of genes are expressed in any given tissue. This is far from a "present" filter, but I have no other suggestion in case you don't trust MAS calls.

Out of curiosity: did you quality-check your arrays, could there be batch effects, and what is your pre-processing?

Best regards,
T.

On Aug 16, 2010, at 4:50 PM, Lucia Peixoto wrote:
> Thanks Tobias for your response
>
> I am processing data obtained with Affymetrix mouse chips (430_2, previous version).
> The first filtering was done based on presence/absence calls, so only genes present in 2/17 samples were used. It is a two-condition setup, with 8 and 9 replicates per condition. My definition of FDR in my previous question was strictly limited to validation in 8+ independent qPCRs of 40+ randomly selected genes obtained using a SAM cutoff of 5% FDR. So I am talking about independently re-testing the reproducibility of gene expression, which is the only way to really know your FDR. Using the MAS5 presence/absence call filter leads to about 50% of the tested genes not being reproducible.
>
> If I remove the filtering and redo the analysis at 5% FDR, all the previous "false positives" become true positives. This was not a surprise to me, since about 1/3 of MM probes are known to hybridize better than PM probes. So I don't know what MAS5 presence/absence really means, but it definitely cannot accurately reflect the presence of a transcript if the MM probe hybridizes better.
>
> The problem is that I have a great loss of sensitivity (I have a lot of positive controls, so I know that), and I would like to increase it using a filter that comes closer to really defining "present", because MM/PM does not.
> Any ideas?
> Thanks
>
> Lucia
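To make the effect of an IQR cutoff of 0.5 concrete, here is a minimal base-R sketch of quantile-based IQR filtering on a plain expression matrix. It only roughly mimics what the variance-filtering step of nsFilter is described as doing above; it is not the genefilter implementation. The matrix, the gene names, and the split into "expressed" and "non-expressed" genes are all simulated assumptions.

```r
# Simulated log2 expression matrix (an assumption, not real data):
# the first 1000 genes vary across arrays, the last 1000 are near-constant.
set.seed(42)
n_genes <- 2000
n_arrays <- 17
expr <- rbind(
  matrix(rnorm(1000 * n_arrays, mean = 9, sd = 1.0), nrow = 1000),
  matrix(rnorm(1000 * n_arrays, mean = 4, sd = 0.2), nrow = 1000)
)
rownames(expr) <- paste0("gene_", seq_len(n_genes))

# Per-gene IQR across all arrays, ignoring any sample grouping.
gene_iqr <- apply(expr, 1, IQR)

# Interpret the cutoff as a quantile: 0.5 drops the half with the lowest IQR.
cutoff <- quantile(gene_iqr, probs = 0.5)
keep <- gene_iqr > cutoff

expr_filtered <- expr[keep, , drop = FALSE]
dim(expr_filtered)
```

Plotting `gene_iqr` as a histogram before fixing the quantile is one way to check whether assumption a) above, that non-expressed genes have a low IQR, looks plausible for a given data set.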
Tobias Straub ▴ 430
@tobias-straub-2182
Last seen 10.2 years ago
Hi Wolfgang,

Just an experience: in some of my analyses, applying variance filtering resulted in problems fitting N(0,1) to the limma t statistic. Now that I have had a quick look at your paper, I get the idea that combining limma with the variance filter is not a good idea anyway.

However, the performance of MAS call-based filtering with the limma t, compared with variance filtering with the standard t, is (slightly) better as estimated by ROC curve analysis on my prior-knowledge data (3 arrays/condition). This is probably not unexpected?

Anyway, thanks for pointing to the paper, apparently a must-read before applying the nsFilter function.

Best regards,
Tobias

On Aug 17, 2010, at 9:36 AM, Wolfgang Huber wrote:
> Hi Tobias,
> you said you were worried about "filtering based on variance or IQR - as it jeopardizes ... applying a threshold on the local false discovery rate." I am not sure I understand what you mean, but the effect (or, if properly applied, non-effect) of filtering on type-I error is also discussed in [1] in some detail.
>
> [1] Richard Bourgon et al. Independent filtering increases detection power for high-throughput experiments. PNAS, 107(21):9546-9551, 2010.
> [2] Talloen et al. I/NI-calls for the exclusion of non-informative genes: a highly effective filtering tool for microarray data. Bioinformatics, doi:10.1093/bioinformatics/btm478
>
> Best wishes
> Wolfgang
>
> --
> Wolfgang Huber
> EMBL
> http://www.embl.de/research/units/genome_biology/huber
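As a rough illustration of the filter-then-test workflow discussed in this exchange, the base-R sketch below filters on overall variance (computed without using the group labels), runs an ordinary equal-variance two-sample t-test per gene, and applies Benjamini-Hochberg adjustment with and without the filter. Everything in it is an assumption made for illustration: the data are simulated, the 8 versus 9 design is borrowed from the thread, and the effect size, the number of truly changed genes, and the 0.5 quantile cutoff are arbitrary. It uses the standard t-test rather than the limma moderated t, in line with the concern raised above about combining limma with a variance filter.

```r
# Simulated two-group data (assumption): 8 vs 9 arrays, 2000 genes.
set.seed(7)
n_genes <- 2000
group <- factor(rep(c("A", "B"), times = c(8, 9)))
expr <- matrix(rnorm(n_genes * length(group)), nrow = n_genes)

# Hypothetical truth: the first 100 genes are shifted in group B.
is_de <- seq_len(n_genes) <= 100
expr[is_de, group == "B"] <- expr[is_de, group == "B"] + 1.5

# Filter statistic: overall variance, computed WITHOUT the group labels.
overall_var <- apply(expr, 1, var)
keep <- overall_var > quantile(overall_var, probs = 0.5)

# Ordinary (equal-variance) two-sample t-test for every gene.
pvals <- apply(expr, 1, function(x) {
  t.test(x[group == "A"], x[group == "B"], var.equal = TRUE)$p.value
})

# BH adjustment on all genes versus only on genes passing the filter.
padj_all <- p.adjust(pvals, method = "BH")
padj_filtered <- rep(NA_real_, n_genes)
padj_filtered[keep] <- p.adjust(pvals[keep], method = "BH")

# Number of genes called at 5% FDR in each case.
c(unfiltered = sum(padj_all < 0.05),
  filtered = sum(padj_filtered < 0.05, na.rm = TRUE))
```

The key design choice is that the filter statistic never sees the group labels; a filter that used them would select on the same signal the test measures, which is exactly the kind of effect on type-I error that [1] examines.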