Advice on setting up a non-specific filter for differential expression


Lucia Peixoto ▴ 330

@lucia-peixoto-4203

Last seen 10.6 years ago

Thanks Tobias for your response.

I am processing data obtained with Affymetrix mouse chips (430_2, previous version). The first filtering was done based on presence/absence calls, so only genes present in 2/17 samples were used. It is a two-condition setup, with 8 and 9 replicates per condition. My definition of FDR in my previous question was strictly limited to validation in 8+ independent qPCRs of 40+ randomly selected genes obtained using a SAM cutoff of 5% FDR. So I am talking about independently re-testing the reproducibility of gene expression, which is the only way to really know your FDR. Using the MAS5 presence/absence calls filter leads to about 50% of the tested genes not being reproducible.

If I remove the filtering and redo the analysis at 5% FDR, all the previous "false positives" become true positives. That was not a surprise to me, since about 1/3 of MM probes are known to hybridize better than PM probes. I don't know what MAS5 presence/absence really means, but it definitely cannot accurately reflect the presence of a transcript if the MM probe hybridizes better.

The problem is that I have a great loss of sensitivity (I have a lot of positive controls, so I know that), and I would like to increase it using a filter that comes closer to really defining "present", because MM/PM does not.

Any ideas?

Thanks
Lucia

On Mon, Aug 16, 2010 at 8:34 AM, Tobias Straub <tstraub@med.uni-muenchen.de> wrote:

> Hi Lucia
>
> I am not sure if I completely understand your problem, just want to mention that I routinely apply non-specific filtering based on MAS5 calls with a very good outcome (based on a prior-knowledge training set). I do not like so much the alternative approach - filtering based on variance or IQR - as it jeopardizes my preferred way of defining responders by applying a threshold on the local false discovery rate.
>
> Could you expand a bit on how exactly you filter based on MAS5 calls, how you define responders and non-responders in qPCR, and what your "FDR disaster" exactly looks like?
>
> What is your model system by the way, and which arrays do you use?
>
> best regards
> T.
>
> On Aug 13, 2010, at 7:11 PM, Lucia Peixoto wrote:
>
> > Dear All,
> >
> > I want to set up a non-specific filter to eliminate genes that are just not expressed from further statistical analysis. I've previously tried a filter based on MAS5 presence/absence calls, which turned out to be a disaster for the FDR (as measured by lots of qPCRs), probably because 1/3 of the MM probes actually hybridize better than PM, who knows.
> >
> > In any case, my plan is to set up a filter based both on raw fluorescent intensity and IQR. I am trying to get as much sensitivity as possible without increasing my FDR too much. I was thinking that the intensity distributions and box plots of the raw data may be useful to determine which cutoffs give the best sensitivity.
> >
> > Any advice on how to select appropriate cutoffs?
> >
> > Thank you very much in advance
> > Lucia
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> ----------------------------------------------------------------------
> Dr. Tobias Straub ++4989218075439 Adolf-Butenandt-Institute, München D
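The combined intensity-and-IQR filter proposed in the original question could look roughly like the sketch below. It uses base R only and simulated log2 data; the group sizes, the cutoffs (`min_intensity`, `min_samples`, `iqr_quantile`) and all variable names are placeholders chosen for illustration, not recommendations and not anything stated in the thread.

```r
set.seed(1)
n_samples <- 17            # 8 + 9 arrays, as in the thread
n_off <- 600               # made-up number of unexpressed probesets
n_on  <- 400               # made-up number of expressed probesets

# Simulated log2 intensities: "off" probesets sit near background,
# "on" probesets have a probeset-specific mean and more spread.
off <- matrix(rnorm(n_off * n_samples, mean = 4, sd = 0.2), nrow = n_off)
on_means <- runif(n_on, min = 7, max = 12)
on <- matrix(rnorm(n_on * n_samples, mean = on_means, sd = 0.5), nrow = n_on)
expr <- rbind(off, on)

# Hypothetical cutoffs; in practice they would be read off histograms
# or box plots of the real data.
min_intensity <- 6         # log2 intensity threshold
min_samples   <- 2         # arrays that must exceed the threshold
iqr_quantile  <- 0.5       # drop the lower half by IQR

# Intensity criterion: above threshold in at least `min_samples` arrays.
keep_intensity <- rowSums(expr > min_intensity) >= min_samples

# Variability criterion: IQR across all arrays, computed without group labels.
iqr <- apply(expr, 1, IQR)
keep_iqr <- iqr > quantile(iqr, iqr_quantile)

keep <- keep_intensity & keep_iqr
expr_filtered <- expr[keep, , drop = FALSE]
```

Neither criterion looks at which condition an array belongs to, so the filter stays non-specific; `hist(expr)` or `boxplot(expr)` on the real data would be one way to pick less arbitrary cutoffs.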

qPCR qPCR • 1.2k views

ADD COMMENT • link 14.6 years ago Lucia Peixoto ▴ 330


Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 6 weeks ago

EMBL European Molecular Biology Laborat…

Hi Lucia,

the diagnostic plots in Fig. 1 of [1] might be useful for choosing filter criteria. We found that for Affymetrix GeneChips, overall variance (across all samples) is a decent correlate of "presence". Other people have also proposed more specialised criteria [2], which you could try.

Hi Tobias,

you said you were worried about "filtering based on variance or IQR - as it jeopardizes ... applying a threshold on the local false discovery rate." I am not sure I understand what you mean, but the effect (or, if properly applied, non-effect) of filtering on type-I error is also discussed in some detail in [1].

[1] Richard Bourgon et al. Independent filtering increases detection power for high-throughput experiments. PNAS, 107(21):9546-9551, 2010.
[2] Talloen et al. I/NI-calls for the exclusion of non-informative genes: a highly effective filtering tool for microarray data. Bioinformatics, doi:10.1093/bioinformatics/btm478

Best wishes
Wolfgang

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
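As a rough illustration of filtering on overall variance before multiple-testing correction, here is a base R sketch on simulated data. The 8 vs 9 design mirrors the thread; the number of genes, the effect size, the 50% cutoff and the choice of an equal-variance t-test with Benjamini-Hochberg adjustment are assumptions made for the example, not details taken from either cited paper.

```r
set.seed(42)
n_genes <- 2000
group <- factor(rep(c("A", "B"), times = c(8, 9)))   # 8 vs 9 replicates

# Simulated log2 expression; every number here is made up.
expr <- matrix(rnorm(n_genes * length(group), mean = 8, sd = 0.5),
               nrow = n_genes)
de <- 1:200                                          # truly changed genes
expr[de, group == "B"] <- expr[de, group == "B"] + 1

# Filter statistic: variance across all 17 arrays, ignoring group labels.
overall_var <- apply(expr, 1, var)

# Test statistic: per-gene two-sample t-test (equal variance).
pvals <- apply(expr, 1, function(x) {
  t.test(x[group == "A"], x[group == "B"], var.equal = TRUE)$p.value
})

# Keep the upper half by overall variance (hypothetical cutoff).
keep <- overall_var >= quantile(overall_var, 0.5)

# Adjust p-values over all genes, and over the filtered genes only.
padj_unfiltered <- p.adjust(pvals, method = "BH")
padj_filtered   <- p.adjust(pvals[keep], method = "BH")

n_unfiltered <- sum(padj_unfiltered < 0.05)
n_filtered   <- sum(padj_filtered < 0.05)
```

The point of the sketch is the order of operations: the filter is computed without the group labels, and the adjustment is then applied only to the genes that pass it. Comparing `n_unfiltered` and `n_filtered` on real data shows what the filter buys.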

ADD COMMENT • link 14.6 years ago Wolfgang Huber ★ 13k


Lucia Peixoto ▴ 330

@lucia-peixoto-4203

Last seen 10.6 years ago

Thanks for the great advice.

I am planning to use the genefilter package, and I will try variance or IQR first. I am not sure how to implement variance but will figure it out. I will try to generate a plot testing several cutoffs, as in the PNAS paper, and then decide on the best trade-off. The PNAS paper was extremely helpful!!!

Tobias, I usually do quality checks using the affyQCreport package, and there is no batch effect (all the samples were taken at the same time and randomized appropriately in any case).

Given that there is now literature confirming that a lot of the MM probes match better than PM probes, there is no biological basis for using PM/MM relationships as a proxy for presence/absence. Subtracting MM intensity from PM signal can be a major source of error in the analysis, as I confirmed using lots of independent qPCRs (see for example: Characterization of mismatch and high-signal intensity probes associated with Affymetrix genechips, Wang et al. 2007). So MAS5 presence/absence calls are definitely out of the picture for me.

cheers
Lucia

On Tue, Aug 17, 2010 at 3:48 AM, Tobias Straub <tstraub@med.uni-muenchen.de> wrote:

> Hi Lucia!
>
> did you already have a look at the nsFilter function in the genefilter package?
>
> nsFilter(eset, require.entrez = F, require.symbol = F,
>     require.GOBP = FALSE, require.GOCC = FALSE, require.GOMF = FALSE,
>     remove.dupEntrez = TRUE, var.func = IQR, var.cutoff = 0.5,
>     var.filter = TRUE)
>
> would simply remove half of the genes based on low IQR across the arrays, assuming that a) non-expressed genes have a low IQR and b) frequently not more than 50% of genes are expressed in any given tissue.
>
> this is far from a "present" filter but I have no other suggestion in case you don't trust MAS calls. out of curiosity: did you quality check your arrays, could there be batch effects, what's your pre-processing?
>
> best regards
> T.
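For the plan of testing several cutoffs, a minimal base R sketch is given below. It filters on overall variance at a range of quantile cutoffs and counts how many genes pass a 5% Benjamini-Hochberg threshold at each one. The data are simulated, and the effect size, the cutoff grid and all variable names are assumptions for illustration only.

```r
set.seed(7)
n_genes <- 2000
group <- factor(rep(c("A", "B"), times = c(8, 9)))   # 8 vs 9 replicates

# Simulated log2 expression with a modest shift in 200 genes (made-up values).
expr <- matrix(rnorm(n_genes * length(group), mean = 8, sd = 0.5),
               nrow = n_genes)
expr[1:200, group == "B"] <- expr[1:200, group == "B"] + 0.6

# Per-gene variance across all arrays, computed without the group labels.
overall_var <- apply(expr, 1, var)

# Per-gene two-sample t-test p-values (equal variance).
pvals <- apply(expr, 1, function(x) {
  t.test(x[group == "A"], x[group == "B"], var.equal = TRUE)$p.value
})

# Fraction of genes removed at each step (hypothetical grid).
cutoffs <- seq(0, 0.9, by = 0.1)

# Number of genes with BH-adjusted p < 0.05 after filtering at each cutoff.
rejections <- sapply(cutoffs, function(theta) {
  keep <- overall_var >= quantile(overall_var, theta)
  sum(p.adjust(pvals[keep], method = "BH") < 0.05)
})

# Only draw the plot in an interactive session.
if (interactive()) {
  plot(cutoffs, rejections, type = "b",
       xlab = "Fraction of genes filtered out (by overall variance)",
       ylab = "Genes with BH-adjusted p < 0.05")
}
```

Replacing `var` with `IQR` in the `apply()` call gives the IQR version of the same sweep. Whether the curve peaks at an intermediate cutoff depends on the data, so the plot is a way to inspect the trade-off rather than a rule for choosing the cutoff.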

ADD COMMENT • link 14.6 years ago Lucia Peixoto ▴ 330
