Question on the cutoff for the limma package
@yi-ming-nihnci-c-4571
Dear list,

Recently I used the limma package to analyze some miRNA array data. For one of the contrasts in our limma model, I derived a differential list using P.Value < 0.01 combined with a fold-change cutoff. We noticed that in this particular contrast, all of the differential miRNAs have rather high adj.P.Val values: almost all are 1 or very close to 1 (e.g., 0.973), with adj="fdr" in topTable. The other contrasts in the same model do have "normal"-looking adj.P.Val values, ranging from 1 down to about 0.01.

From our previous experience, candidates with a decent raw P.Value (e.g., < 0.01) can sometimes validate well even when the adj.P.Val is very high. In this case, we validated two miRNAs from the list, both with P.Value < 0.01 but with rather high adj.P.Val (both around 0.97 or 1). One of them validated as a genuinely differential miRNA; the other did not.

I understand this is somewhat subjective, we only validated two chosen miRNAs here (and we have encountered a similar situation before when validating another dataset), and people commonly use FDR or adjusted p-value cutoffs ranging from 5% to 30%. My first question is: what kind of situation could lead to the adj.P.Val being as high as 0.97 to 1 for every gene in the list (there are about 6k features in the dataset)?

My second question is: what should the cutoffs for P.Value and adj.P.Val be in a situation like this? Should one consider both, or rely specifically on adj.P.Val? In our case, if we rely on adj.P.Val alone, which is high across the board, we cannot choose a single miRNA; yet our biological validation did confirm a good one (we validated only two, but that is still a much higher hit rate than expected, given that none of them has a decent adj.P.Val). If we rely on P.Value (e.g., < 0.01), we do get quite a few miRNAs in the list, but each with a sky-high adj.P.Val, and we could only validate one of the two chosen candidates.

Any insight or experience to share? Thanks a lot!

Ming
ABCC, NCI-Frederick, Frederick, MD
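For reference, the selection described above corresponds to something like the following, a minimal sketch assuming a fitted and eBayes-moderated limma model object named `fit` and a contrast named `contrast1` (both hypothetical names; the thresholds are only illustrative):

    library(limma)

    ## Full ranked table for one contrast; adjust.method = "BH" applies the
    ## same Benjamini-Hochberg correction selected by the "fdr" option above.
    tab <- topTable(fit, coef = "contrast1", number = Inf, adjust.method = "BH")

    ## Selection by raw p-value combined with a fold-change cutoff,
    ## as described in the post:
    raw_hits <- tab[tab$P.Value < 0.01 & abs(tab$logFC) > 1, ]

    ## Selection by FDR-adjusted p-value instead:
    fdr_hits <- tab[tab$adj.P.Val < 0.05, ]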
Tags: miRNA, limma
@sean-davis-490
Hi, Ming.

The problem with using raw p-values is that there is no control for multiple testing. There are many methods to control for multiple testing, of which the FDR is one. So I would tend to rely on a statistical measure that attempts to control for multiple testing (such as the FDR); the raw p-values from limma do not do so. Whether or not you include a further fold-change filter is a matter of experimental specifics. That is not to say that one cannot do what you have done and "rank" genes by some measure, even those that are not statistically significant, but one cannot easily conclude that there is evidence of differential expression without a multiple-testing-corrected statistical measure being significant.

As for your situation, there are multiple reasons that might lead to a lack of evidence of differential expression. First, there may truly be no difference for a contrast. Second, technical artifacts or noise may make such a difference difficult or impossible to detect. Third (and related to the second), the sample size may be too small to detect a difference. Remember that failing to reject the null hypothesis (of no differential expression) is not the same thing as proving the null hypothesis; typically, we cannot prove the null hypothesis.

Some of the more statistically minded might have clearer explanations for some of what I said above, but I think the rule of thumb is to rely on multiple-testing-corrected p-values, not on uncorrected p-values, for determining statistical significance.

Sean
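To illustrate the first of Sean's reasons: when the raw p-values for a contrast are roughly uniform (the pattern expected when nothing is truly differentially expressed), the Benjamini-Hochberg adjustment pushes nearly every adjusted value toward 1, even though some raw p-values fall below 0.01 by chance. A standalone simulation, not tied to the original data:

    set.seed(1)

    ## ~6k features with no true signal, as in the dataset from the question
    p <- runif(6000)

    ## Benjamini-Hochberg adjustment
    adj <- p.adjust(p, method = "BH")

    sum(p < 0.01)  ## dozens of features pass a raw 0.01 cutoff by chance alone
    min(adj)       ## yet the smallest adjusted p-value is far from significant

Under this null-only scenario, about 60 of the 6000 features are expected to pass P.Value < 0.01 purely by chance, which matches the pattern in the question: decent raw p-values alongside sky-high adjusted p-values.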
Hi, Sean,

Thanks a lot for your very nice comments and diagnosis of the issues. Yes, in general I would also rely on multiple-testing-based statistics such as the FDR or adjusted p-value. However, in this particular situation, doing so leaves us without a single candidate for further experiments, and the bench scientists would have nothing to pursue. Trickier still, as I mentioned, we did pick some candidates based on raw p-values, successfully validated some of them with a multi-sample qPCR approach, and are now actively pursuing them.

My concern with multiple-testing correction is that in some cases it may be too stringent (this may also depend on the method; I used the fdr, or BH, method, which is popular), especially in a case like ours, where it leaves no candidates at all. In fact, I have heard of similar cases from others, in which candidates selected on raw p-values sometimes validate at quite a good rate.

Yes, the third reason you mentioned, sample size, does apply to us (5 vs. 5 for this comparison; these are mouse primary tumor-derived cell-line clones, with much lower variability than among human samples). But for the biologists and the qPCR validation they use, that level of replication already seems enough to make them happy to pursue candidates further.

Thanks for sharing!

Best,
Ming
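One small note on the method names mentioned above: in p.adjust(), which limma's topTable() uses for the adjustment, "fdr" is simply an alias for "BH", so the two options are the same Benjamini-Hochberg procedure. A quick check with a hypothetical p-value vector:

    p <- c(0.001, 0.01, 0.04, 0.2, 0.9)  ## hypothetical raw p-values

    identical(p.adjust(p, method = "fdr"),
              p.adjust(p, method = "BH"))
    ## [1] TRUE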