Question

edgeR: PValue, adjusted PValue, FDR

zhxiaokang ▴ 20


I'm using edgeR to do differentially expressed genes analysis. Here's part of my code:

fit <- glmFit(y, design)

# conduct likelihood ratio tests for tumour vs normal tissue differences and show the top genes
lrt <- glmLRT(fit)

# the DEA result for all the genes
dea <- lrt$table

# differentially expressed genes
toptag <- topTags(lrt, n = length(geneList), p.value = 0.05)
deg <- toptag$table

I get a 'PValue' column in 'dea', and I'm wondering whether it is a raw p-value or an adjusted p-value. I then pass 'lrt' to 'topTags' to extract the differentially expressed genes with the p.value cutoff set to 0.05, and I'm wondering whether that cutoff applies to the 'PValue' in 'dea'. The result in 'deg' (from 'toptag') has an 'FDR' column: every gene listed there has FDR < 0.05, but not every gene in 'dea' with PValue < 0.05 appears in 'deg'.

That is what I observe in the results. My current understanding is this: the PValue in dea is a raw, unadjusted p-value. topTags adjusts those p-values with a method such as 'BH' and then returns the differentially expressed genes whose FDR is below the threshold you set (although the argument is named 'p.value' rather than 'FDR'). So the FDR here is the same thing as the adjusted p-value. Is my understanding right?

edger • 8.2k views

updated 7.5 years ago by James W. MacDonald 68k • written 7.5 years ago by zhxiaokang ▴ 20

score 2 · Answer 1 · 2017-10-25

You don't need to infer what a function returns; you can simply read its help page. In ?glmLRT, the 'Value' section, which lists the output, says

  PValue: p-values.

And the 'Value' section of ?topTags says

Value:

     an object of class 'TopTags' containing the following elements for
     the top 'n' most differentially expressed tags as determined by
     'sort.by':

   table: a data frame containing the elements 'logFC', the
          log-abundance ratio, i.e. fold change, for each tag in the
          two groups being compared, 'logCPM', the log-average
          concentration/abundance for each tag in the two groups being
          compared, 'PValue', exact p-value for differential expression
          using the NB model. When 'adjust.method' is not '"none"',
          there is an extra column of 'FDR' showing the adjusted
          p-value if 'adjust.method' is one of the '"BH"', '"BY"' and
          '"fdr"', or an extra column of 'FWER' if 'adjust.method' is
          one of the '"holm"', '"hochberg"', '"hommel"', and
          '"bonferroni"'.

Which is, I believe, fairly self-explanatory. If you think some or all of it is unclear, you could say which part and explain why.
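To see why filtering on FDR returns fewer genes than filtering on PValue, here is a minimal sketch in base R. It assumes, as the topTags help page describes, that the p.value cutoff is applied to the adjusted p-values and that the default adjust.method is "BH". The vector pvalue is made-up data standing in for lrt$table$PValue; it is not from a real analysis.

```r
# Hypothetical raw p-values standing in for lrt$table$PValue (made-up numbers)
pvalue <- c(0.001, 0.008, 0.02, 0.03, 0.04, 0.2, 0.5, 0.9)

# Benjamini-Hochberg adjustment; this is what the 'FDR' column is assumed
# to hold when adjust.method = "BH"
fdr <- p.adjust(pvalue, method = "BH")

# Number of genes passing a 0.05 cutoff on raw versus adjusted p-values
n_raw <- sum(pvalue <= 0.05)
n_fdr <- sum(fdr <= 0.05)

print(data.frame(PValue = pvalue, FDR = fdr))
cat("raw p-value <= 0.05:", n_raw, "\n")
cat("FDR <= 0.05:", n_fdr, "\n")
```

With these numbers, five genes pass the raw cutoff but only two pass after adjustment. A BH-adjusted value is never smaller than the raw p-value, so the genes with FDR below a threshold are always a subset of those with PValue below the same threshold, which matches what was observed in 'deg' versus 'dea'.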