Question

Interpretation of p-value plot

0

Entering edit mode

karlien.springael • 0

@karlienspringael-15749

Last seen 6.9 years ago

Hello,

I have to analyse a microarray for differential expressed genes. The used microarray has the following accession number: GSE36980. I just picked 4 control arrays and 4 alzheimer arrays for the analysis. I made the next code:

library('affy')

library('limma')

data.raw <- ReadAffy("GSM907796.CEL.gz","GSM907808.CEL.gz","GSM907797.CEL.gz","GSM907809.CEL.gz","GSM907798.CEL.gz","GSM907810.CEL.gz","GSM907799.CEL.gz","GSM907811.CEL.gz") #Frontal

#Create dataframe
pd <- data.frame(experiment = c('Alzheimer','Control', 'Alzheimer','Control', 'Alzheimer', 'Control','Alzheimer','Control'))
sampleNames(data.raw) <- c('Alzheimer','Control', 'Alzheimer','Control', 'Alzheimer', 'Control','Alzheimer','Control')
metaData <- data.frame(labelDescription = c('experiment'))
phenoData(data.raw) <- new('AnnotatedDataFrame', data = pd, varMetadata = metaData)
protocolData(data.raw)<- new('AnnotatedDataFrame', data = pd, varMetadata = metaData)

#preprocessing

data.bg.norm <- affy::rma(data.raw)

expression <- data.bg.norm@assayData$exprs

###Differential expression###
#make model
X <- model.matrix(~0+factor(data.bg.norm$experiment))
colnames(X) <- c("Alzheimer","Control")

#Fit model -> Fit linear model for each gene given a series of arrays
LmFit <- lmFit(expression,X)
mc <- makeContrasts("Alzheimer-Control", levels = X)
c.fit <- contrasts.fit(LmFit, mc)
eb <- eBayes(c.fit)

# extract the p values of the F tests
modFpvalue <- eb$F.p.value

#make a plot of the non-adjusted and adjusted pvalues

hist(p.adjust(modFpvalue,method='BH'),probability='TRUE')
hist(modFpvalue)

#Which genes are significant
results <- decideTests(eb, method = "nestedF",adjust.method= 'BH',p.value =0.05)

neut_indices_array <- which(results == 0)
neg_indices_array <- which(results == -1)
pos_indices_array <- which(results == 1)

When I run this code I get a histogram of modFpvalues which seems to show differential expressed genes, because the distribution of the p-values isn't uniform. But the output of the last step shows that not a single gene is differential expressed (after adjustment). I don't understand this, I thought that if the pvalue distribution isn't uniform, you'll have differential expressed genes anyway but apparently not.

p values differential expression limma • 1.2k views

ADD COMMENT • link updated 6.9 years ago by Gordon Smyth 52k • written 6.9 years ago by karlien.springael • 0

score 0 · Answer 1 · 2018-05-06

No, your assumption is not entirely correct. While a p-value histogram that is enriched for small p-values will indeed tend to yield significantly DE genes, it will not necessarily do so at the reasonably stringent FDR cutoff that you have selected. The enrichment needs to be of a certain magnitude before you can achieve FDR < 0.05.

If you analysed the whole GEO dataset, instead of just picking a subset of n=4 AD arrays and n=4 controls at random, you might get more interesting results. n=4 would very rarely be anywhere near a large enough sample size for studying a human disease like Alzheimer's. I would expect to need dozens of cases and dozens of controls at least, and even then very careful batch correct (RUV perhaps) would probably be needed.

I wonder why you are using method="nestedF". Since you have only two groups here, nestedF will just give the same result as would a moderated t-test. If you don't understand well what nestedF does, it would be better to stick to the limma defaults.