Hello,
I have to analyse a microarray for differential expressed genes. The used microarray has the following accession number: GSE36980. I just picked 4 control arrays and 4 alzheimer arrays for the analysis. I made the next code:
library('affy')
library('limma')
data.raw <- ReadAffy("GSM907796.CEL.gz","GSM907808.CEL.gz","GSM907797.CEL.gz","GSM907809.CEL.gz","GSM907798.CEL.gz","GSM907810.CEL.gz","GSM907799.CEL.gz","GSM907811.CEL.gz") #Frontal
#Create dataframe
pd <- data.frame(experiment = c('Alzheimer','Control', 'Alzheimer','Control', 'Alzheimer', 'Control','Alzheimer','Control'))
sampleNames(data.raw) <- c('Alzheimer','Control', 'Alzheimer','Control', 'Alzheimer', 'Control','Alzheimer','Control')
metaData <- data.frame(labelDescription = c('experiment'))
phenoData(data.raw) <- new('AnnotatedDataFrame', data = pd, varMetadata = metaData)
protocolData(data.raw)<- new('AnnotatedDataFrame', data = pd, varMetadata = metaData)
#preprocessing
data.bg.norm <- affy::rma(data.raw)
expression <- data.bg.norm@assayData$exprs
###Differential expression###
#make model
X <- model.matrix(~0+factor(data.bg.norm$experiment))
colnames(X) <- c("Alzheimer","Control")
#Fit model -> Fit linear model for each gene given a series of arrays
LmFit <- lmFit(expression,X)
mc <- makeContrasts("Alzheimer-Control", levels = X)
c.fit <- contrasts.fit(LmFit, mc)
eb <- eBayes(c.fit)
# extract the p values of the F tests
modFpvalue <- eb$F.p.value
#make a plot of the non-adjusted and adjusted pvalues
hist(p.adjust(modFpvalue,method='BH'),probability='TRUE')
hist(modFpvalue)
#Which genes are significant
results <- decideTests(eb, method = "nestedF",adjust.method= 'BH',p.value =0.05)
neut_indices_array <- which(results == 0)
neg_indices_array <- which(results == -1)
pos_indices_array <- which(results == 1)
When I run this code I get a histogram of modFpvalues which seems to show differential expressed genes, because the distribution of the p-values isn't uniform. But the output of the last step shows that not a single gene is differential expressed (after adjustment). I don't understand this, I thought that if the pvalue distribution isn't uniform, you'll have differential expressed genes anyway but apparently not.