The eBayes function gives two p-values for each row
1
0
Entering edit mode
Caamp • 0
@d72c721e
Last seen 3.5 years ago
Spain

I am trying to write an analysis of the differential expression of genes for the experiment geod19804. After normalizing the data with RMA and building the ExpressionSet, I have all the samples grouped in two conditions for $Lung.tissue: lung_cancer and paired_normal_adjacent (cancer vs. normal). The t-test with equal variance

cancer = pData(geod19804)[,"Lung.tissue"]
tt = genefilter::rowttests(geod19804, cancer)

already gives me some very wrong results (with alpha=0.05, +27000 significant genes and 6411 after adjusting with Bonferroni).

The bigger problem comes from trying to do a moderated t-test.

cancer = pData(geod19804)[,"Lung.tissue"]

#Model matrix
design = model.matrix(~0+cancer)
colnames(design) = c("lung_cancer","paired_normal_adjacent")

#Adjust
lmadjus = lmFit(geod19804,design)
contrast.matrix = makeContrasts(dif = "lung_cancer -  paired_normal_adjacent",levels = design)

#Estimate
fitbayes = contrasts.fit(lmadjus,contrast.matrix)
fitbayes = eBayes(lmadjus)

With this, I get 81684 significant genes! (and 60681 with Bonferroni and 80126 with BH). Now, it may seem like either the normalization of the data or the creation of the ExpressionSet were wrong somewhere, but

head(fitbayes$p.value)
            lung_cancer paired_normal_adjacent
1007_s_at 1.311383e-142          5.935409e-139
1053_at   6.818996e-130          2.330951e-127
117_at    1.109553e-108          7.159564e-110

shows two columns for p-values, where there should only be one, which (I guess) means that the test is not actually contrasting between cancer and normal, but I have no idea why, and I have seen code similar to mine working just fine.

hgu133plus2.db limma DifferentialExpression affy • 1.2k views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 8 minutes ago
WEHI, Melbourne, Australia

You've made a programming error. You presumably intended:

fitbayes <- contrasts.fit(lmadjus,contrast.matrix)
fitbayes <- eBayes(fitbayes)

As it is your code is discarding the output from contrasts.fit.

ADD COMMENT
0
Entering edit mode

Oh my god, I'm so sorry. Thank you for pointing that out. I still get way too many significant genes (27054 w/ no corrections and 6041 with Bonferroni), and I don't know why, but at least they make more sense than before.

ADD REPLY
0
Entering edit mode

Cancer vs normal is always going to give a huge number of DE genes. That's just biology and the fact that adjacent normal is never truly the same cell type as the tumor.

However your data processing sounds strange. There are only about 25,000 genes in the human genome and the Affy arrays you are using contain only about 57,000 probe-sets, so it is hard to see how you could get nearly 82,000 significant genes. A typical analysis would usually include no more than about 15,000 probe-sets after filtering non-expressed probe-sets.

ADD REPLY

Login before adding your answer.

Traffic: 685 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6