Question

The eBayes function gives two p-values for each row

0

Entering edit mode

Caamp • 0

@d72c721e

Last seen 3.7 years ago

Spain

I am trying to write an analysis of the differential expression of genes for the experiment geod19804. After normalizing the data with RMA and building the ExpressionSet, I have all the samples grouped in two conditions for $Lung.tissue: lung_cancer and paired_normal_adjacent (cancer vs. normal). The t-test with equal variance

cancer = pData(geod19804)[,"Lung.tissue"]
tt = genefilter::rowttests(geod19804, cancer)

already gives me some very wrong results (with alpha=0.05, +27000 significant genes and 6411 after adjusting with Bonferroni).

The bigger problem comes from trying to do a moderated t-test.

cancer = pData(geod19804)[,"Lung.tissue"]

#Model matrix
design = model.matrix(~0+cancer)
colnames(design) = c("lung_cancer","paired_normal_adjacent")

#Adjust
lmadjus = lmFit(geod19804,design)
contrast.matrix = makeContrasts(dif = "lung_cancer -  paired_normal_adjacent",levels = design)

#Estimate
fitbayes = contrasts.fit(lmadjus,contrast.matrix)
fitbayes = eBayes(lmadjus)

With this, I get 81684 significant genes! (and 60681 with Bonferroni and 80126 with BH). Now, it may seem like either the normalization of the data or the creation of the ExpressionSet were wrong somewhere, but

head(fitbayes$p.value)
            lung_cancer paired_normal_adjacent
1007_s_at 1.311383e-142          5.935409e-139
1053_at   6.818996e-130          2.330951e-127
117_at    1.109553e-108          7.159564e-110

shows two columns for p-values, where there should only be one, which (I guess) means that the test is not actually contrasting between cancer and normal, but I have no idea why, and I have seen code similar to mine working just fine.

hgu133plus2.db limma DifferentialExpression affy • 1.3k views

ADD COMMENT • link updated 3.8 years ago by Gordon Smyth 52k • written 3.8 years ago by Caamp • 0

score 1 · Accepted Answer · 2021-05-15

1

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 13 hours ago

WEHI, Melbourne, Australia

You've made a programming error. You presumably intended:

fitbayes <- contrasts.fit(lmadjus,contrast.matrix)
fitbayes <- eBayes(fitbayes)

As it is your code is discarding the output from contrasts.fit.

ADD COMMENT • link 3.8 years ago Gordon Smyth 52k

0

Entering edit mode

Oh my god, I'm so sorry. Thank you for pointing that out. I still get way too many significant genes (27054 w/ no corrections and 6041 with Bonferroni), and I don't know why, but at least they make more sense than before.

ADD REPLY • link 3.8 years ago Caamp • 0

0

Entering edit mode

Cancer vs normal is always going to give a huge number of DE genes. That's just biology and the fact that adjacent normal is never truly the same cell type as the tumor.

However your data processing sounds strange. There are only about 25,000 genes in the human genome and the Affy arrays you are using contain only about 57,000 probe-sets, so it is hard to see how you could get nearly 82,000 significant genes. A typical analysis would usually include no more than about 15,000 probe-sets after filtering non-expressed probe-sets.

ADD REPLY • link 3.8 years ago Gordon Smyth 52k