Dear Lev,
One of the purposes of a mailing list is that many people may reply,
so please do not address your question specifically to me.
As explained in Section 10.1 of the limma User's Guide, the
B-statistic requires a prior estimate of the proportion of DE genes.
By default this is set to 1%.
Therefore, the B-statistic will tend to underestimate significance if
the true proportion of DE genes is actually more than 1% and
overestimate if the true proportion is less than 1%. In your case,
the proportion of DE genes appears to be massively more than 1%,
hence you'd expect the B-statistic to underestimate significance.
That is, you'd expect the B-statistics to be too small.
Since the moderated t does not require a prior estimate, you'd expect
the p-values to suggest more DE than the B-statistics whenever the
proportion of DE genes in your data is large.
Best wishes
Gordon
At 12:27 AM 14/07/2007, Lev Soinov wrote:
>Dear Gordon,
>
>We are analysing a dataset of 14920 genes obtained with the AB1700
platform.
>It has three treatments L1, L2, L1+L2 and control. The data is in
>the form of expression data matrix with the first column as pobe ID
>and 14 other columns correspond to 4 above conditions. Using the
>code below, we obtain a huge number of genes with adjusted p values
><0.05, about 5000 for the comparison between L1 and control for
>example. At the same time B values corresponding to these probes are
>very small, i.e. we are getting B<-4 in the bottom of the list of
>probes with adj.p<0.05.
>Could you please comment on possible causes for this? Is it normal?
>With kind regards,
>Lev.
>
> >s<-scan('Data.txt',what='character')
>Read 223800 items
> > sm<-matrix(s,byrow=TRUE,ncol=15)
> > dim(sm)
>[1] 14920 15
> > rownames(sm)<-sm[,1]
> > sm<-sm[,2:ncol(sm)]
> > snn<-apply(sm,2,as.numeric)
> > rownames(snn)<-rownames(sm)
> > signals<-snn
> > dim(signals)
>[1] 14920 14
> > temp<-normalizeBetweenArrays(log2(signals), method='quantile')
> > design <- model.matrix(~0 +factor(c(1,1,1,1,2,2,2,3,3,3,3,4,4,4)))
> > colnames(design) <- c("Control","L1","L2","L1L2")
> > contrast.matrix <-
> makeContrasts(L1-Control,L2-Control,L1L2-Control,levels=design)
> > fit <- lmFit(temp, design)
> > fit2 <- contrasts.fit(fit, contrast.matrix)
> > fit2 <- eBayes(fit2)
> > topTable(fit2, coef=1, adjust='BH')
> ID logFC t P.Value adj.P.Val B
>8790 182417 5.813459 38.16912 1.072876e-15 1.479057e-11 25.47446
>6945 165482 8.573261 35.59768 2.856130e-15 1.479057e-11 24.69238
>7132 167208 6.247484 35.49523 2.973975e-15 1.479057e-11 24.65950
>10941 202780 4.881978 33.98673 5.467499e-15 2.039377e-11 24.15906
>1102 109858 5.076380 33.01348 8.214210e-15 2.451120e-11 23.81910
>3785 135458 3.686867 32.09869 1.217373e-14 3.027202e-11 23.48654
>12355 215284 5.035617 30.68240 2.288454e-14 4.877676e-11 22.94512
>6515 161539 5.885744 29.64292 3.704033e-14 6.908021e-11 22.52582
>9789 191745 8.189347 28.65188 5.953817e-14 9.870106e-11 22.10752
>8568 180293 4.749725 27.88955 8.671664e-14 1.293812e-10 21.77270
>
>
>The bottom of the adj.p<0.05 list:
> ID logFC t P.Value
> adj.P.Val B
>207673 -0.293302 -2.70223 0.01703 0.042251 -4.2848
>213498 -0.675519 -2.70219 0.017033 0.042251 -4.2849
>186148 -0.419934 -2.70201 0.017039 0.042258 -4.2853
>233859 -0.422533 -2.70185 0.017044 0.042263 -4.2856
>188263 -0.330067 -2.70179 0.017046 0.042263 -4.2857