Question

Sub : Limma ebayes

ssrajan86 • 0

Dear All,

Can someone please advise me on the following problem and suggest a solution? I am not very familiar with gene expression studies.

Q. I have a gene expression data set containing more than 16,000 probes (the whole data) after normalization and filtering.

A subset of these probes (4,000 probes) was considered independently for the eBayes test. I performed DE analysis on both the whole data (16,000 probes) and the subset (4,000 probes).

When I compared the two sets of results based on the adjusted p-value, some probes that were differentially expressed in the subset were not DE in the whole data. I expected the probes that are DE in the subset to also be DE in the whole data. It would be great to know whether my assumption is wrong.

limma ebayes statistical inference • 2.0k views

ADD COMMENT • link updated 9.7 years ago by Gordon Smyth 52k • written 9.7 years ago by ssrajan86 • 0

Aaron Lun ★ 28k

Without seeing the code, it's hard to tell for sure, but the most probable cause is that you've got fewer probes in the subset data. This means that the severity of the p-value adjustment for multiple testing (the BH method, in this case) is reduced in the subset. As a result, you can end up with probes in the subset that have adjusted p-values lower than the corresponding values from the full set. This would result in cases where probes are significant in the subset analysis but are not significant in the analysis with the full data.
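To make the multiple-testing point concrete, here is a minimal sketch in base R. It uses made-up p-values rather than anything from the poster's data: the probe names, the 200 "signal" probes and the way the subset is chosen are all assumptions for illustration. The unadjusted p-values are held fixed, so the only thing that differs between the two analyses is the set of probes passed to `p.adjust()`.

```r
# Hypothetical unadjusted p-values for a full set of 16,000 probes:
# 200 probes with small p-values plus 15,800 evenly spread "null" p-values.
p_signal <- seq(1e-5, 1e-3, length.out = 200)
p_null <- (1:15800) / 15801
p_full <- c(p_signal, p_null)
names(p_full) <- c(paste0("signal", 1:200), paste0("null", 1:15800))

# Hypothetical subset of 4,000 probes: all 200 signal probes plus every
# fourth null probe among the first 15,200 (3,800 null probes).
keep <- c(paste0("signal", 1:200), paste0("null", seq(4, 15200, by = 4)))
p_sub <- p_full[keep]

# BH adjustment applied separately to the full set and to the subset.
adj_full <- p.adjust(p_full, method = "BH")
adj_sub <- p.adjust(p_sub, method = "BH")

# Number of probes below a 5% FDR threshold in each analysis.
n_sig_full <- sum(adj_full < 0.05)
n_sig_sub <- sum(adj_sub < 0.05)
print(c(full = n_sig_full, subset = n_sig_sub))
```

In this toy setup the subset is enriched for small p-values, so the same probes receive smaller adjusted p-values in the subset than in the full set. How large the shift is in practice depends on how the subset was chosen.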

In addition, if you ran eBayes separately on the full and subsetted data, there will be changes in the statistics due to the information being shared in empirical Bayes shrinkage, e.g., prior variance and degrees of freedom. The size of these changes will depend on how you selected the subset. For example, if the subset was formed by selecting high-abundance probes, you'd end up with more low-variance probes (assuming a decreasing mean-variance relationship) such that the estimated prior variance decreases. This would result in lower (unadjusted) p-values in the subset, as all probes are shrunk towards a smaller prior.
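The effect of the prior on a single probe can be sketched with the moderated t-statistic formula, in which the posterior variance is a weighted average of the prior variance and the probe-wise residual variance. The code below is a rough illustration in base R and does not call limma. All numbers (log fold change, residual variance, the two prior variances and the prior degrees of freedom) are hypothetical; only the 3 versus 3 layout is taken from the code posted in this thread.

```r
# Hypothetical values for one probe (none of these come from the thread's data).
log_fc <- 1.0                        # estimated log2 fold change
stdev_unscaled <- sqrt(1/3 + 1/3)    # contrast between two groups of three samples
s2 <- 0.30                           # probe-wise residual variance
df_residual <- 4                     # 6 samples minus 2 group means

# Moderated two-sided p-value for a given prior variance and prior df.
moderated_p <- function(s2_prior, df_prior) {
  # Posterior variance: weighted average of prior and residual variance.
  s2_post <- (df_prior * s2_prior + df_residual * s2) / (df_prior + df_residual)
  t_mod <- log_fc / (stdev_unscaled * sqrt(s2_post))
  2 * pt(-abs(t_mod), df = df_prior + df_residual)
}

# Assumed prior from the full data versus a smaller assumed prior from a
# subset dominated by low-variance probes.
p_full_prior <- moderated_p(s2_prior = 0.20, df_prior = 4)
p_sub_prior <- moderated_p(s2_prior = 0.05, df_prior = 4)
print(c(full = p_full_prior, subset = p_sub_prior))
```

With the smaller prior variance, the posterior variance drops from 0.25 to 0.175, so the same probe gets a larger moderated t-statistic and a smaller unadjusted p-value.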

ssrajan86 • 0

Dear Aaron Lun,

Thanks for your suggestion. I agree that the prior variance and degrees of freedom influence the results for the subset data. Probes that passed the adjusted p-value threshold in the subset data did not pass it in the full data. In the whole data, however, I found a few of these DE probes with adj.P.Val close to 0.05; in other words, they were present but borderline.

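library(limma)

# Expression matrix: column 1 of x1 holds the probe IDs, columns 2-7 the six samples
m <- as.matrix(x1[, 2:7])
rownames(m) <- x1[, 1]

# Sample annotation; the Batch column defines the two groups
sam_group <- read.csv("samplebfile.csv")
clas <- factor(sam_group$Batch)

# Group-means design with one column per group
design <- model.matrix(~ 0 + clas)
colnames(design) <- c("group1", "group2")

fit <- lmFit(m, design)
dim(fit)

# Compare group1 with group2 using moderated t-statistics
contrast.matrix <- makeContrasts(group1 - group2, levels = design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit3 <- eBayes(fit2)
fit4 <- topTable(fit3, coef = 1, adjust.method = "BH", sort.by = "P", number = 4000)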

score 2 · Accepted Answer · 2015-08-10

The intention of the eBayes() function is that you will run it on all the genes, after normalization and filtering. The idea is to utilize information from the whole ensemble of genes. It is not usually correct to rerun eBayes on subsets of genes, and the results will obviously change if you do.

Similarly, you need to apply multiple testing adjustment to all the genes that you are considering in your analysis. For this reason, it is not usually correct to run topTable() on a subset of genes unless there was some a priori reason for focusing on that subset of genes.