Hello Everyone,
I'm running LIMMA on a study based on HGU133plus2 chips I dug out of GEO. I normalized using RMA, GCRMA and MAS5 as usual, and ran AffyQC to look for outliers, of which I found none. There are 20 normals and 61 patients, and they cluster well in PCA. I decided to move forward with the GCRMA normalized set. I removed the 68 Affy control probes, and discarded the 13,875 / 54,613 probes (25.4% of all probes) which had an average log2 intensity < 2.27. Intensities for all arrays range from 2.268579 to 16.02096, so my belief is that these probes are uninformative. I finally calling LIMMA using the following:
eset_filtered$cohort <- factor(eset_filtered$cohort) design <- model.matrix(~0 + eset_filtered$cohort) colnames(design) <- levels(eset_filtered$cohort) fit <- lmFit(eset_filtered, design) contrast.matrix <- makeContrasts(SLE-CTL,levels = design) fit2 <- contrasts.fit(fit, contrast.matrix) fit2 <- eBayes(fit2) topTable.filtered <- topTable(fit2, coef=1, adjust="fdr", n=nrow(eset_filtered) ) rm( contrast.matrix, design, fit, fit2) index <- which(topTable.filtered$adj.P.Val <= 0.05) # 11,763 nrow(eset_filtered) # 32,054
As you can see, I'm getting 11,763 (out of a total of 32,054) probes at an FDR of 0.05. The topmost gene has an adjusted pval of 5.360915e-19 (the last gene is 0.9999355). What am I missing here? Why am I getting so many significant results? Everything normalized well, the MA plots look good, there weren't any wild outliers, the cohorts are labeled properly because they cluster together in PCA. I'm really wracking myself on this one as I've had (I believe) success with LIMMA in the past. Any help our community can provide is PROFOUNDLY appreciated.
Best wishes,
Robert D. Robl
Thanks for the idea! I found the write up on TREAT (t-tests relative to a threshold) at http://bioinformatics.oxfordjournals.org/content/25/6/765.long. I'll apply the approach tomorrow and report back.
I incorporated the following which includes the results. That aside, is my original script free of errors? Thanks!
topTreat <- topTreat(fit2, coef=1, lfc=1, adjust="BH", n=nrow(eset_filtered)) # 1,301 probes
topTreat <- topTreat(fit2, coef=1, lfc=1.25, adjust="BH", n=nrow(eset_filtered)) # 680 probes
topTreat <- topTreat(fit2, coef=1, lfc=1.5, adjust="BH", n=nrow(eset_filtered)) # 375 probes
topTreat <- topTreat(fit2, coef=1, lfc=2, adjust="BH", n=nrow(eset_filtered)) # 149 probes
I would suggest a less extreme value of
lfc
. Remember, TREAT tests for significant differences fromlfc
, so a gene with a log-fold change of 1 would probably not be called as significant if you were also to setlfc=1
. I often use a value oflog2(1.5)
, or even 0.5, i.e., need a fold-change greater than square-root 2. Other than that, the code looks okay.