neqc normalization results in bimodal histogram of expression values
1
0
Entering edit mode
@alexvpickering
Last seen 2.6 years ago
Canada

I am using neqc to normalize (bg correct, quantile normalize, and log2 transform) Illumina microarray data downloaded from GEO but am getting results that I am suspicious of. I do not have access to the negative control probe files but do have access to Detection P values (GSE39313 and GSE49000).  After editing the raw non-normalized expression txt files so that "AVG_Signal" and "Detection" are included in the column names along with the sample names, I run the following:

 data <- read.ilmn(data_paths, probeid="ID_REF")
 data <- neqc(data)$E

If I do a histogram of the now transformed values I get a bimodal distribution (a primary peak at log2 of ~ 6 and a secondary peak at log2 of ~14). I get a similar behaviour for both GSE's so I suspect it's not an oddity of the data. Additionally, after performing differential expression analysis I get a (suspiciously?) large number of differentially expressed genes with adjusted p-values > 0.05 and log-fold-change > 1 (~3000). Also, the top log-fold change values are also very high (10-14). If I import the data and just log2 transform (not using neqc), the data is approximately normally distributed (not bimodal). I suspect that neqc has done something strange and I'm hoping to figure out why so that I can still use it.

Any suggestions or thoughts?

neqc limma microarray illumina • 2.4k views
ADD COMMENT
0
Entering edit mode

I think I figured this one out (bi-modality is perhaps to be expected - see here). It seems to be quite a common observation with RMA normalized expression data (neqc is very similar to RMA).

ADD REPLY
0
Entering edit mode

No, bimodality isn't particularly expected nor is it very common.

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 7 hours ago
WEHI, Melbourne, Australia

Dear Alex,

Actually neqc() doesn't produce a bimodal distribution for these datasets. Nor is there any need to edit the data file to produce your own column names. Just download the non-normalized data file from GEO without any modification:

library(limma)
x <- read.ilmn("GSE39313_non-normalized.txt", probeid="ID_REF")
y <- neqc(x)
plotDensities(y[,1])

The density plot is beautifully unimodal.

It is important not to edit raw data files IMO. In this case I am guessing that you may have changed the data file so that it is no longer read correctly.

ADD COMMENT

Login before adding your answer.

Traffic: 747 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6