As I have already posted in other questions I have asked recently, I'm working with single-channel array data on small ncRNAs detected with a custom array analyzed through GenePix. I have several gpr files that I've managed to process with the limma package: I've acquired the data, performed background correction, and now I need to normalize before I proceed with differential expression analysis. Ergo my question: is it possible to take into account the control probes and perform a "within-array" normalization before the normalizeBetweenArrays() function?
Each array has positive and negative controls. The latter are basically randomers that should not be expressed. So what I guess I'm trying to understand is whether it is possible/advisable to perform a sort of "technical" normalization on each array on the basis of these negative controls, before performing between-array normalization.
Forgive the banality of my question (if that's the case). I am aware that within-array normalization is not applied to single-channel data, but I'm a newbie at this, and I feel the custom array I'm dealing with puts me in a particular context that I'm not sure how best to handle. I would like to be able to take into account (normalize on) the negative controls (randomers) before normalizing between arrays. Is this possible? If so, how advisable is it? When would these controls be taken into account, if ever, along the limma pipeline? Am I missing something?
Thanks so much for any help/suggestion/elucidation you can provide!
Thank you for all your advice, Prof. Smyth.
The randomers used as negative controls in this array turned out not to be that random at times! Some of them (a small minority, yet a consistent one!) have unexpectedly high counts. For this reason, I don't believe they would serve well in background correction, unless there were a way to pinpoint and exclude that minority, perform neqc() and see what comes out.
At this point, I'd rather stick with the standard normalization pipeline you suggested.
Also, I did not mention that the number of ncRNA molecules we intend to analyze with this custom microarray experiment is barely over 100. So there is a very small number of spots for them, and about half as many randomers. What would you suggest in this case? I ask because, as you've rightly pointed out in another post, normalizing could be trickier.
Thanks again for your help.
"Unless there were a way to pinpoint and exclude that minority, perform neqc() and see what comes out."
That's exactly what the robust=TRUE option to neqc() does. It is designed to ignore a minority of negative control probes with overly high intensities.
Whether this will work well for your arrays, of course, I can't say.
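For readers following along, here is a minimal sketch of what a robust neqc() call could look like. Everything about the data is an assumption made for illustration: the intensities are simulated, the probe counts and the "regular"/"negative" status labels are hypothetical, and five negative controls are deliberately made bright to mimic the badly behaved randomers. With real data the EListRaw object would come from reading the gpr files rather than from new().

```r
library(limma)

set.seed(1)

# Hypothetical layout: 444 regular spots, 114 negative-control spots, 4 arrays
n_reg <- 444
n_neg <- 114
n_arrays <- 4
status <- c(rep("regular", n_reg), rep("negative", n_neg))

# Simulated raw intensities: background noise on every spot,
# plus exponential signal on the regular spots only
bg <- matrix(rnorm((n_reg + n_neg) * n_arrays, mean = 100, sd = 10),
             ncol = n_arrays)
sig <- rbind(matrix(rexp(n_reg * n_arrays, rate = 1 / 500), ncol = n_arrays),
             matrix(0, n_neg, n_arrays))
E <- bg + sig

# Make a small minority of negative controls unexpectedly bright
E[n_reg + 1:5, ] <- E[n_reg + 1:5, ] + 5000
colnames(E) <- paste0("array", seq_len(n_arrays))

genes <- data.frame(ID = paste0("spot", seq_len(n_reg + n_neg)),
                    Status = status)
x <- new("EListRaw", list(E = E, genes = genes))

# Normexp background correction driven by the negative controls, followed by
# quantile normalization and log2 transformation. robust = TRUE asks for
# robust estimates of the background mean and SD (this needs the MASS package)
y <- neqc(x, status = status, negctrl = "negative", regular = "regular",
          robust = TRUE)

dim(y)  # control spots are dropped from the returned object
```

Running the same call with robust = FALSE and comparing the two results is a cheap way to see how much the bright randomers matter on a given set of arrays.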
Thank you for the additional suggestions and references. I will take the time to study them thoroughly. Nevertheless, I'm running on a tight schedule and I need to find an "optimal" solution given such circumstances.
At this point I'm tempted to try the standard pipeline and compare it to the neqc one.
I realize that the normexp + cyclic loess normalization, as suggested in your paper, would be the best approach, but I honestly wouldn't know where to start. I'd need a general code example showing how to upweight the selected control probes and apply the cyclic loess normalization with the limma package (I can't seem to find anything like this in the user guide).
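As a rough sketch of what such code might look like: normalizeCyclicLoess() in limma accepts a vector of probe weights, so one possible approach is to give the chosen control probes a larger weight than the other probes. The data below are simulated, is_ctrl is a hypothetical indicator of whichever probes are chosen, and the weight of 10 is an arbitrary illustrative value rather than a recommendation.

```r
library(limma)

set.seed(2)

# Hypothetical layout: 444 regular spots and 114 control spots on 4 arrays
n_reg <- 444
n_ctrl <- 114
n_arrays <- 4
is_ctrl <- c(rep(FALSE, n_reg), rep(TRUE, n_ctrl))

# Simulated log2 intensities standing in for background-corrected data,
# with a different shift added to each array
logE <- matrix(rnorm((n_reg + n_ctrl) * n_arrays, mean = 8, sd = 1.5),
               ncol = n_arrays)
logE <- sweep(logE, 2, c(0, 0.5, -0.3, 0.8), "+")

# Probe weights: the chosen control probes count 10 times as much as the rest
# (arbitrary value, for illustration only)
w <- ifelse(is_ctrl, 10, 1)

# Cyclic loess normalization of the log2 matrix using the probe weights
logE_norm <- normalizeCyclicLoess(logE, weights = w, span = 0.7,
                                  iterations = 3, method = "fast")

# Column medians before and after normalization
round(apply(logE, 2, median), 2)
round(apply(logE_norm, 2, median), 2)
```

With real data, logE would be the log2 of the background-corrected intensities (for example from backgroundCorrect() with method = "normexp"), and the normalized matrix would be placed back into an EList before model fitting.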
Last but not least, as for intimate knowledge of the arrays, I thought I'd add something I've just realized is "pretty" important: my arrays have triplicate spots per gene. Each array is a collection of 16 12x12 subarrays, with 148 distinct small RNAs I wish to investigate, 38 distinct negative controls and 6 distinct positive controls. The rest are blanks. All genes are present on each array as triplicate spots. Given this very important piece of info (which I stupidly neglected until now), I suppose I must take this replication into account after background correction. I've read the example in section 16.4 of the limma user guide, but that's for two-color arrays. How could I account for the triplicate spots when normalizing between arrays (whether with quantile, with or without the negative controls, or with cyclicloess) for these single-channel arrays, and which linear model should I use to perform the differential expression analysis?
I know it's gotten quite complicated. I'm a total newbie, and although I'm learning a lot, I'm at a loss in such a particular context. Thank you so much for your patience and all the help you can give me, I truly appreciate it.
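One way to handle replicate spots on single-channel arrays, sketched here under stated assumptions, is to average them with avereps() after normalization and then fit the linear model on one row per gene. The data are simulated, and the probe IDs, the six arrays and the two-group design are all hypothetical. This is only one option: lmFit() and duplicateCorrelation() also take ndups and spacing arguments, but those assume the replicates sit at a regular spacing on the array, which may not hold for this layout.

```r
library(limma)

set.seed(3)

# Hypothetical layout: 148 small RNAs, each spotted in triplicate, 6 arrays
n_genes <- 148
n_rep <- 3
n_arrays <- 6
ids <- rep(paste0("ncRNA", seq_len(n_genes)), each = n_rep)

# Simulated normalized log2 expression values, one row per spot
E <- matrix(rnorm(length(ids) * n_arrays, mean = 8, sd = 1), ncol = n_arrays)
colnames(E) <- paste0("array", seq_len(n_arrays))
genes <- data.frame(ID = ids, Status = "regular")
y <- new("EList", list(E = E, genes = genes))

# Average the replicate spots so that each gene has a single row
y_avg <- avereps(y, ID = y$genes$ID)

# Hypothetical two-group comparison, three arrays per group
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

fit <- eBayes(lmFit(y_avg, design))
topTable(fit, coef = 2, number = 5)
```

Averaging discards the spot-level variability but keeps the analysis simple, which can be a reasonable trade-off when there are only about 150 genes and a tight schedule.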