Question

RUVnormalize: should input be pre-normalized?

Cornwell, Adam ▴ 110

@cornwell-adam-5680

United States

Greetings,

I started working with RUVnormalize a couple of days ago. Although I wasn't able to get the iterative method to work (it depends on an external package that is very difficult to get working on Windows), the naive methods seemed to at least do something. However, the vignette does not specify what "level" of data should be fed into the algorithm for best performance. Assuming Affymetrix arrays, should it be un-normalized probe-level data? Summarized but not otherwise normalized? I was inputting a matrix that was already normalized with RMA, but I'm wondering whether one of these alternatives would be better.

Thanks.

ruvnormalize microarray normalization • 1.5k views

updated 9.8 years ago by laurent.jacob ▴ 10 • written 9.8 years ago by Cornwell, Adam ▴ 110

score 0 · Answer 1 · 2015-07-01

Hi Adam,

Regarding the iterative method, did you have trouble running the spams package? If so, you could contact Julien Mairal (http://spams-devel.gforge.inria.fr/contacts.html), who maintains the package.

There is no general answer on what level of data should be fed into the algorithm. My recommendation would be to apply all your usual array-level corrections but no cross-array correction; e.g., on your Affy arrays, do background correction and summarization but no quantile normalization. The reason is that RUV will not deal with array-level corrections, but could do a better job than quantile normalization at cross-array adjustment (e.g., if a factor of interest leads to some samples having different probe distributions than the others). It is better to work on log intensities, which are often better suited to least-squares-based methods.
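To make the point about quantile normalization concrete, here is a minimal sketch in plain Python. It is not RUVnormalize code: the function name `quantile_normalize` and the toy intensities are made up for illustration, and ties are ignored. It takes log2 intensities for three arrays, one of which has a genuinely wider distribution, and shows that after quantile normalization every array carries exactly the same set of values, so that difference is gone before any downstream method sees it.

```python
import math

def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (one list of values per array).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are not handled specially in this sketch.
    """
    n_probes = len(arrays[0])
    sorted_cols = [sorted(col) for col in arrays]
    # Reference distribution: average of the r-th smallest value of each array.
    reference = [
        sum(col[r] for col in sorted_cols) / len(arrays)
        for r in range(n_probes)
    ]
    normalized = []
    for col in arrays:
        # Probe indices ordered from smallest to largest value in this array.
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical summarized intensities: the third array has a wider spread,
# standing in for a sample whose probe distribution really differs.
raw = [
    [100, 200, 400, 800],
    [110, 190, 420, 790],
    [100, 200, 3200, 6400],
]
log_arrays = [[math.log2(x) for x in col] for col in raw]  # work on log intensities
normalized = quantile_normalize(log_arrays)

def spread(col):
    return max(col) - min(col)

print([round(spread(c), 2) for c in log_arrays])   # spreads differ before
print([round(spread(c), 2) for c in normalized])   # spreads are identical after
```

If the wider spread of the third array reflects a factor of interest, forcing identical distributions discards it, which is the situation where leaving the cross-array adjustment to RUV may work better.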

You can have a look at the differential analysis RUV paper (http://biostatistics.oxfordjournals.org/content/13/3/539.full), Section 3, in particular 3.1 and Table 1. The authors tried RUV after different levels of normalization and observed little difference.

I would make an exception if you have a strong known batch effect that is unlikely to be associated with a factor of interest. For example, if you work on a two-year study and a different platform was used each year, you could assume that your population was on average the same each year (unless you know this is wrong by design of the study), and explicitly mean-center your arrays per year. We do this with the platform in the gender example of the vignette (we ran the same experiment with no platform centering and obtained similar results, though). Pro: if the platform effect has a much larger magnitude than other (e.g., biological) effects, you may need to remove too much of the other effects to get rid of the platform effect with random effect models (e.g., naiveRandRUV, nu.coeff != 0). Con: unless you are studying a known factor of interest, you can never know for sure that you are not losing a signal of interest when you center by platform.
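As a rough illustration of per-batch mean centering, here is a small sketch in plain Python. It is one reading of "mean-center per year", namely subtracting each gene's mean within each batch; the function name `center_by_batch`, the batch labels, and the toy values are all hypothetical, and this is not the code used in the vignette.

```python
from collections import defaultdict

def center_by_batch(expr, batches):
    """Subtract, for every gene, its mean within each batch.

    expr:    list of samples, each a list of log intensities (one per gene)
    batches: list of batch labels, one per sample (e.g. year or platform)
    """
    # Group sample indices by batch label.
    groups = defaultdict(list)
    for i, label in enumerate(batches):
        groups[label].append(i)

    n_genes = len(expr[0])
    centered = [row[:] for row in expr]  # copy so the input is left untouched
    for idx in groups.values():
        for g in range(n_genes):
            batch_mean = sum(expr[i][g] for i in idx) / len(idx)
            for i in idx:
                centered[i][g] = expr[i][g] - batch_mean
    return centered

# Toy log intensities: 4 samples x 2 genes, with a large shift in year 2.
expr = [
    [8.0, 5.0],
    [8.4, 5.2],
    [10.0, 7.0],
    [10.6, 7.4],
]
batches = ["year1", "year1", "year2", "year2"]
centered = center_by_batch(expr, batches)

for label, row in zip(batches, centered):
    print(label, [round(v, 2) for v in row])
```

Differences between samples within a year are preserved, while any difference in the yearly averages is removed. That is also the caveat: if a signal of interest differs between years, this step removes it too.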

I hope this is helpful; don't hesitate to ask if something is unclear.

Best,

Laurent