Question

Upper-quartile normalization before RUVg normalization?

Jon Bråte

Norway

Hi, I'm looking through the RUVSeq manual, and it seems that the set object used for RUVg normalization has first been normalized with betweenLaneNormalization from EDASeq. I tried RUVg normalization both with and without running betweenLaneNormalization first, and I get different results. So I just wanted to confirm: is it recommended to run betweenLaneNormalization before RUVg normalization?

Thanks!

Jon

Tag: RUVSeq

Answer (2019-05-14)

davide risso

University of Padova

Hi Jon,

You are right: adjusting for sequencing depth prior to RUV does influence the results. Our recommended workflow is to first run a between-sample normalization (e.g., the upper-quartile normalization implemented in betweenLaneNormalization) to adjust for sequencing depth, and then run RUV. This is also what the RUVSeq vignette suggests.

Best, Davide
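
To make the recommended order concrete, here is a minimal Python sketch of the arithmetic behind upper-quartile normalization, the idea behind betweenLaneNormalization(which = "upper") in EDASeq but not its actual code. Each sample is scaled by its 75th-percentile count relative to the mean across samples, so depth differences are removed before RUV estimates the remaining unwanted variation. The function names and toy counts are hypothetical, and EDASeq's exact quantile and rounding rules may differ.

```python
from statistics import mean, quantiles


def upper_quartile(values):
    """75th percentile with linear interpolation (the same rule as R's default quantile type 7)."""
    return quantiles(values, n=4, method="inclusive")[2]


def uq_normalize(counts, round_result=True):
    """Upper-quartile normalization of a genes x samples count matrix.

    counts: list of rows (one per gene), each holding one count per sample.
    Each sample is divided by the factor UQ_j / mean(UQ), so after scaling
    every sample has the same upper quartile (the mean UQ across samples).
    """
    n_samples = len(counts[0])
    uqs = [upper_quartile([row[j] for row in counts]) for j in range(n_samples)]
    target = mean(uqs)
    factors = [uq / target for uq in uqs]
    normalized = [[value / f for value, f in zip(row, factors)] for row in counts]
    if round_result:
        # Round back to integers so downstream tools still see count-like data.
        normalized = [[round(v) for v in row] for row in normalized]
    return normalized, factors


# Hypothetical toy data: 5 genes; sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [20, 40], [30, 60], [40, 80], [50, 100]]
normalized, factors = uq_normalize(counts)
print(factors)     # sample 2's factor is twice sample 1's
print(normalized)  # [[15, 15], [30, 30], [45, 45], [60, 60], [75, 75]]
```

After scaling, the two samples agree gene by gene, which is the depth adjustment RUV is then run on top of.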

Thanks for the clarification!

If I may ask a related question: our spike-in set (ERCC genes) has a lot of zero counts, and this causes infinite and missing values after betweenLaneNormalization. We can avoid this by adding 1 to every count, but I am not sure how this will affect the results, especially for the genes that had zero counts in the first place. Would you recommend adding 1 to every count?

Error message after betweenLaneNormalization and RUVg normalization:

Error in svd(Ycenter[, cIdx]) : infinite or missing values in 'x'
In addition: Warning message:
In RUVg(counts, cIdx, k, drop, center, round, epsilon, tolerance,  :
The expression matrix does not contain counts.
Please, pass a matrix of counts (not logged) or set isLog to TRUE to skip the log transformation

(reply by Jon Bråte)
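
The following minimal Python sketch illustrates one plausible cause of these values, assuming they arise because some sample's upper quartile is 0 in the matrix being normalized (e.g., mostly-zero spike-in counts). Under R's arithmetic, dividing by a zero factor gives Inf for non-zero counts and NaN for zero counts, and since log(Inf + 1) is still Inf and NaN stays NaN, they reach the svd call inside RUVg. The sketch also shows the trade-off behind the +1 question. All names and counts are hypothetical.

```python
import math
from statistics import mean, quantiles


def upper_quartile(values):
    """75th percentile with linear interpolation (R's default quantile type 7)."""
    return quantiles(values, n=4, method="inclusive")[2]


def r_style_divide(x, y):
    """Mimic R's IEEE arithmetic: x / 0 is Inf for x > 0, and 0 / 0 is NaN."""
    if y == 0:
        return math.nan if x == 0 else math.inf
    return x / y


def uq_scale(samples):
    """Scale each sample (a list of spike-in counts) by UQ / mean(UQ).

    Returns the scaled samples and the per-sample upper quartiles.
    """
    uqs = [upper_quartile(sample) for sample in samples]
    target = mean(uqs)  # assumes at least one sample has a non-zero UQ
    scaled = [[r_style_divide(v, uq / target) for v in sample]
              for sample, uq in zip(samples, uqs)]
    return scaled, uqs


def has_non_finite(samples):
    return any(not math.isfinite(v) for sample in samples for v in sample)


# Hypothetical ERCC counts for two samples (five spike-ins each).
spikes = [[0, 0, 3, 8, 15],   # sample A
          [0, 0, 0, 0, 6]]    # sample B: 4 of 5 counts are zero, so its UQ is 0

scaled, uqs = uq_scale(spikes)
print(uqs)                     # [8.0, 0.0]
print(has_non_finite(scaled))  # True: sample B becomes NaN/Inf

# Adding a pseudocount of 1 removes the zero UQ...
with_pc = [[v + 1 for v in sample] for sample in spikes]
scaled_pc, uqs_pc = uq_scale(with_pc)
print(has_non_finite(scaled_pc))  # False
# ...but sample B (UQ 1 vs. mean UQ 5) is scaled up five-fold, so its former zeros become 5.
print(scaled_pc[1])
```

In this toy case the pseudocount fixes the error but strongly reshapes the zero-dominated sample, which is why the effect on genes that started at zero is worth checking.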

I would consider filtering out the spike-ins with many zeros and/or choosing a normalization that is more robust to zeros than upper-quartile, e.g., TMM, or even scran (developed specifically for data with many zeros).
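
A minimal Python sketch of the filtering suggestion, assuming a simple rule of "more than min_count reads in at least min_samples samples". The thresholds, function names, and counts are hypothetical and should be tuned for the data at hand; reporting the zero fraction first helps judge how many spike-ins a given rule would leave.

```python
def filter_spike_ins(counts_by_spike, min_count=5, min_samples=2):
    """Keep spike-ins with more than `min_count` reads in at least `min_samples` samples.

    counts_by_spike maps a spike-in ID to its per-sample counts.
    The default thresholds are illustrative assumptions, not recommendations.
    """
    return {
        spike: counts
        for spike, counts in counts_by_spike.items()
        if sum(c > min_count for c in counts) >= min_samples
    }


def zero_fraction(counts):
    """Fraction of samples in which a spike-in has zero reads."""
    return sum(c == 0 for c in counts) / len(counts)


# Hypothetical spike-in counts across three samples.
spikes = {
    "ERCC-00002": [120, 98, 143],
    "ERCC-00003": [0, 7, 0],
    "ERCC-00004": [9, 0, 12],
}
for spike, counts in spikes.items():
    print(spike, f"zero fraction = {zero_fraction(counts):.2f}")

kept = filter_spike_ins(spikes)
print(sorted(kept))  # ['ERCC-00002', 'ERCC-00004']
```

Loosening min_count or min_samples keeps more spike-ins at the cost of retaining sparser ones.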

Alternatively, you can run RUVg without normalizing the data first. In our experience it performs slightly worse, but it is still OK. Remember that the first factor usually captures sequencing depth, so you will probably need to increase k by 1.

(reply by davide risso)
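
The point that the first factor usually picks up sequencing depth can be illustrated with a small pure-Python sketch. As the svd(Ycenter[, cIdx]) call in the error message suggests, RUVg works on centered log counts of the control genes and takes its factors from an SVD. This sketch assumes log(count + 1) and uses power iteration for the leading singular vector instead of a full SVD. With hypothetical control genes that differ only in sequencing depth, the leading singular vector lines up with the centered log depth, so without prior normalization one factor is spent on depth.

```python
import math


def centered_log_counts(counts, epsilon=1.0):
    """samples x genes matrix of log(count + epsilon), each gene centered across samples."""
    logs = [[math.log(c + epsilon) for c in row] for row in counts]
    n_samples, n_genes = len(logs), len(logs[0])
    gene_means = [sum(logs[i][g] for i in range(n_samples)) / n_samples for g in range(n_genes)]
    return [[logs[i][g] - gene_means[g] for g in range(n_genes)] for i in range(n_samples)]


def leading_left_singular_vector(Y, iterations=100):
    """First left singular vector of Y, via power iteration on the samples x samples matrix Y Y^T."""
    n = len(Y)
    gram = [[sum(a * b for a, b in zip(Y[i], Y[j])) for j in range(n)] for i in range(n)]
    # Start from a non-constant vector: after centering, the all-ones vector lies in
    # the null space of Y Y^T, so starting there would collapse to zero.
    u = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        v = [sum(gram[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]
    return u


def abs_cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return abs(dot) / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical control genes with no biology, only different sequencing depths.
depths = [1.0, 2.0, 0.5, 4.0]           # relative sequencing depth per sample
gene_means = [100, 250, 40, 800, 1500]  # baseline expression of each control gene
counts = [[int(d * m) for m in gene_means] for d in depths]

Y = centered_log_counts(counts)
u1 = leading_left_singular_vector(Y)
log_depth = [math.log(d) for d in depths]
centered_depth = [x - sum(log_depth) / len(log_depth) for x in log_depth]
print(round(abs_cosine(u1, centered_depth), 3))  # close to 1: factor 1 is essentially depth
```

In real data the depth factor is mixed with other unwanted variation, so this only shows the direction of the effect, not how large it will be.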

We tried filtering out those spike-ins, but we were left with very few. Thanks for the advice, we will check these options out!

(reply by Jon Bråte)