Question

normalized counts from RUVg run: is further library size normalization needed?

capricygcapricyg ▴ 10

@capricygcapricyg-17892

United States

Hi, RUVSeq support,

After running RUVg, I got the normalized counts. My question is: are those counts also normalized against the library size?

Thanks.

Kind regards,

C.

RUVSeq • 2.7k views

ADD COMMENT • link updated 6.5 years ago by davide risso ▴ 980 • written 6.5 years ago by capricygcapricyg ▴ 10

score 0 · Answer 1 · 2018-11-07

davide risso ▴ 980

@davide-risso-5075

University of Padova

Hi,

If you follow the workflow described in the RUVSeq vignette, you perform library size normalization as a first step and then run RUVg as a second step to normalize the data.

In that case, the normalized counts from RUVg are already normalized for library size and no further normalization is needed.

I hope this helps.

Best, Davide

ADD COMMENT • link 6.5 years ago davide risso ▴ 980

Hi, Davide,

Thank you very much for the response!

Could you please clarify what you mean by "first step"?

Here is what I ran:

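==

library(RUVSeq)  # also attaches EDASeq

# Build the SeqExpressionSet from the filtered counts
pheno <- data.frame(condition = sampleInfo$condition, row.names = colnames(counts.filtered))
set.RUV <- newSeqExpressionSet(as.matrix(counts.filtered), phenoData = pheno)

# RUVg with the spike-in genes as negative controls, k = 1 factor
set1.RUV <- RUVg(set.RUV, counts.spike.RUV.genelist, k = 1)

normCounts(set1.RUV)  # normalized counts

==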

I didn't run betweenLaneNormalization() since it appears in the "exploratory data analysis" section.

Does normCounts() here return counts that have been adjusted for library size? I would also like to know how library size is adjusted for in the normalized output.

Thanks a lot!

C.

ADD REPLY • link 6.5 years ago capricygcapricyg ▴ 10

I did see that running betweenLaneNormalization() first makes a difference to the output of normCounts(). So I guess my questions are: is betweenLaneNormalization() required before RUVg normalization? Does RUVg() alone take care of library size normalization?

I ask because I would like to plot individual gene counts in a boxplot.

Thanks.

C.

ADD REPLY • link 6.5 years ago capricygcapricyg ▴ 10

Our recommended pipeline is to first account for sequencing depth differences with betweenLaneNormalization() and then run RUV. One can run RUV without library size normalization and hope that one of the estimated factors will account for sequencing depth. That usually happens, but we find it preferable in practice to account for depth differences explicitly with offsets.
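The two-step order described above can be sketched as follows. This is only an illustration on simulated data: the matrix `counts`, the sample names, and the choice of the first 20 genes as negative controls are hypothetical stand-ins for `counts.filtered` and `counts.spike.RUV.genelist` from the question, and `which = "upper"` is chosen here to request upper-quartile scaling explicitly.

```r
library(RUVSeq)  # also attaches EDASeq

set.seed(1)

# Hypothetical toy data: 200 genes x 6 samples of simulated counts
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))

# Hypothetical negative controls (in real data: spike-ins or other control genes)
controls <- rownames(counts)[1:20]

pheno <- data.frame(condition = factor(rep(c("A", "B"), each = 3)),
                    row.names = colnames(counts))
set <- newSeqExpressionSet(counts, phenoData = pheno)

# Step 1: library size normalization (upper-quartile scaling)
set <- betweenLaneNormalization(set, which = "upper")

# Step 2: RUVg on the already depth-normalized data
set1 <- RUVg(set, controls, k = 1)

head(normCounts(set1))
```

Skipping step 1 and calling RUVg() directly on the raw counts, as in the question, leaves it to the estimated factor to absorb depth differences.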

ADD REPLY • link 6.5 years ago davide risso ▴ 980

Does that mean the "normalizedCounts" are CPM values?

ADD REPLY • link 6.5 years ago Gordon Smyth 52k

The "normalizedCounts" are simply the residuals of the model, so they are not CPM values, but one can fit the RUV model starting from CPM values. That is our recommended pipeline. We do not call the result of our betweenLaneNormalization function CPM, however, because we do not scale the values to counts per million; instead we scale them so that the counts sum to the average of the library sizes across the samples.

ADD REPLY • link 6.5 years ago davide risso ▴ 980

Thanks, but I still don't follow what the normalizedCounts are, or even what scale they're on. They may be residuals, but from what model and fitted to what quantities?

The RUVSeq vignette says that normalized read counts are obtained as residuals from OLS of (log Y - O) on W. If O was the log library size, then these residuals would be batch-corrected logCPM values. Is that right?

Are the normalizedCounts on the log scale?

Is the output from betweenLaneNormalization the same as the normalizedCounts slot in RUVSeq or something different?

ADD REPLY • link 6.3 years ago Gordon Smyth 52k

Hi Gordon, sorry, I somehow missed this comment. You are right: the residuals are essentially batch-corrected logCPM values, so they are on the log scale.

betweenLaneNormalization (EDASeq package) only performs library size normalization (by default with upper-quartile normalization), so its output would be logCPM values. The normalizedCounts slot after RUVSeq has been run, on the other hand, contains batch-corrected logCPM values.
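In symbols, the description quoted from the vignette can be sketched as below. The notation is an assumption for illustration: Y is the samples-by-genes count matrix, W holds the k factors of unwanted variation, X the covariates of interest, O the offsets, and N_i the library size of sample i.

```latex
% Log-linear model with unwanted factors W, covariates X, offsets O
\log E[Y \mid W, X, O] = W\alpha + X\beta + O

% OLS of (log Y - O) on W, and the residuals Z
\hat{\alpha} = (W^{\top} W)^{-1} W^{\top} (\log Y - O),
\qquad
Z = (\log Y - O) - W\hat{\alpha}

% With O_{ij} = \log N_i, the regressed quantity is a log proportion,
% which equals logCPM up to the additive constant \log 10^{6}
\log Y_{ij} - O_{ij} = \log\!\left(\frac{Y_{ij}}{N_i}\right)
```

Under that choice of offsets, Z differs from a batch-corrected logCPM matrix only by that constant, which is the sense of "essentially" above.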

Hope this clarifies it.

Best, Davide

ADD REPLY • link 6.2 years ago davide risso ▴ 980