Does voom quantile normalization change the DE genes when the sample distributions are similar (no big differences in quantiles)?
@200b7413
Portugal

I am working with RNA-seq count data, where log(counts + 1) looks like this:

[Image: plot of log(counts + 1) per sample]

Of course 1.28.1 is an outlier and it was removed before normalization.

Following what I understand to be the right approach, I ran Voom <- voom(RNA_data, design, plot = TRUE) and got the results below. Keep in mind that group A has 50 patients, B has 25 and C has 25, so the comparison is 50 versus 50.

[Image: DE results from voom without normalize.method]

After adding normalize.method = "quantile", that is Voom <- voom(RNA_data, design, plot = TRUE, normalize.method = "quantile"), the results changed to this:

[Image: DE results from voom with normalize.method = "quantile"]

I have read several posts on this topic, including the point that normalize.method = "quantile" was used as standard in the original voom paper. But is it supposed to have such an impact on the results? Given the distribution of the expression values, is this expected?

@gordon-smyth
WEHI, Melbourne, Australia

Sorry, I don't follow what your question is about. Your question is apparently supposed to show a series of plots, but the plots aren't visible. The code used to make the plots also isn't included in your question, so we don't know what the plots were intended to be about.

I will make these points:

  1. You cannot learn anything from a plot of log(counts+1). RNA-seq counts have to be, at very least, normalized by library size to have any meaning.
  2. Quantile is not the default normalization method in either edgeR or voom.
  3. The original voom paper used both TMM and quantile normalization.
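To make the three points above concrete, here is a minimal sketch on simulated counts (the matrix, group labels and object names are made up for illustration, not the poster's data). It computes library-size-normalized logCPM values, then runs voom once with TMM factors and the default normalize.method = "none", and once with normalize.method = "quantile".

```r
library(edgeR)  # also attaches limma

set.seed(1)
# Hypothetical simulated data: 1000 genes x 12 samples in two groups
counts <- matrix(rnbinom(1000 * 12, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:12)))
group <- factor(rep(c("A", "B"), each = 6))
design <- model.matrix(~ group)

y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]

# Point 1: account for library size before looking at the data
logcpm <- cpm(y, log = TRUE)

# TMM scaling factors (the calcNormFactors default); voom then uses the
# effective library sizes and applies no further between-sample normalization
y <- calcNormFactors(y)
v_tmm <- voom(y, design)

# Point 2: quantile normalization has to be requested explicitly
v_q <- voom(y, design, normalize.method = "quantile")
```

The two EList objects have the same dimensions but different log-expression values, so some change in the downstream DE results is to be expected.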

I am sorry, I have edited the question so that the images are visible; they are the main point of my question. I understand. The log(counts+1) plot was only used to remove outliers before applying voom, to make sure that voom does not normalize the data with outliers present. In both cases I filtered out the genes whose counts are 0 and applied edgeR::filterByExpr.


It would be better to use plotMDS to assess outlier samples rather than a plot of log(counts+1). A sample with lower counts might simply have a lower library size but might not be an outlier in terms of expression.
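As a rough illustration of that point, the sketch below simulates ten samples with the same underlying expression profile, one of which is sequenced at 10% of the depth (all values and names are invented for the example). The shallow sample stands out on the log(counts + 1) scale but not on the logCPM scale.

```r
library(edgeR)  # also attaches limma

set.seed(42)
ngenes <- 2000
nsamples <- 10
# Hypothetical gene-wise mean expression, shared by every sample
mu <- rgamma(ngenes, shape = 2, rate = 0.01)
# Relative sequencing depth: the last sample has 10% of the depth
depth <- c(rep(1, nsamples - 1), 0.1)
counts <- sapply(depth, function(d) rnbinom(ngenes, mu = mu * d, size = 10))
dimnames(counts) <- list(paste0("gene", seq_len(ngenes)),
                         paste0("s", seq_len(nsamples)))

# On the log(counts + 1) scale the shallow sample looks like an outlier
raw_means <- colMeans(log2(counts + 1))

# After accounting for library size it lines up with the other samples
y <- DGEList(counts)
y <- calcNormFactors(y)
logcpm <- cpm(y, log = TRUE)
cpm_means <- colMeans(logcpm)

# MDS plot computed from the DGEList (plotMDS converts to logCPM internally)
mds <- plotMDS(y)
```

Comparing raw_means with cpm_means shows how much of the apparent difference is library size alone.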

Anyway, yes, normalization is supposed to make a difference. That's why we recommend it! The DE results before normalization look extremely unbalanced and are unlikely to be reliable.

BTW, as I have said before on this forum, I do not recommend logFC cutoffs when assessing DE genes. I understand that it is common practice in the literature, but that doesn't make it good. Making an MD plot (plotMD) will give a better idea of the relationship between logFC and expression level.
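A minimal sketch of that suggestion, again on simulated data with invented names: DE genes are called by FDR alone via decideTests, with no logFC cutoff, and plotMD then shows logFC against average log-expression with the called genes highlighted.

```r
library(edgeR)  # also attaches limma

set.seed(7)
ngenes <- 1000
group <- factor(rep(c("A", "B"), each = 5))
design <- model.matrix(~ group)

# Hypothetical simulated means; the first 50 genes are 4-fold up in group B
mu <- matrix(rgamma(ngenes, shape = 2, rate = 0.02), ngenes, 10)
mu[1:50, 6:10] <- mu[1:50, 6:10] * 4
counts <- matrix(rnbinom(ngenes * 10, mu = mu, size = 10), ngenes, 10,
                 dimnames = list(paste0("gene", seq_len(ngenes)),
                                 paste0("s", 1:10)))

y <- DGEList(counts, group = group)
y <- y[filterByExpr(y, design), , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

v <- voom(y, design)
fit <- eBayes(lmFit(v, design))

# FDR-based calls only, no logFC threshold
dt <- decideTests(fit)
summary(dt)

# logFC versus average log-expression for the group B vs A coefficient
plotMD(fit, column = 2, status = dt[, 2], main = "groupB vs groupA")
```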


One final question regarding this thread. Indeed, plotMDS gave different results from log(counts + 1), showing that the outlier wasn't really an outlier.

I ran plotMDS directly on the rna_counts data, without normalizing, and this was the result:

[Image: MDS plot of the unnormalized rna_counts data]

Should the group from 1.18 to 1.36 (with five elements) be considered outliers? Additionally, should it be removed before TMM + voom with quantile normalization?

However, the plot after TMM normalization of rna_data and before voom is this one:

[Image: MDS plot after TMM normalization, before voom]

In this plot the only outlier is the one identified before, 1.28.1. So should plotMDS be applied after TMM normalization to discover outliers?

Thank you once again!


A voom workflow that you can follow is shown here: RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR

  1. All analysis including plotMDS should be done after normalization.
  2. plotMDS is applied either to the DGEList object or to logCPM values rather than to a numeric matrix of counts. (The first MDS plot above is on the wrong scale, suggesting that it has been applied to raw counts without any conversion to log-expression values. The difference between your two MDS plots is not just due to TMM normalization.)
  3. Outliers should not be removed unless you have a causal explanation for them (otherwise you're cherry-picking the data).
  4. I suggest you use sample weights. Any outliers will then be automatically downweighted in the analysis so there is no need to agonize about whether to remove them. Sample weights are implemented by voomWithQualityWeights or (more easily) by edgeR::voomLmFit with sample.weights=TRUE.
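The sample-weight suggestion can be sketched as follows. The data are simulated, with one deliberately noisy sample, and all object names are placeholders. voomLmFit is assumed to be available, which requires a reasonably recent edgeR.

```r
library(edgeR)  # also attaches limma

set.seed(11)
ngenes <- 1000
group <- factor(rep(c("A", "B"), each = 5))
design <- model.matrix(~ group)

# Hypothetical simulated counts; sample 10 is made much noisier than the rest
mu <- matrix(rgamma(ngenes, shape = 2, rate = 0.02), ngenes, 10)
counts <- matrix(rnbinom(ngenes * 10, mu = mu, size = 10), ngenes, 10,
                 dimnames = list(paste0("gene", seq_len(ngenes)),
                                 paste0("s", 1:10)))
counts[, 10] <- rnbinom(ngenes, mu = mu[, 10], size = 1)

y <- DGEList(counts, group = group)
y <- y[filterByExpr(y, design), , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

# MDS plot after normalization, applied to the DGEList (not raw counts)
plotMDS(y)

# Option 1: voom with sample quality weights, then the usual limma fit
vw <- voomWithQualityWeights(y, design)
fit1 <- eBayes(lmFit(vw, design))

# Option 2: voomLmFit estimates sample weights and fits the model in one call
fit2 <- voomLmFit(y, design, sample.weights = TRUE)
fit2 <- eBayes(fit2)
```

Either way the noisy sample stays in the analysis and is downweighted rather than removed.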