How to use normalized values by EDASeq in edgeR?
rahilsethi • 0
@rahilsethi-8703
Last seen 9.3 years ago
United States

Hi,

I need an opinion about either using normalized counts from EDASeq or raw counts with offset values of EDASeq in edgeR GLM for differential expression. I read edgeR manual where it says:

"The correction factors may take the form of scaling factors for the library sizes, such as computed by calcNormFactors, which are then used to compute the effective library sizes. Alternatively, gene-specific correction factors can be entered into the glm functions of edgeR as offsets."

Also it is mentioned that estimateCommonDisp and estimateTagwiseDisp, used with exactTest, require the library sizes to be equal for all samples. Does this apply to GLM dispersion estimation as well? If so, it seems I would have to use calcNormFactors() whenever the library sizes are unequal, whether or not normalized offset values are provided. But the manual also says that offset values from other software, such as cqn or EDASeq, are an alternative to calcNormFactors(). How should I run differential expression on values normalized by EDASeq: should I give edgeR the raw counts together with the offset values, or the normalized counts from EDASeq?

Thanks,

Rahil

edgeR EDASeq
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

1. estimateCommonDisp() and estimateTagwiseDisp() do not require library sizes to be equal. Sorry if the help pages for these functions give that impression. The help pages are trying to explain an internal computation. edgeR never assumes that the real library sizes are equal.

2. If you are not setting an offset matrix, then calcNormFactors() is used with both the classic and glm pipelines in edgeR. Normalization works in much the same way for both pipelines.

3. If you compute a normalization offset matrix using EDASeq, then this can be input into the glm pipelines of edgeR. In this case, you do not use calcNormFactors(). The classic edgeR pipeline cannot accommodate an offset matrix.

4. Obviously you should not input "normalized counts" into edgeR. edgeR requires real counts.
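Point 3 can be sketched roughly as follows. This is a minimal sketch, not a definitive recipe: `group` and the SeqExpressionSet `dataOffset` are assumed to exist, and the sign convention for the offset should be checked against the EDASeq vignette for your package versions.

```r
library(edgeR)
library(EDASeq)

# 'dataOffset' is assumed to be a SeqExpressionSet produced by EDASeq's
# normalization functions with offset=TRUE, so it carries an offset matrix.
y <- DGEList(counts = counts(dataOffset), group = group)

# Do NOT call calcNormFactors(); supply the EDASeq offset instead.
# Note the minus sign: edgeR and EDASeq use opposite offset conventions.
y$offset <- -offst(dataOffset)

design <- model.matrix(~group)
y <- estimateGLMCommonDisp(y, design)
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef = 2)
topTags(lrt)
```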

Thanks Gordon for clarifying. Maybe I misunderstood what is mentioned in the User's Guide. This is what it says on page 13 (Section 2.6.7, Pseudo-counts):
"In general, edgeR functions work directly on the raw counts. For the most part, edgeR does not produce any quantity that could be called a “normalized count”. An exception is the internal use of pseudo-counts by the classic edgeR functions estimateCommonDisp and exactTest. The exact negative binomial test [20] computed by exactTest and the conditional likelihood [20] used by estimateCommonDisp and estimateTagwiseDisp require the library sizes to be equal for all samples."

Regarding your fourth point, I got the same impression after reading the User's Guide, but then I saw your reply to a person who was trying to use normalized counts from EDASeq as input to edgeR, and you did not give that person a warning:
problem with aveLogCPM.default in edgeR
In EDASeq, when the object contains both normalized and raw counts along with the offset values, the exprs() function returns the normalized counts instead of the raw counts, as far as I checked on my data. That is why I wanted to confirm it.

Yes, you have jumped to some incorrect conclusions from what is said in the edgeR documentation. The internal computation of conditional likelihood does use equal library sizes, but this is just for mathematical convenience and computational speed. The user-level edgeR functions themselves do not make this assumption (because they do the necessary equalization internally). I am the senior author of the estimateCommonDisp and estimateTagwiseDisp functions and I wrote the section of the User's Guide that you are quoting, so it might be constructive to assume that I am telling you the truth.

Regarding the earlier post that you link to, I advised the questioner to pass an offset matrix from EDASeq to edgeR, which is the same advice I am giving to you. I did not give any warning about "normalized counts" because I assumed that EDASeq was normalizing by offset and not changing the counts themselves.

Perhaps I should clarify that EDASeq returns *both* an offset, to be used in supervised problems (i.e., differential expression), for instance by passing it on to edgeR, *and* the normalized counts, to be used for visualization and unsupervised problems. The functions counts() and normCounts() can be used to access the original and normalized counts, respectively (see the help for SeqExpressionSet-class). The use of exprs() is deprecated in EDASeq.
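For instance, a short sketch of the accessors just mentioned, assuming `dataNorm` is a SeqExpressionSet returned by EDASeq's normalization functions with `offset=TRUE`:

```r
library(EDASeq)

raw  <- counts(dataNorm)      # original counts: the input for edgeR
norm <- normCounts(dataNorm)  # normalized counts: plots, clustering, QC only
off  <- offst(dataNorm)       # offset matrix: pass to edgeR's GLM pipeline
```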

davide risso ▴ 980
@davide-risso-5075
Last seen 8 months ago
University of Padova

Hi Rahil,

Section 6.1 of the EDASeq vignette shows an example on how to use EDASeq normalization factors as offsets in edgeR.

 


Hi Davide.

Thanks for the reply. I did look at the EDASeq manual before, but it does not say which approach is preferred (normalized counts vs. raw counts with offset values). Moreover, to illustrate how to use EDASeq normalization in edgeR, the manual mentions only estimateGLMCommonDisp for dispersion estimation, which in my data gave more inconsistent results when I looked at the within-group and between-group variation for genes with FDR < 0.05. Trended and then tagwise GLM dispersion estimates gave more consistent results.


Hi Rahil,

"they" is me :), so I now that my example was just for illustration, but it should work just fine if you add the trended and tag wise dispersion estimation. Perhaps I should update it to add those two steps.

We (as in the authors of EDASeq) take a somewhat "agnostic" position on the "normalized counts vs. offset" issue: although I agree with Gordon that count models should be applied to counts, in our experience the results are very similar if you use a negative binomial model on the pseudo-counts obtained by normalizing the original counts. So in practice it won't make much of a difference, but if you want to be formally more rigorous, you should probably stick to offsets.
