Question

locfunc position in estimateSizeFactorsForMatrix function

lapinskim • 0

@lapinskim-10840

Last seen 8.7 years ago

Hello,

Just a quick question: why is locfunc in estimateSizeFactorsForMatrix applied to the logarithm of the ratio of counts to the geometric mean, and not to the ratio itself (without the logarithm)? As in:

exp(locfunc(log(cnts) - loggeomeans))

and not:

locfunc(exp(log(cnts) - loggeomeans))

I've been following your publication (Anders, Huber 2010), and since the numbers from those two approaches are slightly different, I have been wondering which is considered correct and why?

Best,

Maciek

deseq2 normalization • 736 views

updated 8.7 years ago by Michael Love • written 8.7 years ago by lapinskim

score 1 · Accepted Answer · 2016-06-06

This choice is inherited from DESeq:

https://github.com/Bioconductor-mirror/DESeq/blob/release-3.3/R/core.R#L5

For locfunc=median, the results are identical if the vector has odd length, and only slightly different when the length is even. In the even case the median averages the middle two numbers, so applying it to the ratios gives their arithmetic mean, while applying it on the log scale and exponentiating gives their geometric mean.
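The odd/even behaviour can be checked with a small sketch. The ratio vectors and the helper names log_scale and raw_scale below are made up for illustration and are not part of DESeq or DESeq2; the real function also filters out non-finite and zero entries, which is skipped here.

```r
# Hypothetical ratios of counts to the per-gene geometric mean for one sample
ratios_odd  <- c(0.5, 0.8, 1.1, 1.6, 3.0)
ratios_even <- c(0.5, 0.8, 1.1, 1.6)

# Illustrative helpers (not DESeq2 functions): locfunc on the log scale
# versus locfunc on the ratios themselves
log_scale <- function(r, locfunc = median) exp(locfunc(log(r)))
raw_scale <- function(r, locfunc = median) locfunc(exp(log(r)))

# Odd length: both approaches pick the same middle element
all.equal(log_scale(ratios_odd), raw_scale(ratios_odd))

# Even length: geometric vs arithmetic mean of the middle two values
log_scale(ratios_even)  # equals sqrt(0.8 * 1.1)
raw_scale(ratios_even)  # equals (0.8 + 1.1) / 2
```

With these numbers the two even-length results differ only in the second decimal place, which matches the "slightly different" values described in the question.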

For general locfunc, it's preferable to have the location function operate on the log scale, because the log scale centers the ratios around zero.
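As a rough illustration of why the log scale helps a non-median locfunc, consider mean applied to two made-up ratios, one gene halved and one doubled. This is only a toy example under that assumption, not a description of how DESeq2 is meant to be used with mean.

```r
# Hypothetical ratios: one gene at half, one at double the reference
ratios <- c(0.5, 2)

# On the log scale the two fold changes are symmetric around zero and cancel
on_log_scale <- exp(mean(log(ratios)))

# On the raw scale the doubling outweighs the halving
on_raw_scale <- mean(ratios)

on_log_scale  # approximately 1
on_raw_scale  # 1.25
```

On the log scale, equal up and down fold changes sit at equal distances from zero, so a location estimate is not pulled toward the larger ratios.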