normLGF yields non-symmetric matrices
@enriquevidal-12787

Hi!

The HiCNorm implementation provided in normLGF differs from the original one at

https://github.com/Bioconductor-mirror/HiTC/blob/master/R/normalize_hiC.R#L405

while the original code is

len_m<-(len_m-mean(c(len_m)))/sd(c(len_m))
gcc_m<-(gcc_m-mean(c(gcc_m)))/sd(c(gcc_m))

This leads to non-symmetrical matrices.

Any ideas why? Is it desirable to break symmetry?

HiTC • 1.7k views
@nicolas-servant-1466
France

Hi,

I am not sure I see why this leads to non-symmetric matrices.

But indeed, this is not expected. Normalized matrices should also be symmetric.

Best

@nicolas-servant-1466
France

Indeed! I also checked the original code from Hu et al. 2012, and it is the same.

http://www.people.fas.harvard.edu/~junliu/HiCNorm/

So I agree that the matrix is no longer symmetric, but the normalized values in i,j and j,i remain very close, so I do not think that this is a real issue.

Would you have any idea how to change that?

Otherwise, you can transform it into a symmetric matrix using:

> forceSymmetric(hiC_LGF)
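
For readers working with a plain numeric matrix rather than a HiTC object, one simple way to restore symmetry is to average each cell with its mirror. This is only a base R sketch on made-up data; the names `m` and `m_sym` are hypothetical, and it is not meant to describe what `forceSymmetric` does internally.

```r
# Toy, asymmetric matrix standing in for a normalized contact map
# (hypothetical data, not real Hi-C counts).
set.seed(1)
m <- matrix(rnorm(16), 4)

# Average each pair of mirrored cells so that m_sym[i, j] == m_sym[j, i].
m_sym <- (m + t(m)) / 2

isSymmetric(m)      # FALSE: the random input is not symmetric
isSymmetric(m_sym)  # TRUE: averaging with the transpose symmetrizes it
```

Averaging treats both triangles equally, which may or may not be what you want if one triangle is considered more reliable.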

And I will try to contact the authors of the method.

Best

 

@enriquevidal-12787

I've checked the original scripts following the link you provided, and it seems they scale by the overall sd, not the column-specific sd.

I guess replacing the lines

    len_m<-(len_m-mean(len_m, na.rm=TRUE))/apply(len_m, 2, sd, na.rm=TRUE)
    gcc_m<-(gcc_m-mean(gcc_m, na.rm=TRUE))/apply(gcc_m, 2, sd, na.rm=TRUE)

with

    len_m<-(len_m-mean(len_m, na.rm=TRUE))/sd(len_m, na.rm=TRUE)
    gcc_m<-(gcc_m-mean(gcc_m, na.rm=TRUE))/sd(gcc_m, na.rm=TRUE)

in the normLGF definition would do the trick (I have already done this in my local version of the package).

I agree with you that the differences at the "cell" level (x_{i,j}) could be minor. However, I don't know what the advantage is of scaling by the column-specific sd instead of the overall sd.
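
To get a feel for how large the cell-level differences are, one can compare each cell with its mirror after scaling. The sketch below uses random toy data, so the numbers it prints say nothing about real Hi-C covariate matrices; `abs_gap` and `rel_gap` are hypothetical names.

```r
set.seed(42)
a <- matrix(rnorm(100), 10)
a <- (a + t(a)) / 2                     # symmetric toy input

# Same form as the apply(..., 2, sd) lines quoted above.
bb <- (a - mean(a)) / apply(a, 2, sd)

# Largest absolute gap between mirrored cells, and the same gap
# relative to the largest scaled value.
abs_gap <- max(abs(bb - t(bb)))
rel_gap <- abs_gap / max(abs(bb))

abs_gap
rel_gap
```

Running the same two lines on the actual `len_m` and `gcc_m` matrices would show whether the asymmetry matters for a given data set.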

In any case, thanks for your quick responses.

:)

 

 

@enriquevidal-12787

I guess if you divide each column by a different number, the matrix is no longer symmetric.

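# Build a symmetric test matrix
a <- matrix(rnorm(100), 10)
a <- (a + t(a)) / 2

# TRUE only if x equals its own symmetrized version
check_sim <- function(x) identical(x, (x + t(x)) / 2)

check_sim(a)   # TRUE

# Scaling by the overall sd keeps the matrix symmetric
b <- (a - mean(a))/ sd(a)
check_sim(b)   # TRUE

# Dividing by a vector of per-column sds breaks symmetry
bb <- (a - mean(a))/apply(a, 2, sd)
check_sim(bb)  # FALSE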
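
A side note on how R evaluates the last division: a matrix divided by a plain vector recycles the vector down each column, so element `[i, j]` is divided by the i-th entry, not the j-th. The small sketch below illustrates this with hypothetical values (`v`, `by_recycling`, `by_column`); `sweep` is the base R function for an explicit column-wise division. Either variant divides cells `[i, j]` and `[j, i]` by different numbers, so symmetry is lost in both cases.

```r
a <- matrix(1:9, 3)      # small matrix, easy to inspect by hand
v <- c(1, 10, 100)       # stand-in for a vector of per-column sds

# Plain division recycles v down the columns: row i is divided by v[i].
by_recycling <- a / v

# sweep() with MARGIN = 2 divides column j by v[j].
by_column <- sweep(a, 2, v, "/")

by_recycling[2, 1]  # 2 / 10 = 0.2
by_column[2, 1]     # 2 / 1  = 2

# Plain division matches a row-wise sweep, not a column-wise one.
all.equal(by_recycling, sweep(a, 1, v, "/"))  # TRUE
```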

 

@nicolas-servant-1466
France

Thank you very much for your suggestion!

I will try to update the package for the next release.

Best
