Hi,
i tried to convert a numeric matrix into a SnpMatrix. My matrix is coded binary (0/1 for biallelic, there are no heterozygotes). Based on the documentation, i would have assumed that I would need to recode this into 0/2 (see below), however, it appears this is not the case.
The documentation (https://rdrr.io/bioc/snpStats/man/SnpMatrix-class.html) states:
coerce signature(from = "SnpMatrix", to = "character"): map to codes "A/A", "A/B", "B/B", ""
coerce signature(from = "matrix",to = "SnpMatrix"): maps numeric matrix (coded 0, 1, 2 or NA) to a SnpMatrix
I would have assumed that these codings are in the same order.
However, based on some experimentation with conversion (via new("SnpMatrix",x)) and colSummary, it appears that the actual coding for coercion from a numeric matrix is:
0 -> Missing
1 -> A/A
2 -> A/B
3 -> B/B
If you can confirm that, i think it would be helpful to update the documentation.
The matrix:
> str(test_mat)
num [1:50, 1:50] 0 1 0 0 1 0 0 0 0 0 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:50] "742" "763" "772" "854" ...
..$ : chr [1:50] "1" "5" "7" "9" ...
Recoding
> storage.mode(test_mat) <- "double"
> test_mat_code_0 <- test_mat
> test_mat_code_0[test_mat_code_0 == 1] <- 2
> test_mat_code_0[test_mat_code_0 == 0] <- 0
> test_mat_code_1 <- test_mat
> test_mat_code_1[test_mat_code_1 == 1] <- 2
> test_mat_code_1[test_mat_code_1 == 0] <- 1
>
> test_mat_code_2 <- test_mat
> test_mat_code_2[test_mat_code_2 == 1] <- 3
> test_mat_code_2[test_mat_code_2 == 0] <- 1
The two colsummaries:
This has missing:
snpStats::col.summary(new("SnpMatrix", testmatcode_0))[1:10,] coercing object of mode character to SnpMatrix
Calls Call.rate Certain.calls RAF MAF P.AA P.AB P.BB z.HWE
1 3 0.10000000 1 0.5 0.5 0 1 0 1.732051
5 10 0.33333333 1 0.5 0.5 0 1 0 3.162278
7 0 0.00000000 NA NA NA NA NA NA NA
9 1 0.03333333 1 0.5 0.5 0 1 0 1.000000
12 0 0.00000000 NA NA NA NA NA NA NA
13 1 0.03333333 1 0.5 0.5 0 1 0 1.000000
16 0 0.00000000 NA NA NA NA NA NA NA
17 0 0.00000000 NA NA NA NA NA NA NA
18 0 0.00000000 NA NA NA NA NA NA NA
21 0 0.00000000 NA NA NA NA NA NA NA
This has no BB only AA and AB?
snpStats::col.summary(new("SnpMatrix", testmatcode_1))[1:10,] coercing object of mode numeric to SnpMatrix
Calls Call.rate Certain.calls RAF MAF P.AA P.AB P.BB z.HWE
1 50 1 1 0.03 0.03 0.94 0.06 0 0.21869282
5 50 1 1 0.10 0.10 0.80 0.20 0 0.78567420
7 50 1 1 0.00 0.00 1.00 0.00 0 NA
9 50 1 1 0.01 0.01 0.98 0.02 0 0.07142493
12 50 1 1 0.00 0.00 1.00 0.00 0 NA
13 50 1 1 0.01 0.01 0.98 0.02 0 0.07142493
16 50 1 1 0.00 0.00 1.00 0.00 0 NA
17 50 1 1 0.00 0.00 1.00 0.00 0 NA
18 50 1 1 0.00 0.00 1.00 0.00 0 NA
21 50 1 1 0.00 0.00 1.00 0.00 0 NA
While this seems to have the correct coding:
snpStats::col.summary(new("SnpMatrix", testmatcode_2))[1:10,] coercing object of mode numeric to SnpMatrix
Calls Call.rate Certain.calls RAF MAF P.AA P.AB P.BB z.HWE
1 50 1 1 0.06 0.06 0.94 0 0.06 -7.071068
5 50 1 1 0.20 0.20 0.80 0 0.20 -7.071068
7 50 1 1 0.00 0.00 1.00 0 0.00 NA
9 50 1 1 0.02 0.02 0.98 0 0.02 -7.071068
12 50 1 1 0.00 0.00 1.00 0 0.00 NA
13 50 1 1 0.02 0.02 0.98 0 0.02 -7.071068
16 50 1 1 0.00 0.00 1.00 0 0.00 NA
17 50 1 1 0.00 0.00 1.00 0 0.00 NA
18 50 1 1 0.00 0.00 1.00 0 0.00 NA
21 50 1 1 0.00 0.00 1.00 0 0.00 NA
Cheers Niklas