Hi,
I used a matrix as a column in the mcols of a GRanges object. This kind of structure is very useful in some cases. However, I found that the matrix cannot be extracted correctly with lapply, and I don't know why. I can work around it with a for loop, but that is not very convenient.
For example,
gr <- GRanges(c("A:1-2", "B:4-5"))
gr$mat <- matrix(1:4, 2)
(grl <- GRangesList(gr, gr + 1))
## GRangesList object of length 2:
## [[1]]
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames    ranges strand |      mat
##          <Rle> <IRanges>  <Rle> | <matrix>
##   [1]        A    [1, 2]      * |      1:3
##   [2]        B    [4, 5]      * |      2:4
##
## [[2]]
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames ranges strand | mat
##   [1]        A [0, 3]      * | 1:3
##   [2]        B [3, 6]      * | 2:4
##
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths

lapply(grl, function(gr) gr$mat)  ## it only takes the first column of mat
## [[1]]
## [1] 1 2
##
## [[2]]
## [1] 1 2

listMat <- list()
for (i in 1:2) {  ## use for loop to avoid that problem
  listMat[[i]] <- grl[[i]]$mat
}
print(listMat)
## [[1]]
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
##
## [[2]]
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
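The for loop can also be written more compactly by iterating over positions rather than over the list itself. This is only a sketch of the same workaround: it relies on grl[[i]]$mat returning the full matrix, as the for loop above shows, and the name listMat2 is made up for illustration.

```r
library(GenomicRanges)

gr <- GRanges(c("A:1-2", "B:4-5"))
gr$mat <- matrix(1:4, 2)
grl <- GRangesList(gr, gr + 1)

# Iterate over the positions and extract each element with [[,
# which is the same access pattern the for loop uses.
# listMat2 is a hypothetical name, not part of any package.
listMat2 <- lapply(seq_along(grl), function(i) grl[[i]]$mat)
print(listMat2)
```

Because each element is fetched with [[ inside the function, this avoids whatever extraction path lapply(grl, ...) takes on the list as a whole.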
Even attaching the class "AsIs" to mat does not help.
gr$mat <- I(matrix(1:4, 2))
class(gr$mat) <- "AsIs"
class(gr$mat)  ## "AsIs" is attached
## [1] "AsIs"

(grl_2 <- GRangesList(gr, gr + 1))
## GRangesList object of length 2:
## [[1]]
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames    ranges strand |      mat
##          <Rle> <IRanges>  <Rle> | <matrix>
##   [1]        A    [1, 2]      * |      1:3
##   [2]        B    [4, 5]      * |      2:4
##
## [[2]]
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames ranges strand | mat
##   [1]        A [0, 3]      * | 1:3
##   [2]        B [3, 6]      * | 2:4
##
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths

class(grl_2[[1]]$mat)  ## "AsIs" is automatically removed!
## [1] "matrix"

identical(grl, grl_2)
## [1] TRUE
I don't know whether this is the intended behavior or what the rationale behind it is.
Thanks in advance for your help.
Best,
Can
sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] GenomicRanges_1.28.3 GenomeInfoDb_1.12.2  IRanges_2.10.2
[4] S4Vectors_0.14.3     BiocGenerics_0.22.0  BiocInstaller_1.26.0
[7] cowsay_0.5.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.22.0         compiler_3.4.1          rmsfact_0.0.3
[4] XVector_0.16.0          GenomeInfoDbData_0.99.0 RCurl_1.95-4.8
[7] bitops_1.0-6            fortunes_1.5-4
Hi Michael,
Thanks! I tried to understand the functions you mentioned, and found that
eventually calls
which would convert a matrix or an array to a numeric vector. And consequently the following
would only extract elements corresponding to first column of the original matrix or array.
S4Vectors:::.extractROWSWithBracket(gr$mat, NSBS(IRanges(1,2), gr$mat))
can succeed.
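The difference described above can be reproduced in base R alone. This is a minimal sketch, not the S4Vectors code itself: it assumes only that the failing path ends up treating the matrix as a plain vector, while the working path subsets rows with the matrix shape intact. The names as_vector and as_rows are made up for illustration.

```r
# A 2 x 2 matrix with the same values as gr$mat in the example above
m <- matrix(1:4, 2)

# Once the matrix is treated as a plain vector, the first two
# elements are exactly the first column (column-major storage).
as_vector <- as.vector(m)[1:2]

# Row-wise subsetting with an explicit (empty) column slot keeps
# both columns and the matrix shape.
as_rows <- m[1:2, , drop = FALSE]

print(as_vector)
print(as_rows)
```

Taking the first two elements of the flattened vector gives the first column only, which matches the lapply output in the original question, whereas selecting the first two rows returns the whole 2 x 2 matrix.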
S4Vectors:::.extractROWSWithBracket
is the same as
So
can be changed to
inherited from: x="ANY", i="ANY"
to avoid the problem. I think the potential inefficiency you mentioned may come from
in
S4Vectors:::.extractROWSWithBracket
It took me a long time to understand the mechanism you explained. Anyway, thank you very much :)
Can