matrix as a column of mcols of GRanges would not be extracted correctly by lapply
1
0
Entering edit mode
wcstcyx ▴ 30
@wcstcyx-11636
Last seen 6.8 years ago
China/Beijing/AMSS,CAS

Hi,

I used matrix as a column of mcols of GRanges. This kind of structure would be of great value in some case. However, I found the matrix can not be extracted correctly with lapply. I don't know the reason. I can use 'for loop' to get around of it, but this way is not very convenient. 

For example,

gr <- GRanges(c("A:1-2", "B:4-5"))

gr$mat <- matrix(1:4, 2)

(grl <- GRangesList(gr, gr + 1))

## GRangesList object of length 2:
## [[1]] 
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames    ranges strand |      mat
##          <Rle> <IRanges>  <Rle> | <matrix>
##   [1]        A    [1, 2]      * |      1:3
##   [2]        B    [4, 5]      * |      2:4

## [[2]] 
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames ranges strand | mat
##   [1]        A [0, 3]      * | 1:3
##   [2]        B [3, 6]      * | 2:4

## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths


lapply(grl, function(gr) gr$mat) ## it only takes the first column of mat
## [[1]]
## [1] 1 2

## [[2]]
## [1] 1 2

listMat <- list()
for (i in 1:2) { ## use for loop to avoid that problem
  listMat[[i]] <- grl[[i]]$mat
}
print(listMat)
## [[1]]
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

## [[2]]
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

Even attaching the class "AsIs" to the mat does not work.

gr$mat <- I(matrix(1:4, 2))
class(gr$mat) <- "AsIs"
class(gr$mat) ## "AsIs" is attached
## [1] "AsIs"
(grl_2 <- GRangesList(gr, gr + 1))
GRangesList object of length 2:
## [[1]] 
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames    ranges strand |      mat
##          <Rle> <IRanges>  <Rle> | <matrix>
##   [1]        A    [1, 2]      * |      1:3
##   [2]        B    [4, 5]      * |      2:4

## [[2]] 
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames ranges strand | mat
##   [1]        A [0, 3]      * | 1:3
##   [2]        B [3, 6]      * | 2:4

## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
class(grl_2[[1]]$mat) ## "AsIs" is automatically removed!
## [1] "matrix"
identical(grl, grl_2)
## [1] TRUE

I don't know if it is a proper behavior and the rational behind it.

Thanks for your help in advance.

Bests,

Can

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.28.3 GenomeInfoDb_1.12.2  IRanges_2.10.2      
[4] S4Vectors_0.14.3     BiocGenerics_0.22.0  BiocInstaller_1.26.0
[7] cowsay_0.5.0        

loaded via a namespace (and not attached):
[1] zlibbioc_1.22.0         compiler_3.4.1          rmsfact_0.0.3          
[4] XVector_0.16.0          GenomeInfoDbData_0.99.0 RCurl_1.95-4.8         
[7] bitops_1.0-6            fortunes_1.5-4         
mcols DataFrame GRanges • 1.5k views
ADD COMMENT
3
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States

This is because

> extractROWS(gr$mat, NSBS(IRanges(1,2), gr$mat))
[1] 1 2

which dispatches to extractROWS,vector_or_factor,RangeNSBS() and fails to take into account the dimensions. For now, I've checked in a fallback to S4Vectors:::.extractROWSWithBracket() for arrays, but there would be some inefficiency in converting the ranges to an integer vector. Thoughts, Herve?

ADD COMMENT
0
Entering edit mode

Hi Michael,

Thanks! I tried to understand those functions you had mentioned. And found

extractROWS(gr$mat, NSBS(IRanges(1,2), gr$mat))

eventually calls

x <- as(x, "vector_OR_factor", strict = FALSE)

which would convert a matrix or an array to a numeric vector. And consequently the following

extract_ranges_from_vector_OR_factor(x, start, width)

would only extract elements corresponding to first column of the original matrix or array.

S4Vectors:::.extractROWSWithBracket(gr$mat, NSBS(IRanges(1,2), gr$mat))

can succeed.

S4Vectors:::.extractROWSWithBracket

 is the same with

selectMethod(extractROWS, signature(x="any", i="any"))

So 

x="array", i="RangeNSBS"
    (inherited from: x="vector_OR_factor", i="RangeNSBS")
x="array", i="RangesNSBS"
    (inherited from: x="vector_OR_factor", i="RangesNSBS")

x="matrix", i="RangeNSBS"
    (inherited from: x="vector_OR_factor", i="RangeNSBS")
x="matrix", i="RangesNSBS"
    (inherited from: x="vector_OR_factor", i="RangesNSBS")

can be changed to inherited from: x="ANY", i="ANY" to avoid the problem.

I think the potential inefficiency you meant may come from

i <- normalizeSingleBracketSubscript(i, x, allow.NAs = TRUE)

in S4Vectors:::.extractROWSWithBracket

It was a long way for me to understand the mechanism you had interpreted. Anyway, thank you very much :)

Can

ADD REPLY

Login before adding your answer.

Traffic: 739 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6