Question

matrix as a column of mcols of GRanges would not be extracted correctly by lapply

0

Entering edit mode

wcstcyx ▴ 30

@wcstcyx-11636

Last seen 7.1 years ago

China/Beijing/AMSS,CAS

Hi,

I used matrix as a column of mcols of GRanges. This kind of structure would be of great value in some case. However, I found the matrix can not be extracted correctly with lapply. I don't know the reason. I can use 'for loop' to get around of it, but this way is not very convenient.

For example,

gr <- GRanges(c("A:1-2", "B:4-5"))

gr$mat <- matrix(1:4, 2)

(grl <- GRangesList(gr, gr + 1))

## GRangesList object of length 2:
## [[1]] 
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames    ranges strand |      mat
##          <Rle> <IRanges>  <Rle> | <matrix>
##   [1]        A    [1, 2]      * |      1:3
##   [2]        B    [4, 5]      * |      2:4

## [[2]] 
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames ranges strand | mat
##   [1]        A [0, 3]      * | 1:3
##   [2]        B [3, 6]      * | 2:4

## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths


lapply(grl, function(gr) gr$mat) ## it only takes the first column of mat
## [[1]]
## [1] 1 2

## [[2]]
## [1] 1 2

listMat <- list()
for (i in 1:2) { ## use for loop to avoid that problem
  listMat[[i]] <- grl[[i]]$mat
}
print(listMat)
## [[1]]
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

## [[2]]
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

Even attaching the class "AsIs" to the mat does not work.

gr$mat <- I(matrix(1:4, 2))
class(gr$mat) <- "AsIs"
class(gr$mat) ## "AsIs" is attached
## [1] "AsIs"
(grl_2 <- GRangesList(gr, gr + 1))
GRangesList object of length 2:
## [[1]] 
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames    ranges strand |      mat
##          <Rle> <IRanges>  <Rle> | <matrix>
##   [1]        A    [1, 2]      * |      1:3
##   [2]        B    [4, 5]      * |      2:4

## [[2]] 
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames ranges strand | mat
##   [1]        A [0, 3]      * | 1:3
##   [2]        B [3, 6]      * | 2:4

## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
class(grl_2[[1]]$mat) ## "AsIs" is automatically removed!
## [1] "matrix"
identical(grl, grl_2)
## [1] TRUE

I don't know if it is a proper behavior and the rational behind it.

Thanks for your help in advance.

Bests,

Can

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.28.3 GenomeInfoDb_1.12.2  IRanges_2.10.2      
[4] S4Vectors_0.14.3     BiocGenerics_0.22.0  BiocInstaller_1.26.0
[7] cowsay_0.5.0        

loaded via a namespace (and not attached):
[1] zlibbioc_1.22.0         compiler_3.4.1          rmsfact_0.0.3          
[4] XVector_0.16.0          GenomeInfoDbData_0.99.0 RCurl_1.95-4.8         
[7] bitops_1.0-6            fortunes_1.5-4

mcols DataFrame GRanges • 1.6k views

ADD COMMENT • link updated 7.7 years ago by Michael Lawrence ★ 11k • written 7.7 years ago by wcstcyx ▴ 30

score 3 · Accepted Answer · 2017-07-16

3

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 3.3 years ago

United States

This is because

> extractROWS(gr$mat, NSBS(IRanges(1,2), gr$mat))
[1] 1 2

which dispatches to extractROWS,vector_or_factor,RangeNSBS() and fails to take into account the dimensions. For now, I've checked in a fallback to S4Vectors:::.extractROWSWithBracket() for arrays, but there would be some inefficiency in converting the ranges to an integer vector. Thoughts, Herve?

ADD COMMENT • link 7.7 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Hi Michael,

Thanks! I tried to understand those functions you had mentioned. And found

extractROWS(gr$mat, NSBS(IRanges(1,2), gr$mat))

eventually calls

x <- as(x, "vector_OR_factor", strict = FALSE)

which would convert a matrix or an array to a numeric vector. And consequently the following

extract_ranges_from_vector_OR_factor(x, start, width)

would only extract elements corresponding to first column of the original matrix or array.

S4Vectors:::.extractROWSWithBracket(gr$mat, NSBS(IRanges(1,2), gr$mat))

can succeed.

S4Vectors:::.extractROWSWithBracket

is the same with

selectMethod(extractROWS, signature(x="any", i="any"))

So

x="array", i="RangeNSBS"
    (inherited from: x="vector_OR_factor", i="RangeNSBS")
x="array", i="RangesNSBS"
    (inherited from: x="vector_OR_factor", i="RangesNSBS")

x="matrix", i="RangeNSBS"
    (inherited from: x="vector_OR_factor", i="RangeNSBS")
x="matrix", i="RangesNSBS"
    (inherited from: x="vector_OR_factor", i="RangesNSBS")

can be changed to inherited from: x="ANY", i="ANY" to avoid the problem.

I think the potential inefficiency you meant may come from

i <- normalizeSingleBracketSubscript(i, x, allow.NAs = TRUE)

in S4Vectors:::.extractROWSWithBracket

It was a long way for me to understand the mechanism you had interpreted. Anyway, thank you very much :)

Can

ADD REPLY • link 7.7 years ago wcstcyx ▴ 30