Hi there,
I'm getting unexpected behaviour when using set operation functions in an R session with both dplyr and GenomicRanges loaded. Set operation functions in dplyr used to not work on GRanges objects, which is fine, as if I accidentally used union
rather than GenomicRanges::union
I'd get an error and know how to fix it. Now I no longer get an error using dplyr::union
on GRanges objects, but I also don't get the expected results (see reproducible example below).
I wasn't sure if this was an issue with dplyr or GenomicRanges, so I first reported it on the dplyr github issues. They think it's an issue with the dispatch, which ends up calling S4Vectors::union.Vector
instead of base::union
or GenomicRanges::union
. The discussion is still ongoing there, but I thought it was worth also bringing it up here in case the Bioconductor team can provide some insight.
Thanks in advance for any help!
library(GenomicRanges)
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#>
#> clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#> clusterExport, clusterMap, parApply, parCapply, parLapply,
#> parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, append, as.data.frame, basename, cbind,
#> colMeans, colnames, colSums, dirname, do.call, duplicated,
#> eval, evalq, Filter, Find, get, grep, grepl, intersect,
#> is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
#> paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
#> Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
#> table, tapply, union, unique, unsplit, which, which.max,
#> which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:base':
#>
#> expand.grid
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:GenomicRanges':
#>
#> intersect, setdiff, union
#> The following object is masked from 'package:GenomeInfoDb':
#>
#> intersect
#> The following objects are masked from 'package:IRanges':
#>
#> collapse, desc, intersect, setdiff, slice, union
#> The following objects are masked from 'package:S4Vectors':
#>
#> first, intersect, rename, setdiff, setequal, union
#> The following objects are masked from 'package:BiocGenerics':
#>
#> combine, intersect, setdiff, union
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
x <- GRanges("chr1", IRanges(c(2, 9) , c(7, 19)), strand=c("+", "-"))
y <- GRanges("chr1", IRanges(5, 10), strand="-")
GenomicRanges::union(x,y)
#> GRanges object with 2 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 2-7 +
#> [2] chr1 5-19 -
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
dplyr::union(x, y)
#> GRanges object with 3 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 2-7 +
#> [2] chr1 9-19 -
#> [3] chr1 5-10 -
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
GenomicRanges::intersect(x,y)
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 9-10 -
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
dplyr::intersect(x, y)
#> GRanges object with 0 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS Sierra 10.12.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#>
#> attached base packages:
#> [1] parallel stats4 stats graphics grDevices utils datasets
#> [8] methods base
#>
#> other attached packages:
#> [1] dplyr_0.7.8 GenomicRanges_1.34.0 GenomeInfoDb_1.18.1
#> [4] IRanges_2.16.0 S4Vectors_0.20.1 BiocGenerics_0.28.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.0 bindr_0.1.1 knitr_1.21
#> [4] XVector_0.22.0 magrittr_1.5 zlibbioc_1.28.0
#> [7] tidyselect_0.2.5 R6_2.3.0 rlang_0.3.1
#> [10] stringr_1.3.1 highr_0.7 tools_3.5.1
#> [13] xfun_0.4 htmltools_0.3.6 assertthat_0.2.0
#> [16] yaml_2.2.0 digest_0.6.18 tibble_2.0.0
#> [19] crayon_1.3.4 bindrcpp_0.2.2 GenomeInfoDbData_1.2.0
#> [22] purrr_0.2.5 bitops_1.0-6 RCurl_1.95-4.11
#> [25] glue_1.3.0 evaluate_0.12 rmarkdown_1.11
#> [28] stringi_1.2.4 pillar_1.3.1 compiler_3.5.1
#> [31] pkgconfig_2.0.2
Created on 2019-01-10 by the reprex package (v0.2.1)
I guess the point is that dispatch could be smarter. I went ahead and inverted the dispatch cascade in S4Vectors 0.21.10, so that it flows from S3 to S4. Should resolve this issue.
Thank you very much for making this change!
IMO this is a problem because the behaviour (
S4Vectors::union
being called) is unexpected, since at least in my experience this kind of behaviour doesn't happen in any other situations (either the correct method is called for the object, or an error occurs), because it produces significantly different results to the correct method for the object, and because it still returns a GRanges it's difficult to spot. I only noticed because of a numerical difference in results after loading dplyr. Since both dplyr and GenomicRanges are pretty popular packages I'm sure I'm not the only person who could experience this issue.You could argue that everyone should always be explicit about which version of
union
they are using, but realistically that isn't common practice. Over on Github I was pointed to the conflicted package, though, which is a great way to spot where this is necessary!