Hi,
Converting a GInteractions object to a data frame with as.data.frame fails with a "duplicate row.names" error when the two GRanges used to build it carry the same names. The names of the original GRanges appear to be retained in the GInteractions object even though they are not visible, and as.data.frame presumably trips over them. Unnaming the GRanges before combining them, or giving them distinct names, avoids the problem.
Many thanks, Noa.
> Granges1 <- GRanges(seqnames="chr1",ranges=IRanges(start=c(2,3,4,5),end=c(5,6,7,8)))
> names(Granges1) <- paste("name",seq(from=1, to = length(Granges1)), sep="_")
> Granges2 <- GRanges(seqnames="chr1",ranges=IRanges(start=c(11,12,13,14),end=c(12,22,24,21)))
> names(Granges2) <- paste("name",seq(from=1, to = length(Granges2)), sep="_")
> GInt <- GInteractions(Granges1, Granges2)
> as.data.frame(GInt)
Error in data.frame(seqnames = as.factor(seqnames(x)), start = start(x), :
duplicate row.names: name_1, name_2, name_3, name_4
> GInt <- GInteractions(unname(Granges1), unname(Granges2))
> as.data.frame(GInt)
seqnames1 start1 end1 width1 strand1 seqnames2 start2 end2 width2 strand2
1 chr1 2 5 4 * chr1 11 12 2 *
2 chr1 3 6 4 * chr1 12 22 11 *
3 chr1 4 7 4 * chr1 13 24 12 *
4 chr1 5 8 4 * chr1 14 21 8 *
> names(Granges2) <- paste("new.name",seq(from=1, to = length(Granges2)), sep="_")
> GInt <- GInteractions(Granges1, Granges2)
> as.data.frame(GInt)
seqnames1 start1 end1 width1 strand1 seqnames2 start2 end2 width2 strand2
1 chr1 2 5 4 * chr1 11 12 2 *
2 chr1 3 6 4 * chr1 12 22 11 *
3 chr1 4 7 4 * chr1 13 24 12 *
4 chr1 5 8 4 * chr1 14 21 8 *
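For what it's worth, the same kind of error can be reproduced in base R alone. The sketch below assumes that the names of both GRanges end up pooled into a single set of row names somewhere inside the conversion; that is a guess about the internals based on the error message, not something confirmed here. The variables `vals`, `nms`, `pooled_names`, `res` and `ok` are made up for illustration.

```r
# Base R only: no Bioconductor packages are needed for this illustration.
vals <- c(2, 3, 4, 5)
nms  <- paste("name", seq_along(vals), sep = "_")

# ASSUMPTION: both sets of identical names are pooled into one vector,
# so every name occurs twice.
pooled_names <- c(nms, nms)

# data.frame() refuses duplicated row names; capture the error message.
res <- tryCatch(
  data.frame(start = c(vals, vals), row.names = pooled_names),
  error = function(e) conditionMessage(e)
)
print(res)

# Making the names distinct (or dropping them) lets the call succeed.
ok <- data.frame(start = c(vals, vals),
                 row.names = make.unique(pooled_names))
print(nrow(ok))
```

This mirrors the two workarounds in the transcript above: removing the names with unname(), or assigning distinct names, both leave no duplicated row names to complain about.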
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /tungstenfs/groups/gbioinfo/Appz/easybuild/software/OpenBLAS/0.3.12-GCC-10.2.0/lib/libopenblas_skylakex-r0.3.12.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] TFBSTools_1.30.0 JASPAR2018_1.1.1 InteractionSet_1.20.0 SummarizedExperiment_1.22.0 Biobase_2.52.0 MatrixGenerics_1.4.0 matrixStats_0.59.0
[8] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3 tibble_3.1.2
[15] ggplot2_3.3.5 tidyverse_1.3.1 rtracklayer_1.52.0 GenomicRanges_1.44.0 GenomeInfoDb_1.28.1 IRanges_2.26.0 S4Vectors_0.30.0
[22] BiocGenerics_0.38.0
loaded via a namespace (and not attached):
[1] bitops_1.0-7 fs_1.5.0 DirichletMultinomial_1.34.0 lubridate_1.7.10 bit64_4.0.5 httr_1.4.2 tools_4.1.0
[8] backports_1.2.1 utf8_1.2.1 R6_2.5.0 seqLogo_1.58.0 DBI_1.1.1 colorspace_2.0-2 withr_2.4.2
[15] tidyselect_1.1.1 bit_4.0.4 compiler_4.1.0 cli_3.0.0 rvest_1.0.0 xml2_1.3.2 DelayedArray_0.18.0
[22] caTools_1.18.2 scales_1.1.1 Rsamtools_2.8.0 R.utils_2.10.1 XVector_0.32.0 pkgconfig_2.0.3 BSgenome_1.60.0
[29] dbplyr_2.1.1 fastmap_1.1.0 rlang_0.4.11 readxl_1.3.1 rstudioapi_0.13 RSQLite_2.2.7 BiocIO_1.2.0
[36] generics_0.1.0 jsonlite_1.7.2 BiocParallel_1.26.1 gtools_3.9.2 R.oo_1.24.0 RCurl_1.98-1.3 magrittr_2.0.1
[43] GO.db_3.13.0 GenomeInfoDbData_1.2.6 Matrix_1.3-3 Rcpp_1.0.7 munsell_0.5.0 fansi_0.5.0 R.methodsS3_1.8.1
[50] lifecycle_1.0.0 stringi_1.6.2 yaml_2.2.1 zlibbioc_1.38.0 plyr_1.8.6 grid_4.1.0 blob_1.2.1
[57] crayon_1.4.1 CNEr_1.28.0 lattice_0.20-44 Biostrings_2.60.1 haven_2.4.1 annotate_1.70.0 KEGGREST_1.32.0
[64] hms_1.1.0 pillar_1.6.1 rjson_0.2.20 reshape2_1.4.4 TFMPvalue_0.0.8 reprex_2.0.0 XML_3.99-0.6
[71] glue_1.4.2 modelr_0.1.8 png_0.1-7 vctrs_0.3.8 cellranger_1.1.0 poweRlaw_0.70.6 gtable_0.3.0
[78] assertthat_0.2.1 cachem_1.0.5 xtable_1.8-4 broom_0.7.8 pracma_2.3.3 restfulr_0.0.13 AnnotationDbi_1.54.1
[85] GenomicAlignments_1.28.0 memoise_2.0.0 ellipsis_0.3.2
Great, thanks for clarifying!