Hello, I am overlapping chipseq peaks with hichip peaks in RStudio using findOVerlaps() function. I performed the overlapping on each fragment of the hichip separately and extracted the query hits of each overlaps and then create empty columns in my chip peak file so I can fill it with the fragments that overlapped with it using the approach of queryHits and subjectHits. But I got kind of lost with query and subject hits. Does anyone have an idea about it plz.
hichip_gr <- GRangesList(frg1, frg2)
# Create GRanges for the first set of ranges
frg1 <- GRanges(seqnames = Rle(hichip$frg1),
ranges = IRanges(start = hichip$start1, end = hichip$end1))
# Create GRanges for the second set of ranges
frg2 <- GRanges(seqnames = Rle(hichip$frg2),
ranges = IRanges(start = hichip$start2, end = hichip$end2))
# Load my chip peak file
S1 <- read.delim(file.path(path, "S1_Invi_fdr.bed") , header = FALSE)
# Convert it to Grange
S1 <- GRanges(seqnames = Rle(S1$seqnames),
ranges = IRanges(start = S1$start, end = S1$end))
# creating a data columns for hichip fragments
mcols(S1)$frg1 <- NA
mcols(S1)$start.1 <- NA
mcols(S1)$end.1 <- NA
mcols(S1)$frg2 <- NA
mcols(S1)$start.2 <- NA
mcols(S1)$end.2 <- NA
# find overlapping
overlap1 <- findOverlaps(S1, frg1)
# extract queryHits from overlap1
query_hits <- overlap1$subjectHits
subset_frg1 <- frg1[query_hits]
# this is the result of overlap1
overlap1
queryHits subjectHits
1: 1 35804
2: 1 35805
3: 5 44884
4: 5 44885
5: 5 44886
---
12398: 15969 43123
12399: 15970 25561
12400: 15971 55972
12401: 15971 55973
12402: 15971 55974
sessionInfo( )
R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS/LAPACK: /shared/ifbstor1/software/miniconda/envs/r-4.1.1/lib/libopenblasp-r0.3.18.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.14.2 reshape2_1.4.4
[3] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 ChIPseeker_1.30.3
[5] ChIPpeakAnno_3.28.0 msigdbr_7.4.1
[7] ggupset_0.3.0 DOSE_3.20.1
[9] RColorBrewer_1.1-3 fgsea_1.20.0
[11] org.Hs.eg.db_3.14.0 enrichplot_1.14.2
[13] clusterProfiler_4.2.2 pheatmap_1.0.12
[15] VennDiagram_1.7.3 futile.logger_1.4.3
[17] enrichR_3.2 rtracklayer_1.54.0
[19] TxDb.Hsapiens.UCSC.hg38.knownGene_3.14.0 GenomicFeatures_1.46.1
[21] AnnotationDbi_1.56.1 DiffBind_3.4.11
[23] BiocParallel_1.28.3 BiocManager_1.30.23
[25] openxlsx_4.2.5.2 readxl_1.4.3
[27] EnhancedVolcano_1.13.2 ggrepel_0.9.5
[29] magrittr_2.0.3 lubridate_1.9.3
[31] forcats_1.0.0 stringr_1.5.1
[33] dplyr_1.1.4 purrr_1.0.2
[35] readr_2.1.5 tidyr_1.3.1
[37] tibble_3.2.1 tidyverse_2.0.0
[39] DESeq2_1.34.0 SummarizedExperiment_1.24.0
[41] MatrixGenerics_1.6.0 matrixStats_0.62.0
[43] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
[45] IRanges_2.28.0 S4Vectors_0.32.4
[47] Biobase_2.54.0 BiocGenerics_0.40.0
[49] limma_3.50.3 ggplot2_3.5.1
loaded via a namespace (and not attached):
[1] utf8_1.2.2 tidyselect_1.2.1 RSQLite_2.2.8 htmlwidgets_1.5.4
[5] scatterpie_0.1.7 munsell_0.5.0 systemPipeR_2.0.4 withr_2.5.0
[9] colorspace_2.0-3 GOSemSim_2.20.0 filelock_1.0.2 rstudioapi_0.16.0
[13] bbmle_1.0.24 GenomeInfoDbData_1.2.7 mixsqp_0.3-43 hwriter_1.3.2
[17] polyclip_1.10-0 bit64_4.0.5 farver_2.1.1 downloader_0.4
[21] treeio_1.18.1 coda_0.19-4 vctrs_0.6.5 generics_0.1.3
[25] lambda.r_1.2.4 timechange_0.3.0 BiocFileCache_2.2.0 regioneR_1.26.0
[29] R6_2.5.1 apeglm_1.16.0 graphlayouts_0.8.0 invgamma_1.1
[33] locfit_1.5-9.4 AnnotationFilter_1.18.0 bitops_1.0-7 cachem_1.0.6
[37] gridGraphics_0.5-1 DelayedArray_0.20.0 assertthat_0.2.1 BiocIO_1.4.0
[41] scales_1.3.0 ggraph_2.0.5 gtable_0.3.1 ensembldb_2.18.2
[45] tidygraph_1.2.0 rlang_1.1.4 genefilter_1.76.0 splines_4.1.1
[49] lazyeval_0.2.2 yaml_2.3.6 qvalue_2.26.0 RBGL_1.70.0
[53] tools_4.1.1 ggplotify_0.1.0 gplots_3.1.3.1 Rcpp_1.0.9
[57] plyr_1.8.7 progress_1.2.2 zlibbioc_1.40.0 RCurl_1.98-1.9
[61] prettyunits_1.1.1 viridis_0.6.2 ashr_2.2-47 futile.options_1.0.1
[65] DO.db_2.9 truncnorm_1.0-8 mvtnorm_1.1-3 SQUAREM_2021.1
[69] amap_0.8-19 ProtGenerics_1.26.0 patchwork_1.1.1 hms_1.1.3
[73] xtable_1.8-4 XML_3.99-0.9 emdbook_1.3.12 jpeg_0.1-9
[77] gridExtra_2.3 compiler_4.1.1 biomaRt_2.50.0 bdsmatrix_1.3-7
[81] shadowtext_0.0.9 KernSmooth_2.23-20 crayon_1.5.2 htmltools_0.5.2
[85] ggfun_0.0.4 tzdb_0.2.0 geneplotter_1.72.0 aplot_0.1.1
[89] DBI_1.1.2 tweenr_1.0.2 formatR_1.11 WriteXLS_6.6.0
[93] dbplyr_2.1.1 MASS_7.3-54 rappdirs_0.3.3 boot_1.3-28
[97] babelgene_21.4 ShortRead_1.52.0 Matrix_1.3-4 cli_3.6.3
[101] parallel_4.1.1 igraph_2.0.3 pkgconfig_2.0.3 GenomicAlignments_1.30.0
[105] numDeriv_2016.8-1.1 InteractionSet_1.22.0 xml2_1.3.3 ggtree_3.2.1
[109] annotate_1.72.0 multtest_2.50.0 XVector_0.34.0 yulab.utils_0.0.4
[113] digest_0.6.30 graph_1.72.0 Biostrings_2.62.0 cellranger_1.1.0
[117] fastmatch_1.1-3 tidytree_0.3.6 restfulr_0.0.13 GreyListChIP_1.26.0
[121] curl_4.3.3 Rsamtools_2.10.0 gtools_3.9.3 rjson_0.2.20
[125] jsonlite_1.8.8 nlme_3.1-153 lifecycle_1.0.3 viridisLite_0.4.1
[129] BSgenome_1.62.0 fansi_1.0.3 pillar_1.9.0 lattice_0.20-45
[133] plotrix_3.8-2 KEGGREST_1.34.0 fastmap_1.1.0 httr_1.4.4
[137] survival_3.2-13 GO.db_3.14.0 glue_1.6.2 zip_2.2.0
[141] png_0.1-7 bit_4.0.4 ggforce_0.3.3 stringi_1.7.8
[145] blob_1.2.2 latticeExtra_0.6-29 caTools_1.18.2 memoise_2.0.1
[149] ape_5.5 irlba_2.3.5.1
thank you for your help. but if I want to assign each query to its subjecthits I can have as results a data with the genomic coordinates well assigned?
Can you restate the question? It's not clear to me what you are asking.
sure! in ur example, findOverlaps(gr1, gr2) the 1st query is assigned to the 5th subjectHits, and the 2nd query is assigned to the 4th subjectHits. this is result represented as the indices of each gr1 and gr2, what if I want to represent it as genomic coordinates and not only the indices. So instead of having
we'll have, ie
chr1 overlaps with chr5 on this know position
How can we do this?
We could use a
GenomicPairs
object for that.ohh interesting! I didn't know about that. Thank you so much for ur help :)
GAlignmentPairs objects are highly specialized objects used to represent aligned paired-end reads, typically by loading them from a BAM file. They are not really the right kind of object for representing the ranges of the overlaps between two GRanges objects. Better to use
pintersect()
for that:The returned GRanges object is _parallel_ to the Hits object
fo
and contains the range of the overlap for each hit info
. It can be stored in the Hits object itself as a metadata column:Hope this helps,
H.