I'm doing a ChIP-seq analysis for the first time and have some basic questions. I successfully ran MACS2 callpeak and have a .narrowPeak file that I can load into IGV. I also have an .xls file with the names of specific genes we are interested in. In IGV I can type in a gene name and see whether my TF binds there, but the list is over 5000 genes long, so doing this manually isn't an option. I've been told to look into the GenomicRanges package but would love some direction. I need output that lists each of the genes with a 0/1 column indicating whether that gene is bound somewhere in my .narrowPeak file.
Thanks in advance for any help, Stacy
sessionInfo( )
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6.2
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] readxl_1.4.1 openxlsx_4.2.5.1 dplyr_1.0.10
[4] fastqcr_0.1.2 msigdbr_7.5.1 clusterProfiler_4.6.0
[7] ggupset_0.3.0 UpSetR_1.4.0 ChIPseeker_1.34.1
[10] rtracklayer_1.58.0 org.Hs.eg.db_3.16.0 TxDb.Hsapiens.UCSC.hg38.knownGene_3.16.0
[13] GenomicFeatures_1.50.3 AnnotationDbi_1.60.0 Rsamtools_2.14.0
[16] Biostrings_2.66.0 XVector_0.38.0 ChIPQC_1.34.0
[19] BiocParallel_1.32.5 DiffBind_3.8.3 SummarizedExperiment_1.28.0
[22] Biobase_2.58.0 MatrixGenerics_1.10.0 matrixStats_0.63.0
[25] GenomicRanges_1.50.2 GenomeInfoDb_1.34.4 IRanges_2.32.0
[28] S4Vectors_0.36.1 BiocGenerics_0.44.0 ggplot2_3.4.0
loaded via a namespace (and not attached):
[1] utf8_1.2.2 tidyselect_1.2.0 RSQLite_2.2.20
[4] htmlwidgets_1.6.0 grid_4.2.2 scatterpie_0.1.8
[7] munsell_0.5.0 codetools_0.2-18 interp_1.1-3
[10] systemPipeR_2.4.0 withr_2.5.0 colorspace_2.0-3
[13] GOSemSim_2.24.0 filelock_1.0.2 knitr_1.41
[16] DOSE_3.24.2 labeling_0.4.2 bbmle_1.0.25
[19] GenomeInfoDbData_1.2.9 mixsqp_0.3-48 hwriter_1.3.2.1
[22] polyclip_1.10-4 bit64_4.0.5 farver_2.1.1
[25] downloader_0.4 coda_0.19-4 vctrs_0.5.1
[28] treeio_1.22.0 TxDb.Rnorvegicus.UCSC.rn4.ensGene_3.2.2 generics_0.1.3
[31] xfun_0.36 gson_0.0.9 BiocFileCache_2.6.0
[34] R6_2.5.1 apeglm_1.20.0 graphlayouts_0.8.4
[37] invgamma_1.1 locfit_1.5-9.6 bitops_1.0-7
[40] cachem_1.0.6 fgsea_1.24.0 gridGraphics_0.5-1
[43] DelayedArray_0.24.0 assertthat_0.2.1 vroom_1.6.0
[46] BiocIO_1.8.0 scales_1.2.1 ggraph_2.1.0
[49] enrichplot_1.18.3 gtable_0.3.1 tidygraph_1.2.2
[52] rlang_1.0.6 splines_4.2.2 lazyeval_0.2.2
[55] selectr_0.4-2 yaml_2.3.6 reshape2_1.4.4
[58] TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2 qvalue_2.30.0 tools_4.2.2
[61] ggplotify_0.1.0 ellipsis_0.3.2 gplots_3.1.3
[64] jquerylib_0.1.4 RColorBrewer_1.1-3 Rcpp_1.0.9
[67] plyr_1.8.8 progress_1.2.2 zlibbioc_1.44.0
[70] purrr_1.0.0 RCurl_1.98-1.9 prettyunits_1.1.1
[73] deldir_1.0-6 viridis_0.6.2 ashr_2.2-54
[76] cowplot_1.1.1 chipseq_1.48.0 ggrepel_0.9.2
[79] magrittr_2.0.3 data.table_1.14.6 TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2
[82] truncnorm_1.0-8 mvtnorm_1.1-3 SQUAREM_2021.1
[85] amap_0.8-19 TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2 evaluate_0.19
[88] hms_1.1.2 patchwork_1.1.2 HDO.db_0.99.1
[91] XML_3.99-0.13 emdbook_1.3.12 jpeg_0.1-10
[94] gridExtra_2.3 compiler_4.2.2 biomaRt_2.54.0
[97] bdsmatrix_1.3-6 tibble_3.1.8 KernSmooth_2.23-20
[100] crayon_1.5.2 shadowtext_0.1.2 htmltools_0.5.4
[103] tzdb_0.3.0 ggfun_0.0.9 tidyr_1.2.1
[106] aplot_0.1.9 DBI_1.1.3 tweenr_2.0.2
[109] dbplyr_2.2.1 MASS_7.3-58.1 rappdirs_0.3.3
[112] boot_1.3-28.1 babelgene_22.9 readr_2.1.3
[115] ShortRead_1.56.1 Matrix_1.5-3 cli_3.5.0
[118] parallel_4.2.2 igraph_1.3.5 pkgconfig_2.0.3
[121] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicAlignments_1.34.0 numDeriv_2016.8-1.1
[124] TxDb.Celegans.UCSC.ce6.ensGene_3.2.2 xml2_1.3.3 ggtree_3.6.2
[127] bslib_0.4.2 rvest_1.0.3 yulab.utils_0.0.6
[130] stringr_1.5.0 digest_0.6.31 cellranger_1.1.0
[133] rmarkdown_2.19 fastmatch_1.1-3 tidytree_0.4.2
[136] restfulr_0.0.15 GreyListChIP_1.30.0 curl_4.3.3
[139] gtools_3.9.4 rjson_0.2.21 lifecycle_1.0.3
[142] nlme_3.1-161 jsonlite_1.8.4 viridisLite_0.4.1
[145] limma_3.54.0 BSgenome_1.66.1 fansi_1.0.3
[148] pillar_1.8.1 lattice_0.20-45 Nozzle.R1_1.1-1.1
[151] KEGGREST_1.38.0 fastmap_1.1.0 httr_1.4.4
[154] plotrix_3.8-2 GO.db_3.16.0 glue_1.6.2
[157] zip_2.2.2 png_0.1-8 bit_4.0.5
[160] sass_0.4.4 ggforce_0.4.1 stringi_1.7.8
[163] blob_1.2.3 TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0 latticeExtra_0.6-30
[166] caTools_1.18.2 memoise_2.0.1 irlba_2.3.5.1
[169] ape_5.6-2
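A minimal sketch of the overlap step, assuming GenomicRanges (already attached in the session above). With real data the peaks would come from the actual file, for example via rtracklayer's import() with format = "BED" and extraCols for the four extra narrowPeak columns, and the gene coordinates via genes() on TxDb.Hsapiens.UCSC.hg38.knownGene. Here both are replaced by tiny made-up stand-ins (the peak lines, gene names and coordinates are all hypothetical) so the code runs on its own. One detail worth seeing explicitly: narrowPeak, like BED, stores 0-based start coordinates, while GRanges is 1-based.

```r
library(GenomicRanges)

# Two made-up lines in narrowPeak layout (BED6+4). Coordinates are invented.
np_lines <- c("chr1\t99\t200\tpeak_1\t50\t.\t5.2\t10.1\t8.3\t40",
              "chr2\t299\t400\tpeak_2\t80\t.\t7.9\t12.4\t10.0\t55")
np_cols <- c("chrom", "chromStart", "chromEnd", "name", "score", "strand",
             "signalValue", "pValue", "qValue", "peak")
np <- read.table(text = np_lines, sep = "\t", col.names = np_cols,
                 stringsAsFactors = FALSE)

# BED-style files are 0-based and half-open; GRanges is 1-based and closed,
# so only the start coordinate needs +1. Peaks are left unstranded ("*").
peaks <- GRanges(seqnames = np$chrom,
                 ranges = IRanges(start = np$chromStart + 1L, end = np$chromEnd),
                 name = np$name, qValue = np$qValue)

# Hypothetical gene coordinates, for illustration only (not real loci).
genes_gr <- GRanges(seqnames = c("chr1", "chr1", "chr2"),
                    ranges = IRanges(start = c(150, 1000, 350),
                                     end = c(900, 2000, 500)),
                    strand = c("+", "-", "-"),
                    symbol = c("geneA", "geneB", "geneC"))

# 1 if at least one peak overlaps the gene body, 0 otherwise.
result <- data.frame(
  gene = genes_gr$symbol,
  bound = as.integer(overlapsAny(genes_gr, peaks, ignore.strand = TRUE))
)
print(result)
```

Here "bound" means a peak overlaps the gene body; whether that is the right definition for a given TF, rather than for example a window around the promoter, is a choice for the analysis.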
https://www.biostars.org/p/9550136/#9550147
Why spread the same question over multiple communities? It is a simple overlap operation. If you provide some example data, as requested on Biostars, this probably comes down to a one-liner.
Because you told me to look into GenomicRanges, which I did, and now I'm asking on the page for the GenomicRanges package. What kind of example do you need? I have a .narrowPeak file and an .xls with a column of gene names (like 'MCIDAS'). I need to find which of those genes are bound in the .narrowPeak file and get output that adds a column to the .xls (or a tab-delimited file) with a 1 or 0 for each gene.
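A sketch of the second half of the request: attaching a 0/1 column to the gene list and writing it out as a tab-delimited file. Several assumptions are made, and each is labelled in the code. The gene list column is called "gene"; in practice the list might be read with readxl::read_excel(). The annotation is a GRanges named by gene symbol; genes() on the hg38 TxDb returns Entrez IDs, which could be mapped to symbols with AnnotationDbi::mapIds() and org.Hs.eg.db. A gene counts as bound if a peak overlaps its body or the 2 kb upstream of it, which is an arbitrary window. All names, coordinates and peaks are made up.

```r
library(GenomicRanges)

# Hypothetical gene list, standing in for the .xls; the column name "gene" is an assumption.
gene_list <- data.frame(gene = c("geneA", "geneB", "geneC", "geneX"),
                        stringsAsFactors = FALSE)

# Hypothetical annotation: made-up gene coordinates, named by symbol.
genes_gr <- GRanges(seqnames = c("chr1", "chr1", "chr2"),
                    ranges = IRanges(start = c(5000, 20000, 19000),
                                     end = c(9000, 25000, 22000)),
                    strand = c("+", "-", "+"))
names(genes_gr) <- c("geneA", "geneB", "geneC")

# Made-up peaks, already in 1-based GRanges coordinates.
peaks <- GRanges(seqnames = c("chr1", "chr2"),
                 ranges = IRanges(start = c(3500, 20500), end = c(3800, 20800)))

# Strand-aware 2 kb region upstream of each gene's 5' end (the 2 kb is arbitrary).
upstream <- flank(genes_gr, width = 2000)

# Bound if any peak hits the gene body or its upstream window.
hit <- overlapsAny(genes_gr, peaks, ignore.strand = TRUE) |
  overlapsAny(upstream, peaks, ignore.strand = TRUE)
bound_by_symbol <- setNames(as.integer(hit), names(genes_gr))

# Look up each listed gene; names missing from the annotation become NA, not 0.
gene_list$bound <- unname(bound_by_symbol[gene_list$gene])

# Write a tab-delimited table (to a temporary file here).
out_file <- tempfile(fileext = ".tsv")
write.table(gene_list, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Genes in the list that have no match in the annotation (here the hypothetical geneX) come out as NA rather than 0, which keeps "not bound" separate from "not found" and makes naming mismatches between the spreadsheet and the annotation easy to spot.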