Question

How to search a .narrowPeak file for a list of specific genes using the GenomicRanges package

stacy.genovese

I'm doing a ChIP-seq analysis for the first time and have some basic questions. I successfully ran MACS2 callpeak and have a .narrowPeak file that I can load into IGV. I also have an .xls file with the names of specific genes we are interested in. In IGV I can type in a gene name and check whether my TF binds there, but the list is over 5000 genes long, so doing this manually isn't an option. I've been told to look into the GenomicRanges package but would love some direction. I need output that lists each of the genes with a 0/1 column indicating whether the gene is bound (i.e. overlaps a peak) somewhere in my .narrowPeak file.

Thanks in advance for any help, Stacy

sessionInfo( )
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readxl_1.4.1                             openxlsx_4.2.5.1                         dplyr_1.0.10                            
 [4] fastqcr_0.1.2                            msigdbr_7.5.1                            clusterProfiler_4.6.0                   
 [7] ggupset_0.3.0                            UpSetR_1.4.0                             ChIPseeker_1.34.1                       
[10] rtracklayer_1.58.0                       org.Hs.eg.db_3.16.0                      TxDb.Hsapiens.UCSC.hg38.knownGene_3.16.0
[13] GenomicFeatures_1.50.3                   AnnotationDbi_1.60.0                     Rsamtools_2.14.0                        
[16] Biostrings_2.66.0                        XVector_0.38.0                           ChIPQC_1.34.0                           
[19] BiocParallel_1.32.5                      DiffBind_3.8.3                           SummarizedExperiment_1.28.0             
[22] Biobase_2.58.0                           MatrixGenerics_1.10.0                    matrixStats_0.63.0                      
[25] GenomicRanges_1.50.2                     GenomeInfoDb_1.34.4                      IRanges_2.32.0                          
[28] S4Vectors_0.36.1                         BiocGenerics_0.44.0                      ggplot2_3.4.0                           

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                                tidyselect_1.2.0                          RSQLite_2.2.20                           
  [4] htmlwidgets_1.6.0                         grid_4.2.2                                scatterpie_0.1.8                         
  [7] munsell_0.5.0                             codetools_0.2-18                          interp_1.1-3                             
 [10] systemPipeR_2.4.0                         withr_2.5.0                               colorspace_2.0-3                         
 [13] GOSemSim_2.24.0                           filelock_1.0.2                            knitr_1.41                               
 [16] DOSE_3.24.2                               labeling_0.4.2                            bbmle_1.0.25                             
 [19] GenomeInfoDbData_1.2.9                    mixsqp_0.3-48                             hwriter_1.3.2.1                          
 [22] polyclip_1.10-4                           bit64_4.0.5                               farver_2.1.1                             
 [25] downloader_0.4                            coda_0.19-4                               vctrs_0.5.1                              
 [28] treeio_1.22.0                             TxDb.Rnorvegicus.UCSC.rn4.ensGene_3.2.2   generics_0.1.3                           
 [31] xfun_0.36                                 gson_0.0.9                                BiocFileCache_2.6.0                      
 [34] R6_2.5.1                                  apeglm_1.20.0                             graphlayouts_0.8.4                       
 [37] invgamma_1.1                              locfit_1.5-9.6                            bitops_1.0-7                             
 [40] cachem_1.0.6                              fgsea_1.24.0                              gridGraphics_0.5-1                       
 [43] DelayedArray_0.24.0                       assertthat_0.2.1                          vroom_1.6.0                              
 [46] BiocIO_1.8.0                              scales_1.2.1                              ggraph_2.1.0                             
 [49] enrichplot_1.18.3                         gtable_0.3.1                              tidygraph_1.2.2                          
 [52] rlang_1.0.6                               splines_4.2.2                             lazyeval_0.2.2                           
 [55] selectr_0.4-2                             yaml_2.3.6                                reshape2_1.4.4                           
 [58] TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2 qvalue_2.30.0                             tools_4.2.2                              
 [61] ggplotify_0.1.0                           ellipsis_0.3.2                            gplots_3.1.3                             
 [64] jquerylib_0.1.4                           RColorBrewer_1.1-3                        Rcpp_1.0.9                               
 [67] plyr_1.8.8                                progress_1.2.2                            zlibbioc_1.44.0                          
 [70] purrr_1.0.0                               RCurl_1.98-1.9                            prettyunits_1.1.1                        
 [73] deldir_1.0-6                              viridis_0.6.2                             ashr_2.2-54                              
 [76] cowplot_1.1.1                             chipseq_1.48.0                            ggrepel_0.9.2                            
 [79] magrittr_2.0.3                            data.table_1.14.6                         TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2  
 [82] truncnorm_1.0-8                           mvtnorm_1.1-3                             SQUAREM_2021.1                           
 [85] amap_0.8-19                               TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2   evaluate_0.19                            
 [88] hms_1.1.2                                 patchwork_1.1.2                           HDO.db_0.99.1                            
 [91] XML_3.99-0.13                             emdbook_1.3.12                            jpeg_0.1-10                              
 [94] gridExtra_2.3                             compiler_4.2.2                            biomaRt_2.54.0                           
 [97] bdsmatrix_1.3-6                           tibble_3.1.8                              KernSmooth_2.23-20                       
[100] crayon_1.5.2                              shadowtext_0.1.2                          htmltools_0.5.4                          
[103] tzdb_0.3.0                                ggfun_0.0.9                               tidyr_1.2.1                              
[106] aplot_0.1.9                               DBI_1.1.3                                 tweenr_2.0.2                             
[109] dbplyr_2.2.1                              MASS_7.3-58.1                             rappdirs_0.3.3                           
[112] boot_1.3-28.1                             babelgene_22.9                            readr_2.1.3                              
[115] ShortRead_1.56.1                          Matrix_1.5-3                              cli_3.5.0                                
[118] parallel_4.2.2                            igraph_1.3.5                              pkgconfig_2.0.3                          
[121] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2   GenomicAlignments_1.34.0                  numDeriv_2016.8-1.1                      
[124] TxDb.Celegans.UCSC.ce6.ensGene_3.2.2      xml2_1.3.3                                ggtree_3.6.2                             
[127] bslib_0.4.2                               rvest_1.0.3                               yulab.utils_0.0.6                        
[130] stringr_1.5.0                             digest_0.6.31                             cellranger_1.1.0                         
[133] rmarkdown_2.19                            fastmatch_1.1-3                           tidytree_0.4.2                           
[136] restfulr_0.0.15                           GreyListChIP_1.30.0                       curl_4.3.3                               
[139] gtools_3.9.4                              rjson_0.2.21                              lifecycle_1.0.3                          
[142] nlme_3.1-161                              jsonlite_1.8.4                            viridisLite_0.4.1                        
[145] limma_3.54.0                              BSgenome_1.66.1                           fansi_1.0.3                              
[148] pillar_1.8.1                              lattice_0.20-45                           Nozzle.R1_1.1-1.1                        
[151] KEGGREST_1.38.0                           fastmap_1.1.0                             httr_1.4.4                               
[154] plotrix_3.8-2                             GO.db_3.16.0                              glue_1.6.2                               
[157] zip_2.2.2                                 png_0.1-8                                 bit_4.0.5                                
[160] sass_0.4.4                                ggforce_0.4.1                             stringi_1.7.8                            
[163] blob_1.2.3                                TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0 latticeExtra_0.6-30                      
[166] caTools_1.18.2                            memoise_2.0.1                             irlba_2.3.5.1                            
[169] ape_5.6-2

Tags: ChIPSeq, PeakDetection

Asked 2.3 years ago by stacy.genovese (updated 2.2 years ago)

https://www.biostars.org/p/9550136/#9550147

Why spread the same question over multiple communities? This is a simple overlap operation; if you provide some example data, as requested on Biostars, it probably comes down to a one-liner.

Reply by ATpoint, 2.3 years ago
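
The overlap operation described in the reply above can be illustrated outside R to show the logic. This is a minimal Python sketch, not GenomicRanges itself. It assumes gene coordinates are already available as (name, chrom, start, end) tuples in the same 0-based, half-open convention as the narrowPeak file, and that chromosome names match (e.g. "chr1" in both files). A gene gets 1 if any peak shares at least one base with it, which is roughly what overlapsAny(genes, peaks) reports for two GRanges objects in R. The gene names and coordinates in the toy data are hypothetical.

```python
import bisect
from collections import defaultdict


class PeakIndex:
    """Sorted peaks on one chromosome, answering 'does [start, end) hit any peak?'."""

    def __init__(self, intervals):
        intervals = sorted(intervals)
        self.starts = [s for s, _ in intervals]
        # max_end[i] = largest peak end among the first i + 1 peaks (ordered by start),
        # so nested or overlapping peaks are handled correctly.
        self.max_end = []
        running = -1
        for _, e in intervals:
            running = max(running, e)
            self.max_end.append(running)

    def overlaps(self, start, end):
        # Peaks whose start is < end are exactly indices [0, i).
        i = bisect.bisect_left(self.starts, end)
        # One of them overlaps iff the furthest-reaching one ends after `start`.
        return i > 0 and self.max_end[i - 1] > start


def read_narrowpeak(lines):
    """Group narrowPeak (BED6+4) intervals by chromosome; coordinates stay 0-based, half-open."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2])))
    return {chrom: PeakIndex(iv) for chrom, iv in by_chrom.items()}


def gene_bound_table(narrowpeak_lines, genes):
    """genes: iterable of (name, chrom, start, end), 0-based half-open (assumed).
    Returns [(name, 0 or 1), ...] in input order."""
    index = read_narrowpeak(narrowpeak_lines)
    table = []
    for name, chrom, start, end in genes:
        peaks = index.get(chrom)
        table.append((name, int(peaks is not None and peaks.overlaps(start, end))))
    return table


# Toy data (hypothetical); in practice, read the lines of the .narrowPeak file.
peaks_txt = [
    "chr1\t100\t200\tpeak1\t50\t.\t5.0\t10.0\t8.0\t50\n",
    "chr2\t500\t650\tpeak2\t80\t.\t7.5\t12.0\t9.5\t70\n",
]
genes = [
    ("GENE_A", "chr1", 150, 1000),  # overlaps peak1
    ("GENE_B", "chr1", 200, 300),   # starts where peak1 ends: no shared base
    ("GENE_C", "chr2", 0, 400),     # ends before peak2 starts
]
for name, bound in gene_bound_table(peaks_txt, genes):
    print(f"{name}\t{bound}")
```

Writing these rows to a tab-delimited file gives the 0/1 column asked for. Testing overlap against the whole gene body is one design choice. Restricting the test to a window around the TSS is another, and for long genes the two can give quite different answers.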

Because you told me to look into GenomicRanges, which I did, and now I'm asking on the support page for the GenomicRanges package. What kind of example do you need? I have a .narrowPeak file and an .xls file with a column of gene names (like 'MCIDAS'). I need to find which of these genes are bound by a peak in the .narrowPeak file and get output that adds a column to the .xls (or a tab-delimited file) with a 1 or 0 for each gene.

Reply by stacy.genovese, 2.3 years ago
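
The .xls holds only gene names, so each name needs genomic coordinates before any overlap can be computed. The following is a minimal Python sketch of that lookup. It assumes a GTF annotation for the same genome build as the peaks, with gene records that carry a gene_name attribute. It also assumes the gene column has been exported from the .xls as plain text. The file contents below are hypothetical. The promoter window sizes are hypothetical defaults, loosely modelled on GenomicRanges::promoters(). In the R session shown above, TxDb.Hsapiens.UCSC.hg38.knownGene together with org.Hs.eg.db would play the role of the GTF.

```python
import re

GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')


def gene_coords_from_gtf(gtf_lines):
    """Map gene_name -> (chrom, start, end, strand) with 0-based, half-open coordinates."""
    coords = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "gene":
            continue
        m = GENE_NAME_RE.search(f[8])
        if m:
            # GTF is 1-based and closed; subtract 1 from the start to match narrowPeak/BED.
            # setdefault keeps the first record if a name occurs more than once.
            coords.setdefault(m.group(1), (f[0], int(f[3]) - 1, int(f[4]), f[6]))
    return coords


def promoter_window(chrom, start, end, strand, upstream=2000, downstream=200):
    """Half-open window around the TSS (window sizes are hypothetical defaults)."""
    if strand == "-":
        tss = end - 1  # on the minus strand the gene's last base is its TSS
        return chrom, max(0, tss - downstream + 1), tss + upstream + 1
    tss = start
    return chrom, max(0, tss - upstream), tss + downstream


def lookup_genes(gene_names, coords):
    """Split the gene list into (found records, names missing from the annotation)."""
    found, missing = [], []
    for raw in gene_names:
        name = raw.strip()
        if not name:
            continue
        if name in coords:
            found.append((name,) + coords[name])
        else:
            missing.append(name)
    return found, missing


# Toy annotation and gene list (hypothetical).
gtf = [
    'chr5\tHAVANA\tgene\t1001\t2000\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n',
    'chr5\tHAVANA\tgene\t5001\t6000\t.\t-\t.\tgene_id "G2"; gene_name "GENE_B";\n',
]
found, missing = lookup_genes(["GENE_A", "GENE_B", "GENE_X"], gene_coords_from_gtf(gtf))
for name, chrom, start, end, strand in found:
    print(name, promoter_window(chrom, start, end, strand))
print("not in annotation:", missing)
```

Reporting `missing` is worthwhile. Symbols that are aliases or outdated names would otherwise drop out of the table silently instead of being scored 0.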

Accepted answer (score 1), 2023-01-05

Vince Schulz

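Try using the annotatePeak() function from the ChIPseeker package. For each peak it reports the nearest gene and whether the region is exonic, intergenic, etc. You can then use match(), %in%, or other methods to match your gene list against the annotated peaks (for example, after converting the result with as.data.frame()). If your list contains gene symbols such as 'MCIDAS', supplying annoDb = "org.Hs.eg.db" adds a SYMBOL column to match on.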
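
The annotate-then-match idea in this answer can be sketched in Python to show the two steps. This is not ChIPseeker's algorithm; it is a simplified stand-in. Each peak is assigned to the gene whose TSS is closest to the peak summit (narrowPeak column 10, falling back to the midpoint when it is -1). Each gene in the list then gets 1 if at least one peak was assigned to it, which is the same membership test %in% performs in R. The TSS positions and gene names are hypothetical, and max_dist is an assumed optional cutoff.

```python
import bisect
from collections import defaultdict


def peak_centers(narrowpeak_lines):
    """Yield (chrom, position) per peak: the summit if column 10 is not -1, else the midpoint."""
    for line in narrowpeak_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        f = line.rstrip("\n").split("\t")
        start, end = int(f[1]), int(f[2])
        offset = int(f[9]) if len(f) >= 10 else -1
        yield f[0], (start + offset if offset >= 0 else (start + end) // 2)


def genes_with_nearest_peak(narrowpeak_lines, tss_records, max_dist=None):
    """tss_records: (gene, chrom, tss_position). Return the set of genes that are
    the nearest TSS of at least one peak (optionally within max_dist bp)."""
    by_chrom = defaultdict(list)
    for gene, chrom, pos in tss_records:
        by_chrom[chrom].append((pos, gene))
    for records in by_chrom.values():
        records.sort()
    positions = {chrom: [p for p, _ in recs] for chrom, recs in by_chrom.items()}

    hits = set()
    for chrom, center in peak_centers(narrowpeak_lines):
        records = by_chrom.get(chrom)
        if not records:
            continue
        i = bisect.bisect_left(positions[chrom], center)
        # Only the TSSs immediately left and right of the center can be nearest.
        neighbours = [records[j] for j in (i - 1, i) if 0 <= j < len(records)]
        pos, gene = min(neighbours, key=lambda rec: abs(rec[0] - center))
        if max_dist is None or abs(pos - center) <= max_dist:
            hits.add(gene)
    return hits


# Toy data (hypothetical).
peaks_txt = [
    "chr1\t900\t1100\tp1\t10\t.\t3.0\t4.0\t2.0\t150\n",  # summit at 1050
    "chr1\t7000\t7200\tp2\t10\t.\t3.0\t4.0\t2.0\t-1\n",  # no summit: midpoint 7100
]
tss = [("GENE_A", "chr1", 1000), ("GENE_B", "chr1", 5000), ("GENE_C", "chr2", 100)]
gene_list = ["GENE_A", "GENE_B", "GENE_C"]

hits = genes_with_nearest_peak(peaks_txt, tss, max_dist=1000)
for gene in gene_list:
    print(f"{gene}\t{int(gene in hits)}")  # analogous to gene_list %in% annotated genes
```

Nearest-gene assignment and direct overlap answer different questions. A distal peak can be assigned to a gene it does not overlap, and a gene can overlap a peak without being that peak's nearest gene, so the two 0/1 columns need not agree.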