Genes in ChIPseeker annotation do not exist in the organism
Sam ▴ 10

Note: I have also posted this on GitHub, which seems the more appropriate place, but I do not know how to remove this post.

I used ChIPseeker to annotate mouse ChIP-seq data (aligned to GRCm38). I am interested in the distribution of all the reads, not only of the peaks. For this purpose, I downsampled the alignment BAM files to 1M reads and converted them to BED format (I hope this is acceptable). The TxDb was created from the Ensembl database.

Problem: The geneChr column in the annotated output, which should reflect the chromosome of the nearest gene, makes no sense: the chromosome numbers it contains do not exist in mouse. See below.

txdb <- makeTxDbFromBiomart(dataset="mmusculus_gene_ensembl")

file_list <- list(WT = "1.bed", CKO = "2.bed")

# Checking to see the chromosome names are ok in the files

[1] 1          2          3          4          5          6          7          8          9         
[10] 10         11         12         13         14         15         16         17         18        
[19] 19         X          Y          MT         GL456233.1 GL456211.1 JH584304.1 GL456379.1 GL456216.1
[28] GL456393.1 GL456366.1 GL456383.1 GL456360.1 GL456378.1 GL456389.1 GL456370.1 GL456390.1 GL456394.1
[37] GL456392.1 GL456396.1 GL456368.1
39 Levels ...

files_anno <- lapply(file_list, annotatePeak, TxDb=txdb, tssRegion = c(-3000,3000), verbose=TRUE)

# Why should the gene chromosomes be such?

 [1]  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95 103  97 139
[26] 100
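
One possible explanation, not confirmed here, is that geneChr holds integer positions within the TxDb's sequence levels rather than chromosome names. An Ensembl-derived TxDb can list many patch and scaffold sequences ahead of the primary chromosomes, which would push those positions well above the usual range. The base R sketch below only illustrates that mechanism with made-up level names (`tx_levels` and its contents are hypothetical and do not come from the actual TxDb):

```r
# Hypothetical sequence levels standing in for those of a TxDb.
# The names and their order are invented for illustration only.
tx_levels <- c("PATCH_A", "PATCH_B", "1", "2", "X", "MT")

# Chromosome names stored as a factor that uses those levels.
gene_chr_names <- factor(c("1", "2", "X", "MT"), levels = tx_levels)

# Converting the factor to integer yields level positions, not names.
gene_chr_codes <- as.integer(gene_chr_names)
print(gene_chr_codes)  # 3 4 5 6

# Indexing the level vector with the codes recovers the names.
decoded <- tx_levels[gene_chr_codes]
print(decoded)  # "1" "2" "X" "MT"
```

If that assumption holds for the real data, the analogous check would be to index `seqlevels(txdb)` with the geneChr values and see whether sensible mouse chromosome names come back.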

sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] diffloop_1.12.0        GenomicFeatures_1.36.4 AnnotationDbi_1.46.1   Biobase_2.44.0        
 [5] GenomicRanges_1.36.1   GenomeInfoDb_1.20.0    IRanges_2.18.3         S4Vectors_0.22.1      
 [9] BiocGenerics_0.30.0    ChIPseeker_1.20.0     

loaded via a namespace (and not attached):
  [1] fgsea_1.10.1                            colorspace_2.0-0                       
  [3] ellipsis_0.3.1                          ggridges_0.5.2                         
  [5] qvalue_2.16.0                           XVector_0.24.0                         
  [7] base64enc_0.1-3                         rstudioapi_0.13                        
  [9] farver_2.0.3                            urltools_1.7.3                         
 [11] graphlayouts_0.7.1                      ggrepel_0.8.2                          
 [13] bit64_4.0.5                             xml2_1.3.2                             
 [15] codetools_0.2-18                        splines_3.6.3                          
 [17] GOSemSim_2.10.0                         knitr_1.28                             
 [19] polyclip_1.10-0                         jsonlite_1.7.1                         
 [21] Rsamtools_2.0.3                         gridBase_0.4-7                         
 [23] GO.db_3.8.2                             ggforce_0.3.2                          
 [25] readr_1.4.0                             BiocManager_1.30.10                    
 [27] compiler_3.6.3                          httr_1.4.2                             
 [29] rvcheck_0.1.8                           Matrix_1.2-18                          
 [31] limma_3.40.6                            TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [33] tweenr_1.0.1                            htmltools_0.5.0                        
 [35] prettyunits_1.1.1                       tools_3.6.3                            
 [37] igraph_1.2.6                            gtable_0.3.0                           
 [39] glue_1.4.2                              GenomeInfoDbData_1.2.1                 
 [41] reshape2_1.4.4                          DO.db_2.9                              
 [43] dplyr_1.0.2                             fastmatch_1.1-0                        
 [45] Rcpp_1.0.5                              enrichplot_1.4.0                       
 [47] vctrs_0.3.5                             Biostrings_2.52.0                      
 [49] rtracklayer_1.44.4                      iterators_1.0.13                       
 [51] ggraph_2.0.4                            xfun_0.13                              
 [53] stringr_1.4.0                           lifecycle_0.2.0                        
 [55] gtools_3.8.2                            statmod_1.4.35                         
 [57] XML_3.99-0.3                            Sushi_1.22.0                           
 [59] DOSE_3.10.2                             edgeR_3.26.8                           
 [61] zoo_1.8-8                               europepmc_0.4                          
 [63] zlibbioc_1.30.0                         MASS_7.3-53                            
 [65] scales_1.1.1                            tidygraph_1.2.0                        
 [67] hms_0.5.3                               SummarizedExperiment_1.14.1            
 [69] RColorBrewer_1.1-2                      yaml_2.2.1                             
 [71] curl_4.3                                pbapply_1.4-3                          
 [73] memoise_1.1.0                           gridExtra_2.3                          
 [75] ggplot2_3.3.2                           UpSetR_1.4.0                           
 [77] biomaRt_2.40.5                          triebeard_0.3.0                        
 [79] stringi_1.5.3                           RSQLite_2.2.1                          
 [81] foreach_1.5.1                           plotrix_3.7-8                          
 [83] caTools_1.18.0                          boot_1.3-25                            
 [85] BiocParallel_1.18.1                     rlang_0.4.8                            
 [87] pkgconfig_2.0.3                         matrixStats_0.57.0                     
 [89] bitops_1.0-6                            evaluate_0.14                          
 [91] lattice_0.20-41                         purrr_0.3.4                            
 [93] labeling_0.4.2                          GenomicAlignments_1.20.1               
 [95] cowplot_1.1.0                           bit_4.0.4                              
 [97] tidyselect_1.1.0                        plyr_1.8.6                             
 [99] magrittr_2.0.1                          R6_2.5.0                               
[101] gplots_3.1.0                            generics_0.1.0                         
[103] DelayedArray_0.10.0                     DBI_1.1.0                              
[105] pillar_1.4.6                            RCurl_1.98-1.2                         
[107] tibble_3.0.4                            crayon_1.3.4                           
[109] KernSmooth_2.23-18                      rmarkdown_2.1                          
[111] viridis_0.5.1                           progress_1.2.2                         
[113] locfit_1.5-9.4                          grid_3.6.3                             
[115] data.table_1.13.2                       blob_1.2.1                             
[117] digest_0.6.27                           tidyr_1.1.2                            
[119] gridGraphics_0.5-0                      munsell_0.5.0                          
[121] viridisLite_0.3.0                       ggplotify_0.0.5
