Hi,
I'm quality-checking mouse ChIP-seq data with ChIPQC, but the coverage plot behaves strangely. I then ran the samples chromosome by chromosome and found that chrY, which has far fewer reads than the others, is causing the trouble. To demonstrate, here are coverage plots with all chromosomes, with all but chrY, and with chrY only.
This is all the data (chr Y included)
https://www.dropbox.com/s/m1e2dzzgqlddumf/with_Y.png?dl=0
This is data without chr Y
https://www.dropbox.com/s/rm4qcou0uuwd030/without_Y.png?dl=0
And this is chr Y itself.
https://www.dropbox.com/s/jsbzjsxovbmbwc5/Y.png?dl=0
Any ideas what is going on? Is there something wrong with the data or the program?
This is my code (it produces the reports in the same order as the figures above):
library("ChIPQC")
samples <- read.csv("samples_ChIPQC", stringsAsFactors = FALSE, header = TRUE)

# All chromosomes
exampleExp <- ChIPQC(samples, chromosomes = NULL)
ChIPQCreport(exampleExp, reportFolder = "bc_all")

# "chromosomes" just lists the chromosomes, Y being the last (21st)
chrs <- read.table("chromosomes", stringsAsFactors = FALSE)[, 1]
exampleExp <- ChIPQC(samples, chromosomes = chrs[-21])
ChIPQCreport(exampleExp, reportFolder = "bc_without_Y")

# chrY only
exampleExp <- ChIPQC(samples, chromosomes = "chrY")
ChIPQCreport(exampleExp, reportFolder = "bc_chrY")
This is what 'samples' looks like:
> samples
  SampleID Tissue    bamReads ControlID  bamControl        Peaks PeakCaller
1       S1     T1 s_1_001.bam   S1_ctrl S1_ctrl.bam S1_peaks.bed       MACS
2       S2     T2 s_2_001.bam   S2_ctrl S2_ctrl.bam S2_peaks.bed       MACS
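To put a number on how few reads chrY has compared with the other chromosomes, something like the sketch below could help. It assumes the BAM file is indexed (a .bai file next to it) and uses `idxstatsBam()` from Rsamtools; `reads_per_chromosome()` is a hypothetical helper written for this post, not part of ChIPQC or Rsamtools.

```r
# Hypothetical helper (not from any package): summarise mapped reads per
# chromosome from a table shaped like the output of Rsamtools::idxstatsBam(),
# i.e. with columns seqnames, seqlength, mapped, unmapped.
reads_per_chromosome <- function(idx) {
  idx <- idx[idx$seqlength > 0, ]                      # drop the "*" row (unplaced reads)
  idx$reads_per_mb <- idx$mapped / (idx$seqlength / 1e6)
  idx$fraction     <- idx$mapped / sum(idx$mapped)
  # Sparsest chromosomes first
  idx[order(idx$reads_per_mb),
      c("seqnames", "mapped", "reads_per_mb", "fraction")]
}

# File name taken from the bamReads column of 'samples'.
# Assumption: an index file exists alongside the BAM.
bam <- "s_1_001.bam"
if (requireNamespace("Rsamtools", quietly = TRUE) && file.exists(bam)) {
  print(reads_per_chromosome(Rsamtools::idxstatsBam(bam)))
}
```

The first rows of the result are the chromosomes with the lowest read density, which is where chrY would be expected to show up if it is as sparse as the plots suggest.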
sessionInfo()
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
 [1] ChIPQC_1.16.0 DiffBind_2.8.0 SummarizedExperiment_1.10.1 DelayedArray_0.6.2 BiocParallel_1.14.2 matrixStats_0.54.0
 [7] Biobase_2.40.0 GenomicRanges_1.32.6 GenomeInfoDb_1.16.0 IRanges_2.14.10 S4Vectors_0.18.3 BiocGenerics_0.26.0
[13] ggplot2_3.0.0

loaded via a namespace (and not attached):
  [1] amap_0.8-16 colorspace_1.3-2 rjson_0.2.20
  [4] hwriter_1.3.2 XVector_0.20.0 base64enc_0.1-3
  [7] rstudioapi_0.7 ggrepel_0.8.0 bit64_0.9-7
 [10] AnnotationDbi_1.42.1 splines_3.5.0 TxDb.Rnorvegicus.UCSC.rn4.ensGene_3.2.2
 [13] Nozzle.R1_1.1-1 Rsamtools_1.32.2 annotate_1.58.0
 [16] GO.db_3.6.0 pheatmap_1.0.10 graph_1.58.0
 [19] TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2 compiler_3.5.0 httr_1.3.1
 [22] GOstats_2.46.0 backports_1.1.2 assertthat_0.2.0
 [25] Matrix_1.2-14 lazyeval_0.2.1 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [28] limma_3.36.2 prettyunits_1.0.2 tools_3.5.0
 [31] bindrcpp_0.2.2 gtable_0.2.0 glue_1.3.0
 [34] GenomeInfoDbData_1.1.0 Category_2.46.0 reshape2_1.4.3
 [37] systemPipeR_1.14.0 dplyr_0.7.6 ShortRead_1.38.0
 [40] Rcpp_0.12.18 TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2 TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2
 [43] Biostrings_2.48.0 gdata_2.18.0 rtracklayer_1.40.3
 [46] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0 stringr_1.3.1 gtools_3.8.1
 [49] XML_3.98-1.13 edgeR_3.22.3 zlibbioc_1.26.0
 [52] scales_0.5.0 hms_0.4.2 RBGL_1.56.0
 [55] RColorBrewer_1.1-2 BBmisc_1.11 memoise_1.1.0
 [58] biomaRt_2.36.1 latticeExtra_0.6-28 stringi_1.2.4
 [61] RSQLite_2.1.1 genefilter_1.62.0 checkmate_1.8.5
 [64] GenomicFeatures_1.32.0 caTools_1.17.1.1 chipseq_1.30.0
 [67] rlang_0.2.1 pkgconfig_2.0.1 BatchJobs_1.7
 [70] bitops_1.0-6 TxDb.Celegans.UCSC.ce6.ensGene_3.2.2 lattice_0.20-35
 [73] purrr_0.2.5 bindr_0.1.1 labeling_0.3
 [76] GenomicAlignments_1.16.0 bit_1.1-14 tidyselect_0.2.4
 [79] GSEABase_1.42.0 AnnotationForge_1.22.1 plyr_1.8.4
 [82] magrittr_1.5 sendmailR_1.2-1 R6_2.2.2
 [85] gplots_3.0.1 DBI_1.0.0 pillar_1.3.0
 [88] withr_2.1.2 survival_2.42-6 RCurl_1.95-4.11
 [91] tibble_1.4.2 crayon_1.3.4 KernSmooth_2.23-15
 [94] progress_1.2.0 locfit_1.5-9.1 grid_3.5.0
 [97] data.table_1.11.4 blob_1.1.1 Rgraphviz_2.24.0
[100] digest_0.6.15 xtable_1.8-2 brew_1.0-6
[103] munsell_0.5.0
hi,
Would you be able to share the BAM file with me on Box (tc.infomatics@gmail.com) so I can try and find the source of this?
best,
tom
Hi!
Unfortunately the data is not public and I cannot share it.
However, it may have something to do with how chromosomes are combined: I took two non-problematic chromosomes, down-sampled one of them, and ran ChIPQC on the combined file. Some "cracks" do appear, though not as drastic as in the figures I linked before. I would expect the coverage plot to approach that of the intact chromosome as fewer and fewer reads from the down-sampled chromosome are included.
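The expectation above can be illustrated with a small base-R simulation. This is only a toy sketch: the reads are simulated, `coverage_hist()` is a hypothetical helper, and it assumes the combined histogram is simply the per-depth sum of the two chromosomes' histograms, which may well not be what ChIPQC does internally.

```r
set.seed(1)
chr_len <- 1e5     # toy chromosome length (assumption)
n_reads <- 20000   # reads on the intact chromosome (assumption)

# Hypothetical helper: coverage histogram for fixed-length reads placed
# uniformly at random on one chromosome. Element d + 1 holds the number of
# bases covered at depth d; depths above max_depth go into the last bin.
coverage_hist <- function(n_reads, chr_len, read_len = 50, max_depth = 30) {
  starts <- sample.int(chr_len - read_len + 1, n_reads, replace = TRUE)
  # Difference array: +1 at each read start, -1 just past each read end
  delta <- tabulate(starts, nbins = chr_len + 1) -
    tabulate(starts + read_len, nbins = chr_len + 1)
  depth <- cumsum(delta)[seq_len(chr_len)]
  tabulate(pmin(depth, max_depth) + 1, nbins = max_depth + 1)
}

intact    <- coverage_hist(n_reads, chr_len)
fractions <- c(1, 0.5, 0.1, 0.01, 0)   # share of reads kept on the second chromosome

# Assumed combination rule: add the two histograms depth by depth
combined <- sapply(fractions, function(f) {
  intact + coverage_hist(round(n_reads * f), chr_len)
})
dimnames(combined) <- list(depth = 0:30, fraction = fractions)

# log10 base pairs at a few depths, one column per down-sampling fraction
print(round(log10(combined[c("1", "5", "10", "20"), ] + 1), 2))
```

Under this summing assumption, with fraction 0 the combined histogram is identical to the intact one at every depth of 1 or more, and the intermediate columns show at which depths the down-sampled chromosome still contributes.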
If I find time, I will try to reproduce this with some public data.
Best,
Tapio