Hello,
I am trying to calculate the Fragment Length and Relative Cross-Coverage for a sequencing experiment using a non-model microbe. DNA was sheared to ~300bp during library prep, used single-end library sequencing, and my peaks were called using Mosaics in R (fragment length 300, bin size 200). I used the information here and here to make a custom annotation list from my organisms .gff file and that appears to work.
txdb <- GenomicFeatures::makeTxDbFromGFF("20181113_hcah_GCF_000223905.1_ASM22390v1_genomic.gff", format = "gff")
txn <- GenomicFeatures::transcripts(txdb)
gene <- unlist(GenomicFeatures::cdsBy(txdb, "tx"))
pro500 <- GenomicFeatures::promoters(txdb, upstream = 500, downstream = 0)
pro250 <- GenomicFeatures::promoters(txdb, upstream = 250, downstream = 0)
hha <- list(version="",
gff.features = txn, #all features in gff
genes = gene, #specifically CDS
promoters250 = pro250, #250bp upstream of any feature
promoters500 = pro500 #500bp upstream of any feature
However, when I run either ChIPQCsample() or ChIPQC() my output contains only Tissue, Factor, Condition, Control, Replicate, Peaks. It is not returning fragment length calculations or cross-correlation scores. Is it because my data are not paired-end? For brevity, I've only included the error information for ChIPQCsample(). The output is the same for ChIPQC(), just over my 16 samples.
###test on single file
bam <- "data/bams/4774_C5_S74_L004_R1_001_trimmed_sorted.bam"
peeks <- "data/peakcalls/HCA_trmB_gc_0glu_A.bed"
peek <- rtracklayer::import(peeks, format = "bed")
tmp2 = ChIPQCsample(reads = bam, peaks = peek, annotation = hha)
The output:
I haven't been able to identify the source of the NA integer warning. It is also present if I do not include the annotation file.
the condition has length > 1 and only the first element will be usedBam file has 3 contigs Calculating coverage histogram for NC015948.1 Calculating SSD for NC015948.1 Calculating unique positions per strand for NC015948.1 Calculating shift for NC015948.1 300 / 300 removing1peaks within 400bp of chromosome edges Counting reads in features for NC015948.1 Signal over peaks for NC015948.1 done Calculating coverage Calculating Summits on NC015948.1 ..Calculating coverage histogram for NC015943.1 Calculating SSD for NC015943.1 Calculating unique positions per strand for NC015943.1 Calculating shift for NC015943.1 300 / 300 removing1peaks within 400bp of chromosome edges Counting reads in features for NC015943.1 Signal over peaks for NC015943.1 done Calculating coverage Calculating Summits on NC015943.1 ..Calculating coverage histogram for NC015944.1 Calculating SSD for NC015944.1 Calculating unique positions per strand for NC015944.1 Calculating shift for NC015944.1 300 / 300 removing1peaks within 400bp of chromosome edges Counting reads in features for NC015944.1 Signal over peaks for NC015944.1 done Calculating coverage Calculating Summits on NC_015944.1 .. [1] 1 [1] 1 [1] 1 NAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflowNAs produced by integer overflow.....
tmp2
ChIPQCsample Number of Mapped reads: 2167362 Number of Mapped reads passing MapQ filter: 2136201 Percentage Of Reads as Non-Duplicates (NRF): 100(0) Percentage Of Reads in Blacklisted Regions: NA SSD: 22.8916269433196 Fragment Length Cross-Coverage: Relative Cross-Coverage: Percentage Of Reads in GenomicFeature: Percentage Of Reads in Peaks: 9.28 Number of Peaks: 192
ProportionOfCounts Peaks 0.09284473
gff.features 0.89566946
genes 0.88093443
promoters250 0.28362640
promoters500 0.47769943
A separate error (perhaps a bug - others have similar issues that haven't been resolved, see below for links) is preventing the generation of a complete report. All plots are generated, but the final html is not compiled. The CC_score plot is empty.
QCmetrics(tmp2)
QCmetrics(tmp2) Error in names(res) <- c("Reads", "Map%", "Filt%", "Dup%", "ReadL", "FragL", : 'names' attribute [9] must be the same length as the vector [7]
ChIPQCreport(tmp2)
ChIPQCreport(tmp2) Saving 7 x 7 in image Error in names(res) <- c("Reads", "Map%", "Filt%", "Dup%", "ReadL", "FragL", : 'names' attribute [9] must be the same length as the vector [7]
Here is a list of others reporting the same issues: https://support.bioconductor.org/p/117577/ https://support.bioconductor.org/p/122402/ https://support.bioconductor.org/p/122763/ https://support.bioconductor.org/p/121941/ https://www.biostars.org/p/377937/
It's difficult to troubleshoot because I am not sure if these are independent issues or related. Any help would be appreciated. Some users have suggested reverting to an older version of ChIPQC, but I am having issues with dependencies for older versions of DiffBind. If that's the best solution, I'll make another post.
Session info:
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] ChIPQC_1.20.0 DiffBind_2.12.0 SummarizedExperiment_1.14.1 DelayedArray_0.10.0 BiocParallel_1.18.1
[6] matrixStats_0.54.0 Biobase_2.44.0 GenomicRanges_1.36.1 GenomeInfoDb_1.20.0 IRanges_2.18.2
[11] S4Vectors_0.22.0 BiocGenerics_0.30.0 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3
[16] purrr_0.3.2 readr_1.3.1 tidyr_0.8.3 tibble_2.1.3 ggplot2_3.2.1
[21] tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] amap_0.8-17 colorspace_1.4-1 rjson_0.2.20
[4] hwriter_1.3.2 XVector_0.24.0 rstudioapi_0.10
[7] ggrepel_0.8.1 bit64_0.9-7 AnnotationDbi_1.46.1
[10] lubridate_1.7.4 xml2_1.2.2 splines_3.6.1
[13] TxDb.Rnorvegicus.UCSC.rn4.ensGene_3.2.2 knitr_1.24 zeallot_0.1.0
[16] Nozzle.R1_1.1-1 jsonlite_1.6 Rsamtools_2.0.0
[19] broom_0.5.2 annotate_1.62.0 GO.db_3.8.2
[22] pheatmap_1.0.12 graph_1.62.0 TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2
[25] BiocManager_1.30.4 compiler_3.6.1 httr_1.4.1
[28] GOstats_2.50.0 backports_1.1.4 assertthat_0.2.1
[31] Matrix_1.2-17 lazyeval_0.2.2 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[34] limma_3.40.6 cli_1.1.0 prettyunits_1.0.2
[37] tools_3.6.1 gtable_0.3.0 glue_1.3.1
[40] GenomeInfoDbData_1.2.1 Category_2.50.0 reshape2_1.4.3
[43] systemPipeR_1.18.2 rappdirs_0.3.1 batchtools_0.9.11
[46] ShortRead_1.42.0 Rcpp_1.0.2 TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2
[49] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2 cellranger_1.1.0 vctrs_0.2.0
[52] Biostrings_2.52.0 gdata_2.18.0 nlme_3.1-140
[55] rtracklayer_1.44.4 TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.7 xfun_0.9
[58] rvest_0.3.4 gtools_3.8.1 XML_3.98-1.20
[61] edgeR_3.26.8 zlibbioc_1.30.0 scales_1.0.0
[64] BSgenome_1.52.0 VariantAnnotation_1.30.1 hms_0.5.1
[67] RBGL_1.60.0 RColorBrewer_1.1-2 yaml_2.2.0
[70] memoise_1.1.0 biomaRt_2.40.4 latticeExtra_0.6-28
[73] stringi_1.4.3 RSQLite_2.1.2 genefilter_1.66.0
[76] checkmate_1.9.4 GenomicFeatures_1.36.4 caTools_1.17.1.2
[79] chipseq_1.34.0 rlang_0.4.0 pkgconfig_2.0.2
[82] bitops_1.0-6 TxDb.Celegans.UCSC.ce6.ensGene_3.2.2 lattice_0.20-38
[85] labeling_0.3 GenomicAlignments_1.20.1 bit_1.1-14
[88] tidyselect_0.2.5 GSEABase_1.46.0 AnnotationForge_1.26.0
[91] plyr_1.8.4 magrittr_1.5 R6_2.4.0
[94] gplots_3.0.1.1 generics_0.0.2 base64url_1.4
[97] DBI_1.0.0 pillar_1.4.2 haven_2.1.1
[100] withr_2.1.2 survival_2.44-1.1 RCurl_1.95-4.12
[103] modelr_0.1.5 crayon_1.3.4 KernSmooth_2.23-15
[106] progress_1.2.2 locfit_1.5-9.1 grid_3.6.1
[109] readxl_1.3.1 data.table_1.12.2 blob_1.2.0
[112] Rgraphviz_2.28.0 digest_0.6.20 xtable_1.8-4
[115] brew_1.0-6 munsell_0.5.0
You are correct. The problem is caused by single end data which would not return fragment length. I fixed it in my fork project. But I hope the author will fix it in future.
https://github.com/shengqh/ChIPQC