I am running derfinder
to ultimately find DER in a time course but despite the very good documentation I have some doubts about how set the cutoff to find expressed regions. In the 2017 NAR paper, it is stated that an average across all regions and samples is used as a cut-off (Finding ERs):
Here, we focus on a new approach based on the bump-hunting methodology for region level genomic analysis (33) (Figure 2). This approach first calculates ERs across the set of observed samples. For each base, the average, potentially library size-adjusted, coverage is calculated across all samples in the data set. This generates a vector of (normalized) mean level expression measurements across the genome. Then an average-coverage cutoff is applied to this mean coverage vector to identify bases that show minimum levels of expression. An expressed region is any contiguous set of bases that has expression above the mean expression cutoff.
However, in the manual I could not find any indication of this being used, and all examples use a hard cutoff for base coverage. I also looked into help page of filterData
but I am still in the dark.
Two questions:
- Is this filtering option using calculated mean coverage implemented? I could of course calculate that myself and feed it to
fullCoverage
orregionMatrix
but that could be re-inventing the wheel. - If it turns out I am misunderstanding the above paper paragraph, any tips on selecting a reasonable cut-off? I would usually create a distribution of random bases (sampling n times) and then use the median as a threshold.
Unrelated to the main points, the data are normalized bigwigs, generated from paired-end sequencing. Which read-length should bed used in regionMatrix
? Actual read length or something else? derfinder on bacteria PE strand specific RNA-seq is somewhat related but I am not interested in finding expressed regions for each individual strand (at least not at the moment).
Cheers.
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /usr/lib/atlas-base/libf77blas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] DESeq2_1.18.1 SummarizedExperiment_1.8.0
[3] DelayedArray_0.4.1 matrixStats_0.52.2
[5] Biobase_2.38.0 GenomicRanges_1.30.0
[7] GenomeInfoDb_1.14.0 IRanges_2.12.0
[9] S4Vectors_0.16.0 BiocGenerics_0.24.0
[11] derfinder_1.12.0 fortunes_1.5-4
loaded via a namespace (and not attached):
[1] RMySQL_0.10.13 bit64_0.9-7 splines_3.4.1
[4] foreach_1.4.3 GenomicFiles_1.14.0 Formula_1.2-2
[7] bumphunter_1.20.0 assertthat_0.2.0 latticeExtra_0.6-28
[10] doRNG_1.6.6 blob_1.1.0 BSgenome_1.46.0
[13] GenomeInfoDbData_0.99.1 Rsamtools_1.30.0 progress_1.1.2
[16] RSQLite_2.0 backports_1.1.1 lattice_0.20-35
[19] digest_0.6.12 RColorBrewer_1.1-2 XVector_0.18.0
[22] checkmate_1.8.5 qvalue_2.10.0 colorspace_1.3-2
[25] htmltools_0.3.6 Matrix_1.2-12 plyr_1.8.4
[28] devtools_1.13.4 XML_3.98-1.9 biomaRt_2.34.0
[31] genefilter_1.60.0 zlibbioc_1.24.0 xtable_1.8-2
[34] scales_0.5.0 BiocParallel_1.12.0 annotate_1.56.1
[37] htmlTable_1.9 tibble_1.3.4 pkgmaker_0.22
[40] ggplot2_2.2.1 withr_2.1.0 GenomicFeatures_1.30.0
[43] nnet_7.3-12 lazyeval_0.2.1 survival_2.41-3
[46] magrittr_1.5 memoise_1.1.0 foreign_0.8-69
[49] registry_0.3 tools_3.4.1 data.table_1.10.4-3
[52] prettyunits_1.0.2 stringr_1.2.0 locfit_1.5-9.1
[55] munsell_0.4.3 rngtools_1.2.4 cluster_2.0.6
[58] AnnotationDbi_1.40.0 Biostrings_2.46.0 compiler_3.4.1
[61] rlang_0.1.4 grid_3.4.1 RCurl_1.95-4.8
[64] rstudioapi_0.7 iterators_1.0.8 VariantAnnotation_1.24.1
[67] htmlwidgets_0.9 bitops_1.0-6 base64enc_0.1-3
[70] derfinderHelper_1.12.0 gtable_0.2.0 codetools_0.2-15
[73] DBI_0.7 reshape2_1.4.2 R6_2.2.2
[76] GenomicAlignments_1.14.0 gridExtra_2.3 knitr_1.17
[79] rtracklayer_1.38.0 bit_1.1-12 Hmisc_4.0-3
[82] stringi_1.1.6 Rcpp_0.12.13 geneplotter_1.56.0
[85] rpart_4.1-11 acepack_1.4.1