Hi,
I'm confused by how DiffBind uses the spike-ins for normalization. My understanding of the manual is that DiffBind counts the spike-in reads in the background bins and uses those counts as the library sizes for normalization. When I set spikein=FALSE, the $lib.sizes and the $background$binned$totals are equal, which is good:
> db_data_spikeinNorm2 <- dba.normalize(db_data, spikein = FALSE, background = TRUE, library = DBA_LIBSIZE_BACKGROUND, normalize = DBA_NORM_LIB)
> db_data_spikeinNorm2$norm$DESeq2$lib.sizes
[1] 7424321 7030471 8640826 7006223
> db_data_spikeinNorm2$norm$background$binned$totals
[1] 7424321 7030471 8640826 7006223
However, when I set spikein=TRUE, they are no longer equal:
> db_data_spikeinNorm3 <- dba.normalize(db_data, spikein = TRUE, background = TRUE, library = DBA_LIBSIZE_BACKGROUND, normalize = DBA_NORM_LIB)
> db_data_spikeinNorm3$norm$DESeq2$lib.sizes
[1] 7747122 7334460 9112179 7386926
> db_data_spikeinNorm3$norm$background$binned$totals
[1] 1970 1923 2638 2262
The $lib.sizes are still large numbers, close to the $lib.sizes from spikein=FALSE but not identical. Why are they no longer equal to the background totals? The binned totals make sense, because the spike-in control BAMs have only a small number of mapped reads. What is going on with the $lib.sizes values when spikein=TRUE?
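For reference, here is a quick base-R comparison of the numbers above. The vectors are copied by hand from the output, and nothing here calls DiffBind. The relative-factor step assumes that library-size normalization divides each total by the mean total across samples. That is my assumption about DBA_NORM_LIB, not something I have confirmed in the DiffBind source.

```r
# Totals copied from the dba.normalize() output above
lib_nospike <- c(7424321, 7030471, 8640826, 7006223)  # $lib.sizes, spikein=FALSE
lib_spike   <- c(7747122, 7334460, 9112179, 7386926)  # $lib.sizes, spikein=TRUE
spike_bins  <- c(1970, 1923, 2638, 2262)              # $background$binned$totals, spikein=TRUE

# How much did each library size change when spikein=TRUE?
round(lib_spike / lib_nospike, 4)

# Relative scale factors, ASSUMING library-size normalization divides
# each total by the mean total across samples (not confirmed DiffBind internals)
rel <- function(x) x / mean(x)
round(rbind(lib_spike = rel(lib_spike), spike_bins = rel(spike_bins)), 3)
```

If the spike-in totals were being used directly as library sizes, the two rows of relative factors would match. They do not. For example, the fourth sample is below average by library size but above average by spike-in reads.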
Thanks.
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] cowplot_1.1.1 pheatmap_1.0.12 ggrepel_0.9.3
[4] profileplyr_1.12.0 csaw_1.30.1 DiffBind_3.6.5
[7] SummarizedExperiment_1.26.1 Biobase_2.56.0 MatrixGenerics_1.8.1
[10] matrixStats_0.63.0 GenomicRanges_1.48.0 GenomeInfoDb_1.32.4
[13] IRanges_2.30.1 S4Vectors_0.34.0 BiocGenerics_0.42.0
[16] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.0
[19] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
[22] tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2
Perfect. Thanks!