I am trying to use metagenomeSeq to normalize my counts data from different OTUs for a virome-profiling project.
I have loaded my read count data into R, and I've gotten normalization with metagenomeSeq to work. However, the resulting outputs confuse me. Basically, if one sample has significantly less read count abundance than another, why would it be represented as having greater reads? I think something is going wrong with:
p = cumNormStatFast(mrexperiment)
The output always says default value of 0.5 is being used. But I am confused why this is so, because if I calculate the 75th quantile by hand, I get 6,913 for sample 1, 3 for sample 2, and 0 for sample 3. I also notice that p can't be set larger than 1. Am I misinterpreting what a 75th quantile means in this context? Thank you for any help, I no little to nothing about stats and am trying to learn R.
mrdata<-cumNormMat(mrexperiment, p = cumNormStatFast(mrexperiment), sl = 1000)
Default value being used.
# include your problematic code here with any corresponding output
# please also include the results of running the following in an R session
sessionInfo( )
> sessionInfo( )
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Big Sur 11.0.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Denver
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils
[6] datasets methods base
other attached packages:
[1] metagenomeSeq_1.48.1
[2] glmnet_4.1-8
[3] Matrix_1.7-2
[4] IsoformSwitchAnalyzeR_2.6.0
[5] pfamAnalyzeR_1.6.0
[6] dplyr_1.1.4
[7] stringr_1.5.1
[8] readr_2.1.5
[9] ggplot2_3.5.1
[10] sva_3.54.0
[11] genefilter_1.88.0
[12] mgcv_1.9-1
[13] nlme_3.1-167
[14] satuRn_1.14.0
[15] DEXSeq_1.52.0
[16] RColorBrewer_1.1-3
[17] AnnotationDbi_1.68.0
[18] DESeq2_1.46.0
[19] SummarizedExperiment_1.36.0
[20] GenomicRanges_1.58.0
[21] GenomeInfoDb_1.42.3
[22] IRanges_2.40.1
[23] S4Vectors_0.44.0
[24] MatrixGenerics_1.18.1
[25] matrixStats_1.5.0
[26] BiocParallel_1.40.0
[27] limma_3.62.2
[28] phyloseq_1.50.0
[29] Biobase_2.66.0
[30] BiocGenerics_0.52.0
loaded via a namespace (and not attached):
[1] splines_4.4.1 BiocIO_1.16.0
[3] bitops_1.0-9 filelock_1.0.3
[5] tibble_3.2.1 XML_3.99-0.18
[7] lifecycle_1.0.4 httr2_1.1.0
[9] pwalign_1.2.0 edgeR_4.4.2
[11] vroom_1.6.5 lattice_0.22-6
[13] ensembldb_2.30.0 MASS_7.3-64
[15] magrittr_2.0.3 yaml_2.3.10
[17] Wrench_1.24.0 pbapply_1.7-2
[19] DBI_1.2.3 ade4_1.7-22
[21] abind_1.4-8 zlibbioc_1.52.0
[23] purrr_1.0.4 AnnotationFilter_1.30.0
[25] RCurl_1.98-1.16 rappdirs_0.3.3
[27] GenomeInfoDbData_1.2.13 vegan_2.6-10
[29] annotate_1.84.0 permute_0.9-7
[31] codetools_0.2-20 DelayedArray_0.32.0
[33] xml2_1.3.6 tidyselect_1.2.1
[35] shape_1.4.6.1 futile.logger_1.4.3
[37] locfdr_1.1-8 farver_2.1.2
[39] UCSC.utils_1.2.0 BiocFileCache_2.14.0
[41] GenomicAlignments_1.42.0 jsonlite_1.8.9
[43] multtest_2.62.0 survival_3.8-3
[45] iterators_1.0.14 foreach_1.5.2
[47] tools_4.4.1 progress_1.2.3
[49] Rcpp_1.0.14 glue_1.8.0
[51] gridExtra_2.3 SparseArray_1.6.1
[53] withr_3.0.2 formatR_1.14
[55] BiocManager_1.30.25 fastmap_1.2.0
[57] boot_1.3-31 rhdf5filters_1.18.0
[59] caTools_1.18.3 digest_0.6.37
[61] R6_2.5.1 colorspace_2.1-1
[63] gtools_3.9.5 biomaRt_2.62.1
[65] RSQLite_2.3.9 tidyr_1.3.1
[67] generics_0.1.3 data.table_1.16.4
[69] tximeta_1.24.0 rtracklayer_1.66.0
[71] prettyunits_1.2.0 httr_1.4.7
[73] S4Arrays_1.6.0 pkgconfig_2.0.3
[75] gtable_0.3.6 blob_1.2.4
[77] hwriter_1.3.2.1 XVector_0.46.0
[79] geneplotter_1.84.0 biomformat_1.34.0
[81] ProtGenerics_1.38.0 scales_1.3.0
[83] png_0.1-8 lambda.r_1.2.4
[85] rstudioapi_0.17.1 tzdb_0.4.0
[87] reshape2_1.4.4 rjson_0.2.23
[89] curl_6.2.0 cachem_1.1.0
[91] rhdf5_2.50.2 KernSmooth_2.23-26
[93] BiocVersion_3.20.0 parallel_4.4.1
[95] restfulr_0.0.15 pillar_1.10.1
[97] grid_4.4.1 vctrs_0.6.5
[99] gplots_3.2.0 dbplyr_2.5.0
[101] xtable_1.8-4 cluster_2.1.8
[103] tximport_1.34.0 VennDiagram_1.7.3
[105] GenomicFeatures_1.58.0 cli_3.6.3
[107] locfit_1.5-9.11 compiler_4.4.1
[109] futile.options_1.0.1 Rsamtools_2.22.0
[111] rlang_1.1.5 crayon_1.5.3
[113] labeling_0.4.3 plyr_1.8.9
[115] stringi_1.8.4 txdbmaker_1.2.1
[117] munsell_0.5.1 Biostrings_2.74.1
[119] lazyeval_0.2.2 BSgenome_1.74.0
[121] hms_1.1.3 bit64_4.6.0-1
[123] Rhdf5lib_1.28.0 KEGGREST_1.46.0
[125] statmod_1.5.0 AnnotationHub_3.14.0
[127] igraph_2.1.4 memoise_2.0.1
[129] bit_4.5.0.1 ape_5.8-1