replaceOutliers in DESeq2 : need to recompute size Factors ?
1
0
Entering edit mode
@eleonoregravier-8219
Last seen 2.4 years ago
France

Hi BioC community,

I work on metagenomic data (16S sequencing) and I want to indentify microorganisms differentially abundant between two groups. With this aim, I use the size Factors computed with GMPR, a normalization method adapted to metagenomic data. Consequently,I can not use directly the DESeq function. In addition, I have pvalues set to NA due to outliers, that's why, I want to replace outliers (like in the DESeq process). I am wondering if, after replacing outliers, I have to recompute the size factors ? I also set the cooksCutoff argument to "FALSE" after replacing outliers to avoid getting again NA pvalues, is it correct ?

Briefly : is the code below OK for you?

Many thanks in advance for your help, Best, Eleonore


gmpr.size.factor <- GMPR(t(otu.tab)) 

dds <- DESeqDataSetFromMatrix(countData = otu.tab,colData = cli,design= ~ gp)
sizeFactors(dds) <-  gmpr.size.factor
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)

dds2<-replaceOutliers(dds)
gmpr.size.factor <- GMPR(t(counts(dds2,normalized=FALSE,replaced=FALSE))) 
sizeFactors(dds2) <-  gmpr.size.factor
dds2 <- estimateDispersions(dds2)
dds2 <- nbinomWaldTest(dds2)

res<-results(dds2, name="gp_A_vs_B",independentFiltering=FALSE,pAdjustMethod="none",cooksCutoff = "FALSE")

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
 [1] stats4    grid      parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] biomformat_1.12.0             readxl_1.3.1                  coin_1.3-1                   
 [4] survival_3.1-8                pscl_1.5.5                    DESeq2_1.24.0                
 [7] SummarizedExperiment_1.14.1   DelayedArray_0.10.0           BiocParallel_1.18.1          
[10] matrixStats_0.56.0            GenomicRanges_1.36.1          GenomeInfoDb_1.20.0          
[13] IRanges_2.18.3                S4Vectors_0.22.1              GMPR_0.1.3                   
[16] MASS_7.3-51.4                 lmerTest_3.1-1                lme4_1.1-21                  
[19] Matrix_1.2-17                 rmarkdown_2.1                 car_3.0-6                    
[22] carData_3.0-3                 emmeans_1.4.5                 EMA_1.4.7                    
[25] tidyr_1.0.2                   scales_1.1.0                  gridExtra_2.3                
[28] vegan_2.5-6                   lattice_0.20-38               permute_0.9-5                
[31] htmlwidgets_1.5.1             reshape2_1.4.3                plotly_4.9.2                 
[34] curatedMetagenomicData_1.14.1 ExperimentHub_1.10.0          dplyr_1.0.2                  
[37] Biobase_2.44.0                AnnotationHub_2.16.1          BiocFileCache_1.8.0          
[40] dbplyr_1.4.2                  BiocGenerics_0.30.0           ggplot2_3.2.1                
[43] ape_5.3                       phyloseq_1.28.0               knitr_1.28                   

loaded via a namespace (and not attached):
  [1] tidyselect_1.1.0              RSQLite_2.2.0                 AnnotationDbi_1.46.1         
  [4] FactoMineR_2.3                munsell_0.5.0                 codetools_0.2-16             
  [7] preprocessCore_1.46.0         withr_2.1.2                   colorspace_1.4-1             
 [10] highr_0.8                     rstudioapi_0.11               leaps_3.1                    
 [13] labeling_0.3                  GenomeInfoDbData_1.2.1        bit64_0.9-7                  
 [16] farver_2.0.1                  rhdf5_2.28.1                  coda_0.19-3                  
 [19] vctrs_0.3.4                   generics_0.1.0                TH.data_1.0-10               
 [22] xfun_0.12                     R6_2.4.1                      locfit_1.5-9.1               
 [25] bitops_1.0-6                  assertthat_0.2.1              promises_1.1.0               
 [28] multcomp_1.4-14               nnet_7.3-12                   gtable_0.3.0                 
 [31] affy_1.62.0                   sandwich_3.0-0                rlang_0.4.8                  
 [34] genefilter_1.66.0             scatterplot3d_0.3-41          splines_3.6.1                
 [37] lazyeval_0.2.2                acepack_1.4.1                 checkmate_2.0.0              
 [40] heatmap.plus_1.3              BiocManager_1.30.10           yaml_2.2.1                   
 [43] abind_1.4-5                   backports_1.1.5               httpuv_1.5.2                 
 [46] Hmisc_4.3-1                   tools_3.6.1                   affyio_1.54.0                
 [49] ellipsis_0.3.0                RColorBrewer_1.1-2            siggenes_1.58.0              
 [52] Rcpp_1.0.4.6                  plyr_1.8.5                    base64enc_0.1-3              
 [55] progress_1.2.2                zlibbioc_1.30.0               purrr_0.3.3                  
 [58] RCurl_1.98-1.1                prettyunits_1.1.1             rpart_4.1-15                 
 [61] zoo_1.8-7                     haven_2.2.0                   ggrepel_0.8.1                
 [64] cluster_2.1.0                 magrittr_1.5                  data.table_1.12.8            
 [67] openxlsx_4.1.4                mvtnorm_1.0-11                hms_0.5.3                    
 [70] mime_0.9                      evaluate_0.14                 xtable_1.8-4                 
 [73] XML_3.99-0.3                  rio_0.5.16                    jpeg_0.1-8.1                 
 [76] gcrma_2.56.0                  compiler_3.6.1                biomaRt_2.40.5               
 [79] tibble_3.0.4                  crayon_1.3.4                  minqa_1.2.4                  
 [82] htmltools_0.4.0               mgcv_1.8-28                   later_1.0.0                  
 [85] Formula_1.2-3                 libcoin_1.0-6                 geneplotter_1.62.0           
 [88] DBI_1.1.0                     rappdirs_0.3.1                boot_1.3-22                  
 [91] ade4_1.7-13                   igraph_1.2.4.2                forcats_0.5.0                
 [94] pkgconfig_2.0.3               flashClust_1.01-2             numDeriv_2016.8-1.1          
 [97] foreign_0.8-71                foreach_1.4.8                 annotate_1.62.0              
[100] multtest_2.40.0               XVector_0.24.0                estimability_1.3             
[103] scrime_1.3.5                  stringr_1.4.0                 digest_0.6.25                
[106] Biostrings_2.52.0             cellranger_1.1.0              htmlTable_1.13.3             
[109] curl_4.3                      modeltools_0.2-23             shiny_1.4.0                  
[112] nloptr_1.2.1                  GSA_1.03.1                    lifecycle_0.2.0              
[115] nlme_3.1-140                  jsonlite_1.6.1                Rhdf5lib_1.6.3               
[118] viridisLite_0.3.0             pillar_1.4.3                  fastmap_1.0.1                
[121] httr_1.4.1                    interactiveDisplayBase_1.22.0 glue_1.4.0                   
[124] zip_2.0.4                     png_0.1-7                     iterators_1.0.12             
[127] bit_1.1-15.2                  stringi_1.4.3                 blob_1.2.1                   
[130] latticeExtra_0.6-29           memoise_1.1.0
DESeq2 • 993 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

If you have many outliers, it may indicate that the NB is not an appropriate model for your data. You might consider other software designed for this type of data instead.

I can not use directly the DESeq function

You can use DESeq() with custom size factors. It will not replace the size factors.

I am wondering if, after replacing outliers, I have to recompute the size factors ?

In DESeq() we do not recompute size factors.

ADD COMMENT
0
Entering edit mode

Hi Michael and thaks for your answer. How can I use custom size factors on DESeq function please ? Thanks again, Eleonore

ADD REPLY
0
Entering edit mode

sizeFactors(dds) <- ...

ADD REPLY
0
Entering edit mode

Hi again Michael,

I now understand what you mean, sorry for the delay. So, the code is simply as below :


gmpr.size.factor <- GMPR(t(otu.tab)) 
dds <- DESeqDataSetFromMatrix(countData = otu.tab,colData = cli,design= ~ gp)
sizeFactors(dds) <-  gmpr.size.factor
dds<-DESeq(dds)
res<-results(dds, name="gp_A_vs_B")

Thanks a lot for your help !

ADD REPLY

Login before adding your answer.

Traffic: 650 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6