Question

Dispersion Factors Not Being Calculated For Every Gene

rohitghosh • 0

@eadfcf90

Last seen 3.6 years ago

United States

Hello,

This may be a trivial question, but I am trying to use DESeq2 to compute dispersion estimates with a reduced model (no covariates). My dataset has 15400 genes with non-zero expression, but after I run the following commands, "environment(dds@dispersionFunction)[["fit"]][["fitted.values"]]" has only 14116 values:


> dds <- estimateSizeFactors(dds)
> dds <- estimateDispersions(dds)

> sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggplot2_3.3.3               pasilla_1.14.0              DESeq2_1.26.0              
 [4] SummarizedExperiment_1.16.1 DelayedArray_0.12.3         BiocParallel_1.20.1        
 [7] matrixStats_0.58.0          Biobase_2.46.0              GenomicRanges_1.38.0       
[10] GenomeInfoDb_1.22.1         IRanges_2.20.2              S4Vectors_0.24.4           
[13] BiocGenerics_0.32.0         ImpulseDE2_1.10.0          

loaded via a namespace (and not attached):
 [1] bitops_1.0-7           bit64_4.0.5            RColorBrewer_1.1-2     tools_3.6.3           
 [5] backports_1.2.1        utf8_1.2.1             R6_2.5.0               rpart_4.1-15          
 [9] Hmisc_4.5-0            DBI_1.1.1              colorspace_2.0-1       nnet_7.3-13           
[13] GetoptLong_1.0.5       withr_2.4.2            tidyselect_1.1.1       gridExtra_2.3         
[17] bit_4.0.4              compiler_3.6.3         htmlTable_2.1.0        labeling_0.4.2        
[21] scales_1.1.1           checkmate_2.0.0        genefilter_1.68.0      stringr_1.4.0         
[25] digest_0.6.27          foreign_0.8-75         XVector_0.26.0         base64enc_0.1-3       
[29] jpeg_0.1-8.1           pkgconfig_2.0.3        htmltools_0.5.1.1      fastmap_1.1.0         
[33] htmlwidgets_1.5.3      rlang_0.4.11           GlobalOptions_0.1.2    rstudioapi_0.13       
[37] RSQLite_2.2.7          shape_1.4.5            generics_0.1.0         farver_2.1.0          
[41] dplyr_1.0.6            RCurl_1.98-1.3         magrittr_2.0.1         GenomeInfoDbData_1.2.2
[45] Formula_1.2-4          Matrix_1.2-18          Rcpp_1.0.6             munsell_0.5.0         
[49] fansi_0.4.2            lifecycle_1.0.0        stringi_1.5.3          zlibbioc_1.32.0       
[53] grid_3.6.3             blob_1.2.1             crayon_1.4.1           lattice_0.20-40       
[57] cowplot_1.1.1          splines_3.6.3          annotate_1.64.0        circlize_0.4.12       
[61] locfit_1.5-9.4         knitr_1.33             ComplexHeatmap_2.2.0   pillar_1.6.0          
[65] rjson_0.2.20           geneplotter_1.64.0     XML_3.99-0.3           glue_1.4.2            
[69] latticeExtra_0.6-29    data.table_1.14.0      png_0.1-7              vctrs_0.3.8           
[73] gtable_0.3.0           purrr_0.3.4            clue_0.3-59            cachem_1.0.4          
[77] xfun_0.22              xtable_1.8-4           survival_3.1-8         tibble_3.1.1          
[81] AnnotationDbi_1.48.0   memoise_2.0.0          cluster_2.1.0          ellipsis_0.3.2

For context, I am using these dispersion estimates as input for running ImpulseDE2 to identify time-dependent genes using time series RNA-seq data with only one sample per time point. I've emailed the people who wrote ImpulseDE2 and they told me this was possible by first running DESeq2 with a reduced model.

Are some genes excluded when the dispersion estimates are calculated? If so, is there a way for me to identify which genes these are?

Thanks!

DESeq2 • 479 views


score 0 · Answer 1 · 2021-05-19

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

See the vignette section "Access to all calculated values". There are cleaner ways to get at the internally calculated values than going through environment().

I think you can get a better idea by just looking at the mcols DataFrame of the dataset.
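A rough sketch of what that could look like, in case it helps. The simulated dataset from makeExampleDESeqDataSet() and the variable names (gene_info, disp_cols, no_disp) are stand-ins for illustration, not the poster's data. As far as I understand, the trend fit is run on only a subset of the genes, so the length of the fit's fitted.values need not match the number of rows, whereas mcols(dds) has one row per gene.

```r
library(DESeq2)

# Simulated counts stand in for the real data (assumption for illustration).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)
design(dds) <- ~ 1   # reduced model: intercept only, no covariates

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)

# Per-gene values are stored in the row metadata, one row per gene.
gene_info <- mcols(dds)
disp_cols <- c("baseMean", "allZero", "dispGeneEst", "dispFit", "dispersion")
head(gene_info[, disp_cols])

# dispersions() returns the final per-gene values, in row order.
final_disp <- dispersions(dds)

# Genes without a final dispersion estimate (NA), identified by row name.
no_disp <- rownames(dds)[is.na(final_disp)]
length(no_disp)
```

On the real object, the same calls after estimateDispersions(dds) should give a vector as long as nrow(dds), with names available from rownames(dds), which is probably easier to pass on to another tool than the internals of the fit object.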
