Question

"Error in nbinomLRT(... less than one degree of freedom"... Isolating main and interaction effects with the LRT?

0

Entering edit mode

laurenjfrazee • 0

@laurenjfrazee-23747

Last seen 4.8 years ago

Temple University, Philadelphia, PA, USA

Hello,

I have 18 samples of RNAseq read counts (46,127 genes) from a time series experiment. My factors/variables are

species, with 2 levels: C.rat & C.lin stage, with 3 levels: 1, 2, & 3

So I have 3 replicates of each combination of the above factors. In making my "ddsFullCountTable", I used these as colData:

                                species       stage
C. linearis Stage 1 plant 1     C.linearis     1
C. linearis Stage 1 plant 2     C.linearis     1
C. linearis Stage 1 plant 3     C.linearis     1
C. linearis Stage 2 plant 1     C.linearis     2
C. linearis Stage 2 plant 2     C.linearis     2
C. linearis Stage 2 plant 3     C.linearis     2
C. linearis Stage 3 plant 1     C.linearis     3
C. linearis Stage 3 plant 2     C.linearis     3
C. linearis Stage 3 plant 3     C.linearis     3
C. rattanii Stage 1 plant 1     C.rattanii     1
C. rattanii Stage 1 plant 2     C.rattanii     1
C. rattanii Stage 1 plant 3     C.rattanii     1
C. rattanii Stage 2 plant 1     C.rattanii     2
C. rattanii Stage 2 plant 2     C.rattanii     2
C. rattanii Stage 2 plant 3     C.rattanii     2
C. rattanii Stage 3 plant 1     C.rattanii     3
C. rattanii Stage 3 plant 2     C.rattanii     3
C. rattanii Stage 3 plant 3     C.rattanii     3

and

design = ~ species + stage + species:stage

My goal is to find the genes with significant species by stage (interaction) effect, which I've done, but then to also use the LRT to find the genes with a stage main effect. Ultimately I'll need to then find the subset of genes that do have the stage main effect but do not have the interaction effect. I've run the following code:

dds <- DESeq(ddsFullCountTable)
ddsTimeSeries <- DESeq(dds, test="LRT", reduced = ~ species + stage)
resultsTimeSeries <- results(ddsTimeSeries, alpha = 0.01)

and this has worked fine to give me ~855 genes with an overall (i.e., ANOVA-style "omnibus") interaction effect.

[I do have one question about this, though: the vignette says: "In this case, using the likelihood ratio test with a reduced model which does not contain the interaction term will test whether the condition induces a change in gene expression at any time point after the base-level time point." (Q1) Does this mean that this test will pull out genes with interactions between time points 1-2, 2-3, and 1-3? In other words, does it treat the time factor as ordinal? or categorical?]

then I do the following to try and repeat the same for isolating genes with an overall (i.e., ANOVA-style "omnibus") stage main effect:

ddsTimeSeries <- DESeq(dds, test="LRT", reduced = ~ species + species:stage)

but I get this error

using pre-existing size factors
estimating dispersions
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Error in nbinomLRT(object, full = full, reduced = reduced, quiet = quiet,  : 
  less than one degree of freedom, perhaps full and reduced models are not in the correct order

I've then tried rearranging the terms of the full model before dropping stage, but the same error appears.

So my second question is (Q2) Why is dropping stage from the LRT not a possibility, or is it?

To get around it, I've run the same ddsFullCountTable data with design= ~stage and reduced = ~1, but it seems strange and incorrect to use 2 different full models to get to this answer about the same data, so... (Q3) is this work-around incorrect or wrong from a modeling perspective and/or in DESeq2?

To get around this issue another way, I've considered running the same ddsFullCountTable data with the original full model (design= ~species + stage + species:stage), and then doing both

ddsTimeSeries <- DESeq(dds, test="LRT", reduced = ~ species + stage)

and

ddsTimeSeries <- DESeq(dds, test="LRT", reduced = ~ species) #which does work!

and then using the intersection of those 2 resulting sets of genes to parse out the genes with only a stage main effect, so... (Q4) again, is this work-around incorrect or wrong from a modeling perspective or in DESeq2?

Lastly, I have considered using the LRT results for the interaction effect (~855 genes), then using the union of gene sets resulting from the following independent contrasts to find all the genes with a main effect for stage:

design(dds) <- ~ species + stage + species:stage
dds <- DESeq(dds, test = "Wald")
results(dds, contrast=c("stage","2","1"))
results(dds, contrast=c("stage","3","2"))
results(dds, contrast=c("stage","3","1"))

Then I would be removing any of the 855 genes that have an interaction effect from this list, in order to achieve my goal (i.e., to find the subset of genes that do have the stage main effect but do not have the interaction effect), which again, seems incorrect to be doing across tests. so, (Q5) is this work-around incorrect or wrong from a modeling perspective or in DESeq2?

Finally, (Q6) is there a better option i have not considered? if not, which of the above methods are the most statistically sound?

Thank you so so so much to whomever can provide insight!

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] apeglm_1.8.0                Rgraphviz_2.30.0            topGO_2.38.1               
 [4] SparseM_1.78                GO.db_3.10.0                AnnotationDbi_1.48.0       
 [7] graph_1.64.0                RColorBrewer_1.1-2          pheatmap_1.0.12            
[10] VennDiagram_1.6.20          futile.logger_1.4.3         factoextra_1.0.7.999       
[13] ggplot2_3.2.1               devtools_2.3.0              usethis_1.6.1              
[16] FactoMineR_2.3              DESeq2_1.26.0               SummarizedExperiment_1.16.1
[19] DelayedArray_0.12.2         BiocParallel_1.20.1         matrixStats_0.55.0         
[22] Biobase_2.46.0              GenomicRanges_1.38.0        GenomeInfoDb_1.22.0        
[25] IRanges_2.20.2              S4Vectors_0.24.3            BiocGenerics_0.32.0        

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1       ggsignif_0.6.0         ellipsis_0.3.0         rprojroot_1.3-2       
  [5] htmlTable_1.13.3       XVector_0.26.0         base64enc_0.1-3        fs_1.3.1              
  [9] rstudioapi_0.11        ggpubr_0.2.5           farver_2.0.3           remotes_2.1.1         
 [13] ggrepel_0.8.1          bit64_0.9-7            mvtnorm_1.1-1          fansi_0.4.1           
 [17] splines_3.6.2          leaps_3.1              geneplotter_1.64.0     knitr_1.28            
 [21] pkgload_1.0.2          Formula_1.2-3          annotate_1.64.0        cluster_2.1.0         
 [25] png_0.1-7              BiocManager_1.30.10    compiler_3.6.2         backports_1.1.5       
 [29] assertthat_0.2.1       Matrix_1.2-18          lazyeval_0.2.2         cli_2.0.2             
 [33] formatR_1.7            acepack_1.4.1          htmltools_0.4.0        prettyunits_1.1.1     
 [37] tools_3.6.2            coda_0.19-3            gtable_0.3.0           glue_1.4.1            
 [41] GenomeInfoDbData_1.2.2 dplyr_0.8.4            Rcpp_1.0.3             bbmle_1.0.23.1        
 [45] vctrs_0.3.0            xfun_0.12              stringr_1.4.0          ps_1.3.2              
 [49] testthat_2.3.2         lifecycle_0.2.0        XML_3.99-0.3           zlibbioc_1.32.0       
 [53] MASS_7.3-51.5          scales_1.1.0           lambda.r_1.2.4         curl_4.3              
 [57] memoise_1.1.0          gridExtra_2.3          emdbook_1.3.12         bdsmatrix_1.3-4       
 [61] rpart_4.1-15           latticeExtra_0.6-29    stringi_1.4.6          RSQLite_2.2.0         
 [65] genefilter_1.68.0      desc_1.2.0             checkmate_2.0.0        pkgbuild_1.0.6        
 [69] rlang_0.4.6            pkgconfig_2.0.3        bitops_1.0-6           lattice_0.20-38       
 [73] purrr_0.3.3            htmlwidgets_1.5.1      labeling_0.3           bit_1.1-15.2          
 [77] tidyselect_1.0.0       processx_3.4.2         plyr_1.8.5             magrittr_1.5          
 [81] R6_2.4.1               Hmisc_4.3-1            DBI_1.1.0              pillar_1.4.3          
 [85] foreign_0.8-75         withr_2.1.2            survival_3.1-8         scatterplot3d_0.3-41  
 [89] RCurl_1.98-1.1         nnet_7.3-12            tibble_3.0.1           crayon_1.3.4          
 [93] futile.options_1.0.1   jpeg_0.1-8.1           locfit_1.5-9.1         data.table_1.12.8     
 [97] blob_1.2.1             callr_3.4.3            digest_0.6.25          flashClust_1.01-2     
[101] xtable_1.8-4           numDeriv_2016.8-1.1    munsell_0.5.0          sessioninfo_1.1.1     
>

deseq2 main effects interaction effects R lrt • 1.7k views

ADD COMMENT • link updated 4.8 years ago by Michael Love 43k • written 4.8 years ago by laurenjfrazee • 0

score 1 · Answer 1 · 2020-06-25

1) DESeq2 only works with categorical, we can't accept ordinal factors.

2) You are trying to drop a main effect but include its interaction, which we don't allow for in DESeq2. See this discussion for more details.

3) It's fine to have a different set of full and reduced for testing various effects. This is common in ANOVA.

Re: 4-6 and whether your analysis plan is statistically sound: for specific recommendations on your statistical analysis plan, you'll have to collaborate with a local statistician. Unfortunately I don't have time to provide anything other than help with software usage on the support site.