DESeq2 Multiple Conditions
umut.caglar

Hi everybody

I am fairly new to DESeq2, and I want to understand the contrast argument of the results function.

I have a dataset composed of multiple conditions. Instead of the two conditions in the example dataset (treated vs. untreated), I have many different conditions (52, if the number matters), with 3 samples for each condition. The data are raw RNA counts for around 5000 different RNAs.

Now I want to calculate log2 fold changes between two given conditions, say "A" and "B". I am doing this with:

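library(DESeq2)
# set C1 as the base (reference) level of the condition factor
rnaObject01$condition <- relevel(rnaObject01$condition, "C1")
# optional: DESeq() also estimates size factors if none are set yet
rnaObject01 <- estimateSizeFactors(rnaObject01)
rnaObject01 <- DESeq(rnaObject01)
# log2 fold change of A relative to B
resRnaObject01 <- results(rnaObject01, contrast = c("condition", "A", "B"))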

I notice that the log2 fold changes between these two conditions depend on my base level: the results change when I set a different base level with:

# same pipeline, but with C2 as the base (reference) level
rnaObject02$condition <- relevel(rnaObject02$condition, "C2")
rnaObject02 <- estimateSizeFactors(rnaObject02)
rnaObject02 <- DESeq(rnaObject02)
resRnaObject02 <- results(rnaObject02, contrast = c("condition", "A", "B"))


My question is: why does the log2 fold change between conditions A and B depend on whether the base level is C1 or C2? (I can give more details if needed.)

Thank you very much 

Best regards


Here is some information that might or might not be relevant:

  • Guide used: Differential analysis of count data – the DESeq2 package (December 16, 2014)
  • OS: Mac OS X Yosemite
  • Language: R
  • R session info:

    R version 3.1.2 (2014-10-31)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

    other attached packages:
     [1] dplyr_0.4.1               DESeq2_1.6.3              RcppArmadillo_0.4.600.4.0 Rcpp_0.11.4              
     [5] GenomicRanges_1.18.4      GenomeInfoDb_1.2.4        IRanges_2.0.1             S4Vectors_0.4.0          
     [9] Biobase_2.26.0            BiocGenerics_0.12.1      

    loaded via a namespace (and not attached):
     [1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.1 assertthat_0.1       base64enc_0.1-2     
     [6] BatchJobs_1.5        BBmisc_1.9           BiocParallel_1.0.3   brew_1.0-6           checkmate_1.5.1     
    [11] cluster_2.0.1        codetools_0.2-10     colorspace_1.2-4     DBI_0.3.1            digest_0.6.8        
    [16] fail_1.2             foreach_1.4.2        foreign_0.8-63       Formula_1.2-0        genefilter_1.48.1   
    [21] geneplotter_1.44.0   ggplot2_1.0.0        grid_3.1.2           gtable_0.1.2         Hmisc_3.15-0        
    [26] iterators_1.0.7      lattice_0.20-30      latticeExtra_0.6-26  locfit_1.5-9.1       magrittr_1.5        
    [31] MASS_7.3-39          munsell_0.4.2        nnet_7.3-9           plyr_1.8.1           proto_0.3-10        
    [36] RColorBrewer_1.1-2   reshape2_1.4.1       rpart_4.1-9          RSQLite_1.0.0        scales_0.2.4        
    [41] sendmailR_1.2-1      splines_3.1.2        stringr_0.6.2        survival_2.37-7      tools_3.1.2         
    [46] XML_3.98-1.1         xtable_1.7-4         XVector_0.6.0


Answer:


This is likely just numerical stability. The solution is a point in a 53-dimensional space (+1 for the intercept term, which is not moderated), so there is plenty of room for slightly different solutions to give nearly the same likelihood. Since v1.4, the prior is calculated so that it is the same after relevelling, and so are the coefficients. Numerically, however, they might not be identical.
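The "identical in theory, not bit-for-bit in practice" point can be seen in isolation with base R. This is a minimal sketch, assuming a plain Poisson GLM fitted with glm() as a stand-in for DESeq2's negative binomial fit (no dispersion, no prior), and using simulated counts for one hypothetical gene; the names lfc_A_vs_B, lfc_C1 and lfc_C2 are illustrative only.

```r
# Hypothetical single-gene example: 4 conditions x 3 replicates (simulated, not the poster's data)
set.seed(1)
condition <- factor(rep(c("A", "B", "C1", "C2"), each = 3))
counts <- rpois(12, lambda = rep(c(100, 50, 80, 120), each = 3))

# log2 fold change A vs B from an unpenalized Poisson GLM with the given base level
lfc_A_vs_B <- function(ref) {
  cond <- relevel(condition, ref = ref)
  fit <- glm(counts ~ cond, family = poisson())
  beta <- coef(fit)
  # with treatment coding, the base level has no coefficient (it is absorbed by the intercept)
  b <- function(lvl) if (lvl == ref) 0 else unname(beta[paste0("cond", lvl)])
  (b("A") - b("B")) / log(2)  # glm works on the natural-log scale; convert to log2
}

lfc_C1 <- lfc_A_vs_B("C1")
lfc_C2 <- lfc_A_vs_B("C2")

# Closed form for this one-factor model: log2 ratio of the A and B group means
lfc_exact <- log2(mean(counts[condition == "A"]) / mean(counts[condition == "B"]))

print(c(C1 = lfc_C1, C2 = lfc_C2, exact = lfc_exact))
print(lfc_C1 - lfc_C2)  # tiny, but not necessarily exactly zero
```

Both fits estimate the same contrast, so they should agree with the closed form up to the iterative (IRLS) convergence tolerance; the leftover difference is purely numerical.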

If you turn off the beta prior (betaPrior=FALSE), the results should be even more numerically stable after relevelling, although you lose the benefits of moderation on the LFC (log2 fold change).
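One way to check this on your own data is to run the same contrast with two different base levels and compare the per-gene log2 fold changes. A sketch, assuming DESeq2's makeExampleDESeqDataSet() simulated data in place of the real counts, a hypothetical 4-level condition factor, and a hypothetical helper fit_with_ref():

```r
library(DESeq2)

# Simulated example data (not the poster's): 1000 genes, 12 samples
set.seed(42)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
# Hypothetical 4-level condition factor, 3 replicates per level
dds$condition <- factor(rep(c("A", "B", "C1", "C2"), each = 3))

# Hypothetical helper: same design and contrast, differing only in the base level.
# betaPrior = FALSE gives unmoderated (maximum-likelihood) LFC estimates.
fit_with_ref <- function(dds, ref) {
  dds$condition <- relevel(dds$condition, ref = ref)
  dds <- DESeq(dds, betaPrior = FALSE)
  results(dds, contrast = c("condition", "A", "B"))
}

res_C1 <- fit_with_ref(dds, "C1")
res_C2 <- fit_with_ref(dds, "C2")

# Per-gene absolute difference in log2 fold change between the two fits
d <- abs(res_C1$log2FoldChange - res_C2$log2FoldChange)
summary(d)
```

Changing betaPrior = FALSE to betaPrior = TRUE in the helper lets you compare how large the relevelling differences are with LFC moderation switched on.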


