Question

(very) conflicting results with DESeq2 between versions

0

Entering edit mode

birgittenilsson • 0

@birgittenilsson-13204

Last seen 7.9 years ago

Hi all,

I have been using the Kallisto -> Sleuth pipeline on my RNAseq data consisting of a control and two treatments (in triplicates). From that I got 8 DETs (q-value < 0.05) for treatment 1 and 12 DETs for treatment 2. The highest beta ("equivalent" to log fold change) were around 6 and the lowest -4.

In order to validate my results from Sleuth, I wanted to use DESeq2 to see if I got similar results (+ make "prettier"/customizable figures). Tximport were used to load the count data from Kallisto into the DESeq2 pipeline. The first runs with DESeq2 were pretty consistent with the Sleuth results (considering Sleuth is more conservative than DESeq2) with 19 DETs for treatment1 and 34 for treatment 2. The highest/lowest log fold change was around 6.6 and -3.

BUT then... I updated by version of R and DESeq2 and suddenly I got completely different results. The exact same R script (with updated packages) have then 100 DETs for treatment1 and 143 for treatment2 with highest/lowest log fold changes around 21 and -25.5.

This is really weird - have others of you had similar experiences? What causes this? and how do I "fix" it (especially the log fold changes seems way off in the second analysis)

Thanks!

The code:

#Libraries
library(tximport)
library(rhdf5)
library(DESeq2)

#base directory:
dir <- "~/Documents/RNASeq/kallisto/data/"

#Metadata
samples <- read.table(file.path(dir, "metadata.txt"), header=TRUE, stringsAsFactors = FALSE)

#Kallisto-data
files <- file.path(dir,"results", samples$path, "abundance.tsv")
names(files)<- paste0(samples$Sample)
txi <- tximport(files, type = "kallisto", txOut = TRUE)

#DESeq2
sampleTable <- data.frame(condition = factor(rep(c("Treatment1", "Treatment2", "Control"), each = 3)))
rownames(sampleTable) <- colnames(txi$counts)
dds <- DESeqDataSetFromTximport(txi, sampleTable, ~condition)

#Pre-filtering
dds <- dds[rowSums(counts(dds))>1, ]

#Control = first level
dds$condition <- relevel(dds$condition, ref="Control")

#Run DESeq2
 dds <-DESeq(dds)

#Call and inspect the results table
res <- results(dds)

# results for each treatment vs. control
res_treat1 <- results(dds, contrast=c("condition", "Treatment1", "Control"))
res_treat2 <- results(dds, contrast=c("condition", "Treatment2", "Control"))

# Call information on the result object on which variables and tests were used:
mcols(res_treat1, use.names=TRUE)
mcols(res_treat2, use.names=TRUE)

# Call number of significant results with BH-adjusted p-value < 0.05
sum(res_treat1$padj < 0.05, na.rm=TRUE)
sum(res_treat2$padj < 0.05, na.rm=TRUE)

#Subset the results and sort them by the log2 fold change down-regulation:
 resSig_treat1<- res[which(res_treat1$padj < 0.05), ]
 head(resSig_treat1[ order(resSig_treat1$log2FoldChange ), ])

 resSig_treat2<- res[which(res_treat2$padj < 0.05), ]
 head(resSig_treat2[ order(resSig_treat2$log2FoldChange ), ])

 #And sort by up-regulations:
 tail(resSig_treat1[order(resSig_treat1$log2FoldChange ), ] )
 tail(resSig_treat2[order(resSig_treat2$log2FoldChange ), ] )

Session-info (for the updated run):

R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] da_DK.UTF-8/da_DK.UTF-8/da_DK.UTF-8/C/da_DK.UTF-8/da_DK.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.17.2              SummarizedExperiment_1.7.4 DelayedArray_0.3.9         matrixStats_0.52.2        
 [5] Biobase_2.37.2             GenomicRanges_1.29.4       GenomeInfoDb_1.13.4        IRanges_2.11.3            
 [9] S4Vectors_0.15.3           BiocGenerics_0.23.0        rhdf5_2.21.1               tximport_1.5.0            

loaded via a namespace (and not attached):
 [1] genefilter_1.59.0       locfit_1.5-9.1          splines_3.4.0           lattice_0.20-35        
 [5] colorspace_1.3-2        htmltools_0.3.6         base64enc_0.1-3         survival_2.41-3        
 [9] XML_3.98-1.7            rlang_0.1.1             DBI_0.6-1               foreign_0.8-68         
[13] BiocParallel_1.11.2     RColorBrewer_1.1-2      GenomeInfoDbData_0.99.1 plyr_1.8.4             
[17] stringr_1.2.0           zlibbioc_1.23.0         munsell_0.4.3           gtable_0.2.0           
[21] htmlwidgets_0.8         memoise_1.1.0           latticeExtra_0.6-28     knitr_1.16             
[25] geneplotter_1.55.0      AnnotationDbi_1.39.1    htmlTable_1.9           Rcpp_0.12.11           
[29] acepack_1.4.1           xtable_1.8-2            scales_0.4.1            backports_1.1.0        
[33] checkmate_1.8.2         Hmisc_4.0-3             annotate_1.55.0         XVector_0.17.0         
[37] gridExtra_2.2.1         ggplot2_2.2.1           digest_0.6.12           stringi_1.1.5          
[41] grid_3.4.0              tools_3.4.0             bitops_1.0-6            magrittr_1.5           
[45] RSQLite_1.1-2           lazyeval_0.2.0          RCurl_1.95-4.8          tibble_1.3.3           
[49] Formula_1.2-1           cluster_2.0.6           Matrix_1.2-10           data.table_1.10.4      
[53] rpart_4.1-11            nnet_7.3-12             compiler_3.4.0

deseq2 tximport rnaseq kallisto R • 2.3k views

ADD COMMENT • link updated 7.9 years ago by Michael Love 43k • written 7.9 years ago by birgittenilsson • 0

score 0 · Answer 1 · 2017-06-07

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

The log fold change difference are documented in a couple of places (the NEWS and vignette section on changes). There is a new function lfcShrink() in the package for shrinking fold changes. Check the workflow and vignette, it's just a small change to the code to obtain shrunken log fold changes.

For pvalues and FDR sets, you shouldn't expect the same sets between versions. It's very easy with improvements to methods which result in very small changes to parameters to have <100 genes pushed above or below a threshold.

ADD COMMENT • link 7.9 years ago Michael Love 43k

0

Entering edit mode

Here's some more discussion on why you might see the gene sets change when you update DESeq2

A: Deseq2 versions discrepancy

ADD REPLY • link 7.9 years ago Michael Love 43k

0

Entering edit mode

Thank you! I was not aware of the lfcShrink() function - that made the trick :-)

ADD REPLY • link 7.9 years ago birgittenilsson • 0