Question

Strange results from Deseq2

0

Entering edit mode

swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 12 hours ago

San Diego

I'm getting strange results from DESeq2 results. I'm comparing 3 samples to another group of 3 samples, which I know is less than ideal

counts<-read.csv("raw_180803.csv", header = TRUE, row.names = 1, check.names = FALSE) anno <- read.csv("anno_180803.csv", header = TRUE, row.names = 1)

anno$donor <- factor(anno$donor)

dds <- DESeqDataSetFromMatrix(countData = counts, colData = anno, design =~ 1)

dds <- dds[rowSums(counts(dds)) > 1,]

design(dds) <- ~ group + donor

dds <-DESeq(dds)

res<-results(dds, contrast = c("group", "hIgG1", "unstim_none"))

summary(res)

out of 20305 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 1, 0.0049% LFC < 0 (down) : 0, 0% outliers [1] : 0, 0% low counts [2] : 0, 0%

(mean count < 0)

res<-results(dds, contrast = c("group", "hIgG4", "unstim_none"))

summary(res)

out of 20305 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 26, 0.13% LFC < 0 (down) : 10, 0.049% outliers [1] : 0, 0% low counts [2] : 12991, 64% (mean count < 124)

But the counts for IgG4 and IgG1 just don't look that different across the board. Why is the mean count threshold set so differently between the two contrasts? (The mean of normalized counts for the hIgG1 samples ranges between 650-790).

Here are normalized counts for one gene, but every comparison I do, except for one, shows NA for the adjusted p-value

59.04707

79.40045

37.52253

63.28349

82.1004

42.47123

53.48357

68.60829

36.32574

71.87155

73.46171

46.15779

56.68832

60.98544

39.50018

I thought that NA's meant either outliers or too few reads, but can that be the case with these normalized counts?

Should I stop worrying and just believe my data? What should I look at to figure out why these two contrasts are so different from each other?

EDIT:

I can force the software to filter based on normalized means at the level I choose (I chose 180 here) instead of letting the software pick for me

res<-results(dds, contrast = c("group", "hIgG1", "unstim_none"), filter = dds@rowRanges@elementMetadata$baseMean > 180)

summary(res)

out of 20305 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 2, 0.0098% LFC < 0 (down) : 1, 0.0049% outliers [1] : 0, 0% low counts [2] : 13766, 68% (mean count < 0) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results

I guess the issue is that there are almost no significant differences in this particular comparison, (while there are big donor differences), while for the other comparisons, dropping large number of genes which have decent expression is improving the stats for the best ones.

Is the default independent filtering in the results command maybe not doing that great a job, and I should be setting it myself? It seems plausible that there might be good DE genes with normalized means of about 100 or lower, but the default filtering settings are getting rid of these in most comparisons.

> sessionInfo() R version 3.5.0 (2018-04-23) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS release 6.9 (Final)

Matrix products: default BLAS: /.automount/tools/GNU/R/3.5.0/lib64/R/lib/libRblas.so LAPACK: /.automount/tools/GNU/R/3.5.0/lib64/R/lib/libRlapack.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] org.Hs.eg.db_3.6.0 AnnotationDbi_1.42.1 [3] DESeq2_1.20.0 SummarizedExperiment_1.10.1 [5] DelayedArray_0.6.2 BiocParallel_1.14.2 [7] matrixStats_0.54.0 Biobase_2.40.0 [9] GenomicRanges_1.32.6 GenomeInfoDb_1.16.0 [11] IRanges_2.14.10 S4Vectors_0.18.3 [13] BiocGenerics_0.26.0

loaded via a namespace (and not attached): [1] bit64_0.9-7 splines_3.5.0 Formula_1.2-3 [4] assertthat_0.2.0 latticeExtra_0.6-28 blob_1.1.1 [7] GenomeInfoDbData_1.1.0 pillar_1.3.0 RSQLite_2.1.1 [10] backports_1.1.2 lattice_0.20-35 glue_1.3.0 [13] digest_0.6.15 RColorBrewer_1.1-2 XVector_0.20.0 [16] checkmate_1.8.5 colorspace_1.3-2 htmltools_0.3.6 [19] Matrix_1.2-14 plyr_1.8.4 XML_3.98-1.13 [22] pkgconfig_2.0.1 genefilter_1.62.0 zlibbioc_1.26.0 [25] purrr_0.2.5 xtable_1.8-2 scales_0.5.0 [28] htmlTable_1.12 tibble_1.4.2 annotate_1.58.0 [31] ggplot2_3.0.0 nnet_7.3-12 lazyeval_0.2.1 [34] survival_2.42-6 magrittr_1.5 crayon_1.3.4 [37] memoise_1.1.0 foreign_0.8-71 tools_3.5.0 [40] data.table_1.11.4 stringr_1.3.1 locfit_1.5-9.1 [43] munsell_0.5.0 cluster_2.0.7-1 bindrcpp_0.2.2 [46] compiler_3.5.0 rlang_0.2.1 grid_3.5.0 [49] RCurl_1.95-4.11 rstudioapi_0.7 htmlwidgets_1.2 [52] bitops_1.0-6 base64enc_0.1-3 gtable_0.2.0 [55] DBI_1.0.0 R6_2.2.2 gridExtra_2.3 [58] knitr_1.20 dplyr_0.7.6 bit_1.1-14 [61] bindr_0.1.1 Hmisc_4.1-1 stringi_1.2.4 [64] Rcpp_0.12.18 geneplotter_1.58.0 rpart_4.1-13 [67] acepack_1.4.1 tidyselect_0.2.4

deseq2 • 754 views

ADD COMMENT • link 6.5 years ago swbarnes2 ★ 1.4k

score 0 · Answer 1 · 2018-08-14

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

The threshold that is chosen by independent filtering depends on whether greater power can it be achieved by increasing the threshold. It needs to be the case that there are alternative hypotheses with sufficient power at higher counts, and that filtering out lower count rows helps to make these differences detectable at the rejection level ‘alpha’ that you specified. So the filter threshold can go higher with one contrast because there are alternative hypotheses which are aided by filtering, where is the filter stays at 0 for another contrast because there are no alternative hypotheses and only nulls. You can turn the automatic filtering off with independentFiltering=FALSE and you can try alternative methods to filtering such as IHW using the code in the vignette.

ADD COMMENT • link 6.5 years ago Michael Love 43k

0

Entering edit mode

Thanks, I tried some things, and added a follow-up in my question

ADD REPLY • link 6.5 years ago swbarnes2 ★ 1.4k

0

Entering edit mode

There may be genes that you’re missing with low count, and you could turn off independent filtering to try to capture these, but you’ll end up with less differentially expresses genes in total (because that’s how the independent filter works, see the diagram in vignette).