Question

DESeq2 76% of genes labeled 'low count'

0

Entering edit mode

hs.lansdell ▴ 20

@hslansdell-14246

Last seen 7.5 years ago

Hello!

I am running DESeq2 on a dichotomous outcome (77 no, 41 yes). My concern, is that I am getting an enormous number of counts labeled as low. I've been using the same count matrix for other runs, just pulling the samples I need for the process (which I get. changes the matrix depending on which samples are used), but I have never gotten anything like this. The data was originally 183 samples, parred down to 117 for this specific outcome. I have 20338 genes.

Code:

data<-read.csv("TotalRNA.csv", header=TRUE, row.names = 1, stringsAsFactors = FALSE)

cD<-read.csv("Book2.csv",header = TRUE,row.names = 1)

all(rownames(cD)==colnames(data))

dds<-DESeqDataSetFromMatrix(countData = data, colData = cD, design =~Sex+Condition)

dds<-DESeq(dds) res<-results(dds) plotMA(res,ylim=c(-2,2)) summary(res) sum(res$padj < 0.1, na.rm=TRUE)

Results:

out of 20338 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 907, 4.5% 
LFC < 0 (down)   : 101, 0.5% 
outliers [1]     : 0, 0% 
low counts [2]   : 15378, 76% 
(mean count < 160)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Thanks for any feedback!

deseq2 low count genes • 1.1k views

ADD COMMENT • link updated 7.4 years ago by Michael Love 43k • written 7.4 years ago by hs.lansdell ▴ 20

score 0 · Answer 1 · 2017-12-12

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 5 days ago

United States

Can you add the sessionInfo()?

ADD COMMENT • link 7.4 years ago Michael Love 43k

0

Entering edit mode

R version 3.4.2 (2017-09-28)

Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:

[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] DESeq2_1.16.1 SummarizedExperiment_1.6.5 DelayedArray_0.2.7 matrixStats_0.52.2 Biobase_2.36.2 [6] GenomicRanges_1.28.6 GenomeInfoDb_1.12.3 IRanges_2.10.5 S4Vectors_0.14.7 BiocGenerics_0.22.1

loaded via a namespace (and not attached): [1] locfit_1.5-9.1 Rcpp_0.12.14 lattice_0.20-35 tidyr_0.7.2 assertthat_0.2.0 digest_0.6.12 [7] R6_2.2.2 plyr_1.8.4 backports_1.1.1 acepack_1.4.1 RSQLite_2.0 ggplot2_2.2.1 [13] zlibbioc_1.22.0 rlang_0.1.4 lazyeval_0.2.1 rstudioapi_0.7 data.table_1.10.4-3 annotate_1.54.0 [19] blob_1.1.0 rpart_4.1-11 Matrix_1.2-12 checkmate_1.8.5 splines_3.4.2 BiocParallel_1.10.1 [25] geneplotter_1.54.0 stringr_1.2.0 foreign_0.8-69 htmlwidgets_0.9 bit_1.1-12 RCurl_1.95-4.8 [31] munsell_0.4.3 compiler_3.4.2 pkgconfig_2.0.1 base64enc_0.1-3 htmltools_0.3.6 nnet_7.3-12 [37] tibble_1.3.4 gridExtra_2.3 htmlTable_1.11.0 GenomeInfoDbData_0.99.0 Hmisc_4.0-3 XML_3.98-1.9 [43] dplyr_0.7.4 bitops_1.0-6 grid_3.4.2 DBI_0.7 xtable_1.8-2 gtable_0.2.0 [49] magrittr_1.5 scales_0.5.0 stringi_1.1.6 XVector_0.16.0 genefilter_1.58.1 bindrcpp_0.2 [55] latticeExtra_0.6-28 Formula_1.2-2 RColorBrewer_1.1-2 tools_3.4.2 bit64_0.9-7 glue_1.2.0 [61] purrr_0.2.4 survival_2.41-3 AnnotationDbi_1.38.2 colorspace_1.3-2 cluster_2.0.6 memoise_1.1.0 [67] knitr_1.17 bindr_0.1

ADD REPLY • link 7.4 years ago hs.lansdell ▴ 20

0

Entering edit mode

Thanks for looking!

ADD REPLY • link 7.4 years ago hs.lansdell ▴ 20

0

Entering edit mode

Ok, then, can you try this code from the vignette:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#independent-filtering-of-results

This will plot the reason that the high threshold was chosen. If the plot looks strange, then you may want to just pick a single reasonable threshold across all your result tables, and use independentFiltering=FALSE. Some code for that is in a recent thread:

C: contamination and DESeq2 performance

ADD REPLY • link 7.4 years ago Michael Love 43k