DESeq2 76% of genes labeled 'low count'
1
0
Entering edit mode
hs.lansdell ▴ 20
@hslansdell-14246
Last seen 7.1 years ago

Hello!

I am running DESeq2 on a dichotomous outcome (77 no, 41 yes). My concern, is that I am getting an enormous number of counts labeled as low. I've been using the same count matrix for other runs, just pulling the samples I need for the process (which I get. changes the matrix depending on which samples are used), but I have never gotten anything like this. The data was originally 183 samples, parred down to 117 for this specific outcome. I have 20338 genes. 

Code: 

data<-read.csv("TotalRNA.csv", header=TRUE, row.names = 1, stringsAsFactors = FALSE)

cD<-read.csv("Book2.csv",header = TRUE,row.names = 1)

all(rownames(cD)==colnames(data))

dds<-DESeqDataSetFromMatrix(countData = data, colData = cD, design =~Sex+Condition)

dds<-DESeq(dds)
res<-results(dds)
plotMA(res,ylim=c(-2,2))
summary(res)
sum(res$padj < 0.1, na.rm=TRUE)

 

Results:

out of 20338 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 907, 4.5% 
LFC < 0 (down)   : 101, 0.5% 
outliers [1]     : 0, 0% 
low counts [2]   : 15378, 76% 
(mean count < 160)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

 

Thanks for any feedback!

 

 

deseq2 low count genes • 1.0k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 2 days ago
United States

Can you add the sessionInfo()

ADD COMMENT
0
Entering edit mode

R version 3.4.2 (2017-09-28)

Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C    LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.16.1              SummarizedExperiment_1.6.5 DelayedArray_0.2.7         matrixStats_0.52.2         Biobase_2.36.2            
 [6] GenomicRanges_1.28.6       GenomeInfoDb_1.12.3        IRanges_2.10.5             S4Vectors_0.14.7           BiocGenerics_0.22.1       

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1          Rcpp_0.12.14            lattice_0.20-35         tidyr_0.7.2             assertthat_0.2.0        digest_0.6.12          
 [7] R6_2.2.2                plyr_1.8.4              backports_1.1.1         acepack_1.4.1           RSQLite_2.0             ggplot2_2.2.1          
[13] zlibbioc_1.22.0         rlang_0.1.4             lazyeval_0.2.1          rstudioapi_0.7          data.table_1.10.4-3     annotate_1.54.0        
[19] blob_1.1.0              rpart_4.1-11            Matrix_1.2-12           checkmate_1.8.5         splines_3.4.2           BiocParallel_1.10.1    
[25] geneplotter_1.54.0      stringr_1.2.0           foreign_0.8-69          htmlwidgets_0.9   bit_1.1-12              RCurl_1.95-4.8         
[31] munsell_0.4.3           compiler_3.4.2          pkgconfig_2.0.1         base64enc_0.1-3         htmltools_0.3.6         nnet_7.3-12            
[37] tibble_1.3.4            gridExtra_2.3           htmlTable_1.11.0        GenomeInfoDbData_0.99.0 Hmisc_4.0-3             XML_3.98-1.9           
[43] dplyr_0.7.4             bitops_1.0-6            grid_3.4.2              DBI_0.7                 xtable_1.8-2            gtable_0.2.0           
[49] magrittr_1.5            scales_0.5.0            stringi_1.1.6           XVector_0.16.0          genefilter_1.58.1       bindrcpp_0.2           
[55] latticeExtra_0.6-28     Formula_1.2-2           RColorBrewer_1.1-2      tools_3.4.2             bit64_0.9-7             glue_1.2.0             
[61] purrr_0.2.4             survival_2.41-3         AnnotationDbi_1.38.2    colorspace_1.3-2        cluster_2.0.6           memoise_1.1.0          
[67] knitr_1.17              bindr_0.1   

ADD REPLY
0
Entering edit mode

Thanks for looking!

ADD REPLY
0
Entering edit mode

Ok, then, can you try this code from the vignette:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#independent-filtering-of-results

This will plot the reason that the high threshold was chosen. If the plot looks strange, then you may want to just pick a single reasonable threshold across all your result tables, and use independentFiltering=FALSE. Some code for that is in a recent thread:

C: contamination and DESeq2 performance

 

ADD REPLY

Login before adding your answer.

Traffic: 777 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6