I have 14 samples divide in two groups ( 11 vs 3 samples).
After normal deseq2 work-flow I found this results:
out of 22492 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 20748, 92% LFC < 0 (down) : 100, 0.44% outliers [1] : 0, 0% low counts [2] : 21, 0.093% (mean count < 0) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results
Whi all the genes are differential expressed...i
> head(res)
log2 fold change (MLE): Intercept
Wald test p-value: Intercept
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000000003 223.53931 7.804385 0.29525254 26.43291 5.736351e-154
ENSG00000000419 106.27997 6.731726 0.10858462 61.99520 0.000000e+00
ENSG00000000457 127.99856 6.999984 0.09604285 72.88396 0.000000e+00
ENSG00000000460 59.39814 5.892346 0.14531761 40.54805 0.000000e+00
ENSG00000000938 20.36792 4.348227 0.35805062 12.14417 6.160687e-34
ENSG00000000971 3296.12479 11.686555 0.34137287 34.23399 7.549703e-257
padj
<numeric>
ENSG00000000003 1.339792e-153
ENSG00000000419 0.000000e+00
ENSG00000000457 0.000000e+00
ENSG00000000460 0.000000e+00
ENSG00000000938 9.749256e-34
ENSG00000000971 2.170346e-256
> sessionInfo() R version 3.3.2 (2016-10-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 16.04.1 LTS locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=it_IT.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=it_IT.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods [9] base other attached packages: [1] pheatmap_1.0.8 ggplot2_2.2.1 genefilter_1.54.2 [4] biomaRt_2.28.0 DESeq2_1.12.4 SummarizedExperiment_1.2.3 [7] Biobase_2.32.0 GenomicRanges_1.24.3 GenomeInfoDb_1.8.7 [10] IRanges_2.6.1 S4Vectors_0.10.3 BiocGenerics_0.18.0 loaded via a namespace (and not attached): [1] locfit_1.5-9.1 splines_3.3.2 lattice_0.20-34 colorspace_1.3-2 [5] htmltools_0.3.5 base64enc_0.1-3 survival_2.40-1 XML_3.98-1.5 [9] foreign_0.8-67 DBI_0.5-1 BiocParallel_1.6.6 RColorBrewer_1.1-2 [13] plyr_1.8.4 stringr_1.1.0 zlibbioc_1.18.0 munsell_0.4.3 [17] gtable_0.2.0 htmlwidgets_0.8 memoise_1.0.0 labeling_0.3 [21] latticeExtra_0.6-28 knitr_1.15.1 geneplotter_1.50.0 AnnotationDbi_1.34.4 [25] htmlTable_1.9 Rcpp_0.12.9 acepack_1.4.1 xtable_1.8-2 [29] backports_1.0.5 scales_0.4.1 checkmate_1.8.2 limma_3.28.21 [33] Hmisc_4.0-2 annotate_1.50.1 XVector_0.12.1 gridExtra_2.2.1 [37] digest_0.6.12 stringi_1.1.2 grid_3.3.2 tools_3.3.2 [41] bitops_1.0-6 magrittr_1.5 lazyeval_0.2.0 RCurl_1.95-4.8 [45] tibble_1.2 RSQLite_1.1-2 Formula_1.2-1 cluster_2.0.5 [49] Matrix_1.2-8 data.table_1.10.0 assertthat_0.1 rpart_4.1-10 [53] nnet_7.3-12
The sample are two type of tumors. I want to understand if there is a difference on expression profile between this groups.
The sample size are 53ML of unique reads. I have some sample with low number of reads as you can see on the picture http://imgur.com/a/zFrpn .
mean(depth[depth$condition == "A",]$Count)
[1] 4.86
>
> mean(depth[depth$condition == "B",]$Count)
[1] 9.87
summary(depth[depth$condition == "A",]$Count)
Min. 1st Qu. Median Mean 3rd Qu. Max.
7.40 7.70 8.00 9.87 11.10 14.20
> summary(depth[depth$condition != "B",]$Count)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.80 2.25 3.10 4.86 5.70 16.80