Hi All,
I am comparing bulkRNA-seq data for differential expression between three groups. When I keep all the samples together and run a contrast between two groups I get less numbers of DEGs compared to when I just take two groups of samples at a time and perform the DE analysis? Between them the common genes were less than 50%. Is this difference expected? What is the best way to perform this kind of analysis? Thank you very much.
>SampleData.selected$Disease_Status
[1] HI_MS HI_MS HI_MS LI_MS LI_MS LI_MS CTRL HI_MS HI_MS HI_MS HI_MS LI_MS LI_MS LI_MS CTRL CTRL CTRL CTRL
Levels: CTRL LI_MS HI_MS
#############taking all three groups at a time
dds <- DESeqDataSetFromMatrix(countData = Count.selected,
colData = SampleData.selected,
design = ~RNA_Batch+RIN+Sex+Disease_Status)
dds <- estimateSizeFactors(dds)
idx <- rowSums( counts(dds, normalized=TRUE) > 0) >= 5
dds <- dds[idx,]
dds <- DESeq(dds)
res <- results(dds, contrast=c("Disease_Status","HI_MS","CTRL"), independentFiltering = TRUE)
res <- res[order(res$padj), ]
summary(res)
out of 37443 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 48, 0.13%
LFC < 0 (down) : 12, 0.032%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
################## taking just two groups at a time
keep <- which(SampleData$Disease_Status %in% c("CTRL", "HI_MS"))
dds <- DESeqDataSetFromMatrix(countData = Count.selected[, keep],
colData = SampleData.selected[keep, ],
design = ~RNA_Batch+RIN+Sex+Disease_Status)
dds <- estimateSizeFactors(dds)
idx <- rowSums( counts(dds, normalized=TRUE) > 0) >= 5
dds <- dds[idx,]
dds <- DESeq(dds)
res <- results(dds, contrast=c("Disease_Status","HI_MS","CTRL"), independentFiltering = TRUE)
res <- res[order(res$padj), ]
summary(res)
out of 34811 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 62, 0.18%
LFC < 0 (down) : 25, 0.072%
outliers [1] : 0, 0%
low counts [2] : 9449, 27%
(mean count < 4)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
> intersect(rownames(res.two.groups)[1:87], rownames(res.all.groups.together)[1:60])
[1] "IGHG1" "IGKC" "A2M" "FCGBP" "CAPS" "LINC00551" "COL1A2" "RASSF9" "NES" "OLFML2A" "COL5A1" "CPAMD8"
[13] "ESAM" "SNX31" "CLIC6" "COL4A6" "C10orf105" "LEF1" "HPSE2" "RPS2P5" "NPIPB2"
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
[6] LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] psych_1.8.12 AnnotationHub_2.16.1 BiocFileCache_1.8.0 dbplyr_1.4.2 clusterProfiler_3.12.0
[6] pathfindR_1.3.0 pathview_1.24.0 org.Hs.eg.db_3.8.2 AnnotationDbi_1.46.0 knitr_1.22
[11] ComplexHeatmap_2.0.0 ggpubr_0.2 magrittr_1.5 ggrepel_0.8.1 reshape2_1.4.3
[16] dplyr_0.8.0.1 ggplot2_3.1.1 pheatmap_1.0.12 DESeq2_1.24.0 EDASeq_2.18.0
[21] ShortRead_1.42.0 GenomicAlignments_1.20.0 SummarizedExperiment_1.14.0 DelayedArray_0.10.0 matrixStats_0.54.0
[26] Rsamtools_2.0.0 GenomicRanges_1.36.0 GenomeInfoDb_1.20.0 Biostrings_2.52.0 XVector_0.24.0
[31] IRanges_2.18.0 S4Vectors_0.22.0 BiocParallel_1.18.0 Biobase_2.44.0 BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] R.utils_2.8.0 tidyselect_0.2.5 RSQLite_2.1.1 htmlwidgets_1.3
[5] DESeq_1.36.0 munsell_0.5.0 codetools_0.2-16 withr_2.1.2
[9] colorspace_1.4-1 GOSemSim_2.10.0 rstudioapi_0.10 DOSE_3.10.2
[13] KEGGgraph_1.44.0 urltools_1.7.3 GenomeInfoDbData_1.2.1 mnormt_1.5-5
[17] hwriter_1.3.2 polyclip_1.10-0 MCMCpack_1.4-4 bit64_0.9-7
[21] farver_1.1.0 coda_0.19-3 xfun_0.6 R6_2.4.0
[25] doParallel_1.0.14 clue_0.3-57 graphlayouts_0.5.0 locfit_1.5-9.1
[29] bitops_1.0-6 fgsea_1.10.1 gridGraphics_0.4-1 assertthat_0.2.1
[33] promises_1.0.1 scales_1.0.0 ggraph_2.0.0 nnet_7.3-12
[37] enrichplot_1.4.0 gtable_0.3.0 mcmc_0.9-6 tidygraph_1.1.2
[41] rlang_0.4.0 MatrixModels_0.4-1 genefilter_1.66.0 GlobalOptions_0.1.1
[45] splines_3.6.1 rtracklayer_1.44.0 lazyeval_0.2.2 acepack_1.4.1
[49] europepmc_0.3 checkmate_1.9.3 BiocManager_1.30.4 yaml_2.2.0
[53] GenomicFeatures_1.36.0 backports_1.1.4 httpuv_1.5.1 qvalue_2.16.0
[57] Hmisc_4.2-0 tools_3.6.1 ggplotify_0.0.4 RColorBrewer_1.1-2
[61] ggridges_0.5.1 Rcpp_1.0.1 plyr_1.8.4 base64enc_0.1-3
[65] progress_1.2.1 zlibbioc_1.30.0 purrr_0.3.2 RCurl_1.95-4.12
[69] prettyunits_1.0.2 rpart_4.1-15 GetoptLong_0.1.7 viridis_0.5.1
[73] cowplot_0.9.4 cluster_2.1.0 data.table_1.12.2 DO.db_2.9
[77] SparseM_1.77 circlize_0.4.8 triebeard_0.3.0 aroma.light_3.14.0
[81] mime_0.6 hms_0.4.2 evaluate_0.13 xtable_1.8-4
[85] XML_3.98-1.19 gridExtra_2.3 shape_1.4.4 compiler_3.6.1
[89] biomaRt_2.40.0 tibble_2.1.1 crayon_1.3.4 R.oo_1.22.0
[93] htmltools_0.3.6 later_0.8.0 Formula_1.2-3 tidyr_0.8.3
[97] geneplotter_1.62.0 DBI_1.0.0 tweenr_1.0.1 MASS_7.3-51.4
[101] MuSiC_0.1.1 rappdirs_0.3.1 Matrix_1.2-17 R.methodsS3_1.7.1
[105] igraph_1.2.4.1 pkgconfig_2.0.2 rvcheck_0.1.5 foreign_0.8-72
[109] xml2_1.2.0 foreach_1.4.4 annotate_1.62.0 stringr_1.4.0
[113] digest_0.6.18 graph_1.62.0 rmarkdown_1.14 fastmatch_1.1-0
[117] htmlTable_1.13.1 curl_3.3 shiny_1.3.2 quantreg_5.38
[121] rjson_0.2.20 nlme_3.1-141 jsonlite_1.6 viridisLite_0.3.0
[125] pillar_1.4.0 lattice_0.20-38 KEGGREST_1.24.0 httr_1.4.0
[129] survival_2.44-1.1 GO.db_3.8.2 interactiveDisplayBase_1.22.0 glue_1.3.1
[133] UpSetR_1.4.0 png_0.1-7 iterators_1.0.10 bit_1.1-14
[137] Rgraphviz_2.28.0 ggforce_0.3.1 stringi_1.4.3 nnls_1.4
[141] blob_1.1.1 latticeExtra_0.6-28 memoise_1.1.0
Hi Kevin,
Thank you very much for your reply. Yes, my hunch was that, this is the case (removing samples will change the normalisation). Thanks for further explaining the reason for exclusion.
Nurun