Hello
I am trying to run a DESeq2 analysis on Salmon output. When I import my data using tximeta, I get completely different results depending on whether I use the summarizeToGene function. With summarizeToGene I get 172 DEGs, while without it I get 438. Why is there such a large difference, and which result should I trust for downstream analysis?
hippo12APP <- tximeta(coldata12APP)
##Filtering and Adjusting
hippo12APP <- addExons(hippo12APP)
hippo12APP <- summarizeToGene(hippo12APP)
ddsTxi12 <- DESeqDataSet(hippo12APP, design = ~ diet)
dds12 <- DESeq(ddsTxi12)
row baseMean log2FoldChange lfcSE stat pvalue
Length:172 Min. : 5.96 Min. :-6.2495 Min. :0.1065 Min. :-5.9385 Min. :2.900e-09
Class :character 1st Qu.: 61.23 1st Qu.:-1.7460 1st Qu.:0.1518 1st Qu.:-3.8379 1st Qu.:2.666e-05
Mode :character Median : 281.04 Median :-0.5681 Median :0.2407 Median :-3.2730 Median :1.570e-04
Mean : 1445.57 Mean :-0.5843 Mean :0.3281 Mean :-0.1817 Mean :3.280e-04
3rd Qu.: 1071.05 3rd Qu.: 0.5838 3rd Qu.:0.4575 3rd Qu.: 3.7496 3rd Qu.:5.804e-04
Max. :74531.16 Max. : 2.8181 Max. :1.6449 Max. : 5.9377 Max. :1.085e-03
padj SYMBOL
Min. :2.121e-05 Length:172
1st Qu.:9.582e-03 Class :character
Median :2.859e-02 Mode :character
Mean :4.042e-02
3rd Qu.:7.098e-02
Max. :9.992e-02
##Running code above without summarizeToGene
summary(resSigAPP12)
row baseMean log2FoldChange lfcSE stat pvalue
Length:438 Min. : 5.60 Min. :-22.9352 Min. :0.03517 Min. :-10.018 Min. :0.000e+00
Class :character 1st Qu.: 20.20 1st Qu.: -5.3601 1st Qu.:0.14304 1st Qu.: -3.719 1st Qu.:9.065e-07
Mode :character Median : 82.43 Median : 0.3354 Median :0.65931 Median : 3.389 Median :9.234e-05
Mean : 1264.74 Mean : -1.0568 Mean :1.05338 Mean : 0.574 Mean :2.564e-04
3rd Qu.: 920.82 3rd Qu.: 1.1407 3rd Qu.:1.66442 3rd Qu.: 4.073 3rd Qu.:4.601e-04
Max. :74163.86 Max. : 20.5094 Max. :3.91060 Max. : 9.387 Max. :1.078e-03
padj SYMBOL
Min. :0.0000000 Length:438
1st Qu.:0.0003319 Class :character
Median :0.0169791 Mode :character
Mean :0.0297324
3rd Qu.:0.0564928
Max. :0.0993225
sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] tximeta_1.16.1 org.Mm.eg.db_3.16.0 GO.db_3.16.0 AnnotationDbi_1.60.2
[5] lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.1
[9] purrr_1.0.1 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.2
[13] tidyverse_2.0.0 readr_2.1.4 DESeq2_1.38.3 SummarizedExperiment_1.28.0
[17] Biobase_2.58.0 MatrixGenerics_1.10.0 matrixStats_0.63.0 GenomicRanges_1.50.2
[21] GenomeInfoDb_1.34.9 IRanges_2.32.0 S4Vectors_0.36.2 BiocGenerics_0.44.0
[25] tximport_1.26.1
loaded via a namespace (and not attached):
[1] colorspace_2.1-0 rjson_0.2.21 ellipsis_0.3.2
[4] XVector_0.38.0 rstudioapi_0.14 DT_0.27
[7] bit64_4.0.5 interactiveDisplayBase_1.36.0 fansi_1.0.4
[10] xml2_1.3.3 codetools_0.2-19 cachem_1.0.7
[13] geneplotter_1.76.0 jsonlite_1.8.4 Rsamtools_2.14.0
[16] annotate_1.76.0 dbplyr_2.3.2 png_0.1-8
[19] shiny_1.7.4 BiocManager_1.30.20 compiler_4.2.3
[22] httr_1.4.5 Matrix_1.5-3 fastmap_1.1.1
[25] lazyeval_0.2.2 cli_3.6.1 later_1.3.0
[28] htmltools_0.5.5 prettyunits_1.1.1 tools_4.2.3
[31] gtable_0.3.3 glue_1.6.2 GenomeInfoDbData_1.2.9
[34] rappdirs_0.3.3 Rcpp_1.0.10 vctrs_0.6.1
[37] Biostrings_2.66.0 rtracklayer_1.58.0 timechange_0.2.0
[40] mime_0.12 lifecycle_1.0.3 restfulr_0.0.15
[43] ensembldb_2.22.0 XML_3.99-0.14 AnnotationHub_3.6.0
[46] zlibbioc_1.44.0 scales_1.2.1 vroom_1.6.1
[49] hms_1.1.3 promises_1.2.0.1 ProtGenerics_1.30.0
[52] parallel_4.2.3 AnnotationFilter_1.22.0 RColorBrewer_1.1-3
[55] yaml_2.3.7 curl_5.0.0 memoise_2.0.1
[58] biomaRt_2.54.1 stringi_1.7.12 RSQLite_2.3.1
[61] BiocVersion_3.16.0 BiocIO_1.8.0 GenomicFeatures_1.50.4
[64] filelock_1.0.2 BiocParallel_1.32.6 rlang_1.1.0
[67] pkgconfig_2.0.3 bitops_1.0-7 lattice_0.20-45
[70] GenomicAlignments_1.34.1 htmlwidgets_1.6.2 bit_4.0.5
[73] tidyselect_1.2.0 magrittr_2.0.3 R6_2.5.1
[76] generics_0.1.3 DelayedArray_0.23.2 DBI_1.1.3
[79] pillar_1.9.0 withr_2.5.0 KEGGREST_1.38.0
[82] RCurl_1.98-1.12 crayon_1.5.2 utf8_1.2.3
[85] BiocFileCache_2.6.1 tzdb_0.3.0 progress_1.2.2
[88] locfit_1.5-9.7 grid_4.2.3 blob_1.2.4
[91] digest_0.6.31 xtable_1.8-4 httpuv_1.6.9
[94] munsell_0.5.0
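Agreed. The biological meaning is different, and both are valid questions. Without summarizeToGene the rows being tested are transcripts rather than genes, so the 438 hits are differentially expressed transcripts (DTE), not DEGs. You can have DTE within a gene without any differential gene expression (DGE), which is the differential transcript usage (DTU) case, or the other way around. The DGE question concerns the total RNA output of the gene, that is, all of its isoforms summed together.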
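To make the DTE-without-DGE case concrete, here is a minimal base R sketch with made-up counts for one hypothetical gene with two isoforms. The transcript names, sample names, and count values are all invented for illustration, and summing counts per gene is only a rough stand-in for what summarizeToGene does (it also handles abundances and transcript lengths). No DESeq2 testing is done here, only fold changes.

```r
# Toy counts for one hypothetical gene with two isoforms (values are made up).
# Columns are samples: two control and two treated.
tx_counts <- matrix(
  c(100, 110,  20,  25,   # txA: high in control, low in treated
     20,  15, 100, 105),  # txB: low in control, high in treated
  nrow = 2, byrow = TRUE,
  dimnames = list(c("txA", "txB"), c("ctrl1", "ctrl2", "trt1", "trt2"))
)
is_ctrl <- c(TRUE, TRUE, FALSE, FALSE)

# Gene-level counts: sum over the isoforms of the gene in each sample.
gene_counts <- colSums(tx_counts)

# Simple log2 fold changes (treated vs control) from group means.
tx_lfc   <- log2(rowMeans(tx_counts[, !is_ctrl]) / rowMeans(tx_counts[, is_ctrl]))
gene_lfc <- log2(mean(gene_counts[!is_ctrl]) / mean(gene_counts[is_ctrl]))

print(round(tx_lfc, 2))    # large changes in opposite directions per isoform
print(round(gene_lfc, 2))  # close to zero at the gene level
```

In this toy case the two isoforms switch, so each transcript changes strongly while the summed gene count barely moves. A transcript-level analysis would flag both transcripts, and a gene-level analysis would flag nothing, which is one way the two result lists can differ in size.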