I am trying to run a DESeq2 analysis on Salmon output. When I import my data using tximeta, I get completely different results depending on whether I use the summarizeToGene function. With summarizeToGene I get 172 DEGs, while without it I get 438. Why is there such a large difference, and which result should I trust for downstream analysis?

hippo12APP <- tximeta(coldata12APP)

##Filtering and Adjusting
hippo12APP <- addExons(hippo12APP)
hippo12APP <- summarizeToGene(hippo12APP)

ddsTxi12 <- DESeqDataSet(hippo12APP, design = ~ diet)

dds12 <- DESeq(ddsTxi12)

  row               baseMean        log2FoldChange        lfcSE             stat             pvalue         
 Length:172         Min.   :    5.96   Min.   :-6.2495   Min.   :0.1065   Min.   :-5.9385   Min.   :2.900e-09  
 Class :character   1st Qu.:   61.23   1st Qu.:-1.7460   1st Qu.:0.1518   1st Qu.:-3.8379   1st Qu.:2.666e-05  
 Mode  :character   Median :  281.04   Median :-0.5681   Median :0.2407   Median :-3.2730   Median :1.570e-04  
                    Mean   : 1445.57   Mean   :-0.5843   Mean   :0.3281   Mean   :-0.1817   Mean   :3.280e-04  
                    3rd Qu.: 1071.05   3rd Qu.: 0.5838   3rd Qu.:0.4575   3rd Qu.: 3.7496   3rd Qu.:5.804e-04  
                    Max.   :74531.16   Max.   : 2.8181   Max.   :1.6449   Max.   : 5.9377   Max.   :1.085e-03  
      padj              SYMBOL         
 Min.   :2.121e-05   Length:172        
 1st Qu.:9.582e-03   Class :character  
 Median :2.859e-02   Mode  :character  
 Mean   :4.042e-02                     
 3rd Qu.:7.098e-02                     
 Max.   :9.992e-02                     

##Running code above without summarizeToGene
     row               baseMean        log2FoldChange         lfcSE              stat             pvalue         
 Length:438         Min.   :    5.60   Min.   :-22.9352   Min.   :0.03517   Min.   :-10.018   Min.   :0.000e+00  
 Class :character   1st Qu.:   20.20   1st Qu.: -5.3601   1st Qu.:0.14304   1st Qu.: -3.719   1st Qu.:9.065e-07  
 Mode  :character   Median :   82.43   Median :  0.3354   Median :0.65931   Median :  3.389   Median :9.234e-05  
                    Mean   : 1264.74   Mean   : -1.0568   Mean   :1.05338   Mean   :  0.574   Mean   :2.564e-04  
                    3rd Qu.:  920.82   3rd Qu.:  1.1407   3rd Qu.:1.66442   3rd Qu.:  4.073   3rd Qu.:4.601e-04  
                    Max.   :74163.86   Max.   : 20.5094   Max.   :3.91060   Max.   :  9.387   Max.   :1.078e-03  
      padj              SYMBOL         
 Min.   :0.0000000   Length:438        
 1st Qu.:0.0003319   Class :character  
 Median :0.0169791   Mode  :character  
 Mean   :0.0297324                     
 3rd Qu.:0.0564928                     
 Max.   :0.0993225                     

R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tximeta_1.16.1             GO.db_3.16.0                AnnotationDbi_1.60.2       
 [5] lubridate_1.9.2             forcats_1.0.0               stringr_1.5.0               dplyr_1.1.1                
 [9] purrr_1.0.1                 tidyr_1.3.0                 tibble_3.2.1                ggplot2_3.4.2              
[13] tidyverse_2.0.0             readr_2.1.4                 DESeq2_1.38.3               SummarizedExperiment_1.28.0
[17] Biobase_2.58.0              MatrixGenerics_1.10.0       matrixStats_0.63.0          GenomicRanges_1.50.2       
[21] GenomeInfoDb_1.34.9         IRanges_2.32.0              S4Vectors_0.36.2            BiocGenerics_0.44.0        
[25] tximport_1.26.1            

loaded via a namespace (and not attached):
 [1] colorspace_2.1-0              rjson_0.2.21                  ellipsis_0.3.2               
 [4] XVector_0.38.0                rstudioapi_0.14               DT_0.27                      
 [7] bit64_4.0.5                   interactiveDisplayBase_1.36.0 fansi_1.0.4                  
[10] xml2_1.3.3                    codetools_0.2-19              cachem_1.0.7                 
[13] geneplotter_1.76.0            jsonlite_1.8.4                Rsamtools_2.14.0             
[16] annotate_1.76.0               dbplyr_2.3.2                  png_0.1-8                    
[19] shiny_1.7.4                   BiocManager_1.30.20           compiler_4.2.3               
[22] httr_1.4.5                    Matrix_1.5-3                  fastmap_1.1.1                
[25] lazyeval_0.2.2                cli_3.6.1                     later_1.3.0                  
[28] htmltools_0.5.5               prettyunits_1.1.1             tools_4.2.3                  
[31] gtable_0.3.3                  glue_1.6.2                    GenomeInfoDbData_1.2.9       
[34] rappdirs_0.3.3                Rcpp_1.0.10                   vctrs_0.6.1                  
[37] Biostrings_2.66.0             rtracklayer_1.58.0            timechange_0.2.0             
[40] mime_0.12                     lifecycle_1.0.3               restfulr_0.0.15              
[43] ensembldb_2.22.0              XML_3.99-0.14                 AnnotationHub_3.6.0          
[46] zlibbioc_1.44.0               scales_1.2.1                  vroom_1.6.1                  
[49] hms_1.1.3                     promises_1.2.0.1              ProtGenerics_1.30.0          
[52] parallel_4.2.3                AnnotationFilter_1.22.0       RColorBrewer_1.1-3           
[55] yaml_2.3.7                    curl_5.0.0                    memoise_2.0.1                
[58] biomaRt_2.54.1                stringi_1.7.12                RSQLite_2.3.1                
[61] BiocVersion_3.16.0            BiocIO_1.8.0                  GenomicFeatures_1.50.4       
[64] filelock_1.0.2                BiocParallel_1.32.6           rlang_1.1.0                  
[67] pkgconfig_2.0.3               bitops_1.0-7                  lattice_0.20-45              
[70] GenomicAlignments_1.34.1      htmlwidgets_1.6.2             bit_4.0.5                    
[73] tidyselect_1.2.0              magrittr_2.0.3                R6_2.5.1                     
[76] generics_0.1.3                DelayedArray_0.23.2           DBI_1.1.3                    
[79] pillar_1.9.0                  withr_2.5.0                   KEGGREST_1.38.0              
[82] RCurl_1.98-1.12               crayon_1.5.2                  utf8_1.2.3                   
[85] BiocFileCache_2.6.1           tzdb_0.3.0                    progress_1.2.2               
[88] locfit_1.5-9.7                grid_4.2.3                    blob_1.2.4                   
[91] digest_0.6.31                 xtable_1.8-4                  httpuv_1.6.9                 
[94] munsell_0.5.0
When you import Salmon output with tximeta, you get transcript-level quantifications. summarizeToGene() aggregates these from the transcript level to the gene level. You are most likely seeing different results because the first chunk of code is a differential gene expression analysis, whereas the second (without summarizeToGene()) is a differential transcript expression analysis. The 172 hits are genes and the 438 hits are transcripts, so the two numbers are not directly comparable.
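
As a rough illustration of what gene-level summarization does to the counts, here is a toy sketch in base R. The matrix `tx_counts` and the mapping `tx2gene` are made up for this example, and `rowsum()` only stands in for the count aggregation; summarizeToGene() also deals with abundances and transcript lengths, which this sketch ignores.

```r
# Toy transcript-level counts: 5 transcripts x 4 samples (made-up numbers)
tx_counts <- matrix(
  c(10, 12, 11,  9,
    40, 38, 42, 41,
     5,  6, 30, 28,
    20, 22, 21, 19,
     0,  1,  0,  2),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("tx", 1:5), paste0("sample", 1:4))
)

# Hypothetical transcript-to-gene mapping (names are transcript IDs)
tx2gene <- c(tx1 = "geneA", tx2 = "geneA", tx3 = "geneB",
             tx4 = "geneB", tx5 = "geneC")

# Sum the counts of all transcripts that belong to the same gene
gene_counts <- rowsum(tx_counts, group = tx2gene[rownames(tx_counts)])

nrow(tx_counts)    # number of features tested at the transcript level
nrow(gene_counts)  # number of features tested at the gene level
```

The two objects have different numbers of rows, so a DESeq2 run on each one tests a different set of features and applies multiple-testing correction over a different number of tests.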

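Agreed. The two analyses ask different biological questions, and both are valid. You can have differential transcript expression (DTE) within a gene without differential gene expression (DGE), which is what happens when isoform usage shifts (differential transcript usage, DTU), or the other way around. The DGE question concerns the total RNA output of a gene, that is, all of its isoforms summed together.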
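
A small made-up example of DTE without DGE, again in base R. The counts, sample names, and the naive fold-change calculation are assumptions for illustration only; this is not how DESeq2 estimates log2 fold changes.

```r
# Made-up counts for one gene with two isoforms, 2 control and 2 treated samples
iso_counts <- matrix(
  c(80, 82, 20, 18,   # isoform 1: high in control, low in treated
    20, 18, 80, 82),  # isoform 2: low in control, high in treated
  nrow = 2, byrow = TRUE,
  dimnames = list(c("iso1", "iso2"), c("ctrl1", "ctrl2", "trt1", "trt2"))
)
condition <- c("ctrl", "ctrl", "trt", "trt")
ctrl <- condition == "ctrl"
trt  <- condition == "trt"

# Transcript level: naive log2 fold change of mean counts per isoform
tx_lfc <- log2(rowMeans(iso_counts[, trt]) / rowMeans(iso_counts[, ctrl]))

# Gene level: sum the isoforms per sample first, then compare conditions
gene_total <- colSums(iso_counts)
gene_lfc <- log2(mean(gene_total[trt]) / mean(gene_total[ctrl]))

tx_lfc    # both isoforms change strongly, in opposite directions
gene_lfc  # the summed gene total is identical across samples
```

Here both isoforms would be candidates in a transcript-level analysis, while the gene would not show up in a gene-level analysis because the isoform switch cancels out in the sum.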


