Question

Understanding edgeR differential expression results

0

Entering edit mode

Teresa • 0

@a420c4c0

Last seen 2.1 years ago

United States

Hi all,

I'm doing metatranscriptomics differential expression analysis in R studio. I have 1-3 replicates per site and am trying to use edgeR. I want to compare Sites J and F, both when water is flowing (On) and not flowing (Off) with sites A-D, which are constant. However, I'm somewhat confused by my output and how to interpret this biologically or if my method of analysis was just incorrect... (Rounded numbers used below). How can OnvConstant and OffvConstant be similar to each other when BothvConstant, which include both On and Off samples in "Both," be different? I tried a few On/Off Site v Constant Site constrasts, like FvA, FvB, JvA. JvB, and these result in a large number of genes labeled as downregulated... While constant vs constant, like AvB, AvC, result in about a quarter up, half notsig, a quarter down. FvJ is mostly NotSig, with a few up and down.

fit.glm <- glmFit(d, design.mat)
glm.contrasts<-makeContrasts(
  OnvSB=(J.On+F.On)-(A.Constant+B.Constant+C.Constant+D.Constant),
  OffvSB=(J.Off+F.Off)-(A.Constant+B.Constant+C.Constant+D.Constant),
  BothVConstant=(J.On+F.On+J.Off+F.Off)-(A.Constant+B.Constant+C.Constant+D.Constant),
  levels=design.mat)

OnVConstant.glm <- glmLRT(fit.glm, contrast=glm.contrasts[,"OnVConstant.glmt"])
OnVConstant.glmDE <- decideTestsDGE(OnVConstant.glm, adjust.method = "fdr", p.value=0.1)
summary(OnVConstant.glmDE)
#output 
Down 5
NotSig 300
Up 4600

#
OffVConstant.glm <- glmLRT(fit.glm, contrast=glm.contrasts[,"OffVConstant"])
OffVConstant.glmDE <- decideTestsDGE(OffVConstant.glm, adjust.method = "fdr", p.value=0.1)
summary(OffVConstant.glmDE)
#output:
Down 10
NotSig 695
Up 4200

#
BothVConstant.glm <- glmLRT(fit.glm, contrast=glm.contrasts[,"BothVConstant"])
BothVConstant.glmDE <- decideTestsDGE(BothVConstant.glm, adjust.method = "fdr", p.value=0.1)
summary(BothVConstant.glmDE)
#output:
Down 800
NotSig 1700
Up 2405


sessionInfo( )
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] forcats_0.5.1               stringr_1.4.0              
 [3] dplyr_1.0.8                 purrr_0.3.4                
 [5] readr_2.1.2                 tidyr_1.2.0                
 [7] tibble_3.1.6                ggplot2_3.3.5              
 [9] tidyverse_1.3.1             viridis_0.6.2              
[11] viridisLite_0.4.0           pheatmap_1.0.12            
[13] argparse_2.1.6              ape_5.6-2                  
[15] gplots_3.1.3                ctc_1.64.0                 
[17] amap_0.8-18                 DESeq2_1.30.1              
[19] SummarizedExperiment_1.20.0 Biobase_2.50.0             
[21] MatrixGenerics_1.2.1        matrixStats_0.61.0         
[23] GenomicRanges_1.42.0        GenomeInfoDb_1.26.7        
[25] IRanges_2.24.1              S4Vectors_0.28.1           
[27] BiocGenerics_0.36.1         edgeR_3.32.1               
[29] limma_3.46.0                BiocManager_1.30.16        

loaded via a namespace (and not attached):
  [1] colorspace_2.0-3       ellipsis_0.3.2        
  [3] XVector_0.30.0         fs_1.5.0              
  [5] rstudioapi_0.13        bit64_4.0.5           
  [7] lubridate_1.8.0        AnnotationDbi_1.52.0  
  [9] fansi_1.0.2            xml2_1.3.3            
 [11] codetools_0.2-16       splines_4.0.2         
 [13] cachem_1.0.6           geneplotter_1.68.0    
 [15] ade4_1.7-19            jsonlite_1.8.0        
 [17] phyloseq_1.34.0        broom_0.7.12          
 [19] annotate_1.68.0        cluster_2.1.0         
 [21] dbplyr_2.1.1           compiler_4.0.2        
 [23] httr_1.4.2             backports_1.4.1       
 [25] assertthat_0.2.1       Matrix_1.3-2          
 [27] fastmap_1.1.0          cli_3.1.0             
 [29] tools_4.0.2            igraph_1.2.11         
 [31] gtable_0.3.0           glue_1.5.0            
 [33] GenomeInfoDbData_1.2.4 reshape2_1.4.4        
 [35] Rcpp_1.0.8             cellranger_1.1.0      
 [37] vctrs_0.3.8            Biostrings_2.58.0     
 [39] zCompositions_1.4.0-1  rhdf5filters_1.2.1    
 [41] multtest_2.46.0        nlme_3.1-155          
 [43] iterators_1.0.14       rvest_1.0.2           
 [45] lifecycle_1.0.1        gtools_3.9.2          
 [47] XML_3.99-0.9           zlibbioc_1.36.0       
 [49] MASS_7.3-55            scales_1.1.1          
 [51] vroom_1.5.7            hms_1.1.1             
 [53] biomformat_1.18.0      rhdf5_2.34.0          
 [55] RColorBrewer_1.1-2     memoise_2.0.1         
 [57] gridExtra_2.3          NADA_1.6-1.1          
 [59] stringi_1.7.6          RSQLite_2.2.10        
 [61] genefilter_1.72.1      foreach_1.5.2         
 [63] permute_0.9-7          caTools_1.18.2        
 [65] BiocParallel_1.24.1    truncnorm_1.0-8       
 [67] rlang_1.0.2            pkgconfig_2.0.3       
 [69] bitops_1.0-7           lattice_0.20-41       
 [71] Rhdf5lib_1.12.1        bit_4.0.4             
 [73] tidyselect_1.1.2       plyr_1.8.6            
 [75] magrittr_2.0.1         R6_2.5.1              
 [77] generics_0.1.2         DelayedArray_0.16.3   
 [79] DBI_1.1.2              withr_2.5.0           
 [81] pillar_1.7.0           haven_2.4.3           
 [83] mgcv_1.8-39            survival_3.3-1        
 [85] RCurl_1.98-1.6         modelr_0.1.8          
 [87] crayon_1.5.0           KernSmooth_2.23-17    
 [89] utf8_1.2.2             tzdb_0.2.0            
 [91] ALDEx2_1.22.0          readxl_1.3.1          
 [93] locfit_1.5-9.5         grid_4.0.2            
 [95] data.table_1.14.2      blob_1.2.2            
 [97] vegan_2.5-7            reprex_2.0.1          
 [99] xtable_1.8-4           munsell_0.5.0

Transcriptomics edgeR DifferentialExpression RNASeqData RNASeq • 1.2k views

ADD COMMENT • link updated 2.2 years ago by Gordon Smyth 52k • written 2.2 years ago by Teresa • 0

score 0 · Answer 1 · 2023-02-23

0

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 6 hours ago

WEHI, Melbourne, Australia

The contrasts you have created do not look correct. It makes no sense to compare sums of coefficients. You probably intended to compare averages:

  OnvSB = (J.On+F.On)/2 - (A.Constant+B.Constant+C.Constant+D.Constant)/4,
  OffvSB = (J.Off+F.Off)/2 - (A.Constant+B.Constant+C.Constant+D.Constant)/4,
  BothVConstant = (J.On+F.On+J.Off+F.Off)/4 - (A.Constant+B.Constant+C.Constant+D.Constant)/4,

ADD COMMENT • link 2.2 years ago Gordon Smyth 52k

0

Entering edit mode

Thanks! The results m make more sense this way, and averages is what I was hoping to compare. Is it ok that each one has a different number of replicates (J.On has 3, F.On has 1, A.Constant has 2, etc)? I'm unsure how to control for that.

ADD REPLY • link 2.2 years ago Teresa • 0

0

Entering edit mode

Yes, it is ok that different groups have different numbers of replicates. edgeR adjusts for that automatically. However you may get somewhat more significantly DE genes when comparing groups with more replicates because of the increased statistical power.

ADD REPLY • link 2.2 years ago Gordon Smyth 52k