Question

edgeR contrasts confusion when multiple groups are involved

0

Entering edit mode

mydonkeysonfire • 0

@506e191c

Last seen 3.1 years ago

Ireland

Hi all,

I have control (C) and disease (D) groups, and the status of a mutation in each sample as absent (0), heterozygous (1), and homozygous (2) so that my design matrix looks like this:

                C.0     C.1     C.2     D.0       D.1       D.2
sample.1        1       0       0       0         0         0
sample.2        1       0       0       0         0         0
sample.3        1       0       0       0         0         0
sample.4        0       1       0       0         0         0
sample.5        0       0       1       0         0         0
sample.6        0       0       1       0         0         0
sample.7        0       0       0       1         0         0
sample.8        0       0       0       1         0         0
sample.9        0       0       0       0         1         0
sample.10       0       0       0       0         1         0
sample.11       0       0       0       0         0         1
sample.12       0       0       0       0         0         1

Is there a way to format my contrasts argument to allow a comparison between D.1 and all C groups? I have tried the following, which was wrong:

qlf <- glmQLFTest(fit,contrast=c(-1,-1,-1,0,1,0))

A more correct answer (but still not correct) is the following:

qlf <- glmQLFTest(fit,contrast=c(-0.33,-0.33,-0.33,0,1,0))

Is there a simple way of writing what I want or am I better off collapsing C.0, C.1, and C.2 into a single control group and comparing D.1 vs C? Thank you.

sessionInfo( )
R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.17.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] EnhancedVolcano_1.10.0 ggrepel_0.9.1          edgeR_3.34.1          
 [4] limma_3.48.3           BiocManager_1.30.16    ggthemes_4.2.4        
 [7] gorr_1.4.4             forcats_0.5.1          stringr_1.4.0         
[10] dplyr_1.0.7            purrr_0.3.4            readr_2.0.1           
[13] tidyr_1.1.3            tibble_3.1.4           ggplot2_3.3.5         
[16] tidyverse_1.3.1       

loaded via a namespace (and not attached):
 [1] fs_1.5.0           lubridate_1.7.10   bit64_4.0.5        ash_1.0-15        
 [5] RColorBrewer_1.1-2 httr_1.4.2         repr_1.1.3         tools_4.1.1       
 [9] backports_1.2.1    utf8_1.2.2         R6_2.5.1           KernSmooth_2.23-20
[13] vipor_0.4.5        DBI_1.1.1          colorspace_2.0-2   withr_2.4.2       
[17] tidyselect_1.1.1   ggrastr_0.2.3      ggalt_0.4.0        bit_4.0.4         
[21] curl_4.3.2         compiler_4.1.1     extrafontdb_1.0    cli_3.0.1         
[25] rvest_1.0.1        Cairo_1.5-12.2     xml2_1.3.2         labeling_0.4.2    
[29] scales_1.1.1       proj4_1.0-10.1     askpass_1.1        pbdZMQ_0.3-5      
[33] digest_0.6.28      base64enc_0.1-3    pkgconfig_2.0.3    htmltools_0.5.2   
[37] extrafont_0.17     dbplyr_2.1.1       fastmap_1.1.0      maps_3.4.0        
[41] rlang_0.4.11       readxl_1.3.1       rstudioapi_0.13    farver_2.1.0      
[45] generics_0.1.0     jsonlite_1.7.2     vroom_1.5.5        magrittr_2.0.1    
[49] Rcpp_1.0.7         ggbeeswarm_0.6.0   IRkernel_1.2       munsell_0.5.0     
[53] fansi_0.4.2        lifecycle_1.0.1    stringi_1.7.4      MASS_7.3-54       
[57] grid_4.1.1         parallel_4.1.1     crayon_1.4.1       lattice_0.20-45   
[61] splines_4.1.1      IRdisplay_1.0      haven_2.4.3        hms_1.1.1         
[65] locfit_1.5-9.4     pillar_1.6.3       uuid_0.1-4         reprex_2.0.1      
[69] glue_1.4.2         evaluate_0.14      modelr_0.1.8       vctrs_0.3.8       
[73] tzdb_0.1.2         Rttf2pt1_1.3.9     cellranger_1.1.0   gtable_0.3.0      
[77] openssl_1.4.5      assertthat_0.2.1   broom_0.7.9        beeswarm_0.4.0    
[81] ellipsis_0.3.2

edgeR • 1.1k views

ADD COMMENT • link updated 3.2 years ago by Gordon Smyth 52k • written 3.2 years ago by mydonkeysonfire • 0

score 2 · Accepted Answer · 2021-10-12

2

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 13 minutes ago

WEHI, Melbourne, Australia

Why do you say that the second contrast is "still not correct"? It seems the obvious way to make the comparison you want.

You can collapse C.0, C.1 and C.2 into one group if mutation status has no effect in the controls, otherwise not. Collapsing into one group is not actually wrong but it will be more conservative than the contrast if there is genuine DE between C.0, C.1 and C.2.

ADD COMMENT • link 3.2 years ago Gordon Smyth 52k

0

Entering edit mode

I had been using the collapsed C.0, C.1, C.2 as the ground truth, as the results from the second contrast were quite close to the collapsed 'ground truth' (which is conservative as you say).

Apologies for beating a dead horse, but I'm struggling to understand how the second contrast is less conservative than the collapse method:

D.1 - (C.0 + C.1 + C.2)/3  #contrast method
D.1 - C                    #collapse method

Could you explain this please?

Thank you very much for you help.

ADD REPLY • link 3.2 years ago mydonkeysonfire • 0

0

Entering edit mode

Because the residual variability (as measured by the two dispersion parameters) is smaller. If you fit the full correct model (with C.0, C.1 and C.2) then the residual dispersion is estimated from pure error. If you fit an overly simple model (with C) then the residual dispersion is inflated by systematic differences.

ADD REPLY • link 3.2 years ago Gordon Smyth 52k