Hi all,
I have control (C) and disease (D) groups, and the status of a mutation in each sample as absent (0), heterozygous (1), and homozygous (2) so that my design matrix looks like this:
C.0 C.1 C.2 D.0 D.1 D.2
sample.1 1 0 0 0 0 0
sample.2 1 0 0 0 0 0
sample.3 1 0 0 0 0 0
sample.4 0 1 0 0 0 0
sample.5 0 0 1 0 0 0
sample.6 0 0 1 0 0 0
sample.7 0 0 0 1 0 0
sample.8 0 0 0 1 0 0
sample.9 0 0 0 0 1 0
sample.10 0 0 0 0 1 0
sample.11 0 0 0 0 0 1
sample.12 0 0 0 0 0 1
Is there a way to format my contrasts argument to allow a comparison between D.1 and all C groups? I have tried the following, which was wrong:
qlf <- glmQLFTest(fit,contrast=c(-1,-1,-1,0,1,0))
A more correct answer (but still not correct) is the following:
qlf <- glmQLFTest(fit,contrast=c(-0.33,-0.33,-0.33,0,1,0))
Is there a simple way of writing what I want or am I better off collapsing C.0, C.1, and C.2 into a single control group and comparing D.1 vs C? Thank you.
sessionInfo( )
R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS
Matrix products: default
BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.17.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] EnhancedVolcano_1.10.0 ggrepel_0.9.1 edgeR_3.34.1
[4] limma_3.48.3 BiocManager_1.30.16 ggthemes_4.2.4
[7] gorr_1.4.4 forcats_0.5.1 stringr_1.4.0
[10] dplyr_1.0.7 purrr_0.3.4 readr_2.0.1
[13] tidyr_1.1.3 tibble_3.1.4 ggplot2_3.3.5
[16] tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] fs_1.5.0 lubridate_1.7.10 bit64_4.0.5 ash_1.0-15
[5] RColorBrewer_1.1-2 httr_1.4.2 repr_1.1.3 tools_4.1.1
[9] backports_1.2.1 utf8_1.2.2 R6_2.5.1 KernSmooth_2.23-20
[13] vipor_0.4.5 DBI_1.1.1 colorspace_2.0-2 withr_2.4.2
[17] tidyselect_1.1.1 ggrastr_0.2.3 ggalt_0.4.0 bit_4.0.4
[21] curl_4.3.2 compiler_4.1.1 extrafontdb_1.0 cli_3.0.1
[25] rvest_1.0.1 Cairo_1.5-12.2 xml2_1.3.2 labeling_0.4.2
[29] scales_1.1.1 proj4_1.0-10.1 askpass_1.1 pbdZMQ_0.3-5
[33] digest_0.6.28 base64enc_0.1-3 pkgconfig_2.0.3 htmltools_0.5.2
[37] extrafont_0.17 dbplyr_2.1.1 fastmap_1.1.0 maps_3.4.0
[41] rlang_0.4.11 readxl_1.3.1 rstudioapi_0.13 farver_2.1.0
[45] generics_0.1.0 jsonlite_1.7.2 vroom_1.5.5 magrittr_2.0.1
[49] Rcpp_1.0.7 ggbeeswarm_0.6.0 IRkernel_1.2 munsell_0.5.0
[53] fansi_0.4.2 lifecycle_1.0.1 stringi_1.7.4 MASS_7.3-54
[57] grid_4.1.1 parallel_4.1.1 crayon_1.4.1 lattice_0.20-45
[61] splines_4.1.1 IRdisplay_1.0 haven_2.4.3 hms_1.1.1
[65] locfit_1.5-9.4 pillar_1.6.3 uuid_0.1-4 reprex_2.0.1
[69] glue_1.4.2 evaluate_0.14 modelr_0.1.8 vctrs_0.3.8
[73] tzdb_0.1.2 Rttf2pt1_1.3.9 cellranger_1.1.0 gtable_0.3.0
[77] openssl_1.4.5 assertthat_0.2.1 broom_0.7.9 beeswarm_0.4.0
[81] ellipsis_0.3.2
I had been using the collapsed C.0, C.1, C.2 as the ground truth, as the results from the second contrast were quite close to the collapsed 'ground truth' (which is conservative as you say).
Apologies for beating a dead horse, but I'm struggling to understand how the second contrast is less conservative than the collapse method:
Could you explain this please?
Thank you very much for you help.
Because the residual variability (as measured by the two dispersion parameters) is smaller. If you fit the full correct model (with C.0, C.1 and C.2) then the residual dispersion is estimated from pure error. If you fit an overly simple model (with C) then the residual dispersion is inflated by systematic differences.