Hi
I am fitting some time course RNA-seq data using glmQLFit and I then wanted to use glmTreat to test either of two coefficients with 0.5 lfc. Here is the simplified version (with fewer samples).
set.seed(0) T <- 24 t <- rep(seq(3,42, by=3), 11) c <- cos(2*pi*t/T) # a time course basis function s <- sin(2*pi*t/T) batch_effect <- factor(ceiling(runif(14*11)*11)) subjects <- sprintf("P%02d", rep(1:11, each = 14)) design1 <- model.matrix(~ batch_effect + c + s) fit <- glmQLFit(y, design, robust=TRUE)
These work
treat1 <- glmTreat(fit, coef="c", lfc=0.5) treat2 <- glmTreat(fit, coef=c("c", "s"), lfc=0)
but this gives an error
treat3 <- glmTreat(fit, coef=c("c","s"), lfc=0.5)
error: Error in .addCompressedMatrices(offset.old, offset.adj) :
matrix dimensions are not consistent for non-repeating number of columns
Instead, if I use contrasts instead of specifying coefficients, I get a different error
treat4 <- glmTreat(fit, contrast = con, lfc=0.5)
error: Error in data.frame(logCPM = glmfit$AveLogCPM, PValue = p.value, row.names = rownames(glmfit)) :
row names supplied are of the wrong length
Is glmTreat not design to be used for multi-coefficient or analysis of deviance situtations? The help page seems to suggested multiple coefficients at least are permitted.
Thank you for your help.
My sessionInfo is
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.12.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2 sva_3.24.4 BiocParallel_1.10.1 genefilter_1.58.1 mgcv_1.8-22
[6] nlme_3.1-131 SummarizedExperiment_1.6.5 DelayedArray_0.2.7 matrixStats_0.52.2 Biobase_2.36.2
[11] GenomicRanges_1.28.4 GenomeInfoDb_1.12.3 IRanges_2.10.5 S4Vectors_0.14.7 BiocGenerics_0.22.1
[16] ggplot2_2.2.1 PoiClaClu_1.0.2 magrittr_1.5 edgeR_3.18.1 limma_3.32.5
loaded via a namespace (and not attached):
[1] tidyr_0.7.2 bit64_0.9-7 splines_3.4.2 Formula_1.2-2 assertthat_0.2.0 statmod_1.4.30
[7] latticeExtra_0.6-28 blob_1.1.0 cellranger_1.1.0 GenomeInfoDbData_0.99.0 yaml_2.1.15 RSQLite_2.0
[13] backports_1.1.1 lattice_0.20-35 glue_1.2.0 digest_0.6.12 RColorBrewer_1.1-2 XVector_0.16.0
[19] checkmate_1.8.5 colorspace_1.3-2 htmltools_0.3.6 Matrix_1.2-11 plyr_1.8.4 XML_3.98-1.9
[25] pkgconfig_2.0.1 zlibbioc_1.22.0 purrr_0.2.4 xtable_1.8-2 scales_0.5.0 pracma_2.1.1
[31] htmlTable_1.11.0 tibble_1.3.4 annotate_1.54.0 nnet_7.3-12 lazyeval_0.2.1 readxl_1.0.0
[37] survival_2.41-3 memoise_1.1.0 foreign_0.8-69 tools_3.4.2 data.table_1.10.4-3 stringr_1.2.0
[43] munsell_0.4.3 locfit_1.5-9.1 cluster_2.0.6 AnnotationDbi_1.38.2 compiler_3.4.2 rlang_0.1.2
[49] grid_3.4.2 RCurl_1.95-4.8 rstudioapi_0.7 htmlwidgets_0.9 labeling_0.3 bitops_1.0-6
[55] base64enc_0.1-3 gtable_0.2.0 DBI_0.7 R6_2.2.2 gridExtra_2.3 knitr_1.17
[61] dplyr_0.7.4 bit_1.1-12 bindr_0.1 Hmisc_4.0-3 stringi_1.1.6 Rcpp_0.12.12
[67] geneplotter_1.54.0 rpart_4.1-11 acepack_1.4.1 tidyselect_0.2.3
Normally the null hypothesis for the glm would be (in my case) H0: c=0 AND s =0.
What I want to test is the null hypothesis H0: |c| < lfc AND |s| <lfc using glmTreat. Is this inconceivable?
The problem with that is that the region defined by your null hypothesis is now dependent on how you've parametrized your design matrix in ways that are probably not intuitive. That is, tests using different design matrices that would be equivalent in glmLRT or glmQLFTest would give different results in glmTreat.
Just to give a few other possible ways to define the null hypothesis for multi-coefficient treat that make equal sense, consider: |c| + |s| < threshold, or c^2 + s^2 < threshold^2. Of course, all of these are still dependent on the parametrization.
To elaborate on Ryan's example, say we have two contrasts,
A - B
andA - C
. If we testA - B = 0
andA - C = 0
via an ANODEV, this is the same null hypothesis asA - B = 0
andB - C = 0
. However, if we ask for|A - B| <= 1
and|A - C| <= 1
, this is not equivalent to|A - B| <= 1
and|B - C| <= 1
. (For example, the first version is true while the second is not whenA = 0
,B = 1
andC = -1
.)This isn't an insurmountable difficulty - it just requires some care when phrasing the contrasts - but it does highlight the issues with dealing with composite null hypotheses. Indeed, when you start stacking together contrasts to try to perform a log-fold-change-aware equivalent of an ANODEV, you will become increasingly conservative if you only perform testing at the
lfc
limit, liketreat
; or you will need to consider the correlation in the true effect sizes between contrasts to obtain an appropriate p-value, which does not seem possible on a gene-by-gene basis. Maybe something could be made to work by sharing information across genes with empirical Bayes, but this would require some thought.