Hi experts,
I am trying to compare methylation for people with four combinations of childhood/adulthood socioeconomic status: 160 = LOW/LOW, 90 = LOW/HIGH, 90 = HIGH/LOW, 160 = HIGH/LOW.
I want to look for differences in DNAm between these groups.
My design matrix appears as follows:
(Intercept) SEA_grphilo SEA_grplohi SEA_grplolo sex1 20867 1 0 0 0 1 20373 1 0 0 1 1 21987 1 0 0 0 1 20685 1 0 0 0 1 20827 1 0 0 0 1 22421 1 0 0 1 1
I run lmFit() no problem:
> fit.SEA_grp_chg <-lmFit(matrix, design, na.action = na.omit)
I run eBayes to calculate the variance:
> fit.SEA_grp_chg <-eBayes(fit.SEA_grp_chg, design)
But I get the following error:
Error in rep.int(0, ntarget) : invalid 'times' value In addition: Warning messages: 1: In if (ntarget < 1) return(NA) : the condition has length > 1 and only the first element will be used 2: In 1:ntarget : numerical expression has 2465 elements: only the first used 3: In 1:ntarget : numerical expression has 2465 elements: only the first used
I don't know if this is something to be concerned about, and if/how it might be affecting my outcome, and can't seem to find anything about it.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> sessionInfo() R version 3.3.2 (2016-10-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server release 6.8 (Santiago) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] grid parallel stats graphics grDevices utils datasets methods [9] base other attached packages: [1] VennDiagram_1.6.17 futile.logger_1.4.3 gplots_3.0.1 readstata13_0.9.0 [5] Biobase_2.34.0 BiocGenerics_0.20.0 limma_3.30.13 loaded via a namespace (and not attached): [1] Rcpp_0.12.11 knitr_1.15.1 magrittr_1.5 munsell_0.4.3 [5] colorspace_1.3-2 R6_2.2.0 plyr_1.8.4 dplyr_0.5.0 [9] caTools_1.17.1 tools_3.3.2 gtable_0.2.0 KernSmooth_2.23-15 [13] DBI_0.6 lambda.r_1.1.9 gtools_3.5.0 yaml_2.1.14 [17] lazyeval_0.2.0 assertthat_0.1 tibble_1.2 ggplot2_2.2.1 [21] futile.options_1.0.0 bitops_1.0-6 gdata_2.18.0 scales_0.4.1
Thanks James, you just pointed out an obvious error in my code I was totally staring at yet still missing - now I feel silly!
By blocking on sex, you are accounting for female-specific differences. So one way you could interpret that contrast is that you are comparing the hilo and lohi samples after adjusting for sex. But this implies that any change in expression due to sex is simply a consistent shift in the expression level.
In other words, the 'Female' coefficient in this model is computing the mean difference between Female and Male subjects, and is using that to adjust the female observations down (or up) to the same level as the male subjects.
This assumes that for all the genes the only difference between males and females is that the gene expression for one sex is consistently shifted by a semi-constant amount, regardless of whatever hihi, hilo, etc are measuring.
It is always possible that a given gene reacts differently to changes in SEA_grph in females than in males, in which case you need to fit an interaction term between sex and whatever the SEA_grph thing is measuring, in order to capture any changes in gene expression between the SEA_grph levels that are sex-dependent.