Hello,
I cannot seem to structure my model.matrix properly. I would be very grateful for your assistance. I have tried to follow the protocol outlined in these Bioconductor posts, DESeq2, error: inv(): matrix appears to be singular and DESeq2, error: inv(): matrix appears to be singular .
This a metagenome dataset consisting of mouse dams sampled at 3 timepoints, pre-pregnancy TP.Am), during pregnancy TP.Bm), and after pregnancyTP.Cm) and the offspring of each dam sampled once, TP.Co). All of the dams received the same treatment . We want to know if there are differences between the offspring and the treated dams. I want to compare dam-timepointA to offspring-timepointC, dam-timepointB to offspring-timepointC , and dam-timepoint C to offspring-timepointC . The design table below uses “t” and “f” to indicate the correct time point for each sample. Kinship indicates the family group for dams and their offspring. The offspring include both males and females, Gender. I would also like to compare the male and female offspring separately to the dams at each timepoint to explore sex differences. Additionally, there is a second dataset for untreated dams and their offspring structured in the same way. I’ve divided the sets avoid yet another variable to consider which would greatly complicate the comparisons.
I need comparisons: TP.Am vs TP.Co controlling for Kinship and Gender
TP.Bm vs TP.Co controlling for Kinship and Gender
TP.Cm vs TP.Co controlling for Kinship and Gender
TP.Am vs TP.Co males controlling for Kinship
TP.Bm vs TP.Co males controlling for Kinship
TP.Cm vs TP.Co males controlling for Kinship
TP.Am vs TP.Co females controlling for Kinship
TP.Bm vs TP.Co females controlling for Kinship
TP.Cm vs TP.Co females controlling for Kinship
SampleName | Gender | Kinship | TP.Am | TP.Bm | TP.Cm | TP.Co | SexKin |
M_41_CD_A | F | 41 | t | f | f | f | 1 |
M_41_CD_B | F | 41 | f | t | f | f | 1 |
M_41_CD_C | F | 41 | f | f | t | f | 1 |
O_41_02_CD_F | F | 41 | f | f | f | t | 1 |
O_41_06_CD_F | F | 41 | f | f | f | t | 1 |
O_41_07_CD_F | F | 41 | f | f | f | t | 1 |
O_41_01_CD_M | M | 41 | f | f | f | t | 2 |
O_41_03_CD_M | M | 41 | f | f | f | t | 2 |
O_41_04_CD_M | M | 41 | f | f | f | t | 2 |
O_41_05_CD_M | M | 41 | f | f | f | t | 2 |
O_41_08_CD_M | M | 41 | f | f | f | t | 2 |
O_41_09_CD_M | M | 41 | f | f | f | t | 2 |
M_42_CD_A | F | 42 | t | f | f | f | 3 |
M_42_CD_B | F | 42 | f | t | f | f | 3 |
M_42_CD_C | F | 42 | f | f | t | f | 3 |
O_42_02_CD_F | F | 42 | f | f | f | t | 3 |
O_42_04_CD_F | F | 42 | f | f | f | t | 3 |
O_42_05_CD_F | F | 42 | f | f | f | t | 3 |
O_42_06_CD_F | F | 42 | f | f | f | t | 3 |
O_42_01_CD_M | M | 42 | f | f | f | t | 4 |
O_42_03_CD_M | M | 42 | f | f | f | t | 4 |
O_42_08_CD_M | M | 42 | f | f | f | t | 4 |
O_42_09_CD_M | M | 42 | f | f | f | t | 4 |
I have tried the model.matrix formula ~ Kinship + Gender + TP.Am + TP.Bm + TP.Cm + TP.Co. I intended to specify the needed comparisons in results(dds).
I have the error “model matrix is not full rank”. I’ve not been able to detect the linear terms in my design. I have tried combining the kinship & gender terms, no luck. Also, sample M_53, a dam, has no offspring. I removed this sample from the dataset.
I have also tried ~Kinship + Gender + TP.Am:TP.Co + TP.Bm:TP.Co + TP.Cm:TP.Co which results in a new error.
Error in validObject(object) :
invalid class “DESeqDataSet” object: all variables in design formula must be columns in colData
The model looks like this (abbreviated sample):
(Intercept) Kinship42 Kinship51 Kinship55 GenderM TP.Amt TP.Cot TP.Bmt TP.Cmt
1 1 0 0 0 0 1 0 0 0
2 1 0 0 0 0 0 0 1 0
3 1 0 0 0 0 0 0 0 1
4 1 0 0 0 0 0 1 0 0
5 1 0 0 0 0 0 1 0 0
6
So I now have new automatically generated column names that don’t match my columns in colData?
And another trial:
> dds<- DESeqDataSetFromMatrix(countData=InputData, colData=ExpDesign,
+ design = ~Kinship) #Design is a placeholder for correct model to follow below
> dds
class: DESeqDataSet
dim: 286 42
exptData(0):
assays(1): counts
rownames(286): Parabacteroides goldsteinii Akkermansia muciniphila ... Pelotomaculum isophthalicicum
Blautia obeum
rowRanges metadata column names(0):
colnames(42): 1 2 ... 41 42
colData names(8): SampleName Kinship ... TP.Co SexKin
> mm1 <- model.matrix(~Kinship + Gender + TP.Am:TP:Co + TP.Bm:TP.Co
+ + TP.Cm:TP.Co)
Error in eval(expr, envir, enclos) : object 'Kinship' not found
I have also tried :
mm1 <- model.matrix(~Kinship + Gender + TP.Am:TP:Co + TP.Bm:TP.Co
+ TP.Cm:TP.Co)#Correct design matrix
idx <- which(colSums(mm1==0) == nrow(mm1))
mm1 <- mm1[,-idx]
design(dds)<- ~ mm1
Error in validObject(object) :
invalid class “DESeqDataSet” object: all variables in design formula must be columns in colData
> sessionInfo() R version 3.2.0 (2015-04-16) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods [9] base other attached packages: [1] DESeq2_1.8.1 RcppArmadillo_0.5.400.2.0 Rcpp_0.12.0 [4] gplots_2.17.0 lattice_0.20-31 gtools_3.5.0 [7] GenomicRanges_1.20.4 GenomeInfoDb_1.4.0 IRanges_2.2.2 [10] S4Vectors_0.6.0 BiocGenerics_0.14.0 loaded via a namespace (and not attached): [1] genefilter_1.50.0 locfit_1.5-9.1 reshape2_1.4.1 splines_3.2.0 [5] colorspace_1.2-6 XML_3.98-1.3 survival_2.38-1 DBI_0.3.1 [9] foreign_0.8-63 BiocParallel_1.2.2 RColorBrewer_1.1-2 lambda.r_1.1.7 [13] plyr_1.8.3 stringr_1.0.0 munsell_0.4.2 gtable_0.1.2 [17] futile.logger_1.4.1 caTools_1.17.1 latticeExtra_0.6-26 Biobase_2.28.0 [21] geneplotter_1.46.0 AnnotationDbi_1.30.1 proto_0.3-10 acepack_1.3-3.3 [25] KernSmooth_2.23-14 xtable_1.7-4 scales_0.3.0 gdata_2.17.0 [29] Hmisc_3.16-0 annotate_1.46.0 XVector_0.8.0 gridExtra_2.0.0 [33] BiocStyle_1.6.0 ggplot2_1.0.1 digest_0.6.8 stringi_0.5-5 [37] grid_3.2.0 tools_3.2.0 bitops_1.0-6 magrittr_1.5 [41] RSQLite_1.0.0 Formula_1.2-1 cluster_2.0.1 futile.options_1.0.0 [45] MASS_7.3-40 rpart_4.1-9 nnet_7.3-9
Please suggest a model.matrix for this complicated design.
Susan
It appears that the source of the design problem is the presence of males only in the offspring group? Here is the restructured design as requested.
This works:
resultsNames (dds)
Is the best option to analyze male and female offspring separately to be able to compare dams to male offspring and dams to female offspring?
You can do
~Gender + Kinship + group
, which controls for gender differences or kinship differences overall, and you can contrast the four groups using the contrast argument of results().So this design runs without error.
What is the code to contrast the groups to see the differences between pre vs male offspring, preg vs male offspring, and post vs male offspring. Likewise for females. Don't I need an interaction term for this? That's where I ran into the "not full rank" error. May I assume that I'm controlling for Kinship and Gender when I use, for example, "Group post vs offspring"? I would love to have a short course or tutorial explaining how models work. Can you recommend one?
To compare female pre and male offspring you would add the pre vs offspring and the female vs male effects, but I don't really see the point of these comparisons. You would just be adding the female vs male effect you observed in offspring (e.g. chrY genes are expressed only in males) to the group effects.
I'd recommend discussing with a local statistician, who can help explain what kind of comparisons are possible with this experimental design. Note that there isn't anything DESeq2 specific here, these are the same terms you would have in a simple linear model.
Many thanks for your patient replies.