DESeq2 design model-mothers & offspring timepoint comparisons
1
0
Entering edit mode
newmanss • 0
@newmanss-7190
Last seen 6.9 years ago
United States

Hello,

I cannot seem to structure my model.matrix properly.  I would be very grateful for your assistance.  I have tried to follow the protocol outlined in these Bioconductor posts, DESeq2, error: inv(): matrix appears to be singular and DESeq2, error: inv(): matrix appears to be singular

This a metagenome dataset consisting of mouse dams sampled at 3 timepoints, pre-pregnancy TP.Am), during pregnancy TP.Bm), and after pregnancyTP.Cm) and the offspring of each dam sampled once, TP.Co).  All of the  dams received the same treatment .  We want to know if there are differences between the offspring and the treated dams.  I want to compare dam-timepointA to offspring-timepointC,  dam-timepointB to  offspring-timepointC , and dam-timepoint C to offspring-timepointC .  The design table below uses “t” and “f” to indicate the correct time point for each sample.  Kinship indicates the family group for dams and their offspring.  The offspring include both males and females, Gender.  I would also like to compare the male and female offspring separately to the dams at each timepoint to explore sex differences.  Additionally, there is a second dataset for untreated dams and their offspring structured in the same way.  I’ve divided the sets avoid yet another variable to consider which would greatly complicate the comparisons.

I need comparisons:       TP.Am vs TP.Co  controlling for Kinship and Gender

                                        TP.Bm vs TP.Co  controlling for Kinship and Gender

                                         TP.Cm vs TP.Co  controlling for Kinship and Gender

TP.Am vs TP.Co males controlling for Kinship

                                                TP.Bm vs TP.Co  males controlling for Kinship

                                                TP.Cm vs TP.Co  males controlling for Kinship

TP.Am vs TP.Co females controlling for Kinship

                                                TP.Bm vs TP.Co  females controlling for Kinship

                                                TP.Cm vs TP.Co  females controlling for Kinship

 

SampleName Gender Kinship TP.Am TP.Bm TP.Cm TP.Co SexKin
M_41_CD_A F 41 t f f f 1
M_41_CD_B F 41 f t f f 1
M_41_CD_C F 41 f f t f 1
O_41_02_CD_F F 41 f f f t 1
O_41_06_CD_F F 41 f f f t 1
O_41_07_CD_F F 41 f f f t 1
O_41_01_CD_M M 41 f f f t 2
O_41_03_CD_M M 41 f f f t 2
O_41_04_CD_M M 41 f f f t 2
O_41_05_CD_M M 41 f f f t 2
O_41_08_CD_M M 41 f f f t 2
O_41_09_CD_M M 41 f f f t 2
M_42_CD_A F 42 t f f f 3
M_42_CD_B F 42 f t f f 3
M_42_CD_C F 42 f f t f 3
O_42_02_CD_F F 42 f f f t 3
O_42_04_CD_F F 42 f f f t 3
O_42_05_CD_F F 42 f f f t 3
O_42_06_CD_F F 42 f f f t 3
O_42_01_CD_M M 42 f f f t 4
O_42_03_CD_M M 42 f f f t 4
O_42_08_CD_M M 42 f f f t 4
O_42_09_CD_M M 42 f f f t 4

I have tried the model.matrix formula ~ Kinship + Gender + TP.Am + TP.Bm + TP.Cm + TP.Co.  I intended to  specify the needed comparisons in results(dds).

I have the error “model matrix is not full rank”.  I’ve not been able to detect the linear terms in my design.   I have tried combining the kinship & gender terms, no luck.  Also, sample M_53, a dam, has no offspring.  I removed this sample from the dataset.

I have also tried ~Kinship + Gender + TP.Am:TP.Co + TP.Bm:TP.Co + TP.Cm:TP.Co which results in a new error.

Error in validObject(object) :

  invalid class “DESeqDataSet” object: all variables in design formula must be columns in colData

The model looks like this (abbreviated sample):

  (Intercept) Kinship42   Kinship51 Kinship55       GenderM            TP.Amt TP.Cot   TP.Bmt TP.Cmt

1            1         0         0         0       0      1      0      0      0

2            1         0         0         0       0      0      0      1      0

3            1         0         0         0       0      0      0      0      1

4            1         0         0         0       0      0      1      0      0

5            1         0         0         0       0      0      1      0      0

6  

So I now have new automatically generated column names that don’t match my columns in colData?

And another trial:

> dds<- DESeqDataSetFromMatrix(countData=InputData, colData=ExpDesign,

+                                     design = ~Kinship)  #Design is a placeholder for correct model to follow below

> dds

class: DESeqDataSet

dim: 286 42

exptData(0):

assays(1): counts

rownames(286): Parabacteroides goldsteinii Akkermansia muciniphila ... Pelotomaculum isophthalicicum

  Blautia obeum

rowRanges metadata column names(0):

colnames(42): 1 2 ... 41 42

colData names(8): SampleName Kinship ... TP.Co SexKin

> mm1 <- model.matrix(~Kinship + Gender + TP.Am:TP:Co + TP.Bm:TP.Co

+                     + TP.Cm:TP.Co)

Error in eval(expr, envir, enclos) : object 'Kinship' not found

 

I have also tried :

mm1 <- model.matrix(~Kinship + Gender + TP.Am:TP:Co + TP.Bm:TP.Co

                    + TP.Cm:TP.Co)#Correct design matrix

idx <- which(colSums(mm1==0) == nrow(mm1))

mm1 <- mm1[,-idx]

design(dds)<- ~ mm1

Error in validObject(object) :

  invalid class “DESeqDataSet” object: all variables in design formula must be columns in colData

  

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] DESeq2_1.8.1              RcppArmadillo_0.5.400.2.0 Rcpp_0.12.0              
 [4] gplots_2.17.0             lattice_0.20-31           gtools_3.5.0             
 [7] GenomicRanges_1.20.4      GenomeInfoDb_1.4.0        IRanges_2.2.2            
[10] S4Vectors_0.6.0           BiocGenerics_0.14.0      

loaded via a namespace (and not attached):
 [1] genefilter_1.50.0    locfit_1.5-9.1       reshape2_1.4.1       splines_3.2.0       
 [5] colorspace_1.2-6     XML_3.98-1.3         survival_2.38-1      DBI_0.3.1           
 [9] foreign_0.8-63       BiocParallel_1.2.2   RColorBrewer_1.1-2   lambda.r_1.1.7      
[13] plyr_1.8.3           stringr_1.0.0        munsell_0.4.2        gtable_0.1.2        
[17] futile.logger_1.4.1  caTools_1.17.1       latticeExtra_0.6-26  Biobase_2.28.0      
[21] geneplotter_1.46.0   AnnotationDbi_1.30.1 proto_0.3-10         acepack_1.3-3.3     
[25] KernSmooth_2.23-14   xtable_1.7-4         scales_0.3.0         gdata_2.17.0        
[29] Hmisc_3.16-0         annotate_1.46.0      XVector_0.8.0        gridExtra_2.0.0     
[33] BiocStyle_1.6.0      ggplot2_1.0.1        digest_0.6.8         stringi_0.5-5       
[37] grid_3.2.0           tools_3.2.0          bitops_1.0-6         magrittr_1.5        
[41] RSQLite_1.0.0        Formula_1.2-1        cluster_2.0.1        futile.options_1.0.0
[45] MASS_7.3-40          rpart_4.1-9          nnet_7.3-9          

               

Please suggest a model.matrix for this complicated design.

Susan                   

model.matrix multiple factor design design and contrast matrix deseq2 • 1.9k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 3 hours ago
United States

The proper way to analyze will be much clearer if we clean up the coding of sample information. 

First, in order to compare the four groups (pre, during, after, offspring), I would code a single variable, 'group', as a new column in the colData with these levels. Then, if we were ignoring kinship and gender, you would just do:

design(dds) <- ~ group
dds <- DESeq(dds)
results(dds, contrast=c("group", "pre", "offspring"))

for example.

Can you post the sample information with just three columns: group, kinship and gender?

It might be useful to also sort the sample information by group for visualizing the groups:

df <- as.data.frame(colData(dds))
df[ order(df$group), c("group","kinship","gender") ]
ADD COMMENT
0
Entering edit mode

It appears that the source of the design problem is the presence of males only in the offspring group?  Here is the restructured design as requested.

This works:

dds <- DESeq(DESeqTable, betaPrior = FALSE)

resultsNames (dds)

[1] "Intercept"               "Kinship_42_vs_41"        "Kinship_51_vs_41"       
[4] "Kinship_55_vs_41"        "Group_post_vs_offspring" "Group_pre_vs_offspring" 
[7] "Group_preg_vs_offspring"

Is the best option to analyze male and female offspring separately to be able to compare dams to male offspring and dams to female offspring?

 

SampleName Group Kinship Gender
M_41_CD_A pre 41 F
M_42_CD_A pre 42 F
M_51_CD_A pre 51 F
M_55_CD_A pre 55 F
M_41_CD_B preg 41 F
M_42_CD_B preg 42 F
M_51_CD_B preg 51 F
M_55_CD_B preg 55 F
M_41_CD_C post 41 F
M_42_CD_C post 42 F
M_51_CD_C post 51 F
M_55_CD_C post 55 F
O_41_01_CD_M offspring 41 M
O_41_02_CD_F offspring 41 F
O_41_03_CD_M offspring 41 M
O_41_04_CD_M offspring 41 M
O_41_05_CD_M offspring 41 M
O_41_06_CD_F offspring 41 F
O_41_07_CD_F offspring 41 F
O_41_08_CD_M offspring 41 M
O_41_09_CD_M offspring 41 M
O_42_01_CD_M offspring 42 M
O_42_02_CD_F offspring 42 F
O_42_03_CD_M offspring 42 M
O_42_04_CD_F offspring 42 F
O_42_05_CD_F offspring 42 F
O_42_06_CD_F offspring 42 F
O_42_08_CD_M offspring 42 M
O_42_09_CD_M offspring 42 M
O_51_01_CD_F offspring 51 F
O_51_10_CD_F offspring 51 F
O_51_11_CD_F offspring 51 F
O_51_12_CD_M offspring 51 M
O_51_03_CD_M offspring 51 M
O_51_05_CD_M offspring 51 M
O_51_08_CD_F offspring 51 F
O_51_09_CD_M offspring 51 M
O_55_01_CD_F offspring 55 F
O_55_02_CD_F offspring 55 F
O_55_03_CD_M offspring 55 M
O_55_04_CD_F offspring 55 F
O_55_07_CD_F offspring 55 F

 

     
ADD REPLY
0
Entering edit mode

You can do ~Gender + Kinship + group, which controls for gender differences or kinship differences overall, and you can contrast the four groups using the contrast argument of results().

ADD REPLY
0
Entering edit mode

So this design runs without error.

[1] "Intercept"               "Gender_M_vs_F"           "Kinship_49_vs_47"       
[4] "Kinship_56_vs_47"        "Kinship_57_vs_47"        "Kinship_60_vs_47"       
[7] "Group_post_vs_offspring" "Group_pre_vs_offspring"  "Group_preg_vs_offspring"

What is the code to contrast the groups to see the differences between pre vs male offspring, preg vs male offspring, and post vs male offspring.  Likewise for females.  Don't I need an interaction term for this?  That's where I ran into the "not full rank" error.  May I assume that I'm controlling for Kinship and Gender when I use, for example, "Group post vs offspring"?  I would love to have a short course or tutorial explaining how models work.  Can you recommend one?

ADD REPLY
0
Entering edit mode

To compare female pre and male offspring you would add the pre vs offspring and the female vs male effects, but I don't really see the point of these comparisons. You would just be adding the female vs male effect you observed in offspring (e.g. chrY genes are expressed only in males) to the group effects. 

I'd recommend discussing with a local statistician, who can help explain what kind of comparisons are possible with this experimental design. Note that there isn't anything DESeq2 specific here, these are the same terms you would have in a simple linear model.

ADD REPLY
0
Entering edit mode

Many thanks for your patient replies.

ADD REPLY

Login before adding your answer.

Traffic: 908 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6