To Whom It May Concern,
I have been analyzing two large RNAseq projects separately and successfully. Recently, it was suggested that I merge the analyses so that normalized gene counts could be compared (not statistically) between the two experiments while also retaining the ability to perform statistical tests within the experiments. The two experimental designs differ greatly and I am having some trouble designing a model for the data that works without returning:
Error in checkFullRank(modelMatrix)
My goal is to generate one dataset normalized together and then run statistical analysis within the experiments: the effect of conditions at each stage for both groups. The code I used for the separate experimental (group) analyses worked, but the design falls apart when I attempt to merge them together. This is obvious because the experimental designs are different: Two way Vs Multifactorial. I am just confused about how to code this appropriately.
Here is the sample table (sorry for the jargon); simply the experiments are "group" Temp and Anoxia, on different stages of development with varying conditions. Following is the attempt I most recently made.
> sampleTable samples condition stage group 1 A1 esc Epiboly Temp 2 A2 esc Epiboly Temp 3 A3 esc Epiboly Temp 4 B1 esc NeuralKeel Temp 5 B2 esc NeuralKeel Temp 6 B3 esc NeuralKeel Temp 7 C1 esc 6somites Temp 8 C2 esc 6somites Temp 9 C3 esc 6somites Temp 10 D1 esc 10somites Temp 11 D2 esc 10somites Temp 12 D3 esc 10somites Temp 13 E1 esc 16somites Temp 14 E2 esc 16somites Temp 15 E3 esc 16somites Temp 16 F1 esc 20somites Temp 17 F2 esc 20somites Temp 18 F3 esc 20somites Temp 19 G1 esc 24somites Temp 20 G2 esc 24somites Temp 21 G3 esc 24somites Temp 22 H1 diap Epiboly Temp 23 H2 diap Epiboly Temp 24 H3 diap Epiboly Temp 25 I1 diap NeuralKeel Temp 26 I2 diap NeuralKeel Temp 27 I3 diap NeuralKeel Temp 28 J1 diap 6somites Temp 29 J2 diap 6somites Temp 30 J3 diap 6somites Temp 31 K1 diap 10somites Temp 32 K2 diap 10somites Temp 33 K3 diap 10somites Temp 34 L1 diap 16somites Temp 35 L2 diap 16somites Temp 36 L3 diap 16somites Temp 37 M1 diap 20somites Temp 38 M2 diap 20somites Temp 39 M3 diap 20somites Temp 40 N1 diap 24somites Temp 41 N2 diap 24somites Temp 42 N3 diap 24somites Temp 43 Wa1 t0 dpd_0 Anoxia 44 Wa2 t0 dpd_0 Anoxia 45 Wa3 t0 dpd_0 Anoxia 46 Wa4 t0 dpd_0 Anoxia 47 Wb1 Early_Anoxia dpd_0 Anoxia 48 Wb2 Early_Anoxia dpd_0 Anoxia 49 Wb3 Early_Anoxia dpd_0 Anoxia 50 Wb4 Early_Anoxia dpd_0 Anoxia 51 Wc1 Late_Anoxia dpd_0 Anoxia 52 Wc2 Late_Anoxia dpd_0 Anoxia 53 Wc3 Late_Anoxia dpd_0 Anoxia 54 Wc4 Late_Anoxia dpd_0 Anoxia 55 Wd1 Early_Rec dpd_0 Anoxia 56 Wd2 Early_Rec dpd_0 Anoxia 57 Wd3 Early_Rec dpd_0 Anoxia 58 Wd4 Early_Rec dpd_0 Anoxia 59 We1 Late_Rec dpd_0 Anoxia 60 We2 Late_Rec dpd_0 Anoxia 61 We3 Late_Rec dpd_0 Anoxia 62 We4 Late_Rec dpd_0 Anoxia 63 Xa1 t0 dpd_4 Anoxia 64 Xa2 t0 dpd_4 Anoxia 65 Xa3 t0 dpd_4 Anoxia 66 Xa4 t0 dpd_4 Anoxia 67 Xb1 Early_Anoxia dpd_4 Anoxia 68 Xb2 Early_Anoxia dpd_4 Anoxia 69 Xb3 Early_Anoxia dpd_4 Anoxia 70 Xb4 Early_Anoxia dpd_4 Anoxia 71 Xc1 Late_Anoxia dpd_4 Anoxia 72 Xc2 Late_Anoxia dpd_4 Anoxia 73 Xc3 Late_Anoxia dpd_4 Anoxia 74 Xc4 Late_Anoxia dpd_4 Anoxia 75 Xd1 Early_Rec dpd_4 Anoxia 76 Xd2 Early_Rec dpd_4 Anoxia 77 Xd3 Early_Rec dpd_4 Anoxia 78 Xd4 Early_Rec dpd_4 Anoxia 79 Xe1 Late_Rec dpd_4 Anoxia 80 Xe2 Late_Rec dpd_4 Anoxia 81 Xe3 Late_Rec dpd_4 Anoxia 82 Xe4 Late_Rec dpd_4 Anoxia 83 Ya1 t0 dpd_12 Anoxia 84 Ya2 t0 dpd_12 Anoxia 85 Ya3 t0 dpd_12 Anoxia 86 Ya4 t0 dpd_12 Anoxia 87 Ya5 t0 dpd_12 Anoxia 88 Ya6 t0 dpd_12 Anoxia 89 Yb1 Early_Anoxia dpd_12 Anoxia 90 Yb2 Early_Anoxia dpd_12 Anoxia 91 Yb3 Early_Anoxia dpd_12 Anoxia 92 Yb4 Early_Anoxia dpd_12 Anoxia 93 Yb5 Early_Anoxia dpd_12 Anoxia 94 Yb6 Early_Anoxia dpd_12 Anoxia 95 Yc1 Late_Anoxia dpd_12 Anoxia 96 Yc2 Late_Anoxia dpd_12 Anoxia 97 Yc3 Late_Anoxia dpd_12 Anoxia 98 Yc4 Late_Anoxia dpd_12 Anoxia 99 Yc5 Late_Anoxia dpd_12 Anoxia 100 Yc6 Late_Anoxia dpd_12 Anoxia 101 Yd1 Early_Rec dpd_12 Anoxia 102 Yd2 Early_Rec dpd_12 Anoxia 103 Yd3 Early_Rec dpd_12 Anoxia 104 Yd4 Early_Rec dpd_12 Anoxia 105 Yd5 Early_Rec dpd_12 Anoxia 106 Yd6 Early_Rec dpd_12 Anoxia 107 Ye1 Late_Rec dpd_12 Anoxia 108 Ye2 Late_Rec dpd_12 Anoxia 109 Ye3 Late_Rec dpd_12 Anoxia 110 Ye4 Late_Rec dpd_12 Anoxia 111 Ye5 Late_Rec dpd_12 Anoxia 112 Ye6 Late_Rec dpd_12 Anoxia 113 Za1 t0 dpd_20 Anoxia 114 Za2 t0 dpd_20 Anoxia 115 Za3 t0 dpd_20 Anoxia 116 Za4 t0 dpd_20 Anoxia 117 Zb1 Early_Anoxia dpd_20 Anoxia 118 Zb2 Early_Anoxia dpd_20 Anoxia 119 Zb3 Early_Anoxia dpd_20 Anoxia 120 Zb4 Early_Anoxia dpd_20 Anoxia 121 Zc1 Late_Anoxia dpd_20 Anoxia 122 Zc2 Late_Anoxia dpd_20 Anoxia 123 Zc3 Late_Anoxia dpd_20 Anoxia 124 Zc4 Late_Anoxia dpd_20 Anoxia 125 Zd1 Early_Rec dpd_20 Anoxia 126 Zd2 Early_Rec dpd_20 Anoxia 127 Zd3 Early_Rec dpd_20 Anoxia 128 Zd4 Early_Rec dpd_20 Anoxia 129 Ze1 Late_Rec dpd_20 Anoxia 130 Ze2 Late_Rec dpd_20 Anoxia 131 Ze3 Late_Rec dpd_20 Anoxia 132 Ze4 Late_Rec dpd_20 Anoxia > dds <- DESeqDataSet(se, design = ~ group)
(this I had to troubleshoot extensively but is likely the contributor to my problems)
> dds class: DESeqDataSet dim: 26751 132 exptData(0): assays(1): counts rownames(26751): 12S 16S ... zzef1 zzz3 rowRanges metadata column names(0): colnames(132): A1 A2 ... Ze3 Ze4 colData names(4): samples condition stage group > dds <- estimateSizeFactors(dds) > mnc <- rowMeans(counts(dds, normalized=TRUE)) > dds1 <- dds[ mnc > 1, ] > dds1 class: DESeqDataSet dim: 25945 132 exptData(0): assays(1): counts rownames(25945): 12S 16S ... zzef1 zzz3 rowRanges metadata column names(0): colnames(132): A1 A2 ... Ze3 Ze4 colData names(5): samples condition stage group sizeFactor > dds1.Epiboly <- dds1[dds1$group == "Temp", dds1$stage == "Epiboly", design= ~condition ] > dds1.Epiboly <- DESeq(dds1.Epiboly) using pre-existing size factors estimating dispersions gene-wise dispersion estimates Error in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified. Levels or combinations of levels without any samples have resulted in column(s) of zeros in the model matrix. See the section 'Model matrix not full rank' in vignette('DESeq2')
Thank you for any help with this!
> sessionInfo() R version 3.2.1 (2015-06-18) Platform: x86_64-unknown-linux-gnu (64-bit) Running under: CentOS release 6.8 (Final) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] DESeq2_1.8.2 RcppArmadillo_0.6.500.4.0 [3] Rcpp_0.12.3 GenomicFeatures_1.20.6 [5] AnnotationDbi_1.30.1 Biobase_2.28.0 [7] BiocParallel_1.2.22 GenomicAlignments_1.4.2 [9] Rsamtools_1.20.5 Biostrings_2.36.4 [11] XVector_0.8.0 GenomicRanges_1.20.8 [13] GenomeInfoDb_1.4.3 IRanges_2.2.9 [15] S4Vectors_0.6.6 BiocGenerics_0.14.0 loaded via a namespace (and not attached): [1] RColorBrewer_1.1-2 futile.logger_1.4.1 plyr_1.8.3 [4] bitops_1.0-6 futile.options_1.0.0 tools_3.2.1 [7] zlibbioc_1.14.0 rpart_4.1-10 biomaRt_2.24.1 [10] annotate_1.46.1 RSQLite_1.0.0 gtable_0.1.2 [13] lattice_0.20-33 DBI_0.3.1 gridExtra_2.0.0 [16] genefilter_1.50.0 cluster_2.0.3 rtracklayer_1.28.10 [19] locfit_1.5-9.1 nnet_7.3-12 grid_3.2.1 [22] XML_3.98-1.3 survival_2.38-3 foreign_0.8-66 [25] latticeExtra_0.6-26 Formula_1.2-1 geneplotter_1.46.0 [28] ggplot2_2.0.0 lambda.r_1.1.7 scales_0.3.0 [31] Hmisc_3.17-1 splines_3.2.1 xtable_1.8-0 [34] colorspace_1.2-6 acepack_1.3-3.3 RCurl_1.95-4.7 [37] munsell_0.4.2
Hi Michael,
Thank you! This is a much better approach.
-Amie