Thank you very much for the response. Please find my codes below,
# Twelve different mouse was used to find DEG between four somite stages
mouse.lr = c("one","two","three","four","five","six","seven","eight","nine","ten","eleven","twelve")
stage.lr = c(rep("3L",3), rep("4L",3), rep("5L",3), rep("6L",3))
# HTSeq count files
S3L1.fl = "../HTseq_files/v7_3L1.dat.gene"; S3L2.fl = "../HTseq_files/v7_3L2.dat.gene"; S3L3.fl = "../HTseq_files/v7_3L3.dat.gene"
S4L1.fl = "../HTseq_files/v7_4L1.dat.gene"; S4L2.fl = "../HTseq_files/v7_4L2.dat.gene"; S4L3.fl = "../HTseq_files/v7_4L3.dat.gene"
S5L2.fl = "../HTseq_files/v7_5L2.dat.gene"; S5L3.fl = "../HTseq_files/v7_5L3.dat.gene"; S5L4.fl = "../HTseq_files/v7_5L4.dat.gene"
S6L2.fl = "../HTseq_files/v7_6L2.dat.gene"; S6L3.fl = "../HTseq_files/v7_6L3.dat.gene"; S6L4.fl = "../HTseq_files/v7_6L4.dat.gene"
S3R1.fl = "../HTseq_files/v7_3R1.dat.gene"; S3R2.fl = "../HTseq_files/v7_3R2.dat.gene"; S3R3.fl = "../HTseq_files/v7_3R3.dat.gene"
S4R1.fl = "../HTseq_files/v7_4R1.dat.gene"; S4R2.fl = "../HTseq_files/v7_4R2.dat.gene"; S4R3.fl = "../HTseq_files/v7_4R3.dat.gene"
S5R2.fl = "../HTseq_files/v7_5R2.dat.gene"; S5R3.fl = "../HTseq_files/v7_5R3.dat.gene"; S5R4.fl = "../HTseq_files/v7_5R4.dat.gene"
S6R2.fl = "../HTseq_files/v7_6R2.dat.gene"; S6R3.fl = "../HTseq_files/v7_6R3.dat.gene"; S6R4.fl = "../HTseq_files/v7_6R4.dat.gene"
# Data frame
RNASeqDesign2 = data.frame(
sample.names= c("3L1", "3L2", "3L3", "4L1", "4L2", "4L3", "5L2", "5L3", "5L4", "6L2", "6L3", "6L4",
"3R1", "3R2", "3R3", "4R1", "4R2", "4R3", "5R2", "5R3", "5R4", "6R2", "6R3", "6R4"),
count.files = c(S3L1.fl, S3L2.fl, S3L3.fl, S4L1.fl, S4L2.fl, S4L3.fl, S5L2.fl, S5L3.fl, S5L4.fl, S6L2.fl, S6L3.fl, S6L4.fl,
S3R1.fl, S3R2.fl, S3R3.fl, S4R1.fl, S4R2.fl, S4R3.fl, S5R2.fl, S5R3.fl, S5R4.fl, S6R2.fl, S6R3.fl, S6R4.fl),
condition = c(rep("3left",3), rep("4left",3), rep("5left",3), rep("6left",3),
rep("3right",3), rep("4right",3), rep("5right",3), rep("6right",3)),
mouse = c(rep(mouse.lr,2)),
side = c(rep("left",12),rep("right",12)),
stage = c(rep(stage.lr,2)))
# our design is complex (pasted below) which gives Error in checkFullRank(modelMatrix) :
# ddsHTSeq2<-DESeqDataSetFromHTSeqCount(RNASeqDesign2, directory="/data/home/praveen/Pn_data/LR_RNA-seq/DEG_deseq2/",design=~ condition + condition:stage +condition:side+condition:mouse)
# the model matrix is not full rank, so the model cannot be fit as specified.
# Levels or combinations of levels without any samples have resulted in
# column(s) of zeros in the model matrix.
# hence I use simple condition so as to make DESeqDataSet and then manually create and edit model.matrix
ddsHTSeq2<-DESeqDataSetFromHTSeqCount(RNASeqDesign2, directory="/data/home/praveen/Pn_data/LR_RNA-seq/DEG_deseq2/",design=~ condition)
colData(ddsHTSeq2)$condition<-factor(colData(ddsHTSeq2)$condition,levels=c("3left","4left","5left","6left", "3right","4right","5right","6right"))
#creating the model matrix
mm = model.matrix(~ condition + condition:mouse +condition:side,RNASeqDesign2)
mm.2 = mm[,colSums(mm^2)!=0]
# This throws the error
dds.all.2<-DESeq(ddsHTSeq2,full=mm.2,betaPrior=FALSE)
error: inv(): matrix appears to be singular
Error: inv(): matrix appears to be singular
#============================SESSION INFO===================================
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)
locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
[7] LC_PAPER=en_US.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] DESeq2_1.10.1 RcppArmadillo_0.6.500.4.0
[3] Rcpp_0.12.3 SummarizedExperiment_1.0.2
[5] Biobase_2.30.0 GenomicRanges_1.22.4
[7] GenomeInfoDb_1.6.3 IRanges_2.4.7
[9] S4Vectors_0.8.11 BiocGenerics_0.16.1
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-2 futile.logger_1.4.1 plyr_1.8.3
[4] XVector_0.10.0 futile.options_1.0.0 zlibbioc_1.16.0
[7] rpart_4.1-10 RSQLite_1.0.0 annotate_1.48.0
[10] gtable_0.1.2 lattice_0.20-33 DBI_0.3.1
[13] gridExtra_2.0.0 genefilter_1.52.1 cluster_2.0.3
[16] locfit_1.5-9.1 grid_3.2.0 nnet_7.3-12
[19] AnnotationDbi_1.32.3 XML_3.98-1.3 survival_2.38-3
[22] BiocParallel_1.4.3 foreign_0.8-66 latticeExtra_0.6-28
[25] Formula_1.2-1 geneplotter_1.48.0 ggplot2_2.0.0
[28] lambda.r_1.1.7 Hmisc_3.17-2 scales_0.3.0
[31] splines_3.2.0 colorspace_1.2-6 xtable_1.8-2
[34] acepack_1.3-3.3 munsell_0.4.3
Biological question is that we are trying to find DEG between left and right side of embryo in each of the 4 somite stages(3, 4, 5 &6). Yes I understand that stage and condition are similar. I can avoid the column stage. Am I right in using DESeqDataSetFromHTSeqCount with simple design and all?. I can finally do something like this to get DEG for each somite stage.
res.3 <- results(dds.all.2, contrast=c("condition","3left","3right"))
res.4 <- results(dds.all.2, contrast=c("condition","4left","4right"))
res.5 <- results(dds.all.2, contrast=c("condition","5left","5right"))
res.6 <- results(dds.all.2, contrast=c("condition","6left","6right"))
Please let me know.
EDIT
confounding factors are different mouse and that there are just two sides among stages
Yes you can use a design of ~condition and use these comparisons to compare for each somite stage.
It's not so easy to control for the mouse effects using DESeq2 in this particular experimental design, if I've figured it out from your code. An easier approach would be to use the duplicateCorrelation() function in the limma RNA-seq pipeline, to inform which samples are from the same mouse.
Thank you so much.