I am having a lot of trouble designing a formula to look at differential gene expression in my study. This is largely based around errors in my DEseq2 code leading to “Error in checkFullRank(modelMatrix)” warnings. I have looked at the manual and multiple comments/forum threads, but cannot work it out.
Essentially I have three animal morphs, two of which are solid colours and one that has stripes. I want to look at the gene expression difference between the three colours, so we have sequenced (RNA-Seq) tissue samples of all three (Red, Purple and Blue), but due to the fact that the colours are on different parts of the Red-Blue body we have sequenced two body landmarks from each; a truncated example can be seen below:
Condition Individual Tissue landmark
Red 1 A
Blue 1 B
Red 2 A
Red 2 B
Purple 3 A
Purple 3 B
I have been able to look at pairwise comparisons of colour specific gene expression (taking into account the landmark) following:
d <-data.frame(s1=c(4:8),s2=c(5:9),s3=c(4:8),s4=c(5:9),s5=c(4:8),s6=c(5:9),s7=c(1:5),s8=c(1:5),s9=c(4:8),s10=c(5:9),s11=c(4:8),s12=c(5:9),s13=c(4:8),s14=c(5:9),s15=c(4:8),s16=c(5:9),s17=c(4:8),s18=c(5:9)) #Not real data
row.names(d) <- c("clust1","clust2","clust3","clust4","clust5")
d
countData <- as.matrix(d)
colData <- data.frame(row.names = c("s1","s2","s3","s4","s5","s6","s7","s8","s9","s10","s11","s12","s13","s14","s15","s16","s17","s18"),
condition = as.factor(c("Red","Blue","Red","Blue","Red","Red","Purple","Purple","Purple","Purple","Red","Red","Red","Red","Purple","Purple","Red","Blue")),
indiv = as.factor(c("RB_1","RB_1","RB_2","RB_2","R_1","R_1","P_1","P_1","P_2","P_2","R_2","R_2","R_3","R_3","P_3","P_3","RB_3","RB_3")),
landmark = as.factor(c("L1","L2","L1","L2","L1","L2","L1","L2","L1","L2","L1","L2","L1","L2","L1","L2","L1","L2")))
colData
dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ landmark + condition)
However, really I need to include the individual in a paired analysis (which I do not think the above fully accounts for), but here is where I encounter problems.
I tried:
dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ landmark + indiv + condition)
However, this returns the error:
---
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
Please read the vignette section 'Model matrix not full rank':
---
I have checked the manual, and tried many iterations with various variables and “nested” groups, but I just seem to yo-yo between Linear combinations and columns of 0’s.
Is what I am wanting possible with the dataset? And if so how would I account for individual variation between my samples?
Hope that is clear. Much obliged for any help.
James