Hi,
I use DESeq2 in R for differential expression analysis. I already know the design is terrible, but this is what I have to work with ;)
Question 1: Can I remove the batch effect between two groups if their samples do not share a batch, but some of their samples share a batch with a third group? (Groups are different genotypes; batches are different experiments.)
Example: I want to compare group gA (samples s1, s2, s3) with group gC (samples s8, s9, s10, s11). All gA samples come from batch bX and all gC samples come from batch bY, so no sample of gA shares a batch with a sample of gC, and with these two groups alone batch effect removal would be impossible. However, there is group gB (samples s4, s5, s6, s7), which has 2 samples from batch bX and 2 from batch bY.
myColData <- data.frame(row.names=c("s1","s2","s3","s4","s5","s6","s7","s8","s9","s10","s11"),
group=c("gA","gA","gA","gB","gB","gB","gB","gC","gC","gC", "gC"),
batch=c("bX","bX","bX","bX","bX","bY","bY","bY","bY","bY", "bY"))
print(myColData)
# simulated counts: 100 genes x 11 samples (seed fixed so the example is reproducible)
set.seed(1)
myCounts <- matrix(round(runif(1100, 0, 20)), ncol=11, nrow=100)
My approach would be to 1) create a DESeqDataSet with the known batch effect in the design, 2) run the DESeq function on it, and 3) extract the results with only gC and gA given in the contrast:
library(DESeq2)
myDE1 <- DESeqDataSetFromMatrix(myCounts, colData=myColData, design=~batch+group)
myDE2 <- DESeq(myDE1)
myDE3 <- results(myDE2, contrast=c("group", "gC", "gA"))
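One way to see why gB matters for this design is to look at the design matrix with base R alone. The following is only a sketch: it rebuilds the sample table from above, and the names `colDataAll`, `mmAll`, `colDataSub` and `mmSub` are introduced here for illustration. It compares the rank of the `~batch+group` design matrix with and without the gB samples.

```r
# Sample table as described above (rebuilt here so the block is self-contained)
colDataAll <- data.frame(
  row.names = paste0("s", 1:11),
  group = factor(rep(c("gA", "gB", "gC"), times = c(3, 4, 4))),
  batch = factor(rep(c("bX", "bY"), times = c(5, 6)))
)

# Design matrix with all three groups: gB appears in both batches
mmAll <- model.matrix(~ batch + group, data = colDataAll)
ncol(mmAll)        # number of coefficients to estimate
qr(mmAll)$rank     # equals ncol(mmAll) when the design is full rank

# Same design after dropping gB: batch and group are now fully confounded
colDataSub <- droplevels(colDataAll[colDataAll$group != "gB", ])
mmSub <- model.matrix(~ batch + group, data = colDataSub)
ncol(mmSub)
qr(mmSub)$rank     # smaller than ncol(mmSub): the design is not full rank
```

With all three groups the matrix has full rank, so batch and group coefficients can be estimated separately. Without gB the rank drops below the number of columns, which is the situation in which DESeq2 refuses the design as not full rank.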
My 1st question: Is this a correct way to handle the batch effect?
My 2nd question: Let's assume there is no batch here, just the groups. DESeq2 gives different results depending on the input in step 1. If I pass the whole matrix with all samples (gA, gB and gC) and then compare only the groups I care about in step 3 (gC vs. gA, as done above), I get different results than if I pass a matrix containing only the groups of interest to begin with. Is one of the two ways incorrect? Shouldn't the results (myDE03 & myDE13) theoretically be the same?
#input whole matrix, compare only gC and gA in the end
myDE01 <- DESeqDataSetFromMatrix(myCounts, colData=myColData, design=~group)
myDE02 <- DESeq(myDE01)
myDE03 <- results(myDE02, contrast=c("group", "gC", "gA"))
myDE03
#input only samples from gC and gA and compare them
myDE11 <- DESeqDataSetFromMatrix(myCounts[,c(1:3,8:11)], colData=myColData[c(1:3,8:11),], design=~group)
myDE12 <- DESeq(myDE11)
myDE13 <- results(myDE12, contrast=c("group", "gC", "gA"))
myDE13
Any help is appreciated
Thanks for the answer, I am just not sure I understand it correctly. So the first two code snippets I posted are the correct way to handle this situation with DESeq2?
The 2nd question (with the 3rd code snippet) is not about the batch effect. I just observed that the results differ for different input matrices, even though the samples being compared are the same (apparently the mere presence of other samples changes the calculation).
Yes
Yes, this is expected, and it is covered in the FAQ in our vignette.
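As a loose analogy for why extra samples can change a two-group comparison, consider an ordinary linear model in base R. This is a sketch with simulated normal data, not DESeq2's actual procedure, and all names in it (`y`, `fitAll`, `fitSub`, `keep`) are made up for illustration. The estimated gC vs. gA difference is the same in both fits, but the residual variance is estimated from all samples in the first fit and from only seven samples in the second, so standard errors and p-values differ. DESeq2 similarly shares information across all samples in the dataset, for example when estimating dispersions.

```r
set.seed(1)

# Hypothetical measurements for one gene in 11 samples from three groups
group <- factor(rep(c("gA", "gB", "gC"), times = c(3, 4, 4)))
y <- rnorm(11, mean = c(gA = 5, gB = 6, gC = 7)[as.character(group)])

# Fit with all groups, then read off the gC vs. gA coefficient
fitAll <- lm(y ~ group)

# Fit with only the gA and gC samples
keep <- group != "gB"
ySub <- y[keep]
groupSub <- droplevels(group[keep])
fitSub <- lm(ySub ~ groupSub)

# The estimated difference gC - gA is identical in both fits ...
coef(fitAll)["groupgC"]
coef(fitSub)["groupSubgC"]

# ... but the residual standard error and its degrees of freedom are not
sigma(fitAll); df.residual(fitAll)
sigma(fitSub); df.residual(fitSub)
```

Neither fit is wrong; they simply use different amounts of data to estimate the variability.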