DESeq2 indirect batch effect removal
Solarion • 0
@solarion-22030
Last seen 4.3 years ago
University Hospital Jena, Germany

Hi, I use DESeq2 in R to test for differential expression. I already know the design is terrible, but this is what I have to work with ;)

Question 1: Can I remove the batch effect between two groups if their samples do not share a batch, but some samples of a third group share a batch with each of them? (Groups are different genotypes, batches are different experiments.)

Example: I want to compare group gA (samples s1, s2, s3) with group gC (samples s8, s9, s10, s11). All gA samples come from batch bX and all gC samples come from batch bY, so no gA sample shares a batch with a gC sample, and removing the batch effect using these two groups alone would be impossible. However, there is group gB (samples s4, s5, s6, s7), which has 2 samples from batch bX and 2 from batch bY.

myColData <- data.frame(row.names=c("s1","s2","s3","s4","s5","s6","s7","s8","s9","s10","s11"),
                            group=c("gA","gA","gA","gB","gB","gB","gB","gC","gC","gC", "gC"),
                            batch=c("bX","bX","bX","bX","bX","bY","bY","bY","bY","bY", "bY"))
print(myColData)

myCounts <- matrix(round(runif(1100, 0, 20)), ncol=11, nrow=100)

My approach would be to
1) create a DESeqDataSet with the known batch effect in the design,
2) run the DESeq function on it,
3) extract the results with only gC and gA given in the contrast.

library(DESeq2)
myDE1 <- DESeqDataSetFromMatrix(myCounts, colData=myColData, design=~batch+group)
myDE2 <- DESeq(myDE1)
myDE3 <- results(myDE2, contrast=c("group", "gC", "gA"))

My 1st question: Is this a correct way to handle the batch effect?

My 2nd question: Let's assume there is no batch here, just the groups. DESeq2 will give different results when I give as input (step 1) the whole matrix with all samples and then compare (in step 3) only the groups I care about (as done above, input was gA, gB and gC but I am only interested in gC vs. gA), compared with when I give a matrix with only the groups of interest to begin with. Is one of the ways incorrect? Shouldn't the results (myDE03 and myDE13) theoretically be the same?

#input whole matrix, compare only gC and gA in the end
myDE01 <- DESeqDataSetFromMatrix(myCounts, colData=myColData, design=~group)
myDE02 <- DESeq(myDE01)
myDE03 <- results(myDE02, contrast=c("group", "gC", "gA"))
myDE03

#input only samples from gC and gA and compare them
myDE11 <- DESeqDataSetFromMatrix(myCounts[,c(1:3,8:11)], colData=myColData[c(1:3,8:11),], design=~group)
myDE12 <- DESeq(myDE11)
myDE13 <- results(myDE12, contrast=c("group", "gC", "gA"))
myDE13

Any help is appreciated

@mikelove
Last seen 6 days ago
United States
> myColData
    group batch
s1     gA    bX
s2     gA    bX
s3     gA    bX
s4     gB    bX
s5     gB    bX
s6     gB    bY
s7     gB    bY
s8     gC    bY
s9     gC    bY
s10    gC    bY
s11    gC    bY

Yes, the shared samples help you to estimate the batch effect.
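One way to see this is to look at the design itself. The following is a minimal sketch in base R only (no DESeq2 needed); the names `groupByBatch`, `designFull`, `subData` and `designSub` are illustrative and not from the thread. It cross-tabulates group against batch and checks whether the model matrix for ~ batch + group has full column rank, once with all three groups and once with gB removed.

```r
# Same sample layout as in the question
myColData <- data.frame(
  row.names = paste0("s", 1:11),
  group = factor(c(rep("gA", 3), rep("gB", 4), rep("gC", 4))),
  batch = factor(c(rep("bX", 5), rep("bY", 6)))
)

# Cross-tabulate groups against batches: only gB appears in both batches
groupByBatch <- table(myColData$group, myColData$batch)
print(groupByBatch)

# Model matrix for the design ~ batch + group with all samples
designFull <- model.matrix(~ batch + group, data = myColData)
# Full column rank means all coefficients can be estimated
isFullRank <- qr(designFull)$rank == ncol(designFull)

# Without gB, batch and group are perfectly confounded
subData <- droplevels(myColData[myColData$group != "gB", ])
designSub <- model.matrix(~ batch + group, data = subData)
isSubFullRank <- qr(designSub)$rank == ncol(designSub)

print(c(allGroups = isFullRank, withoutGB = isSubFullRank))
```

With all three groups the batch and group columns are linearly independent, because gB contributes samples to both bX and bY. With gB dropped, the batch column and the gC column are identical, so the two effects cannot be separated.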

DESeq2 will give different results when I give as input (step 1) the whole matrix with all samples and then compare (in step 3) only the groups I care about (as done above, input was gA, gB and gC but I am only interested in gC vs. gA), compared with when I give a matrix with only the groups of interest to begin with. Is one of the ways incorrect?

This is a FAQ in our vignette.


Thanks for the answer; I am just not sure I understand it correctly. So the first two code snippets I posted are the correct way to handle this situation with DESeq2?

The 2nd question (with the 3rd code snippet) is not about the batch effect, I just observed that results differ when having different matrix inputs, even though the samples that are compared are the same (just the presence of other samples changes the calculation apparently).


So the first two code snippets I posted are the correct way to handle this situation with DESeq2?

Yes

I just observed that results differ when having different matrix inputs, even though the samples that are compared are the same (just the presence of other samples changes the calculation apparently).

Yes, this is expected. And it is in our FAQ in the vignette.
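For intuition, here is a rough analogy with an ordinary linear model in base R. It is only an analogy, not DESeq2's actual procedure, and the simulated values and the names `fitAll` and `fitSub` are made up for illustration. The point it sketches: when a third group is in the model, its samples also contribute to the estimate of within-group variability, so the standard error of the gC vs. gA comparison changes even though the two group means do not.

```r
set.seed(1)

# Same group sizes as in the question: 3 x gA, 4 x gB, 4 x gC
group <- factor(rep(c("gA", "gB", "gC"), times = c(3, 4, 4)))

# Simulated log-scale expression values for one gene (hypothetical numbers)
groupMeans <- c(gA = 5, gB = 6, gC = 7)
y <- rnorm(11, mean = groupMeans[as.character(group)], sd = 1)

# Fit with all three groups, then look at gC vs. the reference level gA
fitAll <- lm(y ~ group)

# Fit with only gA and gC
keep <- group != "gB"
groupSub <- droplevels(group[keep])
ySub <- y[keep]
fitSub <- lm(ySub ~ groupSub)

# The estimated difference gC - gA is the same in both fits ...
estAll <- unname(coef(fitAll)["groupgC"])
estSub <- unname(coef(fitSub)["groupSubgC"])

# ... but the standard error differs, because the residual variance
# is pooled over all groups present in the model
seAll <- summary(fitAll)$coefficients["groupgC", "Std. Error"]
seSub <- summary(fitSub)$coefficients["groupSubgC", "Std. Error"]

print(c(estAll = estAll, estSub = estSub))
print(c(seAll = seAll, seSub = seSub))
```

In DESeq2 the picture is similar but not identical: size factors and dispersions are estimated from all samples in the object, so even the log fold changes can shift slightly between the two runs. Comparing `dispersions(myDE02)` with `dispersions(myDE12)` from the question's code should make the difference visible.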
