Rimma
@rimma-21441
Hello, I'm struggling with batch correction for RNA-seq data in DESeq2. My colData looks like this (10 samples, 6 controls + 4 treated, in 2 batches):
samples condition batch
100 PH7 1
101 PH7 1
103 PH7 1
63 PH7 1
64 ctr 1
74 ctr 1
75 ctr 1
76 ctr 2
88 ctr 2
99 ctr 2
As far as I understood from this post, my problem is that some conditions belong to only one batch. For example, all "PH7" samples are in batch 1. I tried to do what was suggested in that post:
mm <- model.matrix(~ batch + condition, colData(dds))
Then I looked for columns that are all zeros, but I don't have any: every column has a 1 in at least one row.
Is there a way to do this analysis?
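A minimal base-R sketch of that check, run on the colData posted above. `coldata` is a hypothetical stand-in for `colData(dds)`, and the full-rank test via `qr()` is an addition beyond the all-zero-column check: a design can have no all-zero column and still be inestimable if one column is a linear combination of others. Note that `batch` has to be a factor; if it is stored as a number, `model.matrix()` treats it as a single numeric covariate.

```r
# colData as posted above (10 samples, 2 batches); hypothetical stand-in for colData(dds)
coldata <- data.frame(
  sample    = c("100", "101", "103", "63", "64", "74", "75", "76", "88", "99"),
  condition = factor(c(rep("PH7", 4), rep("ctr", 6)), levels = c("ctr", "PH7")),
  batch     = factor(c(rep(1, 7), rep(2, 3)))
)

# Same formula as in the question
mm <- model.matrix(~ batch + condition, data = coldata)

# Check 1: all-zero columns (a factor level with no samples)
zero_cols <- colnames(mm)[colSums(abs(mm)) == 0]

# Check 2: the design can only be fit if it has full column rank
full_rank <- qr(mm)$rank == ncol(mm)

zero_cols   # character(0): no all-zero columns
full_rank   # TRUE: batch and condition are estimable for this colData
```

For this colData, PH7 appears only in batch 1, but batch 1 also contains controls, so the batch and condition effects can still be separated.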
I tried it, and it shows this error:
I don't get that error when I run this design and this column data. Maybe check your code?
Thank you for the reply, Michael!
I simplified the colData a bit for the post. Does it make a difference that my actual colData looks like this? The main difference I see is that the third batch contains only conditions that don't appear in any other batch:
Otherwise my code looks fine to me, but I will check it again.
Yes it makes a difference. This is why it's good to try to describe your actual data, so we don't go back and forth while talking about different datasets.
In your actual dataset, you can't control for batch effects because your batch 3 is confounded with your condition there. This means that your results cannot be trusted entirely, regardless of what statistical method you use, because you can't tell batch 3 apart from that condition.
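To illustrate why the all-zero-column check does not catch this kind of confounding, here is a sketch on hypothetical data: the posted colData plus a made-up third batch of two samples whose condition ("condX", an invented label) occurs in no other batch. The actual colData from the follow-up is not reproduced here.

```r
# Hypothetical: posted colData plus a third batch holding only "condX"
coldata3 <- data.frame(
  condition = factor(c(rep("PH7", 4), rep("ctr", 6), rep("condX", 2)),
                     levels = c("ctr", "PH7", "condX")),
  batch     = factor(c(rep(1, 7), rep(2, 3), rep(3, 2)))
)

mm3 <- model.matrix(~ batch + condition, data = coldata3)

# No column is all zeros...
no_zero_cols <- !any(colSums(abs(mm3)) == 0)

# ...but the batch3 and conditioncondX columns are identical, so the
# matrix is rank deficient and the two effects cannot be told apart
aliased   <- identical(mm3[, "batch3"], mm3[, "conditioncondX"])
rank_def  <- qr(mm3)$rank < ncol(mm3)

c(no_zero_cols = no_zero_cols, aliased = aliased, rank_deficient = rank_def)
```

Any difference between condX and the other samples is indistinguishable from a batch 3 effect, whatever method is used to fit it.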
While this doesn't solve that particular problem, my preferred approach to deal with the two batches within control at this point would be to use SVA to capture heterogeneity that is orthogonal to the condition. We have example code in the workflow on how to do this.
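A sketch of the SVA approach with `svaseq()` from the sva package, written along the lines of the SVA section of the Bioconductor RNA-seq workflow. Assumptions: simulated data from `makeExampleDESeqDataSet()` stands in for the real `dds`, its `condition` column stands in for the real condition variable, and the number of surrogate variables (`n.sv = 2`) is chosen by hand rather than estimated. This does not resolve the batch 3 confounding described above.

```r
library(DESeq2)
library(sva)

# Simulated stand-in data: 1000 genes, 10 samples, two-level "condition"
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 10)
dds <- estimateSizeFactors(dds)

# Normalized counts, dropping very low-count genes
dat <- counts(dds, normalized = TRUE)
dat <- dat[rowMeans(dat) > 1, ]

# Full model (condition) and null model (intercept only)
mod  <- model.matrix(~ condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))

# Estimate 2 surrogate variables (n.sv fixed by hand: an assumption)
svseq <- svaseq(dat, mod, mod0, n.sv = 2)

# Add the surrogate variables to the design, ahead of condition
ddssva <- dds
ddssva$SV1 <- svseq$sv[, 1]
ddssva$SV2 <- svseq$sv[, 2]
design(ddssva) <- ~ SV1 + SV2 + condition
```

Because the null model keeps only the intercept and the full model adds condition, the surrogate variables are estimated to capture variation not explained by condition, such as the two batches within the controls.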
Sorry for this.
Yes, I understand the problem now.
Thank you for clarifications :)