Hi,
I wonder if I can regress out batch effects from my single-cell data using following strategy. Though the tutorial suggested using limma to remove batch effects, it could output negative counts, which is not ideal in my case. I'd like to utilize DESeq2 result as much as possible, and I'm wondering if I am doing it in a correct way.
After fitting model as below:
dds <- DESeqDataSetFromMatrix(
countData = sub_count_matrix,
colData = cell_metadata,
design = ~ Batch + Condition
)
# size factor was pre-estimated using scran
sizeFactors(dds) <- size_factors
dds <- DESeq(
dds,
test = "LRT",
reduced = ~Batch,
useT = TRUE,
minmu = 1e-6,
minReplicatesForReplace = Inf,
parallel = TRUE,
fitType = "local"
)
I regressed out batch effects as following and stored the 'corrected' counts in corrected_count_matrix
.
res_batch <- results(dds, contrast = c("Batch", "Batch2", "Batch1"))
corrected_count_matrix <- counts(dds, normalized = TRUE)
cell_idx_Batch2 <- which(cell_metadata$Hour == "Batch2")
for (peakIdx in 1:dim(res_batch)[1]) {
peak_batch_effect <- 2^(res_batch[peakIdx, "log2FoldChange"])
corrected_count_matrix[peakIdx, cell_idx_Batch2] <- corrected_count_matrix[peakIdx, cell_idx_Batch2] / peak_batch_effect
}
Let me know if that strategy makes sense. Thanks a lot in advance!
The limma batch correction is done on the scale of log-counts per million, making it impossible to get negative counts as a result.