I have an RNASeq dataset with data from multiple sources and I have decided to use RUVSeq along with DESeq2 for the batch correction and DEG analysis. Following the information on the vignette here, I have the following code
#generating dds object and result
dds <- DESeq2::DESeqDataSetFromMatrix (countData = as.data.frame(final_merged_counts), colData = col_data , design = ~ Type )
dds$Type <- relevel(dds$Type,ref = "Control")
dds <- DESeq(dds,test = "LRT",reduced = ~1)
res <- results(dds)
#batch correction
set <- newSeqExpressionSet(counts(dds))
idx <- rowSums(counts(set) >5) >=2
set <- set[idx,]
set <- betweenLaneNormalization(set,which = "upper")
not.sig <- rownames(set)[which(res$pvalue>0.1)]
empirical <- rownames(set)[rownames(set) %in% not.sig]
set <- RUVg(set,empirical, k=7)
pData(set)
#results
dds$W1 <- set$W_1
dds$W2 <- set$W_2
dds$W3 <- set$W_3
dds$W4 <- set$W_4
dds$W5 <- set$W_5
dds$W6 <- set$W_6
dds$W7 <- set$W_7
design(dds) <- ~ W1 + W2 + W3 + W4+ W5+W6+W7+ Type
dds <- DESeq(dds)
dds <- dds[which(mcols(dds)$betaConv),]
result_after_ruv <- results(dds, contrast = c("Type", "Tumor", "Control"))
I would now like to use limma::removeBatchEffect
on the matrix assay(vsd)
however I am not sure how to incorporate the factors from RUVSeq
with the function. I would like to get final batch corrected counts for downstream analysis. Does it make sense to use counts(dds,normalized=TRUE)
I also tried
vsd <- vst(dds,blind=FALSE)
mat <- assay(vsd)
mat2 <- limma::removeBatchEffect(mat,batch = vsd$W1+vsd$W2+vsd$W3+vsd$W4+vsd$W5+vsd$W6+vsd$W7)
However the final results are not correct since all the values in a row are same
TCGA-AB-2955-03A-01T-0734-13 TCGA-AB-2936-03A-01T-0740-13 TCGA-AB-2913-03A-01T-0734-13 TCGA-AB-2928-03A-01T-0740-13
ENSG00000000003 4.380496 4.380496 4.380496 4.380496
ENSG00000000005 2.511648 2.511648 2.511648 2.511648
ENSG00000000419 9.187214 9.187214 9.187214 9.187214
ENSG00000000457 8.844177 8.844177 8.844177 8.844177
ENSG00000000460 8.529430 8.529430 8.529430 8.529430
TCGA-AB-2878-03A-01T-0734-13
ENSG00000000003 4.380496
ENSG00000000005 2.511648
ENSG00000000419 9.187214
ENSG00000000457 8.844177
ENSG00000000460 8.529430
Would really appreciate any help/guidance on this issue. Thank you
Thank you for the answer. I understand the issue now, do you think this would be right way to obtain the counts then:
Sorry I dont fully understand what you mean by "the matrix with the RUVg factors "
pData(set)
should contain a matrix (or data.frame, not sure) with the RUVseq factors that can go into removeBatchEffect. Use this matrix with covariates argument, just the matrix, not with+
or any formula.Ah ok, sorry if this out of scope of this question but can you explain why this works? What I mean is that the dataframe,
pData(set)
only contains information about the RUVg factors and not the initial covariate from the experiment. Also since we do not specify any value for thebatch
argument, does the function still work as intended?Thank you for your help.
you do not pass any group information to
set
so pDats only contains the factors.