Question

Getting batch corrected counts after RUVSeq

0

Entering edit mode

Shashank • 0

@fdf47975

Last seen 18 months ago

Germany

I have an RNASeq dataset with data from multiple sources and I have decided to use RUVSeq along with DESeq2 for the batch correction and DEG analysis. Following the information on the vignette here, I have the following code

#generating dds object and result
dds <- DESeq2::DESeqDataSetFromMatrix (countData = as.data.frame(final_merged_counts), colData = col_data , design = ~ Type )
dds$Type <- relevel(dds$Type,ref = "Control")
dds <- DESeq(dds,test = "LRT",reduced = ~1)
res <- results(dds)
#batch correction 
set <- newSeqExpressionSet(counts(dds))
idx <- rowSums(counts(set) >5) >=2
set <- set[idx,]
set <- betweenLaneNormalization(set,which = "upper")
not.sig <- rownames(set)[which(res$pvalue>0.1)]
empirical <- rownames(set)[rownames(set) %in% not.sig]
set <- RUVg(set,empirical, k=7)
pData(set)

#results
dds$W1 <- set$W_1
dds$W2 <- set$W_2
dds$W3 <- set$W_3
dds$W4 <- set$W_4
dds$W5 <- set$W_5
dds$W6 <- set$W_6
dds$W7 <- set$W_7
design(dds) <- ~ W1 + W2 + W3 + W4+ W5+W6+W7+ Type 
dds <- DESeq(dds)
dds <- dds[which(mcols(dds)$betaConv),]
result_after_ruv <- results(dds, contrast = c("Type", "Tumor", "Control"))

I would now like to use limma::removeBatchEffect on the matrix assay(vsd) however I am not sure how to incorporate the factors from RUVSeq with the function. I would like to get final batch corrected counts for downstream analysis. Does it make sense to use counts(dds,normalized=TRUE) I also tried

vsd <- vst(dds,blind=FALSE)
mat <- assay(vsd)
mat2 <- limma::removeBatchEffect(mat,batch = vsd$W1+vsd$W2+vsd$W3+vsd$W4+vsd$W5+vsd$W6+vsd$W7)

However the final results are not correct since all the values in a row are same

                TCGA-AB-2955-03A-01T-0734-13 TCGA-AB-2936-03A-01T-0740-13 TCGA-AB-2913-03A-01T-0734-13 TCGA-AB-2928-03A-01T-0740-13
ENSG00000000003                     4.380496                     4.380496                     4.380496                     4.380496
ENSG00000000005                     2.511648                     2.511648                     2.511648                     2.511648
ENSG00000000419                     9.187214                     9.187214                     9.187214                     9.187214
ENSG00000000457                     8.844177                     8.844177                     8.844177                     8.844177
ENSG00000000460                     8.529430                     8.529430                     8.529430                     8.529430
                TCGA-AB-2878-03A-01T-0734-13
ENSG00000000003                     4.380496
ENSG00000000005                     2.511648
ENSG00000000419                     9.187214
ENSG00000000457                     8.844177
ENSG00000000460                     8.529430

Would really appreciate any help/guidance on this issue. Thank you

BatchEffect RUVSeq RNASeq DESeq2 • 1.8k views

ADD COMMENT • link updated 19 months ago by ATpoint ★ 4.8k • written 19 months ago by Shashank • 0

score 1 · Answer 1 · 2023-09-12

1

Entering edit mode

ATpoint ★ 4.8k

@atpoint-13662

Last seen 1 hour ago

Germany

You're using removeBatchEffect incorrectly. The batch argument takes factors, but you have numeric values. You can feed the matrix with the RUVg factors (I know, name collision, RUVg "factors" are the numeric values, but batch= takes factor()) into the covariate= argument of the limma function.

ADD COMMENT • link 19 months ago ATpoint ★ 4.8k

0

Entering edit mode

Thank you for the answer. I understand the issue now, do you think this would be right way to obtain the counts then:

mat2 <- limma::removeBatchEffect(mat,covariates = vsd$W1+vsd$W2+vsd$W3+vsd$W4+vsd$W5+vsd$W6+vsd$W7)

Sorry I dont fully understand what you mean by "the matrix with the RUVg factors "

ADD REPLY • link 19 months ago Shashank • 0

1

Entering edit mode

pData(set) should contain a matrix (or data.frame, not sure) with the RUVseq factors that can go into removeBatchEffect. Use this matrix with covariates argument, just the matrix, not with + or any formula.

ADD REPLY • link 19 months ago ATpoint ★ 4.8k

0

Entering edit mode

Ah ok, sorry if this out of scope of this question but can you explain why this works? What I mean is that the dataframe, pData(set) only contains information about the RUVg factors and not the initial covariate from the experiment. Also since we do not specify any value for the batch argument, does the function still work as intended?

Thank you for your help.

ADD REPLY • link 19 months ago Shashank • 0

1

Entering edit mode

you do not pass any group information to set so pDats only contains the factors.

ADD REPLY • link 19 months ago ATpoint ★ 4.8k