Hello,
I would like to use DESeq2 on ATAC-seq data from human disease and control individuals. I know based on PCA plot that age is a big confounder. So, I want to first correct for age as a covariate, and then use RUVseq on the age-corrected count set to identify other unknown confounders and batch effects. What's the best way to do it?
Should I do this below or something else? 1) Using design(ddsruv) <- ~ W1 + age + disease
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = condition,
design = ~ age + disease)
dds$disease <- relevel(dds$disease, ref = "Control")
#DE analysis
dds<-DESeq(dds)
res<-results(dds)
#RUVseq
set <- newSeqExpressionSet(counts(dds))
idx <- rowSums(counts(set) > 5) >= 2
set <- set[idx, ]
set <- betweenLaneNormalization(set, which="upper")
not.sig <- rownames(res)[which(res$pvalue > .95)]
empirical <- rownames(set)[ rownames(set) %in% not.sig ]
set <- RUVg(set, empirical, k=1)
pData(set)
#re-run DEseq w/ RUV factors added
ddsruv<-dds
ddsruv$W1 <-set$W_1
design(ddsruv) <- ~ W1 + age + disease
ddsruv<-DESeq(ddsruv)
ddsruvClean <- ddsruv[which(mcols(ddsruv)$betaConv),]
resruv<-results(ddsruvClean)
summary(resruv)
Alternatively, should I just use a design ~W1 + disease (perhaps using k=2, and adding W2), because RUVseq would take into account the known age-related confounding as well.
Thanks, Lisa
Thank you, I think that makes sense. And should I still use the original dds matrix after I do the RUVseq analysis? Is this correct?
This looks correct to me. You can also plot W1 and W2 against age and disease.