Question

question about removing both known and unknown confounders

0

Entering edit mode

Lisa • 0

@2d183270

Last seen 2.4 years ago

United States

Hello,

I would like to use DESeq2 on ATAC-seq data from human disease and control individuals. I know based on PCA plot that age is a big confounder. So, I want to first correct for age as a covariate, and then use RUVseq on the age-corrected count set to identify other unknown confounders and batch effects. What's the best way to do it?

Should I do this below or something else? 1) Using design(ddsruv) <- ~ W1 + age + disease

dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = condition,
                              design = ~ age + disease) 
dds$disease <- relevel(dds$disease, ref = "Control")

#DE analysis
dds<-DESeq(dds)
res<-results(dds)

#RUVseq
set <- newSeqExpressionSet(counts(dds))
idx  <- rowSums(counts(set) > 5) >= 2
set  <- set[idx, ]
set <- betweenLaneNormalization(set, which="upper")
not.sig <- rownames(res)[which(res$pvalue > .95)] 
empirical <- rownames(set)[ rownames(set) %in% not.sig ]
set <- RUVg(set, empirical, k=1) 
pData(set)

#re-run DEseq w/ RUV factors added
ddsruv<-dds
ddsruv$W1 <-set$W_1

design(ddsruv) <- ~ W1 + age + disease

ddsruv<-DESeq(ddsruv)
ddsruvClean <- ddsruv[which(mcols(ddsruv)$betaConv),]
resruv<-results(ddsruvClean)
summary(resruv)

Alternatively, should I just use a design ~W1 + disease (perhaps using k=2, and adding W2), because RUVseq would take into account the known age-related confounding as well.

Thanks, Lisa

DESeq2 RUVSeq • 1.3k views

ADD COMMENT • link updated 2.4 years ago by Michael Love 43k • written 2.4 years ago by Lisa • 0

score 0 · Answer 1 · 2022-12-05

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 6 days ago

United States

If my understanding of RUV is correct, I think you should do a LRT in the first DESeq() run (the one that defines not.sig). I would use there:

dds<-DESeq(dds, test="LRT", reduced=~1)
res<-results(dds)

In this way, not.sig will avoid age-related genes. Make sense?

ADD COMMENT • link 2.4 years ago Michael Love 43k

0

Entering edit mode

Thank you, I think that makes sense. And should I still use the original dds matrix after I do the RUVseq analysis? Is this correct?

#input matrix for DESeq2
dds <- DESeqDataSetFromMatrix(countData = cts,
                          colData = condition,
                          design = ~age+disease)
dds$disease <- relevel(dds$disease, ref = "Control")

#DE analysis before RUVseq
dds_for_ruv<-DESeq(dds, test="LRT", reduced=~1)
res<-results(dds_for_ruv)
#RUVseq
library("RUVSeq")
set <- newSeqExpressionSet(counts(dds_for_ruv))
idx  <- rowSums(counts(set) > 5) >= 2
set  <- set[idx, ]
set <- betweenLaneNormalization(set, which="upper")
not.sig <- rownames(res)[which(res$pvalue > .2)] 
empirical <- rownames(set)[ rownames(set) %in% not.sig ]
set <- RUVg(set, empirical, k=2) 
pData(set)

#re-run DEseq with the original dds matrix (including age) w/ RUV factors added
dds$W1 <-set$W_1
dds$W2 <-set$W_2
design(dds) <- ~ W1 + W2 + age + disease
dds<-DESeq(dds)
ddsruvClean <- dds[which(mcols(dds)$betaConv),]
resruv<-results(ddsruvClean, alpha=0.05)