Question

DESeq2 PCA plot: paired analysis

0

g.atla ▴ 10

@gatla-9491

I am using DESeq2 to analyze RNA-seq data from 8 biological replicates, each contributing a pair of samples. The samples are primary cells, so variation between subjects is expected. Because this is a paired analysis, I am not removing batch effects.

When I plot the PCA, I do not see the samples separating into two groups.

Here is my code:

library(DESeq2)
library(ggplot2)

# Filtered count matrix: genes in rows, samples in columns
x <- read.table("filt_counts.txt", header=TRUE, row.names=1)

# 8 subjects with 2 samples each; columns alternate High, Low
subjects <- factor(rep(1:8, each=2))
treat <- factor(rep(c("High","Low"), 8))

# Sample table, with row names matching the count matrix columns
colData <- data.frame(subjects=subjects, treat=treat, row.names=colnames(x))
dds <- DESeqDataSetFromMatrix(countData = x, colData = colData, design = ~ subjects + treat)
dds <- DESeq(dds)

# rlog-transform the counts and get the PCA coordinates for plotting
rld <- rlog(dds)
data <- plotPCA(rld, intgroup=c("treat", "subjects"), returnData=TRUE)
percentVar <- round(100 * attr(data, "percentVar"))

ggplot(data, aes(PC1, PC2, color=treat)) +
        geom_point(size=3) +
        xlab(paste0("PC1: ",percentVar[1],"% variance")) +
        ylab(paste0("PC2: ",percentVar[2],"% variance"))

Should I trust the results even though the PCA plot shows no separation by treatment?

deseq2 pca • 9.5k views

ADD COMMENT • link 9.2 years ago g.atla ▴ 10

score 2 · Answer 1 · 2016-02-07

2

Michael Love 43k

@mikelove

United States

This just means that the subject effect is larger than the treatment effect. You can still perform inference on the treatment effect using the ~ subjects + treat design, which compares High and Low within each subject. If you want, you can look at the significant genes with plotCounts to see what the treatment effects within subjects look like.
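As a rough illustration of the plotCounts suggestion, here is a minimal sketch. It runs on simulated counts from makeExampleDESeqDataSet rather than the poster's data, so the 16-sample layout, the choice of the smallest p-value as the gene to plot, and the names top_gene and d are all assumptions made for the example. With real data you would use your own dds and a gene taken from your results table.

```r
library(DESeq2)
library(ggplot2)

# Simulated stand-in for the real data: 500 genes, 16 samples (assumption)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 16)

# Same paired layout as in the question: 8 subjects, High/Low per subject
dds$subjects <- factor(rep(1:8, each = 2))
dds$treat <- factor(rep(c("High", "Low"), 8))
design(dds) <- ~ subjects + treat
dds <- DESeq(dds)

# By default results() tests the last variable in the design (treat)
res <- results(dds)
top_gene <- rownames(res)[which.min(res$pvalue)]

# Normalized counts for one gene, returned as a data frame for plotting
d <- plotCounts(dds, gene = top_gene, intgroup = c("treat", "subjects"),
                returnData = TRUE)

# One line per subject, joining its High and Low samples
ggplot(d, aes(x = treat, y = count, group = subjects, color = subjects)) +
  geom_point(size = 3) +
  geom_line() +
  scale_y_log10() +
  ggtitle(top_gene)
```

If the lines for most subjects slope in the same direction, the gene shows a consistent within-subject treatment effect, even when the subjects sit at very different expression levels and therefore dominate the PCA.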

ADD COMMENT • link 9.2 years ago Michael Love 43k

0

Thanks Michael.

ADD REPLY • link 9.2 years ago g.atla ▴ 10

0

What if I need to select a few samples for further assays? What would be the best approach?

ADD REPLY • link 9.2 years ago g.atla ▴ 10

0

I don't have a good answer for this. Remember that the observed data for the samples, and therefore the distances between them, depend on the underlying biology and also on technical factors such as library preparation.

ADD REPLY • link 9.2 years ago Michael Love 43k