I am using DESeq2 to analyze RNA-seq data with 8 biological replicates, which are paired samples. The samples are primary cells, so variation between samples is expected. As this is a paired analysis, I am not removing batch effects.
When I plot the PCA, I do not see the samples separating into two groups.
Here is my code:
library(DESeq2)
library(ggplot2)

x <- read.table("filt_counts.txt", header=TRUE, row.names=1)
subjects <- factor(rep(1:8, each=2))
treat <- factor(rep(c("High","Low"), 8))
colData <- data.frame(colnames(x), subjects=subjects, treat=treat, row.names=1)
dds <- DESeqDataSetFromMatrix(countData = x, colData = colData, design = ~ subjects + treat)
design(dds) <- formula(~ subjects + treat)
dds <- DESeq(dds)
rld <- rlog(dds)
data <- plotPCA(rld, intgroup=c("treat", "subjects"), returnData=TRUE)
percentVar <- round(100 * attr(data, "percentVar"))
ggplot(data, aes(PC1, PC2, color=treat)) +
  geom_point(size=3) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance"))

Should I trust the results despite a PCA plot like the one above?
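For context, here is a minimal sketch of why a paired design can produce a PCA that does not split by treatment. It uses simulated log-scale values, not the real counts, and every name and effect size in it is an illustrative assumption. When the subject effect is much larger than the treatment effect, the leading components follow the subjects. Subtracting each subject's mean per gene, for plotting only, lets the treatment effect show up.

```r
set.seed(1)
n_genes <- 200
n_subj  <- 8
subjects <- factor(rep(1:n_subj, each = 2))
treat    <- factor(rep(c("High", "Low"), n_subj))

# Simulated expression: large subject effect, small treatment effect in 20 genes
subj_eff  <- matrix(rnorm(n_genes * n_subj, sd = 2), n_genes, n_subj)[, as.integer(subjects)]
treat_eff <- outer(c(rep(1, 20), rep(0, n_genes - 20)), as.numeric(treat == "High"))
noise     <- matrix(rnorm(n_genes * 2 * n_subj, sd = 0.2), n_genes)
mat <- subj_eff + treat_eff + noise

# Subtract each subject's per-gene mean (visualization only, not for testing)
subj_means <- sapply(levels(subjects),
                     function(s) rowMeans(mat[, subjects == s, drop = FALSE]))
centered <- mat - subj_means[, as.integer(subjects)]

# PCA on samples before and after removing the subject effect
pca_raw      <- prcomp(t(mat))
pca_centered <- prcomp(t(centered))

# PC1 scores by treatment after centering
split(round(pca_centered$x[, 1], 2), treat)
```

With real data the analogous matrix would be `assay(rld)`, and a function such as `limma::removeBatchEffect` could play the role of the manual centering, but only for the plot. The DESeq2 model with `~ subjects + treat` already accounts for the pairing in the test itself.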
Thanks, Michael.
What if I need to select a few samples for further assays? What would be the best approach?