Hi everyone, I am confused by my DESeq2 results and PCA plot. I have two RNA-seq samples (each with three biological replicates) that are very close to each other on the PCA plot. However, there are still many differentially expressed genes (padj < 0.05) between these two samples. I assumed the two samples would be very similar because they are close on the PCA plot, so how can I still get so many differentially expressed genes between them?
Is this normal? If so, can anyone explain why? Thanks in advance.
You need to provide more information to get some useful advice:
- your column data: as.data.frame(colData(dds))
- your design
- the code you used
- the output of sessionInfo()
- a picture of the PCA plot
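The items in the list above can be gathered with a few lines of R. This is only a sketch: `dds` here is a placeholder built with `makeExampleDESeqDataSet()`, so swap in your own `DESeqDataSet` and use your own grouping column in place of `condition`. The output file name is also just an assumption for illustration.

```r
library(DESeq2)

# Placeholder data (assumption): replace `dds` with your own DESeqDataSet
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# 1. Column data (the sample table)
col_data <- as.data.frame(colData(dds))
print(col_data)

# 2. Design formula
print(design(dds))

# 3. R and package versions
print(sessionInfo())

# 4. PCA plot on variance-stabilized counts, saved as a PNG to attach
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
pca_plot <- plotPCA(vsd, intgroup = "condition")
out_file <- file.path(tempdir(), "pca_plot.png")  # hypothetical file name
ggplot2::ggsave(out_file, plot = pca_plot, width = 5, height = 4)
```

Pasting the printed output together with the code you ran, and attaching the saved PNG, covers everything requested.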
Thanks Michael.
Here is more information about my analysis:
          genotype sizeFactor
rep1_A_16        A  0.5108994
rep2_A_16        A  1.8407776
rep3_A_16        A  0.8506794
rep1_B_16        B  1.0531460
rep2_B_16        B  0.4112253
rep3_B_16        B  1.0702545
rep2_C_16        C  1.2199964
rep3_C_16        C  1.2810071
rep1_D_16        D  1.0954071
rep2_D_16        D  1.1814218
rep3_D_16        D  1.4053602
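library(DESeq2)
# Sample table: one genotype label per column of the count matrix
LRTDesign <- data.frame(row.names = colnames(R_data), genotype = c("A", "A", "A", "B", "B", "B", "C", "C", "D", "D", "D"))
R_data_matrix <- data.matrix(R_data)
ddsTC <- DESeqDataSetFromMatrix(countData = R_data_matrix, colData = LRTDesign, design = ~ genotype)
# Likelihood ratio test of the genotype model against an intercept-only model
ddsTC <- DESeq(ddsTC, test = "LRT", reduced = ~ 1)
# Wald test results for the D vs A comparison
D_vs_A_16 <- results(ddsTC, name = "genotype_D_vs_A", test = "Wald")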
I don't understand why there are more differentially expressed genes between genotypes A and D than between genotypes A and C, even though A and D are closer on the PCA plot. Thanks a lot.
A minor clarification in terminology: generally a sample refers to a single replicate, not a group of replicates. Anyway, if you could show us the PCA plot somehow, it would be much easier to see what you're trying to describe.
Thanks for your reply.
I did try to show the PCA plot, but I don't know how to insert an image here.
What I am trying to describe is that I ran a differential expression analysis (DESeq2) on two groups, A and B, each with three biological replicates, and got many differentially expressed genes. However, on the PCA plot, A and B are very close to each other. I assumed gene expression in A and B would be very similar, since they are close on the PCA plot. Why do I still get so many differentially expressed genes?
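One possible way this pattern can arise: a PCA plot shows only the first two principal components (and `plotPCA` by default uses only the 500 most variable genes), so it is dominated by the largest sources of variance. Small but consistent shifts across many genes can be statistically significant without moving the groups far apart on PC1/PC2. The toy simulation below illustrates the A/C/D situation described earlier in the thread. It is a sketch under loud assumptions: simulated log-scale values, three replicates per group, made-up effect sizes, and plain per-gene t-tests with BH correction in base R rather than DESeq2's negative binomial model.

```r
set.seed(42)

n_genes <- 2000
groups  <- factor(rep(c("A", "B", "C", "D"), each = 3))

# Simulated log-scale expression: common baseline plus small replicate noise
expr <- matrix(rnorm(n_genes * length(groups), mean = 8, sd = 0.1),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0(groups, "_rep", 1:3)))

# Assumed effects: B and C differ strongly in 100 genes each,
# D differs only slightly from A, but in 600 genes
expr[1:100,   groups == "B"] <- expr[1:100,   groups == "B"] + 4
expr[101:200, groups == "C"] <- expr[101:200, groups == "C"] + 4
expr[201:800, groups == "D"] <- expr[201:800, groups == "D"] + 0.5

# PCA on samples; keep only PC1 and PC2, as a PCA plot would
pca       <- prcomp(t(expr), center = TRUE, scale. = FALSE)
scores    <- pca$x[, 1:2]
centroids <- apply(scores, 2, function(pc) tapply(pc, groups, mean))

dist_AD <- sqrt(sum((centroids["A", ] - centroids["D", ])^2))
dist_AC <- sqrt(sum((centroids["A", ] - centroids["C", ])^2))

# Per-gene two-sample t-test, then BH adjustment at padj < 0.05
count_de <- function(g1, g2) {
  p <- apply(expr, 1, function(x)
    t.test(x[groups == g1], x[groups == g2], var.equal = TRUE)$p.value)
  sum(p.adjust(p, method = "BH") < 0.05)
}
n_de_AD <- count_de("A", "D")
n_de_AC <- count_de("A", "C")

cat("PC1/PC2 distance A-D:", dist_AD, " A-C:", dist_AC, "\n")
cat("Genes with padj < 0.05, A vs D:", n_de_AD, " A vs C:", n_de_AC, "\n")
```

In this setup A and D sit much closer together than A and C on the first two components, yet A vs D yields more significant genes, because significance depends on each gene's effect relative to its within-group variability, not on how much total variance the difference contributes. Whether that is what happened in the real data cannot be judged without the PCA plot and the results tables.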