Hello,
I am analyzing RNA-seq raw counts data from 5 conditions (vehicle, tx1, tx2, tx3, tx4). When I label the groups "A", "B", "C", "D", "E" and run DESeq2 on all 5 groups at once, I get a list of 7000+ differentially expressed genes (padj < 0.05). However, when I label the groups with their actual names ("vehicle", "name1", "name2", etc), my results are now ~3000 differentially expressed genes. I know that R automatically alphabetizes the groups, so that in my second run, samples labeled as "vehicle" are considered last (at least that's how it shows up when I pull out individual genes using plotCounts, as well as when I do PCA). Has anyone had this issue before? Shouldn't my results theoretically be the same regardless of how I named the groups, since I gave DESeq2 the same data? Am I doing something wrong/missing something? Or maybe I shouldn't be using DESeq2 to analyze multiple groups at once? I checked my results against the DESeq2 analysis that our genome center did, and the 7000+ list was exactly the same (which makes sense because the only info I gave them was which samples were "A", which ones were "B", etc). Any insight would be much appreciated!
In case anyone is wondering how I labeled the samples, here is how I did it:
SamplesABCDE <- data.frame(row.names=colnames(CountTable), condition=as.factor(c(rep("A",3),rep("B",3),rep("C",3),rep("D",3),rep("E",3))))
SamplesGroupNames <- data.frame(row.names=colnames(CountTable), condition=as.factor(c(rep("Vehicle",3),rep("HLow",3),rep("HHigh",3),rep("MLow",3),rep("MHigh",3))))
ddsGroupNames <- DESeqDataSetFromMatrix(countData = CountTable, colData = SamplesGroupNames, design=~condition)
dds2_GroupNames <- DESeq(ddsGroupNames)
(For ABCDE, "ddsGroupNames" is replaced by "ddsABCDE", etc)