Question

Changing group labels yields different results in DESeq2

0

Entering edit mode

blobbyscience • 0

@blobbyscience-15133

Last seen 7.0 years ago

Hello,

I am analyzing RNA-seq raw counts data from 5 conditions (vehicle, tx1, tx2, tx3, tx4). When I label the groups "A", "B", "C", "D", "E" and run DESeq2 on all 5 groups at once, I get a list of 7000+ differentially expressed genes (padj < 0.05). However, when I label the groups with their actual names ("vehicle", "name1", "name2", etc), my results are now ~3000 differentially expressed genes. I know that R automatically alphabetizes the groups, so that in my second run, samples labeled as "vehicle" are considered last (at least that's how it shows up when I pull out individual genes using plotCounts, as well as when I do PCA). Has anyone had this issue before? Shouldn't my results theoretically be the same regardless of how I named the groups, since I gave DESeq2 the same data? Am I doing something wrong/missing something? Or maybe I shouldn't be using DESeq2 to analyze multiple groups at once? I checked my results against the DESeq2 analysis that our genome center did, and the 7000+ list was exactly the same (which makes sense because the only info I gave them was which samples were "A", which ones were "B", etc). Any insight would be much appreciated!

In case anyone is wondering how I labeled the samples, here is how I did it:

SamplesABCDE <- data.frame(row.names=colnames(CountTable), condition=as.factor(c(rep("A",3),rep("B",3),rep("C",3),rep("D",3),rep("E",3))))

SamplesGroupNames <- data.frame(row.names=colnames(CountTable), condition=as.factor(c(rep("Vehicle",3),rep("HLow",3),rep("HHigh",3),rep("MLow",3),rep("MHigh",3))))

ddsGroupNames <- DESeqDataSetFromMatrix(countData = CountTable, colData = SamplesGroupNames, design=~condition)
dds2_GroupNames <- DESeq(ddsGroupNames)

(For ABCDE, "ddsGroupNames" is replaced by "ddsABCDE", etc)

deseq2 multiple groups • 1.1k views

ADD COMMENT • link updated 7.0 years ago by Michael Love 43k • written 7.0 years ago by blobbyscience • 0

score 0 · Answer 1 · 2018-03-01

I'd say it's likely a bug in your code somewhere. E.g. here's a reproducible example of renaming the levels:

> dds <- makeExampleDESeqDataSet(m=15,betaSD=1)
> dds$condition <- factor(rep(1:5,each=3))
> dds <- DESeq(dds, quiet=TRUE)
> summary(results(dds, contrast=c("condition","5","1")))

out of 999 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 72, 7.2% 
LFC < 0 (down)   : 66, 6.6% 
outliers [1]     : 0, 0% 
low counts [2]   : 329, 33% 
(mean count < 8)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> dds$condition <- factor(rep(5:1,each=3))
> dds <- DESeq(dds, quiet=TRUE)
> summary(results(dds, contrast=c("condition","1","5")))

out of 999 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 72, 7.2% 
LFC < 0 (down)   : 66, 6.6% 
outliers [1]     : 0, 0% 
low counts [2]   : 329, 33% 
(mean count < 8)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results