I'm trying to set up a DEG analysis with DESeq2. I've been getting an error when running DESeqDataSetFromMatrix, and I haven't been able to figure out the cause from the error message.
> dim(countdata)
[1] 34722 112
> dim(samples)
[1] 112 4
> identical(sort(colnames(countdata)), sort(rownames(samples)))
[1] TRUE
> dds <- DESeqDataSetFromMatrix(
+ countData = countdata,
+ colData = samples,
+ design = ~Dose+Compound #must match testConditon
+ )
Error in DESeqDataSetFromMatrix(countData = countdata, colData = samples, :
rownames of the colData:
Vehicle_A_Liver_16399,Vehicle_A_Liver_16400
It goes on to list many but not all of the colnames of countdata, which leads me to think that something is wrong with that, but I've confirmed above that the colnames of the count table and the rownames of the sample table are identical. Any help would be appreciated.
Did not know that was a requirement, thank you!
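A DESeqDataSet is a subclass of a RangedSummarizedExperiment, and the colData slot is intended to describe the columns of the 'assays' slot. So there is a check when you instantiate a new object that the rownames of the colData and the colnames of the count matrix (which ends up in the 'assays' slot) are identical, meaning the same names in the same order. Comparing the sorted names, as in the question, only shows that the two sets of names match, not that they are in the same order.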
That's one of the nice things about using data structures like this - they ensure that what you put together makes sense (for some definition of 'makes sense'), so it's hard to make errors that in retrospect would look silly, like mixing up which samples are which.
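As a rough illustration of the check described above, here is a minimal base-R sketch. The toy matrix, the sample names S1 to S3, and the gene names are made up for the example; only the object names countdata and samples come from the question. It shows that a sorted comparison can pass while the direct comparison fails, and one way to reorder the count columns so they follow the sample table.

```r
# Toy data standing in for the real tables (hypothetical names and values).
countdata <- matrix(1:6, nrow = 2,
                    dimnames = list(c("geneA", "geneB"),
                                    c("S2", "S3", "S1")))
samples <- data.frame(Dose = c("low", "high", "low"),
                      row.names = c("S1", "S2", "S3"))

# Same set of names, so the sorted comparison passes...
identical(sort(colnames(countdata)), sort(rownames(samples)))  # TRUE
# ...but the order differs, so the direct comparison fails.
identical(colnames(countdata), rownames(samples))              # FALSE

# Reorder the count columns to follow the rows of the sample table.
countdata <- countdata[, rownames(samples)]

# Confirm the names now line up one-to-one before building the object.
stopifnot(identical(colnames(countdata), rownames(samples)))
```

With the columns reordered this way, the original DESeqDataSetFromMatrix call should get past the name check, assuming nothing else differs between the two tables.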