When running DESeq2 with the same expression data I get different results depending on the order of the columns in my count data matrix. Is this expected behaviour or am I messing something up?
My code looks like this:
```r
library(DESeq2)

countData <- read.table("matrix")
colData <- read.table("samples", header = FALSE)
colnames(colData) <- c("sample", "genotype", "treatment")
# relevel() needs a factor; since R 4.0, read.table() returns character columns
colData$treatment <- relevel(factor(colData$treatment), ref = "control")
dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData,
                              design = ~ treatment)
```
I have two replicates per condition. Depending on whether the columns of my matrix file are in the order control, control, treatment, treatment or treatment, control, treatment, control, I get completely different results. I would have thought that the information in the colData data frame is there precisely to make the column order of countData unimportant.
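The symptom can be reproduced without DESeq2 at all. The base-R sketch below uses made-up counts for a single gene (all sample names and numbers are hypothetical) and pairs each count column with a colData row purely by position. Reordering the count columns while leaving colData untouched silently reassigns samples to groups, so the apparent treatment effect largely disappears.

```r
# Toy counts for a single gene; sample and gene names are made up for illustration.
counts <- matrix(c(10, 12, 50, 55), nrow = 1,
                 dimnames = list("geneA", c("ctrl1", "ctrl2", "trt1", "trt2")))

# Sample table listed in the order control, control, treatment, treatment.
colData <- data.frame(treatment = factor(c("control", "control",
                                           "treatment", "treatment")))

# Group means when column i of the counts is paired with row i of colData,
# i.e. matching by position only; the column names are ignored.
group_means <- function(counts, colData) {
  tapply(counts[1, ], colData$treatment, mean)
}

group_means(counts, colData)    # control 11, treatment 52.5

# Same counts, columns reordered to treatment, control, treatment, control,
# while colData is left unchanged (drop = FALSE keeps the 1-row matrix shape).
shuffled <- counts[, c("trt1", "ctrl1", "trt2", "ctrl2"), drop = FALSE]
group_means(shuffled, colData)  # control 30, treatment 33.5
```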
Thanks in advance for any help.
Should the `col.names` of the counts be the `row.names` of the colData, or the first column? Do they have to be in the same order?

The names are not important; what matters is the order. By providing countData and colData to DESeqDataSetFromMatrix(), you are saying that row 1 of colData is the sample information for column 1 of countData, row 2 of colData is the sample information for column 2 of countData, and so on.

The best practice is to do the counting using a column of colData that gives the filename of each BAM file (this is what we do in the workflow), using either summarizeOverlaps or featureCounts. Alternatively, if you use htseq-count, our import function (DESeqDataSetFromHTSeqCount) takes the sample table and uses it to put together the matrix for you.
But if you provide a matrix, it's up to you to ensure the orders are the same.
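When a pre-built matrix is used, one way to enforce this is to give colData row names that match the count column names, check them, and reorder the count columns by those names before calling DESeqDataSetFromMatrix(). A minimal base-R sketch with hypothetical toy data:

```r
# Hypothetical toy inputs: count columns are in a different order from colData rows.
countData <- matrix(c(10, 50, 12, 55), nrow = 1,
                    dimnames = list("geneA", c("ctrl1", "trt1", "ctrl2", "trt2")))
colData <- data.frame(
  treatment = factor(c("control", "control", "treatment", "treatment")),
  row.names = c("ctrl1", "ctrl2", "trt1", "trt2")
)

# Positional pairing is only correct if the names line up one-to-one.
all(colnames(countData) == rownames(colData))   # FALSE here

# Both objects must describe the same set of samples before reordering.
stopifnot(setequal(colnames(countData), rownames(colData)))

# Reorder the count columns to follow the rows of colData.
countData <- countData[, rownames(colData), drop = FALSE]
all(colnames(countData) == rownames(colData))   # TRUE
```

After this step, countData and colData can be passed to DESeqDataSetFromMatrix() with column i and row i referring to the same sample.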
OK, the problem was that, in my case, the rows of colData and the columns of countData were in different orders.
Thanks a lot for the quick help.