Question

Column order of count data for DESeq2

0

lelle • 0

@lelle-8914

Last seen 9.4 years ago

European Union

When running DESeq2 with the same expression data I get different results depending on the order of the columns in my count data matrix. Is this expected behaviour or am I messing something up?

My code looks like this:

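library(DESeq2)
# Count matrix: genes in rows, samples in columns
countData <- read.table("matrix")
# Sample table: one row per sample
colData <- read.table("samples", header=FALSE)
colnames(colData) <- c("", "genotype", "treatment")
# relevel() needs a factor; read.table() returns character columns by default in R >= 4.0
colData$treatment <- relevel(factor(colData$treatment), ref="control")
dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ treatment)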

I have two replicates per condition. Depending on whether the columns of my matrix file are ordered control,control,treatment,treatment or treatment,control,treatment,control, I get completely different results. I would have expected the information in the colData data frame to make the column order of countData irrelevant.

Thanks in advance for any help.

deseq2 • 3.7k views

ADD COMMENT • link updated 9.4 years ago by Michael Love 43k • written 9.4 years ago by lelle • 0

score 2 · Accepted Answer · 2015-10-01

2

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

This should not happen, so I'd go over the code carefully. I would make sure that the columns of counts are correctly lined up with the rows of colData. Usually this is taken care of, because we do the counting or read in count files using a column from colData. If you are providing a matrix and a colData which gives the filenames of the BAM files, it's up to you to ensure that these two objects match.

With only four files, you can try to keep track of which is which by total number of reads aligned to genes:

colSums(countData)
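
As a rough illustration of this kind of sanity check, here is a minimal base R sketch on toy data. The sample names, the `sample` column, and the counts are all hypothetical, and it assumes that the sample table carries the same identifiers as the column names of the count matrix, which may not hold for your files.

```r
# Toy data: hypothetical sample names and counts, not taken from the original post
countData <- matrix(
  c(10, 20, 30, 40,
     5, 15, 25, 35,
     1,  2,  3,  4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("trt1", "ctrl1", "trt2", "ctrl2"))
)
colData <- data.frame(
  sample    = c("ctrl1", "ctrl2", "trt1", "trt2"),  # assumed identifier column
  treatment = c("control", "control", "treated", "treated"),
  stringsAsFactors = FALSE
)

# Total counts per column, to compare against the library sizes you expect
libSizes <- colSums(countData)
print(libSizes)

# TRUE only if column i of countData and row i of colData name the same sample
aligned <- identical(colnames(countData), colData$sample)
print(aligned)
```

In this toy case `aligned` is `FALSE`, because the matrix columns are interleaved while the sample table is grouped by condition.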

ADD COMMENT • link 9.4 years ago Michael Love 43k

0

Should the column names of the counts match the row names of colData? Or its first column? Do they have to be in the same order?

ADD REPLY • link 9.4 years ago lelle • 0

0

The names are not important. The important thing is the order. In providing countData and colData to DESeqDataSetFromMatrix(), you are saying, row 1 of colData is the sample information for column 1 of countData, row 2 of colData is the sample information for column 2 of countData, etc.

The best practice is to do the counting using a column of colData that gives the filenames of the BAM files (this is what we do in the workflow), using either summarizeOverlaps or featureCounts. Alternatively, if you use htseq-count, our import function takes the sample table and uses it to put together the matrix for you.

But if you provide a matrix, it's up to you to ensure the orders are the same.
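
One way to enforce matching orders, sketched in base R on toy data: reorder the columns of the count matrix to follow the rows of the sample table before building the dataset. This assumes the sample table has a column of identifiers (here a hypothetical `sample` column) that match the column names of the matrix; all names and values below are made up for illustration.

```r
# Toy data: hypothetical names; columns of the matrix are NOT in sample-table order
countData <- matrix(
  1:12, nrow = 3,
  dimnames = list(paste0("gene", 1:3),
                  c("trt1", "ctrl1", "trt2", "ctrl2"))
)
colData <- data.frame(
  sample    = c("ctrl1", "ctrl2", "trt1", "trt2"),  # assumed identifier column
  treatment = factor(c("control", "control", "treated", "treated")),
  stringsAsFactors = FALSE
)

# For each row of colData, find the position of the matching matrix column
idx <- match(colData$sample, colnames(countData))
stopifnot(!anyNA(idx))  # every sample must be present in the matrix

# Reorder the columns so that column i corresponds to row i of colData
countData <- countData[, idx, drop = FALSE]
stopifnot(identical(colnames(countData), colData$sample))
```

After this step, the reordered `countData` and `colData` can be passed to DESeqDataSetFromMatrix() as in the question.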

ADD REPLY • link 9.4 years ago Michael Love 43k

0

Ok, the problem was that the order of the rows in colData and the columns in counts was different in my case.

Thanks a lot for the quick help.

ADD REPLY • link 9.4 years ago lelle • 0