Column order of count data for DESeq2
lelle • 0
@lelle-8914

When running DESeq2 with the same expression data I get different results depending on the order of the columns in my count data matrix. Is this expected behaviour or am I messing something up?

My code looks like this:

countData <- read.table("matrix")
colData <- read.table("samples", header=F)
colnames(colData) <- c("", "genotype", "treatment")
colData$treatment <- relevel(colData$treatment, ref="control")
dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ treatment)

I have two replicates per condition. Depending on whether the columns in my matrix file are ordered control,control,treatment,treatment or treatment,control,treatment,control, I get completely different results. I would have thought that the information in the colData data frame is there precisely to make the column order of countData unimportant.

Thanks in advance for any help.

deseq2 • 3.7k views
@mikelove

This should not happen, so I'd go over the code carefully. I would make sure that the columns of counts are correctly lined up with the rows of colData. Usually this is taken care of, because we do the counting or read in count files using a column from colData. If you are providing a matrix and a colData which gives the filenames of the BAM files, it's up to you to ensure that these two objects match.

With only four files, you can try to keep track of which is which by total number of reads aligned to genes:

colSums(countData)

Should the col.names of the counts be the row.names of the colData? Or the first column? Do they have to be in the same order?


The names are not important. The important thing is the order. By providing countData and colData to DESeqDataSetFromMatrix(), you are saying that row 1 of colData is the sample information for column 1 of countData, row 2 of colData is the sample information for column 2 of countData, and so on.

The best practice is to do the counting using a column of colData that gives the filenames of the BAM files (this is what we do in the workflow), with either summarizeOverlaps or featureCounts. Alternatively, if you use htseq-count, our import function (DESeqDataSetFromHTSeqCount) takes the sample table and uses it to put together the matrix for you.

But if you provide a matrix, it's up to you to ensure the orders are the same.
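
As a minimal sketch of one way to enforce that ordering in base R: the toy counts, gene names, and sample names below are made up for illustration, and the sketch assumes that the sample identifiers are available both as the column names of countData and as the row names of colData, which may not match how your files are laid out.

```r
# Toy data: all names and values here are illustrative, not from the thread.
# The count columns are deliberately in a different order than colData.
countData <- matrix(
  c(10, 200, 12, 190,
    50,  40, 55,  45,
     0,   7,  1,   9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("trt1", "ctl1", "trt2", "ctl2"))
)

# Assumption: sample identifiers are stored as the row names of colData.
colData <- data.frame(
  treatment = factor(c("control", "control", "treated", "treated"),
                     levels = c("control", "treated")),
  row.names = c("ctl1", "ctl2", "trt1", "trt2")
)

# Every sample must be present in both objects before reordering.
stopifnot(setequal(colnames(countData), rownames(colData)))

# Reorder the count columns so they follow the rows of colData.
countData <- countData[, rownames(colData), drop = FALSE]

# Fail loudly if the two objects are still out of step.
stopifnot(identical(colnames(countData), rownames(colData)))
```

Once the final check passes, the two objects can be handed to DESeqDataSetFromMatrix() as in the original post, and the order of the columns in the input file no longer matters.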


OK, the problem was that the rows of colData and the columns of the count matrix were in different orders in my case.

Thanks a lot for the quick help.
