Question

DESeq2 generating different DEGs based on the order of samples listed in the colData = samples file

knholm • 0

@knholm-18825

In running

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = samples,
                              design = ~ Group)
dds <- DESeq(dds)

res1 <- results(dds, contrast=c("Group","Pre","Ctl"))

resOrdered1 <- res1[order(res1$pvalue),]

I found that if my samples file lists the sample IDs in a different order than they are listed in the counts file, I get completely different DEGs in my results file.

Even though the sample IDs are all the same, it appears to be important that they are listed in the same order. Is this normal?

DESeq2 • 2.6k views

ADD COMMENT • link updated 5.0 years ago by Michael Love 43k • written 5.0 years ago by knholm • 0

score 0 · Answer 1 · 2020-04-14

Michael Love 43k

@mikelove

I guarantee the cause is this: the samples in your countData are not in the same order as the samples in your colData. The results are order invariant (e.g. it doesn't matter which sample comes first in the DESeqDataSet), unless you switch up those inputs such that the i-th column of countData is not paired with the i-th row of colData, etc.
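To make the positional pairing concrete, here is a minimal base R sketch. The sample names, groups, and counts are made up for illustration and are not from the poster's data; it only mimics pairing column i with row i and does not call DESeq2.

```r
# Hypothetical toy data (illustrative names only).
counts <- matrix(1:16, ncol = 4,
                 dimnames = list(paste0("gene", 1:4),
                                 c("S1", "S2", "S3", "S4")))

# Sample table with the IDs in a regular column, listed in a different
# order than the columns of the count matrix.
samples <- data.frame(sample = c("S1", "S4", "S3", "S2"),
                      Group  = c("Ctl", "Pre", "Pre", "Ctl"),
                      stringsAsFactors = FALSE)

# Pairing by position: column i of counts is described by row i of samples.
pairing <- data.frame(
  count_column = colnames(counts),
  group_used   = samples$Group,
  # Group each column should have had, looked up by sample ID instead.
  true_group   = samples$Group[match(colnames(counts), samples$sample)],
  stringsAsFactors = FALSE
)
print(pairing)

# Columns that would be analysed under another sample's group label.
mislabelled <- pairing$count_column[pairing$group_used != pairing$true_group]
print(mislabelled)
```

In this toy case S2 and S4 end up with each other's group, which is the kind of silent swap that changes the list of DEGs.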

ADD COMMENT • link 5.0 years ago Michael Love 43k

Yes, that is my question.

So to confirm: the order of the samples in colData needs to match the order of the samples in countData, or else the results will be inaccurate?

ADD REPLY • link 5.0 years ago knholm • 0

This is very clearly spelled out in the documentation and guides, yes.

ADD REPLY • link 5.0 years ago Michael Love 43k

In the documentation and previous posts, I see that this is addressed as

for matrix input: a DataFrame or data.frame with at least a single column. Rows of colData correspond to columns of countData

and that an error message

"assay colnames() must be NULL or equal colData rownames()"

will appear if the first row of colData does not match the first column of countData.

However, it is not entirely clear from this that all remaining values are matched on order/position of items in the matrix, rather than by character string match.

It would be helpful if there was an error or warning message that they are matched by order/position in the matrix.

So in my case, the first item matched and the remaining items corresponded by character match, but if the colData file differed in order from the one used in the generation of the count file, it threw everything off. I understand why this is now, but this was not abundantly clear before.

(this previous support post)

ADD REPLY • link 5.0 years ago knholm • 0

"It would be helpful if there was an error or warning message that they are matched by order/position in the matrix."

There is in fact such an error, which can only work if the strings match but are not in order. If the strings do not match, DESeq2 obviously can't guess the matching.

> coldata <- data.frame(x=factor(c(1,1,2,2)), row.names=LETTERS[1:4])
> cts <- matrix(1:16, ncol=4, dimnames=list(1:4, LETTERS[4:1]))
> dds <- DESeqDataSetFromMatrix(cts, coldata, ~x)
Error in DESeqDataSetFromMatrix(cts, coldata, ~x) :
  rownames of the colData:
   A,B,C,D
  are not in the same order as the colnames of the countData:
   D,C,B,A

As far as our documentation, in the vignette we have:

It is absolutely critical that the columns of the count matrix and the rows of the column data (information about samples) are in the same order. DESeq2 will not make guesses as to which column of the count matrix belongs to which row of the column data, these must be provided to DESeq2 already in consistent order.

In the workflow we have:

If you’ve imported the count data in some other way, for example loading a pre-computed count matrix, it is very important to check manually that the columns of the count matrix correspond to the rows of the sample information table.
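One way to do that manual check, and to fix the order if it is off, is sketched below in base R. This is only a sketch with made-up toy data; it assumes the sample IDs are the rownames of the sample table and the colnames of the count matrix.

```r
# Hypothetical toy inputs (illustrative names only).
counts <- matrix(1:16, ncol = 4,
                 dimnames = list(paste0("gene", 1:4),
                                 c("S1", "S2", "S3", "S4")))
samples <- data.frame(Group = c("Ctl", "Pre", "Pre", "Ctl"),
                      row.names = c("S1", "S4", "S3", "S2"),
                      stringsAsFactors = FALSE)

# 1. Both objects must describe the same set of samples.
stopifnot(setequal(colnames(counts), rownames(samples)))

# 2. Reorder the rows of the sample table by name so they follow
#    the columns of the count matrix.
samples <- samples[colnames(counts), , drop = FALSE]

# 3. Confirm the order is now identical before building the dataset.
stopifnot(identical(rownames(samples), colnames(counts)))
print(samples)
```

Indexing by name rather than retyping the order by hand avoids introducing a new mismatch, and `drop = FALSE` keeps the result a data.frame even when there is only one column.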

ADD REPLY • link 5.0 years ago Michael Love 43k

Ah OK, I see that in the vignette.

I was looking in the package manual and did not see mention of the order.

Even though my strings match, I see that data imported manually from tximport also needs to be matched in order.

Thank you for clarifying, and I will be sure to refer (myself and others) to the vignette in the future.

ADD REPLY • link 5.0 years ago knholm • 0

I’m still confused as to how you didn't get an error (side note: we have a dedicated tximport to DESeq2 function).

Was it because your counts matrix was unnamed on the columns? Can you give an example when you say the strings matched: specifically which strings matched but DESeq2 didn’t give an error.

ADD REPLY • link 5.0 years ago Michael Love 43k

The columns in my count matrix are labeled. The string in the first column matches the first row of data in colData, but after the initial analysis a couple of months ago I had rearranged the colData file. I realized the rearranged input file was generating different DEGs.

Here are glimpses of my countData vs colData:

countData:

          PH6DS10  PH4RH25  PH4ER7  PH4JG26
A1BG            6        5       2        5
A1BG-AS1        3        2       4        4
A1CF            0        0       0    2.993
A2M            83      111      38       97

Same order colData:

sample
PH6DS10
PH4RH25
PH4ER7
PH4JG26

Different order colData:

sample
PH6DS10
PH4JG26
PH4ER7
PH4RH25

As for the dedicated tximport to DESeq2 function, is that different from tximport(files = , type = , tx2gene = , )?

ADD REPLY • link 5.0 years ago knholm • 0

Oh, you were referring to sample names in a column of colData, but not the rownames? The matching check is based on the rownames.

See the vignette for details on the recommended import function for tximport.
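As a rough illustration of the rownames point, the base R sketch below reads the same sample table twice, once with the IDs left in a column and once with them as rownames. The CSV contents and the group labels are invented for the example.

```r
# Hypothetical sample sheet (IDs and groups are illustrative only).
csv_text <- "sample,Group
S1,Ctl
S4,Pre
S3,Pre
S2,Ctl"

# IDs stay in an ordinary column; the rownames are just "1", "2", ...
# so there are no sample names to compare against the count matrix.
by_column <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(rownames(by_column))

# IDs become the rownames, which is what the order check looks at.
by_rowname <- read.csv(text = csv_text, row.names = 1,
                       stringsAsFactors = FALSE)
print(rownames(by_rowname))

# Equivalent fix for a table that has already been read in.
rownames(by_column) <- by_column$sample
stopifnot(identical(rownames(by_column), rownames(by_rowname)))
```

With the IDs as rownames, a sample table whose order differs from the count matrix should trigger the "not in the same order" error shown earlier in the thread instead of being paired silently.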

ADD REPLY • link 5.0 years ago Michael Love 43k

When I read in the colData file with row.names = 1, I got the error that they aren't in the same order!

Thank you!

Error in DESeqDataSetFromMatrix(counts, colData = samples, design = ~Group) : 
  rownames of the colData:
   PH6DS10,PH38X22,PH40X23,PH4RH25,PH4ER7,PH4JG26,PH4LL8,PH6WH11,PH8JF14,PH8EB15,PH8WS17,PH7RS13
  are not in the same order as the colnames of the countData:
   PH6DS10,PH4RH25,PH4ER7,PH4JG26,PH4LL8,PH6WH11,PH8JF14,PH8EB15,PH38X22,PH40X23,PH8WS17,PH7RS13