In running
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = samples,
                              design = ~ Group)
dds <- DESeq(dds)
res1 <- results(dds, contrast=c("Group","Pre","Ctl"))
resOrdered1 <- res1[order(res1$pvalue),]
I found that if my samples file lists the sample IDs in a different order than they are listed in the counts file, I get completely different DEGs in my results file.
Even though the sample IDs are all the same, it appears to be important that they are listed in the same order - is this normal?
Yes, that is my question.
So to confirm - the order of the samples in colData needs to match the order of the samples in countData - or else the results will be inaccurate?
This is very clearly spelled out in the documentation and guides, yes.
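For anyone landing here with the same problem, one way to guard against it is to reorder the sample table by name before building the dataset. The following is a minimal sketch in base R using made-up toy data (the gene and sample names are assumptions, not the poster's data), and it assumes the sample IDs are stored as the row names of the sample table:

```r
# Toy data; the gene and sample names are made up for illustration.
counts <- matrix(
  c(10L, 20L, 30L, 40L,
     5L, 15L, 25L, 35L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("S1", "S2", "S3", "S4"))
)

# Sample table listed in a different order than the count columns.
samples <- data.frame(
  Group = c("Pre", "Ctl", "Pre", "Ctl"),
  row.names = c("S3", "S1", "S4", "S2")
)

# Every sample must be present in both objects before reordering.
stopifnot(setequal(rownames(samples), colnames(counts)))

# Reorder the rows of the sample table to follow the count columns.
samples <- samples[colnames(counts), , drop = FALSE]

# Confirm the alignment before building the DESeqDataSet.
aligned <- identical(rownames(samples), colnames(counts))
aligned
```

Indexing by name rather than by position means the group labels travel with their sample IDs, so the reordered table can be passed on as colData.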
In the documentation and previous posts, I see that this is addressed as
and that an error message
will appear if the first row of colData does not match the first column of countData.
However, it is not entirely clear that all remaining values are matched on the order/position of items in the matrix - rather than by character string match.
It would be helpful if there was an error or warning message that they are matched by order/position in the matrix.
So in my case, the first item matched and the remaining items corresponded by character match, but if the colData file differed in order from the one used in the generation of the count file, it threw everything off. I understand why this is now, but this was not abundantly clear before.
(this previous support post)
"It would be helpful if there was an error or warning message that they are matched by order/position in the matrix."
There is in fact such an error, which can only work if the strings match but are not in order. If the strings do not match, DESeq2 obviously can't guess the matching.
As far as our documentation, in the vignette we have:
In the workflow we have:
Ah OK I see that in vignette.
I was looking in the package manual and did not see mention of the order.
Even though my strings match, I see that as manually imported data from tximport it needs to be matched in order.
Thank you for clarifying, and I will be sure to refer (myself and others) to the vignette in the future.
I’m still confused as to how you didn't get an error (side note: we have a dedicated tximport to DESeq2 function).
Was it because your counts matrix was unnamed on the columns? Can you give an example when you say the strings matched: specifically which strings matched but DESeq2 didn’t give an error.
The columns in my count matrix are labeled - the strings/character in the first column matches the first row of data in colData, but after initial analysis a couple months ago I had rearranged the colData file. I realized the rearranged input file was generating different DEGs.
Here are glimpses of my countData vs colData:
As for the dedicated tximport to DESeq2 function, is that different from tximport(files = , type = , tx2gene = , )?
Oh, you were referring to sample names in a column of colData, but not the rownames? The matching check is based on the rownames.
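To illustrate the distinction, here is a small base R sketch with toy data. The helper `same_names_wrong_order` is hypothetical and only mimics the idea of a name-based order check. It is not DESeq2's own code, and the sample names are made up:

```r
# Toy sample table where the IDs sit in an ordinary column (hypothetical names).
samples <- data.frame(
  SampleID = c("S3", "S1", "S2"),
  Group = c("Pre", "Ctl", "Pre")
)
count_cols <- c("S1", "S2", "S3")

# Default row names are just "1", "2", "3", so comparing them with the
# count matrix column names cannot reveal an ordering problem.
rownames(samples)

# Hypothetical helper: TRUE when both name sets are identical
# but appear in a different order.
same_names_wrong_order <- function(row_ids, col_ids) {
  setequal(row_ids, col_ids) && !identical(row_ids, col_ids)
}

before <- same_names_wrong_order(rownames(samples), count_cols)

# Move the IDs into the row names; now the mismatch becomes detectable.
rownames(samples) <- samples$SampleID
after <- same_names_wrong_order(rownames(samples), count_cols)
```

With the IDs only in a column, `before` is `FALSE` because the row names share nothing with the count columns. Once the IDs are the row names, `after` is `TRUE`, which is the situation a rowname-based check can flag.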
See the vignette for details on the recommended import function for tximport.
When I read in the colData file and made row.names = 1, I got the warning that they aren't in the same order!
Thank you!
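For reference, a minimal sketch of what `row.names = 1` changes at import time, using an inline CSV in place of a real colData file (the file contents here are made up):

```r
# Inline CSV standing in for a colData file; contents are made up.
csv_text <- "SampleID,Group
S3,Pre
S1,Ctl
S2,Pre"

# Without row.names, the IDs stay in a regular column.
plain <- read.csv(text = csv_text)

# With row.names = 1, the first column becomes the row names.
named <- read.csv(text = csv_text, row.names = 1)

rownames(plain)  # "1" "2" "3"
rownames(named)  # "S3" "S1" "S2"
```

Only the second form gives the sample table row names that can be compared against the column names of the count matrix.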
Great! Thanks for posting again.