DESeq2 counts and output variables are Numbers instead of Sample Names
jshouse ▴ 10
@jshouse-10956
Last seen 23 months ago
United States

When I used DESeq2 in the past, the count data, normalized count data, and so on retained the sample names given in the first column of the colData table.

I've used dds = DESeqDataSetFromMatrix(countData = countdata,
                             colData = colData,
                             design = ~ treatment) 

where colData looks like:

                 sample.name    treatment surgergy treatment.1   day
                      (fctr)       (fctr)   (fctr)      (fctr) (int)
1 GRC307R.15_S21_L001_R1_001 Day1AirDEMED    DEMED         Air     1
2 GRC307R.16_S10_L001_R1_001 Day1AirDEMED    DEMED         Air     1
3 GRC307R.17_S18_L001_R1_001 Day1AirDEMED    DEMED         Air     1

I expected the columns of the count matrix returned by DESeq2, and of the results of subsequent analyses, to carry the sample names (GRC307R.15_S21_L001_R1_001, etc.), but instead they are named 1:45 for the 45 samples.

Any ideas? Thanks for your time.

 

@mikelove
Last seen 1 hour ago
United States

Recently, some changes in SummarizedExperiment (an upstream package which defines the superclass that DESeqDataSet is based on) affected this behavior. In the NEWS file (which can be found on the SummarizedExperiment landing page):

"assay colnames() must agree with colData rownames()"

But anyway, it looks to me like your colData rownames here are 1,2,3,etc. Before you build the DESeqDataSet, what is colnames(countdata) and rownames(colData)?
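
For readers who want to run that check, here is a minimal base-R sketch. The toy objects stand in for the poster's countdata and colData; the gene names, sample names, and values are hypothetical. If colData is a dplyr table, as the (fctr) column headers in the printout suggest, it probably needs as.data.frame() first, because row names are not kept on those objects.

```r
# Toy stand-ins for the poster's objects (hypothetical names and values).
countdata <- matrix(1:6, nrow = 2,
                    dimnames = list(c("geneA", "geneB"),
                                    c("s1", "s2", "s3")))
colData <- data.frame(sample.name = c("s1", "s2", "s3"),
                      treatment = factor(c("a", "a", "b")))

# Inspect both sets of names before building the DESeqDataSet.
colnames(countdata)   # sample names carried by the count matrix
rownames(colData)     # default row names are "1", "2", "3"

# Use the sample-name column as the row names of colData.
rownames(colData) <- as.character(colData$sample.name)

# TRUE only if the names agree and are in the same order.
names_match <- identical(colnames(countdata), rownames(colData))
```

If names_match is FALSE, either the names differ or they are in a different order, and both need fixing before calling DESeqDataSetFromMatrix.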

 


I set row.names(colData) to my sample names.

I then got an error when I tried to define dds.

> colData<-(hash_table);head(colData)
                                               sample     treatment surgergy treatment.1 day rep          group
GRC307R.2p_S24_L001_R1_001 GRC307R.2p_S24_L001_R1_001   Day1AirSHAM     SHAM         Air   1   1   Day1AirSHAM1
GRC307R.6_S17_L001_R1_001   GRC307R.6_S17_L001_R1_001   Day1AirSHAM     SHAM         Air   1   2   Day1AirSHAM2
GRC307R.7_S2_L001_R1_001     GRC307R.7_S2_L001_R1_001   Day1AirSHAM     SHAM         Air   1   3   Day1AirSHAM3
GRC307R.8_S15_L001_R1_001   GRC307R.8_S15_L001_R1_001   Day1AirSHAM     SHAM         Air   1   4   Day1AirSHAM4
GRC307R.37_S22_L001_R1_001 GRC307R.37_S22_L001_R1_001 Day1OzoneSHAM     SHAM       Ozone   1   1 Day1OzoneSHAM1
GRC307R.38_S12_L001_R1_001 GRC307R.38_S12_L001_R1_001 Day1OzoneSHAM     SHAM       Ozone   1   2 Day1OzoneSHAM2
> dds = DESeqDataSetFromMatrix(countData = countdata,
+                              colData = colData,
+                              design = ~ treatment)
Error in DESeqDataSetFromMatrix(countData = countdata, colData = colData,  : 
  rownames of the colData:
   GRC307R.2p_S24_L001_R1_001,GRC307R.6_S17_L001_R1_001,GRC307R.7_S2_L001_R1_001,GRC307R.8_S15_L001_R1_001,GRC307R.37_S22_L001_R1_001,GRC307R.38_S12_L001_R1_0

 

Do the columns of the countData have to be in the same order as rows of colData?

A second question while I have you: I have counts from Partek Flow, which assigns reads using an EM algorithm. I rounded the count matrix to the nearest integer so that DESeq2 receives integers. Is that still considered acceptable?

 


The columns and the rows need to be in the exact same order. This is very important! From our RNA-seq workflow:

"If you’ve counted reads with some other software, it is very important to check that the columns of the count matrix correspond to the rows of the sample information table."

From the help page for ?DESeqDataSetFromMatrix:

"Rows of colData correspond to columns of countData"
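
One way to bring the two tables into the same order is to index colData by the column names of the count matrix. This is a sketch in base R with hypothetical toy data, and it assumes rownames(colData) already hold the sample names:

```r
# Toy data: colData rows are deliberately in a different order
# from the columns of the count matrix (hypothetical values).
countdata <- matrix(1:6, nrow = 2,
                    dimnames = list(c("geneA", "geneB"),
                                    c("s1", "s2", "s3")))
colData <- data.frame(treatment = factor(c("b", "a", "a")),
                      row.names = c("s3", "s1", "s2"))

# Both tables must describe the same set of samples.
stopifnot(setequal(colnames(countdata), rownames(colData)))

# Reorder the rows of colData to follow the columns of countdata.
colData <- colData[colnames(countdata), , drop = FALSE]

# Final check: names agree and are in the same order.
stopifnot(identical(colnames(countdata), rownames(colData)))

# With DESeq2 loaded, the call from the question can then be made:
# dds <- DESeqDataSetFromMatrix(countData = countdata,
#                               colData = colData,
#                               design = ~ treatment)
```

Indexing by name rather than sorting both tables keeps the count matrix untouched, and drop = FALSE keeps colData a data frame even when it has a single column.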

Yes, we recently evaluated this, and rounded estimated counts (from an EM algorithm) can be used as input to DESeq2. What should not be used is any kind of "normalized counts", meaning counts that have been divided by something or otherwise transformed.
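
The rounding step could look like the following base-R sketch. The matrix est_counts and its values are hypothetical stand-ins for the estimated counts exported from the quantification software:

```r
# Hypothetical estimated (non-integer) counts from an EM-based quantifier.
est_counts <- matrix(c(10.4, 0.6, 250.7, 3.2), nrow = 2,
                     dimnames = list(c("geneA", "geneB"),
                                     c("s1", "s2")))

# Round to the nearest whole number and store as integers.
# No scaling or other transformation is applied.
rounded <- round(est_counts)
storage.mode(rounded) <- "integer"

# Counts must be non-negative integers; names are preserved.
stopifnot(is.integer(rounded), all(rounded >= 0))
```

The rounded matrix keeps its row and column names, so the sample-name checks against colData apply to it unchanged.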
