Question

duplicated row names when creating DESeqDataSetFromMatrix

Assa Yeroslaviz

Germany

Hi,

I am working with a workflow for CRISPR/Cas9 screening and am trying to analyze the data with `DESeq2`.

My count table looks like this:

> head(countdata)
                         CTRL CTRL TREAT TREAT
BAX_GAAACATGTCAGCTGCCACT   87  267   511   353
BAX_GAACTCACCCCTGAAGCAAA  340  474   772  1063
BAX_GAAGCGCATCGGGGACGAAC   88  117   365   461
BAX_GACAGGGGCCCTTTTGCTTC  731  690    99   450
BAX_GACCGGGTCCAGGGCCAGCT  374  649   150   230
BAX_GACCTTGAGCACCAGTTTGC  425  634   258   203
...
random_GGGCGGACGCACCGACCAAA  159  155    21     4
random_GGGGAACGGACGCCGAACGG  302  320   156   120
random_GGGGACGCGAGGCACGCGAC  233  134     0     0
random_GGGGACGCGGGCCCGCACAA  306  334   251   549
random_GGGGCGGCAACGAAAACGCG    7   42     0     0
random_GGGGGAACGAAACACGAGCG  296  260    40    39

When I try to read it into a dds object, I get the following error:

> dds <- DESeq2::DESeqDataSetFromMatrix(countData = countdata,
+     colData = coldata, design = ~condition)
Error in `rownames<-`(`*tmp*`, value = colnames(countData)) :
  duplicate rownames not allowed

But when I check the row names for duplicates, I can't find any.

> anyDuplicated(rownames(countdata))
[1] 0

> table(table(rownames(countdata)))
    1
12402

What am I missing here?

How can I find out why this error is occurring?

Thanks,

Assa

The countdata object is a data.frame:

> str(countdata)
'data.frame': 12402 obs. of  4 variables:
$ CTRL : num  87 340 88 731 374 425 753 279 151 249 ...
$ CTRL : num  267 474 117 690 649 634 957 375 145 374 ...
$ TREAT: num  511 772 365 99 150 ...
$ TREAT: num  353 1063 461 450 230 ...

R.version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          4.0
year           2017
month          04
day            21
svn rev        72570
language       R
version.string R version 3.4.0 (2017-04-21)
nickname       You Stupid Darkness

Tags: deseq2, DESeqDataSetFromMatrix, duplicate

Written by Assa Yeroslaviz; last updated by James W. MacDonald.

score 1 · Accepted Answer · 2018-03-15

James W. MacDonald

United States

Here is the error:

Error in `rownames<-`(`*tmp*`, value = colnames(countData))

Do you now see what the problem is? Hint: colnames(countData)
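A minimal way to follow that hint is to run the same duplicate check on the column names that was already run on the row names. The sketch below uses a hypothetical two-row stand-in for `countdata` (only the first two guides and the four column headers are copied from the post; the real object has 12402 rows), so treat it as an illustration rather than the poster's actual data.

```r
# Hypothetical stand-in for countdata: two guides, four samples,
# with the duplicated column headers shown in the question.
countdata <- data.frame(c(87, 340), c(267, 474), c(511, 772), c(353, 1063))
names(countdata) <- c("CTRL", "CTRL", "TREAT", "TREAT")
rownames(countdata) <- c("BAX_GAAACATGTCAGCTGCCACT", "BAX_GAACTCACCCCTGAAGCAAA")

# The guide IDs (row names) are unique, so this returns 0 ...
row_dup <- anyDuplicated(rownames(countdata))

# ... but the sample names (column names) are not. anyDuplicated() returns
# the index of the first repeated name, or 0 if every name is unique.
col_dup <- anyDuplicated(colnames(countdata))

# List which sample names occur more than once.
repeated <- unique(colnames(countdata)[duplicated(colnames(countdata))])
print(repeated)  # "CTRL" "TREAT"
```

The `str(countdata)` output above already hints at the same thing: the variables are listed as `CTRL`, `CTRL`, `TREAT`, `TREAT`.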


Do the column names need to differ?

Reply by Assa Yeroslaviz

The idea behind the SummarizedExperiment class is similar to that behind ExpressionSet: a complex object that acts like a simpler object in order to streamline analyses and shield the end user from having to know too much about the messy underlying reality of what they are doing.

This is unfortunately more of an ideal than an actuality, and you really do have to know something about SummarizedExperiments if you expect to be able to use them confidently and expertly in an analysis. There is no substitute for knowing what you are doing, and there is no way for you to know what you are doing other than by reading the documentation that comes with, e.g., the SummarizedExperiment package. So you should do that, because it's far more efficient than asking questions on this site.

But to answer your question, one of the slots of a SummarizedExperiment is the colData slot, which contains information about the columns of your data. The colData itself is a DataFrame and, like a data.frame, it must have unique row names. Those row names come from the column names of your count data, so if the column names are not unique you get the error that you see.
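The mechanism described above can be sketched in base R, with the caveat that a plain `data.frame` stands in here for the S4 `DataFrame` that DESeq2 actually builds, so the exact error message differs. The toy `countdata` and `coldata` objects and the replicate labels `CTRL_1`, `CTRL_2`, `TREAT_1`, `TREAT_2` are hypothetical; any unique sample IDs would do, as long as the row names of the sample table match the count columns in the same order.

```r
# Hypothetical stand-in for countdata with duplicated sample headers.
countdata <- data.frame(c(87, 340), c(267, 474), c(511, 772), c(353, 1063))
names(countdata) <- c("CTRL", "CTRL", "TREAT", "TREAT")
rownames(countdata) <- c("BAX_GAAACATGTCAGCTGCCACT", "BAX_GAACTCACCCCTGAAGCAAA")

# Hypothetical sample table with one row per count column.
coldata <- data.frame(condition = factor(c("CTRL", "CTRL", "TREAT", "TREAT")))

# Base-R analogue of the failing step: using colnames(countdata) as the
# row names of the sample table is rejected because they are not unique.
dup_rejected <- tryCatch({
  suppressWarnings(rownames(coldata) <- colnames(countdata))
  FALSE
}, error = function(e) TRUE)

# Fix: give every sample its own name, then reuse those names as the
# row names of the sample table so both objects line up.
colnames(countdata) <- c("CTRL_1", "CTRL_2", "TREAT_1", "TREAT_2")
rownames(coldata) <- colnames(countdata)

# Build the DESeqDataSet only if DESeq2 is installed.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = countdata,
                                        colData = coldata,
                                        design = ~condition)
}
```

The treatment group still comes from the `condition` column, not from the column names, so renaming the samples does not change the design.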

Reply by James W. MacDonald