Hi,
I am working on a workflow for CRISPR/Cas9 screening and am trying to analyze the data with `DESeq2`.
My count table looks like this:
> head(countdata)
                         CTRL CTRL TREAT TREAT
BAX_GAAACATGTCAGCTGCCACT   87  267   511   353
BAX_GAACTCACCCCTGAAGCAAA  340  474   772  1063
BAX_GAAGCGCATCGGGGACGAAC   88  117   365   461
BAX_GACAGGGGCCCTTTTGCTTC  731  690    99   450
BAX_GACCGGGTCCAGGGCCAGCT  374  649   150   230
BAX_GACCTTGAGCACCAGTTTGC  425  634   258   203
...
random_GGGCGGACGCACCGACCAAA  159  155    21     4
random_GGGGAACGGACGCCGAACGG  302  320   156   120
random_GGGGACGCGAGGCACGCGAC  233  134     0     0
random_GGGGACGCGGGCCCGCACAA  306  334   251   549
random_GGGGCGGCAACGAAAACGCG    7   42     0     0
random_GGGGGAACGAAACACGAGCG  296  260    40    39
When I try to read it into a dds object, I get the following error:
> dds <- DESeq2::DESeqDataSetFromMatrix(countData = countdata,
+                                       colData = coldata, design = ~condition)
Error in `rownames<-`(`*tmp*`, value = colnames(countData)) :
  duplicate rownames not allowed
But when I test the row names for duplicates, I can't find any.
> anyDuplicated(rownames(countdata))
[1] 0
> table(table(rownames(countdata)))

    1
12402
What am I missing here?
How can I find out why this error is occurring?
thanks
Assa
The countdata object is a data.frame:
> str(countdata)
'data.frame':	12402 obs. of  4 variables:
 $ CTRL : num  87 340 88 731 374 425 753 279 151 249 ...
 $ CTRL : num  267 474 117 690 649 634 957 375 145 374 ...
 $ TREAT: num  511 772 365 99 150 ...
 $ TREAT: num  353 1063 461 450 230 ...
R.version
_
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 3
minor 4.0
year 2017
month 04
day 21
svn rev 72570
language R
version.string R version 3.4.0 (2017-04-21)
nickname You Stupid Darkness
Do the column names need to differ?
The idea behind the SummarizedExperiment class is similar to that of ExpressionSet: a complex object that behaves like a simpler one, in order to streamline analyses and shield the end user from having to know too much about the messy underlying reality of what they are doing.
This is unfortunately more of an ideal than an actuality, and you really do have to know something about SummarizedExperiments if you expect to use them confidently and expertly in an analysis. There is no substitute for knowing what you are doing, and there is no better way to get there than by reading the documentation that comes with, e.g., the SummarizedExperiment package. So you should do that, because it's far more efficient than asking questions on this site.
But to answer your question: yes. One of the slots of a SummarizedExperiment is the colData slot, which contains information about the columns of your data. The colData itself is a DataFrame, and like a data.frame, it must have unique row.names. Those row.names come from the column names of your count data, and if they are not unique you get the error that you see. That is also why checking rownames(countdata) finds nothing: the duplicates are in colnames(countdata), where CTRL and TREAT each appear twice.
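As a rough illustration of the point above, here is a minimal sketch that rebuilds a three-row version of your table, shows where the duplication sits, and renames the columns before building the colData. The toy values are copied from your head() output; the use of make.unique() and the resulting names (CTRL_1, TREAT_1) are just one possible choice, not something DESeq2 requires, and the construction of coldata is an assumption since your own coldata was not shown.

```r
# Toy count table with duplicated column names, mimicking the one in the question.
# check.names = FALSE keeps the duplicated names instead of silently fixing them.
countdata <- data.frame(
  CTRL  = c(87, 340, 88),
  CTRL  = c(267, 474, 117),
  TREAT = c(511, 772, 365),
  TREAT = c(353, 1063, 461),
  check.names = FALSE,
  row.names = c("BAX_GAAACATGTCAGCTGCCACT",
                "BAX_GAACTCACCCCTGAAGCAAA",
                "BAX_GAAGCGCATCGGGGACGAAC")
)

# The row names are unique, but the column names are not.
anyDuplicated(rownames(countdata))   # 0: no duplicated row names
anyDuplicated(colnames(countdata))   # non-zero: index of the first duplicated column name

# Keep the condition labels before renaming (assumption: the original
# column names encode the condition).
condition <- factor(colnames(countdata))

# Make the sample names unique; any unique naming scheme would do.
colnames(countdata) <- make.unique(colnames(countdata), sep = "_")

# colData rows must match the (now unique) columns of the count table.
coldata <- data.frame(condition = condition,
                      row.names = colnames(countdata))

# Only attempt the DESeq2 step if the package is installed.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = as.matrix(countdata),
                                        colData   = coldata,
                                        design    = ~ condition)
}
```

With unique column names in place, the `rownames<-` step that failed in your call has nothing left to complain about.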