Hi, I'm trying to run DESeq2. As a test I'm using RNA-seq data from 8 samples. My countdata, coldata, and rowdata objects look (to me) correctly formatted: the dimensions/lengths match, the count data is correct, etc. But when I run DESeqDataSetFromMatrix() I get this error:
> ddsFull <- DESeqDataSetFromMatrix(countData = countdata,
+ colData = coldata, rowData = rowdata, design = ~ type + sex)
Error in seq_len(length(idx) - 1) :
argument must be coercible to non-negative integer
In addition: Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
58 duplicate rownames were renamed by adding numbers
Here is the (detailed) step-by-step. First I generate my SE object (works without problems):
> ex3 <- summarizeOverlaps(features=grl, reads=bamLst, ignore.strand=T, singleEnd=T)
> class(ex3)
[1] "RangedSummarizedExperiment"
attr(,"package")
[1] "SummarizedExperiment"
Then I create the countdata, coldata, rowdata objects (without problems):
> countdata <- assay(ex3)
> coldata <- colData(ex3)
> rowdata <- rowRanges(ex3)
> class(coldata)
[1] "DataFrame"
attr(,"package")
[1] "S4Vectors"
> class(rowdata)
[1] "GRangesList"
attr(,"package")
[1] "GenomicRanges"
> class(countdata)
[1] "matrix"
> length(rowdata)
[1] 24943
> dim(coldata)
[1] 8 6
> dim(countdata)
[1] 24943 8
> head(countdata)
     OM_003 OM_005 OM_014 OM_023
A1BG    259     69    116     69
NAT2      6     11      0      0
ADA    1785    396    964    441
CDH2    119     52     35     45
...
> head(rowdata)
GRangesList object of length 6:
$A1BG
GRanges object with 15 ranges and 2 metadata columns:
     seqnames               ranges strand |   exon_id   exon_name
        <Rle>            <IRanges>  <Rle> | <integer> <character>
 [1]    chr19 [58346806, 58347029]      - |    264625        <NA>
 [2]    chr19 [58347353, 58347640]      - |    264626        <NA>
 ...
> head(coldata)
DataFrame with 6 rows and 6 columns
           type      sex   status    height    weight     tech
       <factor> <factor> <factor> <numeric> <numeric> <factor>
OM_003       AA        F      yes      15.9     36.67        2
OM_005       AA        M       no      10.5     83.35        1
OM_014       BB        F      yes      14.3     31.22        7
...
And then the error:
> ddsFull <- DESeqDataSetFromMatrix(countData = countdata,
+ colData = coldata, rowData = rowdata, design = ~ type + sex)
Error in seq_len(length(idx) - 1) :
argument must be coercible to non-negative integer
In addition: Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
58 duplicate rownames were renamed by adding numbers
The traceback():
7: eval(expr, envir, enclos)
6: eval(quote(list(...)), env)
5: eval(quote(list(...)), env)
4: standardGeneric("paste")
3: paste(rnms[idx[-1]], c(seq_len(length(idx) - 1)), sep = ".")
2: DESeqDataSet(se, design = design, ignoreRank)
1: DESeqDataSetFromMatrix(countData = countdata, colData = coldata, rowData = rowdata, design = ~type + sex
Any thoughts? The count data is correct (zeros and positive integers, no negative counts), colData is correctly formatted, and rowData seems correct as well. I am not sure what paste(rnms[idx[-1]], c(seq_len(length(idx) - 1)), sep = ".") means, but it seems like that may be where the error originates.
Gene names are not syntactically valid row names. You can use make.names() to convert row names into syntactically valid names, but then you have altered gene names in downstream analyses. A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number.
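
As a rough illustration in base R only: the warning mentions 58 duplicate row names, so it may be worth checking for repeated or missing (NA) names before converting them. The vector rn below is made up for the example and stands in for rownames(countdata); the names dup_names, n_missing and fixed are likewise just illustrative.

```r
# Hypothetical gene names standing in for rownames(countdata);
# these values are invented for illustration, not taken from the real data.
rn <- c("A1BG", "NAT2", "ADA", "ADA", "1-Mar", NA, NA)

# Names that occur more than once (NA counts as a repeated value here).
dup_names <- unique(rn[duplicated(rn)])

# Number of missing row names.
n_missing <- sum(is.na(rn))

# make.names() rewrites syntactically invalid names;
# unique = TRUE additionally makes repeated names distinct.
fixed <- make.names(rn, unique = TRUE)

# Side-by-side view of what changed.
print(data.frame(original = rn, fixed = fixed))
```

If this looks reasonable on the real data, the result could be assigned back with rownames(countdata) <- fixed, keeping the original names in a separate column so they are not lost for downstream analyses.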