Dear All,
I am having trouble dealing with technical replicates in my RNAseq data analysis using Deseq2. I have generated a SummarizedExperiment object for my dataset but couldn't pass the collapseReplicates() stage. Please see below for more information.
> head(sampleInfo)
sampleName fileName copyNumber population
1 NA06984 NA06984_accepted_hits.bam control CEU
2 NA06985 NA06985_accepted_hits.bam control CEU
3 NA06989 NA06989_accepted_hits.bam control CEU
4 NA07000 NA07000_accepted_hits.bam control CEU
5 NA07051 NA070512_accepted_hits.bam control CEU
6 NA07051 NA07051_accepted_hits.bam control CEU
> (colData(GenesSE)<- DataFrame(sampleInfo))
DataFrame with 486 rows and 4 columns
sampleName fileName copyNumber population
<factor> <factor> <factor> <factor>
1 NA06984 NA06984_accepted_hits.bam control CEU
2 NA06985 NA06985_accepted_hits.bam control CEU
3 NA06989 NA06989_accepted_hits.bam control CEU
4 NA07000 NA07000_accepted_hits.bam control CEU
5 NA07051 NA070512_accepted_hits.bam control CEU
... ... ... ... ...
482 NA20540 NA20540_accepted_hits.bam null TSI
483 NA20589 NA20589_accepted_hits.bam null TSI
484 NA20754 NA207542_accepted_hits.bam null TSI
485 NA20754 NA20754_accepted_hits.bam null TSI
486 NA20778 NA20778_accepted_hits.bam null TSI
> colnames(GenesSE) <- sampleInfo$sampleName
Error in `rownames<-`(`*tmp*`, value = c(180L, 181L, 183L, 185L, 188L, :
duplicate rownames not allowed
## Oblivously the sampleName column contains duplicated names which was to be used for grouping with collapseReplicates() but below is what I got
techreplictes <-as.data.frame(colData( ddsFull )[ ,c("sampleName","fileName","copyNumber","population") ] )
> head(techreplictes)
sampleName fileName copyNumber population
1 NA06984 NA06984_accepted_hits.bam control CEU
2 NA06985 NA06985_accepted_hits.bam control CEU
3 NA06989 NA06989_accepted_hits.bam control CEU
4 NA07000 NA07000_accepted_hits.bam control CEU
5 NA07051 NA070512_accepted_hits.bam control CEU
6 NA07051 NA07051_accepted_hits.bam control CEU
> ddsCollapsed <- collapseReplicates(ddsFull,groupby= ddsFull$sampleName, run= ddsFull$fileName)
Error: sum(assay(object)) == sum(assay(collapsed)) is not TRUE
In addition: Warning messages:
1: In sum(assay(object)) : integer overflow - use sum(as.numeric(.))
2: In sum(assay(collapsed)) : integer overflow - use sum(as.numeric(.))
> traceback()
3: stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"),
ch), call. = FALSE, domain = NA)
2: stopifnot(sum(assay(object)) == sum(assay(collapsed)))
1: collapseReplicates(ddsFull, groupby = ddsFull$sampleName, run = ddsFull$fileName)
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0
[2] GenomicAlignments_1.2.1
[3] DESeq2_1.6.3
[4] RcppArmadillo_0.4.400.0
[5] Rcpp_0.11.4
[6] Rsamtools_1.18.2
[7] Biostrings_2.34.1
[8] XVector_0.6.0
[9] GenomicFeatures_1.18.3
[10] AnnotationDbi_1.28.1
[11] Biobase_2.26.0
[12] GenomicRanges_1.18.4
[13] GenomeInfoDb_1.2.4
[14] IRanges_2.0.1
[15] S4Vectors_0.4.0
[16] BiocGenerics_0.12.1
loaded via a namespace (and not attached):
[1] annotate_1.42.1 BatchJobs_1.5 BBmisc_1.8
[4] BiocParallel_1.0.2 biomaRt_2.22.0 bitops_1.0-6
[7] brew_1.0-6 checkmate_1.5.1 cluster_2.0.1
[10] codetools_0.2-10 colorspace_1.2-4 DBI_0.3.1
[13] digest_0.6.8 fail_1.2 foreach_1.4.2
[16] Formula_1.2-0 genefilter_1.48.1 geneplotter_1.44.0
[19] ggplot2_1.0.0 grid_3.1.1 gtable_0.1.2
[22] Hmisc_3.14-4 iterators_1.0.7 lattice_0.20-29
[25] latticeExtra_0.6-26 locfit_1.5-9.1 MASS_7.3-34
[28] munsell_0.4.2 plyr_1.8.1 proto_0.3-10
[31] RColorBrewer_1.0-5 RCurl_1.95-4.3 reshape2_1.4
[34] RSQLite_1.0.0 rtracklayer_1.26.2 scales_0.2.4
[37] sendmailR_1.1-2 splines_3.1.1 stringr_0.6.2
[40] survival_2.37-7 tools_3.1.1 XML_3.98-1.1
[43] xtable_1.7-3 zlibbioc_1.10.0
>
I would appreciate some suggestions on how to proceed with my analysis. Ultimately, I am interested in differential expression and being able to aggregate reads from technical samples improves the analysis.
Many thanks
Adeolu
Hi Michael, Thank you for your response.
The function works and no more error message when I collapsed the replicates.
Many thanks,
Adeolu
Great, I will fix this bug in the devel branch.