Hello,
I'm trying to use DESeq2 for differential expression analysis of RNA-seq data containing a Control (CR) and two Treatments (HR and SR). One of the biological replicates for the HR treatment failed, leaving me with 2 replicates each for CR and SR and 1 for HR.
The DESeq documentation describes "Working partially without replicates"; however, that section is not in the DESeq2 documentation.
I want to check whether I can use the code below to analyze data with no biological replicates for one of the treatments.
Also, I would appreciate it if someone could weigh in on using edgeR with partial replicates (by estimating dispersion from housekeeping genes) as an option.
Thank you for your time,
Avinash
Here is the code I used and my sessionInfo().
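library(DESeq2)
# Read the raw count table (genes in rows, samples in columns)
countsTablePop <- read.delim(filetxt, row.names = 1)
countTable <- as.data.frame(countsTablePop)
# One entry per sample, in the same order as the columns of countTable
colData <- DataFrame(condition = factor(c("CR", "CR", "HR", "SR", "SR")))
dds <- DESeqDataSetFromMatrix(countData = countTable, colData = colData, design = ~ condition)
# Make CR the reference level
colData(dds)$condition <- factor(colData(dds)$condition, levels = c("CR", "HR", "SR"))
design(dds)
dds <- DESeq(dds)
res <- results(dds)
# Sort by adjusted p-value
res <- res[order(res$padj), ]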
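A note on the `results(dds)` call above: with a three-level factor, a bare `results()` call returns only one comparison (by default the last level against the reference, here SR vs CR), so the HR vs CR comparison has to be requested explicitly. The sketch below shows one way to do that with the `contrast` argument. It assumes a DESeq2 release newer than the 1.0.19 listed in the sessionInfo() (the `contrast` argument was added in a later release), and it uses simulated counts as a hypothetical stand-in for the real table. The names `counts`, `res_HR` and `res_SR` are illustrative only.

```r
library(DESeq2)

set.seed(1)
# Hypothetical simulated counts standing in for the real table:
# 1000 genes x 5 samples (2 CR, 1 HR, 2 SR)
counts <- matrix(rnbinom(5000, mu = 100, size = 5), ncol = 5,
                 dimnames = list(paste0("gene", 1:1000),
                                 c("CR1", "CR2", "HR1", "SR1", "SR2")))

# Sample table with CR set as the reference level
colData <- DataFrame(condition = factor(c("CR", "CR", "HR", "SR", "SR"),
                                        levels = c("CR", "HR", "SR")),
                     row.names = colnames(counts))

dds <- DESeqDataSetFromMatrix(countData = counts, colData = colData,
                              design = ~ condition)
dds <- DESeq(dds)

# Request each treatment-vs-control comparison explicitly
res_HR <- results(dds, contrast = c("condition", "HR", "CR"))
res_SR <- results(dds, contrast = c("condition", "SR", "CR"))
```

Because the model is fit to all five samples at once, the single HR sample still enters the fit; how much weight to put on the HR vs CR results with one replicate remains a judgment call.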
#####
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid splines parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] VennDiagram_1.6.5 vsn_3.28.0 gplots_2.12.1 RColorBrewer_1.0-5 DESeq2_1.0.19
[6] RcppArmadillo_0.4.000.2 Rcpp_0.10.6 GenomicRanges_1.13.35 XVector_0.1.0 IRanges_1.19.19
[11] lattice_0.20-23 locfit_1.5-9.1 BiocInstaller_1.10.4 limma_3.17.21 topGO_2.12.0
[16] SparseM_1.03 GO.db_2.9.0 RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.23.18
[21] Biobase_2.21.6 BiocGenerics_0.7.3 graph_1.38.3 plyr_1.8 reshape2_1.2.2
[26] ggplot2_0.9.3.1
loaded via a namespace (and not attached):
[1] affy_1.38.1 affyio_1.28.0 annotate_1.38.0 bitops_1.0-6 caTools_1.16 colorspace_1.2-4
[7] dichromat_2.0-0 digest_0.6.4 gdata_2.13.2 genefilter_1.42.0 gtable_0.1.2 gtools_3.2.1
[13] KernSmooth_2.23-10 labeling_0.2 MASS_7.3-29 munsell_0.4.2 preprocessCore_1.22.0 proto_0.3-10
[19] scales_0.2.3 stats4_3.0.2 stringr_0.6.2 survival_2.37-7 tools_3.0.2 XML_3.98-1.1
[25] xtable_1.7-1 zlibbioc_1.7.0
Hi Ming,
We do not recommend using rounded estimated values of read counts with DESeq2 (although I found an email from myself to the list one year ago contradicting this, for the case where someone had no access to the raw data). Counts of reads that are proportionally assigned to genes and then rounded can be a bad fit for distributions like the Negative Binomial (and Poisson). For example, this procedure could generate values arising from a distribution whose variance is less than its mean. As a rule, and to help avoid erroneous results, users should produce a matrix containing integer counts of reads uniquely aligned to features.
Update (12/13/15): After investigating the RSEM method and its performance, I've come around and now recommend the option of using rounded estimated gene-level counts from RSEM as input to DESeq2. My concern above came from a misunderstanding of how RSEM works.
Mike
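For the rounding step mentioned in the update, a minimal base R sketch might look like the following. The matrix `est` and its values are hypothetical stand-ins for estimated gene-level counts (for example, RSEM's per-sample expected counts collected into one matrix). The names `est` and `cts` are illustrative and not part of any package.

```r
# Hypothetical estimated (non-integer) gene-level counts: 4 genes x 2 samples
est <- matrix(c(10.4, 0.0, 250.7, 3.2,
                12.9, 1.2, 240.3, 2.8),
              ncol = 2,
              dimnames = list(paste0("gene", 1:4), c("s1", "s2")))

# Round to the nearest whole number, then store as integers
cts <- round(est)
storage.mode(cts) <- "integer"
```

The resulting `cts` keeps the gene and sample names of `est` and can be passed as `countData` to `DESeqDataSetFromMatrix()` in the usual way.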