when I run DESeq2, I get some trouble with design=~patient+phenotype
> head(coldata)
patient phenotype
ESC00000000001N ESC00000000001 normal
ESC00000000001T ESC00000000001 tumor
ESC00000000002N ESC00000000002 normal
ESC00000000002T ESC00000000002 tumor
ESC00000000003N ESC00000000003 normal
ESC00000000003T ESC00000000003 tumor
> dim(coldata)
[1] 242 2
>dds1=DESeqDataSetFromTximport(txi,colData=coldata,design=~phenotype)
>dds1=dds1[1:500,]
> dds1=DESeq(dds1)
estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 12 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
This takes about one minute, but when I change the design to design=~patient+phenotype , it stuck!
> dds2=DESeqDataSetFromTximport(txi,colData=coldata,design=~patient+phenotype)
using counts and average transcript lengths from tximport
> dds2=dds2[1:500,]
> dds2=DESeq(dds2)
estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
gene-wise dispersion estimates
What did I do wrong?
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)
Matrix products: default
BLAS: /software/bcbio/anaconda/lib/R/lib/libRblas.so
LAPACK: /software/bcbio/anaconda/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] splines parallel stats4 stats graphics grDevices utils
[8] datasets methods base
other attached packages:
[1] dplyr_0.7.4 ggrepel_0.6.5
[3] ggplot2_2.2.0 CancerSubtypes_1.4.0
[5] NMF_0.20.6 cluster_2.0.6
[7] rngtools_1.2.4 pkgmaker_0.22
[9] registry_0.3 sigclust_1.1.0
[11] cqn_1.24.0 quantreg_5.33
[13] SparseM_1.76 preprocessCore_1.40.0
[15] nor1mix_1.2-3 mclust_5.3
[17] IHW_1.6.0 DESeq2_1.18.1
[19] SummarizedExperiment_1.8.0 DelayedArray_0.4.1
[21] matrixStats_0.52.2 Biobase_2.38.0
[23] GenomicRanges_1.30.0 GenomeInfoDb_1.14.0
[25] IRanges_2.12.0 S4Vectors_0.16.0
[27] BiocGenerics_0.24.0 tximport_1.6.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 bit64_0.9-5
[3] doParallel_1.0.10 RColorBrewer_1.1-2
[5] tools_3.4.1 backports_1.0.5
[7] R6_2.2.0 rpart_4.1-10
[9] Hmisc_4.0-3 DBI_0.6-1
[11] lazyeval_0.2.0 colorspace_1.3-2
[13] nnet_7.3-12 gridExtra_2.2.1
[15] bit_1.1-12 compiler_3.4.1
[17] fdrtool_1.2.15 htmlTable_1.11.1
[19] labeling_0.3 slam_0.1-40
[21] scales_0.4.1 checkmate_1.8.2
[23] readr_1.1.0 genefilter_1.60.0
[25] stringr_1.2.0 digest_0.6.12
[27] foreign_0.8-67 XVector_0.18.0
[29] pkgconfig_2.0.1 base64enc_0.1-3
[31] htmltools_0.3.6 lpsymphony_1.6.0
[33] limma_3.34.1 htmlwidgets_0.9
[35] rlang_0.1.2 impute_1.52.0
[37] rstudioapi_0.7 RSQLite_2.0
[39] bindr_0.1 BiocParallel_1.12.0
[41] acepack_1.4.1 RCurl_1.95-4.8
[43] magrittr_1.5 GenomeInfoDbData_0.99.1
[45] Formula_1.2-1 heatmap.plus_1.3
[47] Matrix_1.2-7.1 Rcpp_0.12.13
[49] munsell_0.4.3 stringi_1.1.2
[51] zlibbioc_1.24.0 plyr_1.8.4
[53] grid_3.4.1 blob_1.1.0
[55] lattice_0.20-34 annotate_1.56.0
[57] hms_0.3 locfit_1.5-9.1
[59] knitr_1.16 SNFtool_2.2.1
[61] rjson_0.2.15 geneplotter_1.56.0
[63] reshape2_1.4.2 codetools_0.2-15
[65] glue_1.1.1 XML_3.98-1.6
[67] iCluster_2.1.0 latticeExtra_0.6-28
[69] data.table_1.10.4 foreach_1.4.3
[71] MatrixModels_0.4-1 gtable_0.2.0
[73] assertthat_0.1 gridBase_0.4-7
[75] xtable_1.8-2 ConsensusClusterPlus_1.42.0
[77] survival_2.40-1 tibble_1.3.3
[79] iterators_1.0.8 AnnotationDbi_1.40.0
[81] memoise_1.1.0 bindrcpp_0.2
Thanks for any response!
Hi,Michael!
I have 121 tumor vs normal pairs. and I only took genes with TPM >1 in at least 75% samples. After filtering, 28102 genes left. It have been running more than one day. I interrupted it and then did the different expression analysis with "limma-voom". it is really fast!