Hi there,
I am writing because I am lost in the last step after use limma::removeBatchEffect and introduce the new matrix to DESeq2. The reason I used limma::removeBatchEffect is because the design is not full rank and I can't fix my batch in the design. The PCAs from before and after batch effect look correct.
Please, see my code below:
> library(DESeq2)
> library(limma)
> dds <- DESeqDataSetFromMatrix(countData = as.matrix(counts),
> colData = colData,
> design = ~ trt)
>
> dds <- estimateSizeFactors(dds)
> sizeFactors(dds)
>
> vsd <- vst(dds)
> plotPCA(vsd, "trt", "ID") # PCA before
>
> design0 <- model.matrix(~ 0 + colData$trt,
> data=data.frame(assay(vsd)))
>
> assay(vsd) <- limma::removeBatchEffect(assay(vsd),
> batch=as.vector(colData$batch), batch2=as.vector(ID), design =
> design0)
>
> plotPCA(vsd, "trt", "ID") #PCA after
>
> **How can I introduce vsd in DESeq() to obtain the DEGs ???**
> dds <- DESeq(dds)
> resultsNames(dds)
> res <- results(dds, =c("trt","Condition1","Condition2"))
Those are the PCAs
These are my failed attempts:
option1
> dds2 <- assay(vsd)
> dds2 <- DESeq(dds2)
> Error in DESeq(dds2) : is(object, "DESeqDataSet") is not TRUE
Option2
> assay(dds) <- assay(vsd)
> dds <- DESeq(dds)
> using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.
NaNs producedError in locfit(logDisps ~ logMeans, data = d[disps >= minDisp * 10, , :
fewer than one row in the data
... many thanks in advance!
> sessionInfo()
> R version 3.6.1 (2019-07-05) Platform:
> x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Catalina
> 10.15.6
>
> Matrix products: default BLAS:
> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
> LAPACK:
> /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
>
> locale: [1]
> en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
>
> attached base packages: [1] parallel stats4 stats graphics
> grDevices utils datasets methods base
>
> other attached packages: [1] DESeq2_1.26.0
> SummarizedExperiment_1.16.1 DelayedArray_0.12.2
> BiocParallel_1.20.1 [5] matrixStats_0.56.0
> Biobase_2.46.0 GenomicRanges_1.38.0
> GenomeInfoDb_1.22.0 [9] IRanges_2.20.2
> S4Vectors_0.24.3 BiocGenerics_0.32.0 edgeR_3.28.1
> [13] limma_3.42.2
>
> loaded via a namespace (and not attached): [1] bit64_0.9-7
> splines_3.6.1 Formula_1.2-3 assertthat_0.2.1
> [5] latticeExtra_0.6-29 blob_1.2.1
> GenomeInfoDbData_1.2.2 pillar_1.4.3 [9] RSQLite_2.2.0
> backports_1.1.5 lattice_0.20-40 glue_1.4.1
> [13] digest_0.6.25 RColorBrewer_1.1-2 XVector_0.26.0
> checkmate_2.0.0 [17] colorspace_1.4-1 htmltools_0.5.0
> Matrix_1.2-18 XML_3.99-0.3 [21] pkgconfig_2.0.3
> genefilter_1.68.0 zlibbioc_1.32.0 purrr_0.3.3
> [25] xtable_1.8-4 scales_1.1.0 jpeg_0.1-8.1
> htmlTable_1.13.3 [29] tibble_3.0.3 annotate_1.64.0
> ggplot2_3.3.0 ellipsis_0.3.0 [33] nnet_7.3-13
> survival_3.1-11 magrittr_1.5 crayon_1.3.4
> [37] memoise_1.1.0 foreign_0.8-76 tools_3.6.1
> data.table_1.12.8 [41] lifecycle_0.2.0 stringr_1.4.0
> munsell_0.5.0 locfit_1.5-9.1 [45] cluster_2.1.0
> AnnotationDbi_1.48.0 compiler_3.6.1 rlang_0.4.7
> [49] grid_3.6.1 RCurl_1.98-1.1 rstudioapi_0.11
> htmlwidgets_1.5.1 [53] bitops_1.0-6 base64enc_0.1-3
> gtable_0.3.0 DBI_1.1.0 [57] R6_2.4.1
> gridExtra_2.3 knitr_1.29 dplyr_0.8.5
> [61] bit_1.1-15.2 Hmisc_4.3-1 stringi_1.4.6
> Rcpp_1.0.5 [65] geneplotter_1.64.0 vctrs_0.2.4
> rpart_4.1-15 acepack_1.4.1 [69] png_0.1-7
> tidyselect_1.0.0 xfun_0.16
DESeq2 expects raw counts, vst is not appropriate since it is already normalized plus variance-stabilized. If you batch design is not full rank then there is little point even including it into either the design or an explicit batch correction. Both rely on the same basic principles and your batch is probably nested with something else. Can you show the colData/design to illustrate how batch is related to the actual experimental groups? In general it is not a good idea to try and force data into a framework when there is an explicit warning after using the standard approaches.
Hi ATpoint,
Thanks very much for your reply. I was aware that vts gave me a transformed matrix that is why I wasn't sure how proceed, so thanks for your advises.
About this project batches, there are eighteen and two samples from different sequencing batches. Also, I called "batch" the genetic background of the patients, but this is a covariante more than a batch. Please see below my ColData/Design.