Hi Michael,
I am having some trouble fitting my model in DEseq2. In my design, I am missing one condition (A-5); see below:
stiffness shear
A 1
A 1
A 1
A 2
A 2
A 2
A 2
A 3
A 3
A 4
A 4
B 1
B 1
B 1
B 1
B 2
B 2
B 2
B 3
B 3
B 3
B 4
B 4
B 4
B 5
B 5
B 5
C 1
C 1
C 1
C 2
C 2
C 2
C 2
C 3
C 3
C 3
C 4
C 4
C 4
C 5
C 5
C 5
I tried to solve this by adding all-zero columns to my data_matrix and sample files (adding "fake-samples" to both). Only then I was able to fit the model ~design shear + stiffness + shear:stiffness:
dds <- DESeqDataSetFromMatrix(countData = counts_data,
colData = colData,
design = ~ shear + stiffness + shear:stiffness)
However, then I need to remove these all-0 columns to be able to use the DESeq function. I followed the script in one previous reply [DESeq2 paired and multi factor comparison]
But I can't make it work, and I get an error in the very first step:
#First I create this matrix and I relavel to my reference
mm1 <- model.matrix(~ shear + stiffness + shear:stiffness, colData)
dds$shear <- relevel(dds$shear, ref='1')
#Then I create this object of zeros as you do in your reply script (I think it takes all 0 columns and create a new matrix with them to then substract it?):
idx <- which(colSums(mm1 == 0) == nrow(mm1))
mm1 <- mm1[,-idx] # removing the columns with all 0's
mm0 <- model.matrix(~ shear + stiffness + shear:stiffness, colData) # this matrix is necessary to run nbinomLRT
design(dds) <- ~ shear + stiffness + shear:stiffness
dds <- estimateSizeFactors(dds)
Here I already get: Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, : every gene contains at least one zero, cannot compute log geometric means
And I see that after applying
mm1 <- mm1[,-idx]
My mm1 that initially was
num [1:46 , 1:15]
it becomes
num [1:46 , 0]
Can you help me understand what am I doing wrong? Thank you.
Laia
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=Catalan_Spain.utf8 LC_CTYPE=Catalan_Spain.utf8
[3] LC_MONETARY=Catalan_Spain.utf8 LC_NUMERIC=C
[5] LC_TIME=Catalan_Spain.utf8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] BiocParallel_1.30.3 vsn_3.64.0
[3] gplots_3.1.3 RColorBrewer_1.1-3
[5] pheatmap_1.0.12 apeglm_1.18.0
[7] ggrepel_0.9.1 writexl_1.4.0
[9] airway_1.16.0 forcats_0.5.2
[11] stringr_1.4.1 dplyr_1.0.10
[13] purrr_0.3.4 readr_2.1.2
[15] tidyr_1.2.1 tibble_3.1.8
[17] ggplot2_3.3.6 tidyverse_1.3.2
[19] DESeq2_1.36.0 SummarizedExperiment_1.26.1
[21] Biobase_2.56.0 MatrixGenerics_1.8.1
[23] matrixStats_0.62.0 GenomicRanges_1.48.0
[25] GenomeInfoDb_1.32.4 IRanges_2.30.1
[27] S4Vectors_0.34.0 BiocGenerics_0.42.0
loaded via a namespace (and not attached):
[1] readxl_1.4.1 backports_1.4.1 plyr_1.8.7
[4] splines_4.2.1 usethis_2.1.6 digest_0.6.29
[7] htmltools_0.5.3 fansi_1.0.3 magrittr_2.0.3
[10] memoise_2.0.1 googlesheets4_1.0.1 limma_3.52.4
[13] tzdb_0.3.0 remotes_2.4.2 Biostrings_2.64.1
[16] annotate_1.74.0 modelr_0.1.9 bdsmatrix_1.3-6
[19] prettyunits_1.1.1 colorspace_2.0-3 blob_1.2.3
[22] rvest_1.0.3 haven_2.5.1 callr_3.7.2
[25] crayon_1.5.2 RCurl_1.98-1.8 jsonlite_1.8.0
[28] genefilter_1.78.0 survival_3.3-1 glue_1.6.2
[31] gtable_0.3.1 gargle_1.2.1 zlibbioc_1.42.0
[34] XVector_0.36.0 DelayedArray_0.22.0 pkgbuild_1.3.1
[37] scales_1.2.1 mvtnorm_1.1-3 DBI_1.1.3
[40] miniUI_0.1.1.1 Rcpp_1.0.9 xtable_1.8-4
[43] emdbook_1.3.12 bit_4.0.4 preprocessCore_1.58.0
[46] profvis_0.3.7 htmlwidgets_1.5.4 httr_1.4.4
[49] ellipsis_0.3.2 urlchecker_1.0.1 pkgconfig_2.0.3
[52] XML_3.99-0.10 dbplyr_2.2.1 locfit_1.5-9.6
[55] utf8_1.2.2 tidyselect_1.2.0 rlang_1.0.6
[58] later_1.3.0 AnnotationDbi_1.58.0 munsell_0.5.0
[61] cellranger_1.1.0 tools_4.2.1 cachem_1.0.6
[64] cli_3.4.1 generics_0.1.3 RSQLite_2.2.17
[67] devtools_2.4.5 broom_1.0.1 fastmap_1.1.0
[70] processx_3.7.0 bit64_4.0.5 fs_1.5.2
[73] caTools_1.18.2 KEGGREST_1.36.3 mime_0.12
[76] xml2_1.3.3 compiler_4.2.1 rstudioapi_0.14
[79] png_0.1-7 affyio_1.66.0 reprex_2.0.2
[82] geneplotter_1.74.0 stringi_1.7.8 ps_1.7.1
[85] lattice_0.20-45 Matrix_1.4-1 vctrs_0.4.1
[88] pillar_1.8.1 lifecycle_1.0.3 BiocManager_1.30.18
[91] bitops_1.0-7 httpuv_1.6.6 R6_2.5.1
[94] affy_1.74.0 promises_1.2.0.1 KernSmooth_2.23-20
[97] sessioninfo_1.2.2 codetools_0.2-18 MASS_7.3-57
[100] gtools_3.9.3 assertthat_0.2.1 pkgload_1.3.0
[103] withr_2.5.0 GenomeInfoDbData_1.2.8 parallel_4.2.1
[106] hms_1.1.2 grid_4.2.1 coda_0.19-4
[109] googledrive_2.0.0 bbmle_1.0.25 numDeriv_2016.8-1.1
[112] shiny_1.7.2 lubridate_1.8.0
Thank you Michael,
I was trying the strategy with "fake samples" because I thought they would be ignored aftermath, but I stopped that. Yesterday I could manage to give the full matrix (including the incomplete condition) to DEseq2 by just defining it manually, and now LRT is finally working on all my samples. I am using LRT as follows:
Here, I included the batch (Bch) effect in the model (because due to technical issues the RNAseq samples ended up being split into different batches..), and this brought me to a new question:
When I want to see the interaction effect, I should only remove from the equation the interaction (Fl:St) term, is that right? So the batch has to stay in the reduced model, otherwise would be also tested together with the interaction term.
Also, when I want to see the effect of St alone, I should remove only St, because if I would remove both St and Fl:St, then I would be testing for both terms, right?
Thank you once again.
Oh, I just realised of this warning message after running DEseq(LRT) with my manually-entered-factors matrix:
This is because I made all my factor levels numerical.. If I keep them non-numerical, then I get the full rank error:
So I guess that the results from LRT_int may not be correct?
If using the manually entered dataset is not working properly, then I think that another option would be to remove the incomplete condition from my data, run the LRT tests on the new data, and afterwards do Wald contrasts on the excluded condition...
Is there a better idea?
Can we go back to my suggestion where you form
mm0
properly? In the first post,mm0
was justmm1
but without taking care of the 0 columns.mm0
should be a matrix that represents the null (removing variables of interest in testing and leaving only variables that you want to control for).Hi Michael,
Thank you for your message. I tried now to go back to the mm1/mm0 as you advised, and I think it worked :)
I started with the dds without the interaction because I cannot run the function with the missing condition, otherwise:
Here I am following your script, but I don't get why are we doing these steps manually. This is what DEseq would do automatically right?
And here is where LRT is called:
I think this worked this time :) Thank you!
Laia
You don’t need to call those functions manually, just DESeq(…, test=“LRT”, …) should be enough.
I would filter at the top, not bottom of the script.
Thank you Michael!
Laia