Handling NAs in DESeq2
@40ff56d3
Denmark

Hi everyone

First of all, thank you for making RNA-seq data much more accessible to an average clinical doctor through the DESeq2 package and vignettes. I am running into some trouble, though: I have a dataset of NanoString mRNA data from a clinical study that was later followed up, so I have a tremendous amount of metadata from both the primary trial and the follow-up. The problem is that the datasets contain a rather large number of NAs. I keep getting an error message due to the NAs when I try to relate my data to metadata variables. My question is therefore: how do I subset vsd to exclude the samples that have NA in the specific metadata variable of interest for the research question?

I have already calculated the impact of perioperative medication on changes in gene expression (because those datasets are complete). However, I am now at the rather frustrating part where the data have to be related to clinical impact and not just to a description of physiology.

Thank you for your answer in advance!

 vsd@colData@listData$acplacPOD1 <- relevel(vsd@colData@listData$acplacPOD1, ref="Placebo")
 vsd@colData@listData$time <- factor(vsd@colData@listData$time, levels=c(0,1))
 vsd@colData@listData$chronic_pain_intensity_ACTIVITY_FU <- relevel(vsd@colData@listData$chronic_pain_intensity_ACTIVITY_FU, ref="non/slight")

 dds_2f_int <- DESeqDataSetFromMatrix(countData = counts(set[1:579,]), #keep only endogenous genes
                               colData = colData(vsd), #select metadata file
                               design = ~ W_1 + W_2 + W_3 + W_4 + W_5 + time + chronic_pain_intensity_ACTIVITY_FU)

Error message:

 converting counts to integer mode
 Error in DESeqDataSet(se, design = design, ignoreRank) :
   variables in design formula cannot contain NA: chronic_pain_intensity_ACTIVITY_FU

sessionInfo()

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Danish_Denmark.utf8 LC_CTYPE=Danish_Denmark.utf8 LC_MONETARY=Danish_Denmark.utf8
[4] LC_NUMERIC=C LC_TIME=Danish_Denmark.utf8

time zone: Europe/Copenhagen
tzcode source: internal

attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] reshape2_1.4.4 lme4_1.1-33 DOSE_3.26.1 ReactomePA_1.44.0
[5] clusterProfiler_4.8.1 EnhancedVolcano_1.18.0 rstatix_0.7.2 PoiClaClu_1.0.2.1
[9] hexbin_1.28.3 vsn_3.68.0 magrittr_2.0.3 org.Hs.eg.db_3.17.0
[13] AnnotationDbi_1.62.1 ComplexHeatmap_2.16.0 DelayedMatrixStats_1.22.0 ggridges_0.5.4
[17] ggnewscale_0.4.8 gridExtra_2.3 ggalt_0.4.0 RColorBrewer_1.1-3
[21] ggvenn_0.1.10 ggpubr_0.6.0 ggrastr_1.0.1 pheatmap_1.0.12
[25] viridis_0.6.3 viridisLite_0.4.2 DelayedArray_0.26.3 S4Arrays_1.0.4
[29] Matrix_1.5-4.1 WriteXLS_6.4.0 PCAtools_2.12.0 ggrepel_0.9.3
[33] ggfortify_0.4.16 MASS_7.3-60 DESeq2_1.40.1 RUVSeq_1.34.0
[37] edgeR_3.42.2 limma_3.56.1 EDASeq_2.34.0 ShortRead_1.58.0
[41] GenomicAlignments_1.36.0 SummarizedExperiment_1.30.1 MatrixGenerics_1.12.0 matrixStats_0.63.0
[45] Rsamtools_2.16.0 GenomicRanges_1.52.0 Biostrings_2.68.1 GenomeInfoDb_1.36.0
[49] XVector_0.40.0 BiocParallel_1.34.2 NanoStringQCPro_1.32.0 Biobase_2.60.0
[53] NanoNormIter_0.1.0 EnvStats_2.7.0 devtools_2.4.5 usethis_2.1.6
[57] xlsx_0.6.5 readxl_1.4.2 lubridate_1.9.2 forcats_1.0.0
[61] stringr_1.5.0 dplyr_1.1.2 purrr_1.0.1 readr_2.1.4
[65] tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
[69] IRanges_2.34.0 S4Vectors_0.38.1 BiocGenerics_0.46.0 BiocManager_1.30.20

loaded via a namespace (and not attached):
[1] R.methodsS3_1.8.2 progress_1.2.2 urlchecker_1.0.1 vctrs_0.6.2
[5] digest_0.6.31 png_0.1-8 shape_1.4.6 registry_0.5-1
[9] deldir_1.0-9 httpuv_1.6.11 foreach_1.5.2 qvalue_2.32.0
[13] withr_2.5.0 xfun_0.39 ggfun_0.0.9 ellipsis_0.3.2
[17] memoise_2.0.1 ggbeeswarm_0.7.2 gson_0.1.0 profvis_0.3.8
[21] tidytree_0.4.2 GlobalOptions_0.1.2 R.oo_1.25.0 prettyunits_1.1.1
[25] KEGGREST_1.40.0 promises_1.2.0.1 httr_1.4.6 downloader_0.4
[29] restfulr_0.0.15 ps_1.7.5 rstudioapi_0.14 miniUI_0.1.1.1
[33] generics_0.1.3 reactome.db_1.84.0 processx_3.8.1 curl_5.0.0
[37] zlibbioc_1.46.0 ScaledMatrix_1.8.1 ggraph_2.1.0 polyclip_1.10-4
[41] GenomeInfoDbData_1.2.10 xtable_1.8-4 doParallel_1.0.17 evaluate_0.21
[45] BiocFileCache_2.8.0 preprocessCore_1.62.1 hms_1.1.3 irlba_2.3.5.1
[49] colorspace_2.1-0 filelock_1.0.2 later_1.3.1 ggtree_3.8.0
[53] lattice_0.21-8 NMF_0.26 shadowtext_0.1.2 XML_3.99-0.14
[57] cowplot_1.1.1 pillar_1.9.0 nlme_3.1-162 iterators_1.0.14
[61] gridBase_0.4-7 compiler_4.3.0 beachmat_2.16.0 stringi_1.7.12
[65] minqa_1.2.5 plyr_1.8.8 crayon_1.5.2 abind_1.4-5
[69] BiocIO_1.10.0 gridGraphics_0.5-1 locfit_1.5-9.7 graphlayouts_1.0.0
[73] bit_4.0.5 fastmatch_1.1-3 codetools_0.2-19 BiocSingular_1.16.0
[77] GetoptLong_1.0.5 mime_0.12 splines_4.3.0 circlize_0.4.15
[81] Rcpp_1.0.10 dbplyr_2.3.2 sparseMatrixStats_1.12.0 HDO.db_0.99.1
[85] cellranger_1.1.0 Rttf2pt1_1.3.12 interp_1.1-4 knitr_1.43
[89] blob_1.2.4 utf8_1.2.3 clue_0.3-64 fs_1.6.2
[93] pkgbuild_1.4.0 ggsignif_0.6.4 ggplotify_0.1.0 callr_3.7.3
[97] tzdb_0.4.0 tweenr_2.0.2 pkgconfig_2.0.3 tools_4.3.0
[101] cachem_1.0.8 RSQLite_2.3.1 DBI_1.1.3 graphite_1.46.0
[105] fastmap_1.1.1 rmarkdown_2.21 scales_1.2.1 broom_1.0.4
[109] patchwork_1.1.2 graph_1.78.0 carData_3.0-5 farver_2.1.1
[113] scatterpie_0.1.9 tidygraph_1.2.3 yaml_2.3.7 latticeExtra_0.6-30
[117] rtracklayer_1.60.0 cli_3.6.1 lifecycle_1.0.3 sessioninfo_1.2.2
[121] backports_1.4.1 timechange_0.2.0 gtable_0.3.3 rjson_0.2.21
[125] parallel_4.3.0 ape_5.7-1 jsonlite_1.8.4 bitops_1.0-7
[129] bit64_4.0.5 yulab.utils_0.0.6 GOSemSim_2.26.0 dqrng_0.3.0
[133] R.utils_2.12.2 lazyeval_0.2.2 shiny_1.7.4 htmltools_0.5.5
[137] affy_1.78.0 proj4_1.0-12 rJava_1.0-6 enrichplot_1.20.0
[141] GO.db_3.17.0 rappdirs_0.3.3 glue_1.6.2 RCurl_1.98-1.12
[145] treeio_1.24.0 jpeg_0.1-10 boot_1.3-28.1 igraph_1.4.3
[149] extrafontdb_1.0 R6_2.5.1 labeling_0.4.2 xlsxjars_0.6.1
[153] GenomicFeatures_1.52.0 cluster_2.1.4 rngtools_1.5.2 pkgload_1.3.2
[157] aplot_0.1.10 nloptr_2.0.3 tidyselect_1.2.0 vipor_0.4.5
[161] maps_3.4.1 ggforce_0.4.1 xml2_1.3.4 ash_1.0-15
[165] car_3.1-2 rsvd_1.0.5 munsell_0.5.0 KernSmooth_2.23-21
[169] affyio_1.70.0 data.table_1.14.8 htmlwidgets_1.6.2 aroma.light_3.30.0
[173] fgsea_1.26.0 hwriter_1.3.2.1 biomaRt_2.56.0 rlang_1.1.0
[177] extrafont_0.19 remotes_2.4.2 fansi_1.0.4 beeswarm_0.4.0

ATpoint ★ 4.6k
@atpoint-13662
Germany

DESeq2 objects are SummarizedExperiments, and these follow standard R subsetting rules: rows are genes and columns are samples.

If you had an object dds with a colData column group, you would keep only the samples without NA in that column with dds[, !is.na(dds$group)].
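As a minimal sketch of that subsetting, assuming simulated counts from makeExampleDESeqDataSet() in place of the real NanoString data and a made-up column group with two missing values (both are illustrative assumptions, not the poster's objects):

```r
library(DESeq2)

# Simulated stand-in data (assumption: not the poster's counts or metadata)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 100, m = 8)

# Hypothetical metadata column with NA for two of the eight samples
dds$group <- factor(c("a", "b", NA, "a", "b", NA, "a", "b"))

# Keep only the samples (columns) where 'group' is not NA
dds_sub <- dds[, !is.na(dds$group)]

# Drop factor levels that may no longer occur after subsetting
dds_sub$group <- droplevels(dds_sub$group)

# Building a new dataset with 'group' in the design no longer hits the NA check
dds_group <- DESeqDataSetFromMatrix(countData = counts(dds_sub),
                                    colData = colData(dds_sub),
                                    design = ~ group)
```

The same column subsetting should apply to a vsd object, since the transformed object is also a SummarizedExperiment. Subsetting separately per research question keeps the samples that are complete for that particular variable, instead of discarding every sample that has an NA anywhere in the metadata.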

By the way, things like vsd@colData@listData$time are unnecessary; it is just vsd$time. In general, use getter and setter functions rather than accessing slots directly.
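A short sketch of the accessor style, again on simulated data from makeExampleDESeqDataSet(); the time column here is a hypothetical stand-in for the poster's variable:

```r
library(DESeq2)

# Simulated stand-in data (assumption: not the poster's object)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 100, m = 6)

# Hypothetical numeric time point per sample
dds$time <- rep(c(0, 1), each = 3)

# `$` reads and writes colData columns directly, no slot access needed
dds$time <- factor(dds$time, levels = c(0, 1))
dds$condition <- relevel(dds$condition, ref = "B")

# colData() is the getter for the whole sample table
sample_table <- colData(dds)
```

Going through `$` and colData() rather than the @ slots means the code does not depend on how the class stores its data internally.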

Dear ATpoint, thank you for your response. It works like a charm! You just saved a PhD study! Thank you :)
