Question

Handling NAs in DESeq2

Asger Mølgaard

Denmark

Hi everyone

First of all, thank you for making RNA-seq data much more accessible to an average clinical doctor through the DESeq2 package and vignettes. However, I am running into some trouble. I have a dataset of NanoString mRNA data from a clinical study that was later followed up, so I have a tremendous amount of metadata from both the primary trial and the follow-up. The problem is that the datasets contain a rather large number of NAs, and I keep getting an error message because of them when I try to relate my data to metadata variables. My question is therefore: how do I subset vsd and set to exclude the samples that have NAs in the specific metadata variable of interest for the research question?

I have already analysed the impact of perioperative medication on changes in gene expression (because these data are complete). However, I have now reached the rather frustrating part, where the data have to be related to clinical impact and not just to a description of physiology.

Thank you for your answer in advance!

 vsd@colData@listData$acplacPOD1 <- relevel(vsd@colData@listData$acplacPOD1, ref="Placebo")
 vsd@colData@listData$time <- factor(vsd@colData@listData$time, levels=c(0,1))
 vsd@colData@listData$chronic_pain_intensity_ACTIVITY_FU <- relevel(vsd@colData@listData$chronic_pain_intensity_ACTIVITY_FU, ref="non/slight")

 dds_2f_int <- DESeqDataSetFromMatrix(countData = counts(set[1:579,]), #keep only endogenous genes
                               colData = colData(vsd), #select metadata file
                               design = ~ W_1 + W_2 + W_3 + W_4 + W_5 + time + chronic_pain_intensity_ACTIVITY_FU)

Error message:

converting counts to integer mode
Error in DESeqDataSet(se, design = design, ignoreRank) :
  variables in design formula cannot contain NA: chronic_pain_intensity_ACTIVITY_FU

sessionInfo()

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Danish_Denmark.utf8 LC_CTYPE=Danish_Denmark.utf8 LC_MONETARY=Danish_Denmark.utf8
[4] LC_NUMERIC=C LC_TIME=Danish_Denmark.utf8

time zone: Europe/Copenhagen
tzcode source: internal

attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] reshape2_1.4.4 lme4_1.1-33 DOSE_3.26.1 ReactomePA_1.44.0
[5] clusterProfiler_4.8.1 EnhancedVolcano_1.18.0 rstatix_0.7.2 PoiClaClu_1.0.2.1
[9] hexbin_1.28.3 vsn_3.68.0 magrittr_2.0.3 org.Hs.eg.db_3.17.0
[13] AnnotationDbi_1.62.1 ComplexHeatmap_2.16.0 DelayedMatrixStats_1.22.0 ggridges_0.5.4
[17] ggnewscale_0.4.8 gridExtra_2.3 ggalt_0.4.0 RColorBrewer_1.1-3
[21] ggvenn_0.1.10 ggpubr_0.6.0 ggrastr_1.0.1 pheatmap_1.0.12
[25] viridis_0.6.3 viridisLite_0.4.2 DelayedArray_0.26.3 S4Arrays_1.0.4
[29] Matrix_1.5-4.1 WriteXLS_6.4.0 PCAtools_2.12.0 ggrepel_0.9.3
[33] ggfortify_0.4.16 MASS_7.3-60 DESeq2_1.40.1 RUVSeq_1.34.0
[37] edgeR_3.42.2 limma_3.56.1 EDASeq_2.34.0 ShortRead_1.58.0
[41] GenomicAlignments_1.36.0 SummarizedExperiment_1.30.1 MatrixGenerics_1.12.0 matrixStats_0.63.0
[45] Rsamtools_2.16.0 GenomicRanges_1.52.0 Biostrings_2.68.1 GenomeInfoDb_1.36.0
[49] XVector_0.40.0 BiocParallel_1.34.2 NanoStringQCPro_1.32.0 Biobase_2.60.0
[53] NanoNormIter_0.1.0 EnvStats_2.7.0 devtools_2.4.5 usethis_2.1.6
[57] xlsx_0.6.5 readxl_1.4.2 lubridate_1.9.2 forcats_1.0.0
[61] stringr_1.5.0 dplyr_1.1.2 purrr_1.0.1 readr_2.1.4
[65] tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
[69] IRanges_2.34.0 S4Vectors_0.38.1 BiocGenerics_0.46.0 BiocManager_1.30.20

loaded via a namespace (and not attached):
[1] R.methodsS3_1.8.2 progress_1.2.2 urlchecker_1.0.1 vctrs_0.6.2
[5] digest_0.6.31 png_0.1-8 shape_1.4.6 registry_0.5-1
[9] deldir_1.0-9 httpuv_1.6.11 foreach_1.5.2 qvalue_2.32.0
[13] withr_2.5.0 xfun_0.39 ggfun_0.0.9 ellipsis_0.3.2
[17] memoise_2.0.1 ggbeeswarm_0.7.2 gson_0.1.0 profvis_0.3.8
[21] tidytree_0.4.2 GlobalOptions_0.1.2 R.oo_1.25.0 prettyunits_1.1.1
[25] KEGGREST_1.40.0 promises_1.2.0.1 httr_1.4.6 downloader_0.4
[29] restfulr_0.0.15 ps_1.7.5 rstudioapi_0.14 miniUI_0.1.1.1
[33] generics_0.1.3 reactome.db_1.84.0 processx_3.8.1 curl_5.0.0
[37] zlibbioc_1.46.0 ScaledMatrix_1.8.1 ggraph_2.1.0 polyclip_1.10-4
[41] GenomeInfoDbData_1.2.10 xtable_1.8-4 doParallel_1.0.17 evaluate_0.21
[45] BiocFileCache_2.8.0 preprocessCore_1.62.1 hms_1.1.3 irlba_2.3.5.1
[49] colorspace_2.1-0 filelock_1.0.2 later_1.3.1 ggtree_3.8.0
[53] lattice_0.21-8 NMF_0.26 shadowtext_0.1.2 XML_3.99-0.14
[57] cowplot_1.1.1 pillar_1.9.0 nlme_3.1-162 iterators_1.0.14
[61] gridBase_0.4-7 compiler_4.3.0 beachmat_2.16.0 stringi_1.7.12
[65] minqa_1.2.5 plyr_1.8.8 crayon_1.5.2 abind_1.4-5
[69] BiocIO_1.10.0 gridGraphics_0.5-1 locfit_1.5-9.7 graphlayouts_1.0.0
[73] bit_4.0.5 fastmatch_1.1-3 codetools_0.2-19 BiocSingular_1.16.0
[77] GetoptLong_1.0.5 mime_0.12 splines_4.3.0 circlize_0.4.15
[81] Rcpp_1.0.10 dbplyr_2.3.2 sparseMatrixStats_1.12.0 HDO.db_0.99.1
[85] cellranger_1.1.0 Rttf2pt1_1.3.12 interp_1.1-4 knitr_1.43
[89] blob_1.2.4 utf8_1.2.3 clue_0.3-64 fs_1.6.2
[93] pkgbuild_1.4.0 ggsignif_0.6.4 ggplotify_0.1.0 callr_3.7.3
[97] tzdb_0.4.0 tweenr_2.0.2 pkgconfig_2.0.3 tools_4.3.0
[101] cachem_1.0.8 RSQLite_2.3.1 DBI_1.1.3 graphite_1.46.0
[105] fastmap_1.1.1 rmarkdown_2.21 scales_1.2.1 broom_1.0.4
[109] patchwork_1.1.2 graph_1.78.0 carData_3.0-5 farver_2.1.1
[113] scatterpie_0.1.9 tidygraph_1.2.3 yaml_2.3.7 latticeExtra_0.6-30
[117] rtracklayer_1.60.0 cli_3.6.1 lifecycle_1.0.3 sessioninfo_1.2.2
[121] backports_1.4.1 timechange_0.2.0 gtable_0.3.3 rjson_0.2.21
[125] parallel_4.3.0 ape_5.7-1 jsonlite_1.8.4 bitops_1.0-7
[129] bit64_4.0.5 yulab.utils_0.0.6 GOSemSim_2.26.0 dqrng_0.3.0
[133] R.utils_2.12.2 lazyeval_0.2.2 shiny_1.7.4 htmltools_0.5.5
[137] affy_1.78.0 proj4_1.0-12 rJava_1.0-6 enrichplot_1.20.0
[141] GO.db_3.17.0 rappdirs_0.3.3 glue_1.6.2 RCurl_1.98-1.12
[145] treeio_1.24.0 jpeg_0.1-10 boot_1.3-28.1 igraph_1.4.3
[149] extrafontdb_1.0 R6_2.5.1 labeling_0.4.2 xlsxjars_0.6.1
[153] GenomicFeatures_1.52.0 cluster_2.1.4 rngtools_1.5.2 pkgload_1.3.2
[157] aplot_0.1.10 nloptr_2.0.3 tidyselect_1.2.0 vipor_0.4.5
[161] maps_3.4.1 ggforce_0.4.1 xml2_1.3.4 ash_1.0-15
[165] car_3.1-2 rsvd_1.0.5 munsell_0.5.0 KernSmooth_2.23-21
[169] affyio_1.70.0 data.table_1.14.8 htmlwidgets_1.6.2 aroma.light_3.30.0
[173] fgsea_1.26.0 hwriter_1.3.2.1 biomaRt_2.56.0 rlang_1.1.0
[177] extrafont_0.19 remotes_2.4.2 fansi_1.0.4 beeswarm_0.4.0


score 0 · Answer 1 · 2023-06-02

ATpoint

Germany

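DESeq2 objects (the DESeqDataSet as well as transformed objects such as vsd) are SummarizedExperiment objects, and these follow standard R subsetting rules: object[rows, columns], where rows are genes and columns are samples.

For example, if you had an object dds with a colData column group, you would keep only the samples without an NA in that column with dds[, !is.na(dds$group)].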
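A minimal sketch of this approach on simulated data from DESeq2's makeExampleDESeqDataSet(), not the poster's data. The variable pain and its levels low/high are hypothetical stand-ins for chronic_pain_intensity_ACTIVITY_FU. The sketch first reproduces the NA error, then subsets the counts and the sample metadata with the same logical index so that count columns and colData rows stay aligned, drops unused factor levels, and fits the model on the remaining samples.

```r
library(DESeq2)

set.seed(42)
# Simulated stand-in data: 1000 genes x 12 samples, colData has a
# two-level factor "condition" (A/B)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical follow-up variable with an NA in every third sample
dds$pain <- factor(c("low", "high", NA)[rep(1:3, times = 4)])

# Using "pain" in the design without subsetting is expected to fail with
# the same "cannot contain NA" error as in the question
err <- tryCatch(DESeqDataSet(dds, design = ~ condition + pain),
                error = conditionMessage)
print(err)

# Logical index of samples (columns) with a recorded value
keep <- !is.na(dds$pain)

# Subset sample metadata with that index; droplevels() removes any level
# that only occurred in the dropped samples, then set the reference level
cd <- colData(dds)[keep, ]
cd$pain <- relevel(droplevels(cd$pain), ref = "low")

# Subset the counts with the SAME index so columns match the colData rows
dds_sub <- DESeqDataSetFromMatrix(countData = counts(dds)[, keep],
                                  colData = cd,
                                  design = ~ condition + pain)
dds_sub <- DESeq(dds_sub, quiet = TRUE)
resultsNames(dds_sub)
```

Applied to the question, the same pattern would be keep <- !is.na(vsd$chronic_pain_intensity_ACTIVITY_FU), with the identical keep used to subset both the samples of set and colData(vsd). This assumes that set and vsd list their samples in the same order and that set supports [genes, samples] indexing.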

By the way, things like vsd@colData@listData$time are unnecessary; it's just vsd$time. In general, use getter and setter functions (such as colData() and $) rather than accessing slots directly with @.
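A short sketch of the getter/setter style, again on simulated data. The time values are made up, and normTransform() is used here only as a cheap way to obtain a DESeqTransform object comparable to vsd; the same accessors should work on the output of vst(), since both are SummarizedExperiment objects.

```r
library(DESeq2)

# Simulated stand-ins for the poster's objects: 500 genes x 6 samples,
# colData contains a factor "condition" with levels A and B
dds <- makeExampleDESeqDataSet(n = 500, m = 6)
dds <- estimateSizeFactors(dds)
vsd <- normTransform(dds)  # a DESeqTransform, also a SummarizedExperiment

# Hypothetical time point per sample, stored as numbers
vsd$time <- c(0, 0, 0, 1, 1, 1)

# `$` reads and writes colData columns directly; no @ slot access needed
vsd$time <- factor(vsd$time, levels = c(0, 1))
vsd$condition <- relevel(vsd$condition, ref = "B")

# colData() is the explicit getter for the whole sample table
head(colData(vsd))
```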


Dear ATpoint, thank you for your response; it works like a charm! You just saved a PhD study! Thank you :)

Asger Mølgaard