Hi,
I want to clean my RNA-seq data with the SVA package, but the sva() function gives me error:
*Error in density.default(x, adjust = adj) : 'x' contains missing values
In addition: Warning message:
In pf(fstats, df1 = (df1 - df0), df2 = (n - df1)) : NaNs produced*
I attach the code I used since count table preparation and gene filtering:
*dds.txi <- DESeqDataSetFromTximport(txi = txi.kallisto, colData = sample_info, design = ~ Sequencing.lane+Disease)*
*sum(rowSums(counts(dds.txi))==0)
keep <- rowSums(counts(dds.txi)) >= 10
dds.txi <- dds.txi[keep,]*
*dds<-DESeq(dds.txi)
dds
design<-design(dds)*
*rld<-rlog(dds, blind=T)
rld_df<-as.data.frame(assay(rld))*
*mod<-model.matrix(~ Age+Sequencing.lane+Disease, sample_info)
mod0<-model.matrix(~ Age+Sequencing.lane, sample_info)*
*dat<-counts(dds,normalized=T)
n.sv=num.sv(dat,mod,method='leek')
svseq <- sva(assay(rld), mod, mod0, n.sv = n.sv)*
... and my *SessionInfo()*:
*R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats4 grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Hs.eg.db_3.8.2 apeglm_1.6.0 mixOmics_6.8.4 DOSE_3.10.2
[5] pcaGoPromoter.Mm.mm9_1.20.0 pcaGoPromoter.Hs.hg19_1.20.0 pcaGoPromoter_1.28.0 Biostrings_2.52.0
[9] XVector_0.24.0 ellipse_0.4.1 ReactomePA_1.28.0 biomaRt_2.40.4
[13] geneplotter_1.62.0 lattice_0.20-38 limma_3.40.6 tximport_1.12.3
[17] sva_3.32.1 genefilter_1.66.0 mgcv_1.8-28 nlme_3.1-140
[21] GSEABase_1.46.0 graph_1.62.0 annotate_1.62.0 XML_3.98-1.20
[25] AnnotationDbi_1.46.1 clusterProfiler_3.12.0 IHW_1.12.0 DESeq2_1.24.0
[29] SummarizedExperiment_1.14.1 DelayedArray_0.10.0 BiocParallel_1.18.1 Biobase_2.44.0
[33] GenomicRanges_1.36.1 GenomeInfoDb_1.20.0 IRanges_2.18.2 S4Vectors_0.22.1
[37] BiocGenerics_0.30.0 rhdf5_2.28.0 ggrepel_0.8.1 gtools_3.8.1
[41] stringi_1.4.3 ggforce_0.3.1 MASS_7.3-51.4 rlang_0.4.0
[45] survival_2.44-1.1 gridBase_0.4-7 matrixStats_0.55.0 devtools_2.2.1
[49] usethis_1.5.1 Rcpp_1.0.2 RSQLite_2.1.2 data.table_1.12.4
[53] bit_1.1-14 magrittr_1.5 scales_1.0.0 ggbeeswarm_0.6.0
[57] VennDiagram_1.6.20 futile.logger_1.4.3 colourlovers_0.3.5 pheatmap_1.0.12
[61] reshape2_1.4.3 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3
[65] purrr_0.3.2 readr_1.3.1 tidyr_1.0.0 tibble_2.1.3
[69] ggplot2_3.2.1 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] rappdirs_0.3.1 coda_0.19-3 acepack_1.4.1 bit64_0.9-7 knitr_1.25
[6] rpart_4.1-15 RCurl_1.95-4.12 generics_0.0.2 callr_3.3.2 cowplot_1.0.0
[11] lambda.r_1.2.4 europepmc_0.3 enrichplot_1.4.0 xml2_1.2.2 lubridate_1.7.4
[16] assertthat_0.2.1 viridis_0.5.1 xfun_0.10 hms_0.5.1 progress_1.2.2
[21] readxl_1.3.1 igraph_1.2.4.1 DBI_1.0.0 htmlwidgets_1.3 rARPACK_0.11-0
[26] ellipsis_0.3.0 RSpectra_0.15-0 backports_1.1.5 vctrs_0.2.0 remotes_2.1.0
[31] withr_2.1.2 triebeard_0.3.0 checkmate_1.9.4 fdrtool_1.2.15 prettyunits_1.0.2
[36] cluster_2.1.0 lazyeval_0.2.2 crayon_1.3.4 pkgconfig_2.0.3 slam_0.1-45
[41] labeling_0.3 tweenr_1.0.1 vipor_0.4.5 pkgload_1.0.2 nnet_7.3-12
[46] lifecycle_0.1.0 modelr_0.1.5 cellranger_1.1.0 rprojroot_1.3-2 polyclip_1.10-0
[51] Matrix_1.2-17 urltools_1.7.3 lpsymphony_1.12.0 Rhdf5lib_1.6.1 base64enc_0.1-3
[56] beeswarm_0.2.3 ggridges_0.5.1 processx_3.4.1 png_0.1-7 viridisLite_0.3.0
[61] bitops_1.0-6 blob_1.2.0 qvalue_2.16.0 gridGraphics_0.4-1 reactome.db_1.68.0
[66] memoise_1.1.0 graphite_1.30.0 plyr_1.8.4 zlibbioc_1.30.0 compiler_3.6.1
[71] bbmle_1.0.20 RColorBrewer_1.1-2 cli_1.1.0 ps_1.3.0 htmlTable_1.13.2
[76] formatR_1.7 Formula_1.2-3 tidyselect_0.2.5 emdbook_1.3.11 yaml_2.2.0
[81] GOSemSim_2.10.0 locfit_1.5-9.1 latticeExtra_0.6-28 fastmatch_1.1-0 tools_3.6.1
[86] rstudioapi_0.10 foreign_0.8-71 gridExtra_2.3 farver_1.1.0 ggraph_2.0.0
[91] digest_0.6.21 rvcheck_0.1.5 BiocManager_1.30.4 broom_0.5.2 httr_1.4.1
[96] colorspace_1.4-1 rvest_0.3.4 fs_1.3.1 splines_3.6.1 graphlayouts_0.5.0
[101] ggplotify_0.0.4 sessioninfo_1.1.1 xtable_1.8-4 jsonlite_1.6 futile.options_1.0.1
[106] tidygraph_1.1.2 corpcor_1.6.9 UpSetR_1.4.0 zeallot_0.1.0 testthat_2.2.1
[111] R6_2.4.0 Hmisc_4.2-0 pillar_1.4.2 htmltools_0.3.6 glue_1.3.1
[116] fgsea_1.10.1 pkgbuild_1.0.5 numDeriv_2016.8-1.1 GO.db_3.8.2 desc_1.2.0
[121] munsell_0.5.0 DO.db_2.9 GenomeInfoDbData_1.2.1 haven_2.1.1 gtable_0.3.0*
Any help would be greatly appreciated!
Theo
Thanks James,
I followed your advice and used the svaseq() function instead. This time I had a different error:
Would a deeper filtering of low expressed genes help here?
What do you get from
Huh. Doesn't seem like you should get that error then. You have one remaining degree of freedom, so you are really close to an over-parameterized model anyway, but it shouldn't error like that? The error is coming from
num.sv
and seems to indicate that you have a fully parameterized model.Fake example:
I see. The only explanation I can think is that the batch that causes the problem is Age. There are a few patients that have the same value. Is it possible to remove this unwanted variation from my model?
Subjects having the same age isn't a problem. Were the samples really run on separate sequencing lanes? These days it seems like the usual MO is to barcode the samples, pool, and then run all the samples on each lane. In which case it's not useful to block on the sequencing lane.
Yes, Sequencing.lane refers to different dates so they were really put on separate flow cells.