Error in sva() function: missing values, NaNs produced
1
0
Entering edit mode
thkapell ▴ 10
@tkapell-14647
Last seen 23 months ago
Helmholtz Center Munich, Germany

Hi,

I want to clean my RNA-seq data with the SVA package, but the sva() function gives me error:

*Error in density.default(x, adjust = adj) : 'x' contains missing values
In addition: Warning message:
In pf(fstats, df1 = (df1 - df0), df2 = (n - df1)) : NaNs produced*

I attach the code I used since count table preparation and gene filtering:

*dds.txi <- DESeqDataSetFromTximport(txi = txi.kallisto, colData = sample_info, design = ~ Sequencing.lane+Disease)*

*sum(rowSums(counts(dds.txi))==0)
keep <- rowSums(counts(dds.txi)) >= 10
dds.txi <- dds.txi[keep,]*

*dds<-DESeq(dds.txi)    
dds
design<-design(dds)*

*rld<-rlog(dds, blind=T)                   
rld_df<-as.data.frame(assay(rld))*

*mod<-model.matrix(~ Age+Sequencing.lane+Disease, sample_info)
mod0<-model.matrix(~  Age+Sequencing.lane, sample_info)*

*dat<-counts(dds,normalized=T)
n.sv=num.sv(dat,mod,method='leek')
svseq <- sva(assay(rld), mod, mod0, n.sv = n.sv)*

... and my *SessionInfo()*:

*R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    
attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
 [1] org.Hs.eg.db_3.8.2           apeglm_1.6.0                 mixOmics_6.8.4               DOSE_3.10.2                 
 [5] pcaGoPromoter.Mm.mm9_1.20.0  pcaGoPromoter.Hs.hg19_1.20.0 pcaGoPromoter_1.28.0         Biostrings_2.52.0           
 [9] XVector_0.24.0               ellipse_0.4.1                ReactomePA_1.28.0            biomaRt_2.40.4              
[13] geneplotter_1.62.0           lattice_0.20-38              limma_3.40.6                 tximport_1.12.3             
[17] sva_3.32.1                   genefilter_1.66.0            mgcv_1.8-28                  nlme_3.1-140                
[21] GSEABase_1.46.0              graph_1.62.0                 annotate_1.62.0              XML_3.98-1.20               
[25] AnnotationDbi_1.46.1         clusterProfiler_3.12.0       IHW_1.12.0                   DESeq2_1.24.0               
[29] SummarizedExperiment_1.14.1  DelayedArray_0.10.0          BiocParallel_1.18.1          Biobase_2.44.0              
[33] GenomicRanges_1.36.1         GenomeInfoDb_1.20.0          IRanges_2.18.2               S4Vectors_0.22.1            
[37] BiocGenerics_0.30.0          rhdf5_2.28.0                 ggrepel_0.8.1                gtools_3.8.1                
[41] stringi_1.4.3                ggforce_0.3.1                MASS_7.3-51.4                rlang_0.4.0                 
[45] survival_2.44-1.1            gridBase_0.4-7               matrixStats_0.55.0           devtools_2.2.1              
[49] usethis_1.5.1                Rcpp_1.0.2                   RSQLite_2.1.2                data.table_1.12.4           
[53] bit_1.1-14                   magrittr_1.5                 scales_1.0.0                 ggbeeswarm_0.6.0            
[57] VennDiagram_1.6.20           futile.logger_1.4.3          colourlovers_0.3.5           pheatmap_1.0.12             
[61] reshape2_1.4.3               forcats_0.4.0                stringr_1.4.0                dplyr_0.8.3                 
[65] purrr_0.3.2                  readr_1.3.1                  tidyr_1.0.0                  tibble_2.1.3                
[69] ggplot2_3.2.1                tidyverse_1.2.1             
loaded via a namespace (and not attached):
  [1] rappdirs_0.3.1         coda_0.19-3            acepack_1.4.1          bit64_0.9-7            knitr_1.25            
  [6] rpart_4.1-15           RCurl_1.95-4.12        generics_0.0.2         callr_3.3.2            cowplot_1.0.0         
 [11] lambda.r_1.2.4         europepmc_0.3          enrichplot_1.4.0       xml2_1.2.2             lubridate_1.7.4       
 [16] assertthat_0.2.1       viridis_0.5.1          xfun_0.10              hms_0.5.1              progress_1.2.2        
 [21] readxl_1.3.1           igraph_1.2.4.1         DBI_1.0.0              htmlwidgets_1.3        rARPACK_0.11-0        
 [26] ellipsis_0.3.0         RSpectra_0.15-0        backports_1.1.5        vctrs_0.2.0            remotes_2.1.0         
 [31] withr_2.1.2            triebeard_0.3.0        checkmate_1.9.4        fdrtool_1.2.15         prettyunits_1.0.2     
 [36] cluster_2.1.0          lazyeval_0.2.2         crayon_1.3.4           pkgconfig_2.0.3        slam_0.1-45           
 [41] labeling_0.3           tweenr_1.0.1           vipor_0.4.5            pkgload_1.0.2          nnet_7.3-12           
 [46] lifecycle_0.1.0        modelr_0.1.5           cellranger_1.1.0       rprojroot_1.3-2        polyclip_1.10-0       
 [51] Matrix_1.2-17          urltools_1.7.3         lpsymphony_1.12.0      Rhdf5lib_1.6.1         base64enc_0.1-3       
 [56] beeswarm_0.2.3         ggridges_0.5.1         processx_3.4.1         png_0.1-7              viridisLite_0.3.0     
 [61] bitops_1.0-6           blob_1.2.0             qvalue_2.16.0          gridGraphics_0.4-1     reactome.db_1.68.0    
 [66] memoise_1.1.0          graphite_1.30.0        plyr_1.8.4             zlibbioc_1.30.0        compiler_3.6.1        
 [71] bbmle_1.0.20           RColorBrewer_1.1-2     cli_1.1.0              ps_1.3.0               htmlTable_1.13.2      
 [76] formatR_1.7            Formula_1.2-3          tidyselect_0.2.5       emdbook_1.3.11         yaml_2.2.0            
 [81] GOSemSim_2.10.0        locfit_1.5-9.1         latticeExtra_0.6-28    fastmatch_1.1-0        tools_3.6.1           
 [86] rstudioapi_0.10        foreign_0.8-71         gridExtra_2.3          farver_1.1.0           ggraph_2.0.0          
 [91] digest_0.6.21          rvcheck_0.1.5          BiocManager_1.30.4     broom_0.5.2            httr_1.4.1            
 [96] colorspace_1.4-1       rvest_0.3.4            fs_1.3.1               splines_3.6.1          graphlayouts_0.5.0    
[101] ggplotify_0.0.4        sessioninfo_1.1.1      xtable_1.8-4           jsonlite_1.6           futile.options_1.0.1  
[106] tidygraph_1.1.2        corpcor_1.6.9          UpSetR_1.4.0           zeallot_0.1.0          testthat_2.2.1        
[111] R6_2.4.0               Hmisc_4.2-0            pillar_1.4.2           htmltools_0.3.6        glue_1.3.1            
[116] fgsea_1.10.1           pkgbuild_1.0.5         numDeriv_2016.8-1.1    GO.db_3.8.2            desc_1.2.0            
[121] munsell_0.5.0          DO.db_2.9              GenomeInfoDbData_1.2.1 haven_2.1.1            gtable_0.3.0* 

Any help would be greatly appreciated!

Theo

sva batch-correction • 1.7k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 3 days ago
United States

First off, note that sva is intended for continuous data that are sort of normal distributed, which is not true of count data. The svaseq function is intended for count data.

Second, when you type in your question, there is a preview pane right below where you are typing, that shows what your post will look like. Bracketing things with an asterisk makes things italicized, which doesn't help with code. Instead you want to either bracket with a triple back-tick (top left key on a QWERTY keyboard) or highlight and click the code button (with the10101010). If your preview doesn't look like what I fixed for you, above, then you need to adjust until it does.

So try using svaseq and post again if it still doesn't work.

ADD COMMENT
0
Entering edit mode

Thanks James,

I followed your advice and used the svaseq() function instead. This time I had a different error:

dat<-counts(dds,normalized=T)
idx <- rowMeans(dat) > 1
dat <- dat[idx,]

mod<-model.matrix(~ Age+Sequencing.lane+Disease, colData(dds))
mod0<-model.matrix(~  Age+Sequencing.lane, colData(dds))

svseq <- svaseq(dat, mod, mod0)

Error in dstat0[, i] : subscript out of bounds

Would a deeper filtering of low expressed genes help here?

ADD REPLY
0
Entering edit mode

What do you get from

min(ncol(dat), nrow(dat)) - ceiling(sum(hat(mod)))

ADD REPLY
0
Entering edit mode
 min(ncol(dat), nrow(dat)) - ceiling(sum(hat(mod)))

[1]1
ADD REPLY
0
Entering edit mode

Huh. Doesn't seem like you should get that error then. You have one remaining degree of freedom, so you are really close to an over-parameterized model anyway, but it shouldn't error like that? The error is coming from num.sv and seems to indicate that you have a fully parameterized model.

Fake example:

## design matrix that has 1 df
> mod2
  (Intercept) A B C D E F  continuous
1           1 0 0 0 0 0 0 -1.57605014
2           1 0 0 0 0 1 0 -1.07814260
3           1 0 0 1 0 0 1  0.43780250
4           1 1 0 1 0 0 0  0.62172593
5           1 1 0 0 1 1 0 -0.61158475
6           1 1 0 0 1 0 1 -0.20464747
7           1 0 1 0 0 0 0 -0.70204198
8           1 0 1 0 0 1 0 -1.83367875
9           1 0 1 0 0 0 1 -0.07603098

> y <- matrix(rnorm(9000), 100, 9)
> 9 - ceiling(sum(hat(mod2)))
[1] 1
> num.sv(y, mod2)
[1] 0
> mod2[,9] <- runif(9)
> 9 - ceiling(sum(hat(mod2)))
[1] 0
> num.sv(y, mod2)
Error in dstat0[, i] : subscript out of bounds
ADD REPLY
0
Entering edit mode

I see. The only explanation I can think is that the batch that causes the problem is Age. There are a few patients that have the same value. Is it possible to remove this unwanted variation from my model?

ADD REPLY
0
Entering edit mode

Subjects having the same age isn't a problem. Were the samples really run on separate sequencing lanes? These days it seems like the usual MO is to barcode the samples, pool, and then run all the samples on each lane. In which case it's not useful to block on the sequencing lane.

ADD REPLY
0
Entering edit mode

Yes, Sequencing.lane refers to different dates so they were really put on separate flow cells.

ADD REPLY

Login before adding your answer.

Traffic: 453 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6