Question

Error using Conditional Quartile Normalization (CQN)

0

Entering edit mode

garbuzov • 0

@garbuzov-20174

Last seen 22 months ago

United States

Hi Everyone, I would like to use the CQN package to normalize my RNAseq data prior to network analysis. I've used edgeR and Deseq2 for differential expression analysis. I am following the (CQN vignette), and can run the analysis using their toy data set, but I am getting an error when I try to run it on my own.

Here are the components I need for the analysis: 1. gene by sample count table 2. vector of size factors for each sample 3. a matrix containing length and GC-content for each gene.

I use the DGEList function from EdgeR to make my count table (countsDGE) and calculate the normalization factor for the samples.

countsDGE_cqn <- countsDGE$counts
> str(countsDGE_cqn)
 int [1:22053, 1:24] 394 1 63 946 1003 1142 1072 3444 77 13 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:22053] "497097" "19888" "20671" "27395" ...
  ..$ : chr [1:24] "MSC_21d_IP_1" "MSC_21d_IP_2" "MSC_21d_IP_3" "E12_14d_IP_1" ...
> str(countsDGE$samples$norm.factors)
 num [1:24] 0.8 0.704 0.783 0.785 0.778 ...

I use the getGeneLengthAndGCContent function from the EDASeq package to get the length and GC content matrix.

GC_content <- getGeneLengthAndGCContent(rownames(countsDGE_cqn), "mm10", mode="org.db")
str(GC_content)
 num [1:22053, 1:2] 3634 9747 4340 4203 4128 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:22053] "497097" "19888" "20671" "27395" ...
  ..$ : chr [1:2] "length" "gc"

Then I call the CQN function :

cqn.countsDGE <- cqn(counts = countsDGE_cqn, lengths = GC_content[,1],
                  x = GC_content[,2], sizeFactors = countsDGE$samples$norm.factors,
                  verbose = TRUE)

I get the following error, which makes no sense to me:

Error in if (any(lengths <= 0)) stop("argument 'lengths' need to be greater than zero") : missing value where TRUE/FALSE needed

All my arguments have length greater than zero. Any help would be greatly appreciated!

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] splines   parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome.Mmusculus.UCSC.mm10_1.4.0       BSgenome_1.50.0                         
 [3] rtracklayer_1.42.1                       TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.4
 [5] GenomicFeatures_1.34.3                   org.Mm.eg.db_3.7.0                      
 [7] AnnotationDbi_1.44.0                     EDASeq_2.16.3                           
 [9] ShortRead_1.40.0                         GenomicAlignments_1.18.1                
[11] Rsamtools_1.34.1                         Biostrings_2.50.2                       
[13] XVector_0.22.0                           scales_1.0.0                            
[15] cqn_1.28.1                               quantreg_5.38                           
[17] SparseM_1.77                             preprocessCore_1.44.0                   
[19] nor1mix_1.2-3                            mclust_5.4.2                            
[21] edgeR_3.24.3                             limma_3.38.3                            
[23] DESeq2_1.22.2                            SummarizedExperiment_1.12.0             
[25] DelayedArray_0.8.0                       BiocParallel_1.16.6                     
[27] matrixStats_0.54.0                       Biobase_2.42.0                          
[29] GenomicRanges_1.34.0                     GenomeInfoDb_1.18.2                     
[31] IRanges_2.16.0                           S4Vectors_0.20.1                        
[33] BiocGenerics_0.28.0                      WGCNA_1.66                              
[35] fastcluster_1.1.25                       dynamicTreeCut_1.63-1                   

loaded via a namespace (and not attached):
 [1] colorspace_1.4-0       hwriter_1.3.2          htmlTable_1.13.1       base64enc_0.1-3       
 [5] rstudioapi_0.9.0       MatrixModels_0.4-1     bit64_0.9-7            mvtnorm_1.0-8         
 [9] codetools_0.2-16       R.methodsS3_1.7.1      doParallel_1.0.14      impute_1.56.0         
[13] DESeq_1.34.1           robustbase_0.93-3      geneplotter_1.60.0     knitr_1.21            
[17] Formula_1.2-3          annotate_1.60.0        cluster_2.0.7-1        GO.db_3.7.0           
[21] R.oo_1.22.0            BiocManager_1.30.4     httr_1.4.0             rrcov_1.4-7           
[25] compiler_3.5.2         backports_1.1.3        assertthat_0.2.0       Matrix_1.2-15         
[29] lazyeval_0.2.1         prettyunits_1.0.2      acepack_1.4.1          htmltools_0.3.6       
[33] tools_3.5.2            gtable_0.2.0           glue_1.3.0             GenomeInfoDbData_1.2.0
[37] dplyr_0.8.0.1          Rcpp_1.0.0             iterators_1.0.10       xfun_0.5              
[41] stringr_1.4.0          XML_3.98-1.17          DEoptimR_1.0-8         zlibbioc_1.28.0       
[45] MASS_7.3-51.1          aroma.light_3.12.0     hms_0.4.2              RColorBrewer_1.1-2    
[49] memoise_1.1.0          gridExtra_2.3          ggplot2_3.1.0          biomaRt_2.38.0        
[53] rpart_4.1-13           latticeExtra_0.6-28    stringi_1.3.1          RSQLite_2.1.1         
[57] genefilter_1.64.0      pcaPP_1.9-73           foreach_1.4.4          checkmate_1.9.1       
[61] rlang_0.3.1            pkgconfig_2.0.2        bitops_1.0-6           lattice_0.20-38       
[65] purrr_0.3.0            htmlwidgets_1.3        bit_1.1-14             tidyselect_0.2.5      
[69] robust_0.4-18          plyr_1.8.4             magrittr_1.5           R6_2.4.0              
[73] Hmisc_4.2-0            fit.models_0.5-14      DBI_1.0.0              pillar_1.3.1          
[77] foreign_0.8-71         survival_2.43-3        RCurl_1.95-4.11        nnet_7.3-12           
[81] tibble_2.0.1           crayon_1.3.4           progress_1.2.0         locfit_1.5-9.1        
[85] grid_3.5.2             data.table_1.12.0      blob_1.1.1             digest_0.6.18         
[89] xtable_1.8-3           R.utils_2.8.0          munsell_0.5.0

RNAseq normalization CQN conditionalquartilenormalization debugging • 1.6k views

ADD COMMENT • link updated 6.1 years ago by Kasper Daniel Hansen ★ 6.5k • written 6.1 years ago by garbuzov • 0

score 1 · Answer 1 · 2019-03-12

1

Entering edit mode

Kasper Daniel Hansen ★ 6.5k

@kasper-daniel-hansen-2979

Last seen 22 months ago

United States

What is the output of summary(GC_content[,1])

Best, Kasper

ADD COMMENT • link 6.1 years ago Kasper Daniel Hansen ★ 6.5k

0

Entering edit mode

> summary(GC_content[,1])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
     28    1286    2517    3221    4292  107966     240

Are NAs a problem here? I didn't of that... Ok, got rid of NAs and it ran. Thank you!

ADD REPLY • link 6.1 years ago garbuzov • 0

0

Entering edit mode

Well, it shows that you do in fact have NA's in the length vector, contrary to what you said. You could remove those 25 genes or figure out what their length is and enter it into the GC_content object.

Best, Kasper

ADD REPLY • link 6.1 years ago Kasper Daniel Hansen ★ 6.5k