Error using Conditional Quartile Normalization (CQN)
1
0
Entering edit mode
garbuzov • 0
@garbuzov-20174
Last seen 18 months ago
United States

Hi Everyone, I would like to use the CQN package to normalize my RNAseq data prior to network analysis. I've used edgeR and Deseq2 for differential expression analysis. I am following the (CQN vignette), and can run the analysis using their toy data set, but I am getting an error when I try to run it on my own.

Here are the components I need for the analysis: 1. gene by sample count table 2. vector of size factors for each sample 3. a matrix containing length and GC-content for each gene.

I use the DGEList function from EdgeR to make my count table (countsDGE) and calculate the normalization factor for the samples.

countsDGE_cqn <- countsDGE$counts
> str(countsDGE_cqn)
 int [1:22053, 1:24] 394 1 63 946 1003 1142 1072 3444 77 13 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:22053] "497097" "19888" "20671" "27395" ...
  ..$ : chr [1:24] "MSC_21d_IP_1" "MSC_21d_IP_2" "MSC_21d_IP_3" "E12_14d_IP_1" ...
> str(countsDGE$samples$norm.factors)
 num [1:24] 0.8 0.704 0.783 0.785 0.778 ...

I use the getGeneLengthAndGCContent function from the EDASeq package to get the length and GC content matrix.

GC_content <- getGeneLengthAndGCContent(rownames(countsDGE_cqn), "mm10", mode="org.db")
str(GC_content)
 num [1:22053, 1:2] 3634 9747 4340 4203 4128 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:22053] "497097" "19888" "20671" "27395" ...
  ..$ : chr [1:2] "length" "gc"

Then I call the CQN function :

cqn.countsDGE <- cqn(counts = countsDGE_cqn, lengths = GC_content[,1],
                  x = GC_content[,2], sizeFactors = countsDGE$samples$norm.factors,
                  verbose = TRUE)

I get the following error, which makes no sense to me:

Error in if (any(lengths <= 0)) stop("argument 'lengths' need to be greater than zero") : missing value where TRUE/FALSE needed

All my arguments have length greater than zero. Any help would be greatly appreciated!

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] splines   parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome.Mmusculus.UCSC.mm10_1.4.0       BSgenome_1.50.0                         
 [3] rtracklayer_1.42.1                       TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.4
 [5] GenomicFeatures_1.34.3                   org.Mm.eg.db_3.7.0                      
 [7] AnnotationDbi_1.44.0                     EDASeq_2.16.3                           
 [9] ShortRead_1.40.0                         GenomicAlignments_1.18.1                
[11] Rsamtools_1.34.1                         Biostrings_2.50.2                       
[13] XVector_0.22.0                           scales_1.0.0                            
[15] cqn_1.28.1                               quantreg_5.38                           
[17] SparseM_1.77                             preprocessCore_1.44.0                   
[19] nor1mix_1.2-3                            mclust_5.4.2                            
[21] edgeR_3.24.3                             limma_3.38.3                            
[23] DESeq2_1.22.2                            SummarizedExperiment_1.12.0             
[25] DelayedArray_0.8.0                       BiocParallel_1.16.6                     
[27] matrixStats_0.54.0                       Biobase_2.42.0                          
[29] GenomicRanges_1.34.0                     GenomeInfoDb_1.18.2                     
[31] IRanges_2.16.0                           S4Vectors_0.20.1                        
[33] BiocGenerics_0.28.0                      WGCNA_1.66                              
[35] fastcluster_1.1.25                       dynamicTreeCut_1.63-1                   

loaded via a namespace (and not attached):
 [1] colorspace_1.4-0       hwriter_1.3.2          htmlTable_1.13.1       base64enc_0.1-3       
 [5] rstudioapi_0.9.0       MatrixModels_0.4-1     bit64_0.9-7            mvtnorm_1.0-8         
 [9] codetools_0.2-16       R.methodsS3_1.7.1      doParallel_1.0.14      impute_1.56.0         
[13] DESeq_1.34.1           robustbase_0.93-3      geneplotter_1.60.0     knitr_1.21            
[17] Formula_1.2-3          annotate_1.60.0        cluster_2.0.7-1        GO.db_3.7.0           
[21] R.oo_1.22.0            BiocManager_1.30.4     httr_1.4.0             rrcov_1.4-7           
[25] compiler_3.5.2         backports_1.1.3        assertthat_0.2.0       Matrix_1.2-15         
[29] lazyeval_0.2.1         prettyunits_1.0.2      acepack_1.4.1          htmltools_0.3.6       
[33] tools_3.5.2            gtable_0.2.0           glue_1.3.0             GenomeInfoDbData_1.2.0
[37] dplyr_0.8.0.1          Rcpp_1.0.0             iterators_1.0.10       xfun_0.5              
[41] stringr_1.4.0          XML_3.98-1.17          DEoptimR_1.0-8         zlibbioc_1.28.0       
[45] MASS_7.3-51.1          aroma.light_3.12.0     hms_0.4.2              RColorBrewer_1.1-2    
[49] memoise_1.1.0          gridExtra_2.3          ggplot2_3.1.0          biomaRt_2.38.0        
[53] rpart_4.1-13           latticeExtra_0.6-28    stringi_1.3.1          RSQLite_2.1.1         
[57] genefilter_1.64.0      pcaPP_1.9-73           foreach_1.4.4          checkmate_1.9.1       
[61] rlang_0.3.1            pkgconfig_2.0.2        bitops_1.0-6           lattice_0.20-38       
[65] purrr_0.3.0            htmlwidgets_1.3        bit_1.1-14             tidyselect_0.2.5      
[69] robust_0.4-18          plyr_1.8.4             magrittr_1.5           R6_2.4.0              
[73] Hmisc_4.2-0            fit.models_0.5-14      DBI_1.0.0              pillar_1.3.1          
[77] foreign_0.8-71         survival_2.43-3        RCurl_1.95-4.11        nnet_7.3-12           
[81] tibble_2.0.1           crayon_1.3.4           progress_1.2.0         locfit_1.5-9.1        
[85] grid_3.5.2             data.table_1.12.0      blob_1.1.1             digest_0.6.18         
[89] xtable_1.8-3           R.utils_2.8.0          munsell_0.5.0
RNAseq normalization CQN conditionalquartilenormalization debugging • 1.5k views
ADD COMMENT
1
Entering edit mode
@kasper-daniel-hansen-2979
Last seen 18 months ago
United States

What is the output of summary(GC_content[,1])

Best, Kasper

ADD COMMENT
0
Entering edit mode
> summary(GC_content[,1])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
     28    1286    2517    3221    4292  107966     240 

Are NAs a problem here? I didn't of that... Ok, got rid of NAs and it ran. Thank you!

ADD REPLY
0
Entering edit mode

Well, it shows that you do in fact have NA's in the length vector, contrary to what you said. You could remove those 25 genes or figure out what their length is and enter it into the GC_content object.

Best, Kasper

ADD REPLY

Login before adding your answer.

Traffic: 543 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6