Hi Everyone, I would like to use the CQN package to normalize my RNAseq data prior to network analysis. I've used edgeR and Deseq2 for differential expression analysis. I am following the (CQN vignette), and can run the analysis using their toy data set, but I am getting an error when I try to run it on my own.
Here are the components I need for the analysis: 1. gene by sample count table 2. vector of size factors for each sample 3. a matrix containing length and GC-content for each gene.
I use the DGEList function from EdgeR to make my count table (countsDGE) and calculate the normalization factor for the samples.
countsDGE_cqn <- countsDGE$counts
> str(countsDGE_cqn)
int [1:22053, 1:24] 394 1 63 946 1003 1142 1072 3444 77 13 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:22053] "497097" "19888" "20671" "27395" ...
..$ : chr [1:24] "MSC_21d_IP_1" "MSC_21d_IP_2" "MSC_21d_IP_3" "E12_14d_IP_1" ...
> str(countsDGE$samples$norm.factors)
num [1:24] 0.8 0.704 0.783 0.785 0.778 ...
I use the getGeneLengthAndGCContent function from the EDASeq package to get the length and GC content matrix.
GC_content <- getGeneLengthAndGCContent(rownames(countsDGE_cqn), "mm10", mode="org.db")
str(GC_content)
num [1:22053, 1:2] 3634 9747 4340 4203 4128 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:22053] "497097" "19888" "20671" "27395" ...
..$ : chr [1:2] "length" "gc"
Then I call the CQN function :
cqn.countsDGE <- cqn(counts = countsDGE_cqn, lengths = GC_content[,1],
x = GC_content[,2], sizeFactors = countsDGE$samples$norm.factors,
verbose = TRUE)
I get the following error, which makes no sense to me:
Error in if (any(lengths <= 0)) stop("argument 'lengths' need to be greater than zero") : missing value where TRUE/FALSE needed
All my arguments have length greater than zero. Any help would be greatly appreciated!
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.3
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] splines parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome.Mmusculus.UCSC.mm10_1.4.0 BSgenome_1.50.0
[3] rtracklayer_1.42.1 TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.4
[5] GenomicFeatures_1.34.3 org.Mm.eg.db_3.7.0
[7] AnnotationDbi_1.44.0 EDASeq_2.16.3
[9] ShortRead_1.40.0 GenomicAlignments_1.18.1
[11] Rsamtools_1.34.1 Biostrings_2.50.2
[13] XVector_0.22.0 scales_1.0.0
[15] cqn_1.28.1 quantreg_5.38
[17] SparseM_1.77 preprocessCore_1.44.0
[19] nor1mix_1.2-3 mclust_5.4.2
[21] edgeR_3.24.3 limma_3.38.3
[23] DESeq2_1.22.2 SummarizedExperiment_1.12.0
[25] DelayedArray_0.8.0 BiocParallel_1.16.6
[27] matrixStats_0.54.0 Biobase_2.42.0
[29] GenomicRanges_1.34.0 GenomeInfoDb_1.18.2
[31] IRanges_2.16.0 S4Vectors_0.20.1
[33] BiocGenerics_0.28.0 WGCNA_1.66
[35] fastcluster_1.1.25 dynamicTreeCut_1.63-1
loaded via a namespace (and not attached):
[1] colorspace_1.4-0 hwriter_1.3.2 htmlTable_1.13.1 base64enc_0.1-3
[5] rstudioapi_0.9.0 MatrixModels_0.4-1 bit64_0.9-7 mvtnorm_1.0-8
[9] codetools_0.2-16 R.methodsS3_1.7.1 doParallel_1.0.14 impute_1.56.0
[13] DESeq_1.34.1 robustbase_0.93-3 geneplotter_1.60.0 knitr_1.21
[17] Formula_1.2-3 annotate_1.60.0 cluster_2.0.7-1 GO.db_3.7.0
[21] R.oo_1.22.0 BiocManager_1.30.4 httr_1.4.0 rrcov_1.4-7
[25] compiler_3.5.2 backports_1.1.3 assertthat_0.2.0 Matrix_1.2-15
[29] lazyeval_0.2.1 prettyunits_1.0.2 acepack_1.4.1 htmltools_0.3.6
[33] tools_3.5.2 gtable_0.2.0 glue_1.3.0 GenomeInfoDbData_1.2.0
[37] dplyr_0.8.0.1 Rcpp_1.0.0 iterators_1.0.10 xfun_0.5
[41] stringr_1.4.0 XML_3.98-1.17 DEoptimR_1.0-8 zlibbioc_1.28.0
[45] MASS_7.3-51.1 aroma.light_3.12.0 hms_0.4.2 RColorBrewer_1.1-2
[49] memoise_1.1.0 gridExtra_2.3 ggplot2_3.1.0 biomaRt_2.38.0
[53] rpart_4.1-13 latticeExtra_0.6-28 stringi_1.3.1 RSQLite_2.1.1
[57] genefilter_1.64.0 pcaPP_1.9-73 foreach_1.4.4 checkmate_1.9.1
[61] rlang_0.3.1 pkgconfig_2.0.2 bitops_1.0-6 lattice_0.20-38
[65] purrr_0.3.0 htmlwidgets_1.3 bit_1.1-14 tidyselect_0.2.5
[69] robust_0.4-18 plyr_1.8.4 magrittr_1.5 R6_2.4.0
[73] Hmisc_4.2-0 fit.models_0.5-14 DBI_1.0.0 pillar_1.3.1
[77] foreign_0.8-71 survival_2.43-3 RCurl_1.95-4.11 nnet_7.3-12
[81] tibble_2.0.1 crayon_1.3.4 progress_1.2.0 locfit_1.5-9.1
[85] grid_3.5.2 data.table_1.12.0 blob_1.1.1 digest_0.6.18
[89] xtable_1.8-3 R.utils_2.8.0 munsell_0.5.0
Are NAs a problem here? I didn't of that... Ok, got rid of NAs and it ran. Thank you!
Well, it shows that you do in fact have NA's in the length vector, contrary to what you said. You could remove those 25 genes or figure out what their length is and enter it into the GC_content object.
Best, Kasper