I am learning to use DEXSeq to analyze RNA-seq data after knocking down a splicing factor. I have 3 samples from the shCTRL, and 6 samples from 2 different shRNAs targeting the same splicing factor: shRNA1, shRNA4 (3 replicates for each shRNA). Two shRNAs targeting the same splicing factor were used because I wanted to eliminate the off-target effect of shRNAs. I am not sure if I should put all the 9 samples together to do the analysis or I should compare shRNA1 with shCTRL and shRNA4 with shCTRL separately. The separate comparisons (shRNA1 vs. shCTRL, shRNA4 vs. shCTRL) ran well. The sample tables are like the following.
sampleAnnotation(dxd1)
DataFrame with 6 rows and 4 columns
sample condition libType sizeFactor
<factor> <factor> <factor> <numeric>
1 CTRL_1 control paired-end 1.14881507491493
2 CTRL_2 control paired-end 1.12577220439625
3 CTRL_3 control paired-end 1.11761713391514
4 Treatment1_1 knockdown1 paired-end 0.839327417001361
5 Treatment1_2 knockdown1 paired-end 0.992697092731002
6 Treatment1_3 knockdown1 paired-end 0.907206458911359
sampleAnnotation(dxd4)
DataFrame with 6 rows and 4 columns
sample condition libType sizeFactor
<factor> <factor> <factor> <numeric>
1 CTRL_1 control paired-end 1.04077845798234
2 CTRL_2 control paired-end 1.01594922129081
3 CTRL_3 control paired-end 1.00979043835092
4 Treatment4_1 knockdown4 paired-end 0.930994837364004
5 Treatment4_2 knockdown4 paired-end 1.03814641210704
6 Treatment4_3 knockdown4 paired-end 1.04888449050392
I noticed that the size factors are different for CTRL1, CTRL2, and CTRL_3 in the 2 analyses. Is this right?
Then I put all the 9 samples together and set the sample table like this:
sampleTable2
condition shRNA
CTRL_1 control shCTRL
CTRL_2 control shCTRL
CTRL_3 control shCTRL
Treatment1_1 knockdown shRNA1
Treatment1_2 knockdown shRNA1
Treatment1_3 knockdown shRNA1
Treatment4_1 knockdown shRNA4
Treatment4_2 knockdown shRNA4
Treatment4_3 knockdown shRNA4
I ran the following commands
dxd = DEXSeqDataSetFromHTSeq(countfiles = countFiles, sampleData = sampleTable2, design = ~ sample + exon + condition:exon, flattenedfile = flattenedFile)
formulaFullModel = ~ sample + exon + shRNA:exon + condition:exon
formulaReducedModel = ~ sample + exon + shRNA:exon
dxd = estimateSizeFactors(dxd)
dxd = estimateDispersions(dxd, formula = formulaFullModel)
dxd = testForDEU(dxd, reducedModel = formulaReducedModel, fullModel = formulaFullModel)
I had the following error: Error in nbinomLRT(x, reduced = reducedModelMatrix, full = fullModelMatrix) : less than one degree of freedom, perhaps full and reduced models are not in the correct order
Can anyone tell what is wrong with the setting of formulaFullModel and formulaReducedModel. And statistically, which way is better for the analysis: analyze separately and overlap the splice events or put all the nine samples together and analyze them?
Thanks!
sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] DEXSeq_1.32.0 RColorBrewer_1.1-2
[3] AnnotationDbi_1.48.0 DESeq2_1.26.0
[5] SummarizedExperiment_1.16.1 DelayedArray_0.12.2
[7] matrixStats_0.56.0 GenomicRanges_1.38.0
[9] GenomeInfoDb_1.22.0 IRanges_2.20.2
[11] S4Vectors_0.24.3 Biobase_2.46.0
[13] BiocGenerics_0.32.0 BiocParallel_1.20.1
loaded via a namespace (and not attached):
[1] bitops_1.0-6 bit64_0.9-7 progress_1.2.2
[4] httr_1.4.1 tools_3.6.3 backports_1.1.5
[7] R6_2.4.1 rpart_4.1-15 Hmisc_4.3-1
[10] DBI_1.1.0 colorspace_1.4-1 nnet_7.3-13
[13] tidyselect_1.0.0 gridExtra_2.3 prettyunits_1.1.1
[16] curl_4.3 bit_1.1-15.2 compiler_3.6.3
[19] htmlTable_1.13.3 scales_1.1.0 checkmate_2.0.0
[22] genefilter_1.68.0 askpass_1.1 rappdirs_0.3.1
[25] Rsamtools_2.2.3 stringr_1.4.0 digest_0.6.25
[28] foreign_0.8-76 rmarkdown_2.1 XVector_0.26.0
[31] base64enc_0.1-3 jpeg_0.1-8.1 pkgconfig_2.0.3
[34] htmltools_0.4.0 dbplyr_1.4.2 htmlwidgets_1.5.1
[37] rlang_0.4.5 rstudioapi_0.11 RSQLite_2.2.0
[40] hwriter_1.3.2 acepack_1.4.1 dplyr_0.8.5
[43] RCurl_1.98-1.1 magrittr_1.5 GenomeInfoDbData_1.2.2
[46] Formula_1.2-3 Matrix_1.2-18 Rcpp_1.0.4
[49] munsell_0.5.0 lifecycle_0.2.0 stringi_1.4.6
[52] yaml_2.2.1 zlibbioc_1.32.0 BiocFileCache_1.10.2
[55] grid_3.6.3 blob_1.2.1 crayon_1.3.4
[58] lattice_0.20-40 Biostrings_2.54.0 splines_3.6.3
[61] annotate_1.64.0 hms_0.5.3 locfit_1.5-9.1
[64] knitr_1.28 pillar_1.4.3 geneplotter_1.64.0
[67] biomaRt_2.42.0 XML_3.99-0.3 glue_1.3.2
[70] evaluate_0.14 latticeExtra_0.6-29 data.table_1.12.8
[73] png_0.1-7 vctrs_0.2.4 gtable_0.3.0
[76] openssl_1.4.1 purrr_0.3.3 assertthat_0.2.1
[79] ggplot2_3.3.0 xfun_0.12 xtable_1.8-4
[82] survival_3.1-11 tibble_2.1.3 memoise_1.1.0
[85] cluster_2.1.0 statmod_1.4.34
Thank you very much! Now I understand it better. Also, thank you for this nice tool!