How to set model formulae in DEXSeq when 2 shRNAs were used to target a same gene
1
0
Entering edit mode
zliu ▴ 10
@zliu-23226
Last seen 4.1 years ago

I am learning to use DEXSeq to analyze RNA-seq data after knocking down a splicing factor. I have 3 samples from the shCTRL, and 6 samples from 2 different shRNAs targeting the same splicing factor: shRNA1, shRNA4 (3 replicates for each shRNA). Two shRNAs targeting the same splicing factor were used because I wanted to eliminate the off-target effect of shRNAs. I am not sure if I should put all the 9 samples together to do the analysis or I should compare shRNA1 with shCTRL and shRNA4 with shCTRL separately. The separate comparisons (shRNA1 vs. shCTRL, shRNA4 vs. shCTRL) ran well. The sample tables are like the following.

sampleAnnotation(dxd1)

DataFrame with 6 rows and 4 columns
sample  condition    libType        sizeFactor
<factor>   <factor>   <factor>         <numeric>
1    CTRL_1    control paired-end  1.14881507491493
2    CTRL_2    control paired-end  1.12577220439625
3    CTRL_3    control paired-end  1.11761713391514
4 Treatment1_1 knockdown1 paired-end 0.839327417001361
5 Treatment1_2 knockdown1 paired-end 0.992697092731002
6 Treatment1_3 knockdown1 paired-end 0.907206458911359

sampleAnnotation(dxd4)

DataFrame with 6 rows and 4 columns
 sample  condition    libType        sizeFactor
 <factor>   <factor>   <factor>         <numeric>
1    CTRL_1    control paired-end  1.04077845798234
2    CTRL_2    control paired-end  1.01594922129081
3    CTRL_3    control paired-end  1.00979043835092
4 Treatment4_1 knockdown4 paired-end 0.930994837364004
5 Treatment4_2 knockdown4 paired-end  1.03814641210704
6 Treatment4_3 knockdown4 paired-end  1.04888449050392

I noticed that the size factors are different for CTRL1, CTRL2, and CTRL_3 in the 2 analyses. Is this right?

Then I put all the 9 samples together and set the sample table like this:

sampleTable2

         condition  shRNA
CTRL_1      control shCTRL
CTRL_2      control shCTRL
CTRL_3      control shCTRL
Treatment1_1 knockdown shRNA1
Treatment1_2 knockdown shRNA1
Treatment1_3 knockdown shRNA1
Treatment4_1 knockdown shRNA4
Treatment4_2 knockdown shRNA4
Treatment4_3 knockdown shRNA4

I ran the following commands

dxd = DEXSeqDataSetFromHTSeq(countfiles = countFiles, sampleData = sampleTable2, design = ~ sample + exon + condition:exon, flattenedfile = flattenedFile)

formulaFullModel = ~ sample + exon + shRNA:exon + condition:exon

formulaReducedModel = ~ sample + exon + shRNA:exon

dxd = estimateSizeFactors(dxd)

dxd = estimateDispersions(dxd, formula = formulaFullModel)

dxd = testForDEU(dxd, reducedModel = formulaReducedModel, fullModel = formulaFullModel)

I had the following error: Error in nbinomLRT(x, reduced = reducedModelMatrix, full = fullModelMatrix) : less than one degree of freedom, perhaps full and reduced models are not in the correct order

Can anyone tell what is wrong with the setting of formulaFullModel and formulaReducedModel. And statistically, which way is better for the analysis: analyze separately and overlap the splice events or put all the nine samples together and analyze them?

Thanks!

sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] DEXSeq_1.32.0               RColorBrewer_1.1-2         
[3] AnnotationDbi_1.48.0        DESeq2_1.26.0              
[5] SummarizedExperiment_1.16.1 DelayedArray_0.12.2        
[7] matrixStats_0.56.0          GenomicRanges_1.38.0       
[9] GenomeInfoDb_1.22.0         IRanges_2.20.2             
[11] S4Vectors_0.24.3            Biobase_2.46.0             
[13] BiocGenerics_0.32.0         BiocParallel_1.20.1        

loaded via a namespace (and not attached):
[1] bitops_1.0-6           bit64_0.9-7            progress_1.2.2        
[4] httr_1.4.1             tools_3.6.3            backports_1.1.5       
[7] R6_2.4.1               rpart_4.1-15           Hmisc_4.3-1              
[10] DBI_1.1.0              colorspace_1.4-1       nnet_7.3-13           
[13] tidyselect_1.0.0       gridExtra_2.3          prettyunits_1.1.1     
[16] curl_4.3               bit_1.1-15.2           compiler_3.6.3        
[19] htmlTable_1.13.3       scales_1.1.0           checkmate_2.0.0       
[22] genefilter_1.68.0      askpass_1.1            rappdirs_0.3.1        
[25] Rsamtools_2.2.3        stringr_1.4.0          digest_0.6.25         
[28] foreign_0.8-76         rmarkdown_2.1          XVector_0.26.0        
[31] base64enc_0.1-3        jpeg_0.1-8.1           pkgconfig_2.0.3       
[34] htmltools_0.4.0        dbplyr_1.4.2           htmlwidgets_1.5.1     
[37] rlang_0.4.5            rstudioapi_0.11        RSQLite_2.2.0         
[40] hwriter_1.3.2          acepack_1.4.1          dplyr_0.8.5           
[43] RCurl_1.98-1.1         magrittr_1.5           GenomeInfoDbData_1.2.2
[46] Formula_1.2-3          Matrix_1.2-18          Rcpp_1.0.4            
[49] munsell_0.5.0          lifecycle_0.2.0        stringi_1.4.6         
[52] yaml_2.2.1             zlibbioc_1.32.0        BiocFileCache_1.10.2  
[55] grid_3.6.3             blob_1.2.1             crayon_1.3.4           
[58] lattice_0.20-40        Biostrings_2.54.0      splines_3.6.3         
[61] annotate_1.64.0        hms_0.5.3              locfit_1.5-9.1        
[64] knitr_1.28             pillar_1.4.3           geneplotter_1.64.0    
[67] biomaRt_2.42.0         XML_3.99-0.3           glue_1.3.2            
[70] evaluate_0.14          latticeExtra_0.6-29    data.table_1.12.8      
[73] png_0.1-7              vctrs_0.2.4            gtable_0.3.0          
[76] openssl_1.4.1          purrr_0.3.3            assertthat_0.2.1      
[79] ggplot2_3.3.0          xfun_0.12              xtable_1.8-4          
[82] survival_3.1-11        tibble_2.1.3           memoise_1.1.0         
[85] cluster_2.1.0          statmod_1.4.34
DEXSeq • 775 views
ADD COMMENT
1
Entering edit mode
Alejandro Reyes ★ 1.9k
@alejandro-reyes-5124
Last seen 4 months ago
Novartis Institutes for BioMedical Reseā€¦

Hi @zliu,

Thank you for your detailed report. You are getting an error because you are getting dependent terms in the model matrix (the matrix is not full rank). For more information, I'd recommend you to check this section of the DESeq2 vignette, which explains this problem in detail: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#model-matrix-not-full-rank

However, if you run DEXSeq with the following formulae to get those exons that where the different siRNAs have a consistent effect in their inclusion:

formulaFullModel = ~ sample + exon + condition:exon
formulaReducedModel = ~ sample + exon

Both ways of analyzing the data are statisticall valid. The approach of analysing them separately and comparing the results would enable you to evaluate both consistent exon usage differences (by taking the overlap) as well as which exon usage differences are only siRNA-specific (and very likely off target effects). The second approach would only look at the former.

Alejandro

ADD COMMENT
0
Entering edit mode

Thank you very much! Now I understand it better. Also, thank you for this nice tool!

ADD REPLY

Login before adding your answer.

Traffic: 566 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6