Dear Swish/Fishpond team,
I quantified transcripts from a crop plant using Salmon and inferential replicates. Then I wanted to use swish to detect differentially expressed genes across conditions in a pairwise manner. Using multidimensional scaling, I found a substantial batch effect among biological replicates. So I intended to use the “covariate” argument in swish but an error was returned to me;
note: less permutations are available than requested
2 are available
Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions
my experimental design is the following:
SampleName | Strain | Batch |
---|---|---|
sample1 | 1 | A |
sample2 | 2 | A |
sample3 | 1 | B |
sample4 | 2 | B |
sample5 | 1 | C |
sample6 | 2 | C |
For swish, I ran the following:
y <- swish(y, x = "Strain", cov= "Batch")
(the command runs fine when cov is omitted)
So apparently it concerns the number of batches but according to the Fishpond vignette, I understood that it was able to handle “Two groups with two or more batches".
I will appreciate any help!
Thank you in advance, Best regards, Thomas
Session details:
R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux
Matrix products: default
BLAS/LAPACK: /ibm/gpfs0/home/t.dugedebernonville/envInferentialReplicates/lib/libopenblasp-r0.3.10.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] fishpond_1.4.0 SummarizedExperiment_1.18.2
[3] DelayedArray_0.14.0 matrixStats_0.57.0
[5] Biobase_2.48.0 GenomicRanges_1.40.0
[7] GenomeInfoDb_1.24.0 IRanges_2.22.1
[9] S4Vectors_0.26.0 BiocGenerics_0.34.0
[11] tximport_1.16.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 plyr_1.8.6 compiler_4.0.2
[4] pillar_1.4.6 XVector_0.28.0 bitops_1.0-6
[7] tools_4.0.2 zlibbioc_1.34.0 jsonlite_1.7.1
[10] lifecycle_0.2.0 tibble_3.0.3 gtable_0.3.0
[13] lattice_0.20-41 pkgconfig_2.0.3 rlang_0.4.7
[16] Matrix_1.2-18 GenomeInfoDbData_1.2.3 stringr_1.4.0
[19] dplyr_1.0.2 generics_0.0.2 vctrs_0.3.4
[22] gtools_3.8.2 tidyselect_1.1.0 grid_4.0.2
[25] qvalue_2.20.0 glue_1.4.2 R6_2.4.1
[28] reshape2_1.4.4 purrr_0.3.4 ggplot2_3.3.2
[31] magrittr_1.5 splines_4.0.2 scales_1.1.1
[34] svMisc_1.1.0 ellipsis_0.3.1 abind_1.4-5
[37] colorspace_1.4-1 stringi_1.5.3 RCurl_1.98-1.2
[40] munsell_0.5.0 crayon_1.3.4
Ok thanks Michael, your reproducible example worked nicely.
I think it is related to the number of samples per batch. In my design, I only have 2 sample per batch:
Oh it looks like I just need better errors reporting here. Swish requires biological replicates (n > 1) within batch for the stratified two-sample test.
If you had more batches, you could run this as a paired analysis perhaps, but 3 won't be sufficient. Looks like Swish won't be an option here. You can consult the Swish paper and choose alternative methods. E.g. for gene-level analysis, we found that a number of tools performed well across a range of sample sizes, with only moderate loss of FDR control for the genes with highest inferential uncertainty.
Ok thanks for the reply!
otherwise, do you think I can still use swish to detect DEG in my case, testing only the strain effect as:
even if I only have 3 biological replicates per strain? I can use a more traditional approach (such as edgeR) , but I would to take advatage of 20 inferential replicates I ran for each sample.
If you are focused on gene level, I would use edgeR/DESeq2/limma with the design
~batch + strain
. I think the uncertainty is mostly important for isoform level analysis.