Swish with covariate for batch effect
1
0
Entering edit mode
@thomasdugedebernonville-24352
Last seen 4.1 years ago

Dear Swish/Fishpond team,

I quantified transcripts from a crop plant using Salmon and inferential replicates. Then I wanted to use swish to detect differentially expressed genes across conditions in a pairwise manner. Using multidimensional scaling, I found a substantial batch effect among biological replicates. So I intended to use the “covariate” argument in swish but an error was returned to me;

note: less permutations are available than requested
2 are available
Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) :
  'x' must be an array of at least two dimensions

my experimental design is the following:

SampleName Strain Batch
sample1 1 A
sample2 2 A
sample3 1 B
sample4 2 B
sample5 1 C
sample6 2 C

For swish, I ran the following:

y <- swish(y, x = "Strain", cov= "Batch")

(the command runs fine when cov is omitted)

So apparently it concerns the number of batches but according to the Fishpond vignette, I understood that it was able to handle “Two groups with two or more batches".

I will appreciate any help!

Thank you in advance, Best regards, Thomas

Session details:

R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS/LAPACK: /ibm/gpfs0/home/t.dugedebernonville/envInferentialReplicates/lib/libopenblasp-r0.3.10.so

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] fishpond_1.4.0              SummarizedExperiment_1.18.2
[3] DelayedArray_0.14.0         matrixStats_0.57.0
[5] Biobase_2.48.0              GenomicRanges_1.40.0
[7] GenomeInfoDb_1.24.0         IRanges_2.22.1
[9] S4Vectors_0.26.0            BiocGenerics_0.34.0
[11] tximport_1.16.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6           plyr_1.8.6             compiler_4.0.2
[4] pillar_1.4.6           XVector_0.28.0         bitops_1.0-6
[7] tools_4.0.2            zlibbioc_1.34.0        jsonlite_1.7.1
[10] lifecycle_0.2.0        tibble_3.0.3           gtable_0.3.0
[13] lattice_0.20-41        pkgconfig_2.0.3        rlang_0.4.7
[16] Matrix_1.2-18          GenomeInfoDbData_1.2.3 stringr_1.4.0
[19] dplyr_1.0.2            generics_0.0.2         vctrs_0.3.4
[22] gtools_3.8.2           tidyselect_1.1.0       grid_4.0.2
[25] qvalue_2.20.0          glue_1.4.2             R6_2.4.1
[28] reshape2_1.4.4         purrr_0.3.4            ggplot2_3.3.2
[31] magrittr_1.5           splines_4.0.2          scales_1.1.1
[34] svMisc_1.1.0           ellipsis_0.3.1         abind_1.4-5
[37] colorspace_1.4-1       stringi_1.5.3          RCurl_1.98-1.2
[40] munsell_0.5.0          crayon_1.3.4

swish fishpond • 1.5k views
1
Entering edit mode
@mikelove
Last seen 9 hours ago
United States

Can you check this reproducible example and see if it works on your end?

library(fishpond)
library(SummarizedExperiment)
# make sim data for 30 samples:
y <- makeSimSwishData(n=30)
y$batch <- factor(rep(rep(1:3,each=5),2))
y <- scaleInfReps(y)
y <- labelKeep(y)
y <- swish(y, x="condition", cov="batch")

I don't get an error here with two-group condition and three-group batch covariate.

ADD COMMENT
0
Entering edit mode

Ok thanks Michael, your reproducible example worked nicely.

I think it is related to the number of samples per batch. In my design, I only have 2 sample per batch:

> y$Strain
[1] 11a 14b 11a 14b 11a 14b
Levels: 11a 14b
> y$Batch
[1] A A B B C C
Levels: A B C
> y <- swish(y, x = "Strain", cov= "Batch")
note: less permutations are available than requested
2 are available
Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) :
  'x' must be an array of at least two dimensions
ADD REPLY
1
Entering edit mode

Oh it looks like I just need better errors reporting here. Swish requires biological replicates (n > 1) within batch for the stratified two-sample test.

If you had more batches, you could run this as a paired analysis perhaps, but 3 won't be sufficient. Looks like Swish won't be an option here. You can consult the Swish paper and choose alternative methods. E.g. for gene-level analysis, we found that a number of tools performed well across a range of sample sizes, with only moderate loss of FDR control for the genes with highest inferential uncertainty.

ADD REPLY
0
Entering edit mode

Ok thanks for the reply!

ADD REPLY
0
Entering edit mode

otherwise, do you think I can still use swish to detect DEG in my case, testing only the strain effect as:

y <- swish(y, x = "Strain")

even if I only have 3 biological replicates per strain? I can use a more traditional approach (such as edgeR) , but I would to take advatage of 20 inferential replicates I ran for each sample.

ADD REPLY
1
Entering edit mode

If you are focused on gene level, I would use edgeR/DESeq2/limma with the design ~batch + strain. I think the uncertainty is mostly important for isoform level analysis.

ADD REPLY

Login before adding your answer.

Traffic: 481 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6