I was wondering if anybody knows of a package or some method to simulate RNA-seq sequencing reads from samples where we have difference in the variance of gene expression or more importantly, splicing as opposed to the classical difference in mean gene expression or mean splicing values.
I was thinking about getting read counts for transcripts using some of the distributions from the extraDistr package, and feeding these read counts to polyester.
Something like:
library("extraDistr")
library("polyester")
# different read counts simulated somehow
count <- c(10, 50, 100)
# different alpha values depending on uniform random splicing or a dominant transcript
alphas <- c(1, 2, 10)
# simulate read counts
count_random <- rdirmnom(10, count, alphas)
# run polyester with the simulated counts
simulate_experiment_countmat(fasta = NULL, gtf = NULL, seqpath = NULL,
count_random, outdir = ".", paired = TRUE, seed = NULL, ...)
Anybody ever tried something similar? Are there tools for simulating reads with differential variance?
Hey, polyester author here -- that's exactly why we made the function
simulate_experiment_countmat
! It's for generating reads with whatever kind of differential expression you want -- if you can make a matrix of it, you can simulate RNA-seq with that pattern.When polyester was written, there wasn't a specific tool for simulating any kind of differential expression (means or variances), but it's been like six years, so something new may have come along in that time! I will let others chime in if they know of a better approach, but can confirm this is the type of situation we were thinking about when we wrote simulate_experiment_countmat.