I'm using Splatter to generate simulated single-cell data. I need variability between samples, meaning that gene expression levels should change across samples. I have 100 samples, 20 genes, and 5 cell types, and my code to generate the single-cell data is:
```r
library(splatter)

vcf <- mockVCF(n.samples = 100)
gff <- mockGFF(n.genes = 20)
params.group <- newSplatPopParams(
  batchCells = 100,          # Number of cells in each batch
  similarity.scale = 8,
  eqtl.group.specific = 0.6,
  de.prob = rep(0.8, 5),     # Probability that a gene is differentially expressed in a group; can be a vector
  de.facLoc = 0.5,           # Location (meanlog) parameter of the DE factor log-normal distribution; can be a vector
  de.facScale = 0.5,         # Scale (sdlog) parameter of the DE factor log-normal distribution; can be a vector
  group.prob = c(0.4, 0.3, 0.1, 0.1, 0.1)  # Probability that a cell comes from each group
)
sim.sc.gr <- splatPopSimulate(vcf = vcf, gff = gff, params = params.group, sparsify = FALSE)
```
I have two questions:
1. Is there any other way to generate 100 samples?
2. I also want gene expression levels to change between individual samples. For example, the expression level of Gene1 in cell type A for sample 1 should differ from the expression level of Gene1 in cell type A for sample 2, and so on. Is there any way to get this property?
Q1: The splatPopSimulate function simulates scRNA-seq data for every individual in the provided VCF, so passing the output of mockVCF(n.samples = 100) to splatPopSimulate generates data for 100 samples. Note that mockVCF is quite basic; to generate variant data with more realistic LD and population structure, consider a tool such as HAPGEN2 or sim1000G. Alternatively, you can provide splatPop with genotype data from real donors, for example from public repositories such as GTEx.
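A minimal sketch for confirming the sample count, assuming the simulated cells carry their individual in a `Sample` column of `colData` (the column the splatPop vignette colours PCA plots by). It rebuilds the simulation from the question so it runs on its own; the `set.seed` call and the variable names `n.vcf.samples` and `n.sim.samples` are illustrative additions.

```r
library(splatter)
library(SingleCellExperiment)

set.seed(1)
# Rebuild the simulation from the question so this block is self-contained
vcf <- mockVCF(n.samples = 100)
gff <- mockGFF(n.genes = 20)
params.group <- newSplatPopParams(batchCells = 100,
                                  similarity.scale = 8,
                                  eqtl.group.specific = 0.6,
                                  de.prob = rep(0.8, 5),
                                  de.facLoc = 0.5,
                                  de.facScale = 0.5,
                                  group.prob = c(0.4, 0.3, 0.1, 0.1, 0.1))
sim.sc.gr <- splatPopSimulate(vcf = vcf, gff = gff, params = params.group,
                              sparsify = FALSE)

# Number of individuals (columns) in the genotype object
n.vcf.samples <- ncol(vcf)
# Number of distinct individuals among the simulated cells
n.sim.samples <- length(unique(colData(sim.sc.gr)$Sample))
n.vcf.samples
n.sim.samples
```

If you swap in genotypes from another source, the same two numbers should still agree, since splatPopSimulate uses every individual in the VCF it is given.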
Q2: The code you provided above should be doing exactly this. You can confirm by inspecting the gene means simulated for each individual, for each gene, and for each cell group, which are stored in metadata(sim.sc.gr)$Simulated_Means.
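A short sketch of that inspection, assuming (as the indexing used later in this thread implies) that `metadata(sim.sc.gr)$Simulated_Means` is a list with one genes-by-samples matrix per cell group, named `Group1`, `Group2`, and so on. The per-gene standard deviation across individuals is an illustrative summary, not something splatPop reports itself.

```r
library(splatter)
library(SingleCellExperiment)

set.seed(1)
# Rebuild the simulation from the question so this block is self-contained
vcf <- mockVCF(n.samples = 100)
gff <- mockGFF(n.genes = 20)
params.group <- newSplatPopParams(batchCells = 100,
                                  similarity.scale = 8,
                                  eqtl.group.specific = 0.6,
                                  de.prob = rep(0.8, 5),
                                  de.facLoc = 0.5,
                                  de.facScale = 0.5,
                                  group.prob = c(0.4, 0.3, 0.1, 0.1, 0.1))
sim.sc.gr <- splatPopSimulate(vcf = vcf, gff = gff, params = params.group,
                              sparsify = FALSE)

# One matrix of simulated means per cell group
means <- metadata(sim.sc.gr)$Simulated_Means
names(means)

# Rows are genes, columns are individuals
group1 <- as.matrix(means$Group1)
dim(group1)

# Mean of the first gene in Group1 for the first five individuals
group1[1, 1:5]

# Spread of each gene's Group1 mean across the 100 individuals
group1.sd <- apply(group1, 1, sd)
group1.sd
```

Nonzero values in `group1.sd` indicate that the same gene in the same cell group has different means in different individuals.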
You can also see exactly which cell-type-specific DE effects are being added by inspecting rowData(sim.sc.gr).
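A hedged sketch of that rowData inspection. The exact column names for group-specific eQTL and DE effects may vary between splatter versions, so rather than assuming specific names, this lists all columns and pattern-matches (case-insensitively) on "group"; the variable names `gene.info` and `group.cols` are illustrative.

```r
library(splatter)
library(SingleCellExperiment)

set.seed(1)
# Rebuild the simulation from the question so this block is self-contained
vcf <- mockVCF(n.samples = 100)
gff <- mockGFF(n.genes = 20)
params.group <- newSplatPopParams(batchCells = 100,
                                  similarity.scale = 8,
                                  eqtl.group.specific = 0.6,
                                  de.prob = rep(0.8, 5),
                                  de.facLoc = 0.5,
                                  de.facScale = 0.5,
                                  group.prob = c(0.4, 0.3, 0.1, 0.1, 0.1))
sim.sc.gr <- splatPopSimulate(vcf = vcf, gff = gff, params = params.group,
                              sparsify = FALSE)

# Per-gene information, one row per simulated gene
gene.info <- rowData(sim.sc.gr)
colnames(gene.info)

# Columns whose names mention a group, e.g. group-specific effects
group.cols <- grep("group", colnames(gene.info), ignore.case = TRUE,
                   value = TRUE)
gene.info[, group.cols, drop = FALSE]
```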
Thanks for using splatter and let us know if you have more questions.
Thank you for your reply.
For example, if I want to increase the variance of the vector metadata(sim.sc.gr)$Simulated_Means$Group1[1, 1:100], or metadata(sim.sc.gr)$Simulated_Means$Group1[2, 1:100], or metadata(sim.sc.gr)$Simulated_Means$Group1[3, 1:100], and so on, how should I change the parameters?
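One hedged way to explore this: the splatPop documentation describes `similarity.scale` as a scaling factor where larger values make mean expression more similar across individuals, so the value of 8 used above may be damping between-sample variability. The sketch below assumes that behaviour, keeps every other setting from the question, and compares the median per-gene coefficient of variation (CV) of the Group1 means across individuals for `similarity.scale = 8` versus the default of 1. CV is used instead of raw variance because variance also grows with a gene's overall mean level. The helper names `simulate.with.scale` and `group1.cv` are hypothetical. The eQTL parameters (for example `eqtl.n`, `eqtl.ES.shape`, `eqtl.ES.rate`) also influence genotype-driven differences between individuals; check `?SplatPopParams` for their exact meaning in your version.

```r
library(splatter)
library(SingleCellExperiment)

set.seed(42)
vcf <- mockVCF(n.samples = 100)
gff <- mockGFF(n.genes = 20)

# Simulate with the settings from the question, varying only similarity.scale.
# Resetting the seed keeps the random draws as comparable as possible.
simulate.with.scale <- function(scale, seed = 7) {
  set.seed(seed)
  params <- newSplatPopParams(batchCells = 100,
                              similarity.scale = scale,
                              eqtl.group.specific = 0.6,
                              de.prob = rep(0.8, 5),
                              de.facLoc = 0.5,
                              de.facScale = 0.5,
                              group.prob = c(0.4, 0.3, 0.1, 0.1, 0.1))
  splatPopSimulate(vcf = vcf, gff = gff, params = params, sparsify = FALSE)
}

# Per-gene coefficient of variation of the Group1 means across individuals
group1.cv <- function(sim) {
  m <- as.matrix(metadata(sim)$Simulated_Means$Group1)
  apply(m, 1, function(x) sd(x) / mean(x))
}

cv.scale8 <- group1.cv(simulate.with.scale(8))
cv.scale1 <- group1.cv(simulate.with.scale(1))

# Compare the typical between-individual spread under the two settings
median(cv.scale8)
median(cv.scale1)
```

If the documented behaviour holds, the median CV should be larger for `similarity.scale = 1`; treat this as a quick empirical check rather than a guarantee.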