Hello,
I am interested in finding differentially spliced genes using the diffSplice
function from the limma
package.
For example, in the code below, Gene1 shows the most evidence of differential splicing, with Exon1 being significantly more highly expressed than the other Gene1 exons, when comparing group 2 to group 1. However, for such a differentially spliced gene, I would then like to select a representative pair of exons that capture this splicing difference as much as possible. In this case, Exon1 would be one of the pair, but which other exon would be best as a reference?
If another exon were significantly more lowly expressed, then that would probably be the obvious choice. Instead, one could naïvely look at the exon _t_-statistics and choose the exon with largest negative value, which would be Exon2 for Gene1. For Gene21, which also happens to show some differential splicing, the pair with the highest and lowest _t_-statistic values are Exon1 and Exon3 respectively.
Does this approach make sense or is there perhaps a better way?
Thanks!
Jamie.
library(limma)
## Genes
Gene <- paste("Gene", 1:100, sep="")
Gene <- rep(Gene, each=10)
Exon <- paste("Exon", 1:10, sep="")
Gene.Exon <- paste(Gene, Exon, sep=".")
genes <- data.frame(GeneID=Gene, Gene.Exon=Gene.Exon)
## Design
group <- factor(rep(1:2, each=3))
design0 <- model.matrix(~0 + group)
design <- model.matrix(~group)
## Expression
set.seed(12)
sd <- 0.3*sqrt(4/rchisq(1000,df=4))
set.seed(12)
E <- matrix(rnorm(1000*6,sd=sd),1000,6)
## Increase expression of Exon1 of Gene1 in group 2 only
E[1,4:6] <- E[1,4:6] + 2
## EList
y <- new("EList", list(E = E, genes = genes, design = design))
## lmFit
fit <- lmFit(y,design)
## Splice
ds <- diffSplice(fit, geneid="GeneID")
# Total number of exons: 1000
# Total number of genes: 100
# Number of genes with 1 exon: 0
# Mean number of exons in a gene: 10
# Max number of exons in a gene: 10
topSplice(ds, number = 3)
# GeneID NExons P.Value FDR
# 10 Gene1 10 8.034967e-07 8.034967e-05
# 210 Gene21 10 1.124678e-04 5.623389e-03
# 1000 Gene100 10 2.015570e-04 6.718565e-03
## t-tests for exons
tt <- topSplice(ds, test = "t", n = Inf)
tt <- tt[order(-tt$t), ]
tt.by.gene <- split(tt, tt$GeneID)
## Exon pairs by gene
exon.pairs <- lapply(tt.by.gene, function(x) x[c(1, nrow(x)),])
exon.pairs$Gene1
# GeneID Gene.Exon logFC t P.Value FDR
# 1 Gene1 Gene1.Exon1 3.347710 6.251525 8.927741e-08 8.927741e-05
# 2 Gene1 Gene1.Exon2 -1.414739 -2.641889 1.097982e-02 3.786144e-01
exon.pairs$Gene21
# GeneID Gene.Exon logFC t P.Value FDR
# 201 Gene21 Gene21.Exon1 0.9521223 2.593235 1.244185e-02 0.40134996
# 203 Gene21 Gene21.Exon3 -1.7805751 -4.849639 1.249642e-05 0.00624821
sessionInfo()
# R version 3.6.2 (2019-12-12)
# Platform: x86_64-apple-darwin15.6.0 (64-bit)
# Running under: macOS Mojave 10.14.6
#
# Matrix products: default
# BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#
# locale:
# [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] limma_3.42.0
#
# loaded via a namespace (and not attached):
# [1] compiler_3.6.2
That is fair enough. Sensible is always reassuring to hear! Thanks very much for your input Gordon.
Best regards, Jamie.