Dear all,
I have a multifactorial experimental design (3 mutants and 2 treatments) am trying to obtain the difference of differences for exon usage. I can obtain p-values but not the fold-change of expression that is attributable to the interaction between mutant and treatment. I hope this makes sense. Here is an equivalent problem with artificial count matrix c_mat (500 genes, 2 exons per gene, 2 replicates per mutant/treatment combination, no significant differences).
library(DEXSeq)
set.seed(1)
c_mat <- matrix(rpois(n = 12000, lambda = rep(rgamma(100,1)*100, 12)), ncol = 12)
sampleData=data.frame(row.names = colnames(c_mat),
mut=rep(c('mutA','mutB','WT'), each=4),
treat=rep(c('trt','ctrl'),each=2))
colnames(c_mat)=paste(sampleData$mut, sampleData$treat, rep(1:2,6), sep = '_')
groupID=rep(paste0('gene_',1:500), each=2)
featureID=rep(c('exon1','exon2'),500)
rownames(c_mat) <- paste0(groupID,featureID)
design <- formula( ~ sample + exon + mut:exon + treat:exon + mut:treat:exon )
redMod <- formula( ~ sample + exon + mut:exon + treat:exon )
dxd <- DEXSeqDataSet( c_mat, sampleData, design,
featureID, groupID )
dxd = estimateSizeFactors( dxd )
dxd = estimateDispersions( dxd )
dxdTEST1 = testForDEU( dxd, fullModel = design, reducedModel = redMod)
dxdTEST1 <- estimateExonFoldChanges( dxdTEST1, fitExpToVar = 'mut' )
dxrTEST1 = DEXSeqResults( dxdTEST1 )
dxrTEST1_df<-as.data.frame( dxrTEST1 )
So that gives a count matrix with the following column names of 3 mutants and 2 treatments, 2 replicates each:
> colnames(c_mat)
[1] "mutA_trt_1" "mutA_trt_2" "mutA_ctrl_1" "mutA_ctrl_2" "mutB_trt_1" "mutB_trt_2"
[7] "mutB_ctrl_1" "mutB_ctrl_2" "WT_trt_1" "WT_trt_2" "WT_ctrl_1" "WT_ctrl_2"
Above code attempts to test a full model containing mut:treat:exon
against a reduced model lacking this term. But I guess I would like to also run estimateExonFoldChanges
with fitExpToVar = 'mut:treat:exon'
, however it obviously only accepts colnames of sampleData
as values for fitExpToVar
so it only extracted log2fold changes of the primary effect of 'mut' in this case:
> head(dxrTEST1_df)
groupID featureID exonBaseMean dispersion stat pvalue padj
gene_1:exon1 gene_1 exon1 14.83689 0.004420707 2.3227484 0.3130557 0.9864358
gene_1:exon2 gene_1 exon2 184.90061 0.003685373 2.3671791 0.3061777 0.9864358
gene_2:exon1 gene_2 exon1 185.42570 0.015539932 0.2054466 0.9023766 0.9864358
gene_2:exon2 gene_2 exon2 79.05609 0.015624023 0.2048190 0.9026598 0.9864358
gene_3:exon1 gene_3 exon1 126.16461 0.001680639 2.3476782 0.3091777 0.9864358
gene_3:exon2 gene_3 exon2 118.44545 0.001682963 2.3471523 0.3092590 0.9864358
mutA mutB WT log2fold_mutB_mutA log2fold_WT_mutA genomicData
gene_1:exon1 6.897828 7.442706 7.656828 0.22690889 0.311865556
gene_1:exon2 22.540960 22.372108 22.309067 -0.02860662 -0.039317188
gene_2:exon1 22.494594 22.285767 22.511180 -0.03543656 0.002806783
gene_2:exon2 15.806041 16.032684 15.788190 0.04786188 -0.003790942
gene_3:exon1 19.402213 19.269096 19.196305 -0.02463073 -0.038144879
gene_3:exon2 18.685657 18.823240 18.899005 0.02595844 0.040201064
countData.1 countData.2 countData.3 countData.4 countData.5 countData.6
gene_1:exon1 13 13 10 16 19 20
gene_1:exon2 183 176 194 196 180 192
gene_2:exon1 186 196 164 201 163 220
gene_2:exon2 81 73 82 77 90 84
gene_3:exon1 140 132 121 126 128 118
gene_3:exon2 124 111 114 124 117 111
countData.7 countData.8 countData.9 countData.10 countData.11 countData.12
gene_1:exon1 9 14 13 18 17 17
gene_1:exon2 194 182 196 162 168 208
gene_2:exon1 166 181 176 210 193 181
gene_2:exon2 81 69 95 64 80 78
gene_3:exon1 121 132 109 127 137 131
gene_3:exon2 118 125 126 123 104 132
Is there some way to extract the estimate of the interaction effect size? FYI running R version 3.4.1, DEXSeq_1.22.0, DESeq2_1.16.1
Thanks a lot in advance,
Stuart
(edited to clarify that I set estimateExonFoldChanges
to report primary effect of mut
here as a placeholder)
Hi Alejandro,
Thanks a lot, that's really helpful.
Just a related question on the results object, as I want to make sure I genuinely understand how the log2fold-change is calculated: if I set gene_1:exon1 to be upregulated ~10-fold specifically in mut=='A' and treat='trt' by inserting some higher counts in these cells:
c_mat[1,1:2]=c(138, 145)
and follow your advice above to get a results object, it gives a log2fold change (mutA, trt vs ctrl) of 3.058975:Do the values given under 'mutA.ctrl' column represent exon usage (normalized to gene expression change) or exon expression? Have they been transformed somehow (VST)? Most importantly I could not understand how the 4 expression values in gene1 mutA (exon1 & 2; trt & ctrl) are combined to arrive at 3.058975, could you shed some light on it? Sorry if this question is somewhat vague, I have looked in the DEXseq docs and paper but I am not sure I am quite grasping how it is done.
Thanks very much
Hi Stuart. The four values would correspond to exon usage values after applying a variance stabilizing transformation. The exon fold changes are estimated from these values but before being variance-stabilized transformed, this is the reason why they can't be estimated from the four exon usage values.
Ah, I see now, thanks for the clarification. Actually I just realised that
mcols(dxrTEST1)$description
gives some of this information, but it's useful to know where and when VST is applied too.