allele specific expression in edgeR/DESeq
1
1
Entering edit mode
gthm ▴ 30
@gthm-8377
Last seen 5.6 years ago
spain

This question is an extension to the previously posted question. Allele Specific Expression in DESeq2 or EdgeR

I have a paired data set, treated vs untreated samples from each individual.

ind1_T,ind1_U, ind2_T,ind2_U, ind3_T,ind3_U

I would like to perform differential allele specific analysis of SNPs i.e SNPs that show a significant allelic imbalance between Treated vs Untreated. I would like to know how to model this in DESeq/edgeR ?

SNP ind1_T ind1_U ind2_T ind2_U ind3_T ind3_U
kgp1876518_REF_G 0 0 0 0 0 0
kgp1876518_ALT_A 0 0 0 0 0 0
kgp1876747_REF_A 0 77 0 55 0 55
kgp1876747_ALT_G 0 35 0 30 0 30
kgp1877878_REF_C 77 0 55 10 60 13
kgp1877878_ALT_T 34 0 25 5 28 8
edger deseq2 deseq • 1.5k views
ADD COMMENT
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 8 hours ago
The city by the bay

I would suggest that you arrange your data so that each row contains all the counts for both alleles of each SNP, for all treated/untreated samples from all individuals. This means that you would get 12 counts per gene in total (4 counts per individual). You could then set up your design like:

allele <- factor(rep(c("alt", "ref"), 6), level=c("ref", "alt"))
individual <- factor(rep(1:3, each=4))
treatment <- factor(rep(c("T", "T", "U", "U"), 3))
design <- model.matrix(~0 + individual:treatment + allele:treatment)

This design is a bit complicated, so bear with me. If we look at colnames(design), we get:

[1] "individual1:treatmentT" "individual2:treatmentT" "individual3:treatmentT"
[4] "individual1:treatmentU" "individual2:treatmentU" "individual3:treatmentU"
[7] "treatmentT:allelealt"   "treatmentU:allelealt"  

The first six coefficients represent the blocking factors for each individual/treatment combination, i.e., the baseline log-expression of the reference allele in each individual/treatment combination. The last two coefficients represent the log-fold change of the alternative allele over the reference (i.e., allele-specific expression) in the treated or untreated samples. You can drop either of these two coefficients individually to test for ASE in each treatment type, or you can compare them to each other (e.g., by setting the contrast argument to c(0,0,0,0,0,0,-1,1) in glmLRT) to test for differential ASE between treatment types.

For the sake of completeness, I will also describe a more sophisticated model:

design <- model.matrix(~0 + individual:treatment + individual:allele + allele:treatment)

If we look at the colnames(design), we get:

 [1] "individual1:treatmentT" "individual2:treatmentT" "individual3:treatmentT"
 [4] "individual1:treatmentU" "individual2:treatmentU" "individual3:treatmentU"
 [7] "individual1:allelealt"  "individual2:allelealt"  "individual3:allelealt"
[10] "treatmentU:allelealt"  

The first six coefficients are as previously described; the next three coefficients represent the alternative/reference ASE in the treated sample of each individual; and the last coefficient represents the log-fold change in the ASE in the untreated over treated samples across individuals. In other words, the last coefficient can be dropped to test for differential ASE upon treatment. This model is more sophisticated as it allows for individual-specific ASE that might have inflated the dispersions in the previous model (thus leading to a decrease in power). However, the cost is that this model provides fewer residual d.f., which will also reduce power.

ADD COMMENT
0
Entering edit mode

Hi Aaron, thank you very much. Where can I read more about the designing the model from glm ? I would like to get thorough concepts in creating a design. Here, are we comparing the ratios of ref/alt as a pairwise manner between treated vs untreated ? 

ADD REPLY
0
Entering edit mode

The edgeR user's guide has several good examples about parametrisation, so that's a good place to start. For this particular analysis, you have several pairings; pairing between treated and untreated per individual, and pairing between alternative and reference. The second model uses the pairing between treated and untreated, whereas the first model does not. Both models use the pairing between alternative and reference within each treatment/individual combination.

ADD REPLY

Login before adding your answer.

Traffic: 1049 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6