EdgeR: Prior degrees of freedom effect on tag wise dispersion
1
1
Entering edit mode
Mr.RB ▴ 20
@mrrb-15633
Last seen 5.6 years ago

 

Background

​I'm currently trying to understand how RNA-seq analysis should be performed. I have read all manuals (+ the corresponding papers), however as I'm mostly concerned with bio-informatics regarding data cleaning and integration I find it hard to understand the statistics behind e.g. edgeR (If there are any recommendation (papers e.g.) to better understand these feel free to recommend some).

Why this question?

​As the prior degrees of freedom is a constant, and thus not automatically determined based on the data I was interested in its effects on the dispersion estimate. I read here that the prior is commonly calculated as: 

50/(#samples−#groups)

When I do this for my data: 50/(4-2) = 25.

As this is substantially higher than the default (=10) I decided to see the effects of the prior df on my data, for which I used the following code:

library(edgeR)

# loading data
groups <- as.factor(c("group1","group1","group2","group2"))
dg.list <- DGEList(counts=counts, group = groups)

# normalization
dg.list <- calcNormFactors(dg.list)

# dispersion estimates
dg.list <- estimateGLMCommonDisp(dg.list)
dg.list <- estimateGLMTrendedDisp(dg.list)
​
# effect of prior in tag wise dispersion estimates
design <- model.matrix(~0+group, data=dg.list$samples)
colnames(design) <- levels(dg.list$samples$group)

par(mfrow = c(3,2))
for (prior in c(10,15,20,25,30,35)) {
  dg.list <- estimateGLMTagwiseDisp(dg.list, design, prior.df = prior)
  fit <- glmFit(dg.list ,design)
  gof(fit, plot = T, main = paste0('prior: ',prior))
}

The produced plot

​The question

As can be seen in this plot the genes in the top right are under the line with the default (=10) but above the line with a prior df of 25. Of course I could just pick the prior df best fitting the line but I assume this would not be the best practice. Thus my question is: is there a way to estimate the 'best' prior df or should I just stick with the defaults?

 

 

R edgeR rnaseq rna-seq • 1.9k views
ADD COMMENT
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 46 minutes ago
The city by the bay

There is no longer any need to set the prior d.f. manually. estimateDisp will automatically estimate a suitable prior d.f. for you, and has been doing so for several years. We are also recommending the QL framework:

A: edgeR glmLRT with test="F"?

https://f1000research.com/articles/5-1438/v2

... in which the prior d.f. for empirical Bayes shrinkage of the QL dispersions is also automatically estimated. There has also been work performed in making the prior d.f. estimation more robust to outliers, see:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5373812/

ADD COMMENT
0
Entering edit mode

Dear Aaron, thankyou for your reply! I did not know about the estimateDisp wrapper for the common, trend and tagwise dispersion estimates. 

 

ADD REPLY
0
Entering edit mode

estimateDisp() and the associated publications are introduced in the first chapter of the edgeR User's Guide. estimateDisp() is the first piece of code introduced in Section 1.4 "Quick Start".

ADD REPLY
0
Entering edit mode

Do you have any recommendation to papers/websites/videos to understand why you should calculate these dispersions, and how these are used in the GLM fitting? 

ADD REPLY
0
Entering edit mode

Section 1.2 of the edgeR user's guide contains brief summaries of the relevant papers.

ADD REPLY

Login before adding your answer.

Traffic: 799 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6