Question

edgeR: estimating dispersion for nested designs

Mauve • 0

@mauve-7320

Last seen 10.2 years ago

Norway

Hi,

I have an RNA-seq experiment similar to the one in Section 3.5 of the edgeR User's Guide, i.e. a nested paired design, and I have used that approach to analyze my own data. Briefly, the experiment involves 60 RNA samples from two groups of bacterial strains, commensal (C) and disease-causing (D). Each group consists of 15 different strains, and each strain was either treated with a chemical (IND) or left untreated (CTR). My questions are about estimating dispersion in this type of scenario (which is skipped in the User's Guide):

1. Can I correctly estimate common, trended and tagwise dispersions using estimateGLMCommonDisp, estimateGLMTrendedDisp and estimateGLMTagwiseDisp (relative to a design matrix), even though there are no true biological replicates?
2. How do I calculate the prior degrees of freedom in this case?

Any help will be greatly appreciated.

edger nested estimatedispersions • 2.2k views

ADD COMMENT • link 10.2 years ago Mauve • 0

See Section 2.10 of the edgeR User's Guide "What to do if you have no replicates"

ADD REPLY • link 10.2 years ago Gordon Smyth 52k

So in the section 3.5 example, which option was used?

ADD REPLY • link 10.2 years ago Mauve • 0

The Section 3.5 example has replicates. There are 18 samples, and the design matrix has only 12 columns, so there are 6 residual df for estimating the dispersion. Hence all the edgeR glm dispersion estimation methods work.

PS. Please be careful to post follow-up questions as comments rather than answers. I have moved our interchange so far to be comments on your original question.

ADD REPLY • link 10.2 years ago Gordon Smyth 52k

Thank you for clarifying. I was under the impression that real biological replicates were required to estimate dispersion correctly, but I now realize that, since I am only interested in group-level differences in expression between CTR and IND, the strains can be treated as replicates.

ADD COMMENT • link 10.2 years ago Mauve • 0

score 3 · Accepted Answer · 2015-02-03

I think that you may have misunderstood the example in Section 3.5. You seem to be assuming that it has no replicates, but there are 18 samples, and the design matrix has only 12 columns, so there are 6 residual df for estimating the dispersion. Hence all the edgeR glm dispersion estimation methods work.
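The residual df count above can be checked with base R alone. This is a minimal sketch that assumes a Section 3.5-style layout of three disease groups, three patients per group (renumbered 1 to 3 within each group) and two treatments per patient; the factor labels are illustrative rather than copied from the User's Guide.

```r
# Assumed layout: 3 disease groups x 3 patients per group x 2 treatments = 18 samples.
Disease   <- factor(rep(c("Healthy", "Disease1", "Disease2"), each = 6),
                    levels = c("Healthy", "Disease1", "Disease2"))
Patient   <- factor(rep(rep(1:3, each = 2), times = 3))   # numbered within group
Treatment <- factor(rep(c("None", "Hormone"), times = 9),
                    levels = c("None", "Hormone"))

# Patients nested within disease group, plus a treatment effect per group.
design <- model.matrix(~ Disease + Disease:Patient + Disease:Treatment)

n_samples   <- nrow(design)             # number of libraries
n_coef      <- qr(design)$rank          # number of estimable coefficients
residual_df <- n_samples - n_coef       # df left over for estimating dispersion

cat("samples:", n_samples, " coefficients:", n_coef,
    " residual df:", residual_df, "\n")
```

Any design with a positive residual df leaves information for the glm dispersion estimators, even when no two samples share exactly the same combination of factors.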

You have not entirely explained the purpose of your experiment. Do you want to find genes DE between IND and CTR, treating the different strains as biological replicates? In that case, your experiment is like Section 3.5 and you do have replicates.
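For the 60-sample experiment described in the question, the same kind of design might be set up as follows. This is only a sketch: the sample ordering, the factor names and the simulated counts are assumptions made for illustration, and the edgeR step is skipped if the package is not installed.

```r
# Assumed ordering: group C strains 1-15, then group D strains 1-15,
# each strain contributing one CTR and one IND library.
Group     <- factor(rep(c("C", "D"), each = 30))
Strain    <- factor(rep(rep(1:15, each = 2), times = 2))   # numbered within group
Treatment <- factor(rep(c("CTR", "IND"), times = 30), levels = c("CTR", "IND"))

# Strains nested within group, plus a separate IND-vs-CTR effect for each group.
design <- model.matrix(~ Group + Group:Strain + Group:Treatment)
residual_df <- nrow(design) - qr(design)$rank
cat("coefficients:", ncol(design), " residual df:", residual_df, "\n")

if (requireNamespace("edgeR", quietly = TRUE)) {
  # Simulated counts stand in for the real data (hypothetical values).
  set.seed(1)
  counts <- matrix(rnbinom(500 * 60, mu = 50, size = 5), nrow = 500)
  rownames(counts) <- paste0("gene", seq_len(500))

  y <- edgeR::DGEList(counts = counts)
  y <- edgeR::calcNormFactors(y)

  # estimateDisp() fits common, trended and tagwise dispersions in one call
  # and estimates the prior df from the data (stored in y$prior.df).
  # The older estimateGLMCommonDisp/TrendedDisp/TagwiseDisp functions
  # take the same design matrix.
  y <- edgeR::estimateDisp(y, design)

  fit <- edgeR::glmFit(y, design)
  # IND vs CTR within the disease-causing group.
  lrt <- edgeR::glmLRT(fit, coef = "GroupD:TreatmentIND")
  print(edgeR::topTags(lrt, n = 5))
}
```

Under these assumptions the strain effects absorb the pairing, and the remaining residual df are what the dispersion estimates are based on.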

If you want to find genes that are DE between IND and CTR for each strain separately, then you don't have replicates, and I don't think you can do any formal statistical analysis. Just compute fold changes.
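A per-strain fold change can be computed from library-size-normalized counts. The sketch below uses base R and made-up counts for a single strain; the gene names, the counts and the offset of 0.5 are all assumptions, the offset being there only to avoid taking the log of zero.

```r
# Hypothetical counts for one strain: rows are genes, columns are CTR and IND.
counts <- matrix(c(10,  40,
                   200, 100,
                   0,   5),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("CTR", "IND")))

lib_size <- colSums(counts)
prior    <- 0.5   # assumed small offset so that zero counts stay finite

# Log2 counts per million, with the offset added to each count.
log_cpm <- log2(sweep(counts + prior, 2, lib_size + 2 * prior, "/") * 1e6)

# Log2 fold change, IND relative to CTR, for this strain.
log_fc <- log_cpm[, "IND"] - log_cpm[, "CTR"]
print(round(log_fc, 2))
```

These values are descriptive only; without replicates for a strain there is no variance estimate behind them and hence no p-values.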