Including multiple covariates in edgeR
@ebmahmoudi-14994

I want to test for differentially expressed genes in my cohort (40 cases vs 20 controls) with several covariates (age, sex, subject, RNA integrity number) in edgeR. However, I get the following warning at the estimateDisp step (see below). I have read the edgeR manual, and I assume this is due to fitting too many terms in the model (or a lack of replicates)? If so, I don't know which of the suggested options (2-4) is the best alternative, or how to implement them (for example, option 3). Thanks

design <- model.matrix(~ subject + age + sex + RIN + treatment)

Warning message:
In estimateDisp.default(y = y$counts, design = design, group = group, :
  No residual df: setting dispersion to NA
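A minimal base-R sketch of why this warning appears, assuming 60 hypothetical patients with simulated covariates (the values are made up for illustration, not taken from the poster's data). With `subject` as a factor that has one level per patient, the intercept plus the subject columns already span all 60 samples, so no residual degrees of freedom are left for estimating dispersion. The helper `resid_df()` is hypothetical, written only for this illustration.

```r
# Hypothetical sample table: 60 patients, 40 cases and 20 controls.
# All covariate values are simulated for illustration only.
set.seed(1)
n <- 60
samples <- data.frame(
  subject   = factor(paste0("P", seq_len(n))),        # one level per patient
  age       = round(runif(n, 20, 80)),               # numeric age in years
  sex       = factor(rep(c("F", "M"), length.out = n)),
  RIN       = round(runif(n, 5, 9), 1),              # RNA integrity number
  treatment = factor(rep(c("case", "control"), c(40, 20)),
                     levels = c("control", "case"))
)

# The design from the question: subject alone uses up every sample.
design_bad <- model.matrix(~ subject + age + sex + RIN + treatment, data = samples)

# Hypothetical helper: residual df = number of samples minus the rank
# (number of linearly independent columns) of the design matrix.
resid_df <- function(design) nrow(design) - qr(design)$rank

ncol(design_bad)      # 64 columns for 60 samples
resid_df(design_bad)  # 0: nothing left over to estimate dispersion
```

Because the rank can never exceed the number of samples, any design containing a one-level-per-sample factor leaves zero residual df, regardless of which other covariates are added.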


If you are using edgeR, please don't also tag DESeq2; getting these emails wastes developer time.


Thanks, Michael Love, for the warning. (I thought people might have DESeq2 solutions for this.)

Yunshun Chen
@yunshun-chen-5451
Australia

The warning indicates that there are no residual degrees of freedom for your design matrix. I guess your subjects have many different ages and you incorporated age into the design matrix as a factor with many levels. You could try using a spline curve with 3-5 knots to represent age, which would save you many degrees of freedom. This approach is described in Section 9.6.2 of the limma User's Guide.
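A small base-R sketch comparing how many design columns age consumes under three encodings, assuming hypothetical integer ages for 60 samples. It uses `splines::ns()` (natural cubic splines, shipped with R) as one way to build a spline trend; the choice `df = 3` is an assumption for illustration, and the limma User's Guide section cited above should be consulted for how to choose the spline complexity.

```r
library(splines)  # ships with base R; provides ns()

# Hypothetical ages (years) for 60 samples, simulated for illustration.
set.seed(2)
n <- 60
age <- sample(20:80, n, replace = TRUE)

# Age as a factor: one column per distinct age, minus the baseline level.
n_age_factor <- ncol(model.matrix(~ factor(age))) - 1
# Age as a numeric covariate: a single linear slope column.
n_age_linear <- ncol(model.matrix(~ age)) - 1
# Age as a natural spline with 3 df: three columns, allowing a smooth
# non-linear age trend without spending one column per distinct age.
n_age_spline <- ncol(model.matrix(~ ns(age, df = 3))) - 1

c(factor = n_age_factor, linear = n_age_linear, spline = n_age_spline)
```

The factor encoding costs one degree of freedom per distinct age, whereas the linear and spline encodings cost a fixed, small number regardless of how many ages occur.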


Thanks, Yunshun Chen. I read that section of the limma User's Guide, but I am not sure how to combine it with edgeR in the end. Would it detect differences between cases and controls after adjusting for age? And should I include the other covariates (sex, RIN, etc.) in this model?

@gordon-smyth
WEHI, Melbourne, Australia

You have plenty of replicates but you have formed the design matrix incorrectly. It is never appropriate to create a design matrix with more than 60 columns, as you have done here.

I assume from your question that you have 60 patients in all and treatment is a factor with two levels (case and control). You actually need:

design <- model.matrix(~ age + sex + treatment)

You can't include patient in the model when every patient is different! You could add RIN, but you may get perfectly good results without that complication. Only do that if you are an expert.
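A base-R sketch of the simpler design, assuming 60 hypothetical patients with simulated numeric age and RIN (values made up for illustration). It checks the column names and the residual degrees of freedom with and without RIN; the helper `resid_df()` is hypothetical, written only for this example.

```r
# Hypothetical sample table: 60 different patients, 40 cases and 20 controls.
set.seed(3)
n <- 60
samples <- data.frame(
  age       = round(runif(n, 20, 80)),               # numeric, not a factor
  sex       = factor(rep(c("F", "M"), length.out = n)),
  RIN       = round(runif(n, 5, 9), 1),              # numeric, not a factor
  treatment = factor(rep(c("case", "control"), c(40, 20)),
                     levels = c("control", "case"))  # control is the baseline
)

# Hypothetical helper: samples minus the rank of the design matrix.
resid_df <- function(design) nrow(design) - qr(design)$rank

# Simple model: intercept, age slope, sex effect, case-vs-control effect.
design <- model.matrix(~ age + sex + treatment, data = samples)
colnames(design)   # "(Intercept)" "age" "sexM" "treatmentcase"
resid_df(design)   # 56

# Optional extension with RIN as a numeric covariate: one more column.
design_rin <- model.matrix(~ age + sex + RIN + treatment, data = samples)
resid_df(design_rin)  # 55
```

Placing `treatment` last makes the case-vs-control effect the final column. This is convenient if `glmQLFTest()` is left to test the last coefficient by default, which should be confirmed in the edgeR reference manual for the installed version.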

As Yunshun Chen has pointed out, it is important to ensure that age and RIN are numeric covariates (and not factors). You could try Yunshun's suggestion of a spline trend with age, but I suggest you try the simple model above first. Again, the spline curve is for experts.
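A base-R sketch of checking and fixing covariates that were read in as factors, assuming a hypothetical four-sample table in which age and RIN were stored as text labels. It also shows a common pitfall: calling `as.numeric()` directly on a factor returns the internal level codes, not the original values.

```r
# Hypothetical: age and RIN were read in as text labels and became factors.
samples <- data.frame(
  age = factor(c("34", "51", "28", "67")),
  RIN = factor(c("7.2", "6.8", "8.1", "5.9"))
)
is.numeric(samples$age)  # FALSE: model.matrix() would treat age as a factor

# Pitfall: as.numeric() on a factor gives level codes (levels sort as
# "28" "34" "51" "67"), so this returns 2 3 1 4, not the ages.
wrong_age <- as.numeric(samples$age)

# Correct conversion: go through the character labels first.
samples$age <- as.numeric(as.character(samples$age))
samples$RIN <- as.numeric(as.character(samples$RIN))

str(samples)  # both columns are now num
```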


Thanks, Gordon Smyth. Yes, I have 60 DIFFERENT patients across the two levels of treatment. Because my RNA was somewhat degraded across samples, I thought adding RIN (as a numeric covariate) could help control for the degradation. What do you mean by not adding that complication? How else would I control for the degradation levels without RIN as a covariate?
