Question

EdgeR - blocking for multiple factors at once - Errors

Hi all,

I'm performing a DE analysis on RNAseq data using EdgeR.

I have a number of patients, each with three samples: untreated (UT), treatment A, and treatment B.

My samples are split across two batches. My metadata therefore looks something like this :

Treatment	Patient	Batch
UT	1	1
A	1	1
B	1	1
UT	2	1
A	2	1
B	2	1
UT	3	2
A	3	2
B	3	2
UT	4	2
A	4	2
B	4	2

I am only interested in the differences between treatment A and UT, and between treatment B and UT, so I want to block on both patient and batch. However, if I make a design matrix like this:

design.mat <- model.matrix(~0+treatment+batch+sample)

   groupUT groupA groupB Batch2 Patient1 Patient2 Patient3
1        1      0      0      0        0        0        0
2        0      1      0      0        0        0        0
3        0      0      1      0        0        0        0
4        1      0      0      0        0        0        1
5        0      1      0      0        0        0        1
6        0      0      1      0        0        0        1
7        0      0      1      1        0        1        0
8        0      1      0      1        0        1        0
9        1      0      0      1        0        1        0
10       0      0      1      1        1        0        0
11       0      1      0      1        1        0        0
12       1      0      0      1        1        0        0

This looks roughly as I expected. However, when I run estimateGLMCommonDisp or fit a GLM, I get this error message:

"Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, :
Design matrix not of full rank. The following coefficients not estimable:
Patient2"
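The rank deficiency can be reproduced without edgeR. Below is a minimal base R sketch, assuming the metadata is held in a data frame called meta (a hypothetical name) with the Treatment, Patient and Batch columns from the table above. The factor level ordering is also an assumption, so the column names differ slightly from the matrix shown in the post.

```r
# Metadata as described in the post: 4 patients, 3 treatments each, 2 batches.
# 'meta' and the level ordering are assumptions made for illustration.
meta <- data.frame(
  Treatment = factor(rep(c("UT", "A", "B"), times = 4),
                     levels = c("UT", "A", "B")),
  Patient   = factor(rep(1:4, each = 3)),
  Batch     = factor(rep(1:2, each = 6))
)

# Same structure as the design in the post: treatment, batch and patient terms.
design <- model.matrix(~ 0 + Treatment + Batch + Patient, data = meta)

# Compare the number of coefficients with the numerical rank of the matrix.
n_coef      <- ncol(design)
design_rank <- qr(design)$rank

cat("coefficients:", n_coef, " rank:", design_rank, "\n")
```

The design has seven columns but only rank six, so one coefficient cannot be estimated. Which coefficient is reported as non-estimable can depend on the order of the columns.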

Is it possible to block for two factors at once with my experimental design, and if so, how do I format my design matrix?

I think my issue is that I do not have every patient represented in both batches (i.e. I wouldn't be able to batch correct using ComBat with this design). If that is the case, would blocking on "patient" alone be sufficient, since the between-patient differences would encompass the between-batch differences as well?

Many thanks in advance

Dean

EdgeR multiple factor design blocked design • 1.4k views

Updated 6.5 years ago by Aaron Lun ★ 28k • written 6.5 years ago by D ▴ 10

score 3 · Accepted Answer · 2018-11-01


Aaron Lun ★ 28k


You can't block on both batch and patient, because the latter is nested within the former. But as you've guessed, there is no need to do so: any batch effect is simply absorbed by the patient terms, so a separate batch term is redundant. This will not compromise your ability to compare between treatments.
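A minimal sketch of patient-only blocking, under a few assumptions: the data frame meta and its level ordering are hypothetical, the counts are simulated purely for illustration, and the estimateDisp/glmFit/glmLRT sequence is one possible edgeR workflow rather than necessarily the one used in the original analysis. The edgeR part only runs if the package is installed.

```r
# Hypothetical metadata matching the table in the question.
meta <- data.frame(
  Treatment = factor(rep(c("UT", "A", "B"), times = 4),
                     levels = c("UT", "A", "B")),
  Patient   = factor(rep(1:4, each = 3)),
  Batch     = factor(rep(1:2, each = 6))
)

# Patient is nested within batch: the batch 2 indicator is exactly the sum of
# the indicators for patients 3 and 4, so a batch term adds no information.
patient_ind <- model.matrix(~ 0 + Patient, data = meta)
batch2      <- as.numeric(meta$Batch == "2")
nested      <- all(batch2 == patient_ind[, "Patient3"] + patient_ind[, "Patient4"])

# Block on patient only. With UT as the reference level, the TreatmentA and
# TreatmentB coefficients are the within-patient log-fold changes versus UT.
design    <- model.matrix(~ Patient + Treatment, data = meta)
full_rank <- qr(design)$rank == ncol(design)

# Optional fit on simulated counts (no real data, no true differences).
if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(1)
  n_genes <- 200
  counts  <- matrix(rnbinom(n_genes * nrow(meta), mu = 50, size = 10),
                    nrow = n_genes)
  rownames(counts) <- paste0("gene", seq_len(n_genes))

  y   <- edgeR::DGEList(counts = counts)
  y   <- edgeR::calcNormFactors(y)
  y   <- edgeR::estimateDisp(y, design)
  fit <- edgeR::glmFit(y, design)

  res_A <- edgeR::glmLRT(fit, coef = "TreatmentA")  # A versus UT
  res_B <- edgeR::glmLRT(fit, coef = "TreatmentB")  # B versus UT
  print(edgeR::topTags(res_A, n = 3))
}
```

Because every patient contributes all three treatments, the treatment comparisons are made within patients, and any batch difference cancels out along with the patient baseline.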


Thanks! Much appreciated.

Dean

Reply • 6.5 years ago • D ▴ 10