EdgeR model formula design matrix
aiko • 0
@aiko-7523
Last seen 4.5 years ago
Canada

I have a problem with the model formula for my design matrix. The data look like this:

ID Disease Treatment Time
1 disease1 P 0
1 disease1 T 0
1 disease1 P 2
1 disease1 T 2
2 disease2 P 0
2 disease2 T 0
3 disease1 P 0
3 disease1 T 0
3 disease1 P 2
3 disease1 T 2

I would like a design matrix that lets me test contrasts on the paired/blocked samples, such as:

Disease1_P0 vs Disease1_T0

Disease1_P0 vs Disease2_P0

How should I set the model formula? Thanks for your help!

edger design matrix paired samples • 1.5k views

What is ID? Is this the patient identifier? Or some batch number? You need to provide some context to these numbers.


Hi Aaron, thanks for your reply.

ID is sample ID (patient identifier). Do you need additional info?

Thanks

Aaron Lun ★ 28k
@alun
Last seen 15 hours ago
The city by the bay

For your first contrast, I would do something like:

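# Combine disease, treatment and time into a single grouping factor.
Group <- paste0(Disease, Treatment, Time)
# Block on patient; with no intercept, every patient gets its own column.
design <- model.matrix(~ 0 + factor(ID) + Group)
# Drop the remaining P0 column (Groupdisease2P0) to get to full rank.
design <- design[, !grepl("P0", colnames(design))]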

The first three coefficients represent the patient effects and are not interesting. The next three coefficients represent the log-fold changes of P2, T0 and T2 over P0 for disease 1, "averaged" over patients 1 and 3. The last coefficient represents the log-fold change of T0 over P0 for disease 2.
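To make the column layout concrete, here is a minimal sketch that rebuilds this design from the table in the question. The data frame name `samples` is an assumption for illustration; only base R is used.

```r
# Sample table copied from the question; `samples` is a hypothetical name.
samples <- data.frame(
  ID        = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3),
  Disease   = c(rep("disease1", 4), rep("disease2", 2), rep("disease1", 4)),
  Treatment = c("P", "T", "P", "T", "P", "T", "P", "T", "P", "T"),
  Time      = c(0, 0, 2, 2, 0, 0, 0, 0, 2, 2)
)
samples$Group <- with(samples, paste0(Disease, Treatment, Time))

# Patient blocking factor plus the combined group factor.
design <- model.matrix(~ 0 + factor(ID) + Group, data = samples)
# Remove the leftover P0 column so the matrix has full column rank.
design <- design[, !grepl("P0", colnames(design))]

# Inspect the coefficient order and confirm full rank.
print(colnames(design))
print(qr(design)$rank == ncol(design))
```

Printing the column names is a quick way to check that coefficient 5 really is Groupdisease1T0 before testing it.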

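So, the first contrast is fairly straightforward; just test it by dropping coefficient 5 (i.e., Groupdisease1T0), for example with coef=5 in glmLRT(). This gives the log-fold change of T0 over P0 in disease 1.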

Your second contrast is much harder, as a comparison between disease states involves a comparison between patients. This means we can't block on ID in the design matrix: your patients are nested within diseases, so any ID blocking factor will absorb the disease effect. Within edgeR, the only solution is to take only one sample from each patient (i.e., the P0 sample) and fit a separate model to those samples. Otherwise, the correlations between samples from the same patient will compromise error control in edgeR.
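As a rough sketch of that one-sample-per-patient approach, the P0 samples can be subset and given their own design matrix. The names `samples` and `p0` are assumptions for illustration, and the table is the one from the question.

```r
# Sample table copied from the question; `samples` is a hypothetical name.
samples <- data.frame(
  ID        = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3),
  Disease   = c(rep("disease1", 4), rep("disease2", 2), rep("disease1", 4)),
  Treatment = c("P", "T", "P", "T", "P", "T", "P", "T", "P", "T"),
  Time      = c(0, 0, 2, 2, 0, 0, 0, 0, 2, 2)
)

# Keep exactly one sample per patient: the P0 sample.
p0 <- samples[samples$Treatment == "P" & samples$Time == 0, ]
p0$Disease <- factor(p0$Disease)

# No blocking factor is needed because each patient contributes one sample.
design_p0 <- model.matrix(~ Disease, data = p0)
print(design_p0)
```

Here the second coefficient (Diseasedisease2) is the log-fold change of disease 2 over disease 1 at P0. In the table as posted, disease 2 has a single patient, so more patients would be needed for a meaningful comparison.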

The alternative is to switch to voom-limma and use duplicateCorrelation() instead.
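A minimal sketch of that route follows, assuming the limma package (and statmod, which duplicateCorrelation() relies on) is installed. The count matrix is simulated purely for illustration, and `samples` and `counts` are hypothetical names; patient ID is treated as a random effect instead of a blocking term in the design.

```r
library(limma)

# Sample table copied from the question; `samples` is a hypothetical name.
samples <- data.frame(
  ID        = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3),
  Disease   = c(rep("disease1", 4), rep("disease2", 2), rep("disease1", 4)),
  Treatment = c("P", "T", "P", "T", "P", "T", "P", "T", "P", "T"),
  Time      = c(0, 0, 2, 2, 0, 0, 0, 0, 2, 2)
)
Group <- factor(with(samples, paste0(Disease, Treatment, Time)))

# One column per group; ID is deliberately left out of the design.
design <- model.matrix(~ 0 + Group)
colnames(design) <- levels(Group)

# Simulated counts (200 genes x 10 samples), for illustration only.
set.seed(1)
counts <- matrix(rnbinom(200 * nrow(samples), mu = 50, size = 5), nrow = 200)

# Estimate the within-patient correlation and use it in the linear model.
v      <- voom(counts, design)
corfit <- duplicateCorrelation(v, design, block = samples$ID)
fit    <- lmFit(v, design, block = samples$ID,
                correlation = corfit$consensus.correlation)

# Between-disease comparison at P0.
con  <- makeContrasts(disease2P0 - disease1P0, levels = design)
fit2 <- eBayes(contrasts.fit(fit, con))
print(topTable(fit2, n = 5))
```

Because the patient effect is modelled as a correlation, the disease effect is no longer absorbed by a blocking term, and all samples can stay in one model.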

aiko • 0

Hi Aaron, thanks for your reply.

ID is sample ID (patient identifier). Do you need additional info?

Thanks 

 


Don't post an answer, unless you're answering your own question.


Dear Aaron, to compare Disease1 P2 vs Disease1 T2, is it correct to use:

Groupdisease1T2-Groupdisease1P2

Thanks
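For reference, under the blocked design from the answer above each Group coefficient is a log-fold change over P0, so Groupdisease1T2 - Groupdisease1P2 works out to the log-fold change of T2 over P2 in disease 1. A minimal sketch of that contrast as a numeric vector follows, using only base R; `samples` is a hypothetical name. The vector is built by hand because column names such as factor(ID)1 are not syntactically valid R names.

```r
# Sample table copied from the question; `samples` is a hypothetical name.
samples <- data.frame(
  ID        = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3),
  Disease   = c(rep("disease1", 4), rep("disease2", 2), rep("disease1", 4)),
  Treatment = c("P", "T", "P", "T", "P", "T", "P", "T", "P", "T"),
  Time      = c(0, 0, 2, 2, 0, 0, 0, 0, 2, 2)
)
samples$Group <- with(samples, paste0(Disease, Treatment, Time))

# Blocked design as in the answer: patient effects plus group effects.
design <- model.matrix(~ 0 + factor(ID) + Group, data = samples)
design <- design[, !grepl("P0", colnames(design))]

# Contrast vector: +1 on T2, -1 on P2, zero everywhere else.
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast["Groupdisease1T2"] <-  1
contrast["Groupdisease1P2"] <- -1
print(contrast)
```

A vector like this could then be supplied to a test function, for example as the contrast argument of glmLRT().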
