Question

EdgeR - Model matrix for complex model similar to user guide 3.5 example - but more complex

1

Entering edit mode

D ▴ 10

@d-16116

Last seen 3.7 years ago

UK

Hi all,

I'm trying to analyse some RNAseq data which has been generated using an experimental design not dissimilar to the example in the user guide (section 3.5, p41), but with some differences that are making applying that model difficult.

My study design:

> Pheno_mat
   disease Patient Treatment
1  healthy       1      None
2  healthy       1    Drug_A
3  healthy       1    Drug_B
4  healthy       2      None
5  healthy       2    Drug_A
6  healthy       2    Drug_B
7  disease       3      None
8  disease       3    Drug_A
9  disease       3    Drug_B
10 disease       4      None
11 disease       4    Drug_A
12 disease       4    Drug_B
13 disease       5      None
14 disease       5    Drug_A
15 disease       5    Drug_B

I want to find: The impact of drugA/B on healthy and diseased samples. The difference in the impact of DrugA/B in disease vs healthy.

The reason I can't apply the model near exactly as shown in 3.5 is that I do not have a balanced number of disease/healthy samples, so revising the targets names to 1-3 in both disease groups doesn't work, I still have design matrix that is not full rank:

> Pheno_mat
   disease Patient Treatment Patient_fix
1  healthy       1      None           1
2  healthy       1    Drug_A           1
3  healthy       1    Drug_B           1
4  healthy       2      None           2
5  healthy       2    Drug_A           2
6  healthy       2    Drug_B           2
7  disease       3      None           1
8  disease       3    Drug_A           1
9  disease       3    Drug_B           1
10 disease       4      None           2
11 disease       4    Drug_A           2
12 disease       4    Drug_B           2
13 disease       5      None           3
14 disease       5    Drug_A           3
15 disease       5    Drug_B           3

So, what are my options? 1: I can't follow the example as I have a different number of patients in healthy/disease, so my design matrix isn't full rank 2: Disease state is explained by patient ID, but I can't use patient:disease then handle disease state in the contrasts because I get n=1 in each group and can't estimate dispersions 3: If I simply fit a nested Disease+Drug model I ignore that my patient data are paired...

Any suggestions would be gratefully received!

Thanks,

Dean

edger limma • 1.2k views

ADD COMMENT • link updated 5.4 years ago by James W. MacDonald 67k • written 5.4 years ago by D ▴ 10

0

Entering edit mode

I’ve changed the tags. Please don’t add additional packages to the question because it emails all the developers when you tag a post. If you have an edgeR question it’s appropriate to tag with “edgeR”.

ADD REPLY • link 5.4 years ago Michael Love 43k

0

Entering edit mode

Hi,

I figured fitting appropriate multifactor GLM was common to all packages, rather than specific to edgeR.

Apologies nonetheless.

ADD REPLY • link 5.4 years ago D ▴ 10

0

Entering edit mode

This is true, but asking for the attention of multiple package developers when you are planning to use edgeR is additional burden on our time. We have limited time for support and maintenance of the packages, I actually do it mostly out of work hours because it’s not supported effort.

ADD REPLY • link 5.4 years ago Michael Love 43k

score 3 · Accepted Answer · 2019-07-12

3

Entering edit mode

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 1 hour ago

United States

The only reason the matrix isn't full rank is because you have coefficients that don't estimate anything.


> phenomat <- data.frame(disease = factor(rep(c("healthy","disease"), c(6,9))), patient = factor(rep(c(1:2,1:3), each = 3)), treatment = factor(rep(c("Drug_A","Drug_B","None"), 5)))
> design <- model.matrix(~disease + disease:patient + disease:treatment, phenomat)
> which(colSums(design) == 0L)
diseasehealthy:patient3 
                      6 

## Note that there isn't a healthy patient 3!

> design <- design[,-6]
> library(limma)
> is.fullrank(design)
[1] TRUE

ADD COMMENT • link 5.4 years ago James W. MacDonald 67k

2

Entering edit mode

To follow up a bit, I would favor the much simpler formulation:

# Note the different 'patient' compared to James' post.
phenomat2 <- data.frame(
    disease = factor(rep(c("healthy","disease"), c(6,9))), 
    patient = factor(rep(c(1:5), each = 3)), 
    treatment = factor(rep(c("Drug_A","Drug_B","None"), 5)))

design2 <- model.matrix(~0 + patient + disease:treatment, phenomat2)
design2 <- design2[,!grepl("None", colnames(design2))]

The first 5 coefficients represent the log-expression of each untreated patient. The last 4 coefficients represent the log-fold change upon treatment with each drug in each disease condition. You can then compare them to zero to test the actual effect of the drug, or compare them to each other to test for differences in effects between disease conditions or between drugs.

In terms of the actual linear model, this is equivalent to James' approach; it's just that the columns have been shuffled around and added to each other to go from design to design2, such that the interpretation of the coefficients changes. I find the coefficients in design2 to be easier to interpret, though design is probably more of the classical way of doing things.

ADD REPLY • link 5.4 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thanks both, very helpful!

Kind regards,

Dean

ADD REPLY • link 5.4 years ago D ▴ 10