Paired RNA-seq experiment with 3 factors using duplicateCorrelation
1
0
Entering edit mode
aec ▴ 90
@aec-9409
Last seen 4.6 years ago

Dear all,

I have a complex design for limma-voom with paired samples and 3 factors:

ID
INDIVIDUAL
TREATMENT
GROUP
RESPONSE
1
1
Pre
A
Good
2
1
Post
A
Good
3
2
Pre
A
Good
4
2
Post
A
Good
5
3
Pre
A
Good
6
3
Post
A
Good
7
4
Pre
B
Good
8
4
Post
B
Good
9
5
Pre
B
Bad
10
5
Post
B
Bad
11
6
Pre
A
Bad
12
6
Post
A
Bad
13
7
Pre
A
Good
14
7
Post
A
Good
15
8
Pre
A
Bad
16
8
Post
A
Bad
17
9
Pre
B
Good
18
9
Post
B
Good
19
10
Pre
A
Good
20
10
Post
A
Good

I need both within and between patients comparisons so I am using the limma 'duplicateCorrelation' function.

The problem is that I am a bit confused to extract coefficients for these multiple comparisons:

1.a) Pre : Group A vs Group B

1.b) Post:  Group A vs Group B

2.a) Group A : Pre vs Post

2.b) Group B: Pre vs Post

3.a) Pre, Group A : Good vs Bad

3.b) Pre, Group B : Good vs Bad

4.a) Post, Group A: Good vs Bad

4.b) Post, Group B: Good vs Bad

5.a) Good, Group A: Pre vs Post

5.b) Good, Group B: Pre vs Post

6.a) Bad, Group A: Pre vs Post

6.b) Bad, Group B: Pre vs Post

Here the code:

TREATMENT=as.factor(info$TREATMENT)

GROUP=as.factor(info$GROUP)

RESPONSE=as.factor(info$RESPONSE)

INDIVIDUAL=as.factor(info$INDIVIDUAL)

design=model.matrix(~0+TREATMENT+GROUP+RESPONSE)

v=voom(y,design)

cor=duplicateCorrelation(v,design,block=INDIVIDUAL)

fit=lmFit(v,design,block=INDIVIDUAL, correlation=cor$consensus)

  How can I extract the desired coefficients with this model ?

paired samples multiple factor design limma voom • 1.3k views
ADD COMMENT
0
Entering edit mode

Split the tags for limma and voom, otherwise people won't get notified.

ADD REPLY
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 16 hours ago
The city by the bay

To do all of the comparisons you've described, there's one simple solution:

EVERYTHING <- factor(paste0(TREATMENT, ".", GROUP, ".", RESPONSE))
design <- model.matrix(~0 + EVERYTHING)
colnames(design) <- levels(EVERYTHING)

... and then run the same limma commands as you've described. Each coefficient in this design represents the average expression of a particular combination. Then you just compare between specific combinations using makeContrasts, which covers comparisons 3a to 6b. For 1a, you can compare the average expression across all pre-groupA combinations to that across all pre-group B combinations - see the last contrast in Section 3.2.3 of the edgeR user's guide for an example. The same reasoning applies for the other comparisons up to 2b.

ADD COMMENT
0
Entering edit mode

As always, thanks for your advices Aaron,

Following your answer, here are my contrasts:

design=model.matrix(~0+condition)  ###condition is the combined TREATMENT.GROUP.RESPONSE factors

colnames(design)
​
#[1] "Post.GroupA.Bad" "Post.GroupA.Good"
#[3] "Post.GroupB.Bad"   "Post.GroupB.Good"
#[5] "Pre.GroupA.Bad"  "Pre.GroupA.Good"
#[7] "Pre.GroupB.Bad"    "Pre.GroupB.Good"

v=voom(y,design)

cor=duplicateCorrelation(v,design,block=info$INDIVIDUAL)

fit=lmFit(v,design,block=info$INDIVIDUAL, correlation=cor$consensus)

cm=makeContrasts(
GroupBvsGroupAforPRE=((Pre.GroupB.Bad+Pre.GroupB.Good)/2)-((Pre.GroupA.Bad+Pre.GroupA.Good)/2) ,
GroupBvsGroupAforPOST=((Post.GroupB.Bad+Post.GroupB.Good)/2)-((Post.GroupA.Bad+Post.GroupA.Good)/2),
PostvsPreforGroupA=((Post.GroupA.Bad+Post.GroupA.Good)/2)-((Pre.GroupA.Bad+Pre.GroupA.Good)/2),
PostvsPreforGroupBs=((Post.GroupB.Bad+Post.GroupB.Good)/2)-((Pre.GroupB.Bad+Pre.GroupB.Good)/2),
BadvsGoodforPreGroupA=Pre.GroupA.Bad-Pre.GroupA.Good,
BadvsGoodforPreGroupB=Pre.GroupB.Bad-Pre.GroupB.Good,
BadvsGoodforPostGroupA=Post.GroupA.Bad-Post.GroupA.Good,
BadvsGoodforPostGroupB=Post.GroupB.Bad-Post.GroupB.Good,
PostvsPreforGoodGroupA=Post.GroupA.Good-Pre.GroupA.Good,
PostvsPreforGoodGroupBs=Post.GroupB.Good-Pre.GroupB.Good,
PostvsPreforBadGroupA=Post.GroupA.Bad-Pre.GroupA.Bad,
PostvsPreforBadGroupBs=Post.GroupB.Bad-Pre.GroupB.Bad,
levels=design)

Are they correct?

ADD REPLY
0
Entering edit mode

Looks fine to me.

ADD REPLY
0
Entering edit mode

even though, it is hard to interpret the results:

 GroupBvsGroupAforPRE GroupBvsGroupAforPOST PostvsPreforGroupA
-1                    0                     0                   0
0                 25911                 25909               25911
1                     0                     2                   0
   PostvsPreforGroupB PostvsPre PostvsPreforGood PostvsPreforBad
-1                  0         0              325               0
0               25909     25910            25463           25911
1                   2         1              123               0
   BadvsGoodforPre BadvsGoodforPost BadvsGoodforPreGroupA BadvsGoodforPreGroupB
-1              65                0                      1                    2
0            25828            25911                  25910                25905
1               18                0                      0                    4
   BadvsGoodforPostGroupA BadvsGoodforPostGroupB PostvsPreforGoodGroupA
-1                       0                     0                       0
0                    25911                 25911                   25911
1                        0                     0                       0
   PostvsPreforGoodGroupB PostvsPreforBadGroupA PostvsPreforBadGroupB
-1                    952                      0                     0
0                   24266                  25911                 25911
1                     693                      0                     0

DE genes found in PostvsPreforGood overlap with the ones of PostvsPreforGoodGroupB, so all the genes found in PostvsPreforGood are due to GroupB ( for PostvsPreforGoodGroupA we have 0 DE genes).

Moreover, how can it be we have more than 1645 DE genes for PostvsPreforGoodGroupB but for the rest of the comparisons we have almost nothing. I would expect to find something for example for BadvsGoodforPostGroupB.

 

ADD REPLY
1
Entering edit mode

All "post versus pre" comparisons are more powerful than the "bad vs good" comparisons, because the former are performed within each patient. While duplicateCorrelation allows limma to perform cross-patient comparisons, it needs to account for the patient-to-patient variability, which results in reduced power. So there may well be a lot of (true) DE genes in your BadvsGoodforPostGroupB comparison; however, your experimental set-up doesn't have enough power to detect them. Similarly, it's hard to compare the number of DE genes from the comparisons between averages with that from the comparisons between groups, because there's a different number of libraries involved - this will inevitably affect the power of each comparison.

P.S. If you're interested in identifying genes where the treatment response differs between good/bad patients, you could do something like this:

con <- makeContrasts((Post.GroupB.Good - Pre.GroupB.Good) -
    (Post.GroupB.Bad - Pre.GroupB.Bad), levels=design)

This computes the treatment log-fold change within patients, and then compares the log-fold changes between good/bad patients. In this manner, we avoid making direct comparisons between patients, which results in the power issue that I mentioned above.

ADD REPLY

Login before adding your answer.

Traffic: 565 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6