Splitting a dataset and control sample batch
mat.lesche ▴ 110
@matlesche-6835
Last seen 6 months ago
Germany

The experimental design is as follows. There are two groups (NR and RE), each in triplicate. The triplicates were grown on a dish. At timepoint 0 (T0), samples were collected. Afterwards, they were either treated (RAP) or left untreated (Co), and at timepoint 6 (T6) samples were collected again. Here are the condition table and PCA plots (with and without sample correction) for an overview.

Donor    Cond    Time  C
17-0117  NR_Co   T0    T0_NR_Co
17-0077  NR_Co   T0    T0_NR_Co
15-0019  NR_Co   T0    T0_NR_Co
14-0162  RE_Co   T0    T0_RE_Co
16-0384  RE_Co   T0    T0_RE_Co
16-0343  RE_Co   T0    T0_RE_Co
17-0117  NR_Co   T6    T6_NR_Co
17-0077  NR_Co   T6    T6_NR_Co
15-0019  NR_Co   T6    T6_NR_Co
14-0162  RE_Co   T6    T6_RE_Co
16-0384  RE_Co   T6    T6_RE_Co
16-0343  RE_Co   T6    T6_RE_Co
17-0117  NR_RAP  T6    T6_NR_RAP
17-0077  NR_RAP  T6    T6_NR_RAP
15-0019  NR_RAP  T6    T6_NR_RAP
14-0162  RE_RAP  T6    T6_RE_RAP
16-0384  RE_RAP  T6    T6_RE_RAP
16-0343  RE_RAP  T6    T6_RE_RAP

https://ibb.co/eeqc2U

https://ibb.co/d0xDbp

https://ibb.co/bRCDbp

For the questions below I combined Cond and Time into a single grouping factor, because that makes the comparisons easiest to express:

dds$C <- factor(paste(dds$Time, dds$Cond, sep = "_"))

a) Are there no differences between T6_NR_Co and T6_NR_RAP?

b) Are there no differences between T6_RE_Co and T6_RE_RAP?

c) Are there no differences between T6_NR_Co and T6_RE_Co?

d) Are there no differences between T6_NR_RAP and T6_RE_RAP?

The problem is that I can’t control for the samples themselves, because the design ~Id+C causes the error “Error in checkFullRank(modelMatrix) :”.

I can only use ~ C, which I think is not appropriate.

Therefore I was wondering whether it would be best to split the dataset into RE samples and NR samples. This would make it possible to answer a) and b), and I would use the whole dataset for c) and d).

Just as a confirmation: if I want to look at the effect of the treatment on the two groups, I would need to use the interaction design Type + Treatment + Type:Treatment.
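To make the interaction design concrete, here is a minimal base R sketch of the model matrix it produces for the T6 samples. The factor names `type` and `treatment` are hypothetical recodings of the table above (they are not columns in the original data), and this sketch ignores Donor.

```r
# T6 samples only, recoded into two factors (hypothetical names).
type      <- factor(rep(c("NR", "RE"), each = 3, times = 2), levels = c("NR", "RE"))
treatment <- factor(rep(c("Co", "RAP"), each = 6), levels = c("Co", "RAP"))

# Main effects plus interaction, as in ~ Type + Treatment + Type:Treatment.
mm <- model.matrix(~ type + treatment + type:treatment)
colnames(mm)
# "(Intercept)" "typeRE" "treatmentRAP" "typeRE:treatmentRAP"
```

The last coefficient is the interaction term: it measures how much the RAP vs Co effect in RE differs from the RAP vs Co effect in NR.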

And my last question is for the following

If I do a comparison of

e) T6_NR_Co vs T0_NR_Co

f) T6_NR_RAP vs T0_NR_Co

I get about 1,500 DEGs for e) and 2,000 DEGs for f). An overlap shows that 50% are DE in both e) and f), 15% only in e), and the rest only in f). But a comparison of T6_NR_RAP vs T6_NR_Co gives me 0 DEGs which means there is no difference between RAP and Co for T6, even though e) and f) show DEGs. I should also mention that for T6_NR_RAP vs T6_NR_Co the p-value histogram rises towards 1 and the padj values are all identical and close to 1.

What would be the best design and contrast to ask for differences that come only from RAP over time? Could I use only the 35% that are DE only in f) according to the overlap of e) and f), even though these are not DEGs for T6_NR_RAP vs T6_NR_Co?

Thanks

Mathias

deseq2 interactions batch effect correction grouping variable • 1.5k views

Can you explain what you mean by "control for the samples itself"? You mean controlling for donor as listed above?


Yes. Sorry, I meant Donor and not samples. The comparison T6_RE_Co vs T6_NR_RAP should need the design Donor + C because the same Donors are in both Treatments.

@mikelove
Last seen 12 hours ago
United States

hi Mathias,

To help see the structure, I recoded the levels of these factors:

   Donor Group   Cond Time
1      1     1  NR_Co   T0
2      2     1  NR_Co   T0
3      3     1  NR_Co   T0
4      4     2  RE_Co   T0
5      5     2  RE_Co   T0
6      6     2  RE_Co   T0
7      1     3  NR_Co   T6
8      2     3  NR_Co   T6
9      3     3  NR_Co   T6
10     4     4  RE_Co   T6
11     5     4  RE_Co   T6
12     6     4  RE_Co   T6
13     1     5 NR_RAP   T6
14     2     5 NR_RAP   T6
15     3     5 NR_RAP   T6
16     4     6 RE_RAP   T6
17     5     6 RE_RAP   T6
18     6     6 RE_RAP   T6

Now you can more easily see why you can't have donor and group in the design together: they are linearly dependent. For example, Group 2 + 4 + 6 = Donor 4 + 5 + 6 (both sums mark exactly the RE samples).
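The dependency can be checked directly in base R. This is a minimal sketch using the recoded table above; the variable names `donor` and `group` are just illustrative.

```r
# Recoded sample table from above: 6 donors, 6 groups, 18 samples.
donor <- factor(rep(1:6, times = 3))
group <- factor(rep(1:6, each = 3))

# Model matrix for ~ donor + group: intercept + 5 donor + 5 group columns.
mm <- model.matrix(~ donor + group)

ncol(mm)      # coefficients requested: 11
qr(mm)$rank   # coefficients that can be estimated: 10

# The dependency itself: the RE donors (4, 5, 6) cover exactly groups 2, 4 and 6.
donor_456 <- as.integer(donor %in% 4:6)
group_246 <- as.integer(group %in% c(2, 4, 6))
all(donor_456 == group_246)   # TRUE
```

Because the rank is lower than the number of columns, DESeq2's full-rank check rejects the design.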

There are some comparisons you can make, with fixed effects, for example comparing group 3 to group 1 while controlling for Donor, or additionally, comparing the group 3 vs 1 effect and the group 4 vs 2 effect.

However, some of your desired comparisons are not possible with fixed effects, while controlling for donor, e.g. comparing group 2 to group 1. 

I'd recommend you use limma-voom with duplicateCorrelation, which allows you to mark which samples belong to which donor and to analyze the entire dataset for all your desired contrasts.
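A rough sketch of what that workflow could look like, under several assumptions: the counts are simulated stand-ins for the real data, the donor labels D1..D6 are placeholders, and the limma, edgeR and statmod packages are installed. It uses a single voom/duplicateCorrelation pass; a second pass that feeds the estimated correlation back into voom is also commonly used.

```r
library(limma)
library(edgeR)

set.seed(1)
# Recoded design from the table above; donor labels are placeholders.
donor <- factor(paste0("D", rep(1:6, times = 3)))
C <- factor(rep(c("T0_NR_Co", "T0_RE_Co", "T6_NR_Co",
                  "T6_RE_Co", "T6_NR_RAP", "T6_RE_RAP"), each = 3))

# Simulated counts (500 genes x 18 samples) stand in for the real data.
counts <- matrix(rnbinom(500 * 18, mu = 100, size = 10), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("s", 1:18)))

# One coefficient per group, so every contrast is a simple difference.
design <- model.matrix(~ 0 + C)
colnames(design) <- levels(C)

dge <- calcNormFactors(DGEList(counts))
v <- voom(dge, design)

# Estimate the within-donor correlation and pass it to the linear model.
corfit <- duplicateCorrelation(v, design, block = donor)
fit <- lmFit(v, design, block = donor,
             correlation = corfit$consensus.correlation)

# Questions a) to d) from the original post.
cm <- makeContrasts(
  a = T6_NR_RAP - T6_NR_Co,
  b = T6_RE_RAP - T6_RE_Co,
  c = T6_RE_Co  - T6_NR_Co,
  d = T6_RE_RAP - T6_NR_RAP,
  levels = design
)
fit2 <- eBayes(contrasts.fit(fit, cm))
res_a <- topTable(fit2, coef = "a", number = Inf)
```

Here donor enters as a blocking variable with a shared correlation rather than as fixed-effect columns, so the rank problem of ~ Donor + C does not arise.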


Hi Michael,

Thanks for the answer. I have looked at limma-voom, which looks very interesting. In the meantime, would the following be a valid approach: I answer the questions that work with fixed effects using the whole dataset, and questions like group 2 vs group 1 are answered with a reduced dataset that contains only the necessary samples?

And do you have a comment on the question below the comparisons e) and f) in the first post?

Thanks

Mathias


Yes, you can split the dataset to answer the questions where controlling for individual is not possible with fixed effects.
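As an illustration of the split approach, the sketch below restricts a DESeq2 object to the NR samples, where every donor appears in every group, so ~ Donor + C is full rank. The counts come from `makeExampleDESeqDataSet` and are purely simulated; the donor labels D1..D6 and the object names are placeholders, not the original data.

```r
library(DESeq2)

set.seed(1)
# Simulated counts for 18 samples stand in for the real data.
dds <- makeExampleDESeqDataSet(n = 500, m = 18)
dds$Donor <- factor(paste0("D", rep(1:6, times = 3)))
dds$C <- factor(rep(c("T0_NR_Co", "T0_RE_Co", "T6_NR_Co",
                      "T6_RE_Co", "T6_NR_RAP", "T6_RE_RAP"), each = 3))

# Keep only the NR samples and drop the factor levels that are now empty.
dds_nr <- dds[, grepl("_NR_", dds$C)]
dds_nr$Donor <- droplevels(dds_nr$Donor)
dds_nr$C <- droplevels(dds_nr$C)

# Within NR, donors are crossed with groups, so this design is estimable.
design(dds_nr) <- ~ Donor + C
dds_nr <- DESeq(dds_nr, quiet = TRUE)

# Question a): T6_NR_RAP vs T6_NR_Co, controlling for donor.
res_a <- results(dds_nr, contrast = c("C", "T6_NR_RAP", "T6_NR_Co"))
summary(res_a)
```

The same pattern with `"_RE_"` would cover question b).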

For your last question, I think the important thing is the following: failure to reject the null hypothesis cannot be interpreted as anything like accepting the null. In other words, this is not the case: "comparison of T6_NR_RAP vs T6_NR_Co gives me 0 DEGs which means there is no difference between RAP and Co for T6"
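A small base R sketch may help with the overlap idea. Written as contrasts on the group means, the difference between f) and e) is algebraically the direct T6_NR_RAP vs T6_NR_Co comparison, so intersecting gene lists from e) and f) is not a substitute for testing that contrast. The vector names `e` and `f` are only illustrative.

```r
# Contrast weights on the three NR group means.
groups <- c("T0_NR_Co", "T6_NR_Co", "T6_NR_RAP")
e <- setNames(c(-1, 1, 0), groups)  # e) T6_NR_Co  vs T0_NR_Co
f <- setNames(c(-1, 0, 1), groups)  # f) T6_NR_RAP vs T0_NR_Co

# Change over time under RAP minus change over time under control.
f - e
# T0_NR_Co cancels, leaving T6_NR_RAP - T6_NR_Co (weights 0, -1, 1).
```

A gene that passes the threshold in f) but not in e) may simply sit just below the cutoff in e), so "DE only in f)" does not by itself show a RAP-specific effect.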
