Normalizing batch effects using technical replicates for DGE analysis between two conditions from 2 different runs
1
0
Entering edit mode
@annacotannacot-20795
Last seen 28 days ago
United States

I have two RNA-seq experiments performed in different batches several months apart. Both experiments include technical replicates of the same 10 samples from condition A (with identical sample names: A1, A2, ..., A10). Exactly same control samples were used.

Experiment 1: 10 samples from condition A and 10 samples from condition B.

Experiment 2: 10 samples from condition A and 10 samples from condition C.

How can I use the technical replicates (condition A) to normalize for batch effects and perform differential gene expression analysis between conditions B and C?

I have processed my reads with kallisto so far and tximport.

I appreciate all your help!

limma DESeq2 edgeR • 779 views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 1 day ago
WEHI, Melbourne, Australia

Just fit a model ~condition + experiment where condition has three levels A, B and C. Ideally, use limma and include duplicate correlation between the repeat sample pairs (A1, A2, ... A10).

ADD COMMENT
0
Entering edit mode

Thank you Gordon! Apologies if this is a too basic question-I'm a biologist and want to ensure I understand correctly. Here is my metadata and a script- is it what you suggested?

Here is my metadata:

   sample  condition experiment mouse
1    A1_1 Knockout_1          1     1
2    A2_1 Knockout_1          1     2
3    A3_1 Knockout_1          1     3
4    A4_1 Knockout_1          1     4
5    A5_1 Knockout_1          1     5
6    A6_1 Knockout_1          1     6
7    A7_1 Knockout_1          1     7
8    A8_1 Knockout_1          1     8
9    A9_1 Knockout_1          1     9
10  A10_1 Knockout_1          1    10
11     B1       Ctrl          1    11
12     B2       Ctrl          1    12
13     B3       Ctrl          1    13
14     B4       Ctrl          1    14
15     B5       Ctrl          1    15
16     B6       Ctrl          1    16
17     B7       Ctrl          1    17
18     B8       Ctrl          1    18
19     B9       Ctrl          1    19
20    B10       Ctrl          1    20
21   A1_2 Knockout_1          2     1
22   A2_2 Knockout_1          2     2
23   A3_2 Knockout_1          2     3
24   A4_2 Knockout_1          2     4
25   A5_2 Knockout_1          2     5
26   A6_2 Knockout_1          2     6
27   A7_2 Knockout_1          2     7
28   A8_2 Knockout_1          2     8
29   A9_2 Knockout_1          2     9
30  A10_2 Knockout_1          2    10
31     C1 Knockout_2          2    21
32     C2 Knockout_2          2    22
33     C3 Knockout_2          2    23
34     C4 Knockout_2          2    24
35     C5 Knockout_2          2    25
36     C6 Knockout_2          2    26
37     C7 Knockout_2          2    27
38     C8 Knockout_2          2    28
39     C9 Knockout_2          2    29
40    C10 Knockout_2          2    30

Here is the script I am using for analysis:

dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)
design <- model.matrix(~condition + experiment, data = metadata)
voom_data <- voom(dge, design)
dupcor <- duplicateCorrelation(voom_data, design, block = metadata$mouse)
fit <- lmFit(voom_data, design, block = metadata$mouse, correlation = dupcor$consensus.correlation)
fit <- eBayes(fit)
contrast.matrix <- makeContrasts(
  A_vs_B = conditionA,
  C_vs_B = conditionC,
  C_vs_A = conditionC - conditionA,
  levels = design
)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
results_A_vs_B <- topTable(fit2, coef = "A_vs_B", adjust.method = "fdr", number = Inf)
results_C_vs_B <- topTable(fit2, coef = "C_vs_B", adjust.method = "fdr", number = Inf)
results_C_vs_A <- topTable(fit2, coef = "C_vs_A", adjust.method = "fdr", number = Inf)
ADD REPLY
0
Entering edit mode

Please come back to the support site and view your post - I've fixed it for you. If you want to show people your targets file, just paste that in rather than the code someone could use to generate it. Also, when you post the targets file or any code, please either put a triple backtick (the upper left key on a QWERTY keyboard) before and after the code, or highlight the block of code and click the CODE button that is immediately above the dialog box you type in.

While you are composing your post you can always see what it will look like by scrolling down and looking at the presentation box right below the dialog box. If it looks like a bunch of gibberish, that's what your post will look like, so it gives you a chance to edit for clarity.

ADD REPLY
0
Entering edit mode

Thank you James, that really helped! I tried CODE enter code here but I must have dome it incorrectly, as it showed all in red.

ADD REPLY
0
Entering edit mode

Oh, right. There's two ways to enter code. If you are in the middle of a sentence and you want to include a function name, you wrap the name in backticks. And if you are typing a sentence and click CODE, it just types 'enter code here' with backticks. That's why it was red, because code in a sentence should be red like that.

But if you want a block of code, you can add a line of three backticks, then put in the code, and then another three backticks. An alternative (which is what the CODE button does if you are on a new line) is to indent by four spaces. Everything that is indented will have the code typeface.

ADD REPLY
0
Entering edit mode

That looks basically correct, although the terms that you are inputing to makeContrasts do not match the condition names in the targets file, so I assume the targets info you have shown here is not your real data.

You would find it easier to use voomLmFit, which automates the process of using voom with duplicateCorrelation, and has some additional advantages.

ADD REPLY
0
Entering edit mode

Thank you! This is actually quite close to my real dataset. I simplified the names to make it easier as an example, but I have now corrected it so that it is more useful and clearer for others to follow.

fit <- voomLmFit(dge, design, block = metadata$mouse)
>First intra-block correlation  -0.06414855
>Final intra-block correlation  -0.06414003

I am not sure how to interpret this, as I expected to see a rather strong positive correlation. I have double-checked the metadata, and everything seems correct. However, the data were generated several months apart, potentially on different machines. Any comments or suggestions would be greatly appreciated!

ADD REPLY
0
Entering edit mode

Since the intra-block correlation is negative, it would be best to remove block from the model.

My experience with genetically identical lab mice at the WEHI is that repeat observations on the same mouse are virtually the same as observations from different mice, so don't lead to strong intra-mouse correlations.

ADD REPLY

Login before adding your answer.

Traffic: 565 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6