Question

Analyzing an RNA-Seq dataset with paired samples and 2 batches

ege.dedeoglu • 0

@egededeoglu-20387

Last seen 5.7 years ago

Hello everyone,

I have been tackling this issue for some time now. I have a dataset with 20 patients and 40 samples (the samples are paired). I am testing for differential expression between two timepoints (e.g. TP1 and TP2). However, 11 of these patients are in one sequencing batch (B1) and the remaining 9 are in another batch (B2). In the DESeq2 vignette, in the section "Model matrix not full rank", there is a part titled "Group-specific condition effects, individuals nested within groups" which explains how to control for this type of situation. I tried the design (batch + batch:ba.pi + batch:condition), where ba.pi distinguishes the individuals nested within each batch, but I still got the same "not full rank" error. When I discarded 2 random samples from B1 and tried again, it worked.

I also considered applying ComBat to the normalized, log2-transformed DESeq2 counts to batch-correct the two conditions separately, and then running a t-test between the two groups, but I am not sure whether this is appropriate. The authors normalized the batches separately and worked with the DEGs common to both. I would appreciate any input regarding this matter.

Regards,

Ege
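A possible reading of the error above, sketched in plain Python rather than in DESeq2 itself: with 11 patients in B1 and 9 in B2, a nested design of the form batch + batch:ba.pi + batch:condition produces interaction columns for individuals 10 and 11 in B2 that contain only zeros, which is enough to make the matrix rank deficient. The helper names `rank` and `nested_design`, the column labels, and the numbering of individuals 1..n within each batch are assumptions made for illustration; R's `model.matrix` orders its columns differently.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column, e.g. an all-zero column
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                f = m[i][c] / m[rk][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def nested_design(n_per_batch):
    """Indicator columns mimicking batch + batch:ind.n + batch:condition.

    Assumption: individuals are renumbered 1..n inside each batch, and the
    first batch, individual 1 and TP1 act as reference levels.
    """
    batches = list(n_per_batch)
    max_n = max(n_per_batch.values())
    names = ["Intercept"] + [f"batch{b}" for b in batches[1:]]
    names += [f"batch{b}:ind.n{k}" for b in batches for k in range(2, max_n + 1)]
    names += [f"batch{b}:conditionTP2" for b in batches]
    rows = []
    for b in batches:
        for k in range(1, n_per_batch[b] + 1):
            for cond in ("TP1", "TP2"):
                active = {"Intercept", f"batch{b}", f"batch{b}:ind.n{k}"}
                if cond == "TP2":
                    active.add(f"batch{b}:conditionTP2")
                rows.append([1 if nm in active else 0 for nm in names])
    return names, rows

names, X = nested_design({"B1": 11, "B2": 9})

# Columns for individuals that do not exist in the smaller batch are all zero.
all_zero = [nm for j, nm in enumerate(names) if all(r[j] == 0 for r in X)]
keep = [j for j, nm in enumerate(names) if nm not in all_zero]
X_reduced = [[r[j] for j in keep] for r in X]

print(len(names), rank(X))          # 24 columns, rank 22: not full rank
print(all_zero)                     # ['batchB2:ind.n10', 'batchB2:ind.n11']
print(len(keep), rank(X_reduced))   # 22 columns, rank 22: full rank
```

In this sketch, dropping the two all-zero columns is enough to restore full rank without discarding any samples. The same vignette section appears to describe this step for unbalanced groups (removing the all-zero columns from the model matrix before supplying it to DESeq2), so it may be worth checking before removing data.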

deseq2 sva combat • 1.5k views

ADD COMMENT • link updated 5.7 years ago by Michael Love 43k • written 5.7 years ago by ege.dedeoglu • 0

score 0 · Answer 1 · 2019-08-26

Michael Love 43k

@mikelove

Last seen 7 days ago

United States

If you have a design that controls for the patient baseline, and both paired samples from each patient fall within the same batch, you don't need to control for batch effects with a separate term in the design. The patient baseline takes care of it.
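To illustrate why the patient baseline can absorb the batch effect, here is a minimal sketch in plain Python (not DESeq2 itself). It assumes hypothetical patient IDs 1 to 11 in B1 and 12 to 20 in B2, and builds the indicator columns that a design like ~ patient + condition would contain; the helper names `rank` and `design` are made up for this example.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                f = m[i][c] / m[rk][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

# Hypothetical layout: patients 1-11 sequenced in B1, patients 12-20 in B2.
patients = [(p, "B1" if p <= 11 else "B2") for p in range(1, 21)]

def design(with_batch):
    """Rows of an indicator matrix for patient + condition (+ batch)."""
    rows = []
    for p, batch in patients:
        for cond in ("TP1", "TP2"):
            row = [1]                                          # intercept
            row += [1 if p == q else 0 for q in range(2, 21)]  # patient2..patient20
            row.append(1 if cond == "TP2" else 0)              # conditionTP2
            if with_batch:
                row.append(1 if batch == "B2" else 0)          # batchB2
            rows.append(row)
    return rows

paired = design(with_batch=False)
with_batch = design(with_batch=True)

print(len(paired[0]), rank(paired))          # 21 columns, rank 21
print(len(with_batch[0]), rank(with_batch))  # 22 columns, rank still 21

# The batch column is exactly the sum of the patient columns of the B2 patients.
redundant = all(sum(r[q - 1] for q in range(12, 21)) == r[21] for r in with_batch)
print(redundant)                             # True
```

Because every patient sits entirely inside one batch, the batch indicator is a sum of patient indicators, so adding it contributes no new information and only makes the matrix rank deficient. Under these assumptions the condition effect is estimated from within-patient differences, which are not affected by a per-batch shift.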

ADD COMMENT • link 5.7 years ago Michael Love 43k

I could not really understand what you are saying, as English is not my first language. The batches contain 22 and 18 samples respectively. Both samples from each patient (TP1 & TP2) are in the same batch. My first design was (~patient + condition), but it gave me a distribution graph in which the samples are separated by batch rather than by timepoint; the timepoints are, however, separated within each batch.

ADD REPLY • link 5.7 years ago ege.dedeoglu • 0