Question

DESeq2 Design Matrix

jmannhei ▴ 20

@d7247bd3

Last seen 2.5 years ago

United States

Hi all,

I have a somewhat complex experimental design for a DEG analysis and want to make sure I am setting it up correctly, because I have gotten some strange results.

Description of the Data

I have three different treatments: A, B, and C.
It is a partially paired situation, meaning that some of the samples come from the same patient, but not all. For example, Patient X might have two samples that differ by treatment. However, only a fraction of the samples are paired.
The samples have additional batch effects because they were processed in different batches.
The samples come from different tissues, e.g. lung, breast, etc.

My objective is to get differentially expressed genes between treatments A and C, while controlling for Patient ID, Batch, and tissue.

Based on my understanding of linear models, I think the design should look like ~ PatientID + Batch + Tissue + Treatment, as follows:

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~ PatientID + Batch + Tissue + Treatment)

where coldata is the indicator matrix. I get some weird results, in the sense that the genes are heavily biased one way: the numbers of upregulated and downregulated genes are not remotely close to even. Additionally, some of the genes that pop out are heavily biased towards certain tissues.

I figure this could just be a result of the fact that the data are not spread evenly across treatments, tissues, and patients, and that perhaps this is the best I can do. However, I also wanted to make sure my approach to setting up the experiment was correct, or find out whether there might be a better way to do things.

Additionally, I keep the samples from treatment B even though I am not looking for any DEGs in B. My thought is that since these correspond to different control conditions, it's best to leave them in to better estimate the effects of each control on each gene. Thanks
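One check that can be run before fitting is whether the design matrix is full rank, since in a partially paired layout a patient term can end up collinear with tissue. Below is a minimal sketch in base R. It assumes a hypothetical sample table: the column names, the `is_full_rank` helper, and every value are invented for illustration and are not the data described above.

```r
# Hypothetical sample table: 4 paired patients (P1-P4) and 4 unpaired (P5-P8).
# All values are made up for illustration only.
coldata <- data.frame(
  PatientID = factor(c("P1", "P1", "P2", "P2", "P3", "P3", "P4", "P4",
                       "P5", "P6", "P7", "P8")),
  Treatment = factor(c("A", "C", "A", "C", "A", "B", "B", "C",
                       "A", "C", "B", "A")),
  Batch     = factor(c("b1", "b2", "b1", "b1", "b2", "b2", "b1", "b2",
                       "b1", "b2", "b1", "b2")),
  Tissue    = factor(c("lung", "lung", "breast", "breast", "lung", "lung",
                       "breast", "breast", "lung", "breast", "lung", "breast"))
)

# Hypothetical helper: TRUE when no column is a linear combination of others.
is_full_rank <- function(mm) qr(mm)$rank == ncol(mm)

# Design with patient, batch, tissue, and treatment terms.
full_design <- model.matrix(~ PatientID + Batch + Tissue + Treatment,
                            data = coldata)

# Same design without the patient term.
reduced_design <- model.matrix(~ Batch + Tissue + Treatment, data = coldata)

# Same design without the tissue term (in this toy table each patient
# contributes a single tissue, so tissue is nested within patient).
no_tissue_design <- model.matrix(~ PatientID + Batch + Treatment,
                                 data = coldata)

cat("columns in full design:", ncol(full_design), "\n")
cat("full design is full rank:", is_full_rank(full_design), "\n")
cat("design without PatientID is full rank:", is_full_rank(reduced_design), "\n")
cat("design without Tissue is full rank:", is_full_rank(no_tissue_design), "\n")
```

In this toy table the tissue column can be written as a combination of the patient columns, so the design containing both is not full rank, while either alternative is. Whether the same holds for real data depends on how patients, tissues, and batches overlap. Once a model has been fitted, the A versus C comparison would typically be extracted with `results(dds, contrast = c("Treatment", "C", "A"))`, which reports log2 fold changes of C relative to A.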

DESeq2 • 808 views

ADD COMMENT • link updated 2.7 years ago by Michael Love 43k • written 2.7 years ago by jmannhei ▴ 20

score 0 · Answer 1 · 2022-08-25

Michael Love 43k

@mikelove

Last seen 9 minutes ago

United States

For questions about how to design the statistical analysis and interpret the results, I recommend collaborating or consulting with a statistician or someone familiar with linear models in R. I have to reserve my time on the support site for software related questions.