Question

Design matrix for differential expression analysis


riya ▴ 10

@riya-22831

Last seen 5.0 years ago

Hi, I am pretty new to RNA-seq data analysis and I need some advice on formulating the design matrix.

This is my sample sheet. I added a GroupID column that combines all the variables into one label, so that each condition can be distinguished from the others and used in the design:

Sample  Cell  Type       Concentration  Time (hrs)  GroupID
1       AB1   Control    0              5           AB1Control0_5
2       AB1   Control    0              5           AB1Control0_5
3       AB1   Control    0              5           AB1Control0_5
4       AB1   Treatment  5              5           AB1Treatment5_5
5       AB1   Treatment  5              5           AB1Treatment5_5
6       AB1   Treatment  5              5           AB1Treatment5_5
7       ST1   Control    0              5           ST1Control0_5
8       ST1   Control    0              5           ST1Control0_5
9       ST1   Control    0              5           ST1Control0_5
10      ST1   Treatment  5              5           ST1Treatment5_5
11      ST1   Treatment  5              5           ST1Treatment5_5
12      ST1   Treatment  5              5           ST1Treatment5_5
13      AB1   Control    0              8           AB1Control0_8
14      AB1   Control    0              8           AB1Control0_8
15      AB1   Control    0              8           AB1Control0_8
16      AB1   Treatment  8              8           AB1Treatment8_8
17      AB1   Treatment  8              8           AB1Treatment8_8
18      AB1   Treatment  8              8           AB1Treatment8_8
19      ST1   Control    0              8           ST1Control0_8
20      ST1   Control    0              8           ST1Control0_8
21      ST1   Control    0              8           ST1Control0_8
22      ST1   Treatment  8              8           ST1Treatment8_8
23      ST1   Treatment  8              8           ST1Treatment8_8
24      ST1   Treatment  8              8           ST1Treatment8_8
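For readers who want to reproduce the sample sheet, here is a minimal base R sketch that rebuilds it and derives the combined label. The column names `Concentration` and `Time` and the `paste0()` pattern are assumptions chosen to match the GroupID spelling shown in the table; the data frame name `Metadata` is taken from the code later in the post.

```r
# Rebuild the 24-sample sheet from the table above (values copied from the post).
Metadata <- data.frame(
  Sample = 1:24,
  Cell   = rep(rep(c("AB1", "ST1"), each = 6), times = 2),
  Type   = rep(rep(c("Control", "Treatment"), each = 3), times = 4),
  Time   = rep(c(5, 8), each = 12)
)

# In this experiment treated samples have concentration equal to the time
# point (5 or 8) and controls have concentration 0.
Metadata$Concentration <- ifelse(Metadata$Type == "Treatment", Metadata$Time, 0)

# One label per unique condition, e.g. "AB1Control0_5" (assumed naming pattern).
Metadata$GroupID <- factor(
  paste0(Metadata$Cell, Metadata$Type, Metadata$Concentration, "_", Metadata$Time)
)

# Each of the 8 groups should contain 3 replicates.
table(Metadata$GroupID)
```

Checking `table()` on the combined column is a quick way to catch typos in the labels before they reach the design formula.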

Contrasts I need

For my contrasts, I want to compare:

1) AB1.Treatment at concentration 5 and time point 5 vs AB1.Control at concentration 0 and time point 5
2) AB1.Treatment at concentration 8 and time point 8 vs AB1.Control at concentration 0 and time point 8
3) ST1.Treatment at concentration 5 and time point 5 vs ST1.Control at concentration 0 and time point 5
4) ST1.Treatment at concentration 8 and time point 8 vs ST1.Control at concentration 0 and time point 8

design matrix I used:

dds = DESeqDataSetFromMatrix(countData = Countdata, colData = Metadata, design = ~ GroupID)

results(dds, contrast=c("GroupID","AB1Treatment55","AB1Control05"))
results(dds, contrast=c("GroupID","ST1Treatment55","ST1Control05"))
results(dds, contrast=c("GroupID","AB1Treatment88","AB1Control08"))
results(dds, contrast=c("GroupID","ST1Treatment88","ST1Control08"))

When I run resultsNames(dds), I see some comparisons I don't need and not the ones I need. For example, I get:

GroupIDAB1Treatment55vsAB1Control05
GroupIDST1Treatment55vsAB1Control05 (this is not what I want)

but I want GroupIDST1Treatment55vsST1Control05.
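A base R sketch of why this happens (it uses `model.matrix()` and `lm()`, not DESeq2 itself): with a design of `~ GroupID`, each fitted coefficient compares one level against the reference level, which by default is the first factor level. Those coefficients are what `resultsNames()` lists, so ST1 treatment vs ST1 control is not among them. That comparison is still available as a difference of two coefficients, which is what a call of the form `results(dds, contrast = c("GroupID", numerator, denominator))` requests. The response values below are invented purely for illustration.

```r
# Four of the eight groups from the post, three replicates each.
GroupID <- factor(rep(c("AB1Control0_5", "AB1Treatment5_5",
                        "ST1Control0_5", "ST1Treatment5_5"), each = 3))

# With ~ GroupID, every non-intercept column means "this level vs the
# reference level"; the reference is the first level (AB1Control0_5).
design <- model.matrix(~ GroupID)
colnames(design)

# Toy response with known group means 10, 12, 20, 25 (made-up numbers).
group_means <- c(10, 12, 20, 25)
y <- group_means[as.integer(GroupID)] + rep(c(-1, 0, 1), times = 4)

fit <- lm(y ~ GroupID)

# ST1 treatment vs ST1 control is the difference between two
# "vs reference" coefficients: (ST1Treatment - ref) - (ST1Control - ref).
contrast <- c(0, 0, -1, 1)
st1_effect <- sum(contrast * coef(fit))
st1_effect  # 25 - 20 = 5
```

The same logic applies to the other three comparisons in the question: none of them needs to appear in the coefficient list to be testable.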

Also, I do get results from this, but in some contrasts the adjusted p-value is 1 for all genes. So, is my design matrix right? Could somebody help me with this?

Please tell me if I am doing something wrong.

Thanks in advance!

deseq2 • 829 views

updated 5.0 years ago by Michael Love 43k • written 5.0 years ago by riya ▴ 10


Could you please update your post to fix the formatting? Perhaps mark the table you listed as a "code sample" (the 100110 button in the toolbar), or edit the formatting directly using markdown syntax.

Also, you've shown us the design of your experiment, but you didn't show us the results() call you made to extract the statistics you are after. Note that getting adjusted p-values hammered to all 1 (or close to it) isn't all that uncommon, even though it can be surprising.

Plotting a histogram of your nominal p-values is often a good diagnostic tool after you run your analyses.
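A minimal sketch of that diagnostic on simulated p-values, using only base R. Both scenarios are assumptions made up for illustration; with a real DESeq2 results object one would plot its `pvalue` column instead. A flat histogram suggests little or no signal, which is the situation where the adjusted p-values tend to pile up near 1, while a spike near zero suggests that some genes are changing.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Scenario 1 (assumed): no gene is differentially expressed,
# so the nominal p-values are uniform on [0, 1].
p_null <- runif(10000)

# Scenario 2 (assumed): 1000 of 10000 genes carry signal,
# which produces an excess of small p-values.
p_signal <- c(runif(9000), rbeta(1000, 0.5, 20))

# Side-by-side histograms of the nominal (unadjusted) p-values.
op <- par(mfrow = c(1, 2))
hist(p_null,   breaks = 20, main = "No signal",   xlab = "nominal p-value")
hist(p_signal, breaks = 20, main = "Some signal", xlab = "nominal p-value")
par(op)

# Benjamini-Hochberg adjustment, then count genes passing a 10% FDR cutoff.
padj_null   <- p.adjust(p_null,   method = "BH")
padj_signal <- p.adjust(p_signal, method = "BH")
c(null = sum(padj_null < 0.1), signal = sum(padj_signal < 0.1))
```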

5.0 years ago Steve Lianoglou ★ 13k


Thanks for the tip, Steve! I want to know whether adding another column (here GroupID) that combines all the variables is good enough to use in the design. As far as I understand, the whole idea of the design matrix is to distinguish one sample from another based on its conditions. In my case I want to take into account all the conditions that vary in the experiment and compare each treatment against its corresponding control.

5.0 years ago riya ▴ 10


If you want to compare one subset of samples to another subset of samples, giving them unique labels in the GroupID column is usually the best way to do that. If you want to compare all the controls to all the treated samples, and you want the software to model the differences introduced by the two time points in addition to the larger question of differences between treatments, that's when you use a design of ~ treatment + day.
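To make the difference between the two designs concrete, here is a small base R sketch comparing their model matrices. It assumes that `treatment` and `day` in the reply correspond to the `Type` and `Time` columns of the sample sheet; the `interaction()` labels are an illustrative stand-in for the hand-written GroupID values.

```r
# Balanced layout from the post: 2 cell lines x 2 types x 2 time points x 3 replicates.
Metadata <- expand.grid(
  Replicate = 1:3,
  Type = factor(c("Control", "Treatment")),
  Cell = factor(c("AB1", "ST1")),
  Time = factor(c(5, 8))
)

# Additive design: one overall treatment effect, adjusted for time point.
additive <- model.matrix(~ Type + Time, data = Metadata)
colnames(additive)  # "(Intercept)" "TypeTreatment" "Time8"

# Grouped design: one coefficient per unique condition, so any pair of
# groups can be compared directly.
Metadata$GroupID <- interaction(Metadata$Cell, Metadata$Type, Metadata$Time,
                                drop = TRUE)
grouped <- model.matrix(~ GroupID, data = Metadata)
ncol(grouped)  # 8 columns, one per group (including the intercept)
```

The additive design answers "treated vs control overall", while the grouped design keeps each cell line and time point separate, which is what the four requested comparisons need.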

5.0 years ago swbarnes2 ★ 1.4k

score 0 · Answer 1 · 2020-02-04


Michael Love 43k

@mikelove

Last seen 1 day ago

United States

hi Riya,

I'd recommend working with a statistician or someone familiar with linear models. We have a lot of documentation in the vignette and in ?results, but I don't have extra time these days for statistical consulting on the support site. I have to reserve my time for software-related questions.

5.0 years ago Michael Love 43k