Hi, I am pretty new to RNA-seq data analysis and I need some advice on formulating the design matrix.
This is my sample file. I added a GroupID column that combines all the variables into one label, so that each group of samples can be distinguished from the others, and I then use that column in the design:
Sample | Cell | Type | Concentration | Time (hrs) | GroupID
---|---|---|---|---|---
1 | AB1 | Control | 0 | 5 | AB1Control0_5
2 | AB1 | Control | 0 | 5 | AB1Control0_5
3 | AB1 | Control | 0 | 5 | AB1Control0_5
4 | AB1 | Treatment | 5 | 5 | AB1Treatment5_5
5 | AB1 | Treatment | 5 | 5 | AB1Treatment5_5
6 | AB1 | Treatment | 5 | 5 | AB1Treatment5_5
7 | ST1 | Control | 0 | 5 | ST1Control0_5
8 | ST1 | Control | 0 | 5 | ST1Control0_5
9 | ST1 | Control | 0 | 5 | ST1Control0_5
10 | ST1 | Treatment | 5 | 5 | ST1Treatment5_5
11 | ST1 | Treatment | 5 | 5 | ST1Treatment5_5
12 | ST1 | Treatment | 5 | 5 | ST1Treatment5_5
13 | AB1 | Control | 0 | 8 | AB1Control0_8
14 | AB1 | Control | 0 | 8 | AB1Control0_8
15 | AB1 | Control | 0 | 8 | AB1Control0_8
16 | AB1 | Treatment | 8 | 8 | AB1Treatment8_8
17 | AB1 | Treatment | 8 | 8 | AB1Treatment8_8
18 | AB1 | Treatment | 8 | 8 | AB1Treatment8_8
19 | ST1 | Control | 0 | 8 | ST1Control0_8
20 | ST1 | Control | 0 | 8 | ST1Control0_8
21 | ST1 | Control | 0 | 8 | ST1Control0_8
22 | ST1 | Treatment | 8 | 8 | ST1Treatment8_8
23 | ST1 | Treatment | 8 | 8 | ST1Treatment8_8
24 | ST1 | Treatment | 8 | 8 | ST1Treatment8_8
Contrasts I need
For my contrast matrix I want to compare:
1) AB1 Treatment at concentration 5 and time point 5 vs AB1 Control at concentration 0 and time point 5
2) AB1 Treatment at concentration 8 and time point 8 vs AB1 Control at concentration 0 and time point 8
3) ST1 Treatment at concentration 5 and time point 5 vs ST1 Control at concentration 0 and time point 5
4) ST1 Treatment at concentration 8 and time point 8 vs ST1 Control at concentration 0 and time point 8
Design matrix I used:
dds = DESeqDataSetFromMatrix(countData = Countdata, colData = Metadata, design = ~ GroupID)
results(dds, contrast=c("GroupID","AB1Treatment5_5","AB1Control0_5"))
results(dds, contrast=c("GroupID","ST1Treatment5_5","ST1Control0_5"))
results(dds, contrast=c("GroupID","AB1Treatment8_8","AB1Control0_8"))
results(dds, contrast=c("GroupID","ST1Treatment8_8","ST1Control0_8"))
When I run resultsNames(dds) I see some comparisons I don't need and not the ones I do need. For example, I see GroupID_AB1Treatment5_5_vs_AB1Control0_5 and GroupID_ST1Treatment5_5_vs_AB1Control0_5 (this is not what I want), but I want GroupID_ST1Treatment5_5_vs_ST1Control0_5.
I do get results from this, but in some contrasts the adjusted p-value is 1 for every gene. So, is my design matrix right? Could somebody help me with this?
Please tell me if I am doing something wrong.
Thanks in advance!
Could you please update your post to fix the formatting? Perhaps mark the table you listed as a "code sample" (the 100110 button in the toolbar), or edit the formatting directly using markdown syntax.
Also, you've shown us the design of your experiment, but you didn't show us the results() call you made to extract the statistics you are after. Note that getting adjusted p-values hammered to all 1 (or close to it) isn't all that uncommon (even though it can be surprising). Plotting a histogram of your nominal p-values is often a good diagnostic tool after you run your analyses.
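As a rough illustration of that diagnostic, here is a minimal sketch in base R. The p-values are simulated (they are not from the data in this thread), and the variable names are made up for the example. It compares a set of p-values with no signal against a set where some genes carry signal, and applies the Benjamini-Hochberg adjustment, which is the default adjustment method in DESeq2's results().

```r
# Simulated nominal p-values, for illustration only (not the poster's data).
set.seed(1)
n_genes <- 10000

# Scenario A: no gene is truly differentially expressed,
# so the nominal p-values are uniform on [0, 1].
p_null <- runif(n_genes)

# Scenario B (assumed mixture): 1000 genes with signal give a spike of
# small p-values, the remaining 9000 are uniform.
p_signal <- c(rbeta(1000, shape1 = 0.1, shape2 = 5), runif(9000))

# Benjamini-Hochberg adjustment of each set.
padj_null <- p.adjust(p_null, method = "BH")
padj_signal <- p.adjust(p_signal, method = "BH")

# Side-by-side histograms: a flat shape suggests little or no signal,
# a peak near zero suggests a set of genes with real differences.
op <- par(mfrow = c(1, 2))
hist(p_null, breaks = 50, main = "No signal", xlab = "nominal p-value")
hist(p_signal, breaks = 50, main = "Some signal", xlab = "nominal p-value")
par(op)

# Compare how the adjusted p-values are distributed in each scenario.
summary(padj_null)
summary(padj_signal)
```

With real data you would plot the pvalue column of the results object instead of the simulated vectors. A flat histogram together with adjusted p-values near 1 usually points to a lack of detectable signal in that contrast rather than to a coding mistake.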
Thanks for the tip, Steve! I want to know whether adding another column (here GroupID) that combines all the variables is good enough to use in the design matrix. As far as I understand, the whole idea of the design matrix is to distinguish one sample from another based on the conditions. In my case I want to account for all the conditions that vary in the experiment and compare each treatment against its corresponding control.
If you want to compare one subset of samples to another subset of samples, giving them unique labels in the GroupID column is usually the best way to do that. If you want to compare all the controls to all the treated, and you want the software to model the differences introduced by having two time points in addition to the larger question of differences between treatments, that's when you use a design of ~ treatment + day.
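To make the two designs concrete, here is a small base R sketch that rebuilds the sample sheet from the question and inspects the model matrices. It assumes the Type and Time columns play the role of "treatment" and "day" in the formula above, and the object names (design_group, design_additive, contrast_vec) are made up for the example. It does not run DESeq2 itself.

```r
# Sample sheet rebuilt from the table in the question (3 replicates per group).
Metadata <- data.frame(
  Cell = rep(rep(c("AB1", "ST1"), each = 6), times = 2),
  Type = rep(rep(c("Control", "Treatment"), each = 3), times = 4),
  Time = rep(c(5, 8), each = 12)
)
# In this experiment the treated concentration equals the time point (5 or 8).
Metadata$Concentration <- ifelse(Metadata$Type == "Control", 0, Metadata$Time)
Metadata$GroupID <- factor(paste0(Metadata$Cell, Metadata$Type,
                                  Metadata$Concentration, "_", Metadata$Time))

# One-factor design: every coefficient is "this group vs the reference level".
# The reference is the first factor level, here AB1Control0_5, which is why
# the listed coefficient names all compare against that one group.
design_group <- model.matrix(~ GroupID, data = Metadata)
levels(Metadata$GroupID)[1]
colnames(design_group)

# A comparison between two non-reference groups is a difference of two
# coefficients, e.g. ST1Treatment5_5 vs ST1Control0_5:
contrast_vec <- setNames(numeric(ncol(design_group)), colnames(design_group))
contrast_vec["GroupIDST1Treatment5_5"] <- 1
contrast_vec["GroupIDST1Control0_5"] <- -1
contrast_vec

# Additive design: one overall treatment effect, adjusted for time point.
# (Type and factor(Time) stand in for "treatment" and "day"; an assumption.)
design_additive <- model.matrix(~ Type + factor(Time), data = Metadata)
colnames(design_additive)

# In DESeq2 the same group comparison would be requested after DESeq(dds) with:
# results(dds, contrast = c("GroupID", "ST1Treatment5_5", "ST1Control0_5"))
```

The names returned by resultsNames() are only the fitted coefficients, so a pair such as ST1 treatment vs ST1 control will not appear there. The contrast argument of results() can still compare any two levels of GroupID.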