I have the following samples for a differential expression analysis and would like to check whether my design matrix makes sense. There are cell samples from three different donors, each of which went through 2 different cell culturing processes and 5 different treatments. The goal is to look at the differences between treatments and also between processes. Samples that went through Process A have data for all 5 treatments, while samples that went through Process B have data for only 2 of the 5 treatments. Is the design matrix here constructed correctly? Thanks a lot!
library(readr)
sampleInfo <- read_csv(<samplemanifest_csvfile>, col_names = TRUE)
Donor <- factor(sampleInfo$Donor)
Treatment <- factor(sampleInfo$Treatment)
Process <- factor(sampleInfo$Process)
design <- model.matrix(~0 + Treatment + Process + Donor)
Donor | Process | Treatment |
P01 | A | 1 |
P01 | A | 2 |
P01 | A | 3 |
P01 | A | 4 |
P01 | A | 5 |
P02 | A | 1 |
P02 | A | 2 |
P02 | A | 3 |
P02 | A | 4 |
P02 | A | 5 |
P03 | A | 1 |
P03 | A | 2 |
P03 | A | 3 |
P03 | A | 4 |
P03 | A | 5 |
P01 | B | 2 |
P01 | B | 5 |
P02 | B | 2 |
P02 | B | 5 |
P03 | B | 2 |
P03 | B | 5 |
Thank you very much! Yes, I was actually reading your previous answer earlier and thought about using what you suggested here (similar to before as well). The only thing, though, is that the scientist also wants to look at differences "between processes". So I thought I should build the design matrix in a way that lets me form a contrast that directly compares Process A and Process B, hence the design matrix model.matrix(~0+Treatment+Process+Donor).
- Does this mean that, with this design, the contrast (Process A - Process B) does not separate the treatments and instead looks at all treatments together?
- If I use model.matrix(~0+ProcTreat+Donor), following on from the question above, would you say it is more correct to look at differences between processes within the same treatment?
On a side note, I see that most of the variability here actually came from different donors.
Thanks very much again.
Yes, it is generally more meaningful to compare Processes for the same Treatment.
Comparing Process B to Process A using your old model was confounding differences between processes with differences between Treatments, because Treatment 1 was not used with Process B. There were other problems as well.
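The imbalance described here can be seen by cross-tabulating Process against Treatment. The following is only a sketch in base R: it rebuilds the sample sheet from the table in the question rather than reading the real manifest, and the object name `cells` is just an illustration.

```r
# Rebuild the sample sheet from the table in the question
# (an assumption standing in for the real manifest file).
donors <- c("P01", "P02", "P03")
sampleInfo <- rbind(
  expand.grid(Treatment = 1:5, Donor = donors, Process = "A",
              stringsAsFactors = FALSE),
  expand.grid(Treatment = c(2, 5), Donor = donors, Process = "B",
              stringsAsFactors = FALSE)
)

# Count the samples in each Process x Treatment cell.
cells <- table(Process = sampleInfo$Process,
               Treatment = sampleInfo$Treatment)
print(cells)
```

The Process B row has empty cells for Treatments 1, 3 and 4, so a single overall "Process B vs Process A" coefficient is not comparing like with like across treatments.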
There is no such thing as "separating treatments". You can't compare processes as if treatments didn't exist.
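As a rough sketch of the ProcTreat approach discussed above, the design and the within-treatment process contrasts might be set up as follows. It uses base R only and rebuilds the sample sheet from the table in the question. The helper `make_contrast` and the contrast names are hypothetical, introduced here for illustration; they are not part of any package.

```r
# Sample sheet rebuilt from the table in the question (an assumption
# standing in for the real manifest file).
donors <- c("P01", "P02", "P03")
sampleInfo <- rbind(
  expand.grid(Treatment = 1:5, Donor = donors, Process = "A",
              stringsAsFactors = FALSE),
  expand.grid(Treatment = c(2, 5), Donor = donors, Process = "B",
              stringsAsFactors = FALSE)
)

Donor <- factor(sampleInfo$Donor)
# One level per observed Process/Treatment combination: A1..A5, B2, B5.
ProcTreat <- factor(paste0(sampleInfo$Process, sampleInfo$Treatment))

design <- model.matrix(~0 + ProcTreat + Donor)
# Shorten the group column names from "ProcTreatA1" to "A1", etc.
colnames(design) <- sub("^ProcTreat", "", colnames(design))

# Hypothetical helper: a contrast vector with +1 and -1 on two columns.
make_contrast <- function(plus, minus) {
  v <- setNames(numeric(ncol(design)), colnames(design))
  v[plus] <- 1
  v[minus] <- -1
  v
}

# Process B vs Process A within the same treatment, plus one example
# of a treatment comparison within Process A.
contrasts <- cbind(
  BvsA_T2  = make_contrast("B2", "A2"),
  BvsA_T5  = make_contrast("B5", "A5"),
  T2vsT1_A = make_contrast("A2", "A1")
)
print(contrasts)
```

Process comparisons are only possible for Treatments 2 and 5, since those are the only treatments observed under both processes. If limma is the package in use, a contrast matrix like this could be passed to contrasts.fit() after lmFit(), and makeContrasts() would be the more usual way to build it.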
Thank you very much for your help!