[ Crossposted from Biostars: https://www.biostars.org/p/348420/#348421 ]
Hello, very new to RNA-Sequencing, R and DESeq2 so the question might be too simple or repeated so I apologize in advance.
I am working with 24 RNA-Sequencing samples, 8 conditions with 3 biological replicates for each. A part of my data matrix is shown below.
Each sample represents one of three phenotypes. A wild-type phenotype, and two altered phenotypes. I would like to compare the wild-type phenotype with each of the other phenotypes, and the other two phenotypes with each other. I am confused as to what design formula I should use to generate my DESeq object. I have read the vignettes and beginner's guide but I lack the background to fully understand how the design formulas work. I have included part of my coldata below.
I would like to compare NB and CW, and NB and CCW. I aim to find genes that are differentially expressed as phenotype changes from NB to CW and from NB to CCW. Then I would like to compare CW and CCW. I would like to do this pair-wise for each sample pair and retain that information in the output. So far, I have been doing this by splitting my matrix into pairs of samples and running each pair individually but that is incredibly tedious and was wondering if I could accomplish the same results by using the complete expression matrix and coldata. I have tried only using chirality in the design formula (design = ~ Chirality) and set NB as the reference lever using the relevel() function, but upon using the resultsNames(dds) function I get,
"Intercept" "Chirality_CW_vs_CCW" "Chirality_NB_vs_CCW"
However, I believe I am looking for NB vs CW and NB vs CCW. Also, I have not included the Cell.Type in the design formula but I assume it should be included since I want to compare the samples pair-wise. Any help regarding design formulae for this particular case, understanding design formulae in R, in general, or other comments are greatly appreciated.
Thanks,
Tasnif Rahman
Hi Michael, thank you for your response. I have included the full coldata object.
> as.data.frame(coldata)
Chirality Cell.Type
S11 NB RUES2
S12 NB RUES2
S13 NB RUES2
S22 CW D5 Cardiac Mesoderm
S23 CW D5 Cardiac Mesoderm
S24 CW D5 Cardiac Mesoderm
S31 CCW D5 Neural Induction
S32 CCW D5 Neural Induction
S33 CCW D5 Neural Induction
S41 CCW D3 Endoderm
S42 CCW D3 Endoderm
S43 CCW D3 Endoderm
S52 CW SB431542 24hr
S53 CW SB431542 24hr
S54 CW SB431542 24hr
S63 CCW Wnt3a 24hr
S64 CCW Wnt3a 24hr
S65 CCW Wnt3a 24hr
S71 CCW D7 MH
S72 CCW D7 MH
S73 CCW D7 MH
S82 CW IWP2 24hr
S83 CW IWP2 24hr
S84 CW IWP2 24hr
There is no correlation between the cell type and the Chirality. The only reason to include cell type would be to get pairwise data for NB vs CW and NB vs CCW for each sample vs the REUS2 cell type, and then further to identify DE genes that overlap between samples for NB vs CW and NB vs CCW respectively. We are trying to identify gene targets that are differentially expressed that might lead to the two different phenotypes (Chirality: CW and CWW) vs the control phenotype (Chirality: NB).
Thank you!
Tasnif Rahman
The cell type looks nested within each chirality. From scanning the column, I don't see anywhere where you could do a blocked or paired analysis. Can you explain what kind of pairing you are imagining?
Yeah, they are basically different treatment conditions that pluripotent stem cells are treated with, and each condition leads to a CW or CWW bias. The stem cells themselves have no bias. I would like to compare the gene expression of groups with CW bias to the group with no bias. Do you reckon I should just use "NB" and "CW" as the ~condition and then do another comparison with "NB" and "CCW"? I was thinking of identifying genes differentially expressed between each CW cell type and the NB cell type, and then looking at genes that show similar differential expression between each of those comparisons to identify genes differentially expressed between the NB and CW conditions. So the pairs for NB vs CW would be REUS2 vs Cardiac Mesoderm, REUS2 vs SB431542 and REUS2 vs IWP2 24hr, and similar for the CCW groups. Since the different cell types are cells treated with drugs targetting different pathways, or induction of different lineage differentiations, I am a bit concerned about grouping all CW groups together for the comparison, since each would have its own gene expression profile for genes not involved in the phenotypic bias of interest. The goal however, is to identify gene expression patterns that are common between all biased samples with so I guess that should not matter. Any insights would be appreciated, and thank you for your time, Michael.
Best,
Tasnif
To generalize, let me call NB, CW, and CWW just A, B and C, levels of "condition". You can compare C to A and B to A in one design, you don't need to rerun DESeq(), you just use the "contrast" argument of results().
That doesn't deal with the correlations among the cell type though. You can't account for these using fixed effects. The method for doing this (to compare across groups of samples, but account for various levels that are nested within) is the duplicateCorrelation() function in limma.
As far as doing post-hoc tests to find which cell type, and looking at more than just C vs A and B vs A, you should make sure that whatever post-hoc test you do, you control the false positives. We do not have functionality for this in DESeq2, so you may want to discuss with a statistician.
I understand. How should I set up the single design for C vs A and B vs A? Can I have more than one contrast comparison in one command? Also, I only used "design = ~ Chirality" to generate the dds, should that be okay with such comparisons? Thank you again, really appreciate the help.
My answer is that this experiment, with the questions you want to answer, can’t really be analyzed with fixed effects and so you need to use limma.