I have a question about the design that i am using for a question that i am interested in. Basically i have three genotypes (LL, OP and M), each of which have been subjected to 2 treatments (5 and 33) with 6 biological replicates per treatment. Here is the sample dataframe for the experiment
sample trt gt rep
1 5_OP_1 5 OP 1
2 5_OP_2 5 OP 2
3 5_OP_3 5 OP 3
4 5_OP_4 5 OP 4
5 5_OP_5 5 OP 5
6 5_OP_6 5 OP 6
7 33_OP_1 33 OP 1
8 33_OP_2 33 OP 2
9 33_OP_3 33 OP 3
10 33_OP_4 33 OP 4
11 33_OP_5 33 OP 5
12 33_OP_6 33 OP 6
13 5_M_1 5 M 1
14 5_M_2 5 M 2
15 5_M_3 5 M 3
16 5_M_4 5 M 4
17 5_M_5 5 M 5
18 5_M_6 5 M 6
19 33_M_1 33 M 1
20 33_M_2 33 M 2
21 33_M_3 33 M 3
22 33_M_4 33 M 4
23 33_M_5 33 M 5
24 33_M_6 33 M 6
25 5_LL_1 5 LL 1
26 5_LL_2 5 LL 2
27 5_LL_3 5 LL 3
28 5_LL_4 5 LL 4
29 5_LL_5 5 LL 5
30 5_LL_6 5 LL 6
31 33_LL_1 33 LL 1
32 33_LL_2 33 LL 2
33 33_LL_3 33 LL 3
34 33_LL_4 33 LL 4
35 33_LL_5 33 LL 5
36 33_LL_6 33 LL 6
Now i am interested in transcripts that are differentially expressed in genotype LL at either 5 or 33 treatment with respect to the other two genotypes (OP and M). Here is what i did and i want to make sure that i did correct.
# Load the DESeq2 package library("DESeq2") # Load the file data <- read.csv("data_51_full.txt", sep = "\t") # Create a df with sample, treatment, genotype and replicate information samples <- data.frame( sample = names(data), trt = sub("(5|33)_(OP|M|LL)_([1-6])","\\1",names(data)), gt = sub("(5|33)_(OP|M|LL)_([1-6])","\\2",names(data)), rep = sub("(5|33)_(OP|M|LL)_([1-6])","\\3",names(data)) ) samples # Count matrix input dds <- DESeqDataSetFromMatrix(countData = data, colData = samples, design = ~ 1) # Filtering to remove rows with 0 reads dds <- dds[ rowSums(counts(dds)) > 1, ] # design design(dds) <- ~ trt + gt + trt:gt # Run DESeq2 dds <- DESeq(dds) # Print the coefficients resultsNames(dds) #[1] "Intercept" "trt_5_vs_33" "gt_M_vs_LL" "gt_OP_vs_LL" "trt5.gtM" "trt5.gtOP" # The treatment effect for genotype I (LL) (the main effect) res <- results(dds, contrast=c("trt","5","33")) mcols(res)$description #[1] "mean of normalized counts for all samples" "log2 fold change (MLE): trt 5 vs 33" #[3] "standard error: trt 5 vs 33" "Wald statistic: trt 5 vs 33" #[5] "Wald test p-value: trt 5 vs 33" "BH adjusted p-values"
Thanks Michael for your response. There are several questions that i want to ask
Please let me know if this is not clear and i will try to elaborate further...
Here is the whole code that i have written to answer the above questions in my capacity. Please find the same....
hi,
So first, let me recommend that you update to the latest version of DESeq2 which is 1.10. You can check with this:
I think the best design for you, to address the different questions you'd like to answer is:
~ trt + trt:gt
This is essentially the same design as the one you have, but makes it easier to perform the contrasts that I think you have described.
There is not a single contrast which gives the answer of DE of e.g. (C vs A) or (B vs A), but you can accomplish this by taking the union of the FDR limited sets from two pairwise comparisons. You can compare OP vs LL for treatment 5 with the following:
And you can compare M vs LL for treatment 5 with:
Likewise, you would swap in trt33 in the name for the same question in treatment 33.
If you want to ask if the OP vs LL effect is different in treatment 33 vs treatment 5, you form the contrast:
I'm not exactly sure how to answer some of your questions in 2. Remember, as I said previously, you can't estimate the treatment effect alone (only compare across treatments) because you do not have control samples.
Thanks Micheal. I have a question/clarification. Isn't the
results(dds, name="trt5.gtOP")
andresults(dds, name="trt5.gtM")
the same? The only difference being the log fold changes? (Similarly for the 33 treatment). In addition, i am getting the same set of genes inresults(dds, contrast=list("trt33.gtOP", "trt5.gtOP"))
andresults(dds, contrast=list("trt33.gtM", "trt5.gtM"))
. I have set up this modeldds <- nbinomLRT(dds, full = design(dds), reduced = ~ cond)
for the above all .. Please correct me if anything i am missing/misinterpreting. ThanksIf you use DESeq() then the results() calls I posted will work.
If you have run nbinomLRT() instead, you'll have to specify that you want Wald tests for these contrasts by setting
test="Wald"
in results().Thanks Michael again. I have incorporated the changes you recommended and here is the code for the same. Can you please look at it and let me know what you think.
Thanks Michael. Very much appreciated for your help!