DESeq2 coefficients from resultsNames() not reflecting all coefficient possibilities
@2e7d4d2a
United States

I am unsure about the output of the resultsNames() function, and want to make sure everything else looks OK.

My overall biological question is differential expression between control and mutant (ctrl vs dmut) with N=3, i.e. 3 independent experiments, in each of which I collected one control and one mutant sample, for 6 samples in total. I am also trying to account for variation caused by the different experiment dates. Therefore my design is ~ ind.n + group, with ind.n identifying the experiment. For this timepoint there are 3 distinct ind.n values, since I did 3 experiments.

I used the following "metadata_ls_age_2" for the DESeq2 analysis, subsetted for the specific timepoint I'm having trouble with (in addition to a matched count matrix called counts_ls_age).

metadata_ls_age_2$'120'

$`120`
                                                            group_sample_id                  sample_name group age sample_number ind.n
120-1-CTL-library-GRO2404A3_120_ctrl   120-1-CTL-library-GRO2404A3_120_ctrl  120-1-CTL-library-GRO2404A3  ctrl 120             1     1
120-1-DMUT-library-GRO2404A4_120_dmut 120-1-DMUT-library-GRO2404A4_120_dmut 120-1-DMUT-library-GRO2404A4  dmut 120             1     1
120-2CTL-LIBRARY-GRO2455A7_120_ctrl     120-2CTL-LIBRARY-GRO2455A7_120_ctrl   120-2CTL-LIBRARY-GRO2455A7  ctrl 120             2     2
120-2DMUT-LIBRARY-GRO2455A8_120_dmut   120-2DMUT-LIBRARY-GRO2455A8_120_dmut  120-2DMUT-LIBRARY-GRO2455A8  dmut 120             2     2
120-3-CTL-Library-GRO2554A5_120_ctrl   120-3-CTL-Library-GRO2554A5_120_ctrl  120-3-CTL-Library-GRO2554A5  ctrl 120             3     3
120-3-DMUT-Library-GRO2554A6_120_dmut 120-3-DMUT-Library-GRO2554A6_120_dmut 120-3-DMUT-Library-GRO2554A6  dmut 120             3     3
                                      ind.s
120-1-CTL-library-GRO2404A3_120_ctrl      1
120-1-DMUT-library-GRO2404A4_120_dmut     2
120-2CTL-LIBRARY-GRO2455A7_120_ctrl       3
120-2DMUT-LIBRARY-GRO2455A8_120_dmut      4
120-3-CTL-Library-GRO2554A5_120_ctrl      5
120-3-DMUT-Library-GRO2554A6_120_dmut     6

Here is the script I used to run DESeq2:

library(DESeq2)

i <- "120"
idx <- which(names(counts_ls_age) == i)
age_counts <- counts_ls_age[[idx]]
age_metadata <- metadata_ls_age_2[[idx]]

# Check contents of the extracted objects
age_counts[1:6, 1:6]
head(age_metadata)

# Check that the columns of the count matrix match the rows of the metadata
all(colnames(age_counts) == rownames(age_metadata))
#[1] TRUE

# Create DESeq2 object. We want to measure the effect of group at each specific age.
# They did cell type and then designed by group, so we'll do age and then group,
# to look at sibCTL vs dmut at each age.
dds <- DESeqDataSetFromMatrix(age_counts,
                              colData = age_metadata,
                              design = ~ ind.n + group)

# Keep a copy of the unfitted object under a per-age name (e.g. dds120).
# Assign directly rather than relying on .Last.value, which is only
# updated for top-level interactive calls.
assign(paste0("dds", i), dds)
object <- dds

# Transform counts for data visualization
rld <- rlog(object, blind = TRUE)

# Run DESeq2 differential expression analysis
dds <- DESeq(object)

# Check the coefficients for the comparison
resultsNames(dds)

And here is the output of that final command.

[1] "Intercept"          "ind.n_2_vs_1"       "ind.n_3_vs_1"       "group_dmut_vs_ctrl"

I also ran this for other timepoints that only had two independent experiments, and got the following output.

[1] "Intercept"          "ind.n_2_vs_1"       "group_dmut_vs_ctrl"

This makes sense to me, since those timepoints have only two independent experiments and therefore there should be only one comparison. However, for the first timepoint (the one with three experiments), wouldn't we also expect an additional coefficient called "ind.n_2_vs_3" or "ind.n_3_vs_2"? Since I'm trying to account for variation across all 3 experiments, I can't imagine why that wouldn't be a coefficient. If I am correct and there is something wrong with these coefficients, how do I fix this?

Thank you!

swbarnes2 ★ 1.4k
@swbarnes2-14086
San Diego

ind.n 1 is the reference level; resultsNames() lists every other level compared to the reference level. You can still use contrasts to compare 2 to 3, if you want to.
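To make the reference-level point concrete, here is a minimal base R sketch (no DESeq2 needed). It builds a toy sample table shaped like the one in the question and inspects the design matrix for ~ ind.n + group. The data frame and the object names (coldata, mm, contrast_3_vs_2) are made up for illustration. Base R labels the columns ind.n2 and ind.n3, where DESeq2 would report ind.n_2_vs_1 and ind.n_3_vs_1.

```r
# Toy design mirroring the post: 3 experiments (ind.n), one ctrl and one dmut each.
# This data frame is an illustrative assumption, not the poster's metadata.
coldata <- data.frame(
  ind.n = factor(c(1, 1, 2, 2, 3, 3)),
  group = factor(rep(c("ctrl", "dmut"), times = 3), levels = c("ctrl", "dmut"))
)

# Design matrix for ~ ind.n + group. A 3-level factor contributes 2 columns,
# each comparing one level to the reference level (the first level, "1").
mm <- model.matrix(~ ind.n + group, data = coldata)
colnames(mm)

# "3 vs 2" is not a separate coefficient. It is the difference of the two
# existing ones: (3 vs 1) - (2 vs 1) = 3 vs 2.
contrast_3_vs_2 <- c("(Intercept)" = 0, ind.n2 = -1, ind.n3 = 1, groupdmut = 0)
```

A third ind.n column would be redundant with the first two, which is why a factor with k levels only ever yields k - 1 coefficients. All three experiments are still accounted for in the model.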


Thank you for your response. It makes sense to me to include contrasts between all 3 independent experiments, because the first experiment is not inherently more important than the other two. Does that seem reasonable to you? If so, can you explain how I would add a contrast comparing experiments 2 and 3 (while retaining all the other contrasts that are already in the model)? I've never seen this raised as a concern in any DESeq2 vignette that I've read.


If you only have 6 samples, I don't think you can expect to model two different variables at once. Just compare groups and call it a day.


Interesting. In your opinion, how many samples would one need to collect in order to make it worthwhile to account for sample differences? I'm still curious as to how I would account for the variability between all 3 samples, and would really appreciate hearing how I can do that if it becomes necessary down the line. Thanks for your response!


I'm not sure you can 'account' for it, or that you need to. There is variation between sample dates. This is why you need replicates, but I don't think you really care how exactly the samples differ by experiment date.


Yes. It doesn't really matter what day each experiment happened; what matters is that the experiments were done on 3 separate dates. I have been using design = ~experiment_date + group (with group being my control or mutant condition), which should let me account for variability between the different experiments (i.e. different experiment dates). I would like to know how to avoid having experiment 1 as a "reference level", or maybe have no reference level at all, so that I can account for all possible permutations of samples as a source of variability. You mentioned that I can use contrasts to compare 2 to 3 as well. Can you explain how I would do that? I understand from what you're saying that you don't believe I necessarily need to do that, but I would like to understand how, because I have read elsewhere that I should be accounting for variability due to separate independent experiments. Thanks so much.
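As a hedged illustration of the two questions above (does the choice of reference level matter, and how is 2 compared to 3), the sketch below uses an ordinary linear model on made-up values for a single gene. lm() is only a stand-in for DESeq2's negative binomial GLM, so this shows the parameterisation logic, not DESeq2 output. The values in y and all object names are assumptions for illustration.

```r
# Toy design: 3 experiments, one ctrl and one dmut sample in each.
coldata <- data.frame(
  ind.n = factor(c(1, 1, 2, 2, 3, 3)),
  group = factor(rep(c("ctrl", "dmut"), times = 3), levels = c("ctrl", "dmut"))
)
# Made-up log-scale expression values for one gene (illustration only).
y <- c(5.0, 6.1, 5.8, 6.7, 4.6, 5.9)

# Fit with experiment 1 as the reference level (the default).
fit1 <- lm(y ~ ind.n + group, data = coldata)

# Refit with experiment 2 as the reference level.
coldata2 <- coldata
coldata2$ind.n <- relevel(coldata2$ind.n, ref = "2")
fit2 <- lm(y ~ ind.n + group, data = coldata2)

# The group effect is the same under both parameterisations.
group_effect_ref1 <- unname(coef(fit1)["groupdmut"])
group_effect_ref2 <- unname(coef(fit2)["groupdmut"])

# Experiment 3 vs 2: a difference of two coefficients in fit1,
# a single coefficient in fit2.
diff_3_vs_2_ref1 <- unname(coef(fit1)["ind.n3"] - coef(fit1)["ind.n2"])
diff_3_vs_2_ref2 <- unname(coef(fit2)["ind.n3"])

# On a fitted DESeq2 object, the analogous requests would be:
# res_3_vs_2 <- results(dds, contrast = c("ind.n", "3", "2"))
# res_group  <- results(dds, name = "group_dmut_vs_ctrl")
```

In this toy fit the dmut vs ctrl estimate is adjusted for all three experiments whichever one is the reference, so the reference level changes only how the experiment effects are labelled. The commented results() calls use DESeq2's contrast = c(factor, numerator, denominator) form and assume dds has already been run through DESeq().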
