DeSeq2 Model Problems with species, tissue, and tissue status nesting
2
0
Entering edit mode
tbiewerh • 0
@ce3ea609
Last seen 3 months ago
United States

Hi everyone,

A coworker used DESeq2 for a project we worked on together, but based on the results I'm not sure the contrasts performed were doing what we thought they were. Could y'all take a look and help me figure out what the contrasts are doing, and how I should change them so that they do what we intended them to?

We have RNA-seq data that has two different species, each with three tissues (pollen, style, and leaf), each with subcategories (ungerminated and germinated pollen, unpollinated and pollinated style, and several leaf stages). Given the nesting structure (tissue status in tissue, tissues in species) we made the following model matrix:


                id species tissue1 status status_nested
1    Slyc_polgerm1     lyc  pollen   germ             1
2    Slyc_polgerm2     lyc  pollen   germ             1
3    Slyc_polgerm3     lyc  pollen   germ             1
4  Slyc_polungerm1     lyc  pollen ungerm             2
5  Slyc_polungerm2     lyc  pollen ungerm             2
6  Slyc_polungerm3     lyc  pollen ungerm             2
7    Slyc_stygerm1     lyc   style   germ             1
8    Slyc_stygerm2     lyc   style   germ             1
9    Slyc_stygerm3     lyc   style   germ             1
10 Slyc_styungerm1     lyc   style ungerm             2
11 Slyc_styungerm2     lyc   style ungerm             2
12 Slyc_styungerm3     lyc   style ungerm             2
13   Spen_polgerm1     pen  pollen   germ             1
14   Spen_polgerm2     pen  pollen   germ             1
15   Spen_polgerm3     pen  pollen   germ             1
16 Spen_polungerm2     pen  pollen ungerm             2
17 Spen_polungerm3     pen  pollen ungerm             2
18   Spen_stygerm1     pen   style   germ             1
19   Spen_stygerm2     pen   style   germ             1
20   Spen_stygerm3     pen   style   germ             1
21 Spen_styungerm1     pen   style ungerm             2
22 Spen_styungerm2     pen   style ungerm             2
23 Spen_styungerm3     pen   style ungerm             2
24 Spen_polungerm1     pen  pollen ungerm             2
25   Slyc_leafP3r1     lyc    leaf     P3             1
26   Slyc_leafP3r2     lyc    leaf     P3             1
27   Slyc_leafP3r3     lyc    leaf     P3             1
28   Slyc_leafP4d1     lyc    leaf     P4             2
29   Slyc_leafP4d2     lyc    leaf     P4             2
30   Slyc_leafP4d3     lyc    leaf     P4             2
31   Slyc_leafP4p1     lyc    leaf     P4             2
32   Slyc_leafP4p2     lyc    leaf     P4             2
33   Slyc_leafP4p3     lyc    leaf     P4             2
34   Slyc_leafP5d1     lyc    leaf     P5             3
35   Slyc_leafP5d2     lyc    leaf     P5             3
36   Slyc_leafP5d3     lyc    leaf     P5             3
37   Slyc_leafP5p1     lyc    leaf     P5             3
38   Slyc_leafP5p2     lyc    leaf     P5             3
39   Slyc_leafP5p3     lyc    leaf     P5             3
40   Slyc_leafP6d1     lyc    leaf     P6             4
41   Slyc_leafP6d2     lyc    leaf     P6             4
42   Slyc_leafP6d3     lyc    leaf     P6             4
43   Slyc_leafP6p1     lyc    leaf     P6             4
44   Slyc_leafP6p2     lyc    leaf     P6             4
45   Slyc_leafP6p3     lyc    leaf     P6             4
46   Spen_leafP3r1     pen    leaf     P3             1
47   Spen_leafP3r2     pen    leaf     P3             1
48   Spen_leafP3r3     pen    leaf     P3             1
49   Spen_leafP4d1     pen    leaf     P4             2
50   Spen_leafP4d2     pen    leaf     P4             2
51   Spen_leafP4d3     pen    leaf     P4             2
52   Spen_leafP4p1     pen    leaf     P4             2
53   Spen_leafP4p2     pen    leaf     P4             2
54   Spen_leafP4p3     pen    leaf     P4             2
55   Spen_leafP5d1     pen    leaf     P5             3
56   Spen_leafP5d2     pen    leaf     P5             3
57   Spen_leafP5d3     pen    leaf     P5             3
58   Spen_leafP5p1     pen    leaf     P5             3
59   Spen_leafP5p2     pen    leaf     P5             3
60   Spen_leafP5p3     pen    leaf     P5             3
61   Spen_leafP6d1     pen    leaf     P6             4
62   Spen_leafP6d2     pen    leaf     P6             4
63   Spen_leafP6d3     pen    leaf     P6             4
64   Spen_leafP6p1     pen    leaf     P6             4
65   Spen_leafP6p2     pen    leaf     P6             4
66   Spen_leafP6p3     pen    leaf     P6             4

We defined this as 'meta' in the R file. A few notes on it. ungerm and germ for style is unpollinated and pollinated style tissue. I worry this may be an issue because its called the same for ungerminated and germinated pollen tissue. Also, one of the ungerminated pollen tissue samples from the 'pen' species is separated from the other, but this matches with the column order in the read counts dataframe.

Here's how we defined the associated model matrix:

model_matrix <- model.matrix(~species+tissue1+tissue1:status_nested+tissue1:species, meta)

We also modified the ensuing model matrix to remove combinations of factors that had no observations:

all.zero <- apply(model_matrix, 2, function(x) all(x==0))
idx <- which(all.zero)
model_matrix <- model_matrix[,-idx]

Then we defined the DESeq2 object and ran DESeq2:

df <- DESeqDataSetFromMatrix(countData = readCounts, tidy=T, colData = meta, design =~1)
dds <- DESeq(df, full = model_matrix)

Then we ran several contrasts to determine genes that were significantly different between or across certain tissues. I think this is where issues may have cropped up, but I'm having difficulty understanding what contrasts they actually ran and how to change them so that they are comparing what we intended to compare.

Here are the resultsNames() from our dds object:

[1] "Intercept"                    "speciespen"                   "tissue1pollen"               
 [4] "tissue1style"                 "tissue1leaf.status_nested2"   "tissue1pollen.status_nested2"
 [7] "tissue1style.status_nested2"  "tissue1leaf.status_nested3"   "tissue1leaf.status_nested4"  
[10] "speciespen.tissue1pollen"     "speciespen.tissue1style"

The first contrasts were trying to compare pollen expression to leaf expression, and style expression to leaf expression irrespective of species:

pollen_v_leaf <- results(dds, contrast = c(0,0,1,0,0,0.33,0,0,0,1,0))
style_v_leaf <- results(dds, contrast = c(0,0,0,1,0,0,0.33,0,0,0,1))

The next contrast was trying to compare expression between species across all tissues:

lyc_vs_pen <- results(dds, contrast = c(0,1,0,0,0,0,0,0,0,0.33,1))

Then we ran two contrasts to try and compare pollen expression between species and style expression between species:

pollen_species <- results(dds, contrast = c(0,0,0,0,0,0,0,0,0,1,0))
style_species <- results(dds, contrast = c(0,0,0,0,0,0,0,0,0,0,1))

The pollen species contrast was the first I noticed had issues, based on the log fold changes for certain genes being in opposite directions, despite changing expression a large amount in the same direction between species. Then I noticed that none of the log fold change results really made sense. I changed the pollen contrast to this to see if it would fix it:

pollen_speciesTest <- results(dds, contrast = c(0,0,1,0,0,0,0,0,0,1,0))

I changed it to this because I thought that adding the 1 in the third spot would possibly compare the pollen expression from one species ("tissue1pollen" from resultsNames for the third position) with the expression in the other species ("speciespen.tissue1pollen"). This resulted in the log fold change values being closer to ones I calculated for these species, but they still weren't exact, and many genes, especially those with considerable expression in none pollen tissues, were far off what the log fold change differences should have been for pollen.

Thank you for your time! If you have any idea about where things might be going wrong I would appreciate the assistance. If there's any other information you think would help in understanding the issue I will happily add it to this post.

DESeq2 R • 577 views
ADD COMMENT
1
Entering edit mode
tbiewerh • 0
@ce3ea609
Last seen 3 months ago
United States

Hi Everyone,

I figured out the issue I was having. I was getting strange log2FoldChange values because I wasn't actually contrasting what I thought I was. If you're getting negative log2FoldChange values for something that you know should be positive this may help you.

One feature of base DESeq is that it chooses the first row values in the model matrix as the control. This meant for me that everything I had, at least when I had only one value in my contrast list, was being compared to "specieslyc.tissue1pollen.status_nested1" or something similar. Not exactly what I wanted, especially when I was trying to compare style tissue expression between species.

To make it not use an intercept, and instead compare all values in a way you could easily pull out as contrasts, you can do this:

#First I united species and tissue into one column. 
#I saw this recommended elsewhere (comment by radlibcountryfan on this post: https://www.reddit.com/r/bioinformatics/comments/ydfofi/deseq2_design_consideration/).
#This makes comparing values a bit easier in my simplified model
metaTest <- meta %>%
  unite("species_tissue1", c(species, tissue1))

metaTest$species_tissue1 <- as.factor(metaTest$species_tissue1)

#Here's the important change that made all the values make sense. When defining the design add a ~0 at the beginning and that gets rid of the intercept (the control)
#This is useful because I don't have a control. I'm comparing tissues between species, not between treatments. 
#Found this on this post: https://support.bioconductor.org/p/9138737/
ddsMetaTest <- DESeqDataSetFromMatrix(countData = readCounts, tidy = T, colData = metaTest, design = ~0 + species_tissue1)

ddsMetaTestOut <- DESeq(ddsMetaTest)

#This helped me check that there was no control (Intercept), and that I could contrast anything I d*** well pleased without DESeq2 choosing a control for me. 
resultsNames(ddsMetaTestOut)

#Then I used the contrast function to compare the expression in the species_tissue1 column between pollen expression in one species to pollen expression in the other. 
resMetaTest <- results(ddsMetaTestOut, contrast = c("species_tissue1", "lyc_pollen", "pen_pollen"))

After I did this all the DESeq outputs matched perfectly with the expression changes between tissues. I have to go and figure out how to make sure the nesting is done correctly, because I learned all this through a simplified method that ignored my data's nesting, but this is a good place to start. Hopefully this helps anyone else wanting to compare between RNA samples that don't necessarily have a control.

If there's anything I did wrong, or you have an idea about how I should handle my data nesting issue, please do tell me so I can improve this post.

Cheers.

ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 10 hours ago
United States

I don't have any feedback for your post, because that's not how I would do it, so I didn't read it that closely and don't want to try to tell you how to do it that way. The important part for me is that you are using a conventional treatments-contrast parameterization, which while correct is suboptimal from the perspective of trying to extract the parameters of interest. I think the best example of what I mean can be found in the limma User's guide, in section 9.5.4 and 9.6.1.

The take-home message from those sections is that you can fit this parameterization, but there's no profit in doing so. It's much easier to construct group labels for all the groups, fit a cell means model and then compute whatever contrast you need. IIRC, there is something similar in the DESeq2 vignette, but I cannot find it at the moment.

Login before adding your answer.

Traffic: 942 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6