DESeq2: Subset Results from 2-Factor Design
1
1
Entering edit mode
vanbelj ▴ 30
@vanbelj-21216
Last seen 10 months ago
United States

I performed a Nascent RNAseq experiment, which yielded TotalCounts and NascentCounts (Tc) for each gene. I have two conditions (Control vs Tx), each with 3 biological replicates. I analyzed the data with DESeq2 using posts about ChIP-seq and Ribo-seq as a guide.

My results surprised me: the top genes with significant LFC are the result of changes in TotalCounts, while NascentCounts stayed the same. I would like to look at genes that show significant LFC due to changes in NascentCounts, while TotalCounts stay similar.

Is there a way for me to subset this data to accomplish this?

Alternatively, I could compare TotalCounts in an independent, one-factor, DESeq dataset. I could then extract the list of genes that showed no significant changes, and use this list to subset my two-factor dataset.

Thanks for any help.

P.S. I've simplified sample names and output results in the code below; there m

dds <- DESeqDataSetFromMatrix(countData = countDataBoth,
                               colData = LUT,
                               design = ~ Assay + Treatment + Assay:Treatment)

results[order(results$padj, -results$log2FoldChange),]

| Gene    | log2FoldChange | padj      |
|---------|----------------|-----------|
| YJR145C | -1.4184356     | 2.97E-109 |
| YMR116C | -1.5101331     | 4.91E-101 |
| YOL120C | -1.3099833     | 5.05E-96  |
| YNL209W | -1.3878594     | 6.14E-84  |
| YGL103W | -1.500131      | 1.71E-81  |

#If we look at the countData for these genes...

| Gene    | Control1_Total | Control2_Total | Control3_Total | Tx1_Total | Tx2_Total | Tx3_Total | Control1_Tc | Control2_Tc | Control3_Tc | Tx1_Tc | Tx2_Tc | Tx3_Tc |
|---------|----------------|----------------|----------------|-----------|-----------|-----------|-------------|-------------|-------------|--------|--------|--------|
| YJR145C | 15143          | 12567          | 13793          | 7243      | 7225      | 6210      | 1723        | 1382        | 1435        | 1682   | 1759   | 1320   |
| YMR116C | 19678          | 17564          | 18366          | 7738      | 7754      | 6832      | 2594        | 2041        | 2165        | 2188   | 2200   | 1720   |
| YOL120C | 24257          | 19979          | 21604          | 12131     | 11822     | 10011     | 2693        | 2167        | 2326        | 2606   | 2631   | 2004   |
| YNL209W | 17890          | 15172          | 16693          | 9371      | 10003     | 7861      | 2576        | 1997        | 2191        | 2698   | 2854   | 2065   |
| YGL103W | 38008          | 32378          | 35210          | 20185     | 21474     | 16466     | 2862        | 2353        | 2501        | 3393   | 3558   | 2515   |
DESeq2 • 1.7k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 15 hours ago
United States

If you use an interaction term, you find when the ratio is different. If you want to specify additionally that the ratios are different and that total counts are relatively flat across treatment, you can set total counts to the reference level of assay, and then subset the interaction table to focus on genes where the treatment effect (LFC) in total counts is near 0. The treatment effect for total is something like:

results(dds, name="Treatment_Tx_vs_Control")
ADD COMMENT
0
Entering edit mode

Hi Michael,

To set the reference level, you use relevel correct? I already do this for my Treatment group - is it possible to relevel on Assay at well?

How would you recommend I define "near 0" for the subset step? Also, do you have an easy way to extract the list of genes? My code is generating a TRUE/FALSE logical vector that lacks the gene names, and I can't figure out how to get results@rownames instead.

Thanks!

dds$Treatment <- relevel(dds$Treatment, ref = "Control")
dds$Assay <- relevel(dds$Assay, ref = "TotalReads")
results <- results(dds, alpha=0.05, name="Treatment_NAPP1_vs_Control")
resOrdered <- results[order(results$padj, -results$log2FoldChange),]

| Gene    | log2FoldChange | padj      |
|---------|----------------|-----------|
| YJR145C | -1.3270206     | 2.97E-109 |
| YMR116C | -1.6408737     | 4.91E-101 |
| YOL120C | -1.2775043     | 5.05E-96  |
| YNL209W | -1.1953939     | 6.14E-84  |
| YGL103W | -1.188924      | 1.71E-81  |

NoChange <- (results$log2FoldChange > -0.1 & results$log2FoldChange < 0.1)
ADD REPLY
0
Entering edit mode

You need to relevel before running DESeq().

You could use altHypothesis="lessAbs" (see vignette) for near 0.

results() tables are as long as the dds so you can do logical indexing:

idx <- res$xyz < 123
res2[idx,]

This works because res and res2 are the same number of rows.

ADD REPLY
0
Entering edit mode

Hi Michael. I ran in to some issues; tried to break it up with relevant code.

#Basic setup
dds <- DESeqDataSetFromMatrix(countData = countDataBoth,
                               colData = LUT,
                               design = ~ Assay + Treatment + Assay:Treatment)

dds$Treatment <- relevel(dds$Treatment, ref = "Control")
dds$Assay <- relevel(dds$Assay, ref = "TotalReads")

keep <- rowSums(counts(dds) >= 20) >= 6 #keep genes with ≥20 counts in ≥6 samples
dds_filtered <- dds[keep,]

I constructed my dds using a reduced model, which requires test="LRT". However, it looks like altHypothesis=lessAbs requires test="Wald'. Perhaps I do not need to be using a reduced model?

#Reduced model with LRT
dds_filtered1 <- DESeq(dds_filtered, test="LRT", reduced= ~ Assay + Treatment)
results1 <- results(dds_filtered1, alpha=0.05)
results2 <- results(dds_filtered1, alpha=0.05, name="Treatment_NAPP1_vs_Control", lfcThreshold=.1, altHypothesis="lessAbs")
>Error in results(dds_filtered, alpha = 0.05, name = "Treatment_NAPP1_vs_Control",  : 
  tests of log fold change above or below a theshold must be Wald tests.

Or, if I do need to use reduced model, should I construct two separate DEseq datasets? For instance:

dds_filtered2 <- DESeq(dds_filtered, test="Wald")
results2 <- results(dds_filtered2 , alpha=0.05, name="Treatment_NAPP1_vs_Control", lfcThreshold=.1, altHypothesis="lessAbs")

However, this data set still appears to contain genes where there are large changes in TotalReads between Treatment groups, even though the p-values are now 1. Did this comparison generate the genes that I should ignore? Was expecting it to generate the opposite (genes of interest: genes where TotalReads do not change between Treatment group).

resOrdered2 <- results2[order(results2$padj, -results2$log2FoldChange),] 
resOrdered2
| Gene      | log2FoldChange | padj |
|-----------|----------------|------|
| YOR049C   | 4.11179723     | 1    |
| YKL071W   | 2.97587545     | 1    |
| YMR175W   | 2.43048576     | 1    |
| YLR346C   | 2.40842914     | 1    |
| YMR175W-A | 2.38710327     | 1    |

#countDataBoth for these genes
| Gene      | Control1_Total | Control2_Total | Control3_Total | Tx1_Total | Tx2_Total | Tx3_Total | Control1_Tc | Control2_Tc | Control3_Tc | Tx1_Tc | Tx2_Tc | Tx3_Tc |
|-----------|----------------|----------------|----------------|-----------|-----------|-----------|-------------|-------------|-------------|--------|--------|--------|
| YOR049C   | 6              | 8              | 7              | 187       | 146       | 123       | 3           | 6           | 5           | 129    | 94     | 74     |
| YKL071W   | 156            | 100            | 131            | 1450      | 1441      | 929       | 128         | 85          | 116         | 889    | 849    | 548    |
| YMR175W   | 337            | 313            | 367            | 2300      | 2511      | 2063      | 16          | 12          | 17          | 54     | 103    | 65     |
| YLR346C   | 35             | 39             | 45             | 287       | 267       | 238       | 19          | 18          | 37          | 174    | 152    | 123    |
| YMR175W-A | 240            | 185            | 206            | 1384      | 1621      | 1132      | 14          | 14          | 14          | 55     | 79     | 46     |
ADD REPLY
1
Entering edit mode

Yes, you need to use Wald test if you want to use lfcThreshold and altHypothesis. So just DESeq(dds) is sufficient: no reduced, or test, etc. test="Wald" is the default so you don't need to specify.

However, this data set still appears to contain genes where there are large changes in TotalReads between Treatment groups, even though the p-values are now 1.

Just remember, you asked to find genes where there are not large changes. You set the alternative hypothesis that the LFC is less than in absolute value from 0.1. So the null hypothesis is that the LFC are larger than that. Large p-value --> large changes, small p-value --> LFC close to 0. You would then take the genes with padj < X for whatever your desired FDR bound is. These are the genes that are not changing by more than 0.1.

ADD REPLY
0
Entering edit mode

I think I was too stringent with my lfcThreshold, which was returning zero genes with a padj < 0.05. I increased lfcThreshold to 0.5 and now have >2000 genes with padj < 0.05. Is there an appropriate statistical test that can be run to determine the ideal lfcThreshold for a given DESeqResults?

results1 <- results(dds_filtered1 , alpha=0.05, name="Treatment_NAPP1_vs_Control", lfcThreshold=.1, altHypothesis="lessAbs")
> length(results1@rownames[results1$padj <= 0.05])
[1] 0

results2 <- results(dds_filtered1 , alpha=0.05, name="Treatment_NAPP1_vs_Control", lfcThreshold=.5, altHypothesis="lessAbs")
> length(results2@rownames[results2$padj <= 0.05])
[1] 2316
ADD REPLY
1
Entering edit mode

No the threshold should be biologically motivated. You can think about 2^X and think about what fold change is biological relevant (or irrelevant).

ADD REPLY

Login before adding your answer.

Traffic: 741 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6