Question

Differential gene expression: common changes in response to treatment across cell types

1

Entering edit mode

jason.wray ▴ 10

@jasonwray-14156

Last seen 7.1 years ago

Hi all,

We are using DESeq2 for differential gene expression analysis of RNA-seq data. We have three cell lines, in each we have performed shRNA-mediated knock-down of a transcription factor so we have control and knock-down for each cell line in triplicate. We would like to find the gene expression changes due to knock down that are common across the three cell lines.

> dds <- DESeqDataSetFromMatrix(countData = counts, colData = sampleData, design = ~ cell.line + shRNA)
> dds <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

> colData(dds)
DataFrame with 18 rows and 3 columns
    cell.line    shRNA sizeFactor
     <factor> <factor>  <numeric>
1           A      scr  0.8885007
2           A      scr  2.0095038
3           A      scr  1.9165557
4           A       kd  0.9561916
5           A       kd  1.4908854
...       ...      ...        ...
14          C      scr  0.9472558
15          C      scr  0.9923416
16          C       kd  0.4343480
17          C       kd  0.9292461
18          C       kd  0.3899622
> design(dds)
~cell.line + shRNA
> resultsNames(dds)
[1] "Intercept"        "cell.line_B_vs_A" "cell.line_C_vs_A" "shRNA_kd_vs_scr"

> results(dds)
log2 fold change (MLE): shRNA kd vs scr 
Wald test p-value: shRNA kd vs scr 
DataFrame with 27557 rows and 6 columns
                  baseMean log2FoldChange     lfcSE         stat      pvalue       padj
                 <numeric>      <numeric> <numeric>    <numeric>   <numeric>  <numeric>
ENSG00000000003   1.217422     0.00484854 1.1959788  0.004054035  0.99676536         NA

> res <- results(dds)
> res <- subset(res, padj < 0.05)
> res <- res[order(res$log2FoldChange),]
> res.dn <- head(rownames(res))

# plotting normalized counts for the genes from res.dn we get:

My understanding was that the design (~ cell.line + shRNA) would give me the overall effect of the shRNA taking into account differences between the cell lines. But when the counts are plotted it is clear that for many of the genes with padj < 0.05 the difference in expression is dominated by one cell line with low expression and/or little/no difference in expression for the others. How can we find the genes that respond in all three of the cell lines?

We can of course run a separate comparison for each cell line and then look for the common changes manually but I'd much prefer to incorporate it into the design from the beginning. Any advice on how to change the design or call results greatly appreciated.

deseq2 differential gene expression • 1.5k views

ADD COMMENT • link updated 7.1 years ago by Gavin Kelly ▴ 690 • written 7.1 years ago by jason.wray ▴ 10

score 1 · Answer 1 · 2017-10-11

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 7 hours ago

United States

To require a DE signal in an cell lines, there's not a single design formula, but you can look at the intersection of the three FDR bounded sets. You can do:

~cell.line + cell.line:shRNA

Then test,

resA <- results(dds, name="cell.lineA.shRNAkd")

and then look at

set <- resA$padj < x & ... & ...

ADD COMMENT • link 7.1 years ago Michael Love 43k

0

Entering edit mode

Thanks so much for the speedy response! I'll work through the method you suggest and see where it gets me. I'm not a statistician - is there a logical way to determine the padj cut-off to apply when using this approach? Can one use a less stringent cut off given that it's applied for three separate comparisons in this case?

ADD REPLY • link 7.1 years ago jasonpwray • 0

1

Entering edit mode

I wouldn't use a less stringent cutoff just because it's a three-way intersection. And I should say there isn't an unified FDR interpretation for the final set. It's just a post-hoc set, based of off three FDR bounded sets.

ADD REPLY • link 7.1 years ago Michael Love 43k

score 1 · Answer 2 · 2017-10-11

There are a couple of other approaches than Michael's that might also be worth a try, depending on the exact interpretation you want; if the replicates are technical replicates, then I think the estimates of variance using the interaction model would mainly reflect technical variation. You could do an LRT between full=~cell.line + shRNA and reduced = ~cell.line to look (colloquially) for global shRNA effects after normalising by cell.line. You'd still get potential domination of a single cell-line's effects, but it would be augmented by cell-line-independent changes as well. You could even remove things that were having a cell-line specific response from this list, by filtering out things that were significant in a cell.line*shRNA vs cell.line+shRNA LRT.

You could take it one stage further and do a likelihood ratio test between full=~shRNA and reduced=~1 which would be treating the cell lines as replicates (so you get the variance being a composite of technical and biological variation), but that's probably a step too far.

Or you could go down the limma/voom route, and model the between-cell-line variability separately from the within-line variability using the blocking that's available there, but perhaps there aren't enough cell-lines here to approximate well the diversity of cell-line populations.

Michael's approach will be the safest, in that you'll get transcripts that are changing in all three, whereas the LRT approach may pick transcripts that just fall outside the stringency in one of the lines.