Question

DESeq2 design for sgRNA fold-change

XTR5 ▴ 10

I am reanalyzing some CRISPR-Cas9 screening data to look for sgRNAs effective across cell lines.

countData:

[image: countData table]

colData:

[image: colData table]

Condition here corresponds to time (initial vs. final).

I can set up the DESeqDataSet like this, so that the fold-change result reflects the effect of time:

DESeq2::DESeqDataSetFromMatrix(countData = counts, colData = colData, design = ~ condition)

Or I can try to adjust for differences between cell lines by adding cellType to the design:

DESeq2::DESeqDataSetFromMatrix(countData = counts, colData = colData, design = ~ cellType + condition)
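To see what the second formula changes, it can help to compare the model matrices the two formulas produce. The sketch below uses only base R and a made-up colData (two hypothetical cell lines, A and B, each with one initial and one final sample). The sample names and factor levels are assumptions for illustration, not the data from this post.

```r
# Hypothetical sample table: 2 cell lines x 2 time points (all names are made up)
colData <- data.frame(
  cellType  = factor(c("A", "A", "B", "B")),
  condition = factor(c("initial", "final", "initial", "final"),
                     levels = c("initial", "final")),
  row.names = c("A_initial", "A_final", "B_initial", "B_final")
)

# Design 1: an intercept plus the final-vs-initial effect
mm_simple <- model.matrix(~ condition, data = colData)

# Design 2: adds a baseline shift per cell line alongside the time effect
mm_adjusted <- model.matrix(~ cellType + condition, data = colData)

colnames(mm_simple)    # "(Intercept)" "conditionfinal"
colnames(mm_adjusted)  # "(Intercept)" "cellTypeB" "conditionfinal"
```

In both designs the condition coefficient comes last, so the reported fold-change is still final vs. initial. The second design only adds a term that absorbs baseline differences between cell lines.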

The distribution of fold-changes is similar regardless of design choice, which makes sense as there should be a high degree of overlap across cell lines:

design = ~ condition:

[image: distribution of fold-changes, design = ~ condition]

design = ~ cellType + condition:

[image: distribution of fold-changes, design = ~ cellType + condition]

Is one design strategy recommended over the other? Thanks in advance.

I am thinking about this section of the DESeq2 vignette: http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#multi-factor-designs

"Experiments with more than one factor influencing the counts can be analyzed using design formula that include the additional variables...By adding variables to the design, one can control for additional variation in the counts." I think controlling for cell line may be useful here.

DESeq2 • 981 views

updated 4.1 years ago by Michael Love • written 4.1 years ago by XTR5

score 2 · Accepted Answer · 2021-03-17

"Is one design strategy recommended over the other? Thanks in advance."

I think you may want to consult a local statistician about this (this website is more for technical issues relating to Bioconductor packages). The design formula should include each factor that you want to compare statistically in your data, and it should let you adequately answer the hypothesis being posed. The design formula can also address confounding and allow for covariate adjustment.

Do you have evidence that you need to adjust for cellType? Usually one would want concrete evidence, such as a PCA bi-plot or an independent laboratory or statistical test that points to cellType-specific effects.
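As a rough illustration of the PCA check, here is a minimal base-R sketch on simulated counts. Everything in it is an assumption: the 200 sgRNAs, the two cell lines, and the built-in cell-line shift are invented, and the log-CPM transform is a crude stand-in for DESeq2's own transformations (with a real DESeqDataSet, `vst()` followed by `plotPCA()` would be the more usual route).

```r
set.seed(1)

# Hypothetical design: 2 cell lines x 2 time points
samples <- data.frame(
  cellType  = factor(c("A", "A", "B", "B")),
  condition = factor(c("initial", "final", "initial", "final"),
                     levels = c("initial", "final"))
)

# Simulated counts for 200 made-up sgRNAs, with a per-guide shift in cell line B
n_guides   <- 200
base_mean  <- rexp(n_guides, rate = 1 / 200) + 20
line_shift <- rnorm(n_guides, sd = 1)  # log2-scale cell-line effect (assumed)
mu <- sapply(seq_len(nrow(samples)), function(j) {
  base_mean * 2^(line_shift * (samples$cellType[j] == "B"))
})
counts <- matrix(rpois(length(mu), mu), nrow = n_guides)
colnames(counts) <- paste(samples$cellType, samples$condition, sep = "_")

# Crude library-size normalization and log transform (not DESeq2's transform)
cpm    <- t(t(counts) / colSums(counts)) * 1e6
logcpm <- log2(cpm + 1)

# PCA on samples: do the leading components track cellType?
pca <- prcomp(t(logcpm), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

data.frame(samples, PC1 = pca$x[, 1], PC2 = pca$x[, 2])
round(var_explained, 3)
```

If samples separate by cell line along a leading component, as they do by construction in this toy data, that is the kind of evidence that would support including cellType in the design.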

One can also use a package such as sva to identify 'extraneous' or unknown effects that may exist in your data, and to adjust for them.

Kevin