I am looking for advice on how to appropriately subtract background signal from true signal before performing analysis with DESeq2.
I've performed a cut-and-run experiment, which is akin to ChIP-Seq and reveals protein-genome interactions. The technique results in background signal at locations where DNA is less protected, and this background can obscure signal from bona fide interactions, for example when performing DE analysis (I say DE, but we are talking about differential binding here).
My protein is recruited to DNA downstream of other proteins. To quantify the amount of non-specific background signal, I degraded an upstream protein and then performed cut-and-run. Signal remained throughout the genome; this represents non-specific background signal. The background is often large relative to the gain in signal in response to treatment, so these changes are often obscured when performing DE analysis. Luckily, the amount of background signal appears to be reproducible (low variation between replicates).
I quantified gene-level counts with Salmon. For each gene, I intend to subtract the average background signal from the average total signal. However, I am unsure whether this would be appropriate, because Salmon does more than a simple read count.
After subtracting out background signal, I intend to use the counts data in DESeq2 analysis.
Thanks for any advice.
Hi Michael,
In the linked thread, four samples are mentioned: Input for A, Input for B, IP for A, IP for B
And a ratio of ratios is taken: (IP for B / Input for B) / (IP for A / Input for A)
However, what I have is more akin to: IP for A, IP for B, IP for C, where C is the background. Would the following ratio of ratios be appropriate in this case? (A/C) / (B/C)
Intuitively I think this ratio is an appropriate way of taking the background into account, but I'm not sure if it's valid for the math that is going on behind the scenes of DESeq2.
Thanks!
This has been asked before on the support site a few times, but I don't know how you'd find those threads.
If you have three groups, (A/C) / (B/C) is just a pairwise comparison of A and B; C drops away. There is no such thing as a 'comparison of B and A with respect to a group C'.
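To make the cancellation concrete, here is a small arithmetic sketch in Python. It is not DESeq2 code; the counts are made up and the function name is hypothetical. On the log scale the ratio of ratios is (log A - log C) - (log B - log C), so the C term cancels whatever value C takes.

```python
import math

def log2_ratio_of_ratios(a, b, c):
    """log2 of (a/c) / (b/c) for positive values a, b, c (hypothetical helper)."""
    return (math.log2(a) - math.log2(c)) - (math.log2(b) - math.log2(c))

# Made-up counts for groups A and B, with three very different backgrounds C.
a, b = 500.0, 525.0
for c in (50.0, 500.0, 5000.0):
    with_c = log2_ratio_of_ratios(a, b, c)
    without_c = math.log2(a / b)
    # Both columns agree up to floating-point rounding, regardless of c.
    print(c, round(with_c, 6), round(without_c, 6))
```

The printed pairs match for every choice of C, which is the sense in which C "drops away" from the A versus B comparison.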
Hi Michael,
In comparing A and B directly, would DESeq2 identify the following as DE? For a given gene, assume counts of:
Sample A: 500 Sample B: 525 Sample C: 500
I'm trying to get an idea of the sensitivity of DESeq2 when background signal is high. Luckily I have low variation between replicates, which may aid in detection. I'm unsure whether comparing 500 vs. 525 will be flagged as DE. My original thought was to subtract out the average background counts, to get:
Sample A: 0 Sample B: 25
I would expect DESeq2 to identify this as DE (as long as the count pre-filter is set low enough).
With DESeq2 you cannot modify the counts. If you want to restrict the DE results between A and B with respect to C, you can do that by requiring that A > C and B > C with pairwise tests, but the above doesn't strike me as fitting into the DESeq2 paradigm.
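One way to see why subtracted counts are awkward for a count model is the rough sketch below. It assumes plain Poisson noise and a single sample per group, which is simpler than the negative binomial model with replicates that DESeq2 fits, so the numbers are only illustrative. Subtracting a constant leaves the sampling variance of a count unchanged, but a count model that is handed 0 and 25 would assume the much smaller variance that goes with counts of that size.

```python
import math

# Hypothetical single-sample counts from the example in this thread.
raw_a, raw_b, background = 500, 525, 500

# Effect size on the raw counts: log2(525 / 500), roughly 0.07.
log2_fc = math.log2(raw_b / raw_a)

# Under a Poisson assumption, Var(B - A) = mean_A + mean_B.
sd_diff_raw = math.sqrt(raw_a + raw_b)
z_raw = (raw_b - raw_a) / sd_diff_raw  # difference in units of its SD

# After subtracting the background the counts look like 0 and 25.
sub_a, sub_b = raw_a - background, raw_b - background

# A count model given 0 and 25 would assume Var = 0 + 25, which ignores
# the noise carried over from the original counts of about 500.
sd_diff_naive = math.sqrt(sub_a + sub_b)
z_naive = (sub_b - sub_a) / sd_diff_naive

print(round(log2_fc, 4), round(z_raw, 2), round(z_naive, 2))
```

In this toy setting the same difference of 25 is under one standard deviation on the raw scale but looks like five standard deviations after subtraction, so the subtracted counts would overstate the evidence. Replicates reduce the uncertainty on the raw scale, but they do not remove this mismatch.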
Dear Michael,
I actually had a similar issue, although not related to ChIP-Seq, but rather SLAM-seq. SLAM-seq is a metabolic labeling technique to analyze transcription, in which newly made RNA is pulse-labeled with a modified nucleotide that is later detected in RNA-seq as a T->C mismatch. The readout is the count of T->C conversions in transcripts, and these conversions have a similar distribution to read counts (as they come from reads). However, there is an intrinsic level of background noise arising from sequencing errors, unspecific labeling, and other factors (collectively: noise). I was toying with two approaches: one that would include background noise as a contrast in the design formula, and another that first estimates the level of background and then subtracts the estimated background counts from the sample counts. The latter of course is fiddling with counts, which I would gladly avoid if possible.
Consider the following time series experiment with 2 time-points (t0, t1) and 2 conditions (control, treatment). In addition, we use a separate measurement to determine the background level, and we assume it's the same for all samples. The design:
would fit the purpose, since the background is identical for all of the samples (an assumption I have to make). But neglecting the background noise is bothering me, especially since the magnitude of the noise can be determined quite precisely, the noise is significant (let's say the conversion count of every third gene is below the noise level), and even transcript-specific levels of noise can be determined and accounted for if needed (there are reasons to believe the noise is transcript-specific to some extent).
I may be utterly wrong in the following analogy, but it's a bit as if you had a count matrix, and some evil dwarf randomly added a number to each count, sampling from a (normal or other) distribution with known mean and parameters. Having no way to determine the exact counts, would you just subtract the mean noise from each count, or would you use the counts as they are, knowing that they are all affected in the same way by the added noise?
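The analogy can be played out in a small simulation. This is only a sketch under made-up assumptions: a fixed true signal of 10, normally distributed added noise with known mean 30 and standard deviation 8, and a fixed random seed. It says nothing about real SLAM-seq noise.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_SIGNAL = 10.0   # hypothetical true value behind every observation
NOISE_MEAN = 30.0    # known mean of the added noise (assumption)
NOISE_SD = 8.0       # known spread of the added noise (assumption)
N = 10_000

# Each observation is the true value plus one random draw of noise.
observed = [TRUE_SIGNAL + random.gauss(NOISE_MEAN, NOISE_SD) for _ in range(N)]

# "Correction": subtract the known mean of the noise from every value.
corrected = [x - NOISE_MEAN for x in observed]

mean_corrected = statistics.fmean(corrected)   # close to the true signal
sd_observed = statistics.stdev(observed)
sd_corrected = statistics.stdev(corrected)     # same spread as before
n_negative = sum(1 for x in corrected if x < 0)

print(round(mean_corrected, 2), round(sd_observed, 2),
      round(sd_corrected, 2), n_negative)
```

Subtracting the mean noise recovers the true signal on average, but the spread of the values is unchanged and a share of the corrected values is negative. They are also no longer integers, and DESeq2 expects a matrix of non-negative integer counts.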
Cheers, Lech
Further discussion:
[DEseq2] How to properly add a noise coefficient to the experiment design formula