Hello Rob-
Each dot in the plot represents a consensus binding site. On the X axis, "concentration" refers to the mean (normalized) number of reads across all the samples for that binding site. This is reported as a log2 value, so each unit step from left to right represents a doubling of the overall binding affinity (read density). The dark blue area close to the origin represents sites that have very low binding affinity overall (i.e., very few ChIP-seq reads overlap those sites).
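To make the log2 scale concrete, here is a tiny sketch using made-up mean read counts. It assumes a plain log2 with no pseudocount, which may not match exactly how DiffBind computes its concentrations. It shows that doubling the mean read count moves a site exactly one unit to the right:

```python
import math

# Hypothetical mean normalized read counts for four binding sites
# (one value per site, already averaged across all samples).
mean_reads = [8, 16, 32, 64]

# X-axis "concentration": log2 of the mean read count.
# Assumption: plain log2, no pseudocount.
conc = [math.log2(r) for r in mean_reads]
print(conc)  # [3.0, 4.0, 5.0, 6.0]
```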
By default this is a "smoothed" plot. Instead of plotting each point (binding site), darker colors show areas containing many points. Lighter areas have few or no points. You can get a non-smoothed version of this plot by setting bSmooth=FALSE when calling dba.plotMA(). You can get the same data expressed as an XY plot by setting bXY=TRUE.
In your plot, the dark spot near the origin is a cluster of sites that have very low read counts and also don't change much. The main dark region shows sites with higher overall binding (larger X values) but not much change between conditions (Y close to 0). Both of the dense blue areas are shifted slightly below a fold change of 0 (Y axis), indicating a tendency to see more reads in the second sample group.
The red points are "significantly differentially bound" sites. In your plot, the absolute values of their fold changes are greater than 2, which indicates at least a 4-fold change in binding affinity because the Y axis is also on a log2 scale. The red dots on the outer diagonal lines are usually sites that have no binding in one condition and substantial binding in the other condition.
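As a quick check on the "log2 fold change of 2 means 4-fold" arithmetic, this sketch converts Y-axis values back to linear fold changes. The sign convention follows the thread: positive means more reads in the first sample group and negative means more in the second. The function name `linear_fold` is just an illustrative helper, not part of DiffBind.

```python
def linear_fold(log2_fold):
    """Return how many times more reads the favoured group has (hypothetical helper)."""
    return 2 ** abs(log2_fold)

# Positive log2 fold = more reads in group 1, negative = more in group 2.
for lf in (1, 2, -2, 3):
    direction = "group 1" if lf > 0 else "group 2"
    print(f"log2 fold {lf:+}: {linear_fold(lf):g}x more reads in {direction}")
```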
A colleague of mine calls these "fish plots", as many of them (such as yours) remind her of tropical fish.
Cheers-
Rory
Thank you Rory, very much appreciated. One further question: what causes the outer lines of points to form diagonals? Also, is there a way to isolate just those diagonal points, say in a BED file or some other output?
The diagonals form when there is no signal in one sample group but a concentration of overlapping reads in the other. In that case there is a linear relationship between the mean concentration (mean number of reads across all samples) and the fold change (essentially the mean number of reads in the sample group where the signal is present), and a linear relationship makes a diagonal line. Because both axes are on a log2 scale, each doubling of reads in the occupied group moves a site one unit to the right and one unit up (or down), giving lines with a slope of 1 (or -1).
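Here is a small numerical sketch of why the points line up. It makes two simplifying assumptions that may differ from DiffBind's exact calculation: the two sample groups have equal sizes, and a pseudocount of 1 is added so that log2(0) is defined. With zero reads in group B, the fold change is roughly log2 of group A's reads. The overall mean is half of group A's reads, so the concentration sits about one log2 unit lower. The gap between fold and concentration therefore stays near 1, which is a straight line with slope 1.

```python
import math

def ma_coords(mean_a, mean_b, pseudocount=1.0):
    """Return (concentration, log2 fold change) for one site.

    Assumes equal-sized groups (so the overall mean is the average of the
    two group means) and a pseudocount so that log2(0) is defined.
    """
    conc = math.log2((mean_a + mean_b) / 2 + pseudocount)
    fold = math.log2(mean_a + pseudocount) - math.log2(mean_b + pseudocount)
    return conc, fold

# Sites with signal only in group A (group B has no reads at all).
for reads in (100, 1000, 10000):
    conc, fold = ma_coords(reads, 0)
    # fold - conc stays close to 1, so these sites fall on a slope-1 line.
    print(f"conc={conc:.2f}  fold={fold:.2f}  fold-conc={fold - conc:.2f}")
```

Swapping the arguments (signal only in group B) flips the sign of the fold change, which produces the mirror-image lower diagonal.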
If you get the results as a report using dba.report(), you can find the sites where the concentration in one sample group is very low (e.g., < 2) but the absolute value of the log fold change is > 2.
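The filter described above can be sketched as follows. This is a language-neutral illustration in Python on a few made-up rows; in R you would apply the same condition to the result of dba.report(). Several names are placeholders. The per-group concentration columns `Conc_A` and `Conc_B` stand in for columns that, I believe, are named after your own sample groups in the report, so check your actual column names. The helpers `diagonal_sites` and `to_bed_lines` are hypothetical, and the BED conversion assumes 1-based input coordinates.

```python
# Hypothetical rows mimicking a differential binding report: one dict per
# site with per-group log2 concentrations and the log2 fold change.
report = [
    {"chr": "chr1", "start": 1001, "end": 1500, "Conc_A": 7.1, "Conc_B": 0.5, "Fold": 6.6},
    {"chr": "chr1", "start": 5001, "end": 5400, "Conc_A": 6.0, "Conc_B": 5.8, "Fold": 0.2},
    {"chr": "chr2", "start": 2001, "end": 2600, "Conc_A": 1.2, "Conc_B": 5.9, "Fold": -4.7},
]

def diagonal_sites(rows, max_low_conc=2.0, min_abs_fold=2.0):
    """Keep sites with very low signal in one group but a large fold change."""
    return [
        r for r in rows
        if min(r["Conc_A"], r["Conc_B"]) < max_low_conc and abs(r["Fold"]) > min_abs_fold
    ]

def to_bed_lines(rows):
    """Format sites as BED lines (BED starts are 0-based, so subtract 1)."""
    return [f'{r["chr"]}\t{r["start"] - 1}\t{r["end"]}' for r in rows]

for line in to_bed_lines(diagonal_sites(report)):
    print(line)
```

The thresholds (2 for the low concentration, 2 for the absolute log2 fold) are the example values from the reply above. You may want to adjust them after looking at where the diagonals sit in your own plot.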