Is it possible to subset peak regions in a dba object with counts prior to differential analysis? In particular, I would like to just analyze regions called in one or more of a subset of samples used for differential analysis in a larger dataset. Other subsetting would be helpful as well, ideally using a logical vector the same length as the number of peak regions in the dba object.
There is a sites parameter to some of the plotting functions to limit plots to a specific subset (using a logical vector), but no direct way to limiting an analysis using a logical vector of sites.
In the specific case you mention, where sites are limited to regions that overlap in a subset of samples, there are several ways to accomplish that. For example, you can generate a new DBA object with a subset of peaks by calling dba() with a mask of peaksets and specify a minOverlap (how many of those peaksets a regions should be called in). If you wanted to use these overlapping (or consensus) peaks for an analysis involving other peaksets, you can save the consensus peakset using dba.peakset() with bRetrieve=TRUE and supply that as the value for the peaks parameter in the call to dba.count(). Likewise, you can retrieve the global peakset and subset directly using a logical vector, then supply this to dba.count() using the peaks parameter.
If you want a different subsets for multiple contrasts, you would need to create different DBA objects for each contrast.
Thanks. Could you provide clarification on how to generate a new DBA object with a subset of peaks by calling dba() with a mask of peaksets and specify a minOverlap? It seems to me that if I do something like
filteredDBA <- dba(originalDBA, mask=somekindofmask, minOverlap=2)
then somekindofmask needs to be a sample mask (length is number of samples) and not a peak mask (length is number of regions) or I get subscript out of bounds. If I use a sample mask, I get a nice subset of samples as desired, but the minOverlap has no effect, all peak regions are retained. Worse, the called object gets dropped, so I can't use that to remove regions (I assume that the called object has info about whether the region was called in a particular sample, 1 is called, 0 is not). It would be helpful if I didn't have to recount after subsetting, since that is fairly slow.
You'll have to re-count for each different consensus peakset you use.
Here's an example using the sample data provided with DiffBind. In this case, let's say we want to use a consensus peakset that includes sites called in at least half of the six samples that are not based on the MCF7 cell line:
Thanks. Could you provide clarification on how to generate a new DBA object with a subset of peaks by calling dba() with a mask of peaksets and specify a minOverlap? It seems to me that if I do something like filteredDBA <- dba(originalDBA, mask=somekindofmask, minOverlap=2) then somekindofmask needs to be a sample mask (length is number of samples) and not a peak mask (length is number of regions) or I get subscript out of bounds. If I use a sample mask, I get a nice subset of samples as desired, but the minOverlap has no effect, all peak regions are retained. Worse, the called object gets dropped, so I can't use that to remove regions (I assume that the called object has info about whether the region was called in a particular sample, 1 is called, 0 is not). It would be helpful if I didn't have to recount after subsetting, since that is fairly slow.
You'll have to re-count for each different consensus peakset you use.
Here's an example using the sample data provided with
DiffBind
. In this case, let's say we want to use a consensus peakset that includes sites called in at least half of the six samples that are not based on the MCF7 cell line: