Hi Vince,
Sounds like maybe you have a use case for a pintersect method for
DNAStringSetList objects:
setMethod("pintersect", c("DNAStringSetList", "DNAStringSetList"),
function(x, y, ...)
{
if (length(x) != length(y))
stop("'x' and 'y' must have the same length")
ux <- unlist(x, use.name=FALSE)
uy <- unlist(y, use.name=FALSE)
string_id <- selfmatch(c(ux, uy))
ux_id <- string_id[seq_along(ux)]
uy_id <- string_id[seq_along(uy) + length(ux)]
ux_group <- rep.int(seq_along(x), elementLengths(x))
uy_group <- rep.int(seq_along(y), elementLengths(y))
m <- IRanges:::matchIntegerPairs(ux_group, ux_id, uy_group,
uy_id)
keep_idx <- which(!is.na(m))
ux <- ux[keep_idx]
ux_group <- ux_group[keep_idx]
ux_id <- ux_id[keep_idx]
sm <- IRanges:::selfmatchIntegerPairs(ux_group, ux_id)
keep_idx <- sm == seq_along(sm)
ux <- ux[keep_idx]
ux_group <- ux_group[keep_idx]
ans_skeleton <- PartitioningByEnd(ux_group, NG=length(x))
relist(ux, ans_skeleton)
}
)
Then:
> alleles1 <- DNAStringSetList("A", c("C", "A"), c("G", "A", "T"),
c("T", "G"))
> alleles2 <- DNAStringSetList(c("T", "A", "G"), c("A", "G"), "C",
c("G", "T"))
> pintersect(alleles1, alleles2)
DNAStringSetList of length 4
[[1]] A
[[2]] A
[[3]] A DNAStringSet instance of length 0
[[4]] T G
Should take about 20 sec. on your 12-million long vectors.
Then use elementLengths() on this result to get the lengths of the
intersecting sets.
HTH,
H.
On 10/15/2013 04:16 PM, Vince S. Buffalo wrote:
> Hi All,
>
> I have two vectors of alleles stored as DNAStringSetLists. For each
element
> in both lists, I need to find the length of the intersecting set.
Using
> mapply() and intersect() take too long, as does sapply(dna.set.list,
> as.character) (and then using mclapply or lapply to find intersect
on
> characters). Is there a fast way to do this? I have vectors ~12
million
> rows long.
>
> Vince
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319