Easy way to convert CharacterList to character, collapsing each element?
@herve-pages-1542
Seattle, WA, United States
Hi Ryan, Michael,

OK for unstrsplit. The generic is now in IRanges 1.21.18 (devel), with methods for ordinary lists and CharacterList objects. There is also a method in Biostrings 2.31.6 (devel) for XStringSetList objects. See ?unstrsplit.

Cheers,
H.

On 12/16/2013 06:51 PM, Michael Lawrence wrote:
> Btw, the name strunsplit is way better than my pasteCollapse. Maybe
> tweak it to unstrsplit? Feels more like a verb.
>
> On Mon, Dec 16, 2013 at 4:16 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
> > Hi Ryan,
> >
> > Here is one way to do this using Biostrings:
> >
> >   library(Biostrings)
> >
> >   strunsplit <- function(x, sep=",")
> >   {
> >       if (!is(x, "XStringSetList"))
> >           x <- Biostrings:::XStringSetList("B", x)
> >       if (!isSingleString(sep))
> >           stop("'sep' must be a single character string")
> >
> >       ## unlist twice.
> >       unlisted_x <- unlist(x, use.names=FALSE)
> >       unlisted_ans0 <- unlist(unlisted_x, use.names=FALSE)
> >
> >       ## insert 'sep'.
> >       unlisted_x_width <- width(unlisted_x)
> >       x_partitioning <- PartitioningByEnd(x)
> >       at <- cumsum(unlisted_x_width)[-end(x_partitioning)] + 1L
> >       unlisted_ans <- replaceAt(unlisted_ans0, at, value=sep)
> >
> >       ## relist.
> >       ans_width <- sum(relist(unlisted_x_width, x_partitioning))
> >       x_eltlens <- width(x_partitioning)
> >       idx <- which(x_eltlens >= 2L)
> >       ans_width[idx] <- ans_width[idx] + (x_eltlens[idx] - 1L) * nchar(sep)
> >       relist(unlisted_ans, PartitioningByWidth(ans_width))
> >   }
> >
> > Then:
> >
> >   > x <- CharacterList(A=c("id35", "id2", "id18"), B=NULL, C="id4",
> >   +                    D=c("id2", "id4"))
> >   > strunsplit(x)
> >     A BStringSet instance of length 4
> >       width seq              names
> >   [1]    13 id35,id2,id18    A
> >   [2]     0                  B
> >   [3]     3 id4              C
> >   [4]     7 id2,id4          D
> >
> > I'll add this to Biostrings.
> >
> > Cheers,
> > H.
> >
> > On 12/16/2013 03:04 PM, Ryan C. Thompson wrote:
> > > Hi all,
> > >
> > > I have some annotation data in a DataFrame, and, of course, since
> > > annotations are not one-to-one, some of the columns are CharacterList
> > > or similar classes. I would like to know if there is an efficient way
> > > to collapse a CharacterList to a character vector of the same length,
> > > such that elements of length > 1 are collapsed with a given separator.
> > > The following is what I came up with, but it is very slow for large
> > > CharacterLists:
> > >
> > >   library(stringr)
> > >   library(plyr)
> > >   flatten.CharacterList <- function(x, sep=",") {
> > >       if (is.list(x)) {
> > >           x[!is.na(x)] <- laply(x[!is.na(x)], str_c, collapse=sep,
> > >                                 .parallel=TRUE)
> > >           x <- as(x, "character")
> > >       }
> > >       x
> > >   }
> > >
> > > -Ryan
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives:
> > > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpages at fhcrc.org
> > Phone: (206) 667-5791
> > Fax: (206) 667-1319
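The unlist / insert the separator / relist idea used in strunsplit can also be sketched in base R for an ordinary list of character vectors, which is handy for checking results without the devel packages. This is a minimal sketch under stated assumptions, not the IRanges or Biostrings implementation: the name unstrsplit_base is hypothetical, it assumes x is a plain R list (a CharacterList would first need as.list(x)), and NA strings are collapsed as the literal "NA", as paste() does.

```r
# Hypothetical base-R helper (not part of IRanges/Biostrings): collapse each
# element of an ordinary list of character vectors into one string, joining
# its pieces with 'sep'. Mirrors the unlist / insert 'sep' / relist idea.
unstrsplit_base <- function(x, sep = ",") {
    stopifnot(is.list(x), is.character(sep), length(sep) == 1L, !is.na(sep))
    if (length(x) == 0L)
        return(character(0))
    eltlens <- lengths(x)
    flat <- as.character(unlist(x, use.names = FALSE))
    flat[is.na(flat)] <- "NA"  # paste() also renders NA as "NA"

    ## Insert 'sep' after every piece except the last piece of each element.
    suffix <- rep.int(sep, length(flat))
    suffix[cumsum(eltlens)[eltlens > 0L]] <- ""
    big <- paste0(flat, suffix, collapse = "")

    ## Relist: character offsets in 'big' where each element ends and starts.
    piece_end <- c(0L, cumsum(nchar(flat) + nchar(suffix)))
    elt_end <- piece_end[cumsum(eltlens) + 1L]
    elt_start <- c(0L, elt_end)[seq_along(elt_end)] + 1L

    ## Empty elements have start > end, which substring() turns into "".
    ans <- substring(big, elt_start, elt_end)
    names(ans) <- names(x)
    ans
}
```

On a plain-list version of the example data, list(A=c("id35", "id2", "id18"), B=NULL, C="id4", D=c("id2", "id4")), this returns c(A="id35,id2,id18", B="", C="id4", D="id2,id4"), the same strings strunsplit prints. For small inputs, vapply(x, paste, character(1), collapse=sep) is simpler; the sketch instead does a single paste0(..., collapse="") and one vectorized substring() call rather than one paste() call per element.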
Annotation Cancer Biostrings IRanges