I'm having trouble splitting and unsplitting a DataFrame using the methods defined in IRanges. Here's an attempt at a minimal reprex.
library(IRanges)
df <- DataFrame(
a = seq_len(4L),
b = as.factor(rep(c("b", "a"), each = 2L)),
row.names = LETTERS[seq_len(4L)]
)
print(df)
DataFrame with 4 rows and 2 columns
a b
<integer> <factor>
A 1 b
B 2 b
C 3 a
D 4 a
split <- split(x = df, f = df[["b"]])
print(split)
SplitDataFrameList of length 2
$a
DataFrame with 2 rows and 2 columns
a b
<integer> <factor>
C 3 a
D 4 a
$b
DataFrame with 2 rows and 2 columns
a b
<integer> <factor>
A 1 b
B 2 b
This is all good and lets me manipulate the DataFrame by a grouping factor, similar to the approach in dplyr with group_by. However, I'm having trouble coercing the split back to a standard DataFrame via unsplit().
unlist() will coerce back to a DataFrame, but the rows come back grouped by factor level instead of in their original order, because we're not keeping track of our factor grouping:
unlist(split, use.names = FALSE)
DataFrame with 4 rows and 2 columns
a b
<integer> <factor>
C 3 a
D 4 a
A 1 b
B 2 b
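For reference, here is a minimal sketch of how the original row order could be restored by hand after unlist(), by keeping track of the grouping factor. It assumes the factor has no NA values (split() drops those rows) and that the split has not been modified in between. The names f, spl, flat, and restored are only illustrative, and this is a workaround rather than the intended unsplit() behaviour.

```r
library(IRanges)

df <- DataFrame(
    a = seq_len(4L),
    b = as.factor(rep(c("b", "a"), each = 2L)),
    row.names = LETTERS[seq_len(4L)]
)

# Keep the grouping factor around so the split can be undone later.
f <- df[["b"]]
spl <- split(x = df, f = f)

# unlist() concatenates the groups in factor-level order, so row i of
# `flat` is row order(f)[i] of the original `df`.
flat <- unlist(spl, use.names = FALSE)

# order(order(f)) is the inverse of that permutation, which puts the
# rows back in their original positions.
restored <- flat[order(order(f)), , drop = FALSE]
print(restored)
```

This relies on order() being stable, so rows within a group keep their relative order, which matches how split() fills each group.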
Neither of these approaches with unsplit() seems to work:
unsplit(split, f = df[["b"]])
## Error in unsplit(split, f = df[["b"]]) :
## Length of 'unlist(value)' must equal length of 'f'
unsplit(split, f = split[, "b"])
## Error in `splitAsList<-`(`*tmp*`, f, drop = drop, value = value) :
## Length of 'value' must equal the length of a split on 'f'
See related S4 method definition:
getMethod(
f = "unsplit",
signature = "List",
where = asNamespace("IRanges")
)
The stack() function also gets close but doesn't unsplit back to the original DataFrame unmodified: