Correct way to split and unsplit a DataFrame
@mjsteinbaugh
Cambridge, MA

I'm having trouble splitting and unsplitting a DataFrame, using the methods defined in IRanges. Here's an attempt at a minimal reprex.

library(IRanges)
df <- DataFrame(
    a = seq_len(4L),
    b = as.factor(rep(c("b", "a"), each = 2L)),
    row.names = LETTERS[seq_len(4L)]
)
print(df)
## DataFrame with 4 rows and 2 columns
##           a        b
##   <integer> <factor>
## A         1        b
## B         2        b
## C         3        a
## D         4        a
split <- split(x = df, f = df[["b"]])
print(split)
## SplitDataFrameList of length 2
## $a
## DataFrame with 2 rows and 2 columns
##           a        b
##   <integer> <factor>
## C         3        a
## D         4        a
##
## $b
## DataFrame with 2 rows and 2 columns
##           a        b
##   <integer> <factor>
## A         1        b
## B         2        b

This works as expected and lets me manipulate the DataFrame by a grouping factor, similar to group_by() in dplyr. However, I'm having trouble coercing the split object back to a standard DataFrame with unsplit().

unlist() coerces back to a DataFrame, but the rows come back in group order (a, then b) rather than in their original order, because the factor grouping is no longer tracked:

unlist(split, use.names = FALSE)
## DataFrame with 4 rows and 2 columns
##           a        b
##   <integer> <factor>
## C         3        a
## D         4        a
## A         1        b
## B         2        b
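
As a possible workaround until unsplit() behaves, the original row order can be recovered from the grouping factor itself. The sketch below is not from the thread: the names `grouped`, `flat`, `idx`, and `restored` are hypothetical, and it assumes that split() keeps rows in their original order within each group, as the printed output above suggests.

```r
library(IRanges)

df <- DataFrame(
    a = seq_len(4L),
    b = as.factor(rep(c("b", "a"), each = 2L)),
    row.names = LETTERS[seq_len(4L)]
)
f <- df[["b"]]

# `grouped` is a hypothetical name, chosen to avoid masking split()
grouped <- split(x = df, f = f)
flat <- unlist(grouped, use.names = FALSE)

# Original row positions, listed in the same group order that unlist() returns
idx <- unlist(split(seq_len(nrow(df)), f), use.names = FALSE)

# Reordering by order(idx) puts each row back at its original position
restored <- flat[order(idx), , drop = FALSE]
print(restored)
```

Unlike indexing by row name, this approach does not require the row names to be unique or even present.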

Neither of these unsplit() calls works:

unsplit(split, f = df[["b"]])
## Error in unsplit(split, f = df[["b"]]) : 
##   Length of 'unlist(value)' must equal length of 'f'
unsplit(split, f = split[, "b"])
## Error in `splitAsList<-`(`*tmp*`, f, drop = drop, value = value) : 
##   Length of 'value' must equal the length of a split on 'f'
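
For comparison, here is the round trip I would expect, sketched with a base R data.frame rather than a DataFrame. This is only an illustration of the intended semantics of unsplit() with the same factor; it says nothing about how the IRanges method is implemented, and `grouped` and `restored` are hypothetical names.

```r
# Base R only: same data as above, but stored in a data.frame
df <- data.frame(
    a = seq_len(4L),
    b = factor(rep(c("b", "a"), each = 2L)),
    row.names = LETTERS[seq_len(4L)]
)

# Split by the grouping factor, then reverse the split with the same factor
grouped <- split(df, df[["b"]])
restored <- unsplit(grouped, df[["b"]])

# Rows should come back in their original positions
print(restored)
```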

See related S4 method definition:

getMethod(
    f = "unsplit",
    signature = "List",
    where = asNamespace("IRanges")
)

The stack() function also gets close, but it adds an index column and keeps the rows in group order, so it doesn't return the original DataFrame unmodified:

help(topic = "SplitDataFrameList", package = "IRanges")
stack(x = split, index.var = ".idx")
## DataFrame with 4 rows and 3 columns
##    .idx         a        b
##   <Rle> <integer> <factor>
## C     a         3        a
## D     a         4        a
## A     b         1        b
## B     b         2        b
@michael-lawrence-3846
United States

Thanks, fixed in version 2.18.2, to appear.

Perfect, thanks Michael!
