I would like to use the DataFrame class to represent a data.frame with nested data frames: for example, a data frame that has a list of data frames as a column (one data frame per row).
    library(S4Vectors)
    df <- DataFrame(a=c(1,2,3), b=c("a","b","c"))
    df
Outputs:
    DataFrame with 3 rows and 2 columns
              a           b
      <numeric> <character>
    1         1           a
    2         2           b
    3         3           c
Now add a list of data frames as a new column of the DataFrame. These data frames may have different columns and different numbers of rows.
    df$c <- list(DataFrame(x=c(1,2)), DataFrame(x=1,y=2), DataFrame())
    df
Outputs an error:
    DataFrame with 3 rows and 3 columns
    Error in as.vector(x, mode = "character") :
      no method for coercing this S4 class to a vector
But extracting a single cell works:
    df[2, 3]
    [[1]]
    DataFrame with 1 row and 2 columns
              x         y
      <numeric> <numeric>
    1         1         2

    df[1, 3]
    [[1]]
    DataFrame with 2 rows and 1 column
              x
      <numeric>
    1         1
    2         2
However, it returns a list of one element rather than the nested DataFrame itself.
Is there a better way to work with nested data frames using Bioconductor base classes?
I wonder whether these nested-data-frame structures are really consistent with R's vectorization and end-user (including the person who creates these objects!) comprehension?
For me a more natural way to represent this (when all nested DataFrames have the same columns) would be a single data frame with one or more columns, such as df$group, describing the 'partitioning' of rows into groups. Operations on columns (e.g., 'take the log of column x') are easily vectorized (df$logx <- log(df$x)), and many group-wise operations can be efficiently implemented using the *List infrastructure (e.g., the mean of column x by group, mean(splitAsList(df$x, df$group))).

Even if the data frames have different structures, I do think that a 'tidy' data structure will in the end be more useful.
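As a minimal sketch of the flat layout described above: the table below is a hypothetical reshaping of the example from the question, where group records which original row each nested record came from, and y is NA for records whose nested data frame had no y column. The column names and values are assumptions made for illustration only.

```r
library(IRanges)  # attaches S4Vectors; provides the NumericList classes

# Hypothetical flat table: one row per nested record.
# Rows with group 1 came from the first nested data frame, group 2 from the second.
df <- DataFrame(
    group = c(1, 1, 2),
    x     = c(1, 2, 1),
    y     = c(NA, NA, 2)
)

# Column-wise operations stay vectorized
df$logx <- log(df$x)

# Group-wise operations go through the *List infrastructure
x_by_group <- splitAsList(df$x, df$group)
mean_x <- mean(x_by_group)
mean_x
```

Note that the third row of the original example (the empty nested data frame) contributes no records here, so a flat table on its own does not record that such a group exists.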
Thank you, Martin, for the comment. Actually, the nested data frames may have different shapes (number of columns and rows). The data I am working with come from web APIs (using the httr and jsonlite packages). I will update my example.