Question

Reproducibility of MultiDataSet Comparison

Dario Strbenac ★ 1.6k

@dario-strbenac-5916

Australia

In the journal article explaining MultiDataSet, it is claimed that MultiDataSet is better than the competing MultiAssayExperiment because "... MultiAssayExperiment is being developed by a large number of contributors making package improvements (to MultiAssayExperiment) slow and tedious." I wrote to the maintainer of MultiDataSet with a couple of feature requests last month and didn't get any reply about them. Can the article's claim that MultiDataSet maintainers are more methodical than MultiAssayExperiment maintainers be reproduced?

MultiDataSet MultiAssayExperiment • 2.0k views

Updated 7.6 years ago by Levi Waldron ★ 1.1k • written 7.6 years ago by Dario Strbenac ★ 1.6k

Dario, we apologize for not having replied to your requests so far. The maintainer was on holiday. He has read your questions today, and we will answer them in the coming days.

I'd like to take advantage of your post (and Levi's) to clarify that our manuscript does not claim that MultiDataSet (MDS) is "better" than MultiAssayExperiment (MAE) because MAE is being developed by a large number of contributors. We only pointed out that having many contributors can be a limitation when developing a package. In any case, we are happy to see that MAE development has improved since we wrote that.

If you are interested in how MDS and MAE compare with regard to loading time, memory usage, … we can write a post about that.

Written 7.6 years ago by juanr.gonzalez • 0

score 4 · Answer 1 · 2017-09-08

I can't comment on MultiDataSet, but I can say that MultiAssayExperiment was designed by a large number of contributors over more than two years. It was really challenging to design something that would be extensible to an open-ended set of contained data classes (including on-disk and remote data representations) for developers working independently without changing the MultiAssayExperiment code base, while still making it easy for end users to do subsetting (select rows, columns, assays) and reshaping to formats compatible with most existing software (merge assays and specimen information into long or wide-format data frames or lists of matrices). However, one developer, Marcel Ramos, is responsible for almost all of the code base, and each time the group agreed on something, it did not take him long to implement it!

score 4 · Answer 2 · 2017-09-08

Also, I'm quite proud of the internal graph representation of the specimen:assay relationships (sampleMap(MAE)). It's a novel approach to maintaining consistency between specimens and any number of assays with replicates, missing data, and arbitrary per-assay naming conventions, while still allowing harmonized subsetting and single-command integration of assays into long and wide-format data frames. In simple multi-assay experiments you don't have to think about it; in more complex experiments you just have to create a three-column data frame that matches column names in each assay to row names of the biological specimen data.
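
To illustrate the idea of a three-column sample map, here is a minimal sketch in plain Python. It is not the MultiAssayExperiment API: the toy data, the column order (assay, specimen, column name), and the helper functions `subset_by_specimen` and `to_long` are all hypothetical and chosen only for illustration. It shows how one small mapping table can handle replicates, missing data, and per-assay naming conventions while still supporting a single subsetting step and a long-format merge.

```python
# Hypothetical toy data; none of these names come from MultiAssayExperiment.

# Each assay uses its own column naming convention.
# Layout: assay name -> column name -> feature -> measured value
assays = {
    "rnaseq": {
        "rna_A": {"TP53": 5.1, "BRCA1": 2.3},
        "rna_B1": {"TP53": 4.0, "BRCA1": 3.3},
        "rna_B2": {"TP53": 4.2, "BRCA1": 3.1},  # replicate for patient2
    },
    "methyl": {
        "m-001": {"cg0001": 0.80},
        "m-003": {"cg0001": 0.15},  # patient2 has no methylation data
    },
}

# Three-column sample map: (assay, specimen ID, column name in that assay).
sample_map = [
    ("rnaseq", "patient1", "rna_A"),
    ("rnaseq", "patient2", "rna_B1"),
    ("rnaseq", "patient2", "rna_B2"),
    ("methyl", "patient1", "m-001"),
    ("methyl", "patient3", "m-003"),
]


def subset_by_specimen(sample_map, keep):
    """Keep only the map rows whose specimen ID is in `keep`."""
    keep = set(keep)
    return [row for row in sample_map if row[1] in keep]


def to_long(assays, sample_map):
    """Flatten the mapped assay columns into long-format records."""
    records = []
    for assay, specimen, column in sample_map:
        for feature, value in assays[assay][column].items():
            records.append((assay, specimen, column, feature, value))
    return records


# One subsetting step applies to every assay at once via the map.
selected = subset_by_specimen(sample_map, ["patient2", "patient3"])
long_rows = to_long(assays, selected)
for row in long_rows:
    print(row)
```

Because subsetting acts on the map rather than on each assay separately, the assays never need to share column names or have the same number of columns per specimen.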