Question

scater cbind for singlecellexperiment objects problem

santos22903 • 0

Hello,

I am trying to merge two SingleCellExperiment objects using scater cbind, but it refuses to go through because the mean_counts values in rowData are different. Of course they are different, since I am merging different objects. So, does this mean that my options are either deleting rowData and then merging, or doing something like cbind(counts(object1), counts(object2)) and then making a SingleCellExperiment using the resulting matrix as counts?

Thank you.

scater SingleCellExperiment • 1.4k views

updated 5.3 years ago by Aaron Lun ★ 28k • written 5.3 years ago by santos22903 • 0

score 0 · Answer 1 · 2019-11-26

but it refuses to go through since mean_counts in rowData are different.

Yes, and generally speaking, this is the correct behavior for a core function like cbind. As a programmer, I would like these core functions to be strict with their inputs to avoid ambiguities in their outputs. (IMO, the cbind for SummarizedExperiment objects is already too forgiving; it allows the rowData() to differ, which could potentially cause other problems.)

If you think about it, the alternatives are not great:

We could arbitrarily choose one mean_counts to keep, but this would be misleading later on; the retained mean_counts wouldn't really have anything to do with being the mean counts for the combined object.
We could append something to the column name, e.g., mean_counts.1 and mean_counts.2. I find this too messy for combining multiple objects as the extended fields can pile up, e.g., what happens if you then combine the combined object with another SCE with mean_counts? Does it become mean_counts.3 (difficult to implement safely) or mean_counts.1.1?
We could drop the offending columns altogether, which is probably the safest approach that doesn't give an error. But if any downstream code depended on having mean_counts, it would take a while for users to figure out that cbind() was the culprit. The current set-up errors out immediately so as to provide a direct indication of the problem.

That said, as an analyst, I can appreciate that it would be nice to have a more relaxed combining function. I have something like this in correctExperiments(..., PARAM=NoCorrectParam()) from batchelor, which simply combines multiple SCEs together, trying to merge common column metadata fields if they are available but not stressing out if they're not. This may find a better home in a core package such as SummarizedExperiment, probably as some kind of combineByCol and combineByRow functions.
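For illustration, a minimal sketch of the correctExperiments() route on toy data. The make_sce() helper and the toy objects are hypothetical, and passing assay.type = "counts" is an assumption made because the toy objects have no log-normalized assay; the exact arguments and the layout of the returned object may differ between batchelor versions, so check ?correctExperiments for your installation.

```r
library(SingleCellExperiment)
library(batchelor)

set.seed(1)
genes <- paste0("gene", 1:5)

# Hypothetical helper: builds a toy SCE with a per-object mean_counts field.
make_sce <- function(ncells, prefix) {
    counts <- matrix(rpois(length(genes) * ncells, lambda = 5),
                     nrow = length(genes),
                     dimnames = list(genes, paste0(prefix, seq_len(ncells))))
    sce <- SingleCellExperiment(assays = list(counts = counts))
    rowData(sce)$mean_counts <- rowMeans(counts)
    sce
}
sce1 <- make_sce(3, "a")
sce2 <- make_sce(4, "b")

# NoCorrectParam() requests a plain combination without batch correction.
# assay.type = "counts" is an assumption for this toy data (see above).
# A warning about non-identical rowData fields may be emitted here.
merged <- correctExperiments(sce1, sce2,
                             assay.type = "counts",
                             PARAM = NoCorrectParam())

# All cells from both inputs should be present in the result.
ncol(merged)
```

Inspect rowData(merged) and colData(merged) afterwards to see which metadata fields survived the merge before relying on them downstream.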

So, does it mean that my options are either deleting rowData and merging

You can just get rid of the offending columns and cbind afterwards. Or you can try the correctExperiments() thing above, which will auto-delete all row data if it's not the same and combine the resulting objects.
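A minimal sketch of the first suggestion, dropping only the conflicting rowData fields before calling cbind. The make_sce() helper and the toy data are hypothetical stand-ins for the real objects, and recomputing mean_counts at the end is an optional extra step, not something cbind does for you.

```r
library(SingleCellExperiment)

set.seed(1)
genes <- paste0("gene", 1:5)

# Hypothetical helper: builds a toy SCE with a per-object mean_counts field.
make_sce <- function(ncells, prefix) {
    counts <- matrix(rpois(length(genes) * ncells, lambda = 5),
                     nrow = length(genes),
                     dimnames = list(genes, paste0(prefix, seq_len(ncells))))
    sce <- SingleCellExperiment(assays = list(counts = counts))
    rowData(sce)$mean_counts <- rowMeans(counts)
    sce
}
sce1 <- make_sce(3, "a")
sce2 <- make_sce(4, "b")

# Find rowData fields present in both objects whose values differ.
shared <- intersect(colnames(rowData(sce1)), colnames(rowData(sce2)))
conflicting <- shared[!vapply(shared, function(f) {
    identical(rowData(sce1)[[f]], rowData(sce2)[[f]])
}, logical(1))]

# Drop only those fields, then combine the objects by column.
for (f in conflicting) {
    rowData(sce1)[[f]] <- NULL
    rowData(sce2)[[f]] <- NULL
}
combined <- cbind(sce1, sce2)

# Optionally recompute the statistic so that it describes the combined object.
rowData(combined)$mean_counts <- rowMeans(counts(combined))
```

Dropping only the fields that actually conflict keeps any row annotation that is identical across the objects, such as gene symbols.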