Clearer error messages in SingleCellExperiment cbind
bharris • 0
@bharris-12711

When using cbind on two SingleCellExperiment objects whose reducedDims have different dimensions, the resulting error message makes it appear that the issue is with the 'int_colData'. The error message should instead say that the dimensions of the reducedDims are incompatible.

In reality, it probably makes most sense to remove any reducedDims from cbind-ed objects, since a space plotted by combining coordinates from two different latent spaces makes no sense.

singlecellexperiment sce scater • 1.7k views
Aaron Lun ★ 28k
@alun

The error message looks pretty clear to me:

library(SingleCellExperiment)
example(SingleCellExperiment, echo=FALSE) # generate 'sce'

sce2 <- sce
reducedDim(sce2, "PCA") <- reducedDim(sce2, "PCA")[,1,drop=FALSE]

cbind(sce, sce2)
## Error in value[[3L]](cond) :
##   failed to combine 'int_colData' in 'cbind(<SingleCellExperiment>)':
##   failed to rbind column 'reducedDims' across DataFrame objects:
##   failed to rbind column 'PCA' across DataFrame objects:
##   number of columns of matrices must match (see arg 2)

The first line about the int_colData is generated because that's how the reduced dimensions are stored internally, but the rest of the error message is pretty clear about what the problem is. Technically, we could edit the message to get rid of the int_colData line, but that would require us to abandon the auto-generated error messages that we get for free in the cbind,DataFrame-method implementation. Intercepting the errors and replacing them with custom messages would require a decent amount of work and I think the current state is informative enough.
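
For anyone who still wants the shorter message, the interception described above can be sketched in a few lines of base R. This is only an illustration: `with_clean_error` is a hypothetical helper, not part of SingleCellExperiment, and it assumes the message is split across lines exactly as in the output shown earlier, which may differ between package versions.

```r
# Hypothetical helper (not part of SingleCellExperiment): evaluates 'expr' and,
# if it fails, re-throws the error without any line mentioning 'int_colData'.
# Assumes the error message is newline-separated, as in the output above.
with_clean_error <- function(expr) {
    tryCatch(
        expr,
        error = function(e) {
            lines <- strsplit(conditionMessage(e), "\n", fixed = TRUE)[[1]]
            kept <- lines[!grepl("int_colData", lines, fixed = TRUE)]
            # Fall back to the full message if filtering would leave nothing.
            if (length(kept) == 0L) kept <- lines
            stop(paste(trimws(kept), collapse = "\n  "), call. = FALSE)
        }
    )
}

# Usage, with 'sce' and 'sce2' from the example above:
# with_clean_error(cbind(sce, sce2))
```

The wrapper only filters text, so it throws away the call information and would need updating whenever the upstream wording changes. That is part of the maintenance cost mentioned above.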

"In reality, it probably makes most sense to remove any reducedDims from cbind-ed objects, since a space plotted by combining coordinates from two different latent spaces makes no sense."

Dropping reduced dimensions would go under the definition of "surprising and unexpected behavior". cbind and other low-level operations should do what they're told - in this case, to stick objects together by column. Generally speaking, these operations should be consistent with the behavior of subsetting, so I should be able to do:

sce.first <- sce[,1:100]
sce.second <- sce[,-(1:100)]
re.sce <- cbind(sce.first, sce.second) # should be effectively the same as 'sce'.

Low-level operations should not make any judgements on the statistical/scientific sensibility, only on the coherency of the data structure. Indeed, in the case above, the reduced dimensions correspond to the same latent space, so it's eminently sensible to plot the result. More generally, I've had the need to store multiple t-SNEs for separate subsets of the same dataset; the most efficient approach is to just bind all the t-SNEs into a single reducedDim entry and inform downstream applications that the coordinates should only be plotted for one subset at a time.
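
If the reduced dimensions really do come from different latent spaces, the caller can opt in to dropping them explicitly before binding, rather than relying on cbind to do so. A minimal sketch, where `cbind_without_reddims` is a hypothetical convenience function and not something the package provides:

```r
library(SingleCellExperiment)

# Hypothetical helper: clears all reduced dimensions from each input object,
# then binds the objects by column with the usual cbind method.
cbind_without_reddims <- function(...) {
    objects <- lapply(list(...), function(x) {
        reducedDims(x) <- NULL  # removes every reducedDim entry from 'x'
        x
    })
    do.call(cbind, objects)
}

# Usage, with 'sce' and 'sce2' from the example above:
# combined <- cbind_without_reddims(sce, sce2)
# reducedDimNames(combined)
```

This keeps the decision with the analyst: cbind itself stays consistent with subsetting, and the results are discarded only when someone has judged that the coordinates are not comparable.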
