Hi,
I am using scater's runUMAP to calculate UMAP coordinates for my scRNA-seq data and to plot them.
I noticed that runUMAP uses uwot to calculate the UMAP, according to https://github.com/Alanocallaghan/scater/issues/76.
I found that the X1 and X2 coordinates I get with n_components = 3 are not the same as the X1 and X2 coordinates I get with n_components = 2:
```r
> set.seed(123456)
> head(uwot::umap(iris, n_components = 3))
          [,1]      [,2]     [,3]
[1,] -8.783597 -5.717427 2.266894
[2,] -9.552045 -6.461876 3.968688
[3,] -9.005832 -6.536717 3.583495
[4,] -9.136230 -6.481047 3.739561
[5,] -8.944820 -5.986025 2.252077
[6,] -9.329567 -5.138022 1.645643
> set.seed(123456)
> head(uwot::umap(iris, n_components = 2))
           [,1]       [,2]
[1,] -10.291664 -1.2414723
[2,]  -9.610963 -3.0971170
[3,] -10.281700 -2.7837709
[4,] -10.078126 -2.9112540
[5,] -10.548049 -1.3866089
[6,] -10.114601 -0.3645576
```
So my question is: if I want to show both a 2D UMAP and a 3D UMAP of the same scRNA-seq data, should I calculate the UMAP separately with n_components = 2 and n_components = 3, or should I use two of the coordinates from the n_components = 3 result when plotting the 2D UMAP?
It's to be expected that UMAP (and t-SNE) embeddings will differ greatly when created in 2D and in 3D. Unlike PCA, where the first two components of a 3-component solution are exactly the 2-component solution, calculating a 2D UMAP embedding isn't computing a truncated version of some "full" UMAP. UMAP is specifically trying to find a 2D representation that optimises its objective function (in UMAP and t-SNE, that's something like "preserve local neighbourhood structure"). When you make a 3D UMAP, you give it an extra dimension in which to find such a representation, and it's likely (nay, inevitable) that any 2D view taken from the 3D embedding will be totally different from a 2D UMAP of the same data. Therefore, displaying 2 dimensions of a 3-dimensional embedding is almost certain to omit some of the structure that was captured in the third dimension. If you want a 2D plot, compute a dedicated 2D embedding rather than dropping a column from the 3D one.
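To make the PCA contrast concrete, here is a small base-R sketch using the built-in iris data, as in the question. It only illustrates the nesting property: the first two columns of a 3-component PCA are the 2-component PCA, whereas the uwot output in the question shows no such relationship for UMAP. `rank.` is the argument of `prcomp` that limits how many components are returned; the variable names are illustrative.

```r
# Numeric columns of the built-in iris data (Species dropped).
x <- as.matrix(iris[, 1:4])

# PCA scores truncated to 2 and to 3 components.
pca_2 <- prcomp(x, rank. = 2)$x
pca_3 <- prcomp(x, rank. = 3)$x

# PCA solutions are nested: the first two columns of the
# 3-component result match the 2-component result.
print(all.equal(pca_2, pca_3[, 1:2]))
```

Running the same comparison on `uwot::umap(iris, n_components = 2)` and the first two columns of `uwot::umap(iris, n_components = 3)` gives clearly different coordinates, as the question's output already shows.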
Furthermore, I actually find this idea of creating 3+ dimensional t-SNEs and UMAPs kind of silly. The whole point of these visualisations is to produce an easier-to-interpret representation of complex, high-dimensional data, whether that means clusters or trajectories. If you make a 3+ dimensional UMAP, you've still got a difficult-to-interpret space.
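If you do still want both views, one way to follow the advice above is to compute each embedding with its own call instead of reusing columns of the 3D result. This is a minimal sketch with uwot on iris, mirroring the question; the object names are illustrative. With scater, the equivalent would presumably be two runUMAP calls with different `ncomponents` and `name` values so that both embeddings are stored in the same object, but check the scater documentation for your version.

```r
library(uwot)

# One UMAP run per target dimensionality; each gets its own seed
# so it is reproducible on its own.
set.seed(123456)
umap_2d <- umap(iris, n_components = 2)  # dedicated 2D embedding

set.seed(123456)
umap_3d <- umap(iris, n_components = 3)  # dedicated 3D embedding

# For the 2D plot, use umap_2d rather than umap_3d[, 1:2].
plot(umap_2d, col = as.integer(iris$Species), pch = 19,
     xlab = "UMAP1", ylab = "UMAP2")
```

umap_3d can then be passed to whichever 3D plotting tool you prefer; the two embeddings are not expected to agree with each other.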
Thanks, I get it :)