There are many ways to create a "deep embedding"---by which I mean to project an input data point into a vector in a feature space, where this projection is learnt. To be useful, this vector should encode something semantic about the data.
In various places I've seen the embedding referred to as an unconstrained value in Euclidean space:
$$z \in \mathbb{R}^D$$
If it were truly unconstrained, training might run into issues (e.g. the embedding norms diverging), so in practice there is almost always some form of regularisation involved in learning the mapping.
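As a concrete illustration of what that regularisation often looks like in practice, here is a minimal PyTorch sketch. The encoder `f`, its dimensions, and the hyperparameters are all hypothetical placeholders, not drawn from any particular paper:

```python
import torch

# Hypothetical encoder f mapping inputs to unconstrained embeddings z in R^16.
f = torch.nn.Sequential(
    torch.nn.Linear(32, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 16),
)

# Weight decay is one common, implicit way to discourage norms from blowing up.
optimizer = torch.optim.AdamW(f.parameters(), lr=1e-3, weight_decay=1e-2)

# An alternative is an explicit penalty on the embedding norm, added to the loss:
x = torch.randn(8, 32)
z = f(x)                                     # z in R^16, unconstrained
norm_penalty = z.norm(dim=1).pow(2).mean()   # e.g. lambda * E[||z||^2]
```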
Alternatively, the embedding is L2-normalised before any further use:
$$z' = z / \|z\|_2$$
which is equivalent to saying the embedding represents data on the surface of the unit hypersphere
$$z' \in S^{D-1} = \{v \in \mathbb{R}^D : \|v\|_2 = 1\}$$
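For concreteness, here is a minimal sketch of that normalisation in PyTorch (nothing here is specific to any of the papers below; the batch and dimension sizes are arbitrary):

```python
import torch
import torch.nn.functional as F

z = torch.randn(8, 128)  # batch of unconstrained embeddings in R^128

# Explicit form z' = z / ||z||_2, with an eps guard against division by zero:
z_prime = z / z.norm(p=2, dim=1, keepdim=True).clamp_min(1e-12)

# Equivalent built-in:
z_prime = F.normalize(z, p=2, dim=1)

# Every row now lies on the unit hypersphere S^{127}:
assert torch.allclose(z_prime.norm(dim=1), torch.ones(8))
```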
Which of these is common practice in current deep learning?
In other words, is it standard practice for people to L2-normalise their embeddings?
There are various topological or geometric ways to argue for one way or the other. I'd like to understand what is common practice now. (And, if possible, why that choice is dominant.)
Related literature: in the following papers, you'll see that hyperspherical embeddings seem to be common in some corners of AI work, but not all.
- "Understanding contrastive representation learning through alignment and uniformity on the hypersphere" (ICML 2020). This paper "provides a comprehensive overview of the arguments for representation learning on the hypersphere" including "Spherical representations are associated with more stable training". The first sentence of this paper tells us: "A vast number of recent empirical works learn representations with a unit L2 norm constraint, effectively restricting the output space to the unit hypersphere"
- "NormFace: L2 Hypersphere Embedding for Face Verification" (CVPR 2017) - I think this method was influential. It gives some geometric arguments for L2 normalising values. See also FaceNet mentioned in this related question.
- "Deep Metric Learning with Spherical Embedding" (NeurIPS 2020) -- even on page 1 it's already clear that they're assuming that deep embeddings "should" lie on a hypersphere.
...and yet, see these very recent papers:
- In this ICML 2024 poster, the authors claim that "embeddings learning is mostly done in Euclidean space" and consider hyperspheres instead.
- "nGPT: Normalized Transformer with Representation Learning on the Hypersphere" (NVIDIA, 2024). Claiming some novelty in the way they use normed values throughout. -- They specifically point out that "the norms of embedding vectors in the original Transformer are unconstrained", thus I guess this is true for most transformers, as a result of the foundational work?