2

Most self-supervised learning methods (SimCLR, MoCo, BYOL, SimSiam, SwAV, MS BYOL, etc.) project their extracted features (after the encoder + projection/prediction head) onto a unit hypersphere (an n-sphere), and the loss is then computed on these normalized features. Papers such as:

  • Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere, Tongzhou Wang et al.; ICML 2020
  • Align Representations with Base: A New Approach to Self-Supervised Learning, Shaofeng Zhang et al.; CVPR 2022
  • Rethinking the Uniformity Metric in Self-Supervised Learning, Xianghong Fang et al.; ICLR 2024

and others show that, for each class, these features are spread over the n-sphere. What are the different ways to measure the distribution of these embedded features on the hypersphere? Say, if I were to randomly choose a class from the ImageNet or CIFAR-100 dataset, how could I measure the distribution of all images belonging to that class on the n-sphere?

Arun

1 Answer

2

Measuring the distribution of features on an n-sphere for a specific class can provide insight into how well a self-supervised learning method captures class-specific structure. The simplest such measures come from directional statistics (spherical statistics), such as the circular mean and the circular variance.

Directional statistics (also circular statistics or spherical statistics) is the subdiscipline of statistics that deals with directions, axes, or rotations in $\mathbb{R}^n$. More generally, directional statistics deals with observations on compact Riemannian manifolds including the Stiefel manifold... The most common measure of location is the circular mean... The most common measures of circular spread are: The circular variance... circular dispersion
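These basic directional statistics are straightforward to compute once the features are L2-normalized. A minimal sketch, where `feats` is assumed to be an (N, d) array of unit-norm feature vectors for one class (the variable names here are illustrative):

```python
import numpy as np

def spherical_stats(feats):
    """Mean direction and circular variance of unit vectors on the n-sphere."""
    resultant = feats.sum(axis=0)                    # resultant vector
    R = np.linalg.norm(resultant) / len(feats)       # mean resultant length in [0, 1]
    mean_direction = resultant / np.linalg.norm(resultant)
    circular_variance = 1.0 - R                      # 0 = fully concentrated, 1 = fully spread
    return mean_direction, circular_variance

# A tightly clustered class should give a circular variance near 0.
rng = np.random.default_rng(0)
tight = rng.normal([1.0, 0.0, 0.0], 0.01, size=(100, 3))
tight /= np.linalg.norm(tight, axis=1, keepdims=True)
mu, var = spherical_stats(tight)
```

A class whose features are scattered over the whole sphere would instead give a circular variance close to 1.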

For the three papers you mention: the ICML 2020 contrastive representation learning paper introduces two key measures, alignment and uniformity. Alignment measures how close the features of positive pairs (e.g., augmentations of the same image) are on the hypersphere; a lower alignment loss means positive pairs lie closer together. Uniformity measures how evenly the features are spread across the hypersphere; a lower uniformity loss means the features are more uniformly distributed.
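Using the definitions from that paper (alignment as the mean powered distance between positive pairs, uniformity as the log of the average Gaussian potential over all pairs), a sketch could look like the following, where `x` and `y` are assumed to be (N, d) arrays of L2-normalized features for N positive pairs:

```python
import numpy as np

def alignment(x, y, alpha=2):
    # Mean distance between positive pairs; lower = pairs closer on the sphere.
    return float(np.mean(np.linalg.norm(x - y, axis=1) ** alpha))

def uniformity(x, t=2):
    # Log of the average Gaussian potential over all distinct pairs;
    # lower (more negative) = features more evenly spread on the sphere.
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(x), k=1)
    return float(np.log(np.mean(np.exp(-t * sq_dists[iu]))))
```

Identical positive pairs give an alignment of exactly 0, and a tightly clustered set of features gives a uniformity loss near 0 (i.e., much higher than that of features spread over the sphere).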

The CVPR 2022 paper extends the alignment idea with a Base Alignment metric: it measures how well features align with a pre-defined base vector (a reference direction), encouraging features to stay close to that direction so the representation remains robust under data transformations and augmentations.
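As an illustration only (the paper's exact formulation may differ), one natural way to score alignment with a fixed base direction is the mean cosine similarity between the class features and that base vector; `base` here is a hypothetical reference direction:

```python
import numpy as np

def base_alignment(feats, base):
    # Mean cosine similarity of L2-normalized features to the base direction;
    # higher = features stay closer to the base vector.
    base = base / np.linalg.norm(base)
    return float(np.mean(feats @ base))
```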

The ICLR 2024 paper critiques the uniformity metric from the ICML 2020 paper and refines it into a Normalized Uniformity metric that accounts for issues such as class-specific variations in feature magnitude. This metric checks that features are not only evenly distributed but also consistent in scale.

There are also intra-class compactness and inter-class separability measures. Intra-class compactness computes the average pairwise distance between features within the same class, measuring how tightly a class's features cluster on the hypersphere. Inter-class separability computes the distance between the circular mean vectors of different classes.
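A sketch of both measures using angular (geodesic) distance on the sphere, assuming each class is given as an (N, d) array of L2-normalized features (variable names are illustrative):

```python
import numpy as np

def intra_class_compactness(feats):
    # Average pairwise angular distance within one class; smaller = tighter cluster.
    cos = np.clip(feats @ feats.T, -1.0, 1.0)
    iu = np.triu_indices(len(feats), k=1)
    return float(np.mean(np.arccos(cos[iu])))

def inter_class_separability(feats_a, feats_b):
    # Angular distance between the two classes' mean directions; larger = better separated.
    mu_a = feats_a.mean(axis=0)
    mu_a /= np.linalg.norm(mu_a)
    mu_b = feats_b.mean(axis=0)
    mu_b /= np.linalg.norm(mu_b)
    return float(np.arccos(np.clip(mu_a @ mu_b, -1.0, 1.0)))
```

For example, two tight clusters centered on orthogonal directions should give a small compactness per class and a separability near $\pi/2$.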

Finally, among entropy-based measures, spherical entropy computes the entropy of a class's feature distribution on the hypersphere: lower entropy suggests tighter clustering, while higher entropy indicates a more uniform spread.
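Spherical entropy can be estimated in several ways; one simple proxy (an illustrative choice, not a formula from a specific paper) is the Shannon entropy of a histogram of angles between each feature and the class mean direction:

```python
import numpy as np

def spherical_entropy(feats, bins=18):
    # Shannon entropy (in nats) of the angle-to-mean-direction histogram;
    # near 0 = tightly clustered class, higher = features more spread out.
    mu = feats.mean(axis=0)
    mu /= np.linalg.norm(mu)
    angles = np.arccos(np.clip(feats @ mu, -1.0, 1.0))   # angles in [0, pi]
    hist, _ = np.histogram(angles, bins=bins, range=(0.0, np.pi))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))
```

A tight cluster puts nearly all angles in one bin (entropy near 0), while features scattered over the sphere occupy many bins (higher entropy).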

cinch