
The definitions of multimodal learning and the NFL theorem are clear to me. My question is: if a model that is good at one field can be expected to perform badly in another, is there any point in pursuing multimodal models at all?

My current explanation is that a model that is good at certain fields, such as medical CV and NLP, may fail in other fields it doesn't need to be good at, such as food images and recipes. But that failure on food data is irrelevant to the goal of training, so it is still worthwhile to push multimodal learning forward. Is this reasoning correct?

2 Answers


No free lunch only says that, averaged over all possible problems, a model that is good at something must be correspondingly bad at something else.
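
A minimal sketch of what that averaging means (purely illustrative, not from the theorem's formal statement): over every possible labeling of the inputs a model has never seen, any fixed predictor achieves the same expected accuracy, no matter how clever it looks. The inputs and learners below are made up for the example.

    import itertools

    # Three inputs the training data never covered.
    unseen_inputs = [0, 1, 2]

    def learner_a(x):      # a "smart"-looking rule
        return x % 2

    def learner_b(x):      # a trivial constant predictor
        return 1

    def average_accuracy(learner, inputs):
        """Mean accuracy over every possible 0/1 labeling of `inputs`."""
        total = 0.0
        labelings = list(itertools.product([0, 1], repeat=len(inputs)))
        for labels in labelings:
            hits = sum(learner(x) == y for x, y in zip(inputs, labels))
            total += hits / len(inputs)
        return total / len(labelings)

    print(average_accuracy(learner_a, unseen_inputs))  # 0.5
    print(average_accuracy(learner_b, unseen_inputs))  # 0.5

Both learners average exactly 0.5 because, when all labelings are equally likely, doing better on some of them forces doing worse on others.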

It is alive and well in multimodal models. Each modality-specific component exhibits NFL trade-offs in order to be good at its own modality. And whatever model integrates the outputs of those components likely has its own inductive bias, which makes it good at some tasks and bad at others.

No free lunch does not preclude the existence of multimodal models, nor does the existence of multimodal models disprove NFL.

foreverska

No.

The NFL theorem is widely misunderstood and over-cited. It almost never applies to real-world problems, because real-world problems have a lot of structure. The real world is not pure chaos sampled uniformly at random from the space of all possibilities. To quote [1]:

Such a world would be hostile to inductive reasoning. The assumption that labelings are drawn uniformly ensures that training data is uninformative about unseen samples.

In contrast to this dismal outlook on machine learning, naturally occurring data involve structure that could be shared even across seemingly disparate problems. If we can design learning algorithms with inductive biases that are aligned with this structure, then we may hope to perform inference on a wide range of problems.

This can be formalized with ideas from algorithmic information theory like Kolmogorov complexity and algorithmic probability. Quoting [1] again:

While virtually all uniformly sampled datasets have high complexity, real-world problems disproportionately generate low-complexity data, and we argue that neural network models share this same preference, formalized using Kolmogorov complexity.

Notably, we show that architectures designed for a particular domain, such as computer vision, can compress datasets on a variety of seemingly unrelated domains.

Our experiments show that pre-trained and even randomly initialized language models prefer to generate low-complexity sequences.

Whereas no free lunch theorems seemingly indicate that individual problems require specialized learners, we explain how tasks that often require human intervention such as picking an appropriately sized model when labeled data is scarce or plentiful can be automated into a single learning algorithm.

These observations justify the trend in deep learning of unifying seemingly disparate problems with an increasingly small set of machine learning models.
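
A quick way to see the low-complexity claim concretely is to use compressed length as a crude upper bound on Kolmogorov complexity. The sketch below is only illustrative and is not from [1]: zlib stands in for a real complexity measure, and the "structured" string is an arbitrary repetitive example.

    import zlib
    import random

    # Crude Kolmogorov-complexity proxy: length of the zlib-compressed data.
    structured = ("the quick brown fox jumps over the lazy dog " * 50).encode()
    random.seed(0)
    uniform = bytes(random.getrandbits(8) for _ in range(len(structured)))

    print(len(structured), len(zlib.compress(structured, 9)))  # shrinks dramatically
    print(len(uniform), len(zlib.compress(uniform, 9)))        # ~no compression, may even grow

Structured data compresses to a small fraction of its raw size, while uniformly random bytes are essentially incompressible; real-world datasets sit much closer to the structured case, which is what gives learners with matching inductive biases their traction.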

References:

  1. The no free lunch theorem, Kolmogorov complexity, and the role of inductive biases in machine learning. ICML 2024 spotlight.
user76284