The No Free Lunch Theorems

For Supervised Learning, Optimization, & Search








This article summarizes several key points about the no free lunch (NFL) theorems.

These are a set of mathematical results about the efficacy of methods in search, optimization, and supervised learning, with particular relevance to metaheuristics such as evolutionary and nature-inspired swarm algorithms. Do they really “work”? What does it even mean to say that they “work”?




The Papers and What They Say

The no free lunch theorems are frequently brought up in online discussions, but often without any depth or mention of the nuances, which is part of my motivation for writing this article. These discussions tend to cite the NFL theorems as saying “there is no best algorithm” whenever someone asks if one algorithm is better than another. This is, of course, too simplistic a description. What I am hoping to do here is to go just one step further and summarize the key points of each paper.

Another reason for writing this article is that the authors themselves single out the existence of free lunches in coevolutionary algorithms.

Before we dig into the materials, I would like to point out that the authors of these theorems have a website dedicated to this topic. However, do note that some of their links are broken at the time of writing.




Supervised Learning

The very first paper on the NFL theorems was for supervised learning:

This paper is infamous for being “often cited but rarely read” because it consists of 52 pages and appears to be a rather difficult read. I must admit that I have not read it in its entirety either. In fact, the author wrote a version much later that is a lot easier to read:

The author introduced a new “Extended Bayesian Formalism” framework and analyzed the expected error or cost \( \mathbb{E}(C | d) \) of a supervised learning algorithm, given training dataset \( d \).

Key Points

In English, this means that if we do not know anything about the objective function \( f \) that we are trying to learn, then every candidate function is equally likely, and any two algorithms will have the same expected error on unseen data points.
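This averaging argument can be checked exhaustively on a toy problem. The sketch below is my own construction, not from the paper: it enumerates every Boolean target function on a three-point domain, trains two deliberately different learners on two of the points, and measures the error on the held-out third point, averaged over all targets.

```python
from itertools import product

X = [0, 1, 2]          # input domain
train_pts = [0, 1]     # training inputs
test_pt = 2            # off-training-set input

def learner_zero(train):
    """Always predicts 0 on unseen points."""
    return 0

def learner_majority(train):
    """Predicts the majority training label (ties go to 1)."""
    labels = [y for _, y in train]
    return int(sum(labels) >= len(labels) / 2)

def avg_ots_error(learner):
    """Off-training-set error averaged over all 2^3 target functions."""
    errors = []
    for f in product([0, 1], repeat=len(X)):
        train = [(x, f[x]) for x in train_pts]
        pred = learner(train)
        errors.append(int(pred != f[test_pt]))
    return sum(errors) / len(errors)

print(avg_ots_error(learner_zero))      # 0.5
print(avg_ots_error(learner_majority))  # 0.5
```

Both learners average exactly 0.5, i.e. coin-flip performance: the held-out label is independent of the training data when all targets are weighted equally, so no learner can do better on average.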




Search and Optimization

Two more papers were published in 1996 containing NFL theorems for search and optimization:

I have not read the one on search algorithms, but the reasoning should be similar to that for optimization, which can be viewed as searching for the maximum or minimum of an objective function.

Key Points

In English, this means that any two algorithms will have the same probability of generating any particular outcome when averaged over all possible objective functions!
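This, too, can be verified by brute force on a small search space. In the toy sketch below (all names are mine, purely illustrative), two deterministic search algorithms each visit points in a different fixed order; averaging over every possible objective function, the distribution of the best value found after a fixed budget is identical.

```python
from itertools import product
from collections import Counter

domain = [0, 1, 2, 3]   # search space
values = [0, 1, 2]      # possible objective values
m = 2                   # evaluation budget

def run(order, f):
    """Best value seen after evaluating f at the first m points of a fixed order."""
    return max(f[x] for x in order[:m])

def best_value_histogram(order):
    """Distribution of best-found value over all |values|^|domain| functions."""
    return Counter(run(order, f) for f in product(values, repeat=len(domain)))

alg_a = [0, 1, 2, 3]    # left-to-right sweep
alg_b = [3, 1, 0, 2]    # some other fixed visiting order

print(best_value_histogram(alg_a))
print(best_value_histogram(alg_b))   # identical histograms
```

The two histograms match exactly: over all functions, any set of m distinct query points sees the same distribution of values, so the visiting order cannot matter on average.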




Coevolutionary Free Lunch

Finally, the authors wrote a paper much later showing the existence of free lunches for coevolutionary algorithms:

I was very excited to read this at first, but it turns out that this result has a massive caveat and is not the discovery of an actual free lunch.

Key Points

In English, this means that for algorithms whose payoffs depend on the opponent’s response, some algorithms can do better than others. Trivially, an algorithm that chooses to ignore the opponent’s moves will perform worse than those that try to get information from the opponent’s moves.
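The “opponent is buying” intuition can be made concrete with a toy repeated game; the sketch below is entirely my own construction, not from the paper. Against a predictable opponent, a player that reacts to the opponent’s observed moves outscores one that ignores them.

```python
# Toy matched game: the payoff depends on the opponent's move, so
# observing the opponent is a source of free information.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[theirs] == mine else -1

def play(strategy, rounds=10):
    """Total payoff of a strategy against an opponent that always plays rock."""
    history, total = [], 0
    for _ in range(rounds):
        opp = "rock"                 # a predictable opponent
        total += payoff(strategy(history), opp)
        history.append(opp)
    return total

ignores = lambda history: "rock"     # ignores the opponent entirely
adapts = lambda history: BEATS[history[-1]] if history else "rock"

print(play(ignores), play(adapts))   # 0 9
```

The ignoring strategy only ever ties, while the adaptive one wins every round after the first. This is the trivial case described above: when the payoff depends on the opponent’s response, exploiting that response is a genuine advantage.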

This Stack Exchange discussion has a very nice one-line summary of why free lunches exist in this setting:

“Lunches are free because the opponent is buying.”




When Can One Algorithm Outperform Another?

The NFL theorems say that the performance of all algorithms is the same when averaged over all possible objective functions. However, it is still possible for one algorithm to beat another in several ways.
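One such way is to restrict the class of objective functions: once we stop averaging over all possible functions, an algorithm that exploits the structure of the class can genuinely win. The toy sketch below (my own construction) restricts attention to monotone nondecreasing functions; an algorithm that samples the right end first beats a structure-blind left-to-right scan.

```python
from itertools import product

# All monotone nondecreasing functions on {0,1,2,3} with values in {0,1,2}
monotone = [f for f in product([0, 1, 2], repeat=4)
            if all(f[i] <= f[i + 1] for i in range(3))]

ends_first = [3, 0, 1, 2]    # exploits monotonicity: sample the right end first
front_first = [0, 1, 2, 3]   # structure-blind left-to-right scan

def avg_best_after_one(order):
    """Average best value found after a single evaluation, over the class."""
    return sum(f[order[0]] for f in monotone) / len(monotone)

print(avg_best_after_one(ends_first))   # 1.6
print(avg_best_after_one(front_first))  # 0.4
```

On the unrestricted set of all functions these two orderings would tie, exactly as the NFL theorems dictate; the advantage appears only because the restricted class carries information the first algorithm exploits.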




Follow-Up Research

There are also numerous directions for future research.

Here is a review paper not written by the original authors, but one that I find very useful to keep up with the various developments in this area: