The Axiom of Choice
A hopefully less daunting, and somewhat technical walkthrough.
The axiom of choice is a fundamental mathematical assumption that has far reaching, and counterintuitive consequences. It is also a source of much philosophical debate and kerfuffle over the foundations of mathematics.
Despite this, I have never really come across what I would consider a clear and satisfactory explanation. This is even after doing a PhD in mathematics! On the other hand, there are many videos and articles, aimed at a less technical audience. But I feel that they tend to skip over too much.
Like the subtitle implies, I am going to try and provide an explanation that is as complete as possible, with as little mathematics as possible. For readers who are interested in the full gory technical details, the Stanford Encyclopedia of Philosophy has a great write up.
Set Theory and Russell’s Paradox
The axiom of choice is an axiom in set theory. This means that we need to talk about set theory and define what a set is before we can proceed.
Axiomatic set theory is often regarded as the standard foundation of mathematics. This is a list of rules that mathematicians use when working with objects call “sets”. All mathematical objects are sets, and these objects can be gathered together to form new sets. For example, each whole number can be viewed as a set, and we can gather all the even ones together, to form the set of even whole numbers.
Mathematicians were working with sets before they were axiomatized by Zermelo and Fraenkel. However, it was discovered that if we did not have a list of rules that carefully state what is allowed to be a set, we run into problems like the famous Russell’s Paradox. The paradox goes like this:

Let $ P $ be defined as the set of all sets that do not contain itself.

If $ P $ does not contain itself, then it fits the definition above and should be in $ P $ .

However, if $ P $ contains itself, the definition would be violated. So it cannot contain itself.

So, we are stuck with the unpleasant situation where $ P $ cannot contain itself, yet must also contain itself.
This is very similar to the sentence “this statement is false”. If the sentence is true, then it must be false. If it is false, then it must be true!
Choice Functions
Now that we have the concept of a set, we can go on to talk about choice functions. These choice functions are what the axiom is referring to. To define what a choice function is, we first start with a collection $ X $ of nonempty sets
Let $ \bigcup X $ be the union of all sets in collection $ X $ . Then, a choice function on $ X $ is a mapping $ f : X \rightarrow \bigcup X $ such that,
Intuitively, given a set $ X_i $ , the choice function $ f $ chooses an element $ f(X_i) $ from the set $ X_i $ .
The Axiom of Choice
Now we can formally state the axiom of choice.
I find that the axiom is much easier to explain with an equivalent formulation of the axiom, via the cartesian products of sets.
Cartesian Product of Sets
We define the Cartesian product of two sets $ X_1 $ and $ X_2 $ as
Elements of this product can be represented by 2tuples. Similarly, for an arbitrary number of sets $ \{ X_i \, \,  \, \, i \in I \} $ , their Cartesian product is
Elements of this more general definition are functions, and not just tuples.
The Axiom of Choice Again
With the definitions out of the way, we can state the equivalent version of the axiom of choice using cartesian products.
Now, let $ X = \{ X_i \, \mid \, i \in I \} $ for some index set $ I $ . Then, the cartesian product of the sets in $ X $ is
If this cartesian product is nonempty, then we can find at least one function $ f : I \rightarrow \bigcup_{i \in I} X_i $ , such that for all input values $ i $ , the output $ f(i) $ is an element in $ X_i $ .
This is essentially a way to “choose” an element from each $ X_i $ . In fact, this version of the axiom can be shown to be equivalent to the previous one.
The Crux of the Problem
Axiomatic set theory allows us to have nonempty sets that contain elements. Given one of these nonempty set $ X_i $ , we can always “choose” an element from it with these steps:
 The definition of a nonempty set means that such that $ \exists x $ such that $ x \in X_i $ .
 The rule of existential instantiation allows us to refer to an element $ y $ satisfying $ y \in X $ , without actually knowing what $ y $ is or how we got it.
 A choice function defined on $ X $ is then given by $ f(X) = y $ .
However, suppose that instead of a single nonempty set, we have a collection containing a countably infinite number of sets $ X = (X_1, X_2, ...) $ . We can attempt to apply the same reasoning to choose an element from $ X_1 $ . Then repeat to choose an element from $ X_2$ and so on. We can keep repeating until we reach some set $ X_n $ . This allows us to explicitly construct an element of $ X_1 \times X_2 \times ... \times X_n $ , and hence demonstrate that it is nonempty.
But here lies the crux of the problem: while we can do this for any finite $ n $ , there is no way to do this for all of the infinite number of sets in $ X $ . This is the crucial reason why the axiom of choice has to be an axiom.
On the other hand, assuming that the axiom of choice is true leads to counterintuitive consequences, such as the infamous BanachTarski paradox, which allows us to split an object into two copies that are identical to the original object.
Bonus Section : Application in Linear Algebra
This section is not required to understand the axiom of choice. Please feel free to skip it.
However, before we end this article, I would like to highlight an interesting application of the axiom of choice in linear algebra: it can be used to prove that every vector space has a basis. The key part of the proof applies Zorn’s Lemma, which is equivalent to the Axiom of Choice. A very brief summary of the proof goes like this.

Let $ X $ be the set of all linearly independent subsets of a vector space $ V $ .

$ X $ contains the empty set, and so is nonempty. It is partially ordered by inclusion $ \subseteq $ .

Every chain (totally ordered subsets) in $ X $ has an upper bound (union of all elements in the chain).

Every finite subset of this union has to be linearly independent. Hence, the upper bound is in $ X $ .

Apply Zorn’s Lemma to get a maximal element $ L \in X $ .

$ L $ must span $ X $ . If not, there will be a $ y \in X $ so that $ L \subset L \cup \{ y \} $ .
$ L $ is linearly independent and spans $ X $ . So, $ L $ is a basis for $ X $ .