Initially, the Pauli exclusion principle for fermions was a purely phenomenological concept, introduced to explain experimental facts. The reason is that non-relativistic quantum mechanics, the framework in which Pauli originally formulated his principle, offers no well-defined theoretical description of the relation between spin and statistics. Such a relation does arise in relativistic quantum theory. I'll try to explain this below.
Qualitatively about the statistics
In 3-dimensional space there are only two possibilities for how the wave function $\psi(\mathbf x_{1},\mathbf x_{2})$ of two identical particles can behave when their positions are changed adiabatically so that the particles end up interchanged. Under this operation the two-particle wave function can only change as
$$
\tag 1 \psi(\mathbf x_{1},\mathbf x_{2}) \to \pm \psi(\mathbf x_{2},\mathbf x_{1})
$$
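Note that this property follows from the topology of the configuration space of two identical particles in 3-dimensional space, and does not rely on any other input, in particular not on experiment. Concretely, the space of relative positions is $(\mathbb R^{3}\setminus\{0\})/\mathbb Z_{2}$, whose fundamental group is $\mathbb Z_{2}$: performing the exchange twice traces a loop that can be contracted to a point, so the phase acquired in a single exchange must square to one. This is a purely theoretical argument.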
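Relation $(1)$ and its fermionic sign can be illustrated with a short sketch in plain Python. The orbitals `phi0`, `phi1` and the helper `two_particle` are hypothetical choices made only for illustration. The code builds the symmetric and antisymmetric two-particle wave functions from two single-particle orbitals, checks the exchange sign, and shows that the antisymmetric combination vanishes identically when both particles occupy the same orbital, which is the wave-function form of the Pauli principle.

```python
import math
from typing import Callable

Orbital = Callable[[float], float]


def phi0(x: float) -> float:
    """Hypothetical single-particle orbital (Gaussian), for illustration only."""
    return math.exp(-x * x / 2)


def phi1(x: float) -> float:
    """Second hypothetical orbital, odd in x, for illustration only."""
    return x * math.exp(-x * x / 2)


def two_particle(f: Orbital, g: Orbital, sign: int) -> Callable[[float, float], float]:
    """Unnormalized psi(x1, x2) = f(x1) g(x2) + sign * g(x1) f(x2).

    sign = +1 gives the symmetric (bosonic) state, sign = -1 the antisymmetric
    (fermionic) one, i.e. the two signs allowed by (1).
    """
    def psi(x1: float, x2: float) -> float:
        return f(x1) * g(x2) + sign * g(x1) * f(x2)
    return psi


if __name__ == "__main__":
    x1, x2 = 0.3, -1.2
    for sign in (+1, -1):
        psi = two_particle(phi0, phi1, sign)
        # Exchanging the arguments reproduces psi up to the sign in (1).
        print(sign, math.isclose(psi(x2, x1), sign * psi(x1, x2)))
    # With both particles in the same orbital, the antisymmetric state vanishes.
    print(two_particle(phi0, phi0, -1)(x1, x2))
```

Any pair of distinct orbitals would do; the Gaussian shapes are only there to make the example concrete.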
Qualitatively about the spin
Spin is a quantity whose fundamental importance stems from the Poincaré symmetry of our world. Apart from its physical meaning, it is the quantum number by which we mathematically classify each particle (at least each massive one) according to how it transforms under the Poincaré group. Through the representations of the Poincaré group, the description of spin is tied to the topology of the space-time symmetry group: the Lorentz group is not simply connected, and quantum states carry representations of its double cover $SL(2,\mathbb C)$.
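One concrete way in which this topology shows up: states of spin $s$ carry representations of $SU(2)$, the double cover of the rotation group, so a rotation by $2\pi$ about the $z$ axis multiplies every $|s,m\rangle$ by $e^{-2\pi i m}=(-1)^{2s}$. The sketch below checks this numerically for a few spins. It is plain Python; `rotation_phases` is a hypothetical helper, and restricting to rotations about $z$ (where $|s,m\rangle$ are eigenstates) is a simplifying assumption.

```python
import cmath
import math
from fractions import Fraction
from typing import List


def rotation_phases(spin: Fraction, angle: float = 2 * math.pi) -> List[complex]:
    """Phases exp(-i * angle * m) picked up by the J_z eigenstates |s, m>
    (m = -s, -s + 1, ..., s) under a rotation by `angle` about the z axis."""
    if spin < 0 or (2 * spin).denominator != 1:
        raise ValueError("spin must be a non-negative integer or half-integer")
    ms = [-spin + k for k in range(int(2 * spin) + 1)]
    return [cmath.exp(-1j * angle * float(m)) for m in ms]


if __name__ == "__main__":
    for s in (Fraction(0), Fraction(1, 2), Fraction(1), Fraction(3, 2)):
        sign = (-1) ** int(2 * s)  # the expected factor (-1)^{2s}
        ok = all(cmath.isclose(p, sign, abs_tol=1e-12) for p in rotation_phases(s))
        print(f"s = {s}: a 2*pi rotation multiplies every |s, m> by {sign:+d}: {ok}")
```

Half-integer spins pick up $-1$ and integer spins $+1$; the spin-statistics theorem discussed below ends up matching this sign with the exchange sign in $(1)$.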
The relation between the spin and statistics
So where do these two concepts, spin and statistics, meet? How does spin affect statistics (and vice versa), and, in particular, how does the Pauli exclusion principle follow from this? At first sight any relation between them looks unnatural, at least from the point of view of topology: the sign in $(1)$ comes from the topology of the two-particle configuration space, while spin comes from the representations of the Poincaré group. Nevertheless, the relation exists.
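Poincaré invariance of the quantum theory requires that the Hamiltonian density $\hat{H}(x)$ (more precisely, the interaction Hamiltonian density entering the S-operator) commute with itself at space-like separations, with the metric signature taken as $(+,-,-,-)$: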
$$
\tag 2 [\hat{H}(x),\hat{H}(y)] = 0 \quad \text{for} \ (x-y)^2<0
$$
The Hamiltonian density is built from field operators $\hat{\psi}_{a}(x)$, with $a$ labelling the field components, which are expanded in terms of the creation and annihilation operators $\hat{a}^{\dagger}(\mathbf p, s),\hat{a}(\mathbf p, s)$ of particles with arbitrary spin $s$. From relation $(1)$ we know that these operators must obey
$$
\tag 3 [\hat{a}(\mathbf p, s),\hat{a}(\mathbf k, s)]_{\pm} =0,\quad [\hat{a}(\mathbf p, s),\hat{a}^{\dagger}(\mathbf k, s)]_{\pm} \sim \delta(\mathbf p -\mathbf k)
$$
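Here $[\,\cdot\,,\cdot\,]_{\pm}$ denotes the commutator for bosons (the $+$ sign in $(1)$) and the anticommutator for fermions (the $-$ sign in $(1)$). Note that this statement is purely theoretical and does not rely on phenomenology. Since $\hat{H}$ is built from products of the fields, combining $(2)$ and $(3)$ requires the fields themselves to commute or anticommute at space-like separation, with the same choice of bracket as in $(3)$: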
$$
\tag 4 [\hat{\psi}_{a}(x),\hat{\psi}_{b}(y)]_{\pm} = 0, \quad (x-y)^{2} < 0
$$
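As a worked instance, here is a sketch for the simplest case, a free real scalar field $\hat\phi$ of mass $m$ (spin $s=0$), chosen only for illustration; $D$ denotes the standard vacuum two-point function, which is not notation used elsewhere in this answer, and $E_{\mathbf p}=\sqrt{\mathbf p^{2}+m^{2}}$. For space-like $x-y$ one can boost to equal times and write $(x-y)^{2}=-r^{2}$:

$$
\begin{aligned}
D(x-y) &\equiv \langle 0|\hat\phi(x)\hat\phi(y)|0\rangle = \int\frac{d^{3}p}{(2\pi)^{3}\,2E_{\mathbf p}}\,e^{-ip\cdot(x-y)} = \frac{m}{4\pi^{2}r}\,K_{1}(mr) \neq 0,\\
[\hat\phi(x),\hat\phi(y)] &= D(x-y)-D(y-x) = 0,\\
\langle 0|\{\hat\phi(x),\hat\phi(y)\}|0\rangle &= D(x-y)+D(y-x) = 2D(x-y) \neq 0 .
\end{aligned}
$$

The commutator vanishes because $D$ is Lorentz invariant and, for space-like separation, a continuous Lorentz transformation maps $x-y$ to $-(x-y)$, so $D(x-y)=D(y-x)$. The anticommutator cannot vanish, since its vacuum expectation value does not, so a spin-0 field can satisfy $(4)$ only with the commutator, i.e. with Bose statistics.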
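The structure of the field operators is completely fixed by their transformation properties, in particular by the value of the spin. Expression $(4)$ is where spin and statistics meet. Evaluating it for a free field of spin $s$, one finds that at space-like separation the condition can be met only with the commutator when $s$ is an integer, and only with the anticommutator when $s$ is a half-integer. Hence half-integer-spin particles must obey the anticommutator version of $(3)$, whose Hermitian conjugate at $\mathbf k=\mathbf p$ gives $2\,\hat{a}^{\dagger}(\mathbf p, s)^{2}=0$: no single-particle state can be occupied twice, which is the Pauli exclusion principle.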
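The last step can be made concrete for a single mode. The sketch below keeps only one momentum and spin label, so the continuum $\delta(\mathbf p-\mathbf k)$ in $(3)$ is replaced by 1 (an assumption made to get a finite-dimensional example). It represents the fermionic operators as $2\times 2$ matrices on the occupation basis $|0\rangle,|1\rangle$ and checks that the anticommutation relations force $(\hat a^{\dagger})^{2}=0$. The helpers `matmul` and `anticommutator` are hypothetical, written only for this illustration.

```python
from typing import List

Matrix = List[List[int]]

IDENTITY: Matrix = [[1, 0], [0, 1]]
ZERO: Matrix = [[0, 0], [0, 0]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def anticommutator(A: Matrix, B: Matrix) -> Matrix:
    """{A, B} = AB + BA for 2x2 matrices."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] + BA[i][j] for j in range(2)] for i in range(2)]


# One fermionic mode in the occupation basis |0> = (1, 0), |1> = (0, 1).
a: Matrix = [[0, 1], [0, 0]]      # annihilation: a|1> = |0>, a|0> = 0
a_dag: Matrix = [[0, 0], [1, 0]]  # creation (transpose of a): a_dag|0> = |1>

if __name__ == "__main__":
    print(anticommutator(a, a_dag) == IDENTITY)   # {a, a^dagger} = 1
    print(anticommutator(a, a) == ZERO)           # {a, a} = 0
    print(anticommutator(a_dag, a_dag) == ZERO)   # {a^dagger, a^dagger} = 0
    # Hence (a^dagger)^2 = {a^dagger, a^dagger} / 2 = 0: no double occupation.
    print(matmul(a_dag, a_dag) == ZERO)
    # Occupation-number operator n = a^dagger a: diagonal entries 0 and 1 only.
    print(matmul(a_dag, a))
```

For several modes one would also need the operators of different modes to anticommute, which a plain tensor product does not provide; a Jordan-Wigner construction is the usual way to arrange that.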
Why is there no relation between spin and statistics in non-relativistic quantum mechanics?
In the spirit of the above, it is not hard to understand why in non-relativistic physics the spin-statistics relation has no theoretical basis and can only be phenomenological: there is no requirement analogous to $(2)$. Indeed, the orthochronous Galilei group (together with space-time translations), which represents the space-time symmetry of non-relativistic quantum mechanics, leaves the time ordering in the S-operator unchanged, because time is absolute there ($t' = t + \text{const}$). In contrast, Poincaré transformations can reverse the time ordering of two events separated by a space-like interval. This is the underlying reason why we require $(2)$: only then do the time-ordered products in the S-operator not depend on the frame in which the ordering is taken.
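The different action of the two groups on time ordering can be checked with a minimal sketch in plain Python, restricted to one space dimension with $c=1$; all function names are hypothetical. It boosts the separation $(\Delta t,\Delta x)$ of two events and tests whether the sign of $\Delta t$ can flip. For a space-like pair a Lorentz boost with $v>\Delta t/\Delta x$ reverses the order, a Galilean boost never does, and a time-like pair keeps its order under both.

```python
import math
from typing import Callable, Iterable

Boost = Callable[[float, float, float], float]


def interval_squared(dt: float, dx: float) -> float:
    """(x - y)^2 with signature (+, -, -, -); negative means space-like, as in (2)."""
    return dt * dt - dx * dx


def lorentz_dt(dt: float, dx: float, v: float) -> float:
    """Time separation after a Lorentz boost with velocity v along x (c = 1)."""
    if not -1.0 < v < 1.0:
        raise ValueError("boost velocity must satisfy |v| < 1")
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (dt - v * dx)


def galilei_dt(dt: float, dx: float, v: float) -> float:
    """Time separation after a Galilean boost (t' = t, x' = x - v t): unchanged."""
    return dt


def ordering_can_flip(dt: float, dx: float, boost: Boost,
                      velocities: Iterable[float]) -> bool:
    """True if some boost velocity reverses the sign of the time separation."""
    return any(boost(dt, dx, v) * dt < 0 for v in velocities)


if __name__ == "__main__":
    velocities = [k / 100 for k in range(-99, 100)]  # |v| <= 0.99
    for label, (dt, dx) in (("space-like", (1.0, 2.0)), ("time-like", (2.0, 1.0))):
        print(label, interval_squared(dt, dx),
              "| Lorentz can flip order:", ordering_can_flip(dt, dx, lorentz_dt, velocities),
              "| Galilei can flip order:", ordering_can_flip(dt, dx, galilei_dt, velocities))
```

Sampling a grid of velocities is only a numerical check; the analytic condition for the flip is $v\,\Delta x>\Delta t$, which is attainable with $|v|<1$ exactly when the pair is space-like.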