What is the theoretical justification for the Law of Equipartition of Energy?
Why are equal energies distributed in each degree of freedom even though sometimes they are completely different (like translation and rotation)?
Let's start from an ideal gas.
A quick overview
You might know that the probability distribution of the velocity component $v_x$ along one direction is given by the Maxwell-Boltzmann distribution
$$P(v_x)=\sqrt{ {m \over 2\pi k_B T}}e^{-mv_x^2 \over 2 k_B T}$$ which is a Gaussian distribution with variance $$\langle v_x^2 \rangle= {k_B T \over m}$$ Because the three components of the velocity are independent and identically distributed, $$\langle v^2 \rangle =\langle v_x^2+v_y^2+v_z^2\rangle=3{k_B T\over m}$$ and of course for the kinetic energy
$$\langle K \rangle = {1\over 2}m\langle v^2\rangle={3 \over 2} k_B T$$
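As a quick numerical sanity check of this result, here is a minimal sketch that draws the three velocity components from the Gaussian above and averages the kinetic energy. The units ($k_B T=1$, $m=1$), the sample size and the function name `mean_kinetic_energy` are my own choices for illustration.

```python
import math
import random

def mean_kinetic_energy(kT=1.0, m=1.0, n_samples=200_000, seed=0):
    """Average of (1/2) m v^2 with each velocity component ~ N(0, kT/m)."""
    rng = random.Random(seed)
    sigma = math.sqrt(kT / m)  # standard deviation of v_x, v_y and v_z
    total = 0.0
    for _ in range(n_samples):
        vx = rng.gauss(0.0, sigma)
        vy = rng.gauss(0.0, sigma)
        vz = rng.gauss(0.0, sigma)
        total += 0.5 * m * (vx * vx + vy * vy + vz * vz)
    return total / n_samples

if __name__ == "__main__":
    # Should be close to 1.5, i.e. (3/2) k_B T, up to sampling noise
    print(mean_kinetic_energy())
```

Changing `m` should leave the answer unchanged, since the mass cancels between the variance $k_B T/m$ and the prefactor ${1\over 2}m$.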
You can derive this expression from a purely mechanical description of a gas of particles following the ideal gas law, and it already shows that the answer to your question lies in the way the mean squared velocity is distributed in such a system.
So, let's try to generalise it!
A "simple" statistical mechanics argument
[The following requires some statistical mechanics - if you don't know about ensembles etc. you might need to learn a bit more or go to the end for a brief explanation]
Suppose you have a 1D Hamiltonian $H$ which depends on the position $q$ and the momentum $p=mv$ quadratically, so that
$$H=Aq^2+Bp^2$$ with $A$ and $B$ dimensional constants (for the kinetic energy, $B=1/2m$ of course). We choose the canonical ensemble (constant temperature, allowing heat flow with a reservoir) and we get that the probability distribution for the particles being in a state $(q, p)$ i.e. having a given momentum and position is given by
$$\rho(q, p)={1\over Z}\exp(-H/k_B T)$$
(this, in turn, comes originally from Boltzmann's Ansatz that the entropy can be written as $S=k_B \ln(W)$ with $W$ the number of microstates). In this case, $Z$ is a normalisation factor such that the integral of $\rho$ is 1, i.e. $$1=\int dq\, dp\, \rho(q, p)\quad\Rightarrow\quad Z=\int dq\, dp\, \exp(-H/k_B T)$$
We are going to need the following remark later: notice that we can rewrite $Z$ as $$Z=\int dq\, dp\, \exp(-H/k_B T)=\int dq\, dp\, \exp(-(Aq^2+Bp^2)/k_B T)$$
and by splitting the exponential and separating the integrals $$Z=\int dq\, dp\, \exp(-Aq^2/k_B T)\exp(-Bp^2/k_B T)$$ $$=\int dq\, \exp(-Aq^2/k_B T)\int dp\, \exp(-Bp^2/k_B T)$$ so that we have a $q$-dependent part and a $p$-dependent part $$Z=Z(q)Z(p)$$ with $Z(q)=\int dq\, \exp(-Aq^2/k_B T)$ and $Z(p)=\int dp\, \exp(-Bp^2/k_B T)$
What is the mean energy of the system? We need to average the energy (i.e. the Hamiltonian) using the distribution we found, i.e.
$$\langle H \rangle = \int dq dp H(p, q)\rho(q, p)$$ so that
$$\langle H \rangle = \int dq\, dp\, H(p, q){1\over Z}\exp(-H/k_B T)$$
For the sake of simplicity, because $H=Aq^2+Bp^2$ and $\rho(q, p)={1\over Z}\exp(-(Aq^2+Bp^2)/k_B T)={1\over Z}\exp(-Aq^2/k_B T)\exp(-Bp^2/k_B T)$, we can split the integral into two parts. Let's focus only on the $Aq^2$ part of the Hamiltonian:
$$\langle E(q)\rangle=\int dq\, {1\over Z(q)} \exp(-Aq^2 /k_B T)Aq^2 \int dp\, {1\over Z(p)}\exp(-Bp^2 /k_B T)$$ where I separated the $q$-dependent part of the integral from the $p$-dependent one. Notice also that the mean energy is given by $\langle E\rangle =\langle E(q)\rangle+\langle E(p)\rangle$, where $\langle E(p)\rangle$ comes from an identical term except that we replace $Aq^2$ with $Bp^2$. The integrals run over the full 1D volume, let's say from $-\infty$ to $\infty$.
Surprisingly (?) that integral is pretty easy to solve!
The $p$-dependent part is by itself normalised! By the definition of $Z(p)$ above we get $$\int dp\, {1\over Z(p)}\exp(-Bp^2 /k_B T)=1$$ So we are only left with
$$\langle E(q)\rangle=\int dq\, {1\over Z(q)} \exp(-Aq^2 /k_B T)Aq^2$$
Now, from the definition of $Z(q)$ we get [it's an easy Gaussian integral, you can check it ;) ] $$Z(q)=\int dq\, \exp(-Aq^2/k_B T) = \sqrt{\pi k_B T / A }$$ so we get
$$\langle E(q)\rangle=\int dq\, {1\over \sqrt{\pi k_B T / A } } \exp(-Aq^2 /k_B T)Aq^2$$
which again is easy to solve, because it is the integral of a squared quantity with a Gaussian distribution, i.e. it gives $A$ times the variance of the Gaussian, $\langle q^2\rangle = k_B T/2A$, so that [If I did not make any mistake] $$\langle E(q)\rangle= {1\over 2} k_B T$$
And that's it: you can even do the same with the $p$-dependent part. Because we have a distribution which is an exponential of the Hamiltonian, every squared $p$ or $q$ term automatically gives us a mean energy of ${1\over 2}k_B T$.
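For completeness, the Gaussian integrals used above can be written out explicitly. With the shorthand $a=A/k_B T$ (my notation, not used elsewhere in this answer), the standard results are
$$\int_{-\infty}^{\infty} e^{-aq^2}\,dq=\sqrt{\pi\over a},\qquad \int_{-\infty}^{\infty} q^2e^{-aq^2}\,dq={1\over 2a}\sqrt{\pi\over a}$$
so that
$$\langle E(q)\rangle = A\,{\int q^2 e^{-aq^2}\,dq\over\int e^{-aq^2}\,dq}={A\over 2a}={1\over 2}k_B T$$
The constant $A$ drops out, which is why the result does not care whether the quadratic term describes a translation, a rotation or a vibration.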
The same result holds in other ensembles (microcanonical, etc.) with different probability distributions: given a Hamiltonian $H$, the averages of the form
$$\langle Aq^2 \rangle_H=\langle Bp^2 \rangle_H$$ always equal ${1\over2} k_B T$. This is very general: it only requires ergodicity, i.e. the possibility of taking ensemble averages as time averages, and ultimately goes back to the Boltzmann formula $S=k_B \ln(W)$.
Note that we "proved" it for a simple Hamiltonian of the kind $Aq^2+Bp^2$. This is the Hamiltonian one gets in the very general case where the system is close to the minimum of the energy landscape. Note also that a similar treatment for simple Hamiltonians of the kind $H\sim q^4$, for example, would give similar results of the kind $\langle H\rangle =\alpha k_B T$ with a different prefactor $\alpha\neq 1/2$ (for a pure $q^4$ term, $\alpha=1/4$). So there really is nothing special about equipartition except the fact that squared terms are usually the most prominent ones!
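To see how the prefactor depends on the power, here is a minimal numerical sketch that averages a single term $C|q|^n$ with the Boltzmann weight $\exp(-C|q|^n/k_B T)$, using a plain trapezoidal rule. The function name `mean_power_energy`, the automatic cutoff and the grid size are my own illustrative choices. For a pure power the standard result is $\langle C|q|^n\rangle = k_B T/n$, so $n=2$ should give $k_B T/2$ and $n=4$ should give $k_B T/4$.

```python
import math

def mean_power_energy(n, C=1.0, kT=1.0, steps=20_001):
    """<C |q|^n> under the weight exp(-C |q|^n / kT), by the trapezoidal rule."""
    # Cut the integral off where the Boltzmann weight has dropped to e^-40
    q_max = (40.0 * kT / C) ** (1.0 / n)
    h = 2.0 * q_max / (steps - 1)
    num = 0.0  # integral of energy * weight
    den = 0.0  # integral of weight (the partition function of this term)
    for i in range(steps):
        q = -q_max + i * h
        energy = C * abs(q) ** n
        weight = math.exp(-energy / kT)
        w = 0.5 if i in (0, steps - 1) else 1.0  # trapezoid end-point factor
        num += w * energy * weight
        den += w * weight
    return num / den

if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, mean_power_energy(n))
```

The constant `C` cancels between numerator and denominator, so only the exponent `n` and the temperature matter.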
An explanation without Stat. mech.
As a general summary, the point is that if you have a Hamiltonian of the kind $H=H(p, q, ..)=H_0+a_1 q+b_1p+a_2q^2+b_2p^2+...$ you can get the mean energy corresponding to each term as $\langle a_nq^n\rangle$. If you make very general assumptions about the probability distribution governing your system, you find this very nice result that any term for which $n=2$ returns
$$\langle a_2q^2 \rangle=\langle b_2p^2 \rangle={1\over 2}k_B T$$
The reason for this lies in the way such mean values are computed and in some very general properties of the probability distribution. This all stems from the possibility of taking averages according to the number of microstates available, i.e. from Boltzmann's formula for the entropy.
A more "physical" explanation
Equipartition stems from a "coincidence" (or maybe the result of some approximations we made in Physics a long time ago, or some very general deep rule encoded in the rules of Physics!) giving the "squared" terms of the Hamiltonian a bit of a privilege. This is summed up by the fact that the mean value of a coordinate times the derivative of the Hamiltonian with respect to it, $\langle x\,\partial H/\partial x\rangle$, is equal to $k_B T$.
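The link between such mean values and $k_B T$ can be sketched in one line for a single coordinate $q$, assuming the canonical distribution, writing $Z=\int dq\, e^{-H/k_B T}$ for that coordinate, and assuming that $q\,e^{-H/k_B T}$ vanishes at the ends of the integration range. Integrating by parts,
$$\left\langle q{\partial H\over\partial q}\right\rangle={1\over Z}\int dq\, q{\partial H\over\partial q}e^{-H/k_B T}=-{k_B T\over Z}\int dq\, q{\partial\over\partial q}e^{-H/k_B T}={k_B T\over Z}\int dq\, e^{-H/k_B T}=k_B T$$
For a quadratic term $H=Aq^2$ one has $q\,\partial H/\partial q=2H$, which gives back $\langle H\rangle={1\over 2}k_B T$. For a term of the form $Cq^n$ the same identity gives $k_B T/n$, so the factor $1/2$ is simply the $1/n$ of a quadratic term.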
Also, quadratic terms are important because you can always expand your Hamiltonian as a power series around its minimum, i.e. around the most probable state, since it is the one with the lowest energy. For simplicity, let's start with a Hamiltonian of a single variable $q$ with a minimum in $q_0$: $$H(q)=H(q=q_0)+dH/dq|_{q_0}(q-q_0)+{1\over 2}d^2H/dq^2|_{q_0}(q-q_0)^2+...$$ where we throw away all higher order terms.
If the Hamiltonian has a minimum in $q_0$, then $dH/dq|_{q_0}=0$ and $$H(q)\approx H(q=q_0)+{1\over 2}d^2H/dq^2|_{q_0}(q-q_0)^2$$ which is a Hamiltonian that is quadratic in $q$. So quadratic Hamiltonians are very important, and that is why we focus a lot on their properties, such as equipartition.
This means that at equilibrium, close to a minimum, every Hamiltonian is approximately quadratic, so knowing about equipartition helps!
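The statement above can be illustrated numerically with a potential that is only quadratic near its minimum. The sketch below uses a pendulum-like potential $V(q)=1-\cos q$ on $[-\pi,\pi]$, which is my own choice of example (it is not from the discussion above), and computes $\langle V\rangle/k_B T$ with a trapezoidal rule. At low temperature the system stays near $q=0$, where $V\approx q^2/2$, and the ratio should approach $1/2$; at high temperature the non-quadratic shape matters and the ratio drops well below $1/2$.

```python
import math

def mean_potential_over_kT(kT, steps=20_001):
    """<V>/kT for V(q) = 1 - cos(q) on [-pi, pi], using the trapezoidal rule."""
    h = 2.0 * math.pi / (steps - 1)
    num = 0.0  # integral of V * Boltzmann weight
    den = 0.0  # integral of the Boltzmann weight
    for i in range(steps):
        q = -math.pi + i * h
        v = 1.0 - math.cos(q)
        weight = math.exp(-v / kT)
        w = 0.5 if i in (0, steps - 1) else 1.0  # trapezoid end-point factor
        num += w * v * weight
        den += w * weight
    return num / (den * kT)

if __name__ == "__main__":
    for kT in (0.01, 0.1, 1.0, 5.0):
        print(f"kT = {kT:5.2f}   <V>/kT = {mean_potential_over_kT(kT):.4f}")
```

Energies here are measured in units of the barrier scale of the potential, so "low temperature" means $k_B T\ll 1$ in these units.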
You can also make some "symmetry" arguments, such as:
$k_B T$ is the scale of energy at the molecular level, so that squared terms (which are usually the ones symmetric under $q\to -q$ transformations and are more resistant to translations and renormalisations - remember that most energies are quadratic, as this is the first term after a Taylor expansion) must be linked to it. Why $1/2$? Math!
Finally: equipartition is a name which is sort of self-explanatory! The physical concept underneath it is simply: at equilibrium, close to a minimum, the system populates each term of the Hamiltonian (which are quadratic close to the minimum) with the same energy, i.e. the energy is equi-partitioned amongst all the terms!
Just adding an example here: when going through the Ultraviolet Catastrophe derivation (black-body radiation), we discover that applying the equipartition theorem to each mode of vibration of the wave yields an obviously incorrect result for the energy radiated by the black body. The solution is to use Planck's discretized energy function, applied to the canonical ensemble (enter QM).
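A minimal sketch of the comparison, assuming the standard Planck result for the mean energy of one mode, $\langle E\rangle = h\nu/(e^{h\nu/k_B T}-1)$ (zero-point energy left out), against the classical equipartition value $k_B T$ per mode. The function name `planck_mean_energy` and the dimensionless variable $x=h\nu/k_B T$ are my own choices for illustration.

```python
import math

def planck_mean_energy(x):
    """Mean energy of one mode in units of k_B T, with x = h*nu / (k_B T)."""
    # expm1(x) = e^x - 1, evaluated accurately even for small x
    return x / math.expm1(x)

EQUIPARTITION = 1.0  # classical value per mode, in units of k_B T

if __name__ == "__main__":
    for x in (0.01, 0.1, 1.0, 5.0, 10.0):
        ratio = planck_mean_energy(x) / EQUIPARTITION
        print(f"h*nu/kT = {x:5.2f}   Planck / equipartition = {ratio:.4f}")
```

For $h\nu\ll k_B T$ the ratio tends to 1, so equipartition is recovered; for $h\nu\gg k_B T$ the mode is exponentially suppressed, which is what removes the divergence at high frequencies.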
The equipartition theorem can be derived by applying the canonical ensemble to a collection of solid Newtonian particles.