Recall that the idea of the partition function is to normalize the Boltzmann distribution. The probability of a state with energy $E_r$ should be proportional to $\exp(-\beta E_r)$, so we need to divide by $Z$ to get a proper probability distribution that sums to $1$, $$P(r) = \frac{\exp(-\beta E_r)}{\sum_k \exp(-\beta E_k) }. \tag 1$$
Now let's try to do this quantum mechanically with operators. A general state is a superposition of energy eigenstates, so it doesn't have a definite energy. But let's take the idea that the probability is proportional to $\exp(-\beta E_r)$ to hold for energy eigenstates. This is implemented by the operator
$$\tilde{\rho} = \sum_r |r\rangle e^{-\beta E_r} \langle r| \tag 2
$$
if the probability of being in a certain state is proportional to $\operatorname{Tr} (|\psi\rangle\langle \psi| \tilde{\rho})$. It is certainly true for energy eigenstates, because they are orthogonal, so $$\operatorname{Tr}(|s\rangle\langle s|\tilde{\rho}) = \sum_r \langle s| r\rangle \exp(-\beta E_r)\langle r|s\rangle = \sum_r \delta_{rs} \exp(-\beta E_r) = \exp(-\beta E_s).$$
Since any state $|\psi\rangle$ is a superposition of energy eigenstates, $|\psi\rangle = \sum c_s |s\rangle$, we can easily generalize to $$\operatorname{Tr}(|\psi\rangle\langle \psi|\tilde{\rho}) = \sum_s |c_s|^2 \exp(-\beta E_s) .$$
This is certainly reasonable because $|c_s|^2$ is the probability of the energy being $E_s$ when in the state $|\psi\rangle$, so we are just weighting the Boltzmann factors in a superposition accordingly.
Now we need to normalize $\tilde \rho$ so that we get an actual probability distribution from $\tilde \rho$. The normalization should be that if we sum over a complete orthonormal set of states, the probabilities add up to $1$. From (2), because the energy eigenstates are complete and orthonormal, we get that the normalization factor is $$Z = \sum_r \exp(-\beta E_r)$$
i.e. analogous to the classical partition function function.
We can express all this more basis-independently by noting that $\tilde \rho = \exp(-\beta \hat{H})$ where $\hat H$ is the Hamiltonian operator, and the sum over a complete orthonormal set is the trace, i.e., $Z = \operatorname{Tr} \exp(-\beta \hat{H}).$ Then, that no superpositions of energy states appear is simply because the trace is the sum of diagaonal elements, and we use an energy eigenbasis. If we were to express the partition function in some other basis, it would contain superpositions.
Try working out the simplest case, spin $\frac 1 2$: compute $\tilde{\rho}$ and $Z$ in the basis $\{ |+z\rangle, |-z\rangle \}$ with Hamiltonians $\hbar \omega_c \sigma_z$ and $\hbar \omega_c \sigma_x$.