I would like to add a supplement to the accepted answer. The accepted answer is an excellent derivation based on physical principles, but after reading it, I couldn't help but feel that the answer is much more abstract than the question. This makes it feel like the mystery is being moved from the original question "How can magnetic forces do work?" to "Why is the potential energy of a magnetic dipole given by $-\vec{\mu} \cdot \vec{B}$?" Now, of course it is possible to address the latter question in a variety of ways (for example, one could take this answer for a current loop, and then make the loop infinitesimally small). However, I want to address this issue from a purely intuitive point of view and use nothing but the Lorentz force and Maxwell's equations.
Before I begin, I want to comment that multiple answers both here and elsewhere have addressed the fundamental issue, which is that a dipole is not the same as a charged particle. (Speaking very loosely, a dipole can be imagined as a current, or multiple moving charged particles, whose net electric charge is zero, and which are stuck in a certain configuration. So one can attribute the "work" done to whatever forces hold the dipole together, or, depending on the case considered, to forces that hold a collection of dipoles together, or to other forces specific to a particular situation.) Although this clearly indicates the fundamental error in the argument, it still does not give a concrete mental picture for how the Lorentz force leads to the work being done.
Intuitive Argument
For simplicity, imagine that a bar magnet consists of a collection of magnetic dipoles that are all parallel to the $z$ axis. Each of these magnetic dipoles can be replaced by an infinitesimal current loop, so it is reasonable to imagine the entire magnet replaced by one large current loop in a plane defined by $z=\textrm{const}$. This gives us the benefit of imagining the magnet as a collection of moving particles, so that we can refer to the Lorentz force. The simplest model of such a current loop is a resistive wire attached to a battery.
Instead of considering the interaction between this current loop and a paper clip or a second bar magnet, let's simplify the problem by postulating that there is an external, time-invariant magnetic field whose source is far away. The question is whether this magnetic field can do work on the current loop.
Consider three cases:
The external magnetic field $\vec{B}$ has a component that is not in the $z$ direction.
In this case, it is very easy to draw a picture and confirm, using the Lorentz law, that the loop will typically experience a net force or a net torque and, therefore, work will be done on it. This is explored quantitatively here.
Consider the case where the loop is a circle and $\vec{B}$ contains a component that goes radially outwards. There will be a net force in the $z$ direction.
Consider the case where $\vec{B}$ is uniform. There will be a net torque but no net force.
The external magnetic field $\vec{B}$ is parallel to $z$, but varies in value throughout space.
This is a violation of Maxwell's equations, which require $\vec{\nabla} \cdot \vec{B} = \frac{\partial B_z}{\partial z} = 0$. So, we can ignore this case.
The external magnetic field $\vec{B}$ is parallel to $z$ and uniform. In this case, there is no net force nor a net torque, so there is no mystery to be explained.
Additional Comments
There are a couple of details that writing the argument above forced me to consider.
More details on case 2
First, the interpretation of case 2 can be somewhat more complicated than what I wrote above. This is due to the idealization of the wire as being one-dimensional. One could arrange the wire so that, on the wire, $\vec{B}$ is parallel to $z$ and has constant magnitude, but $\vec{B}$ depends on $x$ and $y$ so that $\vec{\nabla} \cdot \vec{B} = 0$, and $B_z$ changes off of the wire. (I am not entirely sure whether constructing such a $\vec{B}$ that globally satisfies $\vec{\nabla} \cdot \vec{B} = 0$ is possible, but I am assuming for the moment that it is.)
The force in this case would be the same as case 3, i.e.,
$$\vec{F} = I \oint \left( d\vec{\ell} \times \vec{B} \right) = I \left( \oint d\vec{\ell} \right) \times \vec{B} = \vec{0},$$
since $\vec{B}$ is constant along the path of integration. A similar argument shows the torque to be zero. So, once again, there is no mystery to be explained. Note that, for an object with finite volume, the situation described in this paragraph would probably be categorized as part of case 1.
Relation to the model of a dipole as an infinitesimal wire loop
Another point to consider is how the types of arguments presented in this answer relate to the expression given for an infinitesimal dipole in the accepted answer. There, for a dipole with $\vec{\mu} = \mu \hat{z}$, the force on the dipole is given by $\mu \partial_z B_z \hat{z}$. However, in the last paragraph I admitted that, at least for a finite loop, it may be possible to come up with a magnetic field where similar conditions are met, but the force is zero.
The key here is to realize that, for an infinitesimal dipole, Maxwell's equations put a very strong constraint on the relationship between $\partial_z B_z$ and the other field components along the wire. For simplicity, take the case where the wire is a loop of radius $a$ in the plane $z=0$ and we will be interested in taking the limit where $a$ is very small. In the center of the loop, $\vec{B}(\vec{0})=B_0 \hat{z}$. eMoreover, assume that the magnetic field is radially symmetric. I will use cylindrical coordinates and suppress $\phi$ (due to symmetry) and $z$ (because we are only interested in the plane $z=0$). On the loop, the field has the form
$$ \vec{B}(a) = \left( B_0 + a \partial_r B_z \right) \hat{z} + a \partial_r B_r \hat{r} $$
The derivatives are all evaluated at $\vec{0}$. Now, the $z$-component will not contribute a net force, as discussed above, but notice that there may be a radial component, which we know can cause a net force on the loop. We get
$$ \vec{F} = \oint d\vec{\ell} \times \vec{B} = I \int_0^{2\pi} ( a d\phi \hat{\phi} ) \times ( a \partial_r B_r \hat{r} ) = -2 \pi a^2 I \partial_r B_r \hat{z} = -2 \mu \partial_r B_r \hat{z} $$
The critical observation is that $\vec{\nabla} \cdot \vec{B} = 0$ implies that $\partial_r B_r = -\frac{1}{2} \partial_z B_z$. So the expression above becomes
$$ \vec{F} = \mu \partial_z B_z \hat{z} $$
which was derived above.
The point is that Maxwell's equations require that you cannot have a nonzero $\partial_z B_z$ without also having a nonzero derivative for other components of the magnetic field, and we have already seen how the other components can cause a force in the $z$ direction.