In QAOA algorithm, two terms are being discussed; 1) clause or cost (C) Hamiltonian and 2) mixer consisting of pauli X gates.
What is the role of this mixer? Not clear why it comes after the C. Doesn't it cause the state to flip after evaluating C?
In QAOA algorithm, two terms are being discussed; 1) clause or cost (C) Hamiltonian and 2) mixer consisting of pauli X gates.
What is the role of this mixer? Not clear why it comes after the C. Doesn't it cause the state to flip after evaluating C?
 
    
    Probably the easiest way to understand this is to pretend that the mixer is NOT there and see what happens. So, let's assume you have some initial state $\lvert \psi \rangle = \sum_x \psi_x \lvert x \rangle$ and you want to use QAOA to find the ground state of some cost Hamiltonian $H_C$. I'm using the notation $\big\{\lvert x \rangle : x \in \{\pm 1\}^n \big\}$ for the $\sigma^z$-basis (the computational basis). Note that the cost Hamiltonian $H_C$ will be diagonal in this basis. $$ H_C = \sum_x E_x \lvert x \rangle \langle x \rvert $$ Applying the phase-shifter $U(\gamma) = \exp(-i \gamma H_C)$ to $\lvert \psi \rangle$ one obtains $$ \lvert \psi(\gamma) \rangle \equiv U(\gamma) \lvert \psi \rangle = \sum_x e^{-i\gamma E_x} \psi_x \lvert x \rangle $$ Let's stop here for now. If you measure the new state $\lvert \psi(\gamma) \rangle$ with respect to the computational basis you get a configuration $x \in \{\pm 1\}^n$ with probability $$ p_x = \lvert \langle x|\psi(\gamma) \rangle \rvert^2 = \lvert e^{-i\gamma E_x} \psi_x \rvert^2 = \lvert \psi_x \rvert^2 $$ which is exactly the probability you would have got if you had measured the state $\lvert \psi \rangle$ to begin with. Evolution with respect to $H_C$ has not changed this probability distribution so we haven't really gained anything.
Note that this implies that for any observable that is diagonal in the $\sigma^z$-basis, e.g. for the cost Hamiltonian $H_C$, we have $$ \langle \psi(\gamma) \rvert H_C \lvert \psi(\gamma) \rangle = \langle \psi \rvert U^{\dagger}(\gamma) H_C U(\gamma) \lvert \psi \rangle = \langle \psi \rvert H_C \lvert \psi \rangle $$ since $\langle x \rvert H_C \lvert y \rangle = E_x \delta_{xy}$. This is bad since the average energy functional (here I'm suppressing the dependence on the mixing time $\beta$ since we're pretending we're not using a mixer) $$ f(\gamma) = \langle \psi(\gamma) \rvert H_C \lvert \psi(\gamma) \rangle $$ is used by the classical optimization part of the QAOA in order to optimize the state $ \lvert \psi(\gamma) \rangle$: you want to find a value $\gamma$ that minimizes $f(\gamma)$. But we just saw that $f(\gamma)$ is a constant function of $\gamma$ so there's nothing to optimize here. You can't go down in energy (that is, in "cost") by choosing different values of $\gamma$.
From a physical point of view this is perfectly obvious: the system is evolving under closed-system dynamics and the energy (represented here by $H_C$) is conserved.
All of this changes if you then apply a mixer (where e.g. $B= \sum_i \sigma_i^x$) $$ U(\beta) \equiv \exp(-i\beta B) $$ to your state $\lvert \psi(\gamma) \rangle$ so that you get $$ \lvert \psi(\gamma,\beta) \rangle \equiv \exp(-i\beta B)\exp(-i\gamma H_C) \lvert \psi \rangle $$ Since $H_C$ and $B$ do not commute, the dynamics generated by $B$ will not in general conserve the "energy" (i.e. the cost) $H_C$.
Now the (correct) QAOA energy functional $$ f(\gamma,\beta) = \langle \psi(\gamma,\beta) \rvert H_C \lvert \psi(\gamma,\beta) \rangle $$ is no longer a constant function of $\gamma,\beta$ and you can use your favourite classical optimizer to minimize its value. That is, you will (in general) be able to find values $\gamma^*,\beta^*$ such that $$ \langle \psi(\gamma^*,\beta^*) \rvert H_C \lvert \psi(\gamma^*,\beta^*) \rangle < \langle \psi \rvert H_C \lvert \psi \rangle. $$ The exact same argument applies to any depth $p$ of the QAOA variational Ansatz.
