If a were a scalar you could not have generated that term in the first place.
OK, I would defer to Peccei's 2006 review, of which this is essentially an annotation. I'll be cavalier about precise normalizations and such, and use lots of ~s and hand-waving analogies, since the issues are conceptual and strategic, not hard calculations relying on impossible strong-coupling QCD answers.
Let me first review the SSB of the standard axial $J_\mu ^{5 ~3}$ in low energy chiral symmetry breaking in QCD. To simplify things, ignore the charged pions and isospin, so π means $\pi^0 = \pi^3$, to focus on the heart of the process, chiral symmetry in the σ-model, through a mock axial U(1), really cartooning the three axials flanking the isospin SU(2) vector currents:
$$
\delta \sigma = \pi , \qquad \delta \pi = -\sigma .
$$
QCD interactions can be summarized by a mock effective potential, $\lambda ((\pi^2+\sigma^2)-f_\pi ^2)^2$, whose minimum dictates that, at the ground state $\langle \sigma\rangle=f_\pi$; so we redefine $\sigma= \sigma'+f_\pi$, so that the v.e.v.s of both π and $\sigma'$ are now at 0, as they should be.
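As a rough check of the mass spectrum, here is the expansion of the mock potential about that minimum. It is only a sketch: it assumes canonically normalized kinetic terms $\tfrac12 (\partial \sigma')^2 + \tfrac12 (\partial \pi)^2$, which are not written out above, so the factor of 8 is tied to that assumption.
$$
\lambda \left(\pi^2+(\sigma'+f_\pi)^2-f_\pi^2\right)^2 = \lambda \left(2 f_\pi \sigma' + \sigma'^2+\pi^2\right)^2 = 4\lambda f_\pi^2\, \sigma'^2 + 4\lambda f_\pi\, \sigma' (\sigma'^2+\pi^2) + \lambda (\sigma'^2+\pi^2)^2 ,
$$
$$
m_{\sigma'}^2 = 8\lambda f_\pi^2 , \qquad m_\pi^2 = 0 .
$$
The only quadratic term is the one in $\sigma'$; no $\pi^2$ term survives on its own.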
You can work out that the σ' has a mass, but not the π, as required by the Goldstone theorem. The goldston is always the particle that rotates to a constant,
$$
\langle \delta \sigma'\rangle = \langle \pi\rangle=0 , \qquad \langle \delta \pi\rangle = -\langle \sigma \rangle =-f,
$$
so π is the goldston, and by construction it is the pseudoscalar, not the scalar. That's how it couples to fermions, and that's what QCD does (it SSBreaks axials), as simulated here.
Its more formal hallmark is the linear piece in the SSB conserved current,
$$
J_\mu ^{5} \sim \sigma \partial_\mu \pi - \pi \partial_\mu \sigma =f \partial_\mu \pi +\sigma' \partial_\mu \pi - \pi \partial_\mu \sigma',
$$
and so on. I am eschewing the radial language, since nobody gets Higgsed here, but you might also think of π as the goldston prepackaged in the current to be eaten and the σ′ as the physical scalar higgs.
It further happens that this current is anomalous, and triangle quark diagrams violate conservation of this current,
$$
\partial \cdot J^5 \sim \alpha \tilde F \cdot F ,
$$
leading to the decay of this goldston to two photons, summarized in the effective lagrangian as
$$
\frac{\pi}{f} \alpha F\cdot \tilde F ,
$$
a term that was not there classically, but arose quantum mechanically.
Note this term preserves parity: chiral symmetry breaking did not violate parity. And doing an axial transformation shifts the π in this lagrangian term, so it does not leave it invariant: that's what an anomaly is. Vector currents could not be anomalous, however, so SSBreaking a vector current with a scalar goldston would not give rise to such a term.
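To make that shift explicit, here is a sketch with an infinitesimal parameter $\epsilon$ (my notation, not fixed above) and the same cavalier attitude to signs and normalizations. With $\delta \pi = -\epsilon\, \sigma = -\epsilon\,(f+\sigma')$,
$$
\delta \left( \frac{\pi}{f}\, \alpha F\cdot \tilde F \right) = -\epsilon\, \alpha F\cdot \tilde F - \epsilon\, \frac{\sigma'}{f}\, \alpha F\cdot \tilde F ,
$$
so the leading piece is field-independent and $\propto \alpha F\cdot \tilde F$, mirroring the anomalous divergence $\partial \cdot J^5 \sim \alpha \tilde F \cdot F$ of the current that generates the transformation.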
End of toy analogy. Much of it will be replicated below, with one astounding contrasting twist.
In the Peccei-Quinn mechanism, the object is to get rid of an anomalous P and CP violating term involving gluon fields,
$$
\theta \operatorname {Tr} G\cdot \tilde G .
$$
It summarizes the anomaly of a flavor axial $U(1)_A$ analogous to above,
and it seems to be absent (as per neutron EDM limits).
P&Q get rid of it by positing another chiral $U(1)_{PQ}$, which is also broken spontaneously (or dynamically, whatever), and its goldston is likewise a pseudoscalar (with the quantum numbers of its current), called an axion $a$, at a different scale $f_a$.
Its σ is massive, broad, invisible; one doesn't much care (as in the case of the above elusive low-energy QCD analog meson σ), as long as the pseudoscalar axion shifts under a PQ rotation.
The PQ current is also anomalous (that's why they posited an axial current) and, as in the case of the π, the relevant term in the quantum effective lagrangian is ~
$$
\xi \alpha_s \frac{a}{f_a}\operatorname {Tr} G\cdot \tilde G ~,
$$
now involving strong fields and couplings, since it has axial couplings to quarks, just like the (oversimplified) pion caricature above; $\xi$ is a number.
For the higglet $a$, even though it started out life as a massless goldston, these gluon couplings lead to an effective potential for it, giving it a mass and turning it into a pseudo-Goldstone boson; the PQ symmetry is broken explicitly. PQ argue ingeniously that the topological structure of QCD, very nonperturbative, makes this inevitable/plausible; the details are beyond the range of the question here.
The point is that the original embarrassment term $\theta$ and the rescue term $\xi a/f_a$ now appear in a sum, and the latter is left to shift to the bottom of the QCD-induced potential and absorb the former (the embarrassment term), $\langle \xi a+ f_a\theta\rangle =0 \equiv \xi \langle a_{phys}\rangle$.
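A cartoon of that relaxation, under loudly stated assumptions: take the QCD-induced potential to be a cosine of the combined angle, with an overall scale $\Lambda^4$ standing in for the nonperturbative QCD input. Both the cosine shape and $\Lambda$ are illustrative choices of mine, not something derived here; only the periodicity in the angle is robust.
$$
V(a) \sim \Lambda^4 \left( 1- \cos\left(\theta + \xi \frac{a}{f_a}\right) \right) ,
$$
$$
\frac{\partial V}{\partial a}=0 \quad \Longrightarrow \quad \theta + \xi \frac{\langle a\rangle}{f_a} = 0 , \qquad m_a^2 = \left. \frac{\partial^2 V}{\partial a^2}\right|_{\min} \sim \frac{\xi^2 \Lambda^4}{f_a^2} .
$$
In terms of $a_{phys} = a + f_a \theta/\xi$, the potential is $\Lambda^4 (1-\cos (\xi a_{phys}/f_a))$, with its minimum at $\langle a_{phys}\rangle =0$: the $\theta$ has been absorbed, and the mass is suppressed by the large scale $f_a$.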
This is in dramatic contrast to the pion discussion above. The axion shifts both in its nonlinear, Goldstone, transformation law and by itself (like the $\sigma$ did!) to minimize the potential--it's two in one. Two potentials relax: the PQ notional one and the effective potential of the axion, here.
(Actually the pion is also a pseudogoldston, by dint of quark mass terms, breaking its chiral symmetry explicitly, hence it picks up a small mass by Dashen's theorem and PCAC, very analogous to the minuscule axion mass developed here by the curvature of the effective potential.)
A scalar goldston would correspond to a SSBroken vector current, which would then not be anomalous, so the mechanism would fail.