As you point out, self-phase modulation (SPM) can be thought of as an added chirp. The crucial point is that this is a local chirp that changes from the front to the back of the pulse. The refractive index depends on intensity, so the phase velocity at the peak of the pulse differs from that at the edges (for the usual positive $n_2$ it is slower at the peak). This stretches the wavefronts on the leading edge of the pulse and compresses them on the trailing edge.

This introduces local frequencies that were simply not present within the original pulse spectrum, leading to a spectral broadening. (The pulse's electric field is in blue, instantaneous frequency is in red.)
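To make the wave picture concrete, here is a minimal numerical sketch, under assumptions that are mine rather than part of the argument above: a Gaussian pulse, an instantaneous Kerr response with positive $n_2$, no dispersion, and a carrier written as $e^{-i\omega_0 t}$, so that the instantaneous frequency shift is $\delta\omega(t) = -\mathrm{d}\phi_\mathrm{NL}/\mathrm{d}t$ with $\phi_\mathrm{NL}(t) \propto I(t)$. The peak phase `phi_max` and the time grid are arbitrary illustrative values.

```python
import numpy as np

# Illustrative, assumed parameters (not taken from the text above).
t = np.linspace(-10.0, 10.0, 4096)   # time, in units of the pulse duration
phi_max = 5.0                        # peak nonlinear phase in radians

intensity = np.exp(-t**2)            # normalised Gaussian intensity profile
envelope = np.sqrt(intensity)        # field envelope before SPM

# Kerr phase follows the intensity profile (positive n2 assumed).
phi_nl = phi_max * intensity

# Instantaneous frequency shift for a carrier exp(-i w0 t):
# negative (red shift) on the leading edge, positive (blue shift) on the trailing edge.
delta_omega = -np.gradient(phi_nl, t)

field_in = envelope.astype(complex)
field_out = envelope * np.exp(1j * phi_nl)


def rms_spectral_width(field, time):
    """RMS width of the power spectrum, in angular-frequency units."""
    dt = time[1] - time[0]
    omega = 2 * np.pi * np.fft.fftfreq(time.size, d=dt)
    power = np.abs(np.fft.fft(field)) ** 2
    weight = power / power.sum()
    mean = (omega * weight).sum()
    return np.sqrt((((omega - mean) ** 2) * weight).sum())


width_before = rms_spectral_width(field_in, t)
width_after = rms_spectral_width(field_out, t)

print("rms spectral width before SPM:", width_before)
print("rms spectral width after SPM: ", width_after)
print("largest |instantaneous shift|:", np.abs(delta_omega).max())
```

The temporal intensity profile is untouched in this sketch (only the phase changes), yet the spectrum widens, which is the sense in which the local chirp creates frequencies that the input pulse did not contain.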
Now, that's the wave picture of SPM. As always in nonlinear optics, though, there is a wave picture and a 'photon' (spectral) picture, and normally you want to be able to produce a complete explanation within each of the two. In that regard, SPM is a third-order process, so it is simply a version of four-wave mixing with two photons in and two photons out (typically $\omega_1$ and $\omega_2$ in, and $\omega_1+\Delta$ and $\omega_2-\Delta$ out). It is a complicated process, however, because every pair of photon energies available in the original pulse bandwidth can take part, and you need all of their interactions to get the full picture, so it's not an easy description.
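One way to see why the spectral picture involves every pair of frequencies at once is to write the Kerr polarization in the frequency domain. As a sketch, assuming an instantaneous response so that $P^{(3)}(t) \propto |E(t)|^2 E(t)$, and writing $\tilde{E}(\omega)$ for the complex spectral amplitude of the field (notation introduced here for illustration):

$$
\tilde{P}^{(3)}(\omega) \;\propto\; \iint \tilde{E}(\omega_1)\,\tilde{E}(\omega_2)\,\tilde{E}^{*}(\omega_1+\omega_2-\omega)\;\mathrm{d}\omega_1\,\mathrm{d}\omega_2 .
$$

Each term in the integrand is one four-wave mixing channel: photons at $\omega_1$ and $\omega_2$ go in, and photons at $\omega$ and $\omega_1+\omega_2-\omega$ come out. Setting $\omega=\omega_1+\Delta$ gives the second output at $\omega_2-\Delta$, so energy is conserved channel by channel, and the new spectral component at $\omega$ is the coherent sum over all of them.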
And finally, as to phase matching: if you only have a single spectral component (say, you have a quasi-monochromatic beam in one arm of a Mach-Zehnder interferometer and you're testing how the interference changes with the beam intensity), then SPM is automatically phase-matched. However, if you have a pulse and you're doing spectral broadening, then you need the same kind of phase matching as in standard four-wave mixing, with the additional complication that you have a continuum of initial and final frequencies. There doesn't seem to be any simple description of this other than jumping into the nitty-gritty.
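As a rough illustration of what that phase-matching condition looks like for a single channel, assume collinear propagation and keep only the linear dispersion $k(\omega)$ (the intensity-dependent contribution to the wavevector is ignored in this sketch). The mismatch for $\omega_1, \omega_2 \to \omega_1+\Delta,\ \omega_2-\Delta$ is then

$$
\Delta k = k(\omega_1+\Delta) + k(\omega_2-\Delta) - k(\omega_1) - k(\omega_2)
\;\approx\; \bigl[k'(\omega_1) - k'(\omega_2)\bigr]\Delta + \tfrac{1}{2}\bigl[k''(\omega_1) + k''(\omega_2)\bigr]\Delta^{2},
$$

where the second form is a Taylor expansion to second order in $\Delta$. For a single spectral component, $\omega_1=\omega_2$ and $\Delta=0$, so $\Delta k$ vanishes identically, which is the automatic phase matching mentioned above. For a pulse, the first term depends on the group-velocity difference between the two input frequencies and the second on the group-velocity dispersion, and this has to be evaluated over the whole continuum of $(\omega_1,\omega_2,\Delta)$.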