As a rule, I strongly discourage use of MOSFET avalanche in my design process.
The reason is explained here:
Some key facts about avalanche | Infineon AN_201611_PL11_002
Avalanche creates trapping and stored-charge defects in the gate oxide, which accumulate; eventually, the device fails. Along the way, VGS(th) may shift, drain or gate leakage current may increase, yfs may decrease, etc., but not really beyond tolerances; then, at some point, it just pops.
The single-pulse avalanche rating is just that: a single pulse, ever, over the entire device lifetime. For such acute conditions, power is the decider of fates, and only peak current and thermal energy need be considered. (Do not ignore peak current!)
Notice they never say how many single-pulse avalanche events a device can withstand. Obviously, at least long enough between shots for TJ to return to normal, but beyond that, how long, how many?
The reason is that damage accumulates with each event, but it isn't easy to tell how much.
Probably, a real device can withstand tens, hundreds, maybe even thousands of single-shot events, if I had to guess; but that's quite a lot fewer than a device might see over a reasonable product lifetime. For example, over 10 years, that's 3652 pulses if it happens daily on average. A home furnace controller, or various automotive solenoids, would easily see 10 times more, let alone anything that cycles faster still.
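To put rough numbers on that, here is a back-of-envelope sketch of lifetime event counts. Only the once-a-day case comes from the paragraph above; the other rates, and the function name, are illustrative assumptions of mine.

```python
DAYS_PER_YEAR = 365.25  # average, including leap years


def lifetime_events(events_per_day, years=10):
    """Total avalanche events over the product lifetime (fraction truncated)."""
    return int(events_per_day * DAYS_PER_YEAR * years)


# Hypothetical duty profiles; only "once a day" is taken from the text.
profiles = {
    "once a day": 1,
    "ten times a day (furnace, solenoid)": 10,
    "once a minute": 24 * 60,
}

for name, per_day in profiles.items():
    print(f"{name}: {lifetime_events(per_day):,} events in 10 years")
```

Even the modest ten-a-day case lands well beyond the "maybe thousands" guess, and anything cycling once a minute is in the millions.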
Repetitive avalanche ratings are rare, but do exist; I don't know how many cycles go into that rating (millions? billions? unlimited?), or if there's an industry-standard minimum lifetime in such service, or manufacturer-specific, but it's at least encouraging to see such a rating when one finds it. (One could find out by asking a manufacturer's FAE; obviously, I haven't had need to yet.) Downside: it's almost always a tiny amount, say a few mJ; this is no help to you.
Note that endurance may vary between product families, or vintages. Older families like Infineon (née IR) HEXFETs were rated quite generously; if I understand their construction correctly, avalanche occurred away from the channel entirely, largely sparing it (and the gate oxide) from exposure. As a result -- and combined with their low power density, demanding large dies for typical ratings, relative to what we have today that is -- they were famously robust (and IR bragged about it, too!).
As far as I know, modern trench and Superjunction MOSFETs are more vulnerable, and this is reflected by the lower IAS ratings (say, compared to maximum drain current) and frequent lack of repetitive ratings.
In summary: don't. Toss a TVS in there and you're good to go. TVSs are plain junctions and nothing else -- they aren't susceptible to charge trapping, and can be avalanched forever (as far as I know). If avalanche is repetitive, I suppose the TVS could be heatsinked, but more likely you should search for another mitigation, for example using a two-switch configuration to handle an inductive load: with a switch on both high and low sides, and clamping diodes (regular rectifiers), flyback energy is returned to the supply rail, recovering it rather than burning it off as heat. (Or, an H-bridge can be used to PWM it for turn-on boost + idle/simmer + energy recovery, if that should be of value. Of course, doubling or quadrupling the transistor count might not be an option for you, or there may be other limitations, like a required common ground.)
As for avalanche in parallel, it is possible, but I don't recommend it.
Avalanche breakdown has a positive tempco, meaning voltage rises as the device heats up. If this margin (the amount by which voltage rises, within nominal operating temperatures) is within the tolerance of paralleled devices, then current will begin to share between them. That is, as one device heats up, its voltage rises until it crosses the threshold of the next-lowest part, and so on.
The result is, for N devices in parallel, you have a total equivalent capacity (current or power) strictly between 1 and N device ratings.
It's more than one, because you know at the very least, there must be leakage current from the additional devices. (Not at all a useful measure, but "strictly" greater indeed.) And it's less than N, because they're not matched (selected) parts.
Consider a fixed difference in threshold voltages between two devices. The lower one will always carry more current, and run hotter. It will run hotter than its companion by a constant difference, whatever the operating temperature. (Actually, the difference may increase with power dissipation; it depends on the exact temp curve. Which, again, isn't commonly documented.) The limit for the parallel pair comes when that device reaches its maximum temperature. At that point, the other device will be somewhere between that same temperature and room temperature, depending on how far apart the thresholds are. Therefore the parallel combination has a rating somewhere between 1 and 2 times a single device.
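The argument above can be sketched numerically. This is a deliberately crude steady-state model, under assumptions that are mine rather than anything from a datasheet: a linear tempco, identical thermal resistance for each device, zero dynamic resistance, and leakage ignored (so a non-sharing pair comes out as exactly 1.0 rather than "strictly" more). The function name and the example values are hypothetical.

```python
def parallel_capacity(delta_v, tempco, max_rise):
    """Steady-state capacity of two paralleled avalanche devices,
    in multiples of a single device's rating.

    delta_v  : difference between the two breakdown voltages, V
    tempco   : breakdown voltage tempco, V/K (assumed linear, positive)
    max_rise : allowed junction temperature rise above ambient, K
    """
    if tempco <= 0 or max_rise <= 0:
        raise ValueError("tempco and max_rise must be positive")
    # Equal terminal voltage forces a constant temperature gap:
    # V0_lo + tempco*rise_lo == V0_hi + tempco*rise_hi
    gap = abs(delta_v) / tempco
    # The lower-voltage device sits at max_rise; its companion runs
    # cooler by `gap`, or carries nothing if it never reaches breakdown.
    companion_rise = max(0.0, max_rise - gap)
    # With equal thermal resistance, dissipation scales with rise.
    return 1.0 + companion_rise / max_rise


# Illustrative numbers only: ~24 V parts, 20 mV/K tempco, 100 K rise.
print(f"{parallel_capacity(0.5, 0.02, 100):.2f}")  # tightly matched pair
print(f"{parallel_capacity(2.4, 0.02, 100):.2f}")  # 10% apart
```

With these made-up values, the 0.5 V pair comes out around 1.75 devices' worth, while the pair 2.4 V apart gets no help at all from the second part: the hot device reaches its limit before the other one even starts conducting.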
I wouldn't be too afraid of operating 2% zener diodes in parallel (which are "zener" in name only above 6V or so, where the avalanche effect actually dominates; we just call them this out of convenience), at least for ratings above say 15V, where the positive tempco is pretty dominant.
Note that TVS diodes usually have poor voltage tolerance (10%?), so they will not share current well in general.
In other words, yes they might share current, but the sharing is inconsistent over time (namely, as temperature varies in use), and at best some fractional amount: that is, you buy and install two diodes on your board yet you've only increased capacity by maybe 20 or 40%, or whatever it is (again, maybe it's zero, maybe it's 100%).
In general, I recommend against paralleling zeners or TVSs. If I saw this at a design review, I would definitely mark it for reconsideration regardless, and at least require justification (electrical and thermal calculations, supported by manufacturer data) before approving it.
There is one remaining condition where paralleling at least isn't too bad: pulsed operation. In this case, internal resistance dominates, and we can expect reasonable current sharing even for fairly different threshold voltages (such as might be typical of TVSs, maybe even MOVs -- but, MOVs are available in quite large sizes, probably for this very reason [poor sharing], so this is again not a necessary use-case for them). I don't feel bad about putting TVSs in parallel specifically for surge application -- but only as long as I've evaluated the remaining conditions (e.g. high-line, inrush, etc.) and ensured that ratings are respected for a single device (i.e. assuming the other(s) don't participate).
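To illustrate why resistance helps in the pulsed case, here is a minimal sketch that treats each clamp as a breakdown voltage in series with a fixed dynamic resistance. The piecewise-linear model, the equal resistance for both parts, and all of the numbers are assumptions for illustration; heating during the pulse is ignored.

```python
def pulse_share(i_total, v_br1, v_br2, r_dyn):
    """Split a surge current between two parallel clamps.

    Each clamp is modelled as V = v_br + i * r_dyn for i > 0
    (same r_dyn for both). Returns (i1, i2, v_clamp).
    """
    if r_dyn <= 0:
        raise ValueError("r_dyn must be positive")
    lo, hi = sorted((v_br1, v_br2))
    # Below this current, the higher-voltage part has not turned on yet.
    i_onset = (hi - lo) / r_dyn
    if i_total <= i_onset:
        i_lo, i_hi = float(i_total), 0.0
    else:
        # Equal terminal voltage: lo + i_lo*r_dyn == hi + i_hi*r_dyn
        i_hi = (i_total - i_onset) / 2
        i_lo = i_total - i_hi
    v_clamp = lo + i_lo * r_dyn
    # Return the currents in the caller's argument order.
    return (i_lo, i_hi, v_clamp) if v_br1 <= v_br2 else (i_hi, i_lo, v_clamp)


# Hypothetical parts 10% apart (30 V and 33 V), 0.25 ohm each.
print(pulse_share(100, 30, 33, 0.25))  # large surge: both conduct
print(pulse_share(1, 30, 33, 0.25))    # small current: one part takes it all
```

In this toy case the 100 A surge splits 56 A / 44 A, which is tolerable sharing, whereas at 1 A the lower-voltage part carries everything. That is the reason for still checking the non-surge conditions against a single device's ratings.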