Wave pipelining (a technique that removes latch overhead by letting two or more "waves" of data propagate simultaneously through a multi-cycle pipeline stage) has been used in the past for pipelining cache access (HP PA-7000?). Although this design technique shares issues with asynchronous design (immature design and validation tools, scarcity of experienced designers, device testing difficulties, increased sensitivity to process variation, complex timing variability with voltage and temperature, and perhaps even increased quantization granularity of transistors [particularly FinFET?]), the timing variability is more constrained than in a fully asynchronous design.
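The basic feasibility condition can be sketched numerically. The timing numbers below are assumptions for illustration only (not from any real design); the constraint itself is the standard one for wave pipelining: the clock period must cover the stage's delay *spread* plus register setup/hold and skew margins, or a fast later wave overruns a slow earlier one.

```python
import math

def wave_pipeline_check(d_min, d_max, t_setup, t_hold, t_skew, period):
    """Return (feasible, waves_in_flight) for a wave-pipelined stage.

    d_min/d_max: min/max combinational propagation delay of the stage.
    The minimum workable period depends on the delay spread (d_max - d_min),
    not on d_max itself, which is why constrained variability matters.
    """
    min_period = (d_max - d_min) + t_setup + t_hold + 2 * t_skew
    feasible = period >= min_period
    # Stage latency in cycles = number of waves concurrently in flight.
    waves = math.ceil(d_max / period) if feasible else 0
    return feasible, waves

# Assumed numbers: 1.6-2.0 ns stage delay, 50 ps setup/hold/skew margins.
# At a 1.0 ns period the stage sustains two waves in flight; at 0.5 ns the
# period no longer covers the 0.4 ns delay spread plus margins.
print(wave_pipeline_check(1.6, 2.0, 0.05, 0.05, 0.05, 1.0))  # (True, 2)
print(wave_pipeline_check(1.6, 2.0, 0.05, 0.05, 0.05, 0.5))  # (False, 0)
```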
Using wave pipelining to group pairs of pipeline stages would not only remove latches but would also provide a simple way to halve pipeline depth when frequency is halved. (A latch-based design could achieve the same by eliding the intermediate latches.)
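The depth-halving property falls out of the arithmetic. A sketch with an assumed merged-stage delay: two conventional 1.0 ns stages fused into one wave-pipelined 2.0 ns stage with no intermediate latch carry two waves at full frequency but only one at half frequency, so the pipeline depth halves without any circuit change.

```python
import math

STAGE_DELAY = 2.0  # ns; merged pair of stages (assumed figure)

def waves_in_flight(period):
    # Cycles needed to traverse the merged stage, which equals the
    # number of waves concurrently propagating through it.
    return math.ceil(STAGE_DELAY / period)

print(waves_in_flight(1.0))  # 2: behaves as two pipeline stages
print(waves_in_flight(2.0))  # 1: at half frequency, depth is halved
```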
While wave pipelining would be more problematic for a high-frequency design, where small variations in propagation delay would be a larger fraction of total delay, intermediate-performance processors might have sufficient pipeline depth to benefit from reduced latch overhead while retaining sufficient tolerance for variation.
Are the modest potential benefits simply too small to overcome the above-mentioned issues?