In the past, wire capacitance could be neglected compared to gate capacitance. Today this is no longer true.
Now, when you say that the cell is optimized for zero wireload capacitance, I can think of two totally opposite optimizations:
- The cell could be optimized under the assumption that the post-layout capacitance of the wires is negligible.
- The cell could be optimized with the understanding that the post-layout capacitance won't be negligible, but that the synthesis tool will use a zero wireload model.
The first option is bad. If you use cells that do not account for the wires' RC delays, you'll get optimistic synthesis timing. Once place and route is completed, you'll probably have many paths that do not meet timing. Depending on the post-synthesis tools you have access to, fixing these timing problems can be anywhere from "I don't give a s*** - the tool will do most of the work" to "f***, it'll take me a century to close timing". I'd avoid using these models at all, unless there is solid proof that this won't screw you up.
The second option is not that bad. If you assume that synthesis will use a zero capacitance wireload model, you can compensate for this optimistic assumption by adding extra capacitance to the cells themselves. I'd say that assuming the wire capacitance equals the capacitance of a basic NAND2 is a good guess. If you do run the synthesis tool with a zero wireload model, this additional capacitance will prevent an overly optimistic netlist. If you run the synthesis tool with a non-zero wireload model (with post-P&R extracted values), synthesis will not use these cells at all. However, I have never heard of anyone using this approach.
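To make the compensation idea concrete, here is a minimal sketch using a first-order linear delay model (delay = intrinsic + R_drive * C_load). The model itself, all names, and every number are assumptions chosen for illustration; none of them come from a real cell library.

```python
# Toy first-order delay model; every number below is a made-up illustration.
# Units: kOhm * fF = ps.
INTRINSIC_PS = 20.0      # hypothetical intrinsic cell delay
DRIVE_RES_KOHM = 5.0     # hypothetical driver output resistance
NAND2_PIN_CAP_FF = 2.0   # hypothetical NAND2 input pin capacitance
FANOUT = 3               # each stage drives three NAND2 inputs
STAGES = 10              # path depth


def path_delay_ps(wire_cap_ff, pad_cap_ff=0.0):
    """Delay of a chain of identical stages.

    wire_cap_ff: wire capacitance the timing engine sees per net.
    pad_cap_ff:  extra capacitance baked into the cell itself.
    """
    load_ff = FANOUT * NAND2_PIN_CAP_FF + wire_cap_ff + pad_cap_ff
    return STAGES * (INTRINSIC_PS + DRIVE_RES_KOHM * load_ff)


# What synthesis sees with a zero wireload model and unpadded cells.
optimistic = path_delay_ps(wire_cap_ff=0.0)
# What the path looks like after P&R, if each wire really adds one NAND2 load.
post_layout = path_delay_ps(wire_cap_ff=NAND2_PIN_CAP_FF)
# Zero wireload model again, but with cells padded by one NAND2 load.
padded = path_delay_ps(wire_cap_ff=0.0, pad_cap_ff=NAND2_PIN_CAP_FF)

print(optimistic, post_layout, padded)
```

With these made-up numbers the unpadded zero wireload estimate comes out noticeably shorter than the post-layout delay, while the padded estimate matches it. In practice the match would only be as good as the NAND2 guess.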
Zero wireload in general:
It is almost impossible to come up with a decent wireload model nowadays. Even custom models give you only a rough estimate. The only way to close timing on today's designs is to extract the RC load after place and route. This fact led to the concept of "fast synthesis": synthesize the first time with a zero wireload model, perform place and route, extract RC, feed the extracted RC back to the synthesis tool, and re-synthesize.
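The loop described above can be sketched as a toy model. The step functions below are hypothetical stand-ins, not the API of any real EDA tool, and the delays are invented. The "re-synthesis" here only re-times the same paths, whereas a real tool would also restructure the logic.

```python
# Toy model of the "fast synthesis" loop. The step functions are
# hypothetical stand-ins, not the API of any real EDA tool.

CLOCK_PERIOD_PS = 1000.0

# Hypothetical per-net wire delays (ps) that only P&R can reveal.
TRUE_WIRE_DELAY_PS = {"n1": 40.0, "n2": 120.0, "n3": 15.0}
# Hypothetical gate delay on the path through each net.
GATE_DELAY_PS = {"n1": 900.0, "n2": 930.0, "n3": 700.0}


def synthesize(wire_delay_ps):
    """Return slack per net for the given wire-delay assumption."""
    return {
        net: CLOCK_PERIOD_PS - (GATE_DELAY_PS[net] + wire_delay_ps[net])
        for net in GATE_DELAY_PS
    }


def place_route_and_extract():
    """Pretend to run P&R and RC extraction; returns 'extracted' wire delays."""
    return dict(TRUE_WIRE_DELAY_PS)


# Pass 1: zero wireload model, every wire is free.
zero_wireload = {net: 0.0 for net in GATE_DELAY_PS}
first_pass_slack = synthesize(zero_wireload)

# Place and route, then extract RC.
extracted = place_route_and_extract()

# Pass 2: re-time with the back-annotated values.
second_pass_slack = synthesize(extracted)

# Nets that looked fine in pass 1 but fail once real wires are included.
failing = sorted(net for net, s in second_pass_slack.items() if s < 0)
print(first_pass_slack, second_pass_slack, failing)
```

In this sketch every net meets timing in the first pass, and only the second pass exposes the net whose wire delay eats the remaining slack. That is the information the re-synthesis step is meant to act on.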
Sometimes during "fast synthesis" the clock uncertainty is specified with a higher value than actually required. This is a way to compensate for the lack of interconnect delays.
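As a rough illustration of the over-constraining trick, the arithmetic looks like this. The period, the "real" uncertainty, and the interconnect margin are all invented numbers, and setup time is ignored to keep the sketch short.

```python
# Illustrative numbers only; integer picoseconds avoid rounding noise.
CLOCK_PERIOD_PS = 2000
REAL_UNCERTAINTY_PS = 100        # hypothetical jitter + skew budget
INTERCONNECT_MARGIN_PS = 200     # hypothetical allowance for missing wire delay


def logic_budget_ps(period_ps, uncertainty_ps):
    """Time left for cell delays on a register-to-register path
    (setup time ignored for simplicity)."""
    return period_ps - uncertainty_ps


# Budget if only the genuine uncertainty is specified.
honest_budget = logic_budget_ps(CLOCK_PERIOD_PS, REAL_UNCERTAINTY_PS)

# Budget when the uncertainty is inflated to stand in for wire delays.
padded_uncertainty_ps = REAL_UNCERTAINTY_PS + INTERCONNECT_MARGIN_PS
fast_synth_budget = logic_budget_ps(CLOCK_PERIOD_PS, padded_uncertainty_ps)

print(honest_budget, fast_synth_budget)
```

In an SDC constraint file the inflated value would typically be what gets passed to `set_clock_uncertainty`. How much margin to add is a judgment call that depends on the design and the process.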
However, I have never heard of the cells themselves being changed in any way for the purpose of the above flow.
RC prediction during synthesis:
There have been many attempts to come up with algorithms that predict RC delays for each individual cell. I don't know how accurate these algorithms are. There is a high cost associated with these tools (for example, "Topographical" Design Compiler by Synopsys is considerably more expensive than the usual DC).