I would like to estimate the GPU memory (VRAM) required to run a hypothetical LLM, taking into account all the relevant factors, such as:
- P: Model parameters (total or MoE active parameters)
- Q: Quantization bits
- C: Context length cap (from what I understand, the context can be capped, acting as a sort of smaller "batch-size" limit)
- ATT: Type of attention used (full attention, FlashAttention, ...)
- Other
I understand that the usual formula found around the web,

Space = ((P × 4 Bytes) / (32 / Q)) × overhead,

describes part of the picture, but it does not get down to the full details.
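
To make that concrete, here is a minimal Python sketch of how I currently apply that rule of thumb (the function name and the 1.2 overhead factor are just my own assumptions); it only accounts for the weights, which is exactly why I feel it misses the rest of the picture (KV cache, attention type, context length, and so on):

```python
def weights_vram_gb(params_billion: float, quant_bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate for the model weights only.

    params_billion : P, number of parameters in billions
    quant_bits     : Q, bits per parameter after quantization
    overhead       : assumed multiplier for framework/runtime overhead (~1.2)
    """
    bytes_per_param = quant_bits / 8            # e.g. 4-bit -> 0.5 bytes/param
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return weight_bytes * overhead / 1e9        # result in GB (1 GB = 1e9 bytes)

# Example: a 70B-parameter model at 4-bit with 20% overhead -> ~42 GB
print(f"{weights_vram_gb(70, 4):.1f} GB")
```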
