It seems that most research on stabilizer codes focuses on CSS codes, which can be understood as a "tensor product" of two classical linear codes. However, I feel that non-CSS codes tend to perform better at a given code size. For example, the well-known $[[5, 1, 3]]$ code can correct an arbitrary single-qubit error with only five qubits. If we require the code to be CSS, we need at least seven qubits (the Steane code) to do the same job (see this answer for why a $[[6, 1, 3]]$ code cannot be CSS).
What are the main motivations for pursuing CSS codes over general non-CSS codes? I can imagine that CSS codes make decoding easier, since X and Z errors can be decoded separately, but what else do we gain from CSS codes, given that they are suboptimal in qubit count compared with non-CSS codes?
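To make the distinction concrete, here is a minimal sketch in plain Python that compares the usual generator sets of the two codes mentioned above. The helper names (`commute`, `is_valid_stabilizer_set`, `is_css_form`) are my own and not taken from any library. Note that `is_css_form` only tests the particular generators written down here, that is, whether each one is purely X-type or purely Z-type; it does not search over other generating sets or local Clifford equivalences.

```python
# Standard generator sets, written as Pauli strings over {I, X, Y, Z}.
FIVE_QUBIT = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]   # [[5, 1, 3]] code
STEANE = [                                           # [[7, 1, 3]] code
    "IIIXXXX", "IXXIIXX", "XIXIXIX",                 # X-type checks
    "IIIZZZZ", "IZZIIZZ", "ZIZIZIZ",                 # Z-type checks
]


def commute(p, q):
    """Two Pauli strings commute iff the number of positions where both are
    non-identity and different is even."""
    anti = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return anti % 2 == 0


def is_valid_stabilizer_set(gens):
    """Check that all generators commute pairwise."""
    return all(commute(p, q) for p in gens for q in gens)


def is_css_form(gens):
    """True if every generator in this presentation is X-only or Z-only."""
    return all(set(g) <= {"I", "X"} or set(g) <= {"I", "Z"} for g in gens)


if __name__ == "__main__":
    for name, gens in [("five-qubit", FIVE_QUBIT), ("Steane", STEANE)]:
        print(name, is_valid_stabilizer_set(gens), is_css_form(gens))
```

In the Steane code, the X-type and Z-type checks both use the same supports, taken from the parity checks of the classical $[7, 4, 3]$ Hamming code. This is what lets Z errors be decoded from the X-type syndrome bits and X errors from the Z-type syndrome bits independently. Every generator of the five-qubit code, by contrast, mixes X and Z.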