Indeed, the approximate formula in your reference gives about $28d_{model}^2$ parameters per encoder+decoder block pair, where $d_{model}=256$ for the DETR paper: roughly $12d_{model}^2$ for an encoder block ($4d_{model}^2$ for the attention projections plus $8d_{model}^2$ for the FFN with the usual $d_{ff}=4d_{model}$) and $16d_{model}^2$ for a decoder block, which adds a cross-attention sub-layer. While we can ignore the layer-normalization and bias parameters inside the MHA+FFN blocks, we cannot ignore the input embeddings, position embeddings, output heads, and any DETR-specific modules, since each of these contributes parameters on the order of $d_{model}^2$ or more, depending on the specific embedding approach, and they are all part of the transformer. The author of your referenced article also mentions this point:
> I will not consider the input embedding layer with positional encoding and final output layer (linear + softmax) as Transformer components, focusing only on Encoder and Decoder blocks. I do so since these components are specific to the task and embedding approach, while both Encoder and Decoder stacks formed the basis of many other architectures later.
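For concreteness, here is a minimal sketch of the bookkeeping behind the $28d_{model}^2$ figure. It follows the approximate formula's conventions (weight matrices only, $d_{ff}=4d_{model}$, no biases or LayerNorm), so it is the generic estimate rather than DETR's exact configuration, and the helper name is just for illustration:

```python
def approx_block_params(d_model, d_ff=None):
    """Rough weight counts per Transformer block, ignoring biases and
    LayerNorm. Assumes d_ff = 4 * d_model unless given explicitly."""
    d_ff = 4 * d_model if d_ff is None else d_ff
    attn = 4 * d_model * d_model      # W_Q, W_K, W_V, W_O projections
    ffn = 2 * d_model * d_ff          # the two FFN linear layers
    encoder = attn + ffn              # self-attention + FFN   -> ~12 d^2
    decoder = 2 * attn + ffn          # adds cross-attention   -> ~16 d^2
    return encoder, decoder

d_model = 256
enc, dec = approx_block_params(d_model)
print(enc, dec, enc + dec)            # 786432 1048576 1835008
print(28 * d_model ** 2)              # 1835008, i.e. 28 * d_model^2
```

Scaling this by the number of encoder and decoder layers gives the block-only total; the embeddings, positional encodings, and output heads discussed above then have to be added on top.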