I was watching a video on implementing attention in a transformer. The author set the query, key, and value projection biases to False and said, "Typically, people don't use biases for these."
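For concreteness, here is a minimal sketch of the kind of bias-free Q/K/V projections I mean (my own illustration, not the video's exact code; the class and variable names are mine):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Single-head self-attention where the Q/K/V projections have bias=False.
class BiasFreeSelfAttention(nn.Module):
    def __init__(self, embed_dim: int):
        super().__init__()
        self.q_proj = nn.Linear(embed_dim, embed_dim, bias=False)
        self.k_proj = nn.Linear(embed_dim, embed_dim, bias=False)
        self.v_proj = nn.Linear(embed_dim, embed_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        return F.softmax(scores, dim=-1) @ v
```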
Even in the official PyTorch code, the add_bias_kv argument of nn.MultiheadAttention defaults to False:
add_bias_kv: If specified, adds bias to the key and value sequences at dim=0. Default: False.
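As a quick sanity check of those defaults (a minimal sketch, assuming a standard PyTorch install; the dimensions are arbitrary):

```python
import torch.nn as nn

# With the default add_bias_kv=False, no learnable bias vectors are
# appended to the key/value sequences: bias_k and bias_v stay None.
mha = nn.MultiheadAttention(embed_dim=64, num_heads=4)
print(mha.bias_k, mha.bias_v)  # None None

# Opting in creates learnable (1, 1, embed_dim) bias parameters.
mha_kv_bias = nn.MultiheadAttention(embed_dim=64, num_heads=4, add_bias_kv=True)
print(mha_kv_bias.bias_k.shape)  # torch.Size([1, 1, 64])
```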
What is the reason behind that?