I would like to give more context to my transformers by adding some metadata related to each token. This metadata is mostly categorical (3 fields, with 3 possible values for each field). In addition to the positional embedding (same shape as the tokens, added to them), how can I add this extra information to the tokens before each transformer block?
I have a few options in mind:
- One-hot encoding of the metadata: `t_extra` with shape `[9,]`, like `[0, 0, 1, 0, 1, 0, 1, 0, 0]` for the three variables with classes `{2, 1, 0}`. This extra vector is then concatenated to the existing token, which increases `d_model` (from 512 to 521). Is it a good idea to concatenate `t_extra` and increase the token size?
- Like the first approach, one-hot encoding, but I repeat (+ zero-padding) the vector of size `[9,]` until `t_extra` has the same size as the tokens (`d_model = 512`), and I add it to the tokens, like the PE: `Add([t, pe, t_extra])`. With this option, I don't know whether one-hot encoding is a good idea; should I scale `t_extra` by a factor of 2 or 5, so that the PE adds variations in `[-1, 1]` (sine & cosine) while the categorical encoding (in `t_extra`) adds variations / offsets in `{0, 2}` (or `{0, 5}` depending on the scale) at some positions in the token? (Both options are sketched in the snippet below.)
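For concreteness, here is a minimal sketch of what I mean for both options (assuming PyTorch; the toy batch, the shapes and the `scale` value are just placeholders, not my actual model):

```python
import torch
import torch.nn.functional as F

d_model = 512
n_fields, n_classes = 3, 3                              # 3 categorical fields, 3 values each

# toy batch: tokens already embedded (+ PE), plus per-token metadata class indices
batch, seq_len = 2, 10
tokens = torch.randn(batch, seq_len, d_model)
meta = torch.randint(0, n_classes, (batch, seq_len, n_fields))   # e.g. [2, 1, 0]

# one-hot encode each field and flatten -> [batch, seq_len, 9]
t_extra = F.one_hot(meta, num_classes=n_classes).float()
t_extra = t_extra.reshape(batch, seq_len, n_fields * n_classes)

# Option 1: concatenate, growing the model width from 512 to 521
tokens_concat = torch.cat([tokens, t_extra], dim=-1)    # [batch, seq_len, 521]

# Option 2: repeat the 9-dim one-hot vector, zero-pad up to d_model,
# and add it to the tokens exactly like a PE, with an optional scale factor
scale = 2.0                                             # the factor discussed above
reps = d_model // t_extra.shape[-1]                     # 512 // 9 = 56
t_extra_tiled = t_extra.repeat(1, 1, reps)              # [batch, seq_len, 504]
t_extra_tiled = F.pad(t_extra_tiled, (0, d_model - t_extra_tiled.shape[-1]))  # -> 512
tokens_add = tokens + scale * t_extra_tiled             # [batch, seq_len, 512]
```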
Is there an existing approach for categorical embeddings in transformers?