
I would like to give more context to my transformers by adding some metadata related to each token. This metadata is mostly categorical (3 fields, with 3 possible values for each field). In addition to the positional embedding (same shape as the tokens, added to them), how can I add this extra information to the tokens before each transformer block?
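To make the setup concrete, here is a minimal sketch of what I currently have (PyTorch assumed; `batch` and `seq_len` are just placeholder shapes):

```python
import torch

# Minimal sketch of the current setup (PyTorch assumed; batch/seq_len are placeholders).
batch, seq_len, d_model = 8, 128, 512

tok = torch.randn(batch, seq_len, d_model)       # token embeddings
pe = torch.randn(seq_len, d_model)               # positional embedding, same shape per token
meta = torch.randint(0, 3, (batch, seq_len, 3))  # 3 categorical fields, values in {0, 1, 2}

x = tok + pe                                     # current input to the transformer blocks
# Goal: also inject `meta` into x before each block.
```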

I have a few options in mind:

  • One-hot encoding of the metadata: t_extra with shape [9,], e.g. [0,0,1,0,1,0,1,0,0] for the three fields with classes {2, 1, 0}. This extra vector is then concatenated to the existing token, which increases d_model (from 512 to 521). Is it a good idea to concatenate t_extra and increase the token size?
  • Like the first approach, one-hot encoding, but I repeat (+ zero-pad) the vector of size [9,] so that t_extra has the same size as the tokens (d_model = 512), and I add it to the tokens like the PE: Add([t, pe, t_extra]). With this option, I don't know whether one-hot encoding is a good idea; should I scale t_extra by a factor of 2 or 5, so that the PE adds variations in [-1, 1] (sine & cosine) while the categorical encoding (in t_extra) adds offsets in {0, 2} (or {0, 5}, depending on the scale) at some positions in the token? Both options are sketched in the code below.
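Here is a minimal sketch of the two options (PyTorch assumed; the scale factor and the repeat-then-pad layout follow the description above, the other names and shapes are placeholders):

```python
import torch
import torch.nn.functional as F

batch, seq_len, d_model = 8, 128, 512
tok = torch.randn(batch, seq_len, d_model)
pe = torch.randn(seq_len, d_model)
meta = torch.randint(0, 3, (batch, seq_len, 3))          # 3 fields, 3 classes each

# One-hot encode each field and flatten: [batch, seq_len, 3, 3] -> [batch, seq_len, 9]
t_extra = F.one_hot(meta, num_classes=3).float().reshape(batch, seq_len, 9)

# Option 1: concatenate, growing d_model from 512 to 521.
x1 = torch.cat([tok + pe, t_extra], dim=-1)              # [batch, seq_len, 521]

# Option 2: repeat + zero-pad t_extra up to d_model, scale it, and add it like the PE.
scale = 2.0                                              # the factor in question (2 or 5)
reps = d_model // 9                                      # 56 repetitions -> 504 dims
t_tiled = F.pad(t_extra.repeat(1, 1, reps), (0, d_model - reps * 9))
x2 = tok + pe + scale * t_tiled                          # [batch, seq_len, 512]
```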

Is there an existing approach for categorical embeddings in transformers?
