
What are the best methods for generating word embeddings specifically designed to maximise the correspondence between vector similarities (with respect to some measure) and human judgement? (Or is there perhaps a dataset of such human similarity estimates?) One standard way of getting good correspondence seems to be GloVe with cosine similarity (a minimal sketch of that baseline is below), but I wonder whether this can be, or has been, improved upon by combining it with e.g. GPT embeddings or other more context-aware approaches.
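For concreteness, this is roughly the baseline setup I have in mind, assuming pretrained GloVe vectors in the usual plain-text format (the `glove.6B.300d.txt` filename is just an assumption on my part):

```python
import numpy as np

def load_glove(path):
    """Load GloVe vectors from a plain-text file into a dict of word -> np.ndarray."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

glove = load_glove("glove.6B.300d.txt")  # path/filename is an assumption
print(cosine_similarity(glove["cat"], glove["dog"]))
```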

Having downloaded OpenAI embeddings for some sample words (the "text-embedding-3-large" model, 3072 dimensions), I can see that neither cosine nor Euclidean distances between them correspond to human similarity perception very well when used by themselves.
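The kind of check I ran looks roughly like the sketch below: score each word pair with the model, then compute the rank correlation against human ratings (Spearman's ρ seems to be the usual choice for benchmarks such as WordSim-353 or SimLex-999). The ratings and embeddings here are placeholder values for illustration, not actual dataset entries:

```python
import numpy as np
from scipy.stats import spearmanr

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Placeholder (word1, word2, human_rating) triples -- in practice these would
# come from a human-judgement dataset such as WordSim-353 or SimLex-999.
pairs = [("cat", "dog", 7.35), ("car", "automobile", 8.94), ("cup", "coffee", 3.0)]

# Placeholder embeddings -- in practice, the precomputed 3072-dim
# text-embedding-3-large vectors for each word.
words = {w for a, b, _ in pairs for w in (a, b)}
embeddings = {w: np.random.rand(3072) for w in words}

model_scores = [cosine_similarity(embeddings[a], embeddings[b]) for a, b, _ in pairs]
human_scores = [r for _, _, r in pairs]
rho, _ = spearmanr(model_scores, human_scores)
print(f"Spearman correlation with human judgements: {rho:.3f}")
```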

I would be grateful for any pointers to relevant literature etc.
