
What are the best methods for generating word embeddings specifically designed to maximise the correspondence between vector similarities (with respect to some measure) and human judgement? (Or is there perhaps a dataset of such human similarity estimates?) One standard way of getting good correspondence seems to be GloVe with cosine similarity (a minimal sketch of that baseline is below), but I wonder whether this can be, or has been, improved upon by combining it with e.g. GPT embeddings or other more context-aware approaches.
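For concreteness, this is roughly the baseline setup I have in mind, assuming pretrained GloVe vectors in the usual plain-text format (the `glove.6B.300d.txt` filename is just an assumption on my part):

```python
import numpy as np

def load_glove(path):
    """Load GloVe vectors from a plain-text file into a dict of word -> np.ndarray."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

glove = load_glove("glove.6B.300d.txt")  # path/filename is an assumption
print(cosine_similarity(glove["cat"], glove["dog"]))
```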

Having downloaded OpenAI embeddings for some sample words (the "text-embedding-3-large" model, 3072 dimensions), I can see that neither cosine nor Euclidean distances between them correspond to human similarity perception very well when used by themselves.
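The kind of check I ran looks roughly like the sketch below: score each word pair with the model, then compute the rank correlation against human ratings (Spearman's ρ seems to be the usual choice for benchmarks such as WordSim-353 or SimLex-999). The ratings and embeddings here are placeholder values for illustration, not actual dataset entries:

```python
import numpy as np
from scipy.stats import spearmanr

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Placeholder (word1, word2, human_rating) triples -- in practice these would
# come from a human-judgement dataset such as WordSim-353 or SimLex-999.
pairs = [("cat", "dog", 7.35), ("car", "automobile", 8.94), ("cup", "coffee", 3.0)]

# Placeholder embeddings -- in practice, the precomputed 3072-dim
# text-embedding-3-large vectors for each word.
words = {w for a, b, _ in pairs for w in (a, b)}
embeddings = {w: np.random.rand(3072) for w in words}

model_scores = [cosine_similarity(embeddings[a], embeddings[b]) for a, b, _ in pairs]
human_scores = [r for _, _, r in pairs]
rho, _ = spearmanr(model_scores, human_scores)
print(f"Spearman correlation with human judgements: {rho:.3f}")
```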

I would be grateful for any pointers to relevant literature etc.
