Which would be the better approach for getting the top k embedding vectors that are similar to embedding vector A and dissimilar to embedding vector B?
Approach 1:
- calculate f(V) = cosine_similarity(A, V) - cosine_similarity(B, V) for each vector V
- sort vectors by f(V) value in descending order
- take first k of them.
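To make Approach 1 concrete, here is a minimal NumPy sketch. The helper names (`cosine_to_all`, `top_k_difference`) and the toy 2-D vectors are made up for illustration; real word2vec vectors would have many more dimensions.

```python
import numpy as np

def cosine_to_all(q, V):
    """Cosine similarity between query q (shape (d,)) and every row of V (shape (n, d))."""
    return (V @ q) / (np.linalg.norm(V, axis=1) * np.linalg.norm(q))

def top_k_difference(A, B, V, k):
    """Approach 1: score each row by cos(A, V) - cos(B, V), keep the k highest."""
    scores = cosine_to_all(A, V) - cosine_to_all(B, V)
    order = np.argsort(-scores)[:k]  # indices sorted by descending score
    return order, scores[order]

# Toy data (hypothetical): A and B are orthogonal unit vectors.
A = np.array([1.0, 0.0])
B = np.array([0.0, 1.0])
V = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, -1.0]])

idx1, scores1 = top_k_difference(A, B, V, k=2)
print(idx1, scores1)
```

In this toy case the vector [1, -1] ranks above [1, 0], because pointing away from B is rewarded as much as pointing towards A.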
Approach 2:
- calculate f(V) = cosine_similarity(A, V) and g(V) = cosine_similarity(B, V) for each vector V
- sort vectors by f(V) value in descending order
- take first k of them
- sort selected k vectors by g(V) in ascending order.
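A minimal NumPy sketch of Approach 2, again with made-up helper names (`cosine_to_all`, `top_k_then_rerank`) and toy vectors. Note that here B only changes the order inside the selected set; which k vectors get selected depends on A alone.

```python
import numpy as np

def cosine_to_all(q, V):
    """Cosine similarity between query q (shape (d,)) and every row of V (shape (n, d))."""
    return (V @ q) / (np.linalg.norm(V, axis=1) * np.linalg.norm(q))

def top_k_then_rerank(A, B, V, k):
    """Approach 2: pick the k rows closest to A, then order them by ascending similarity to B."""
    sim_a = cosine_to_all(A, V)
    sim_b = cosine_to_all(B, V)
    top_k = np.argsort(-sim_a)[:k]           # k most similar to A
    return top_k[np.argsort(sim_b[top_k])]   # least similar to B first

# Toy data (hypothetical): A and B are orthogonal unit vectors.
A = np.array([1.0, 0.0])
B = np.array([0.0, 1.0])
V = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, -1.0]])

idx2 = top_k_then_rerank(A, B, V, k=3)
print(idx2)
```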
Approach 3:
- calculate f(V) = cosine_similarity((A - B), V) for each vector V
- sort vectors by f(V) value in descending order
- take first k of them.
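A minimal NumPy sketch of Approach 3, with made-up helper names (`cosine_to_all`, `top_k_offset`) and toy vectors. As far as I can tell, cos(A - B, V) = (|A| cos(A, V) - |B| cos(B, V)) / |A - B|, so if A and B have the same norm (for example after unit-normalising both), this should give the same ordering as Approach 1; the two only differ when the norms differ.

```python
import numpy as np

def cosine_to_all(q, V):
    """Cosine similarity between query q (shape (d,)) and every row of V (shape (n, d))."""
    return (V @ q) / (np.linalg.norm(V, axis=1) * np.linalg.norm(q))

def top_k_offset(A, B, V, k):
    """Approach 3: score each row by cos(A - B, V), keep the k highest. Assumes A != B."""
    scores = cosine_to_all(A - B, V)
    order = np.argsort(-scores)[:k]  # indices sorted by descending score
    return order, scores[order]

# Toy data (hypothetical): A and B are orthogonal unit vectors, so their norms are equal.
A = np.array([1.0, 0.0])
B = np.array([0.0, 1.0])
V = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, -1.0]])

idx3, scores3 = top_k_offset(A, B, V, k=2)
print(idx3, scores3)
```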
Also, please suggest a better approach if you know of one other than the three above.
Note: the embedding vectors were computed using the word2vec algorithm.
