Given embedding vector A and vector B, how to find top k embedding vectors such that they are similar to vector A and dissimilar to vector B

Question

Which would be the better approach for getting the top k embedding vectors that are similar to embedding vector A and dissimilar to embedding vector B?

Approach 1:

calculate f(V) = cosine_similarity(A,V) - cosine_similarity(B,V) for each vector V
sort vectors by f(V) value in descending order
take first k of them.
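A minimal Python sketch of Approach 1, using only the standard library. The `cosine_similarity` helper is hand-written for illustration (it is not scikit-learn's function of the same name), and the 2-D vectors are hypothetical stand-ins for word2vec embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


def top_k_approach1(A, B, vectors, k):
    """Approach 1: rank by f(V) = cos(A, V) - cos(B, V), highest first."""
    def f(V):
        return cosine_similarity(A, V) - cosine_similarity(B, V)
    return sorted(vectors, key=f, reverse=True)[:k]


# Hypothetical toy data standing in for word2vec embeddings.
A = (1.0, 0.0)
B = (0.0, 1.0)
candidates = [(1.0, 1.0), (1.0, -1.0), (0.0, 1.0), (1.0, 0.1)]
print(top_k_approach1(A, B, candidates, 2))  # [(1.0, -1.0), (1.0, 0.1)]
```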

Approach 2:

calculate f(V) = cosine_similarity(A,V) , g(V) = cosine_similarity(B,V) for each vector V
sort vectors by f(V) value in descending order
take first k of them
sort selected k vectors by g(V) in ascending order.
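Approach 2 can be sketched the same way. As before, the `cosine_similarity` helper and the toy vectors are illustrative assumptions, not part of the original question.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


def top_k_approach2(A, B, vectors, k):
    """Approach 2: keep the k vectors most similar to A (f), then order
    those k by ascending similarity to B (g), most dissimilar to B first."""
    nearest_to_a = sorted(vectors, key=lambda V: cosine_similarity(A, V),
                          reverse=True)[:k]
    return sorted(nearest_to_a, key=lambda V: cosine_similarity(B, V))


# Hypothetical toy data standing in for word2vec embeddings.
A = (1.0, 0.0)
B = (0.0, 1.0)
candidates = [(1.0, 0.5), (1.0, -0.4), (0.0, 1.0), (1.0, 0.1)]
print(top_k_approach2(A, B, candidates, 3))
# [(1.0, -0.4), (1.0, 0.1), (1.0, 0.5)]
```

In this form g(V) only reorders the k vectors already chosen by f(V); it does not change which vectors make the cut.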

Approach 3:

calculate f(V) = cosine_similarity((A - B),V) for each vector V
sort vectors by f(V) value in descending order
take first k of them.
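A sketch of Approach 3 under the same assumptions (hand-written `cosine_similarity`, hypothetical 2-D vectors): the difference A - B is computed once and used as the query direction.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


def top_k_approach3(A, B, vectors, k):
    """Approach 3: rank by cos(A - B, V), highest first."""
    direction = [a - b for a, b in zip(A, B)]
    return sorted(vectors, key=lambda V: cosine_similarity(direction, V),
                  reverse=True)[:k]


# Hypothetical toy data standing in for word2vec embeddings.
A = (1.0, 0.0)
B = (0.0, 1.0)
candidates = [(1.0, 1.0), (1.0, -1.0), (0.0, 1.0), (1.0, 0.1)]
print(top_k_approach3(A, B, candidates, 2))  # [(1.0, -1.0), (1.0, 0.1)]
```

Because A - B is formed once, this needs one cosine per candidate instead of two.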

Also, please suggest a better approach if you have one other than the three above.

Note: the embedding vectors were calculated using the word2vec algorithm.

score 0 · Answer 1 · answered Jan 06 '22 at 16:40

As the objective is to find the vectors most similar to A and most dissimilar to B, Approach 2 would be the most appropriate.

Why not Approach 1: It can lead to confusing results, because different combinations of the two similarities produce the same final value (for example, 0.9 - 0.5 and 0.5 - 0.1 both give 0.4). This may lead to a few problems:

For the same output, how do you know which vector should be preferred?
Subtracting one similarity from the other may let them cancel out, hiding how similar a vector actually is to A or to B.
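To make the first point concrete, here is a tiny Python check with made-up similarity values (not computed from real embeddings): three candidates with quite different profiles all receive the same Approach 1 score.

```python
# Hypothetical (cos(A, V), cos(B, V)) pairs for three candidate vectors.
similarities = {
    "V1": (0.9, 0.5),  # very similar to A, fairly similar to B
    "V2": (0.5, 0.1),  # moderately similar to A, barely similar to B
    "V3": (0.7, 0.3),
}

# Approach 1 score f(V) = cos(A, V) - cos(B, V), rounded to hide float noise.
scores = {name: round(a - b, 6) for name, (a, b) in similarities.items()}
print(scores)  # {'V1': 0.4, 'V2': 0.4, 'V3': 0.4}
```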

Why not Approach 3: Vector subtraction is not commutative, so you will get different results depending on whether you subtract B from A or A from B. Moreover, let's look at an example where it fails completely:

Suppose we have two 2-dimensional vectors A and B, and a candidate vector V.

First scenario: A = (6, 0), B = (0.01, -0.01)

V = (3, -1)

Cos(A,V) = 0.949, Cos(B,V) = 0.894, Cos(A-B,V) = 0.948

Second scenario: A = (6, 0), B = (-0.01, 0.01)

V = (3, -1)

Cos(A,V) = 0.949, Cos(B,V) = -0.894, Cos(A-B,V) = 0.949

Although Cos(B,V) sits at opposite extremes in the two scenarios (positive in the first, negative in the second), Cos(A-B,V) gives practically the same result in both. Hence I would not advise using Cos(A-B,V).
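The two scenarios can be reproduced with a short standard-library Python sketch; the hand-written `cosine_similarity` helper is for illustration only, not a specific library's API.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


A = (6.0, 0.0)
V = (3.0, -1.0)
results = {}
for label, B in [("scenario 1", (0.01, -0.01)), ("scenario 2", (-0.01, 0.01))]:
    A_minus_B = tuple(a - b for a, b in zip(A, B))
    # (Cos(A,V), Cos(B,V), Cos(A-B,V)) rounded to three decimals
    results[label] = (
        round(cosine_similarity(A, V), 3),
        round(cosine_similarity(B, V), 3),
        round(cosine_similarity(A_minus_B, V), 3),
    )
    print(label, results[label])
# scenario 1 (0.949, 0.894, 0.948)
# scenario 2 (0.949, -0.894, 0.949)
```

Cos(A-B,V) barely moves here because B is tiny compared with A, so A - B is almost equal to A in both scenarios.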

Approach 2 should work as expected in your case.

score 0 · Answer 2 · answered Jan 06 '22 at 21:57

This is just a heuristic, but how about f(V) = cosine_similarity(A,V) * min(1, 1 - cosine_similarity(B,V))? I'm applying the min here so that the multiplier won't get larger than one. It's up to you whether to include it.
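A minimal Python sketch of this heuristic, assuming a hand-written `cosine_similarity` helper and hypothetical 2-D toy vectors; `score` and `top_k` are illustrative names, not an existing API.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


def score(A, B, V):
    """Similarity to A, scaled down by similarity to B.
    1 - cos(B, V) lies in [0, 2]; min(1, ...) caps the multiplier at 1."""
    return cosine_similarity(A, V) * min(1.0, 1.0 - cosine_similarity(B, V))


def top_k(A, B, vectors, k):
    """Return the k vectors with the highest heuristic score."""
    return sorted(vectors, key=lambda V: score(A, B, V), reverse=True)[:k]


# Hypothetical toy data standing in for word2vec embeddings.
A = (1.0, 0.0)
B = (0.0, 1.0)
candidates = [(1.0, 1.0), (1.0, -1.0), (0.0, 1.0), (1.0, 0.1)]
print(top_k(A, B, candidates, 2))  # [(1.0, 0.1), (1.0, -1.0)]
```

With the cap, strong dissimilarity to B cannot lift a score above cos(A,V), so in this toy data (1.0, 0.1) outranks (1.0, -1.0).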

This could be tuned further as f(V) = max(0, cosine_similarity(A,V))^a * min(1, 1 - cosine_similarity(B,V))^b, but you'd need to experiment to see which values of a and b work best for you.
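A sketch of the tunable version, again with a hand-written `cosine_similarity` and hypothetical toy vectors. The defaults a = b = 1 and the example exponents are illustrative choices, not recommendations, and the extra clamp at 0 on the second factor is an added guard that is not part of the formula above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


def tuned_score(A, B, V, a=1.0, b=1.0):
    """max(0, cos(A, V))**a * min(1, 1 - cos(B, V))**b.
    Added guard: max(0.0, ...) on the second factor, because rounding can
    push cos(B, V) slightly above 1, and in Python a negative float raised
    to a fractional power gives a complex number."""
    sim_a = max(0.0, cosine_similarity(A, V))
    penalty = max(0.0, min(1.0, 1.0 - cosine_similarity(B, V)))
    return sim_a ** a * penalty ** b


def top_k_tuned(A, B, vectors, k, a=1.0, b=1.0):
    """Return the k vectors with the highest tuned score."""
    return sorted(vectors, key=lambda V: tuned_score(A, B, V, a, b),
                  reverse=True)[:k]


# Hypothetical toy data standing in for word2vec embeddings.
A = (1.0, 0.0)
B = (0.0, 1.0)
candidates = [(1.0, 1.0), (1.0, -1.0), (0.0, 1.0), (1.0, 0.1)]
print(top_k_tuned(A, B, candidates, 2, a=1.0, b=1.0))
print(top_k_tuned(A, B, candidates, 2, a=1.0, b=4.0))
```

With these toy vectors, raising b from 1 to 4 penalizes any similarity to B more heavily and swaps the top two candidates, which is the kind of effect worth checking on your own data.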
