Questions tagged [cosine-similarity]

6 questions
2
votes
1 answer

Which metric should I use in general for semantic similarity in text embeddings?

I know this is a trivial question, but I'm really confused about which metric to choose: does it depend on the model itself, or is there a universally agreed-upon metric for computing semantic similarity? Suppose I have a text-to-embedding…
1
vote
1 answer

JD-CV Matching: Cosine Similarity Not Performing Well

I'm working on a JD-CV matching system using Sentence Transformers (all-MiniLM-L6-v2) for embedding generation. I'm currently calculating cosine similarity between JD and CV embeddings, but the results are not very accurate.
1
vote
1 answer

Cheap differentiable similarity metrics of vectors

I am looking to compute the similarity between a large set of vectors during neural network training, a process that becomes considerably expensive if the wrong metric is chosen. So far, I am making use of cosine similarity, but I found that the…
1
vote
2 answers

Given embedding vector A and vector B, how to find top k embedding vectors such that they are similar to vector A and dissimilar to vector B

Which would be the better approach for getting the top k embedding vectors that are similar to embedding vector A and dissimilar to vector B? Approach 1: calculate f(V) = cosine_similarity(A,V) - cosine_similarity(B,V) for each vector V sort…
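
For illustration only, here is a minimal sketch of the scoring function named in Approach 1 of the excerpt above, f(V) = cosine_similarity(A,V) - cosine_similarity(B,V). The excerpt is cut off after "sort", so ranking by descending score and keeping the first k entries is an assumption, and the names `cosine_similarity` and `top_k_like_a_unlike_b` are hypothetical helpers rather than anything from the question.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    # Assumes neither vector is all zeros; a zero vector would divide by zero.
    return dot / (norm_u * norm_v)


def top_k_like_a_unlike_b(vectors, a, b, k):
    """Score each V with cos(A, V) - cos(B, V) and return the k best (index, score) pairs.

    Sorting by descending score is an assumption; the excerpt is truncated
    before it says how the scores are used.
    """
    scored = [
        (i, cosine_similarity(a, v) - cosine_similarity(b, v))
        for i, v in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D data: A points along x, B points along y.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
a = [1.0, 0.0]
b = [0.0, 1.0]
top = top_k_like_a_unlike_b(vectors, a, b, k=2)
```

In this toy data the vector aligned with A and orthogonal to B ranks first, and the diagonal vector, equally close to both, scores zero.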
0
votes
2 answers

How do I choose a good threshold for classification (using cosine similarity scores)?

I am using OpenAI's text-embedding-ada-002 embedding model to do a semantic search on a database of articles to find articles that are most related to a given input text. I am looking for a way to define a minimum similarity score to prevent…
0
votes
2 answers

How to reduce the number of clusters produced by the Markov Clustering Algorithm?

I have used the Markov Clustering Algorithm (MCL) to cluster tweets based on their similarity. However, I got too many clusters, and most of them contain only one tweet. Any suggestions to reduce the number of clusters?