Questions tagged [cosine-similarity]
6 questions
2
votes
1 answer
Which metric should I use in general for semantic similarity of text embeddings?
I know this is a trivial question, but I'm really confused about which metric to choose: does it depend on the model itself, or is there a universally agreed-upon metric for computing semantic similarity?
Suppose I have a text-to-embedding…
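Which metric a particular model expects is not stated in the excerpt and is usually documented per model. One thing that holds regardless of the model: for unit-length vectors, cosine similarity equals the dot product, and squared Euclidean distance equals 2 - 2 * cosine, so all three give the same ranking. A minimal sketch checking this on made-up toy vectors (the helper names `dot`, `normalize` and `euclidean` are illustrative, not from any library):

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def normalize(u):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(u, u))
    return [x / norm for x in u]

def euclidean(u, v):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Toy vectors standing in for embeddings (assumption: real embeddings
# would come from whatever model is in use).
u = normalize([3.0, 4.0])
v = normalize([4.0, 3.0])

cos_sim = dot(u, v)          # on unit vectors, dot product == cosine similarity
dist = euclidean(u, v)

# Identity for unit vectors: ||u - v||^2 = 2 - 2 * cos(u, v)
identity_gap = abs(dist ** 2 - (2 - 2 * cos_sim))
```

If the embeddings are not normalized, the dot product also reflects vector length and the three metrics can rank results differently.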
Muhammad Ikhwan Perwira
- 800
- 3
- 10
1
vote
1 answer
JD-CV Matching: Cosine Similarity Not Performing Well
I'm working on a JD-CV matching system using Sentence Transformers (all-MiniLM-L6-v2) for embedding generation. I'm currently calculating cosine similarity between JD and CV embeddings, but the results are not very accurate.
GODGAMER
- 11
- 1
1
vote
1 answer
Cheap differentiable similarity metrics of vectors
I am looking to compute the similarity between a large set of vectors during neural network training, a process that becomes considerably expensive with the wrong metric. So far, I am making use of cosine similarity, but I found that the…
postnubilaphoebus
- 356
- 2
- 13
1
vote
2 answers
Given embedding vector A and vector B, how to find the top k embedding vectors that are similar to vector A and dissimilar to vector B?
Which would be a better approach for getting the top k embedding vectors that are similar to embedding vector A and dissimilar to vector B?
Approach 1:
calculate f(V) = cosine_similarity(A,V) - cosine_similarity(B,V) for each vector V
sort…
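A minimal sketch of Approach 1 as far as the excerpt describes it. The excerpt is cut off at the sort step, so sorting by f(V) in descending order and keeping the first k is an assumption based on the question title. The function name `top_k_similar_to_a_not_b` and the toy vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def top_k_similar_to_a_not_b(a, b, candidates, k):
    """Score each candidate V with f(V) = cos(A, V) - cos(B, V).

    Returns (index, score) pairs for the k highest scores.
    Assumption: descending sort, since the excerpt truncates this step.
    """
    scored = [
        (i, cosine_similarity(a, v) - cosine_similarity(b, v))
        for i, v in enumerate(candidates)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-D vectors standing in for embeddings.
a = [1.0, 0.0]
b = [0.0, 1.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]

top = top_k_similar_to_a_not_b(a, b, candidates, k=2)
top_indices = [i for i, _ in top]
```

In this toy case the vector pointing toward A and away from B ([1, -1]) outranks A itself, which shows that the difference score rewards dissimilarity to B as much as similarity to A.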
Shubham
- 11
- 3
0
votes
2 answers
How do I choose a good threshold for classification (using cosine similarity scores)?
I am using OpenAI's text-embedding-ada-002 embeddings model to do a semantic search on a database of articles to find articles that are most related to a given input text. I am looking for a way to define a minimum similarity score to prevent…
Stefan
- 1
- 1
0
votes
2 answers
How to reduce the number of clusters produced by the Markov Clustering Algorithm?
I have used the Markov Clustering Algorithm (MCL) to cluster tweets based on their similarity. However, I got too many clusters, and most of them contain only one tweet. Any suggestions for reducing the number of clusters?
Adnan Hussein
- 23
- 3