I'm working on a JD-CV matching system that uses Sentence Transformers (all-MiniLM-L6-v2) for embedding generation. I currently compute cosine similarity between the JD and CV embeddings, but the results are not very accurate.
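For reference, here is a minimal sketch of the setup described above, using the standard sentence-transformers API; the JD/CV strings are illustrative placeholders, not real data:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder documents standing in for a real JD and CV.
jd = "Looking for a Python developer with NLP experience."
cv = "Software engineer, 5 years of Python, built text-classification pipelines."

# Encode both documents and compare with cosine similarity.
jd_emb = model.encode(jd, convert_to_tensor=True)
cv_emb = model.encode(cv, convert_to_tensor=True)
score = util.cos_sim(jd_emb, cv_emb).item()
print(f"cosine similarity: {score:.3f}")
```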
1 Answer
Pre-trained models may not fully capture the language and context of your downstream domain. Fine-tuning the Sentence Transformer on a dataset of JD-CV pairs labeled for relevance can significantly improve embedding quality, aligning the model more closely with the specific semantics of job descriptions and resumes.
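A hedged sketch of such fine-tuning, assuming you have (JD, CV) pairs with relevance labels in [0, 1]. It uses the sentence-transformers fit API with a regression (cosine-similarity) objective; the training pairs below are illustrative placeholders:

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-MiniLM-L6-v2")

# Each example is a (JD, CV) pair with a relevance label in [0, 1].
# Replace these placeholders with your own labeled data.
train_examples = [
    InputExample(texts=["Senior data engineer, Spark and Airflow",
                        "Built ETL pipelines with Spark for 4 years"], label=0.9),
    InputExample(texts=["Frontend developer, React and TypeScript",
                        "Accountant with audit experience"], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# CosineSimilarityLoss trains the Siamese network so that the cosine
# similarity of the two embeddings regresses toward the label.
train_loss = losses.CosineSimilarityLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1, warmup_steps=100)
```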
The nature of the data also differs. Most NLP research on text similarity assumes that two pieces of text convey the same information written differently. In our case, however, the two documents do not express the same information but complement each other like pieces of a puzzle. Our goal is to match two complementary pieces of textual information that may not exhibit direct overlap or similarity.
We propose to fine-tune the BERT model using the Siamese SBERT framework on our large real-world dataset, with high-quality labels for resume-vacancy matches derived from our consultants' decisions. We show that our model beats our unsupervised and supervised baselines based on TF-IDF features and pre-trained BERT embeddings... Finally, we show that using a regression objective to optimize for cosine similarity yields more useful embeddings in our scenario, where we aim to use the learned embeddings as feature representations in a broader job recommender system.
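To connect this to the recommender-system use case, here is a hedged sketch of using the fine-tuned model's embeddings to rank candidate CVs for a JD. The model path and the documents are assumed placeholders:

```python
from sentence_transformers import SentenceTransformer, util

# Assumed path: wherever you saved the fine-tuned model above.
model = SentenceTransformer("path/to/finetuned-model")

jd = "Machine learning engineer, experience with transformers"
cvs = [
    "NLP engineer, fine-tuned BERT models in production",
    "Civil engineer, bridge design and structural analysis",
]

jd_emb = model.encode(jd, convert_to_tensor=True)
cv_embs = model.encode(cvs, convert_to_tensor=True)

# Cosine similarity against every CV, used as the ranking feature.
scores = util.cos_sim(jd_emb, cv_embs)[0]
for cv, s in sorted(zip(cvs, scores.tolist()), key=lambda t: -t[1]):
    print(f"{s:.3f}  {cv}")
```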