Why are Siamese Neural Networks used instead of a single neural network?

Question

Siamese Neural Networks are a type of neural network used to compare two instances and infer whether they belong to the same object. They are composed of two parallel, identical neural networks, each of which outputs a feature vector. The two feature vectors are then compared with a distance metric to infer the similarity between the two instances.

I was wondering: why not instead use a single neural network that receives both objects being compared (e.g. two images) as input and directly outputs the similarity score? Wouldn't it be better to let the model compare features in its intermediate layers? Why are Siamese Neural Networks used for this task, and what are their benefits over a single neural network that receives two instances (e.g. two images) as input and directly outputs the distance score?

OmG · Accepted Answer · 2022-04-16T18:57:02.353

I can think of several advantages of a Siamese network over a single neural network for similarity measurement:

Training Phase. A single network that replaces the Siamese pair might need roughly twice as many parameters (weights), because it processes both inputs without sharing weights between them. Training would therefore likely converge more slowly, and the network would be more sensitive to noise.

Testing Phase. Note that these similarity measurements are used in applications such as face recognition. Suppose we deploy the model in such a system. With a Siamese network, we only need to compute the model's output once for the query input, reuse the cached outputs for the images already in the database, and then quickly compute the similarity measures. With a single neural network, on the other hand, every query requires a forward pass for each combination of the input and an image in the database, so nothing can be cached for the existing data. For a massive dataset, the single-network implementation will therefore have a much higher query cost than the Siamese implementation.
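The caching argument can be sketched by counting forward passes. This is only an illustration: `embed` and `pair_score` are hypothetical stand-ins for a Siamese branch and a two-input network (here just toy arithmetic), and `database` is made-up data.

```python
import math

# Forward-pass counters, so the cost of each approach is visible.
calls = {"embed": 0, "pair": 0}

def embed(x):
    """Hypothetical stand-in for one Siamese branch: input -> feature vector."""
    calls["embed"] += 1
    return [x[0] + x[1], x[0] - x[1]]

def pair_score(a, b):
    """Hypothetical stand-in for a single network that takes both inputs."""
    calls["pair"] += 1
    return sum(abs(p - q) for p, q in zip(a, b))

def distance(u, v):
    """Euclidean distance between two feature vectors (cheap, no network)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

# Made-up database of three items.
database = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]

# Siamese: embed every database item once, offline, and keep the results.
cache = [embed(x) for x in database]

def siamese_query(q):
    q_vec = embed(q)  # one forward pass per query
    return [distance(q_vec, v) for v in cache]

def single_network_query(q):
    # One forward pass per (query, database item) pair; nothing is reusable.
    return [pair_score(q, x) for x in database]

for query in ([1.0, 1.0], [3.0, 2.0]):
    siamese_query(query)
    single_network_query(query)

print(calls)  # {'embed': 5, 'pair': 6}
```

With N database items and Q queries, the Siamese version needs N + Q forward passes in total, while the two-input version needs N × Q. The gap is small here and grows quickly with database size.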

score 3 · Answer 2 · answered Apr 16 '22 at 07:28

In addition to @Omg's answer, note that Siamese networks are typically used in situations where feeding (A,B) to the inputs must produce the same output as feeding (B,A), i.e. the similarity of A to B is the same as the similarity of B to A.

With a network with separate weights, this is not guaranteed. One way to get close to this is to not only use samples (A,B) as training input but also (equally often) (B,A). Effectively this doubles the number of training steps (and therefore training time) and the network output is still not guaranteed to be symmetric.

By sharing weights, the symmetry of the response of the network ((A,B) gives the same output as (B,A)) is guaranteed by design.
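A minimal sketch of this symmetry, assuming the two branch outputs are compared with a symmetric function (Euclidean distance here). The weights `W_SHARED`, `W_LEFT`, and `W_RIGHT` are arbitrary illustrative values, not taken from any trained model.

```python
import math

# Hypothetical weights, chosen only for illustration.
W_SHARED = [[0.5, -1.0], [2.0, 0.25]]  # used by both Siamese branches
W_LEFT = [1.0, -2.0]                   # two-input network, first input slot
W_RIGHT = [0.5, 3.0]                   # two-input network, second input slot

def branch(x):
    """Shared-weight branch: a single linear layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_SHARED]

def siamese_distance(a, b):
    """Both inputs go through the same branch, then a symmetric distance."""
    fa, fb = branch(a), branch(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(fa, fb)))

def two_input_score(a, b):
    """Separate weights per input slot, so swapping inputs changes the result."""
    left = sum(w * x for w, x in zip(W_LEFT, a))
    right = sum(w * x for w, x in zip(W_RIGHT, b))
    return left + right

a, b = [1.0, 2.0], [3.0, -1.0]

print(siamese_distance(a, b) == siamese_distance(b, a))  # True
print(two_input_score(a, b), two_input_score(b, a))      # -4.5 11.5
```

Swapping the inputs of the Siamese version only swaps `fa` and `fb`, which the distance ignores. The two-input version has no such constraint, so it would have to learn symmetry from the data.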

score 0 · Answer 3 · answered Aug 21 '23 at 01:49

Suppose you want to train an object detector to detect people in an image.

The detector's neural networks learn to identify features that help them detect people. These features range from low-level aspects like edges to high-level characteristics such as the presence of feet, hands, and more.

If you use the features from this network for person re-identification, you won't obtain good results because the learned features do not focus on extracting differences between individuals.

However, if you employ a Siamese neural network with a loss function such as hinge loss, triplet loss, ..., which compares predictions for positive and negative matches, the network learns to extract features that highlight the differences between individuals, such as clothing color and more.
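As a rough illustration, one common form of the triplet loss can be written as follows. The sketch assumes Euclidean distance between embeddings and a margin of 1.0; both are illustrative choices, and the embeddings are made-up values rather than outputs of a real network.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor
    than the positive is; positive otherwise."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

anchor = [0.0, 0.0]         # embedding of person X, image 1
positive = [0.0, 1.0]       # embedding of person X, image 2
far_negative = [3.0, 4.0]   # a different person, already well separated
near_negative = [0.0, 1.5]  # a different person, embedded too close

print(triplet_loss(anchor, positive, far_negative))   # 0.0
print(triplet_loss(anchor, positive, near_negative))  # 0.5
```

The loss is nonzero only when a different person is embedded too close to the anchor, so the gradient pushes the network toward features that separate individuals rather than features that merely indicate a person is present.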
