
I have a model for binary classification. The target variable has a different number of labels (instances) in each sample. For example, a batch of size 2 whose samples have 2 and 3 instances, and correspondingly 2 and 3 labels (0 and 1):

import numpy as np

y_true = np.array([[0., 1., -1., -1.],
                   [1., 0., 1., -1.]])

The maximum number of labels (instances) in a sample is 4; the value -1 is used as a mask for unused slots. I wrote a function (in TensorFlow) that masks all -1 values, computes the loss for each unmasked value, and then averages those losses:

import tensorflow as tf

def my_loss_fn(y_true, y_pred):
    # 1.0 where a label is present, 0.0 where the slot is padded with -1
    mask = tf.cast(tf.math.not_equal(y_true, tf.constant(-1.)), tf.float32)
    # add a trailing axis so BinaryCrossentropy reduces over it and
    # returns one loss value per label slot, shape (batch, 4)
    y_true, y_pred = tf.expand_dims(y_true, axis=-1), tf.expand_dims(y_pred, axis=-1)
    bce = tf.keras.losses.BinaryCrossentropy(reduction='none')
    # average over the unmasked slots only
    return tf.reduce_sum(bce(y_true, y_pred) * mask) / tf.reduce_sum(mask)

Is this mathematically correct? Should I add weights depending on the number of labels in a sample? Or should I compute the loss per sample (per row in my example) and then average across samples? Should I mask the unused labels at all? My model doesn't seem to learn to classify correctly: it predicts 1 for all labels.
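For reference, the difference between the two averaging schemes in my question can be checked in plain NumPy (a minimal sketch; the name `masked_bce` and the `y_pred` values are made up for illustration):

```python
import numpy as np

def masked_bce(y_true, y_pred, per_sample=False, eps=1e-7):
    """Masked binary cross-entropy; entries where y_true == -1 are ignored.

    per_sample=False: average over all unmasked labels in the batch
                      (what my_loss_fn above does).
    per_sample=True:  average within each sample first, then across samples,
                      so every sample contributes equally regardless of how
                      many labels it has.
    """
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    mask = (y_true != -1.).astype(float)
    # element-wise BCE; masked slots are zeroed out, so the (meaningless)
    # log terms computed at -1 entries never reach the result
    elem = -(y_true * np.log(y_pred) + (1. - y_true) * np.log(1. - y_pred)) * mask
    if per_sample:
        return np.mean(np.sum(elem, axis=1) / np.sum(mask, axis=1))
    return np.sum(elem) / np.sum(mask)

y_true = np.array([[0., 1., -1., -1.],
                   [1., 0., 1., -1.]])
y_pred = np.array([[0.2, 0.7, 0.5, 0.5],
                   [0.9, 0.4, 0.6, 0.5]])

flat = masked_bce(y_true, y_pred)                    # ≈ 0.3414
per_row = masked_bce(y_true, y_pred, per_sample=True)  # ≈ 0.3328
```

With equal label counts per sample the two schemes coincide; with unequal counts, per-sample averaging gives each row equal weight, while the flat average implicitly weights rows by their number of labels. That is exactly the trade-off I am unsure about.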

Mykola Zotko