
I am very new to machine learning. I am following the course offered by Andrew Ng. I am very confused about how we train our neural network for multi-class classification.

Let's say we have $K$ classes. For $K$ classes, we will be training $K$ different neural networks.

But do we train one neural network at a time for all features, or do we train all $K$ neural networks at a time for one feature?

Please explain the complete procedure.

nbro

2 Answers


Let us suppose that you are training a neural network to classify images of vehicles. The input, an image of a vehicle, is a 2D array of pixels. It undergoes a transformation at each layer of the network, and the last layer produces a vector whose dimension is smaller than that of the original image.

So the network maps images to vectors in a high-dimensional space. To classify the images, it is now sufficient to classify the vectors that the network produces from them, and you can do this with a simple "linear" classifier using a softmax layer.

In other words, all the layers of the network except the last one transform the image into a "vector" representation, and the last layer classifies this vector with a linear softmax classifier.
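As a rough sketch of that idea, here is a forward pass in plain numpy: a hidden layer maps a flattened image to a feature vector, and a final linear layer plus softmax turns that vector into class probabilities. All shapes, weights, and the choice of $K = 3$ classes are made up purely for illustration, not taken from the course.

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)

# Illustrative shapes: a flattened 8x8 "image" mapped to a 16-dim
# feature vector, then classified into K = 3 vehicle classes.
x = rng.random(64)                       # flattened input image
W1, b1 = rng.standard_normal((16, 64)), np.zeros(16)   # hidden layer
W2, b2 = rng.standard_normal((3, 16)), np.zeros(3)     # last (linear) layer

features = np.maximum(0, W1 @ x + b1)    # image -> feature vector (ReLU)
probs = softmax(W2 @ features + b2)      # feature vector -> class probabilities

print(round(probs.sum(), 6))             # 1.0: a valid probability distribution
```

The key point is that only the last line does the actual classifying; everything before it is representation learning.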

---

Let's say we have $K$ classes. For $K$ classes, we will be training $K$ different neural networks.

No, you still train one network.

With binary classification tasks, where you have only two mutually exclusive categories, like "yes/no" or "true/false", you can get away with a single output node with a sigmoid activation. The output of the sigmoid is interpreted as indicating one category for values $> 0.5$ and the other for values $\leq 0.5$.
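That thresholding rule is a one-liner. A minimal sketch (the function names are my own, not from the course):

```python
import math

def sigmoid(z):
    # Squashes any real-valued logit into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def binary_predict(z, threshold=0.5):
    # Single output node: values above the threshold map to class 1,
    # the rest to class 0.
    return 1 if sigmoid(z) > threshold else 0

print(binary_predict(2.0))   # sigmoid(2.0) ≈ 0.88 -> class 1
print(binary_predict(-1.0))  # sigmoid(-1.0) ≈ 0.27 -> class 0
```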

With multi-class classification, you have $K$ outputs (one for each category). The problem, in this case, is that if the network gets the class wrong, in general, you cannot decide in one step which one of the other $K - 1$ categories is the correct one. So, the output is actually passed through an extra softmax layer, which outputs probabilities for each class.
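To make the softmax step concrete, here is what it does to a vector of $K = 4$ raw outputs (the logit values below are invented for illustration):

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw network outputs (logits) for K = 4 classes
logits = np.array([2.0, 1.0, 0.1, -1.0])
probs = softmax(logits)

print(round(probs.sum(), 6))   # 1.0: probabilities over all K classes sum to 1
print(int(np.argmax(probs)))   # 0: the class with the largest logit wins
```

Because the outputs form a probability distribution, the predicted class is simply the one with the highest probability.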

But do we train one neural network at a time for all features, or do we train all $K$ neural networks at a time for one feature?

You present all features for each training example to the network at the same time. So, for $N$ features you have $N$ input nodes, and you feed all of them into the neural network.
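A tiny sketch of that input layout, with $N = 5$ features and $K = 3$ classes (the numbers and weights are arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)

N, K = 5, 3                    # N input nodes, K output scores
x = rng.random(N)              # ONE training example: all N features at once
W, b = rng.standard_normal((K, N)), np.zeros(K)

scores = W @ x + b             # a single network consumes every feature together
print(scores.shape)            # (3,): one score per class
```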

nbro
cantordust