6

I have an idea to find the optimal number of hidden neurons required in a neural network, but I'm not sure how accurate it is.

Assuming that it has only 1 hidden layer, it is a classification problem with 1 output node (so it's a binary classification task), has N input nodes for N features in the data set, and every node is connected to every node in the next layer.

I'm thinking that, to ensure the network is able to extract all of the useful relations in the data, every input must be linked to every other input, like in a complete graph. So, if you have 6 inputs, there must be 15 edges to make the graph complete. Any more and the network would be recomputing previously computed information; any fewer and it would not be computing every possible relation.

So, if a network has 6 input nodes, 1 hidden node, and 1 output node, there will be 6 + 1 = 7 connections. With 6 input nodes, 2 hidden nodes, and 1 output node, there will be 12 + 2 = 14 connections. With 3 hidden nodes, there will be 18 + 3 = 21 connections. Therefore, the hidden layer should have 3 hidden nodes, the smallest number that gives at least 15 connections, to ensure all possibilities are covered.
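The counting above can be sketched in a few lines of Python. The function names are made up for illustration, and only weights are counted (no bias terms), matching the numbers in the paragraph above.

```python
from math import comb


def complete_graph_edges(n_inputs: int) -> int:
    """Number of edges in a complete graph on n_inputs nodes."""
    return comb(n_inputs, 2)


def network_connections(n_inputs: int, n_hidden: int, n_outputs: int = 1) -> int:
    """Weights in a fully connected net with one hidden layer (biases ignored)."""
    return n_inputs * n_hidden + n_hidden * n_outputs


def smallest_hidden_layer_covering(n_inputs: int, n_outputs: int = 1) -> int:
    """Smallest hidden layer whose connection count reaches the complete-graph edge count."""
    target = complete_graph_edges(n_inputs)
    n_hidden = 1
    while network_connections(n_inputs, n_hidden, n_outputs) < target:
        n_hidden += 1
    return n_hidden


if __name__ == "__main__":
    print(complete_graph_edges(6))            # edges needed for 6 inputs
    for h in (1, 2, 3):
        print(h, network_connections(6, h))   # connections for 1, 2, 3 hidden nodes
    print(smallest_hidden_layer_covering(6))
```

This only reproduces the arithmetic of the idea; it says nothing about whether the idea is a sound way to size a hidden layer.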

This answer (on Cross Validated) discusses another method. For the sake of argument, I've tried to keep both examples using the same data. If that method is applied with 6 input features, 1 output node, $\alpha = 2$, and 60 samples in the training set, it gives a maximum of 4 hidden neurons. As 60 samples is very small, increasing this to 600 would give a maximum of 42 hidden neurons.
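For reference, here is a small sketch of that calculation. It assumes the rule has the form $N_h = N_s / (\alpha (N_i + N_o))$, rounded down; the linked answer is not reproduced here, so the exact form is an assumption, although it does reproduce the two figures quoted above. The function name is made up for illustration.

```python
from math import floor


def max_hidden_neurons(n_samples: int, n_inputs: int, n_outputs: int, alpha: float) -> int:
    """Assumed rule of thumb: N_h = N_s / (alpha * (N_i + N_o)), rounded down."""
    return floor(n_samples / (alpha * (n_inputs + n_outputs)))


if __name__ == "__main__":
    print(max_hidden_neurons(60, 6, 1, 2))    # small training set
    print(max_hidden_neurons(600, 6, 1, 2))   # ten times more samples
```

Note that the result scales with the number of training samples, not only with the number of inputs, which is the main difference from the edge-counting idea.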

Based on my idea, I think there should be at most 3 hidden nodes, and I can't imagine any more being useful. Would there be any reason to go beyond 3 and up to 42, as in the second example?

nbro
w13rfed

1 Answer

7

I have an idea to find the optimal number of hidden neurons required in a neural network but I'm not sure how accurate it is.

It's a complete non-starter, and no such calculation is possible in the general case (real-valued inputs to a neural network).

Even with one input neuron it is not possible, because even with one input the mapping from input to class can be arbitrarily complex. A good example with two inputs, where a simple classifier would require an infinite number of hidden neurons, is classifying $(x, y)$ points as being inside or outside the Mandelbrot set.

In some more constrained examples, with well-defined functions, you can construct a minimal neural network that solves the problem perfectly. For instance, a neural network model of XOR can be made with two hidden neurons (and six links, not counting biases). However, this kind of analysis is limited to simple problems. You might be able to come up with some variation of your idea if all inputs were boolean and the neural network were limited to some combined bitwise logic on all the inputs.
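As a minimal sketch of such a hand-built network, here is XOR with two hidden neurons. The step activation and the specific weights are choices made for this illustration (one hidden unit behaves like OR, the other like AND); they are not the only solution and are not taken from any reference implementation.

```python
def step(z: float) -> int:
    """Threshold activation: fires when the weighted sum is positive."""
    return 1 if z > 0 else 0


def xor_net(x1: int, x2: int) -> int:
    # Hidden layer: four input-to-hidden links plus one bias per neuron.
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)    # fires if at least one input is 1
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)   # fires only if both inputs are 1
    # Output layer: two hidden-to-output links plus a bias.
    return step(1.0 * h_or - 1.0 * h_and - 0.5)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))
```

The six links are the four input-to-hidden weights and the two hidden-to-output weights. This construction works because XOR is fully specified on four points; nothing comparable exists for an unknown real-valued target.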

Your idea of matching number of edges to number of possible interactions between inputs does not work because you are only considering the most basic kind of interaction between two variables, whilst variables can in practice combine in all sorts of ways to form a function.

In addition, each neuron in a hidden layer computes a linear weighted sum followed by a fixed transformation function. This is in no way guaranteed to match the shape of the function that you are trying to approximate with the neural network. An analogy that you might be aware of is discrete Fourier transforms: it is possible to model any part of a function by combining sine and cosine waves of different frequencies, but some functions will require many such waves in order to be represented accurately.

The answer you linked on Cross Validated gives a rule of thumb that its writers find often works with the kinds of data that they work with. This is useful experience. You can use such rules as the starting point when searching for an architecture that works on your problem. This will likely be more useful than your idea based on counting the possible variable interactions. However, in both cases, the most important step is to test with some unseen examples, and to search for the best neural network architecture for your problem.

There are things you can do with variable interactions though. For instance, try looking for linear correlations between simple polynomial combinations of variables and your target variable, e.g. plot $x_1 x_2$ vs $y$ or $x_3^2 x_4$ vs $y$. You may find that some combinations have a clear signal implying a relationship. Take care if you do this sort of thing though: if you test very many combinations, you will find a linear relationship purely by chance that looks good initially but turns out to be a dud when testing (it's a form of overfitting). So you should generally test far fewer combinations than the size of your dataset, and limit yourself to some modest maximum total power.
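A rough sketch of that kind of search, using only the standard library. The data here is synthetic (the target is deliberately built from the product of the first two features plus noise), and the function names and the choice of `max_degree=2` are assumptions made for this illustration.

```python
import random
from itertools import combinations_with_replacement
from math import prod, sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_monomials(rows, target, max_degree=2):
    """Score every monomial up to max_degree by |correlation| with the target."""
    n_features = len(rows[0])
    scores = []
    for degree in range(1, max_degree + 1):
        for idx in combinations_with_replacement(range(n_features), degree):
            values = [prod(row[i] for i in idx) for row in rows]
            scores.append((abs(pearson(values, target)), idx))
    scores.sort(reverse=True)
    return scores


# Synthetic example: 200 samples, 4 features, target driven by feature 0 * feature 1.
random.seed(0)
rows = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
target = [r[0] * r[1] + random.gauss(0, 0.05) for r in rows]

ranked = rank_monomials(rows, target)
best_score, best_idx = ranked[0]

if __name__ == "__main__":
    for score, idx in ranked[:3]:
        print(idx, round(score, 3))
```

Feature indices are zero-based, so `(0, 1)` corresponds to $x_1 x_2$. With 4 features and a maximum degree of 2 this tests only 14 combinations against 200 samples, which keeps the risk of a chance correlation low; any combination that looks promising should still be checked on held-out data.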

Neil Slater