Does NEAT require only connection genes to be marked with a global innovation number?

From the NEAT paper

Whenever a new gene appears (through structural mutation), a global innovation number is incremented and assigned to that gene.

This suggests that every gene (both node genes and connection genes) requires an innovation number. However, I was wondering what the node gene innovation number is for. Is it to provide the same node ID across all members of the population? Isn't the connection gene innovation number sufficient?

Besides, the NEAT paper includes the following image which doesn't show any innovation number on node genes.

[Image from the NEAT paper]

nbro
kuma

2 Answers

It is actually the other way around: connection IDs are what is debated!

Nodes always have innovation IDs (in the image, it is just their identifying number).

Node IDs are sufficient to identify connections. If a connection links nodes 3 and 6, then it is the same as another connection linking nodes 3 and 6: no need for an extra ID. So why the extra innovation IDs then?
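To make that concrete, here is a minimal sketch of matching connections by their endpoint node IDs alone. The `Genome` layout (a dict keyed by `(in_node, out_node)`) and the function name are assumptions for illustration, not something taken from the paper.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a genome maps (in_node, out_node) -> weight.
Genome = Dict[Tuple[int, int], float]


def matching_connections(a: Genome, b: Genome) -> Set[Tuple[int, int]]:
    """Return the connections present in both genomes, matched by endpoints only."""
    return a.keys() & b.keys()


parent_a: Genome = {(1, 3): 0.5, (3, 6): -0.2}
parent_b: Genome = {(3, 6): 0.9, (2, 6): 0.1}

# The 3 -> 6 connection lines up in both parents without any extra ID.
shared = matching_connections(parent_a, parent_b)
```

With this layout the pair of node IDs already acts as the key, so a separate connection ID carries no additional information for alignment.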

On the one hand, this is an implementation choice: maybe these extra IDs allow you to write more complex but faster code?

On the other hand, there is a debate around whether a connection between two nodes means the same thing at different times in evolution. If you have no innovation IDs, then you cannot tell apart an old connection between nodes 3 and 6 from another one that was independently created later in a different genome (imagine the old connection was removed first). Is this relevant? As said, it is an open debate. Surely, it is not crucial at a basic level!
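A small sketch of that distinction, assuming a counter that hands out a fresh number for every new connection and never reuses one. The class and field names are made up for illustration. Note that the NEAT paper also keeps a list of the current generation's innovations so that identical mutations within one generation share a number, which this sketch deliberately leaves out.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ConnectionGene:
    in_node: int
    out_node: int
    innovation: int  # global innovation number, assigned at creation time


class InnovationCounter:
    """Hypothetical global counter: every new connection gets the next number."""

    def __init__(self) -> None:
        self._next = count(1)

    def new_connection(self, in_node: int, out_node: int) -> ConnectionGene:
        return ConnectionGene(in_node, out_node, next(self._next))


counter = InnovationCounter()
old = counter.new_connection(3, 6)    # early in evolution
other = counter.new_connection(1, 4)  # some unrelated mutation in between
later = counter.new_connection(3, 6)  # created independently, much later

# Same endpoints, but the innovation numbers tell the two apart.
same_endpoints = (old.in_node, old.out_node) == (later.in_node, later.out_node)
same_innovation = old.innovation == later.innovation
```

Whether that extra distinction actually helps evolution is exactly the open question.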

This question (and my answers) is related to this other question on Stack Overflow.

nbro
Pablo

In the original paper, the innovation ID is on the connections only.

The connection is the object that carries the information; nodes can be inferred from the connections.

[Image: crossover example]

This image represents a possible crossover operator that distinguishes between disjoint and excess genes and creates children accordingly. It's part of my master's thesis. I'd be glad to expand on the topic, but for now I'll just use it as an example.

In the image we assume that the connections that are joining two nodes have the same innovation number.
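As a rough sketch of how such an operator could sort genes before building children, the function below lines up two parents by innovation number and splits the result into matching, disjoint, and excess genes. The dict layout (innovation number mapped to an `(in_node, out_node)` pair), the function name, and the sample genomes are assumptions for illustration, not the implementation from my thesis.

```python
from typing import Dict, Set, Tuple

# Hypothetical layout: innovation number -> (in_node, out_node).
Genes = Dict[int, Tuple[int, int]]


def classify_genes(a: Genes, b: Genes) -> Tuple[Set[int], Set[int], Set[int]]:
    """Split innovation numbers into (matching, disjoint, excess)."""
    matching = a.keys() & b.keys()
    unmatched = a.keys() ^ b.keys()  # genes present in exactly one parent
    # Unmatched genes beyond the shorter parent's highest innovation
    # number are excess; the remaining unmatched genes are disjoint.
    limit = min(max(a, default=0), max(b, default=0))
    disjoint = {i for i in unmatched if i <= limit}
    excess = unmatched - disjoint
    return matching, disjoint, excess


parent_1: Genes = {1: (1, 4), 2: (2, 4), 3: (3, 4), 4: (2, 5), 5: (5, 4), 8: (1, 5)}
parent_2: Genes = {1: (1, 4), 2: (2, 4), 3: (3, 4), 4: (2, 5), 5: (5, 4),
                   6: (5, 6), 7: (6, 4), 9: (3, 5), 10: (1, 6)}

matching, disjoint, excess = classify_genes(parent_1, parent_2)
```

Only the connection genes are compared here; no node carries an innovation number of its own.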

As you can see, there is no need to assign an innovation number to the nodes: nodes are just a result of what the connections say. This also allows for a more dynamic approach that can be used to spot invalid nets even before building them (checking whether there are cycles, or nodes that aren't receiving any input or giving any output) and to correct them in order to obtain only valid graphs. Nodes are added just because there is a connection pointing to that specific node. This is enough to guarantee its presence (node number 8 in child 2).
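The checks mentioned above could look roughly like this. It is only a sketch under my own assumptions: connections are plain `(in_node, out_node)` pairs, the input and output node sets are known in advance, and all function names and sample networks are invented for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (in_node, out_node)


def nodes_from_connections(connections: Iterable[Edge]) -> Set[int]:
    """Nodes are never stored; they are whatever the connections mention."""
    return {node for edge in connections for node in edge}


def has_cycle(connections: Iterable[Edge]) -> bool:
    """Depth-first search that reports whether the directed graph has a cycle."""
    graph: Dict[int, List[int]] = {}
    for src, dst in connections:
        graph.setdefault(src, []).append(dst)
    in_progress: Set[int] = set()
    done: Set[int] = set()

    def visit(node: int) -> bool:
        in_progress.add(node)
        for nxt in graph.get(node, ()):
            if nxt in in_progress:
                return True  # back edge: we returned to a node on the current path
            if nxt not in done and visit(nxt):
                return True
        in_progress.discard(node)
        done.add(node)
        return False

    return any(node not in done and visit(node) for node in graph)


def dangling_nodes(connections: List[Edge], inputs: Set[int],
                   outputs: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return (nodes with no incoming edge, nodes with no outgoing edge)."""
    nodes = nodes_from_connections(connections)
    fed = {dst for _, dst in connections}
    feeding = {src for src, _ in connections}
    return nodes - fed - inputs, nodes - feeding - outputs


valid_net = [(1, 4), (2, 4), (4, 8), (8, 5)]
broken_net = [(1, 4), (4, 8), (8, 4), (9, 5)]  # 4 -> 8 -> 4 loops, 9 has no input
```

Node 8 exists in `valid_net` only because the connections `(4, 8)` and `(8, 5)` mention it, which is the point made above.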

As a last point, according to data normalization theory (normalization is the process of organizing the data in a database to avoid data redundancy and insertion, update, and deletion anomalies), we should avoid redundancy at any cost, which is why we should try to keep track of the smallest possible number of objects. So if we can deduce the nodes from the connections, we should do so.

0xSwego