5

I've been reading up on how NEAT (Neuro Evolution of Augmenting Topologies) works and I've got the main idea of it, but one thing that's been bothering me is how you split the different networks into species. I've gone through the algorithm but it doesn't make a lot of sense to me and the paper I read doesn't explain it very well either so if someone could give an explanation of what each component is and what it's doing then that would be great thanks.

The two equations are:

$\delta = \frac{c_{1}E}{N} + \frac{c_{2}D}{N} + c_{3} .\overline{W}$

$f_{i}^{'} = \frac{f_i}{\sum_{j=1}^{n}sh(\delta(i,j))}$

By the way, I can understand the greek symbols so you don't need to explain those to me

The original paper

Aguy
  • 65
  • 1
  • 7

2 Answers2

6

The first equation deals with distance. Delta, or distance, is the measure of how compatible two genomes are with each other. c1, c2 and c3 are parameters you set to dictate the importance of E, D and W. Note that if you change cc1, c2 or c3, you will most likely also have to change dt, which is the distance threshold, or the maximum distance apart 2 genomes can be before they are separated into different species. E represents the total number of excess genes. D represents the total number of disjoint genes in both genomes, W represents the total weight difference between genes that match, and finally, N represents the number of connections / genes the genome with the larger number of connections has. For example, take the following 2 genomes:

[[1,.25][2,.55],[4,.78],[6,.2]]
and
[[1,.15][3,.92],[5,.37]]

Where the 0 index represents innovation number and the 1 index represents weight value. E would be 1, since there is 1 excess gene, gene 6. D would be 4, since connections 2 and 4 are not in genome 2, and connections 3 and 5 are not in genome 1. W would be .10, since only connection 1 is shared between the two genomes.

The second formula is a bit more complicated. From my understanding, correct me if I'm wrong, this is a formula for adjusting fitness. f′i is the adjusted fitness, which will replace the original fitness, fi. For every genome j in the entire population, yes, entire population and not just every genome in its specie, it will calculate the distance between j and i, i being the genome of fitness fi. Then it will sum up all the distance values, and divide the original fitness fi by the total distance sum, and set f'i to that. Next,

Every species is assigned a potentially different number of offspring in proportion to the sum of adjusted fitnesses f`i of its member organisms.

This assigning of number of species offspring is used so that one specie can't take over the entire population, which is the whole point of speciation in the first place, so in conclusion, these two formulas are vital to the function and efficiency of the NEAT algorithm.

Terry T.
  • 339
  • 2
  • 8
0

Even if this thread is old i want to add something to your answer, regarding to the second formula. The fitness sharing mechanism works like this:

foreach(s in species):
   foreach(i in s.individuals):
      i.fitness /= s.Count;

So in each species, the adjusted fitness of an individual is divided by the "clients" in that species. Why? This is how the formula works:

  • f'[i] is the adjusted/shared fitness of agent i
  • f[i] is the current fitness of agent i
  • sh(i,j) returns 1 if both i & j are compatible (considering the first formula your provided does not exceed the threshold) or 0 otherwise.
  • Sum(sh(i,j)) just counts how many individuals are in the same species with i, plus itself (because i can be equal to j while parsing through Sigma). So if you have a Species class with a List of clients, Sum(...) is just list.Count.

Everything is explained clearly in Stanley's paper at page 110.

In this situation, the average fitness of a species is just the sum of all clients' adjusted fitness.