As far as I remember, the activation functions in our university lecture (which we also called transfer functions) were always (non-strictly) ~~monotonous~~ monotonic (thanks to A.R. for the correction). Modulo is not a monotonic function.
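A minimal sketch of that last claim: sampling `x % m` over increasing `x` shows the values drop back to 0 at every multiple of `m`, so the function is neither non-decreasing nor non-increasing (the modulus `m = 3` is an arbitrary choice for illustration):

```python
m = 3
xs = list(range(10))
vals = [x % m for x in xs]  # 0, 1, 2, 0, 1, 2, ...

# A (non-strictly) monotonic non-decreasing function would need
# vals[i] <= vals[i + 1] everywhere; the wrap-around breaks this.
is_nondecreasing = all(vals[i] <= vals[i + 1] for i in range(len(vals) - 1))
is_nonincreasing = all(vals[i] >= vals[i + 1] for i in range(len(vals) - 1))

print(is_nondecreasing or is_nonincreasing)  # False: e.g. 2 % 3 = 2 but 3 % 3 = 0
```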
The single-layer perceptron (which we also called a connectionist neuron) is indeed incapable of XOR classification if we have a monotonic activation function, because in that case the binary decision boundary is a single straight line or hyperplane in space (a ternary classification would give two parallel hyperplanes, etc.), and the activation function is necessarily a Heaviside step function, an edge detector.
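This can be seen empirically: a hand-rolled single-layer perceptron with a Heaviside step activation, trained with the classic perceptron learning rule on the four XOR points, never classifies more than 3 of the 4 points correctly, because no single line separates the two diagonals (the learning rate and epoch count below are arbitrary illustrative choices):

```python
def step(z):
    """Heaviside threshold: the edge-detector activation."""
    return 1 if z >= 0 else 0

# XOR truth table: diagonal corners share a class.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w = [0.0, 0.0]
b = 0.0
lr = 0.1
best_acc = 0.0
for epoch in range(100):
    for (x1, x2), t in data:
        y = step(w[0] * x1 + w[1] * x2 + b)
        err = t - y
        w[0] += lr * err * x1
        w[1] += lr * err * x2
        b += lr * err
    acc = sum(step(w[0] * x1 + w[1] * x2 + b) == t
              for (x1, x2), t in data) / len(data)
    best_acc = max(best_acc, acc)

print(best_acc)  # at most 0.75: 3 of 4 points, never all 4
```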
We did not formally define XOR classification, but we took it to mean the capability of dividing space such that 4 disjoint subsets of space are assigned two classes, with diagonally opposite subsets belonging to the same class. (It would make sense to define XOR as the 4 quadrants of a 2D space.) Between every two subsets of the same class there is one subset of the opposite class. Whenever two same-class subsets lie in one half-space, a third subset of the opposite class will lie in that half-space too, so we need at least a second linear classifier in addition to separate both classes in each half.
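The "second linear classifier" can be made concrete with two parallel separators combined by one more linear threshold unit. The specific lines `x + y >= 0.5` and `x + y >= 1.5` below are one hand-picked choice of many; the output unit fires exactly for points in the band between them:

```python
def step(z):
    """Heaviside threshold."""
    return 1 if z >= 0 else 0

def xor_net(x, y):
    h1 = step(x + y - 0.5)      # first separator: at least one input active
    h2 = step(x + y - 1.5)      # second, parallel separator: both inputs active
    return step(h1 - h2 - 0.5)  # fires only between the two parallel lines

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, "->", xor_net(x, y))  # 0, 1, 1, 0
```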
By the way, modulo is not a binary classifier and therefore cannot be considered an XOR or any other binary classification. Binary classifiers produce binary output for real-valued input vectors, and "mod x" certainly has a ~~domain~~ real-valued range (thanks to John Madden for the correction) from 0 (inclusive) to "x" (exclusive).
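A quick check of that range claim (note this uses Python's `%`, which follows floor division, so for a positive modulus the result lands in `[0, m)` even for negative inputs; the sample values are arbitrary):

```python
m = 5.0
samples = [-7.3, -0.1, 0.0, 1.5, 4.99, 12.25]
remainders = [s % m for s in samples]

# Every remainder is real-valued and lies in [0, m) -- not a binary output.
print(all(0 <= r < m for r in remainders))  # True
```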