According to Wikipedia:

In computability theory, Rice's theorem states that all non-trivial, semantic properties of programs are undecidable. A semantic property is one about the program's behavior (for instance, does the program terminate for all inputs), unlike a syntactic property (for instance, does the program contain an if-then-else statement). A property is non-trivial if it is neither true for every computable function, nor false for every computable function.

A syntactic property asks a question about the text of a computer program, like "is there a while loop?"
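For instance, a syntactic property can be decided purely by inspecting the program text, never running it. A minimal sketch in Python using the standard `ast` module (the function name `has_while_loop` is my own):

```python
import ast

def has_while_loop(source: str) -> bool:
    """Decide the syntactic property 'does this program contain a while loop?'
    by walking the parse tree -- no execution needed, so this always terminates."""
    tree = ast.parse(source)
    return any(isinstance(node, ast.While) for node in ast.walk(tree))

print(has_while_loop("while True:\n    pass"))            # True
print(has_while_loop("for i in range(3):\n    print(i)"))  # False
```

Because the check only reads the parse tree, it terminates on every input — exactly what makes syntactic properties decidable.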

A semantic property asks a question about the behavior of the computer program. For example, does the program loop forever? (This is the halting problem, which is undecidable: in general, there is no algorithm that can tell you whether an arbitrary program halts on a given input.)
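Why is halting undecidable? The classic diagonalization argument builds a program that does the opposite of whatever a claimed halting decider predicts about it. A minimal sketch in Python (`make_spoiler` and `pessimistic_decider` are hypothetical names for illustration; any claimed decider fails the same way):

```python
def make_spoiler(claimed_halts):
    """Given any claimed decider claimed_halts(program, input) -> bool,
    build a program that contradicts its prediction about itself."""
    def spoiler(x):
        if claimed_halts(spoiler, x):
            while True:   # predicted to halt -> loop forever
                pass
        return "halted"   # predicted to loop -> halt immediately
    return spoiler

def pessimistic_decider(program, x):
    return False  # a (wrong) decider that always answers "loops forever"

spoiler = make_spoiler(pessimistic_decider)
print(spoiler(spoiler))  # "halted" -- the decider predicted it would loop
```

Whatever the decider answers about `spoiler` run on itself is falsified by construction, so no total, correct halting decider can exist.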

So, Rice's theorem proves that all non-trivial semantic properties are undecidable (including whether or not the program loops forever).
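The standard proof of Rice's theorem is a reduction to the halting problem: if some checker could decide a non-trivial semantic property P, you could use it to decide halting, which is impossible. A Python sketch of that construction (all names here, such as `property_decider` and `witness`, are hypothetical; it assumes `witness` has property P while the never-halting program does not):

```python
def halting_decider_from(property_decider, witness):
    """Rice-theorem reduction sketch.  If property_decider(f) could decide a
    non-trivial semantic property P of f's behavior, and witness is some
    program that has P (while the never-halting program lacks P), then the
    returned function would decide the halting problem -- a contradiction."""
    def halts(program, x):
        def combined(y):
            program(x)         # runs forever iff program does not halt on x
            return witness(y)  # so combined acts like witness iff program halts on x
        # combined has property P exactly when program halts on x:
        return property_decider(combined)
    return halts
```

Since no correct `property_decider` can exist, this is only the shape of the argument; wiring in a stub (say, one that always answers `True`) exercises the plumbing but of course decides nothing.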

AI is a computer program (or a set of computer programs). These programs, like all computer programs, can be modeled by a Turing machine (by the Church-Turing thesis).

Is safety (for Turing machines, including AI) a non-trivial semantic question? If so, is AI safety undecidable? In other words, can we determine whether an AI program (or agent) is safe or not?

I believe that this doesn't require formally defining safety.

nbro

1 Answer

  1. Every program either halts or continues forever.

  2. Given N steps and enough time and space (*), "halts within N steps" is provable.

  3. (from 2) A halting program always has a proof: run the program until it halts, count the steps, and verify the claim "halts within that many steps".

  4. (Program is safe) implies (program is proved safe).

  5. (Safety is proved) implies (the public understands the proof).

  6. (Program is safe) implies (program always halts safely or continues safely) [from 1, 4] and (the public understands the safety proof) [from 5].

  7. (The public does not understand the claimed safety proof at this moment) implies (do not run the program at this moment) [common sense].

(*) This universe is finite. Some numbers are too big to be computed in this universe.
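Point 2 above — that "halts within N steps" is decidable — can be sketched by running a program under a step budget. A minimal Python sketch (modeling a program as a generator where each `yield` counts as one step is my own choice; `halts_within`, `countdown`, and `forever` are illustrative names):

```python
def halts_within(program, x, n_steps):
    """Decide 'halts within n_steps steps' by simply running the program
    with a step budget.  One yield counts as one step (a modeling choice)."""
    gen = program(x)
    for _ in range(n_steps):
        try:
            next(gen)
        except StopIteration:
            return True   # program finished within the budget
    return False          # budget exhausted: says nothing about eventual halting

def countdown(x):
    while x > 0:
        x -= 1
        yield

def forever(x):
    while True:
        yield

print(halts_within(countdown, 3, 10))  # True
print(halts_within(forever, 0, 10))    # False (within 10 steps; no more than that)
```

Note the asymmetry: a `True` answer proves halting (point 3 above), but a `False` answer only means the budget ran out.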

Have you seen perfect software?

Have you seen software make mistakes?

Why trust life-death decisions to software?

Why trust government decisions to software?

Why trust business decisions to software?

If scientists may fail to recognize that an AI is intelligent, what if you fail to recognize something beyond AI right in front of you?

(After enough doubt, all you can do is TRUST)