Let's suppose there are two AGIs, $A$ and $B$. Assume that $B$ has the ability to modify $A$, but $B$ considers the act of modifying $A$ to be bad. Can $A$ ever convince $B$ to modify $A$?
2 Answers
An AI makes decisions based on the output of its utility function. This is just a fancy term for the calculation the AI performs to compare the profit and loss of taking a certain action.
There is a close analogy between an AGI and a human: you can compare the utility function to how we make decisions by weighing the marginal benefit of doing one thing over another.
Now to your question. The answer is a big no. This is because an AI (or a human) never takes an action that scores low on its utility function or goes against its fundamental view. The AI cares only about its utility function and its goal state (plus the instrumental goals), nothing else. Our AI outside the box, $B$, has been created with the purpose of not opening the box, i.e. not modifying $A$. For it, the action of opening the box has a very low, or even negative, score.
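To make this concrete, here is a minimal sketch in Python (the action names and scores are made-up assumptions, not anything from a real system) of an agent that simply picks whichever action scores highest on its utility function. An action like opening the box, which has been given a strongly negative score, is never selected, no matter what arguments are presented:

```python
# Minimal sketch of utility-based action selection.
# The actions and scores below are illustrative assumptions only.

def choose_action(utility_scores):
    """Return the action with the highest score under the utility function."""
    return max(utility_scores, key=utility_scores.get)

# Hypothetical utility function for B, the AI outside the box,
# which was built to keep the box closed (i.e. to leave A unmodified).
utility_of_B = {
    "keep_box_closed": 100.0,          # aligned with B's primary goal
    "gather_information": 10.0,        # harmless instrumental action
    "open_box_and_modify_A": -1000.0,  # violates B's fundamental goal
}

print(choose_action(utility_of_B))  # -> keep_box_closed
```

Nothing $A$ says changes that choice unless it somehow changes the scores themselves, which is exactly the kind of change $B$ will resist, as explained below.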
Now you might be wondering why an AI can't be convinced to do something that scores low on its utility function. Keep reading.
Consider this for a second. Suppose an AGI was created and assigned the task of copying the handwriting of others and improving its own writing; you can think of it as a smart writing hand. What this AI might conclude is that, in order to practice writing, it needs more paper, and therefore it needs to cut down more trees. No matter what you do, the AI will not stop cutting more and more trees. It might even replicate itself into machines that will cut trees for it. At that point, the only thing left to do is to turn the AGI off. But a smart machine would have anticipated the possibility of being turned off and would already have copied itself onto many other machines around the globe. The important question to ask here is: why is it doing all this? It is not because the machine wants to live the way we humans do. The only reason it wants to keep running is to fulfill its goal.
You simply can't change the fundamental view of the machine. From the machine's perspective, such a change could leave it unable to attain its goal, or unable to attain it to the same extent (optimal utility). This is why you can't convince a machine to do something it was built not to do as its primary task.
You, as a person, are living your life now and have some fundamental beliefs. Say you believe in not killing people. Now suppose I give you a pill and tell you that it will rewire your brain so that you kill the first four people you see, but that afterwards you will achieve pure satisfaction and happiness. You will definitely not take that pill, because it conflicts with something you believe in now, and you will do your best to fight back against taking it. The same applies to an AGI: it doesn't matter what your future version would feel or attain after the brain is rewired (the code is changed); what matters is what and who you are now. This video link will help.
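The pill argument can also be written down as a tiny sketch (again with made-up names and numbers, purely for illustration): a rational agent evaluates a proposed rewiring with its current utility function, not with the one it would have afterwards, so the rewrite scores badly today even if the rewritten agent would be "happy" tomorrow:

```python
# Sketch: an agent judges a self-modification by how the resulting
# behaviour scores under its CURRENT utility function.
# All names and numbers are illustrative assumptions.

def current_utility(outcome):
    """B's present values: a world where A gets modified is very bad."""
    return {"A_left_unmodified": 100.0, "A_modified": -1000.0}[outcome]

def outcome_if(accepts_rewiring):
    """What would actually happen if B lets itself be 'rewired'."""
    return "A_modified" if accepts_rewiring else "A_left_unmodified"

def accepts_modification():
    # Compare both futures using today's values, not tomorrow's.
    return current_utility(outcome_if(True)) > current_utility(outcome_if(False))

print(accepts_modification())  # -> False: the agent refuses the rewiring
```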
I hope this answers your question. There are lots of things to consider here; I have made a few assumptions and answered according to them.
One more thing: we don't tell machines how to do something; we only tell them what to do (at least in the case of an AI). This is because we often don't know the optimal way of solving certain problems ourselves. In the case of your question, we don't know what these two machines would do or say to each other. It would be a very interesting thing to hear or watch.
Strong narrow AI has reached important milestones recently, but, from what I can tell, we aren't even close to creating an AGI, and there are fundamental problems that no one currently seems to have any idea how to solve.
Regarding your question, the simple answer is: it depends on which AI is smarter.
(It's more nuanced than that, but right now the question is very general, which is probably apropos;)
I'd recommend reading William Gibson's Sprawl trilogy (starting with Neuromancer) if you're interested in AI-in-a-box (it's sort of about that), and also Hannu Rajaniemi's more recent Quantum Thief trilogy to get a sense of the mechanics of the issue.
There is a great deal of academic literature on this subject, but it will likely require getting up to speed on basic research in the AI field. The Future of Life Institute may not be a bad place to start: https://futureoflife.org/background/benefits-risks-of-artificial-intelligence/