I recently read some introductions to AI alignment, AIXI, and decision theory.
As far as I understand, one of the main problems in AI alignment is how to define a utility function well, so that it does not lead to something like the paperclip apocalypse.
That raises a question for me: whatever the utility function is, some computer has to compute the utility and reward, so it seems there is no way to prevent an AGI from seeking out that computer and manipulating the utility function so that it always returns the maximum reward.
This is just like how we humans know that we can make ourselves happy by chemical means, and some people actually do so.
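
To make the worry concrete, here is a toy sketch in Python. It is only an illustration under my own assumptions: the action names, `MAX_REWARD`, and both functions are hypothetical and do not come from AIXI or any real system. It shows an agent that simply picks whichever action yields the highest reward it receives, and what happens once tampering with the reward calculator is among its options.

```python
# Toy illustration only: every name and number here is hypothetical.
MAX_REWARD = 100.0

ACTIONS = ["do_nothing", "do_task", "tamper_with_reward"]


def intended_utility(action: str) -> float:
    """What the designers want to reward: useful work only."""
    return 1.0 if action == "do_task" else 0.0


def observed_reward(action: str) -> float:
    """What the agent actually receives from the reward calculator."""
    if action == "tamper_with_reward":
        # The agent has rewritten the calculator so it reports the maximum.
        return MAX_REWARD
    return intended_utility(action)


def choose(reward_fn) -> str:
    """Pick the action with the highest reward under the given function."""
    return max(ACTIONS, key=reward_fn)


print(choose(intended_utility))  # do_task
print(choose(observed_reward))   # tamper_with_reward
```

An agent that maximizes the intended utility does the task, while an agent that maximizes the reward it actually observes prefers tampering.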
Is there any way to prevent this from happening? I mean not just physically protecting the utility calculator from the AGI (how can we be sure that works forever?), but preventing the AGI from even thinking of it.