Great answers already; I'd just like to add a few other things that you should take into consideration. As hardlib and Goufalite have already mentioned, the way to do this is with trigonometry. I've drawn out a 2-D depiction of the camera and the IoT object:

As you can see, the camera's field of view is going to be larger than the object, if not at close range then certainly once the object moves further away.
Now, you may want the camera always centred on the object. In that case, you can simply take the calculations that hardlib referenced:
ϴ = arctan(y/x)
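...which will be the angle counterclockwise from the x-axis, per convention. Bear in mind that plain arctan only returns values between -90° and 90°, so in code you'll want the two-argument atan2(y, x) to get the right quadrant when x is negative. You'll also need the angle away from level: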
α = arctan(z / (x^2 + y^2)^(1/2))
Note that these calculations assume the camera sits at the origin in all three axes, so subtract the camera's position from the object's position before plugging in x, y and z.
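To make that concrete, here's a minimal Python sketch of the two angle calculations. The function name `pan_tilt` and the (x, y, z) tuple layout are my own assumptions for illustration, not anything from the other answers, and I've used `math.atan2` instead of a plain arctan so the result lands in the correct quadrant:

```python
import math

def pan_tilt(camera, target):
    """Return (pan, tilt) in degrees for pointing the camera at the target.

    camera and target are hypothetical (x, y, z) tuples in the same
    coordinate system and units.
    """
    # Shift coordinates so the camera sits at the origin.
    x = target[0] - camera[0]
    y = target[1] - camera[1]
    z = target[2] - camera[2]

    # Pan: angle counterclockwise from the x-axis.
    pan = math.atan2(y, x)

    # Tilt: angle away from level, using the horizontal distance
    # sqrt(x^2 + y^2) as the adjacent side.
    tilt = math.atan2(z, math.hypot(x, y))

    return math.degrees(pan), math.degrees(tilt)
```

For example, `pan_tilt((1, 1, 1), (1, 2, 2))` gives a pan of roughly 90° and a tilt of roughly 45°, since the object is directly along the y-axis from the camera and as far above it as it is away.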
On the other hand, you may prefer that the camera not move more than necessary, that is, that it only move once the object is about to leave the frame. In that case, you'll probably want a "pressure" variable that makes the camera more likely to change its angle the closer the object gets to the edge of the frame.
If you go that route, you'll need to know the camera's angle of view in both directions (horizontal and vertical), so that you can determine where the object sits within the frame.
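One possible way to sketch that "pressure" idea in Python is below. Everything here is an assumption on my part: the function names, the 0.8 threshold, and the choice to measure pressure as the object's angular offset divided by half the field of view (which is only a rough linear approximation of where the object actually appears in the image). All angles are in degrees.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def edge_pressure(cam_pan, cam_tilt, obj_pan, obj_tilt, hfov, vfov):
    """Return 0.0 when the object is centred, 1.0 when it is at the
    edge of the frame, and more than 1.0 when it is outside the frame.

    hfov and vfov are the camera's full horizontal and vertical
    angles of view.
    """
    # Angular offset from the centre of the frame, as a fraction of
    # the half-angle of view in each direction.
    horizontal = abs(wrap_degrees(obj_pan - cam_pan)) / (hfov / 2.0)
    vertical = abs(obj_tilt - cam_tilt) / (vfov / 2.0)
    # Whichever edge the object is closest to dominates.
    return max(horizontal, vertical)

def should_move(pressure, threshold=0.8):
    """Hypothetical rule: recentre once the pressure passes a threshold."""
    return pressure >= threshold
```

With a 60° by 40° field of view, an object 15° to the side and 10° above centre gives a pressure of 0.5, so the camera would stay put; at 27° to the side the pressure is 0.9 and it would recentre. You could just as well make the decision probabilistic or add hysteresis rather than using a hard threshold.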