
I've recently noticed that one of my processes randomly crashes and becomes a zombie with a PPID of 1 (init). I've been told that the only way to fix this is to reboot the PC (or to send SIGCHLD to init, which is dicey or useless, from what I understand).

Essentially, I want to write a bash script that looks for a zombie process and, if it finds one, reboots the PC.

Currently, I use this script to monitor the process itself:

 ps auxw | grep ethminer | grep -v grep > /dev/null

 if [ $? != 0 ]
 then
    sudo reboot
 fi

This script seems to work fine when ethminer is either RUNNING or NOT RUNNING: it reboots the machine if it does not see ethminer in the process table, and it does nothing if it does see it.

However, from my admittedly loose understanding, a process that becomes a zombie produces no exit code, so if [ $? != 0 ] doesn't get any input and therefore doesn't do anything.

Is there any way I can fix or modify this script so it does what I want? Or am I way off track here?

Thanks!

heemayl
  • 93,925

1 Answer


You don't have to reboot because of zombie processes. Here's why:

  • A process becomes a zombie when it has finished but its parent has not yet called wait(2) to collect its exit status

  • A zombie does not consume any physical or virtual resources other than an entry in the kernel's process table

  • Once the parent calls wait(2), the zombie is reaped and its process table entry is removed

  • If the zombie becomes an orphan, i.e. its parent dies, then init (PID 1) inherits the process and reaps it by calling wait(2)
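
To watch this lifecycle happen, here is a minimal sketch that creates a zombie on purpose. It assumes Linux (it reads /proc) and bash; the helper name proc_state and the sleep durations are made up for the demonstration.

```bash
#!/usr/bin/env bash
# Demo: create a zombie on purpose and read its state from /proc (Linux only).

# Print the one-letter state of a PID, e.g. "S" (sleeping) or "Z" (zombie).
# (proc_state is a hypothetical helper name.)
proc_state() {
    awk '$1 == "State:" { print $2 }' "/proc/$1/status" 2>/dev/null
}

pidfile=$(mktemp)

# The subshell starts a child, records its PID, then exec()s into `sleep 5`.
# `sleep 5` is now the child's parent and never calls wait(2), so once the
# child exits after 1 second it stays in the process table as a zombie.
( sleep 1 & echo "$!" > "$pidfile"; exec sleep 5 ) &
parent=$!

sleep 2
child=$(cat "$pidfile")
rm -f "$pidfile"
zombie_state=$(proc_state "$child")
echo "child $child (parent $parent) state: $zombie_state"

# Killing the parent orphans the zombie. PID 1 (or a subreaper) adopts it
# and is expected to reap it shortly afterwards.
kill "$parent" 2>/dev/null || true
wait "$parent" 2>/dev/null || true
```

On a typical system the child should show state Z while its parent is alive and then disappear once the parent is killed. In a container whose PID 1 never calls wait(2), the zombie may linger instead.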

As you can see, it is only a matter of time until wait(2) is called and the zombie is reaped. If you accumulate many zombies over time, treat that as a programming flaw: fix the code (or ask for it to be fixed) rather than rebooting, which is absolutely unnecessary and should not be done.
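
Since the parent is the program that needs fixing, it helps to see which parent each zombie belongs to. The following is a rough sketch rather than a polished tool: zombie_parents is a made-up function name, and it assumes input lines of the form "PID PPID STATE COMMAND" with no spaces in the command name.

```bash
#!/usr/bin/env bash
# Reads "PID PPID STATE COMMAND" lines on stdin and prints one line per
# zombie, naming the parent that has not reaped it.
# (zombie_parents is a hypothetical helper name.)
zombie_parents() {
    awk '
        { comm[$1] = $4 }                          # remember every command name
        $3 == "Z" { zpid[++n] = $1; zppid[n] = $2 } # collect zombies
        END {
            for (i = 1; i <= n; i++) {
                p = zppid[i]
                pname = (p in comm) ? comm[p] : "?"
                printf "zombie %s (%s) parent %s (%s)\n", zpid[i], comm[zpid[i]], p, pname
            }
        }'
}
```

It can be fed a live snapshot with `ps -eo pid=,ppid=,state=,comm= | zombie_parents`. A parent shown as "?" was not present in that snapshot.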


To find zombie processes, check each process's STATE; if it is Z, the process is a zombie:

ps -eo pid,ppid,state,cmd | awk '$3=="Z"'

Here I have selected only the PID, PPID, STATE and COMMAND fields.
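
As for the monitoring script in the question: a zombie still has an entry in the process table, so `ps auxw | grep ethminer` keeps matching it and the check never fires. One possible way to tell the three cases apart is sketched below. The function names classify and process_status are invented for this example, and matching on the fourth field assumes the process name contains no spaces.

```bash
#!/usr/bin/env bash
# Sketch: decide whether a named process is running, a zombie, or missing.

# Reads "PID PPID STATE COMMAND" lines on stdin and prints one word:
#   running  - at least one non-zombie process has that name
#   zombie   - the name only appears in state Z
#   missing  - the name does not appear at all
# (classify is a hypothetical helper name.)
classify() {
    awk -v name="$1" '
        $4 == name { if ($3 == "Z") z++; else live++ }
        END {
            if (live)   print "running"
            else if (z) print "zombie"
            else        print "missing"
        }'
}

# Takes a live snapshot of the process table and classifies the given name.
# The trailing "=" on each field suppresses the ps header line.
process_status() {
    ps -eo pid=,ppid=,state=,comm= | classify "$1"
}
```

A watchdog could then branch on `"$(process_status ethminer)"` and act only when the result is not `running`. What action to take is a separate decision; given the points above, a reboot should rarely be needed.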

heemayl
  • 93,925