I have the following bash script:
#!/bin/bash
# Enable nvidia-smi settings so they are persistent the whole time the system is on.
nvidia-smi -pm 1
# Define the various overclocking settings (powerLimit in watts)
powerLimit="100"
coreOffset="150"
memoryOffset="1000"
targetFanSpeed="40"
TOTAL_GPU=5
GPU_INDEX=0
while [ "$GPU_INDEX" -lt "$TOTAL_GPU" ]; do
    nvidia-smi -i "$GPU_INDEX" -pl "$powerLimit"
    # Quote the attribute strings so the shell never treats [..] as a glob pattern
    nvidia-settings -a "[gpu:$GPU_INDEX]/GpuPowerMizerMode=1"
    nvidia-settings -a "[gpu:$GPU_INDEX]/GPUMemoryTransferRateOffset[3]=$memoryOffset"
    nvidia-settings -a "[gpu:$GPU_INDEX]/GPUGraphicsClockOffset[3]=$coreOffset"
    nvidia-settings -a "[gpu:$GPU_INDEX]/GPUFanControlState=1"
    nvidia-settings -a "[fan:$GPU_INDEX]/GPUTargetFanSpeed=$targetFanSpeed"
    GPU_INDEX=$((GPU_INDEX + 1))
done
The script sets an overclock on the GPUs installed in my system. I am trying to run it at startup so that the settings are reapplied automatically after a reboot. To do so, I have edited my root crontab with the entries
0 0 * * * reboot -h now
@reboot bash /home/rig0/Documents/startup/1060OC.sh > /home/rig0/Documents/startup/1060OC.log
I am redirecting the output of the bash script into a log file to make sure that each setting is applied successfully. My first, minor problem is that some of the text that is printed when I run the bash script from a terminal is not captured in this file.
For instance, the log output from the cron job is (for just one GPU):
Enabled persistence mode for GPU 00000000:07:00.0.
...
Power limit for GPU 00000000:07:00.0 was set to 100.00 W from 150.00 W.
All done.
But when run in the console, the output reads (for just one GPU):
Power limit for GPU 00000000:07:00.0 was set to 100.00 W from 100.00 W.
All done.
Attribute 'GPUPowerMizerMode' (rig0-System-Product-Name:0[gpu:4]) assigned
value 1.
Attribute 'GPUMemoryTransferRateOffset' (rig0-System-Product-Name:0[gpu:4])
assigned value 1000.
Attribute 'GPUGraphicsClockOffset' (rig0-System-Product-Name:0[gpu:4])
assigned value 150.
Attribute 'GPUFanControlState' (rig0-System-Product-Name:0[gpu:4]) assigned
value 1.
Attribute 'GPUTargetFanSpeed' (rig0-System-Product-Name:0[fan:4]) assigned
value 40.
Why do I not see the "extra" text when the output is redirected to the log file in the cron job?
I assume I need to do something like this U&L answer and add the redirect 2>&1, which from earlier reading I think means "send stderr to wherever stdout is going" for that command?
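To make the effect of 2>&1 concrete, here is a minimal sketch. The function noisy and the temporary log directory are hypothetical stand-ins for illustration only; they are not part of my script.

```shell
# Hypothetical stand-in for a command that writes to both output streams.
noisy() {
    echo "written to stdout"
    echo "written to stderr" >&2
}

# Temporary directory for the two example logs (an assumption for this demo).
log_dir=$(mktemp -d)

# Only stdout lands in the file; the stderr line still goes to the terminal
# (or, under cron, wherever cron sends unredirected output).
noisy > "$log_dir/stdout_only.log"

# Order matters: first send stdout to the file, then point stderr at stdout.
noisy > "$log_dir/both.log" 2>&1

cat "$log_dir/both.log"
```

Applied to my crontab entry, that would presumably become: @reboot bash /home/rig0/Documents/startup/1060OC.sh > /home/rig0/Documents/startup/1060OC.log 2>&1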
Though... I think the real issue, and the main question of this post, is that none of the settings from the cron job's bash script are actually applied (I need to check; I think the power limit is set, but I forget).
Are these settings not applied because the cron job runs before the Nvidia X server starts up?
In other words, do the calls to nvidia-settings ... have no effect because the X server using the nvidia driver is not yet running when the cron job is run?
Is there a way I could check, and wait, for the X server to be running in my bash script? This would allow the cron job to wait until the settings it tries to change are available.
Maybe I can add the proposed solution from this SO answer?
EDIT: I found an old RedHat archive that gave me a way to make the bash script wait for the X server to start up:
# Wait for the X server to startup
echo "Waiting for the X server to startup..."
XON=""
while [ "$XON" == "" ]; do
/bin/sleep 5
echo " Checking for X server."
XON=$(ps ax | grep -v grep | grep -i xorg)
echo " Result:[$XON]"
done
echo "X server started. Setting overclock settings..."
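One weakness of the loop above is that it waits forever if Xorg never starts. A variant with a timeout might look like the sketch below. The function name wait_for_process, the default values, and the use of pgrep (from procps) instead of ps piped through grep are my own assumptions, not something taken from the archived snippet.

```shell
# Hypothetical helper: wait until a process with exactly this name exists,
# or give up once roughly `timeout` seconds have been spent sleeping.
# Usage: wait_for_process NAME [TIMEOUT_SECONDS] [POLL_INTERVAL_SECONDS]
wait_for_process() {
    local name="$1" timeout="${2:-60}" interval="${3:-5}" waited=0
    until pgrep -x "$name" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # timed out, process never appeared
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0            # process found
}

# Example call for this script (process name taken from the log output):
# wait_for_process Xorg 120 5 || { echo "X server did not start" >&2; exit 1; }
```

Note that this only confirms that an Xorg process exists. It does not prove that the server is ready to accept connections.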
Based on the log file output, this does seem to wait for the X server to start up:
Waiting for the X server to startup...
Checking for X server.
Result:[ 978 tty7 Rs+ 0:01 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch]
X server started. Setting overclock settings...
Enabled persistence mode for GPU 00000000:01:00.0.
The same issues persist, though. The settings do not take hold, and the output I would expect from the nvidia-settings commands is not present in the log.
Do I need to wait for nvidia-something to start, maybe?
EDIT2: First, I've moved the same command from cron into my rc.local because that seems like a more correct solution.
Next, I've redirected the output with 2>&1 and found this error message:
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused
ERROR: The control display is undefined; please run `nvidia-settings
--help` for usage information.
This seems to be a common problem on headless systems trying to run the nvidia-settings command, and it is usually solved with export DISPLAY=:0. However, given that my system is not headless (it has a monitor attached), I am not sure whether this is the correct solution, or exactly what the effects of export DISPLAY=:0 are, especially on a system with a display.
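As far as I understand it, DISPLAY is just the environment variable that X clients read to find out which X server to connect to. A script started from cron or rc.local does not inherit it from a desktop session, so the variable has to be set explicitly. A minimal sketch of what I plan to try is below. The display number :0 and the auth file path are assumptions taken from the Xorg command line in my log (they are specific to my lightdm setup), and I have not yet confirmed that XAUTHORITY is required.

```shell
# Assumption: the X server is display :0, as shown in the Xorg command line.
export DISPLAY=:0
# Assumption: the auth file is the one passed to Xorg via -auth (lightdm-specific).
export XAUTHORITY=/var/run/lightdm/root/:0

# Exported variables are inherited by child processes, which is how
# nvidia-settings would learn which display to talk to.
sh -c 'echo "child process sees DISPLAY=$DISPLAY"'

# The nvidia-settings calls would then follow, for example:
# nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
# Alternatively, the control display can be named per call with -c:
# nvidia-settings -c :0 -a "[gpu:0]/GPUPowerMizerMode=1"
```

Since the export only affects the script's own environment and its children, it should not interfere with the desktop session that is already running on the attached monitor.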
Doing some research...