varnish.service: A process of this unit has been killed by the OOM killer

Question

I have Varnish installed on an Ubuntu 22.04 server.

The environment is PHP 8.1 and Varnish 7.3.0.

The server has 16 GB of RAM.

Varnish keeps crashing intermittently, and I get the following message:

:~$ sudo service varnish status
× varnish.service - Varnish Cache, a high-performance HTTP accelerator
   Loaded: loaded (/etc/systemd/system/varnish.service; enabled; vendor preset: enabled)
   Active: failed (Result: oom-kill) since Mon 2023-11-20 13:18:05 UTC; 18s ago
   Process: 198053 ExecStart=/usr/sbin/varnishd -a :80 -a localhost:8443,PROXY -p feature=+http2 -p http_resp_hdr_len=35M -p http_resp_size=40M -p workspac>
   Main PID: 198054 (code=exited, status=64)
        CPU: 14min 39.533s
Nov 20 13:18:05    varnishd[198054]: Manager got SIGTERM
Nov 20 13:18:05    systemd[1]: varnish.service: A process of this unit has been killed by the OOM killer.
Nov 20 13:18:05    varnishd[198054]: Stopping Child
Nov 20 13:18:05    varnishd[198054]: Child (198068) died signal=9
Nov 20 13:18:05    varnishd[198054]: Child cleanup complete
Nov 20 13:18:05    varnishd[198054]: manager stopping child
Nov 20 13:18:05    varnishd[198054]: manager dies
Nov 20 13:18:05    systemd[1]: varnish.service: Main process exited, code=exited, status=64/USAGE
Nov 20 13:18:05    systemd[1]: varnish.service: Failed with result 'oom-kill'.
Nov 20 13:18:05    systemd[1]: varnish.service: Consumed 14min 39.533s CPU time.

I have the following settings in /etc/systemd/system/varnish.service:

ExecStart=/usr/sbin/varnishd \
      -a :80 \
      -a localhost:8443,PROXY \
      -p feature=+http2 \
      -p http_resp_hdr_len=35M \
      -p http_resp_size=40M \
      -p workspace_backend=40M \
      -p workspace_client=40M \
      -f /etc/varnish/default.vcl \
      -s malloc,512m

I have increased the memory assigned to Varnish and also created a swap file:

sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

but it still crashes.

How can I fix this issue?

score 1 · Accepted Answer · answered Dec 05 '23 at 14:41

The -s malloc,512m statement limits the cache storage to 512 MB, but that doesn't mean Varnish will only consume 512 MB of RAM.

Runtime cost

There is a runtime cost to Varnish that grows as the number of active threads increases. Each worker thread that handles a request consumes some memory. The amount is defined through the workspace_client runtime parameter.

Workspace memory

When a backend request is made, the workspace_backend value sets the size of the workspace for that request. There is also a workspace_session parameter, which sets the memory used for handling new TCP connections.

Normally workspace_client and workspace_backend default to 64 KB, which means each thread can use that amount of memory.

In your case, you've set the values of workspace_client and workspace_backend extremely high. If you allow 40 MB of workspace memory, you should multiply that number by the number of active threads to figure out how much memory Varnish will consume at a minimum.

Threads

Run the following varnishstat command to figure out how many active threads you currently have:

varnishstat -f "MAIN.threads" -1
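As a rough sketch, the counter can be fed straight into the multiplication described earlier. The helper names threads_from_stat and estimate_mib are made up for this example, and WORKSPACE_MIB=40 is an assumption taken from the -p workspace_client=40M setting in the question. The parsing assumes the value is the second column of the varnishstat -1 output.

```shell
#!/bin/sh
# Assumption: per-thread workspace in MiB, from -p workspace_client=40M.
WORKSPACE_MIB=40

# Read `varnishstat -1` output on stdin and print the MAIN.threads value
# (assumed to be the second whitespace-separated column).
threads_from_stat() {
    awk '$1 == "MAIN.threads" { print $2 }'
}

# Multiply a thread count by the per-thread workspace size, result in MiB.
estimate_mib() {
    echo $(( $1 * WORKSPACE_MIB ))
}

# Only query Varnish when varnishstat is installed and returns a value.
if command -v varnishstat >/dev/null 2>&1; then
    threads=$(varnishstat -f "MAIN.threads" -1 2>/dev/null | threads_from_stat)
    if [ -n "$threads" ]; then
        echo "threads=$threads, approx. $(estimate_mib "$threads") MiB of workspace"
    fi
fi
```

This only estimates workspace memory under the simple "workspace times threads" model used in this answer. It ignores the cache storage and other overhead.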

The number of threads varies between the value of the thread_pool_min runtime parameter and the thread_pool_max runtime parameter. Out of the box, this number ranges between 100 and 5000 per thread pool. Keep in mind that there are 2 thread pools, so you have to multiply that number by 2.

If we do the math on a hypothetical use case where all threads are used, it's 40 MB times 10,000 threads. That's roughly 390 GB.
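The same arithmetic can be spelled out in a few lines of shell. This is only a sketch using the default values quoted above; the variable names are made up for the example, and the shell's integer division rounds the result down.

```shell
#!/bin/sh
# Values quoted in this answer; adjust them to your own configuration.
POOLS=2            # number of thread pools
POOL_MIN=100       # thread_pool_min, per pool
POOL_MAX=5000      # thread_pool_max, per pool
WORKSPACE_MIB=40   # from -p workspace_client=40M

min_threads=$(( POOLS * POOL_MIN ))
max_threads=$(( POOLS * POOL_MAX ))

echo "threads: $min_threads to $max_threads"
echo "workspace with minimum threads: $(( min_threads * WORKSPACE_MIB )) MiB"
echo "workspace with maximum threads: $(( max_threads * WORKSPACE_MIB / 1024 )) GiB"
```

With these numbers the thread count ranges from 200 to 10000, which gives 8000 MiB at the low end and 390 GiB at the high end, against 16 GB of RAM on the server.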

Conclusion

It's clear that you've set the workspace parameters too high. Judging by your runtime configuration, you need them to handle extremely large request and response headers.

There are 3 ways forward:

Reduce thread_pool_max, but risk queueing requests when the limit has been reached
Reduce the workspace memory size, which also means you won't be able to handle huge request/response headers
Add a lot more memory to your server and maybe add multiple Varnish servers to distribute the load
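To get a feel for the first two options, here is a minimal sizing sketch. BUDGET_MIB is a hypothetical amount of RAM you are willing to spend on workspaces, not a recommendation, and the function names are made up for the example. It uses the same simplified "workspace times threads" model as above and rounds down.

```shell
#!/bin/sh
# Hypothetical memory budget for workspaces, in MiB.
BUDGET_MIB=8192
# Number of thread pools, as stated in this answer.
POOLS=2

# Largest thread_pool_max (per pool) that fits the budget
# for a given workspace size in MiB.
max_threads_per_pool() {
    echo $(( BUDGET_MIB / $1 / POOLS ))
}

# Largest workspace size in KiB that fits the budget
# for a given total number of threads.
max_workspace_kib() {
    echo $(( BUDGET_MIB * 1024 / $1 ))
}

echo "thread_pool_max for 40 MiB workspaces: $(max_threads_per_pool 40)"
echo "workspace for 10000 threads: $(max_workspace_kib 10000) KiB"
```

With an 8 GiB budget this prints a thread_pool_max of 102 for 40 MiB workspaces, or a workspace of 838 KiB if all 10000 threads are kept. Either value would then be passed to varnishd with -p, in the same way as the parameters in the question.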
