Apparently it's possible to pool the memory of two 3090s using NVLink (although not with the 4090). This would make it possible to run large LLMs on consumer hardware.
https://huggingface.co/transformers/v4.9.2/performance.html
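For a rough sense of what two 24 GB cards would buy, here is a back-of-the-envelope sketch. It counts model weights only and ignores the KV cache, activations, and framework overhead, so real usage will be higher. The function names and the 13B/30B model sizes are just illustrative assumptions, and it assumes the weights can actually be split across both cards.

```python
# Rough weight-only memory estimate. Everything here is an illustrative
# assumption: real usage is higher (KV cache, activations, overhead).
GIB = 1024 ** 3


def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / GIB


def fits(params_billion: float, bytes_per_param: float,
         num_gpus: int = 2, vram_gib_per_gpu: float = 24.0) -> bool:
    """True if the weights alone fit in the combined VRAM of the GPUs.

    Assumes the model can be split evenly across the cards.
    """
    return weight_gib(params_billion, bytes_per_param) < num_gpus * vram_gib_per_gpu


if __name__ == "__main__":
    # Hypothetical model sizes (billions of parameters) and precisions.
    for size in (13, 30):
        for label, nbytes in (("fp16", 2), ("int8", 1)):
            need = weight_gib(size, nbytes)
            print(f"{size}B {label}: {need:.1f} GiB weights, "
                  f"1x3090: {fits(size, nbytes, num_gpus=1)}, "
                  f"2x3090: {fits(size, nbytes, num_gpus=2)}")
```

By this estimate a 13B model in fp16 is just over what a single 3090 holds but fits comfortably across two, while a 30B model in fp16 would not fit even in 48 GB without quantization.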
Before I invest in a new GPU, though, I would like to verify that it actually works, since the conventional wisdom used to be that SLI only doubled performance, not memory.
So has anyone tried it yet? What's the token rate?