[Bug 287453] x11/nvidia-driver: Vulkan and OpenGL are broken on RTX 5000 series

From: <bugzilla-noreply_at_freebsd.org>
Date: Sat, 08 Nov 2025 16:35:13 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=287453

--- Comment #29 from Sean Farley <scf@FreeBSD.org> ---
(In reply to Tomasz "CeDeROM" CEDRO from comment #28)
Actually, llama.cpp uses Vulkan successfully without needing CUDA:

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1
| warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
build: 6795 (unknown) with FreeBSD clang version 19.1.7
(https://github.com/llvm/llvm-project.git llvmorg-19.1.7-0-gcd708029e0b2) for
x86_64-unknown-freebsd14.3
system info: n_threads = 28, n_threads_batch = 28, total_threads = 28

llama.cpp works for me on my RTX 4070 (12GB), though I see the same issue of
having to restart it at least once before it actually starts processing
requests.  The first time, it claims to be processing but never succeeds.  I
wonder if it is a timing issue caused by loading a large LLM: on the second
run, a good portion of the LLM file is already cached in memory, so loading is
much faster.
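If the slow first run really is a page-cache effect, one way to test that
hypothesis is to pre-warm the cache before the first llama.cpp start. A
minimal sketch; the model path is a placeholder, not a real file:

```shell
# Hypothesis test: read the model file once so it lands in the OS page cache
# before the first llama.cpp run. MODEL is a hypothetical placeholder path.
MODEL="/path/to/model.gguf"

# Printed as a dry run here because the path is a placeholder; on a real
# system you would run the cat command directly before starting llama.cpp.
echo "cat $MODEL > /dev/null"
```

If the first run then behaves like the second, the problem is load time rather
than the driver.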

The options I pass to llama.cpp:  --device Vulkan0 --no-warmup --n-gpu-layers 2
--threads 28 --mlock
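Put together, the invocation looks roughly like the following. This is a
sketch: the llama-server binary name and the model path are assumptions; only
the flags come from the list above.

```shell
# Sketch of the full command line. The binary name (llama-server) and the
# model path are assumed; the flags are the ones quoted in this comment.
MODEL="/path/to/model.gguf"   # ~46 GB GGUF model (placeholder path)

set -- --device Vulkan0 --no-warmup --n-gpu-layers 2 --threads 28 --mlock

# Dry run: print rather than execute, since the paths are placeholders.
echo "llama-server -m $MODEL $*"
```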
The LLM is about 46GB and takes a while to load on a 14700K system with 128GB
of RAM, longer still on the first run.

-- 
You are receiving this mail because:
You are the assignee for the bug.