[Bug 287453] x11/nvidia-driver: Vulkan and OpenGL are broken on RTX 5000 series
Date: Sat, 08 Nov 2025 16:35:13 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=287453

--- Comment #29 from Sean Farley <scf@FreeBSD.org> ---

(In reply to Tomasz "CeDeROM" CEDRO from comment #28)

Actually, llama.cpp uses Vulkan successfully without needing CUDA:

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
build: 6795 (unknown) with FreeBSD clang version 19.1.7 (https://github.com/llvm/llvm-project.git llvmorg-19.1.7-0-gcd708029e0b2) for x86_64-unknown-freebsd14.3
system info: n_threads = 28, n_threads_batch = 28, total_threads = 28

llama.cpp works for me on my RTX 4070 (12GB), even though I have the same issue of having to restart it at least once before it actually starts processing requests. The first time, it claims to be processing but never succeeds. I wonder if it is a timing issue caused by loading a large LLM; the second time it is run, a good portion of the LLM file is cached in memory, which makes loading faster.

The options I pass to llama.cpp:

--device Vulkan0 --no-warmup --n-gpu-layers 2 --threads 28 --mlock

The LLM is about 46GB and takes a while to load on a 14700K system with 128GB RAM, but longer the first time.

-- 
You are receiving this mail because:
You are the assignee for the bug.
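As a sketch, the options listed in the comment might be combined into an invocation like the following. The binary name (llama-server) and the model path are assumptions not stated in the comment; only the flags after the model path are quoted from it:

```shell
# Hypothetical llama.cpp server invocation on FreeBSD using the Vulkan backend.
# /path/to/model.gguf is a placeholder for the ~46GB model file.
#   --device Vulkan0   selects the first Vulkan device (here, the RTX 4070)
#   --no-warmup        skips the warmup pass before serving
#   --n-gpu-layers 2   offloads only 2 layers to the 12GB GPU
#   --threads 28       matches the 28 hardware threads reported above
#   --mlock            locks the model in RAM to avoid paging
llama-server -m /path/to/model.gguf \
    --device Vulkan0 --no-warmup --n-gpu-layers 2 --threads 28 --mlock
```

With --mlock, the kernel is asked to keep the mapped model resident, which fits the observation that a second run (with the file already cached) loads noticeably faster.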