[Bug 287895] graphics/nvidia*: cuda/opencl does not work on RTX 5000 series (5070) card.

From: <bugzilla-noreply_at_freebsd.org>
Date: Sat, 05 Jul 2025 04:18:43 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=287895

--- Comment #8 from Tomoaki AOKI <junchoon@dec.sakura.ne.jp> ---
(In reply to Tomasz "CeDeROM" CEDRO from comment #7)
Hm, I tried the two you suggested on my ThinkPad P52 (notebook) with a Quadro
P1000, which does not have a GSP.
stable/14, amd64 at commit 8d93877c013fa3bc00b8b9841e545a941e80b2ca.
nvidia-driver-devel and its related ports at 575.64.03.

Additionally installed ports to test:
  emulators/libc6-shim
  science/linux-ai-ml-env
  benchmarks/clpeak
including the dependencies pulled in by the above.

Note that I've modified science/linux-ai-ml-env/Makefile to depend on the
-devel versions of the drivers (without that change, it tries to pull in
x11/nvidia-driver and x11/linux-nvidia-libs and fails due to conflicts with
-devel).

As I've just finished a massive rebuild (except for the leaf ports that are
too huge), my ports tree is still at 2a1c595f050d6fb8fb73e14dd0eadf14ed90db58,
which predates the latest -devel update.


% nv-sglrun nvidia-smi
/usr/local/lib/libc6-shim/libc6.so: shim init
Sat Jul  5 13:07:00 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.03              Driver Version: 575.64.03      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro P1000                   Off |   00000000:01:00.0  On |                  N/A |
| N/A   57C    P3            N/A  / 5001W |     830MiB /   4096MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
% nv-sglrun clpeak
/usr/local/lib/libc6-shim/libc6.so: shim init

Platform: NVIDIA CUDA
  Device: Quadro P1000
    Driver version  : 575.64.03 (FreeBSD)
    Compute units   : 4
    Clock frequency : 1518 MHz

    Global memory bandwidth (GBPS)
      float   : 73.85
      float2  : 76.71
      float4  : 77.74
      float8  : 76.96
      float16 : 51.58

    Single-precision compute (GFLOPS)
      float   : 1341.53
      float2  : 1433.50
      float4  : 1435.92
      float8  : 1417.77
      float16 : 1417.83

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 45.57
      double2  : 45.84
      double4  : 45.72
      double8  : 45.57
      double16 : 45.25

    Integer compute (GIOPS)
      int   : 477.09
      int2  : 488.65
      int4  : 484.29
      int8  : 473.53
      int16 : 472.34

    Integer compute Fast 24bit (GIOPS)
      int   : 482.61
      int2  : 486.80
      int4  : 487.19
      int8  : 482.19
      int16 : 479.13

    Integer char (8bit) compute (GIOPS)
      char   : 1202.87
      char2  : 1365.46
      char4  : 1380.50
      char8  : 1368.70
      char16 : 1246.58

    Integer short (16bit) compute (GIOPS)
      short   : 1193.72
      short2  : 1359.15
      short4  : 1362.80
      short8  : 1340.00
      short16 : 1360.69

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 11.43
      enqueueReadBuffer               : 11.60
      enqueueWriteBuffer non-blocking : 11.02
      enqueueReadBuffer non-blocking  : 11.15
      enqueueMapBuffer(for read)      : 11.29
        memcpy from mapped ptr        : 11.04
      enqueueUnmap(after write)       : 13.05
        memcpy to mapped ptr          : 10.86

    Kernel launch latency : 5.07 us

%
