Re: really slow problem with nvme

From: Bjoern A. Zeeb <bzeeb-lists_at_lists.zabbadoz.net>
Date: Fri, 23 Feb 2024 19:03:00 UTC
On Fri, 23 Feb 2024, Warner Losh wrote:

> On Fri, Feb 23, 2024, 10:46 AM Bjoern A. Zeeb <
> bzeeb-lists@lists.zabbadoz.net> wrote:
>
>> Hi,
>>
>> this is a Samsung SSD 970 EVO Plus 1TB nvme and gpart and newfs
>> were already slow (it took like two hours for newfs).
>>
>> Here's another example now:
>>
>> # /usr/bin/time mkdir foo
>>          1.82 real         0.00 user         0.00 sys
>>
>> How does one debug this?
>>
>
> What filesystem? Sounds like UFS but just making sure. .

yes, ufs

> So what's the link speed and number of lanes?  If it's bad i might reseat
> (though that might not help) that looks good...

pciconf I had checked:

nvme0@pci4:1:0:0:       class=0x010802 rev=0x00 hdr=0x00 vendor=0x144d device=0xa808 subvendor=0x144d subdevice=0xa801
     class      = mass storage
     subclass   = NVM
     bar   [10] = type Memory, range 64, base 0x40000000, size 16384, enabled
     cap 01[40] = powerspec 3  supports D0 D3  current D0
     cap 05[50] = MSI supports 1 message, 64 bit
     cap 10[70] = PCI-Express 2 endpoint max data 128(256) FLR RO NS
                  max read 512
                  link x2(x4) speed 8.0(8.0) ASPM disabled(L1) ClockPM disabled
     cap 11[b0] = MSI-X supports 33 messages, enabled
                  Table in map 0x10[0x3000], PBA in map 0x10[0x2000]
     ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
     ecap 0003[148] = Serial 1 0000000000000000
     ecap 0004[158] = Power Budgeting 1
     ecap 0019[168] = PCIe Sec 1 lane errors 0
     ecap 0018[188] = LTR 1
     ecap 001e[190] = L1 PM Substates 1


> Though I'd bet money that this is an interrupt issue. I'd do a vmstat. -i
> to watch how quickly they accumulate...

That I am waiting for a full world to get onto it.  I wish I could have
netbooted but not possible there currently.

Only took 15 minutes to extract the tar now.  Should have used ddb...
hadn't thought of that before...

# vmstat -ai | grep nvme
its0,0: nvme0:admin                                       0          0
its0,1: nvme0:io0                                         0          0
its0,2: nvme0:io1                                         0          0
its0,3: nvme0:io2                                         0          0
its0,4: nvme0:io3                                         0          0
its0,5: nvme0:io4                                         0          0
its0,6: nvme0:io5                                         0          0
its0,7: nvme0:io6                                         0          0
its0,8: nvme0:io7                                         0          0

How does this even work?  Do we poll?

And before you ask:

[1.000407] nvme0: <Generic NVMe Device> mem 0x40000000-0x40003fff at device 0.0 on pci5
[1.000409] nvme0: attempting to allocate 9 MSI-X vectors (33 supported)
[1.000410] nvme0: using IRQs 106-114 for MSI-X
[1.000411] nvme0: CapLo: 0x3c033fff: MQES 16383, CQR, AMS WRRwUPC, TO 60
[1.000412] nvme0: CapHi: 0x00000030: DSTRD 0, NSSRS, CSS 1, CPS 0, MPSMIN 0, MPSMAX 0
[1.000413] nvme0: Version: 0x00010300: 1.3


> How old is the drive? Fresh install? Do other drives have this same issue
> in the same slot? Dies this drive have issues in other maxhines or slots?

The drive is a few months old but only in the box until it went on this
board.

I checked nvmecontrol for anything obvious but didn't see.

> Oh, and what's its temperature? Any message in dmesg?

Nothing in dmesg, temp seems not too bad.  Took a while to get smartmontools;
we have no way to see this in nvmecontrol in human readable form, do we?

Temperature Sensor 1:               51 Celsius
Temperature Sensor 2:               48 Celsius



Ok I got a 2nd identical machine netbooted remotely (pressue with
problems often helps) -- slightly different freebsd version and kernel,
same baord, same type of nvme bought together:

# /usr/bin/time dd if=/dev/zero of=/dev/nda0 bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 1.657316 secs (647879880 bytes/sec)
         1.66 real         0.00 user         0.94 sys

and ddb> show intrcnt
..
its0,0: nvme0:admin                     24
its0,1: nvme0:io0                       126
its0,2: nvme0:io1                       143
its0,3: nvme0:io2                       131
its0,4: nvme0:io3                       128
its0,5: nvme0:io4                       135
its0,6: nvme0:io5                       147
its0,7: nvme0:io6                       143
its0,8: nvme0:io7                       144
..


I'll try to make sure I can safely access both over the weekend remotely
from a more comforting place and I know where to start looking now...

Thanks!
/bz

-- 
Bjoern A. Zeeb                                                     r15:7