Re: really slow problem with nvme
- Reply: Warner Losh : "Re: really slow problem with nvme"
- In reply to: Warner Losh : "Re: really slow problem with nvme"
Date: Fri, 23 Feb 2024 19:03:00 UTC
On Fri, 23 Feb 2024, Warner Losh wrote:
> On Fri, Feb 23, 2024, 10:46 AM Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net> wrote:
>
>> Hi,
>>
>> this is a Samsung SSD 970 EVO Plus 1TB nvme and gpart and newfs
>> were already slow (it took like two hours for newfs).
>>
>> Here's another example now:
>>
>> # /usr/bin/time mkdir foo
>> 1.82 real 0.00 user 0.00 sys
>>
>> How does one debug this?
>>
>
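(As an aside, one way to see where a slow mkdir actually spends its time
would be to trace it with timestamps -- just a sketch, assuming truss is
available on the box; "foo2" is only a placeholder target:

# /usr/bin/time truss -dD -o /tmp/mkdir.out mkdir foo2
# tail /tmp/mkdir.out

-d adds absolute and -D relative timestamps per syscall, so the slow call
should stand out in the output.)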
> What filesystem? Sounds like UFS but just making sure.
yes, ufs
> So what's the link speed and number of lanes? If it's bad I might reseat
> (though that might not help) that looks good...
pciconf I had checked:
nvme0@pci4:1:0:0:   class=0x010802 rev=0x00 hdr=0x00 vendor=0x144d device=0xa808 subvendor=0x144d subdevice=0xa801
    class      = mass storage
    subclass   = NVM
    bar   [10] = type Memory, range 64, base 0x40000000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 2 endpoint max data 128(256) FLR RO NS
                 max read 512
                 link x2(x4) speed 8.0(8.0) ASPM disabled(L1) ClockPM disabled
    cap 11[b0] = MSI-X supports 33 messages, enabled
                 Table in map 0x10[0x3000], PBA in map 0x10[0x2000]
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0003[148] = Serial 1 0000000000000000
    ecap 0004[158] = Power Budgeting 1
    ecap 0019[168] = PCIe Sec 1 lane errors 0
    ecap 0018[188] = LTR 1
    ecap 001e[190] = L1 PM Substates 1
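(To pull just the link speed/width line without the rest of the dump,
something like this should do -- a sketch, assuming pciconf accepts the
nvme0 selector here:

# pciconf -lc nvme0 | grep -i link
)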
> Though I'd bet money that this is an interrupt issue. I'd do a vmstat -i
> to watch how quickly they accumulate...
For that I am waiting for a full world to get onto it. I wish I could have
netbooted, but that is not possible there currently.
It only took 15 minutes to extract the tar now. Should have used ddb...
hadn't thought of that before...
# vmstat -ai | grep nvme
its0,0: nvme0:admin 0 0
its0,1: nvme0:io0 0 0
its0,2: nvme0:io1 0 0
its0,3: nvme0:io2 0 0
its0,4: nvme0:io3 0 0
its0,5: nvme0:io4 0 0
its0,6: nvme0:io5 0 0
its0,7: nvme0:io6 0 0
its0,8: nvme0:io7 0 0
How does this even work? Do we poll?
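(A quick way to check whether those counters ever move would be to watch
them while the box is doing I/O -- a sketch:

# while true; do vmstat -ai | grep nvme0; sleep 5; done

If they stay at zero while commands still complete, the completions are
presumably only being reaped by the driver's timeout handler, which would
fit the multi-second latencies.)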
And before you ask:
[1.000407] nvme0: <Generic NVMe Device> mem 0x40000000-0x40003fff at device 0.0 on pci5
[1.000409] nvme0: attempting to allocate 9 MSI-X vectors (33 supported)
[1.000410] nvme0: using IRQs 106-114 for MSI-X
[1.000411] nvme0: CapLo: 0x3c033fff: MQES 16383, CQR, AMS WRRwUPC, TO 60
[1.000412] nvme0: CapHi: 0x00000030: DSTRD 0, NSSRS, CSS 1, CPS 0, MPSMIN 0, MPSMAX 0
[1.000413] nvme0: Version: 0x00010300: 1.3
> How old is the drive? Fresh install? Do other drives have this same issue
> in the same slot? Does this drive have issues in other machines or slots?
The drive is a few months old but sat in its box until it went onto this
board.
I checked nvmecontrol for anything obvious but didn't see anything.
> Oh, and what's its temperature? Any message in dmesg?
Nothing in dmesg; the temperature seems not too bad. It took a while to get
smartmontools; we have no way to see this in nvmecontrol in human-readable
form, do we?
Temperature Sensor 1: 51 Celsius
Temperature Sensor 2: 48 Celsius
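(Possibly relevant: the SMART / health log page should be reachable through
nvmecontrol as well -- a sketch, log page 2 being the Health Information
page, which includes the composite temperature:

# nvmecontrol logpage -p 2 nvme0
)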
OK, I got a 2nd identical machine netbooted remotely (pressure from problems
often helps) -- slightly different FreeBSD version and kernel, same board,
same type of nvme bought together:
# /usr/bin/time dd if=/dev/zero of=/dev/nda0 bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 1.657316 secs (647879880 bytes/sec)
1.66 real 0.00 user 0.94 sys
and ddb> show intrcnt
..
its0,0: nvme0:admin 24
its0,1: nvme0:io0 126
its0,2: nvme0:io1 143
its0,3: nvme0:io2 131
its0,4: nvme0:io3 128
its0,5: nvme0:io4 135
its0,6: nvme0:io5 147
its0,7: nvme0:io6 143
its0,8: nvme0:io7 144
..
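(For an apples-to-apples check on the slow box without clobbering the
filesystem there, a read-only version of the same test plus a before/after
diff of the interrupt counters might be enough -- a sketch, assuming the
namespace shows up as nda0 there too:

# vmstat -ai | grep nvme0 > /tmp/intr.before
# /usr/bin/time dd if=/dev/nda0 of=/dev/null bs=1M count=1024
# vmstat -ai | grep nvme0 > /tmp/intr.after
# diff /tmp/intr.before /tmp/intr.after

The healthy box above accumulated interrupts on every io queue for 1 GB of
I/O; zero deltas on the slow box would point straight at lost interrupts.)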
I'll try to make sure I can safely access both remotely over the weekend
from a more comfortable place, and I know where to start looking now...
Thanks!
/bz
--
Bjoern A. Zeeb r15:7