Re: really slow problem with nvme
- Reply: Bjoern A. Zeeb: "Re: really slow problem with nvme"
- In reply to: Bjoern A. Zeeb: "Re: really slow problem with nvme"
Date: Fri, 23 Feb 2024 19:16:20 UTC
On Fri, Feb 23, 2024 at 12:03 PM Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net> wrote:

> On Fri, 23 Feb 2024, Warner Losh wrote:
>
> > On Fri, Feb 23, 2024, 10:46 AM Bjoern A. Zeeb <
> > bzeeb-lists@lists.zabbadoz.net> wrote:
> >
> >> Hi,
> >>
> >> this is a Samsung SSD 970 EVO Plus 1TB nvme and gpart and newfs
> >> were already slow (it took like two hours for newfs).
> >>
> >> Here's another example now:
> >>
> >> # /usr/bin/time mkdir foo
> >>         1.82 real         0.00 user         0.00 sys
> >>
> >> How does one debug this?
> >
> > What filesystem? Sounds like UFS but just making sure.
>
> yes, ufs
>
> > So what's the link speed and number of lanes? If it's bad I might reseat
> > (though that might not help)

that looks good...

> pciconf I had checked:
>
> nvme0@pci4:1:0:0: class=0x010802 rev=0x00 hdr=0x00 vendor=0x144d
> device=0xa808 subvendor=0x144d subdevice=0xa801
>     class      = mass storage
>     subclass   = NVM
>     bar   [10] = type Memory, range 64, base 0x40000000, size 16384, enabled
>     cap 01[40] = powerspec 3  supports D0 D3  current D0
>     cap 05[50] = MSI supports 1 message, 64 bit
>     cap 10[70] = PCI-Express 2 endpoint max data 128(256) FLR RO NS
>                  max read 512
>                  link x2(x4) speed 8.0(8.0) ASPM disabled(L1) ClockPM disabled
>     cap 11[b0] = MSI-X supports 33 messages, enabled
>                  Table in map 0x10[0x3000], PBA in map 0x10[0x2000]
>     ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
>     ecap 0003[148] = Serial 1 0000000000000000
>     ecap 0004[158] = Power Budgeting 1
>     ecap 0019[168] = PCIe Sec 1 lane errors 0
>     ecap 0018[188] = LTR 1
>     ecap 001e[190] = L1 PM Substates 1

x4 card in a x2 slot. If that's intentional, then this looks good.

> > Though I'd bet money that this is an interrupt issue. I'd do a vmstat -i
> > to watch how quickly they accumulate...
>
> That I am waiting for a full world to get onto it. I wish I could have
> netbooted but not possible there currently.
>
> Only took 15 minutes to extract the tar now. Should have used ddb...
> hadn't thought of that before...
>
> # vmstat -ai | grep nvme
> its0,0: nvme0:admin                    0          0
> its0,1: nvme0:io0                      0          0
> its0,2: nvme0:io1                      0          0
> its0,3: nvme0:io2                      0          0
> its0,4: nvme0:io3                      0          0
> its0,5: nvme0:io4                      0          0
> its0,6: nvme0:io5                      0          0
> its0,7: nvme0:io6                      0          0
> its0,8: nvme0:io7                      0          0
>
> How does this even work? Do we poll?

Yes. We poll, and poll slowly. You have an interrupt problem. On an ARM
platform. Fun. ITS and I are old.... foes? Friends? frenemies?

As for why, I don't know. I've been fortunate never to have to chase
interrupts-not-working-on-arm problems....

> And before you ask:
>
> [1.000407] nvme0: <Generic NVMe Device> mem 0x40000000-0x40003fff at
> device 0.0 on pci5
> [1.000409] nvme0: attempting to allocate 9 MSI-X vectors (33 supported)
> [1.000410] nvme0: using IRQs 106-114 for MSI-X
> [1.000411] nvme0: CapLo: 0x3c033fff: MQES 16383, CQR, AMS WRRwUPC, TO 60
> [1.000412] nvme0: CapHi: 0x00000030: DSTRD 0, NSSRS, CSS 1, CPS 0, MPSMIN
> 0, MPSMAX 0
> [1.000413] nvme0: Version: 0x00010300: 1.3

Yea, that's what I'd expect.

> > How old is the drive? Fresh install? Do other drives have this same issue
> > in the same slot? Does this drive have issues in other machines or slots?
>
> The drive is a few months old but only in the box until it went on this
> board.
>
> I checked nvmecontrol for anything obvious but didn't see.

OK. So not "super old nand in its death throes being slow".

> > Oh, and what's its temperature? Any message in dmesg?
>
> Nothing in dmesg, temp seems not too bad. Took a while to get
> smartmontools;
> we have no way to see this in nvmecontrol in human readable form, do we?
>
> Temperature Sensor 1:               51 Celsius
> Temperature Sensor 2:               48 Celsius

A little warm, but not terrible. 50 is where I start to worry a bit, but the
card won't thermal throttle until more like 60. We don't currently have a
nvmecontrol identify field to tell you this (I should add it; this is the
second time in as many weeks I've wanted it).
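The all-zero counters are the tell here: a device that is completing I/O while none of its interrupt lines ever increment is running on the driver's polling fallback. A hypothetical helper (the name and parsing are mine, not FreeBSD code) that scans `vmstat -ai`-style lines for that symptom:

```python
def stuck_interrupt_sources(vmstat_lines):
    """Given lines like 'its0,1: nvme0:io0  0  0' (source, name,
    total, rate), return the names whose total count is zero."""
    stuck = []
    for line in vmstat_lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        # parts[-2] is the cumulative interrupt total, parts[-1] the rate.
        name, total = parts[1], parts[-2]
        if total.isdigit() and int(total) == 0:
            stuck.append(name)
    return stuck

lines = [
    "its0,0: nvme0:admin 0 0",
    "its0,1: nvme0:io0 0 0",
    "its0,2: nvme0:io1 0 0",
]
print(stuck_interrupt_sources(lines))
# → ['nvme0:admin', 'nvme0:io0', 'nvme0:io1']
```

Against the output quoted above, every admin and I/O queue of nvme0 comes back flagged, which is what points the diagnosis at the GIC ITS rather than at the drive.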
> Ok I got a 2nd identical machine netbooted remotely (pressure with
> problems often helps) -- slightly different freebsd version and kernel,
> same board, same type of nvme bought together:
>
> # /usr/bin/time dd if=/dev/zero of=/dev/nda0 bs=1M count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes transferred in 1.657316 secs (647879880 bytes/sec)
>         1.66 real         0.00 user         0.94 sys
>
> and ddb> show intrcnt
> ..
> its0,0: nvme0:admin                   24
> its0,1: nvme0:io0                    126
> its0,2: nvme0:io1                    143
> its0,3: nvme0:io2                    131
> its0,4: nvme0:io3                    128
> its0,5: nvme0:io4                    135
> its0,6: nvme0:io5                    147
> its0,7: nvme0:io6                    143
> its0,8: nvme0:io7                    144

Yea, that's what I'd expect. Dozens to hundreds of interrupts.

> I'll try to make sure I can safely access both over the weekend remotely
> from a more comforting place and I know where to start looking now...
>
> Thanks!

No problem!

Warner
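As a sanity check on the dd numbers from the healthy machine, the throughput figure is just bytes over elapsed time; two lines of arithmetic reproduce dd's rate to within rounding:

```python
bytes_xfer = 1073741824   # 1024 records of 1 MiB, from the dd output above
elapsed = 1.657316        # seconds, as printed by dd

throughput = bytes_xfer / elapsed
print(f"{throughput:.0f} bytes/sec (~{throughput / 1e6:.0f} MB/s)")
# dd itself reported 647879880 bytes/sec; the recomputed value differs only
# in the last digits, since dd rounds the elapsed time it prints.
```

Roughly 650 MB/s for sequential 1 MiB writes is a plausible rate for this drive on a x2 Gen3 link, so the second machine's interrupt path and throughput both look normal, isolating the problem to the first box.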