Re: really slow problem with nvme

From: Bjoern A. Zeeb <bzeeb-lists_at_lists.zabbadoz.net>
Date: Fri, 23 Feb 2024 23:45:55 UTC

On Fri, 23 Feb 2024, Warner Losh wrote:

>> How does this even work?  Do we poll?
>>
>
> Yes. We poll, and poll slowly.  You have an interrupt problem.
>
> On an ARM platform. Fun. ITS and I are old.... foes? Friends? frenemies?
>
> As for why, I don't know. I've been fortunate never to have to chase
> interrupts not working on arm problems....

I do now.  Someone's been clever and loaded a dtb from loader after we
had made changes, and apparently the firmware hadn't picked it up.
That doesn't go well if your firmware runs fixups, and we do rely
on these in FreeBSD.

Makes me wonder if these FDT regions end up in the excluded memory list or
if that late fixup as we leave boot services could possibly cause some
memory changes we don't want but that's not for here...


>>> Oh, and what's its temperature? Any message in dmesg?
>>
>> Nothing in dmesg, temp seems not too bad.  Took a while to get
>> smartmontools; we have no way to see this in nvmecontrol in human
>> readable form, do we?
>>
>> Temperature Sensor 1:               51 Celsius
>> Temperature Sensor 2:               48 Celsius
>>
>
> A little warm, but not terrible. 50 is where I start to worry a bit, but
> the card won't thermal throttle until more like 60.

Yeah, haven't checked but closing the box and pushing it back into the
rack probably helped.

And it's running in power state 0, which makes me wonder how helpful
that is in the 2-lane setup...

# nvmecontrol power nvme0
Current Power State is 0
Current Workload Hint is 0
# nvmecontrol power -l nvme0

Power States Supported: 5

  #   Max pwr  Enter Lat  Exit Lat RT RL WT WL Idle Pwr  Act Pwr Workloadd
--  --------  --------- --------- -- -- -- -- -------- -------- --
  0:  7.8000W    0.000ms   0.000ms  0  0  0  0  0.0000W  0.0000W 0
  1:  6.0000W    0.000ms   0.000ms  1  1  1  1  0.0000W  0.0000W 0
  2:  3.4000W    0.000ms   0.000ms  2  2  2  2  0.0000W  0.0000W 0
  3:  0.0700W*   0.210ms   1.200ms  3  3  3  3  0.0000W  0.0000W 0
  4:  0.0100W*   2.000ms   8.000ms  4  4  4  4  0.0000W  0.0000W 0
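
For anyone who wants to compare these states in a script, here is a minimal
Python sketch that parses the rows pasted above.  The regular expression,
the function name and the dictionary keys are my own invention for
illustration, and I am assuming the trailing "*" after the wattage marks a
non-operational state; treat it as a rough helper, not as anything
nvmecontrol itself provides.

```python
import re

# Rows copied verbatim from the `nvmecontrol power -l nvme0` output above.
SAMPLE = """\
  0:  7.8000W    0.000ms   0.000ms  0  0  0  0  0.0000W  0.0000W 0
  1:  6.0000W    0.000ms   0.000ms  1  1  1  1  0.0000W  0.0000W 0
  2:  3.4000W    0.000ms   0.000ms  2  2  2  2  0.0000W  0.0000W 0
  3:  0.0700W*   0.210ms   1.200ms  3  3  3  3  0.0000W  0.0000W 0
  4:  0.0100W*   2.000ms   8.000ms  4  4  4  4  0.0000W  0.0000W 0
"""

# Hypothetical pattern: state number, max power, optional "*", then the
# enter and exit latencies.  The remaining columns are ignored here.
ROW = re.compile(
    r"^\s*(?P<state>\d+):\s+(?P<max_w>[\d.]+)W(?P<nonop>\*?)\s+"
    r"(?P<enter_ms>[\d.]+)ms\s+(?P<exit_ms>[\d.]+)ms"
)


def parse_power_states(text):
    """Return one dict per power-state row found in `text`."""
    states = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # skip headers, separators and blank lines
        states.append({
            "state": int(m.group("state")),
            "max_w": float(m.group("max_w")),
            # Assumption: "*" flags a non-operational state.
            "non_operational": m.group("nonop") == "*",
            "enter_ms": float(m.group("enter_ms")),
            "exit_ms": float(m.group("exit_ms")),
        })
    return states


if __name__ == "__main__":
    for s in parse_power_states(SAMPLE):
        kind = "non-op" if s["non_operational"] else "op"
        print(f'{s["state"]}: {s["max_w"]:.2f}W {kind} '
              f'enter={s["enter_ms"]}ms exit={s["exit_ms"]}ms')
```

With the table above this yields five entries, of which states 3 and 4
carry the "*" flag and are the only ones with non-zero entry and exit
latencies.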

> We don't currently have a nvmecontrol identify field to tell you this
> (I should add it, this is the second time in as many weeks I've wanted it).

Would be awesome :)


-- 
Bjoern A. Zeeb                                                     r15:7