Re: Trying to boot a supermicro H8DMT board

From: Willem Jan Withagen <wjw_at_digiware.nl>
Date: Mon, 17 Jan 2022 18:45:13 UTC
On 17-1-2022 18:14, Tomoaki AOKI wrote:
> On Mon, 17 Jan 2022 15:04:16 +0100
> Willem Jan Withagen <wjw@digiware.nl> wrote:
>
>> On 17-1-2022 14:46, Eugene Grosbein wrote:
>>> 17.01.2022 20:24, Willem Jan Withagen wrote:
>>>
>>>>> Well, perform independent hardware (memory) testing with something like memtest86+
>>>>> and if it is all right, you should ask someone more knowledgeable. Maybe CC: arch@freebsd.org
>>>> Perhaps I should have done that when I started, but the supplier assured me that
>>>> they had just retired the boards without any issues.
>>>> Memtest86 found the faulty DIMM in 30 secs...
>>>>
>>>> Not sure if we could/want to educate vm_mem_init() to actually detect this.
>>>> It is still in the part where everything is still running on the first CPU,
>>>> which makes it a bit easier to understand what is going on.
>>>>
>>>> Let's see if the box will run on 3 DIMMs for the time being.
>>>> Then figure out with dmidecode what we need to expand again.
>>> Is it ECC memory or non-ECC?
>>> The kernel already has full memory testing performed at boot time
>>> unless disabled with another loader knob:
>>>
>>> hw.memtest.tests=0
>>>
>>> Try booting it with memory testing disabled and without hw.physmem limitation.
>>> Maybe it will boot.
>>>
>>> With ECC, it could be a hardware interrupt while the kernel runs that test
>>> and wrong in-kernel processing of the interrupt.
>> Swapped the DIMM with 3 others, but still the same errors.
>> Then I changed the DIMM slot, and the errors went away.
>> So definitely a hardware issue.
>>
>> When booted, FreeBSD already reported only 12GB in the system (there are 4
>> 4GB DIMMs).
>> Using 8GB. The DIMMs are ECC.
>> But even then it would only boot with memory set to 8G.
>>
>> Waiting for memtest to finish at least one pass.
>> Usually that will take quite some time.
>>
>> --WjW
>>
>>
> Not sure this is the case, but some motherboards have severe limitations
> on DIMM slot usage if the slots are not fully populated.
>
> For example, assuming the slot numbers are B0-0, 1, 2, 3 and B1-0, 1, 2, 3,
>
>   *Must use "interleaved". If 4 in 8 slots are to be used,
>    B0-0, B0-2, B1-0, B1-2 shall be used.
>    (Some forced B0-1, B0-3, B1-1, B1-3, IIRC)
>
>   *Must NOT use "interleaved".
>    B0-0, B0-1, B1-0, B1-1 shall be used.
>
>   *Must NOT use B1 unless B0 is full of DIMMs.
>    B0-0, B0-1, B0-2, B0-3 shall be used.
>
> and so on, depending on motherboard vendor (at worst, per model.)

Yup, I know... I used the board in the configuration I got it in.
And it's a dual-processor board with 2 Opterons.
The config works correctly for the first Opteron (called CPU1)
using slots: CPU1/DIMM1A and CPU1/DIMM1B
But on the second CPU I have to use the third slot....
so using slots: CPU2/DIMM1B and CPU2/DIMM2B

And my memtest86 has completed 1 full pass over 16G without errors.
So I'm guessing that the order is not majorly picky.

But you are correct in noting this, so I will read up on this in the
manual.
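
For anyone who wants to check a partially populated board against rules
like the ones quoted above, here is a rough Python sketch. The slot names
(B0-0 .. B1-3) and the three rule sets are only the hypothetical examples
from the quoted message, not what the H8DMT manual says; the real layout
has to come from the board documentation.

```python
# Hypothetical population rules, using the example slot names B0-0..B1-3
# from the quoted message. Real rules must be taken from the board manual.
RULES = {
    "interleaved": {"B0-0", "B0-2", "B1-0", "B1-2"},
    "non-interleaved": {"B0-0", "B0-1", "B1-0", "B1-1"},
    "fill-B0-first": {"B0-0", "B0-1", "B0-2", "B0-3"},
}


def matching_rules(populated):
    """Return the names of the example rules whose slot set equals the populated slots."""
    slots = set(populated)
    return sorted(name for name, required in RULES.items() if slots == required)


if __name__ == "__main__":
    # Four DIMMs placed in every other slot of both banks.
    print(matching_rules(["B0-0", "B0-2", "B1-0", "B1-2"]))
```

An empty result only means the layout matches none of these example rules,
not that the board will refuse it.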

Thanx,
--WjW