Server doesn't boot when 3 PCIe slots are populated
Mehmet Erol Sanliturk
m.e.sanliturk at gmail.com
Mon Jan 15 16:59:37 UTC 2018
On Mon, Jan 15, 2018 at 7:31 PM, Valeri Galtsev <galtsev at kicp.uchicago.edu>
wrote:
>
> On Mon, January 15, 2018 3:44 am, Mehmet Erol Sanliturk wrote:
> > On Mon, Jan 15, 2018 at 9:44 AM, Grzegorz Junka <list1 at gjunka.com> wrote:
> >
> >>
> >> On 15/01/2018 06:18, Warner Losh wrote:
> >>
> >>>
> >>>
> >>> On Jan 14, 2018 11:05 PM, "Grzegorz Junka" <list1 at gjunka.com> wrote:
> >>>
> >>>
> >>> On 14/01/2018 16:18, Mehmet Erol Sanliturk wrote:
> >>>
> >>>
> >>>
> >>> On Sun, Jan 14, 2018 at 5:46 PM, Grzegorz Junka <list1 at gjunka.com> wrote:
> >>>
> >>>
> >>> On 13/01/2018 17:56, Mehmet Erol Sanliturk wrote:
> >>>
> >>>
> >>>
> >>> On Sat, Jan 13, 2018 at 7:21 PM, Grzegorz Junka <list1 at gjunka.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I am installing a FreeBSD server based on a Supermicro H8SML-iF.
> >>> There are three PCIe slots, in which I installed 2 NVMe drives and
> >>> one Intel I350-T4 network card (with 4 Ethernet ports).
> >>>
> >>> I am observing a strange behavior where the system doesn't boot if
> >>> all three PCIe slots are populated. It shows this message:
> >>>
> >>> nvme0: <Generic NVMe Device> mem 0xfd8fc000-0xfd8fffff irq 24 at device 0.0 on pci1
> >>> nvme0: controller ready did not become 1 within 30000 ms
> >>> nvme0: did not complete shutdown within 5 seconds of notification
> >>>
> >>> Then I see a kernel panic/dump and the system reboots after 15 seconds.
> >>>
> >>> If I remove one card, either one of the NVMe drives or the network
> >>> card, the system boots fine. Also, if in BIOS I set PnP OS to YES
> >>> then sometimes it boots (but not always). If I set PnP OS to NO,
> >>> and all three cards are installed, the system never boots.
> >>>
> >>> When the system boots OK I can see that the network card is
> >>> reported as 4 separate devices on one of the PCIe slots. I tried
> >>> different NVMe drives as well as changing which device is
> >>> installed in which slot, but the result seems to be the same in any
> >>> case.
> >>>
> >>> What may be the issue? Amount of power drawn by the hardware? Too
> >>> many devices not supported by the motherboard? Too many interrupts
> >>> for the FreeBSD kernel to handle?
> >>>
> >>> Any help would be greatly appreciated.
> >>>
> >>> GregJ
> >>>
> >>> _______________________________________________
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> From my experience with other brand-name main boards, one
> >>> action may be to check the manual of your server board to see
> >>> whether there are rules about the use of these slots. Sometimes
> >>> differently shaped slots are wired to the same ports: if one
> >>> slot is occupied, the other slot should be left empty. There may
> >>> also be rules against inserting certain kinds of device, for
> >>> example graphics cards, into a particular slot.
> >>>
> >>>
> >>> Mehmet Erol Sanliturk
> >>>
> >>>
> >>> I checked the manual but couldn't find any restrictions regarding
> >>> PCIe ports. It only says how many lanes are available in each
> >>> slot. Would there be any obvious BIOS setting that could cause
> >>> this issue? I tried after resetting BIOS to default settings but
> >>> maybe something is set incorrectly by default?
> >>>
> >>> GregJ
> >>> _______________________________________________
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> http://www.supermicro.com/Aplus/motherboard/Opteron3000/SR56x0/H8SML-iF.cfm
> >>> H8SML-iF
> >>>
> >>>
> >>> On the above page , click "OS Compatibility"
> >>>
> >>>
> >>> On the following page , click "SR5650"
> >>>
> >>> http://www.supermicro.com/Aplus/support/resources/OS/OS_Comp_SR5650.cfm
> >>> OS Compatibility Chart
> >>>
> >>>
> >>> In the third column, for
> >>>
> >>> H8SML-7F
> >>> H8SML-7
> >>> H8SML-iF
> >>> H8SML-i
> >>>
> >>> only the following are listed:
> >>>
> >>> FreeBSD 8.0
> >>> FreeBSD 9.1
> >>>
> >>> From this list it appears that this motherboard is old, and
> >>> that newer OS versions have not been tested on it since the
> >>> versions listed.
> >>>
> >>>
> >>> To check the interaction between the operating system and your
> >>> Supermicro H8SML-iF, select one of the operating systems tested
> >>> on this board (Unix-class OSes are more suitable) and try to
> >>> install it with all of your components in place. If it boots
> >>> successfully, it means that there is an incompatibility between
> >>> your FreeBSD and the main board. If none of them boots, then you
> >>> may conclude that there is a problem in your settings.
> >>>
> >>>
> >>> BIOS settings are important because the OS communicates with
> >>> the main board through these settings.
> >>>
> >>>
> >>> In the manual (downloaded from the above page:
> >>> Manual Revision 1.0c
> >>> Release Date: March 12, 2014), page 4-9, "PCI/PnP
> >>> Configuration" is defined.
> >>> If YES is selected for PnP, the OS adjusts some device settings.
> >>> If NO is selected, the BIOS adjusts them. When the BIOS-adjusted
> >>> device settings do not conform to the OS parameters,
> >>> the result will be "FAIL".
> >>>
> >>> Therefore, the more suitable selection is YES.
> >>>
> >>>
> >>> Another point is that , there are many more BIOS selectable
> >>> parameters and jumpers about PCI slots and others .
> >>> There are some BIOS settings for PCI slots :
> >>>
> >>> PCI X4 Slot 6 ( page 4-9 )
> >>> PCI x8 Slot 7 ( page 4-10 )
> >>>
> >>>
> >>>
> >>> Please review these BIOS settings in your manual and set them
> >>> with respect to your requirements .
> >>>
> >>>
> >>> Thanks Mehmet for looking into this. It's an old motherboard but
> >>> my point is that it boots fine when either: one NVMe and the
> >>> network card, or both NVMe are installed, but not when all three
> >>> are installed. How would that be related to FreeBSD compatibility?
> >>> The chipset and all devices that I am trying to install are
> >>> supported by FreeBSD 11.x.
> >>>
> >>> I just tried booting into a Debian live system and it also didn't
> >>> enumerate NVMe drives properly. This means that it's not FreeBSD
> >>> related and is no longer relevant for this list. I will try to
> >>> play with BIOS settings to see if I can make it work that way.
> >>> Thanks for all the help.
> >>>
> >>>
> >>>
> >>> Nvme drives are weird about power. I distrust the power estimate of
> >>> 5-9w earlier in the thread... given the oddity with debian, it's not
> >>> too crazy to think that. How far does FreeBSD boot though?
> >>>
> >>>
> >> I tried with a different power supply but the outcome was exactly the
> >> same. Sometimes FreeBSD boots fine but one of the NVMe drives is not
> >> visible (i.e. dmesg grep shows only one NVMe). When it doesn't work it
> >> boots up to the point of enumerating drives (SATA, USB, NVMe). Then it
> >> stops at the first NVMe and reboots.
> >>
> >> The funny thing is that very often it's enough to pull out one of the
> >> cards and put it back in. Then the system boots fine with all three
> >> cards. I had that a few times. Once it's booted it works, I can restart
> >> the system and it boots every time. As soon as I power off, unplug from
> >> the power main, wait a few minutes and power it on again, the issue
> >> comes back - can't boot as NVMe can't be enumerated.
> >>
> >> I thought it might be caused by the hardware being too cold. I left the
> >> server once overnight but it didn't boot up, it was trying and
> >> restarting the whole night.
> >>
> >> GregJ
> >>
> >>
> >> _______________________________________________
> >>
> >>
> >
> >
> >
> > The above explanation brings to mind the "impedance mismatch in
> > electronics" problem.
>
> Hm, I wouldn't say so. First of all, I will seriously doubt that sane
> cards are out of specs as far as impedance is concerned.
>
> But before going further, let's make sure we talk about the same thing. I
> assume impedance mismatch means the impedance of the load attached to a
> transmission line being different from the impedance of the
> transmission line itself. In such a case part of the transmitted signal is
> reflected from the load back into the transmission line. This can make a
> mess, as the transmitted signal is mixed with the reflected one at the
> different positions of the loads along the same transmission line. One has
> to have a really large mismatch (over 20% at least) to make that matter.
> Many of us remember this in at least two computer related cases: 1. we used
> terminators at the end of SCSI cables (or attached a "self-terminating"
> SCSI device to the end of the line). 2. In some system boards in which
> memory buses had no terminators the manual would say to populate slots
> beginning from the farthest away from the CPU (to defeat reflection from
> the open end of the memory bus lines).
>
> I have never heard of anything like that on PCI express bus. If I am
> wrong, could you give some pointer so I can read about it.
>
> Thanks in advance for pointers! (I know: you learn something every day -
> which I bet I am about to ;-)
>
> Valeri
>
> >
> > ( Please search
> >
> >
> > impedance mismatch in electronics
> > impedance matching in electronics
> >
> >
> > in Internet if you want explanations about them . )
> >
> >
> > When all of these cards are inserted into slots simultaneously , their
> > accumulated electronic effect may distort behaviour of your mother board
> > circuits or attached card circuit(s) .
> >
> >
> > Therefore, if you can find another NVMe and/or network card, please
> > test their effect.
> > Such tests may be inconclusive, because the mother board circuits may be
> > affected negatively by "properly" operating add-on cards when they are
> > inserted together.
> >
> >
> > If it is feasible for you, you may use USB-attached network card(s) to
> > avoid occupying a slot with the network card.
> > Or you may use one more capable NVMe card instead of two smaller NVMe
> > cards, or use only one of them, and/or add a SATA SSD.
> > Such a choice would save your investment and produce a working server
> > with a "little" loss when compared to "all".
> >
> >
> >
> >
> > Mehmet Erol Sanliturk
> > _______________________________________________
> > freebsd-questions at freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-questions
> > To unsubscribe, send any mail to
> > "freebsd-questions-unsubscribe at freebsd.org"
> >
>
>
> ++++++++++++++++++++++++++++++++++++++++
> Valeri Galtsev
> Sr System Administrator
> Department of Astronomy and Astrophysics
> Kavli Institute for Cosmological Physics
> University of Chicago
> Phone: 773-702-4247
> ++++++++++++++++++++++++++++++++++++++++
>
The problem of "impedance matching" occurs between any two interacting
circuits: whenever one circuit feeds its output to another circuit as
input, the problem exists, irrespective of the purpose and kind of the
circuits. Obviously, the behaviour is not exactly the same in every case.

If you search the Internet for the following phrase, you will find a large
number of links:

impedance matching circuit design
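
For the transmission-line case described in the quoted reply, the size of a
reflection can be estimated with the standard reflection coefficient,
(Z_load - Z_line) / (Z_load + Z_line). The following is only a minimal
Python sketch: the 100 ohm line impedance is an assumed, illustrative value,
not a measurement of this board, and the 20% figure is simply the mismatch
mentioned in the quoted reply.

```python
def reflection_coefficient(z_load, z_line):
    """Fraction of the incident voltage wave reflected back at the load."""
    return (z_load - z_line) / (z_load + z_line)

Z_LINE = 100.0          # ohms, assumed line impedance (illustrative only)
Z_LOAD = 1.2 * Z_LINE   # a load 20% above the line impedance

gamma = reflection_coefficient(Z_LOAD, Z_LINE)
print(f"reflected fraction of the signal voltage: {gamma:.3f}")
```

A perfectly matched load (Z_load equal to Z_line) gives zero reflection; the
larger the mismatch, the larger the reflected fraction.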

If we consider the slots of a computer main board, the following may occur.

Assume a slot has a voltage threshold for triggering input into an add-on
card, i.e., the add-on card responds only when it senses a voltage equal to
or greater than that threshold. Lower voltages will not trigger the card.

Assume one add-on card works when installed alone.
Assume a second add-on card also works when installed alone.

When both of these add-on cards are inserted into slots, the power they
draw lowers the voltage of the surrounding circuit more than a single
card does.

If this lowered voltage is below the threshold of the cards (one of them,
or both of them), it (they) will not sense the signals from the
surrounding circuits. Therefore, it (they) will not respond to the
signals requesting action.
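
The reasoning above can be put into numbers with Ohm's law: the voltage left
at the slots is the supply voltage minus the drop across the resistance of
the supply path. The sketch below is only an illustration of the argument.
Every number in it (rail voltage, path resistance, threshold, card currents)
is made up for the example and is not measured on the H8SML-iF or on the
cards in question.

```python
def slot_voltage(v_supply, r_source, currents):
    """Voltage remaining at the slots after the drop across r_source."""
    return v_supply - r_source * sum(currents)

V_SUPPLY = 3.3    # volts, nominal rail (assumed)
R_SOURCE = 0.05   # ohms, hypothetical resistance of the supply path
THRESHOLD = 3.0   # volts, hypothetical minimum a card needs to respond

# Hypothetical current drawn by each card, in amperes.
CARD_CURRENTS = {"nvme0": 2.5, "nvme1": 2.5, "nic": 1.5}

def check(names):
    """Return (voltage, ok) for the given set of installed cards."""
    v = slot_voltage(V_SUPPLY, R_SOURCE, [CARD_CURRENTS[n] for n in names])
    return v, v >= THRESHOLD

for combo in (["nvme0", "nic"], ["nvme0", "nvme1"], ["nvme0", "nvme1", "nic"]):
    v, ok = check(combo)
    print(f"{'+'.join(combo)}: {v:.3f} V, {'ok' if ok else 'below threshold'}")
```

With these made-up values, any two cards stay above the threshold, while all
three together fall just below it.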

In one of the previous messages,

https://lists.freebsd.org/pipermail/freebsd-questions/2018-January/280455.html

it is said that

"
I am observing a strange behavior where the system doesn't boot if all
three PCIe slots are populated. It shows this message:

nvme0: <Generic NVMe Device> mem 0xfd8fc000-0xfd8fffff irq 24 at device 0.0 on pci1
nvme0: controller ready did not become 1 within 30000 ms
nvme0: did not complete shutdown within 5 seconds of notification

Then I see a kernel panic/dump and the system reboots after 15 seconds.

If I remove one card, either one of the NVMe drives or the network card,
the system boots fine.
"

The above message may be a good example of this.
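
When comparing boots with different card combinations, it may help to pick
the timeout line out of a saved boot log automatically. The following is a
minimal Python sketch; the pattern is copied from the message quoted above,
and the function name and the sample text are only illustrative.

```python
import re

# Wording taken from the quoted log line; other kernel versions may differ.
READY_TIMEOUT = re.compile(
    r"^(nvme\d+): controller ready did not become 1 within (\d+) ms"
)

def find_ready_timeouts(dmesg_text):
    """Return (device, timeout_ms) for each controller-ready timeout line."""
    hits = []
    for line in dmesg_text.splitlines():
        m = READY_TIMEOUT.match(line.strip())
        if m:
            hits.append((m.group(1), int(m.group(2))))
    return hits

SAMPLE = """\
nvme0: <Generic NVMe Device> mem 0xfd8fc000-0xfd8fffff irq 24 at device 0.0 on pci1
nvme0: controller ready did not become 1 within 30000 ms
nvme0: did not complete shutdown within 5 seconds of notification
"""

print(find_ready_timeouts(SAMPLE))  # prints [('nvme0', 30000)]
```

On a system that does boot, the same function could be given the saved
output of dmesg to see whether either NVMe controller timed out.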
Mehmet Erol Sanliturk