CURRENT: massive em0 NIC problems since IFLIB changes/introduction
Alexander Leidinger
Alexander at leidinger.net
Fri Mar 17 13:15:24 UTC 2017
Quoting "O. Hartmann" <ohartmann at walstatt.org> (from Fri, 17 Mar 2017
12:20:18 +0100):
> Since the introduction of the IFLIB changes, I have been seeing severe
> problems on CURRENT.
I already reported something like this to sbruno@ and M. Macy (in copy).
> Running the most recent CURRENT (FreeBSD 12.0-CURRENT #27 r315442: Fri Mar 17
> 10:46:04 CET 2017 amd64), the problems on a workstation got severe within the
> past two days:
>
> For a couple of weeks the em0 NIC (Intel i217-LM, see below) has been dying
> under heavy I/O. I first noticed this when "rsync"ing poudriere repositories
> to a remote NFSv4 (automounted) folder. The em0 device could be revived by an
> ifconfig down/up procedure.
> But not only the i217-LM chip is affected. On another box equipped with an
> i350 dual port GBit NIC I observed similar behaviour under (artificially)
> high I/O load (but I didn't investigate that further since it occurred very
> seldom).
It's not only those chipsets.
It may be beneficial if you could provide the pciconf output for those
devices. Mine is:
---snip---
em0@pci0:2:6:0: class=0x020000 card=0x13768086 chip=0x107c8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82541PI Gigabit Ethernet Controller'
---snip---
> Now, since around yesterday, the i217-LM dies without being revivable with
> ifconfig down/up: Doing so, my FreeBSD CURRENT machine (Fujitsu Celsius M740)
I don't know whether a simple down/up would help for the chip I see
this issue with (it's a headless server in a remote datacenter). For
the moment I'm using a workaround along the lines of
"ping -c 1 <gateway> || shutdown -r now" in crontab.
The system in question is at r314137.
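
A slightly more careful variant of that cron workaround could look like
the following Python sketch. It is only an illustration: the function
names, the settle delay and the command-line handling are my own
assumptions, and I have not verified that a down/up actually revives
this chip. It pings the gateway once, tries an ifconfig down/up if
there is no reply, and reboots only if the link stays dead.

```python
import subprocess
import sys
import time


def gateway_reachable(gateway, run=subprocess.run):
    """Return True if a single ICMP echo request to `gateway` is answered."""
    # "-c 1" sends one packet; ping exits non-zero when no reply arrives.
    result = run(["ping", "-c", "1", gateway],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def recover(gateway, iface="em0", run=subprocess.run, settle=10):
    """Check the link; try down/up first and reboot only as a last resort.

    `run` is injectable so the logic can be exercised without touching
    the real interface. `settle` is an assumed delay (in seconds) that
    gives the link time to come back after the down/up.
    """
    if gateway_reachable(gateway, run):
        return "ok"
    run(["ifconfig", iface, "down"])
    run(["ifconfig", iface, "up"])
    time.sleep(settle)
    if gateway_reachable(gateway, run):
        return "revived"
    run(["shutdown", "-r", "now"])
    return "reboot"


if __name__ == "__main__":
    # Hypothetical usage from crontab: nic_watchdog.py <gateway> [iface]
    gw = sys.argv[1]
    nic = sys.argv[2] if len(sys.argv) > 2 else "em0"
    print(recover(gw, nic))
```

The returned string ("ok", "revived" or "reboot") would also show
whether the down/up step ever helps on a given chip.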
> remains with a dead em0 device, reporting "no route" on some occasions but
> stuck in the dead state. Every attempt to re-establish the route manually
> fails, only rebooting the box gives some relief.
>
> On the console, I have some very strange reports:
>
> - ping suddenly reports no buffer space
> - or I sometimes see massive occurrences of
>   "em0: TX(0) desc avail = 1024, pidx = 0" on the console
I don't see this in messages or the console log, but the logs do show
that ntpd can't resolve hostnames.
> Either way, sending/receiving large files on an established network GBit line
> which could be saturated by approx 100 MBytes/s tends to make the NIC fail.
I can report that an "svnlite update" of the FreeBSD src tree on the
box is able to trigger the issue in my case.
I have to add that before the iflib changes I've seen frequent
em-watchdog timeouts in the logs / dmesg. So for me we have two issues
here:
- the hardware wasn't 100% supported before the iflib changes (it seems)
- the iflib changes have lost some watchdog functionality /
auto-failure-recovery feature
Bye,
Alexander.
--
http://www.Leidinger.net Alexander at Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netchild at FreeBSD.org : PGP 0x8F31830F9F2772BF