CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Guy Brand
gb at isis.u-strasbg.fr
Wed Oct 4 10:34:13 UTC 2006
Craig Boston (craig at feniz.gank.org) on 29/09/2006 at 20:19 wrote:
> One thing this patch definitely did do though, is break the nvidia
> driver pretty badly. Couldn't keep the X server running for more than a
> minute before it froze solid. Lots of Xid: blah blah blah messages.
> Yes I remembered to rebuild the kernel module ;)
Hi,
I rebuilt to 6.2-PRERELEASE (FreeBSD 6.2-PRERELEASE #1: Mon
Oct 2 15:24:04 CEST 2006, DEBUG kernconf, i386) on a box where em
shares an IRQ with nvidia (NVIDIA-FreeBSD-x86-1.0-8756):
interrupt                        total       rate
irq1: atkbd0                         5          0
irq14: ata0                         47          0
irq16: nvidia0 em+               86545        185
irq17: fwohci0                       7          0
irq21: twe0                       6426         13
cpu0: timer                     927735       1986
Total                          1020765       2185
I can freeze the box by starting Firefox, which reloads a few tabs I
keep open in my session when under X. This is perfectly reproducible.
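For anyone who wants to check their own box for the same setup, here is
a minimal Python sketch that picks shared interrupt lines out of output
shaped like the interrupt table above. The function names are made up
for illustration, and treating a trailing "+" as "more handlers than
fit in the name" is an assumption of this sketch.

```python
import re

# Sample rows copied from the interrupt table in this message.
SAMPLE = """\
irq1: atkbd0 5 0
irq14: ata0 47 0
irq16: nvidia0 em+ 86545 185
irq17: fwohci0 7 0
irq21: twe0 6426 13
cpu0: timer 927735 1986
"""

def parse_vmstat_i(text):
    """Return (source, [handler names], total, rate) for each row."""
    rows = []
    for line in text.splitlines():
        # source, colon, one or more handler names, then two integers
        m = re.match(r"^(\S+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$", line)
        if not m:
            continue  # header, "Total" line, or anything unexpected
        source, names, total, rate = m.groups()
        rows.append((source, names.split(), int(total), int(rate)))
    return rows

def shared_irqs(rows):
    """Keep only irq lines that list more than one handler."""
    return [r for r in rows if r[0].startswith("irq") and len(r[1]) > 1]

for source, names, total, rate in shared_irqs(parse_vmstat_i(SAMPLE)):
    print(source, names, total, rate)
```

On the sample rows only irq16 is reported, which is the line carrying
both nvidia0 and em.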
From the logs, first I see:
Oct 2 16:47:39 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010597
Oct 2 16:47:43 mojito kernel: NVRM: Xid (0001:00): 8, Channel 00000000
Oct 2 16:47:47 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010598
Oct 2 16:47:55 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010599
Oct 2 16:48:03 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059a
Oct 2 16:48:11 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059b
Oct 2 16:48:19 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059c
Oct 2 16:48:27 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059d
Oct 2 16:48:35 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059e
Oct 2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059f
Oct 2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a0
then come the watchdogs:
Oct 2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:48:56 mojito kernel: em0: link state changed to DOWN
Oct 2 16:48:58 mojito kernel: em0: link state changed to UP
Oct 2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a1
Oct 2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:06 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a2
Oct 2 16:49:08 mojito kernel: em0: link state changed to UP
Oct 2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a3
Oct 2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:16 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:18 mojito kernel: em0: link state changed to UP
Oct 2 16:49:24 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a4
Oct 2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:26 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:29 mojito kernel: em0: link state changed to UP
Oct 2 16:49:32 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a5
Oct 2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:36 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:39 mojito kernel: em0: link state changed to UP
Oct 2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:47 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:49 mojito kernel: em0: link state changed to UP
and the box ends up frozen less than a minute later. The traffic
on the Intel card can be low (pinging a host for a few dozen
seconds), medium (reloading a few pages in Firefox tabs) or
high (downloading several ISO images from our local FTP mirror):
whatever I do, if both nvidia and em0 are in use, the box freezes.
Note that I can't freeze the box by doing several big simultaneous
downloads or tarring up a lot of files while NOT running X. So I guess
it is a shared nvidia/em IRQ issue.
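To put numbers on the log excerpts above, a small Python sketch like
the following measures the spacing between repeated messages. The
helper names are hypothetical, the year is supplied by hand because
syslog lines do not carry one, and month names are assumed to be in
English.

```python
from datetime import datetime

# A few lines copied from the log excerpt in this message.
LOG = """\
Oct 2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a0
Oct 2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:48:56 mojito kernel: em0: link state changed to DOWN
Oct 2 16:48:58 mojito kernel: em0: link state changed to UP
Oct 2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a1
Oct 2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
"""

def timestamps(log, needle, year=2006):
    """Timestamps of all lines containing `needle`."""
    out = []
    for line in log.splitlines():
        if needle in line:
            stamp = " ".join(line.split()[:3])  # e.g. "Oct 2 16:48:56"
            out.append(datetime.strptime(f"{year} {stamp}",
                                         "%Y %b %d %H:%M:%S"))
    return out

def gaps_seconds(times):
    """Seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

watchdogs = timestamps(LOG, "watchdog timeout")
xids = timestamps(LOG, "NVRM: Xid")
print("watchdog gaps:", gaps_seconds(watchdogs))  # [10, 10]
print("Xid gaps:", gaps_seconds(xids))            # [8]
```

In this sample the em0 watchdog fires every 10 seconds while the Xid
messages are 8 seconds apart.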
FreeBSD 6.1-STABLE #0: Fri Jun 23 17:00:43 CEST 2006 had no such problem.
The "DEBUG" kernconf is GENERIC + witness options enabled (but they
do not help in this case).
I traced back to find which changeset introduced the trouble. The
results are:
#*default release=cvs tag=RELENG_6 date=2006.06.23.17.00.00
# OK
...
#*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56
# OK
#
#*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00
# BROKEN
...
#*default release=cvs tag=RELENG_6
# BROKEN
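The search over supfile dates above is a bisection between a known
good and a known bad checkout date. A minimal Python sketch of the
idea follows; is_broken is a hypothetical stand-in for the manual step
(cvsup to that date, rebuild the kernel, try to reproduce the freeze),
and the function names are made up for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y.%m.%d.%H.%M.%S"  # date format used in the supfile lines above

def bisect_dates(good, bad, is_broken, resolution=timedelta(minutes=1)):
    """Narrow [good, bad] until the two dates are within `resolution`.

    `good` must be a date where the problem does not show and `bad`
    one where it does; is_broken(date) reports the result of testing
    a checkout from that date.
    """
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        if is_broken(mid):
            bad = mid    # problem already present: look earlier
        else:
            good = mid   # still fine: look later
    return good, bad

def supfile_line(when, tag="RELENG_6"):
    """Format the *default line for a checkout at the given date."""
    return f"*default release=cvs tag={tag} date={when.strftime(FMT)}"

if __name__ == "__main__":
    # Simulated test result, only so the sketch runs without a rebuild.
    culprit = datetime(2006, 8, 8, 9, 19, 25)
    good, bad = bisect_dates(datetime(2006, 6, 23, 17, 0, 0),
                             datetime(2006, 10, 2, 15, 24, 4),
                             lambda d: d >= culprit)
    print(supfile_line(good))
    print(supfile_line(bad))
```

Each round halves the interval, so going from a three-month window to
a few minutes takes well under twenty kernel builds.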
From sys commitlogs the culprit commits are:
glebius     2006-08-08 09:19:25 UTC
  FreeBSD src repository
  Modified files:        (Branch: RELENG_6)
    sys/dev/em           if_em.c
  Log:
  Sync with HEAD. This includes the following changes in chronological
  order:
  o A significant performance improvements. The interrupt handler
    schedules work to a private taskqueue. The em_rxeof() function
    runs lockless.
    rev. 1.98 - 1.101 by scottl.
    rev. 1.103 by mux
    rev. 1.106 by glebius, from Andrey V. Elsukov <bu7cher yandex.ru>
    rev. 1.116 by glebius
  o Style cleanups:
    - rev. 1.102, 1.108, 1.109 by glebius
    - rev. 1.124 by pdeuskar
  o Vendor merges:
    - Merged with vendor driver version 5.1.5 by Jack Vogel.
      rev. 1.115 by glebius
    - Merged with vendor driver version 6.0.5 by Jack Vogel.
      rev. 1.123 by glebius
  o Various fixes:
    - Invalid use of BUS_DMA_ALLOCNOW
      rev. 1.104 by scott, 1.121 by yongari
    - Link state handling cleanup.
      rev. 1.110 by glebius
    - Fix if_baudrate handling.
      rev. 1.111 by glebius
    - Honor IFF_DRV_OACTIVE in em_start_locked().
      rev. 1.117 by yongari
    - Protect EEPROM access with the driver lock.
      rev. 1.118 by yongari
    - Fix link flap on SIOCGIFADDR.
      rev. 1.119 by yongari
    - Fix DMA map handling in em_encap().
      rev. 1.120,1.122 by yongari
  Revision   Changes       Path
  1.65.2.17  +1587 -1443   src/sys/dev/em/if_em.c
glebius     2006-08-08 09:20:26 UTC
  FreeBSD src repository
  Modified files:        (Branch: RELENG_6)
    sys/dev/em           LICENSE README if_em.h if_em_hw.c
                         if_em_hw.h if_em_osdep.h
  Log:
  Sync with HEAD, merging vendor drivers updates 5.1.5, 6.0.5 by Jack Vogel.
  Revision   Changes       Path
  1.3.2.1    +1 -1         src/sys/dev/em/LICENSE
  1.10.2.1   +71 -30       src/sys/dev/em/README
  1.32.2.3   +133 -157     src/sys/dev/em/if_em.h
  1.16.2.2   +3186 -906    src/sys/dev/em/if_em_hw.c
  1.15.2.3   +712 -48      src/sys/dev/em/if_em_hw.h
  1.14.2.2   +46 -15       src/sys/dev/em/if_em_osdep.h
I confirmed that by building a kernel from 2006.08.08.09.21.00 which
shows the problem and a kernel from 2006.08.08.09.18.00 which works
like a charm.
I don't know whether this is linked to the em* watchdogs reported in
this thread. Let me know if I can do something useful to help fix this
issue.
--
bug