Strange TCP-related hang on startup w/ recent CVSUP

Terry Kennedy terry at tmk.com
Wed Jun 8 05:06:59 GMT 2005


  I have a number of boxes which have been running well w/ 5.4-stable. The 
last kernel and userland builds on those boxes were on May 26th, from a 
CVSUP done that same day.

  Last night I did another CVSUP and the usual buildworld/buildkernel/in-
stallkernel/installworld/mergemaster, and the boxes wouldn't come back up,
hanging in sbwait state at the point where they would normally do their
first net access.

  In one box, this is a batch of NFS mounts; on the other it is the initial
ntpdate query. At this point the boxes don't respond to pings and sit for-
ever (at least an hour) not doing anything. Typing ^C on the console aborts
the hanging process, and then the startup proceeds. With a ping going from
another system, I see that the net doesn't come up on the problem boxes until
several seconds into the execution of the next network-related command - it
is almost as if the previous ifconfig (done from rc.conf) didn't have any
effect (despite the "link up" message on the console). The other oddity is
that I get a "rpc.lockd: 100024 RPC: Port mapper failure" right after lockd
starts, even though other net commands have completed successfully.

  Booting from kernel.old (the May 26th one) boots normally, so it is a ker-
nel issue, not something in userland.

  Hardware is a Tyan S2721-533 (dual Xeon) board w/ onboard Intel "em" Gig-
abit Ethernet. To keep this message short, I'm omitting the dmesg output,
but it can be found at http://www.tmk.com/raidzilla.

  Here's the relevant parts of the console output from the two systems:

first system:

/dev/da1s3h: clean, 199137891 free (819 frags, 24892134 blocks, 0.0% fragmentation)
Setting hostname: rz1.tmk.com.
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        inet 204.141.35.40 netmask 0xffffff00 broadcast 204.141.35.255
        inet6 fe80::2e0:81ff:fe28:94d6%em0 prefixlen 64 tentative scopeid 0x1 
        ether 00:e0:81:28:94:d6
        media: Ethernet 1000baseTX <full-duplex> (autoselect)
        status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        inet 127.0.0.1 netmask 0xff000000 
        inet6 ::1 prefixlen 128 
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 
add net default: gateway 204.141.35.1
Additional routing options:.
Starting devd.
Mounting NFS file systems:em0: Link is up 1000 Mbps Full Duplex
load: 1.10  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.10  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.10  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.10  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.01  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 1.01  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 0.56  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 0.56  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
load: 0.07  cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k
^CScript /etc/rc.d/mountcritremote interrupted
Starting syslogd.
Checking for core dump on /dev/da0s1b...
savecore: no dumps found
Setting date via ntp.
Looking for host gate.tmk.com and service ntp
host found : gate.tmk.com
Looking for host 204.141.35.61 and service ntp
host found : gate.tmk.com
Looking for host 199.224.0.146 and service ntp
host found : ns1.ispnetinc.net
Looking for host 199.224.0.154 and service ntp
host found : ns2.ispnetinc.net
Looking for host 204.141.40.135 and service ntp
host found : ns3.ispnetinc.net
 8 Jun 00:30:34 ntpdate[416]: step time server 199.224.0.146 offset 1.526419 sec
Starting rpcbind.
NFS access cache time=2
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/X11R6/lib /usr/local/lib
a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout /usr/X11R6/lib/aout
Starting mountd.
Starting nfsd.
Starting statd.
Starting lockd.
Jun  8 00:30:35 rz1 rpc.lockd: 100024 RPC: Port mapper failure
Starting usbd.
Starting local daemons:.
Updating motd.
Starting ntpd.
Starting rwhod.
Configuring syscons: blanktime screensaver.

second system:

/dev/da1s3h: clean, 199127439 free (1095 frags, 24890793 blocks, 0.0% fragmentation)
Setting hostname: rz2.tmk.com.
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        inet 204.141.35.41 netmask 0xffffff00 broadcast 204.141.35.255
        inet6 fe80::2e0:81ff:fe2e:67b0%em0 prefixlen 64 tentative scopeid 0x1 
        ether 00:e0:81:2e:67:b0
        media: Ethernet autoselect
        status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        inet 127.0.0.1 netmask 0xff000000 
        inet6 ::1 prefixlen 128 
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 
add net default: gateway 204.141.35.1
Additional routing options:.
Starting devd.
Mounting NFS file systems:.
Starting syslogd.
Checking for core dump on /dev/da0s1b...
savecore: no dumps found
Setting date via ntp.
Looking for host gate.tmk.com and service ntp
em0: Link is up 1000 Mbps Full Duplex
load: 0.75  cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.75  cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.75  cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.75  cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.69  cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.64  cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.64  cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
load: 0.02  cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k
^CScript /etc/rc.d/ntpdate interrupted
Starting rpcbind.
NFS access cache time=2
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/X11R6/lib /usr/local/lib
a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout /usr/X11R6/lib/aout
Starting mountd.
Starting nfsd.
Starting statd.
Starting lockd.
Jun  8 00:58:41 rz2 rpc.lockd: 100024 RPC: Port mapper failure
Starting usbd.
Starting local daemons:.
Updating motd.
Starting ntpd.
Starting rwhod.
Configuring syscons: blanktime screensaver.

  Here are the list of changed kernel files since the previous (working) 
kernel:

find . -newer /tmp/date
./alpha/alpha/busdma_machdep.c
./amd64/amd64/identcpu.c
./amd64/amd64/machdep.c
./conf/NOTES
./conf/files
./conf/options
./dev/ata/ata-chipset.c
./dev/ata/ata-pci.h
./dev/bktr/bktr_reg.h
./dev/sound/pci/ich.c
./kern/uipc_socket.c
./kern/kern_event.c
./kern/sysv_shm.c
./kern/subr_unit.c
./modules/netgraph/device
./modules/netgraph/device/Makefile
./modules/netgraph/Makefile.inc
./modules/udbp/Makefile
./netgraph/bluetooth/drivers/ubt
./netgraph/bluetooth/drivers/ubt/ng_ubt.c
./netgraph/ng_eiface.c
./netgraph/netgraph.h
./netgraph/ng_base.c
./netgraph/ng_iface.c
./netgraph/ng_device.c
./netgraph/ng_ksocket.c
./netinet/tcp_subr.c
./netinet/ip_divert.c
./netinet/ip_icmp.c
./netinet/ip_icmp.h
./netinet6/ipsec.c
./sys/systm.h

  Any ideas? Also, if there is additional information needed, let me know
what to do and I'll report back.

        Terry Kennedy             http://www.tmk.com
        terry at tmk.com             New York, NY USA


More information about the freebsd-stable mailing list