rdump stuck in sbwait state (RELENG_7)

Terry Kennedy terry at tmk.com
Tue Dec 30 02:03:25 UTC 2008


  I upgraded a box (Dell PowerEdge 1550, dual PIII processors) from a kernel +
world of December 8th to one from today (December 29th), and I am now seeing
a new problem with rdump.

  The symptom is that rdump stops sending data to the remote system. It is
responsive to ^T and can be aborted with ^C. Here's the ^T status on the
sending box (the aforementioned Dell RELENG_7 system):

  DUMP: dumping (Pass IV) [regular files]
  DUMP: 20.49% done, finished in 0:19 at Mon Dec 29 19:58:57 2008
  DUMP: 38.00% done, finished in 0:16 at Mon Dec 29 20:00:52 2008
  DUMP: 55.45% done, finished in 0:12 at Mon Dec 29 20:01:37 2008
load: 0.00  cmd: rdump 1493 [sbwait] 2.32u 11.25s 0% 34616k
load: 0.00  cmd: rdump 1493 [sbwait] 2.32u 11.25s 0% 34616k
load: 0.00  cmd: rdump 1495 [pause] 2.37u 11.25s 0% 34616k
load: 0.00  cmd: rdump 1492 [running] 2.46u 4.89s 0% 34800k
load: 0.00  cmd: rdump 1494 [pause] 2.30u 11.22s 0% 34616k
load: 0.00  cmd: rdump 1492 [running] 2.46u 4.89s 0% 34800k
load: 0.00  cmd: rdump 1492 [running] 2.46u 4.89s 0% 34800k
load: 0.00  cmd: rdump 1495 [pause] 2.37u 11.25s 0% 34616k
load: 0.00  cmd: rdump 1493 [sbwait] 2.32u 11.25s 0% 34616k
load: 0.00  cmd: rdump 1495 [pause] 2.37u 11.25s 0% 34616k
load: 0.00  cmd: rdump 1492 [sbwait] 2.46u 4.89s 0% 34800k
load: 0.02  cmd: rdump 1492 [running] 2.46u 4.89s 0% 34800k
load: 0.02  cmd: rdump 1492 [running] 2.46u 4.89s 0% 34800k
load: 0.02  cmd: rdump 1495 [pause] 2.37u 11.25s 0% 34616k
load: 0.02  cmd: rdump 1492 [running] 2.46u 4.89s 0% 34800k
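
  For anyone comparing a longer run of ^T samples, the status lines above
can be tallied per PID. What follows is only a minimal sketch, assuming
Python is available and the lines have been pasted into a list; the names
summarize() and stalled_pids() are hypothetical helpers, not part of dump
or rdump. It records the wait states seen for each PID and flags PIDs whose
user and system CPU times did not change across two or more samples.

```python
import re
from collections import defaultdict

# Matches FreeBSD ^T (SIGINFO) status lines such as:
#   load: 0.00  cmd: rdump 1493 [sbwait] 2.32u 11.25s 0% 34616k
STATUS_RE = re.compile(
    r"load:\s+[\d.]+\s+cmd:\s+(?P<cmd>\S+)\s+(?P<pid>\d+)\s+"
    r"\[(?P<state>[^\]]+)\]\s+(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s"
)

def summarize(lines):
    """Group samples by PID: states seen, distinct CPU times, sample count."""
    summary = defaultdict(lambda: {"states": set(), "cpu": set(), "samples": 0})
    for line in lines:
        m = STATUS_RE.search(line)
        if m is None:
            continue  # skip DUMP: progress lines and anything else
        entry = summary[int(m.group("pid"))]
        entry["states"].add(m.group("state"))
        entry["cpu"].add((float(m.group("user")), float(m.group("sys"))))
        entry["samples"] += 1
    return dict(summary)

def stalled_pids(summary):
    """PIDs sampled at least twice whose CPU times never changed."""
    return sorted(
        pid for pid, e in summary.items()
        if e["samples"] >= 2 and len(e["cpu"]) == 1
    )

# A few of the lines from the output above, used as sample input.
SAMPLE = [
    "  DUMP: 55.45% done, finished in 0:12 at Mon Dec 29 20:01:37 2008",
    "load: 0.00  cmd: rdump 1493 [sbwait] 2.32u 11.25s 0% 34616k",
    "load: 0.00  cmd: rdump 1495 [pause] 2.37u 11.25s 0% 34616k",
    "load: 0.00  cmd: rdump 1492 [running] 2.46u 4.89s 0% 34800k",
    "load: 0.00  cmd: rdump 1494 [pause] 2.30u 11.22s 0% 34616k",
    "load: 0.00  cmd: rdump 1493 [sbwait] 2.32u 11.25s 0% 34616k",
    "load: 0.00  cmd: rdump 1492 [sbwait] 2.46u 4.89s 0% 34800k",
]

if __name__ == "__main__":
    result = summarize(SAMPLE)
    for pid in sorted(result):
        print(pid, sorted(result[pid]["states"]), result[pid]["samples"])
    print("no CPU progress:", stalled_pids(result))
```

  A PID that was only sampled once (1494 in the sample input) is not
flagged, since a single sample cannot show whether CPU time is advancing.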

  A tcpdump on both the sending and receiving systems shows no packets
between them from the rdump processes. However, I can rshell both ways
and get the expected output, so the link isn't down.

  ps shows the same thing as ^T. The sbwait process looks like this:

    0  1492  1489   0   4  0 36024 34808 sbwait I+    p0    0:07.35 rdump: /dev/amrd0s1f: pass 4: 69.66% done, finished in 0:08 at Mon Dec 29 20:01:53 2008 (rdump)

  and the status never changes.

  The remote (receiving) system is an HP DS10 running OpenVMS 8.3 with
MultiNet 5.1A as the TCP stack. Although this is a rather rare environment,
I haven't had any problems until this most recent kernel build. I have a
large number (over a dozen) of other systems running a variety of releases
(6.4, 7.0, 7.1-PRERELEASE) which can do this same dump operation without
difficulty.

  I have the offending dump process still in this stuck state, so I can
generate whatever sort of debugging information is needed. The box is a
test box, so I can crash it and get a core dump if that's what is needed.

        Terry Kennedy             http://www.tmk.com
        terry at tmk.com             New York, NY USA


More information about the freebsd-stable mailing list