dump hangs on 7.1

Mel Flynn mel.flynn+fbsd.questions at mailing.thruhere.net
Sun Jul 12 23:56:19 UTC 2009


On Sunday 12 July 2009 13:20:49 Len Conrad wrote:
> At 04:04 PM 7/12/2009, you wrote:
> >On Sunday 12 July 2009 11:03:00 Len Conrad wrote:
> >> >On Friday 10 July 2009 08:29:01 Len Conrad wrote:
> >> >> FreeBSD 7.1-RELEASE #0: Thu Jan  1 14:37:25 UTC 2009
> >> >> root at logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  i386
> >> >>
> >> >> CPU: Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz (2496.26-MHz
> >> >> 686-class CPU) Origin = "GenuineIntel"  Id = 0x1067a  Stepping = 10
> >> >>   AMD Features=0x20100000<NX,LM>
> >> >>   AMD Features2=0x1<LAHF>
> >> >>   Cores per package: 4
> >> >> real memory  = 3484745728 (3323 MB)
> >> >> avail memory = 3405537280 (3247 MB)
> >> >> ACPI APIC Table: <DELL   PE_SC3  >
> >> >> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> >> >>  cpu0 (BSP): APIC ID:  0
> >> >>  cpu1 (AP): APIC ID:  1
> >> >>  cpu2 (AP): APIC ID:  2
> >> >>  cpu3 (AP): APIC ID:  3
> >> >>
> >> >>
> >> >> /sbin/dump -0uanL -f - / | ssh dump_images at xxx.net dd
> >> >> of=/var/ftp/dump_images/mx1-root-test
> >> >>
> >> >> dump has completed only once. Several other dumps have all gotten
> >> >> under way, target file is created and increases until the hang.
> >> >>
> >> >> CTRL-C gets back to shell,eg:
> >> >>
> >> >>   DUMP: Date of this level 0 dump: Fri Jul 10 10:25:33 2009
> >> >>   DUMP: Date of last level 0 dump: the epoch
> >> >>   DUMP: Dumping snapshot of /dev/da0s1d (/usr) to standard output
> >> >>   DUMP: mapping (Pass I) [regular files]
> >> >>   DUMP: mapping (Pass II) [directories]
> >> >>   DUMP: estimated 1713942 tape blocks.
> >> >>   DUMP: dumping (Pass III) [directories]
> >> >>   DUMP: dumping (Pass IV) [regular files]
> >> >> ^C  DUMP: Interrupt received.
> >> >>   DUMP: Do you want to abort dump?: ("yes" or "no") Killed by signal
> >> >> 2. DUMP: Broken pipe
> >> >>   DUMP: The ENTIRE dump is aborted.
> >> >>
> >> >> Hangs always in Pass IV
> >> >
> >> >What's the output ps -auwwx|grep dump at the time of the dump.
> >>
> >> when the dump hangs:
> >>
> >> ps auxww | grep dump
> >>
> >> root    61360  0.0  0.0  3128  1168  p0  I+    1:47PM   0:00.06
> >> /sbin/dump -0uanL -f - / (dump)
> >>
> >> root    61361  0.0  0.1  5560  2768  p0  I+    1:47PM   0:03.65 ssh
> >> xxx at xxx.net dd of=/var/ftp/dump_images/mx1-root-test
> >>
> >> root    61364  0.0  0.0  3128  1528  p0  I+    1:47PM   0:00.36 dump:
> >> /dev/da0s1a: pass 4: 92.66% done, finished in 0:00 at Sun Jul 12
> >> 13:47:52 2009 (dump)
> >
> >procstat -k 61364 please?
>
> I ran it again, diff pid:
>
> procstat -k 67765
>   PID    TID COMM             TDNAME           KSTACK
> 67765 100159 dump             -                mi_switch sleepq_switch
> sleepq_catch_signals sleepq_wait_sig _sleep sbwait soreceive_generic
> soreceive soo_read dofileread kern_readv read syscall Xint0x80_syscall

It looks like it's waiting for ssh/dd to report. Does the same happen when you 
dump to a local file (on a different partition, obviously)? This would rule out 
inter-process communication within dump itself.
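
A minimal sketch of that local test, keeping the flags from the original
command. The output path /var/tmp/mx1-root-test is only a placeholder (an
assumption); use any file system other than the one being dumped. The script
prints the command and runs it only when RUN=1 is set.

```shell
#!/bin/sh
# Build the local-file variant of the dump command from the original report.
#   $1 = output file (hypothetical path; must be on a different partition)
#   $2 = file system to dump
# Flags as in the report: -0 full dump, -u update dumpdates, -a auto-size,
# -n notify operators, -L dump from a snapshot of the live file system.
local_dump_cmd() {
    printf '/sbin/dump -0uanL -f %s %s\n' "$1" "$2"
}

# Print the command; execute it only when RUN=1 is set explicitly.
cmd=$(local_dump_cmd "${OUT:-/var/tmp/mx1-root-test}" "${FS:-/}")
echo "$cmd"
if [ "${RUN:-0}" = "1" ]; then
    # Unquoted on purpose so the string splits into command and arguments;
    # this assumes the paths contain no whitespace.
    $cmd
fi
```

If the local dump completes every time, that points at the ssh/dd side of the
pipe rather than at dump.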

FYI, I'm using this daily through periodic on a few 7.1-STABLE machines and 
-current, although I do compress (with gzip, and bzip2 on faster CPUs) before 
transfer. The only other difference is that I don't use the -n flag to dump. 
Worth a try, though I doubt the soreceive it's waiting on is there because dump 
is unable to notify a human in the operator group.
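
For reference, a sketch of the compressed variant without -n. This is not the
actual periodic script; user@backuphost and the .gz file name are placeholders
(assumptions), and the function only prints the pipeline so it can be reviewed
before it is run.

```shell
#!/bin/sh
# Print a dump | gzip | ssh pipeline that omits the -n flag.
#   $1 = file system to dump
#   $2 = ssh destination (placeholder, not a real host)
#   $3 = remote output file
compressed_dump_cmd() {
    printf '/sbin/dump -0uaL -f - %s | gzip | ssh %s dd of=%s\n' "$1" "$2" "$3"
}

# Example invocation with hypothetical values; nothing is executed here.
compressed_dump_cmd / user@backuphost /var/ftp/dump_images/mx1-root-test.gz
```

Once the printed line looks right, it can be pasted into a root shell or piped
to sh.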

If you're comfortable doing so, you could grab a 7.2-RELEASE livefs CD to see 
if this issue persists using the dump tools from there, though I don't know of 
any particular fixes in this area.

> >Is the percentage always the same for the same disk?
>
> no, it varies widely.
>
> >If you kill dd on the other side, does dump notice it?
>
> yes, I kill dd on the target, and the dump shows:
>
>   DUMP: dumping (Pass IV) [regular files]
> Terminated
>   DUMP: Broken pipe
>   DUMP: The ENTIRE dump is aborted.

-- 
Mel

